* Re: [PATCH v1 0/9] perf stat: Enable '--per-thread' on all threads
2017-11-20 14:43 [PATCH v1 0/9] perf stat: Enable '--per-thread' on all threads Jin Yao
@ 2017-11-20 9:26 ` Jiri Olsa
2017-11-20 12:15 ` Jin, Yao
2017-11-20 14:43 ` [PATCH v1 1/9] perf util: Create rblist__reset() function Jin Yao
` (8 subsequent siblings)
9 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2017-11-20 9:26 UTC (permalink / raw)
To: Jin Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Mon, Nov 20, 2017 at 10:43:35PM +0800, Jin Yao wrote:
> perf stat --per-thread is useful to break down data per thread.
> But it currently requires specifying --pid/--tid to limit it to a process.
>
> For analysis it would be useful to do it globally for the whole system.
I can't compile this:
builtin-script.c: In function ‘perf_sample__fprint_metric’:
builtin-script.c:1549:2: error: too few arguments to function ‘perf_stat__update_shadow_stats’
perf_stat__update_shadow_stats(evsel,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from builtin-script.c:24:0:
util/stat.h:134:6: note: declared here
void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
builtin-script.c:1555:4: error: too few arguments to function ‘perf_stat__print_shadow_stats’
perf_stat__print_shadow_stats(ev2,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from builtin-script.c:24:0:
util/stat.h:143:6: note: declared here
void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mv: cannot stat './.builtin-script.o.tmp': No such file or directory
thanks,
jirka
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v1 0/9] perf stat: Enable '--per-thread' on all threads
2017-11-20 9:26 ` Jiri Olsa
@ 2017-11-20 12:15 ` Jin, Yao
2017-11-20 12:29 ` Jiri Olsa
0 siblings, 1 reply; 49+ messages in thread
From: Jin, Yao @ 2017-11-20 12:15 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On 11/20/2017 5:26 PM, Jiri Olsa wrote:
> On Mon, Nov 20, 2017 at 10:43:35PM +0800, Jin Yao wrote:
>> perf stat --per-thread is useful to break down data per thread.
>> But it currently requires specifying --pid/--tid to limit it to a process.
>>
>> For analysis it would be useful to do it globally for the whole system.
>
> I can't compile this:
>
> builtin-script.c: In function ‘perf_sample__fprint_metric’:
> builtin-script.c:1549:2: error: too few arguments to function ‘perf_stat__update_shadow_stats’
> perf_stat__update_shadow_stats(evsel,
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> In file included from builtin-script.c:24:0:
> util/stat.h:134:6: note: declared here
> void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> builtin-script.c:1555:4: error: too few arguments to function ‘perf_stat__print_shadow_stats’
> perf_stat__print_shadow_stats(ev2,
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> In file included from builtin-script.c:24:0:
> util/stat.h:143:6: note: declared here
> void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> mv: cannot stat './.builtin-script.o.tmp': No such file or directory
>
> thanks,
> jirka
>
Hi Jiri,
This patch set is based on the latest perf/core branch.
I pulled the branch just now and tried the build again; it builds OK.
Could you tell me which branch you are testing on?
Thanks
Jin Yao
* Re: [PATCH v1 0/9] perf stat: Enable '--per-thread' on all threads
2017-11-20 12:15 ` Jin, Yao
@ 2017-11-20 12:29 ` Jiri Olsa
2017-11-20 15:50 ` Andi Kleen
0 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2017-11-20 12:29 UTC (permalink / raw)
To: Jin, Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Mon, Nov 20, 2017 at 08:15:56PM +0800, Jin, Yao wrote:
>
>
> On 11/20/2017 5:26 PM, Jiri Olsa wrote:
> > On Mon, Nov 20, 2017 at 10:43:35PM +0800, Jin Yao wrote:
> > > perf stat --per-thread is useful to break down data per thread.
> > > But it currently requires specifying --pid/--tid to limit it to a process.
> > >
> > > For analysis it would be useful to do it globally for the whole system.
> >
> > I can't compile this:
> >
> > builtin-script.c: In function ‘perf_sample__fprint_metric’:
> > builtin-script.c:1549:2: error: too few arguments to function ‘perf_stat__update_shadow_stats’
> > perf_stat__update_shadow_stats(evsel,
> > ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > In file included from builtin-script.c:24:0:
> > util/stat.h:134:6: note: declared here
> > void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
> > ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > builtin-script.c:1555:4: error: too few arguments to function ‘perf_stat__print_shadow_stats’
> > perf_stat__print_shadow_stats(ev2,
> > ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > In file included from builtin-script.c:24:0:
> > util/stat.h:143:6: note: declared here
> > void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
> > ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > mv: cannot stat './.builtin-script.o.tmp': No such file or directory
> >
> > thanks,
> > jirka
> >
>
> Hi Jiri,
>
> This patch set is based on the latest perf/core branch.
>
> I pulled the branch just now and tried the build again; it builds OK.
>
> Could you tell me which branch you are testing on?
ugh.. I was building on top of Andi's changes.. I'll recheck
sry for noise
jirka
* [PATCH v1 0/9] perf stat: Enable '--per-thread' on all threads
@ 2017-11-20 14:43 Jin Yao
2017-11-20 9:26 ` Jiri Olsa
` (9 more replies)
0 siblings, 10 replies; 49+ messages in thread
From: Jin Yao @ 2017-11-20 14:43 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
perf stat --per-thread is useful to break down data per thread.
But it currently requires specifying --pid/--tid to limit it to a process.
For analysis it would be useful to do it globally for the whole system.
1. Currently, if we perform 'perf stat --per-thread' without pid/tid,
perf returns an error:
root@skl:/tmp# perf stat --per-thread
The --per-thread option is only available when monitoring via -p -t options.
-p, --pid <pid> stat events on existing process id
-t, --tid <tid> stat events on existing thread id
2. With this patch series, it returns data per thread with shadow metrics.
(running "vmstat 1" in the following example)
root@skl:/tmp# perf stat --per-thread
^C
Performance counter stats for 'system wide':
perf-24165 4.302433 cpu-clock (msec) # 0.001 CPUs utilized
vmstat-23127 1.562215 cpu-clock (msec) # 0.000 CPUs utilized
irqbalance-2780 0.827851 cpu-clock (msec) # 0.000 CPUs utilized
sshd-23111 0.278308 cpu-clock (msec) # 0.000 CPUs utilized
thermald-2841 0.230880 cpu-clock (msec) # 0.000 CPUs utilized
sshd-23058 0.207306 cpu-clock (msec) # 0.000 CPUs utilized
kworker/0:2-19991 0.133983 cpu-clock (msec) # 0.000 CPUs utilized
kworker/u16:1-18249 0.125636 cpu-clock (msec) # 0.000 CPUs utilized
rcu_sched-8 0.085533 cpu-clock (msec) # 0.000 CPUs utilized
kworker/u16:2-23146 0.077139 cpu-clock (msec) # 0.000 CPUs utilized
gmain-2700 0.041789 cpu-clock (msec) # 0.000 CPUs utilized
kworker/4:1-15354 0.028370 cpu-clock (msec) # 0.000 CPUs utilized
kworker/6:0-17528 0.023895 cpu-clock (msec) # 0.000 CPUs utilized
kworker/4:1H-1887 0.013209 cpu-clock (msec) # 0.000 CPUs utilized
kworker/5:2-31362 0.011627 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/0-11 0.010892 cpu-clock (msec) # 0.000 CPUs utilized
kworker/3:2-12870 0.010220 cpu-clock (msec) # 0.000 CPUs utilized
ksoftirqd/0-7 0.008869 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/1-14 0.008476 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/7-50 0.002944 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/3-26 0.002893 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/4-32 0.002759 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/2-20 0.002429 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/6-44 0.001491 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/5-38 0.001477 cpu-clock (msec) # 0.000 CPUs utilized
rcu_sched-8 10 context-switches # 0.117 M/sec
kworker/u16:1-18249 7 context-switches # 0.056 M/sec
sshd-23111 4 context-switches # 0.014 M/sec
vmstat-23127 4 context-switches # 0.003 M/sec
perf-24165 4 context-switches # 0.930 K/sec
kworker/0:2-19991 3 context-switches # 0.022 M/sec
kworker/u16:2-23146 3 context-switches # 0.039 M/sec
kworker/4:1-15354 2 context-switches # 0.070 M/sec
kworker/6:0-17528 2 context-switches # 0.084 M/sec
sshd-23058 2 context-switches # 0.010 M/sec
ksoftirqd/0-7 1 context-switches # 0.113 M/sec
watchdog/0-11 1 context-switches # 0.092 M/sec
watchdog/1-14 1 context-switches # 0.118 M/sec
watchdog/2-20 1 context-switches # 0.412 M/sec
watchdog/3-26 1 context-switches # 0.346 M/sec
watchdog/4-32 1 context-switches # 0.362 M/sec
watchdog/5-38 1 context-switches # 0.677 M/sec
watchdog/6-44 1 context-switches # 0.671 M/sec
watchdog/7-50 1 context-switches # 0.340 M/sec
kworker/4:1H-1887 1 context-switches # 0.076 M/sec
thermald-2841 1 context-switches # 0.004 M/sec
gmain-2700 1 context-switches # 0.024 M/sec
irqbalance-2780 1 context-switches # 0.001 M/sec
kworker/3:2-12870 1 context-switches # 0.098 M/sec
kworker/5:2-31362 1 context-switches # 0.086 M/sec
kworker/u16:1-18249 2 cpu-migrations # 0.016 M/sec
kworker/u16:2-23146 2 cpu-migrations # 0.026 M/sec
rcu_sched-8 1 cpu-migrations # 0.012 M/sec
sshd-23058 1 cpu-migrations # 0.005 M/sec
perf-24165 8,833,385 cycles # 2.053 GHz
vmstat-23127 1,702,699 cycles # 1.090 GHz
irqbalance-2780 739,847 cycles # 0.894 GHz
sshd-23111 269,506 cycles # 0.968 GHz
thermald-2841 204,556 cycles # 0.886 GHz
sshd-23058 158,780 cycles # 0.766 GHz
kworker/0:2-19991 112,981 cycles # 0.843 GHz
kworker/u16:1-18249 100,926 cycles # 0.803 GHz
rcu_sched-8 74,024 cycles # 0.865 GHz
kworker/u16:2-23146 55,984 cycles # 0.726 GHz
gmain-2700 34,278 cycles # 0.820 GHz
kworker/4:1-15354 20,665 cycles # 0.728 GHz
kworker/6:0-17528 16,445 cycles # 0.688 GHz
kworker/5:2-31362 9,492 cycles # 0.816 GHz
watchdog/3-26 8,695 cycles # 3.006 GHz
kworker/4:1H-1887 8,238 cycles # 0.624 GHz
watchdog/4-32 7,580 cycles # 2.747 GHz
kworker/3:2-12870 7,306 cycles # 0.715 GHz
watchdog/2-20 7,274 cycles # 2.995 GHz
watchdog/0-11 6,988 cycles # 0.642 GHz
ksoftirqd/0-7 6,376 cycles # 0.719 GHz
watchdog/1-14 5,340 cycles # 0.630 GHz
watchdog/5-38 4,061 cycles # 2.749 GHz
watchdog/6-44 3,976 cycles # 2.667 GHz
watchdog/7-50 3,418 cycles # 1.161 GHz
vmstat-23127 2,511,699 instructions # 1.48 insn per cycle
perf-24165 1,829,908 instructions # 0.21 insn per cycle
irqbalance-2780 1,190,204 instructions # 1.61 insn per cycle
thermald-2841 143,544 instructions # 0.70 insn per cycle
sshd-23111 128,138 instructions # 0.48 insn per cycle
sshd-23058 57,654 instructions # 0.36 insn per cycle
rcu_sched-8 44,063 instructions # 0.60 insn per cycle
kworker/u16:1-18249 42,551 instructions # 0.42 insn per cycle
kworker/0:2-19991 25,873 instructions # 0.23 insn per cycle
kworker/u16:2-23146 21,407 instructions # 0.38 insn per cycle
gmain-2700 13,691 instructions # 0.40 insn per cycle
kworker/4:1-15354 12,964 instructions # 0.63 insn per cycle
kworker/6:0-17528 10,034 instructions # 0.61 insn per cycle
kworker/5:2-31362 5,203 instructions # 0.55 insn per cycle
kworker/3:2-12870 4,866 instructions # 0.67 insn per cycle
kworker/4:1H-1887 3,586 instructions # 0.44 insn per cycle
ksoftirqd/0-7 3,463 instructions # 0.54 insn per cycle
watchdog/0-11 3,135 instructions # 0.45 insn per cycle
watchdog/1-14 3,135 instructions # 0.59 insn per cycle
watchdog/2-20 3,135 instructions # 0.43 insn per cycle
watchdog/3-26 3,135 instructions # 0.36 insn per cycle
watchdog/4-32 3,135 instructions # 0.41 insn per cycle
watchdog/5-38 3,135 instructions # 0.77 insn per cycle
watchdog/6-44 3,135 instructions # 0.79 insn per cycle
watchdog/7-50 3,135 instructions # 0.92 insn per cycle
vmstat-23127 539,181 branches # 345.139 M/sec
perf-24165 375,364 branches # 87.245 M/sec
irqbalance-2780 262,092 branches # 316.593 M/sec
thermald-2841 31,611 branches # 136.915 M/sec
sshd-23111 21,874 branches # 78.596 M/sec
sshd-23058 10,682 branches # 51.528 M/sec
rcu_sched-8 8,693 branches # 101.633 M/sec
kworker/u16:1-18249 7,891 branches # 62.808 M/sec
kworker/0:2-19991 5,761 branches # 42.998 M/sec
kworker/u16:2-23146 4,099 branches # 53.138 M/sec
kworker/4:1-15354 2,755 branches # 97.110 M/sec
gmain-2700 2,638 branches # 63.127 M/sec
kworker/6:0-17528 2,216 branches # 92.739 M/sec
kworker/5:2-31362 1,132 branches # 97.360 M/sec
kworker/3:2-12870 1,081 branches # 105.773 M/sec
kworker/4:1H-1887 725 branches # 54.887 M/sec
ksoftirqd/0-7 707 branches # 79.716 M/sec
watchdog/0-11 652 branches # 59.860 M/sec
watchdog/1-14 652 branches # 76.923 M/sec
watchdog/2-20 652 branches # 268.423 M/sec
watchdog/3-26 652 branches # 225.372 M/sec
watchdog/4-32 652 branches # 236.318 M/sec
watchdog/5-38 652 branches # 441.435 M/sec
watchdog/6-44 652 branches # 437.290 M/sec
watchdog/7-50 652 branches # 221.467 M/sec
vmstat-23127 8,960 branch-misses # 1.66% of all branches
irqbalance-2780 3,047 branch-misses # 1.16% of all branches
perf-24165 2,876 branch-misses # 0.77% of all branches
sshd-23111 1,843 branch-misses # 8.43% of all branches
thermald-2841 1,444 branch-misses # 4.57% of all branches
sshd-23058 1,379 branch-misses # 12.91% of all branches
kworker/u16:1-18249 982 branch-misses # 12.44% of all branches
rcu_sched-8 893 branch-misses # 10.27% of all branches
kworker/u16:2-23146 578 branch-misses # 14.10% of all branches
kworker/0:2-19991 376 branch-misses # 6.53% of all branches
gmain-2700 280 branch-misses # 10.61% of all branches
kworker/6:0-17528 196 branch-misses # 8.84% of all branches
kworker/4:1-15354 187 branch-misses # 6.79% of all branches
kworker/5:2-31362 123 branch-misses # 10.87% of all branches
watchdog/0-11 95 branch-misses # 14.57% of all branches
watchdog/4-32 89 branch-misses # 13.65% of all branches
kworker/3:2-12870 80 branch-misses # 7.40% of all branches
watchdog/3-26 61 branch-misses # 9.36% of all branches
kworker/4:1H-1887 60 branch-misses # 8.28% of all branches
watchdog/2-20 52 branch-misses # 7.98% of all branches
ksoftirqd/0-7 47 branch-misses # 6.65% of all branches
watchdog/1-14 46 branch-misses # 7.06% of all branches
watchdog/7-50 13 branch-misses # 1.99% of all branches
watchdog/5-38 8 branch-misses # 1.23% of all branches
watchdog/6-44 7 branch-misses # 1.07% of all branches
3.695150786 seconds time elapsed
root@skl:/tmp# perf stat --per-thread -M IPC,CPI
^C
Performance counter stats for 'system wide':
vmstat-23127 2,000,783 inst_retired.any # 1.5 IPC
thermald-2841 1,472,670 inst_retired.any # 1.3 IPC
sshd-23111 977,374 inst_retired.any # 1.2 IPC
perf-24163 483,779 inst_retired.any # 0.2 IPC
gmain-2700 341,213 inst_retired.any # 0.9 IPC
sshd-23058 148,891 inst_retired.any # 0.8 IPC
rtkit-daemon-3288 71,210 inst_retired.any # 0.7 IPC
kworker/u16:1-18249 39,562 inst_retired.any # 0.3 IPC
rcu_sched-8 14,474 inst_retired.any # 0.8 IPC
kworker/0:2-19991 7,659 inst_retired.any # 0.2 IPC
kworker/4:1-15354 6,714 inst_retired.any # 0.8 IPC
rtkit-daemon-3289 4,839 inst_retired.any # 0.3 IPC
kworker/6:0-17528 3,321 inst_retired.any # 0.6 IPC
kworker/5:2-31362 3,215 inst_retired.any # 0.5 IPC
kworker/7:2-23145 3,173 inst_retired.any # 0.7 IPC
kworker/4:1H-1887 1,719 inst_retired.any # 0.3 IPC
watchdog/0-11 1,479 inst_retired.any # 0.3 IPC
watchdog/1-14 1,479 inst_retired.any # 0.3 IPC
watchdog/2-20 1,479 inst_retired.any # 0.4 IPC
watchdog/3-26 1,479 inst_retired.any # 0.4 IPC
watchdog/4-32 1,479 inst_retired.any # 0.3 IPC
watchdog/5-38 1,479 inst_retired.any # 0.3 IPC
watchdog/6-44 1,479 inst_retired.any # 0.7 IPC
watchdog/7-50 1,479 inst_retired.any # 0.7 IPC
kworker/u16:2-23146 1,408 inst_retired.any # 0.5 IPC
perf-24163 2,249,872 cpu_clk_unhalted.thread
vmstat-23127 1,352,455 cpu_clk_unhalted.thread
thermald-2841 1,161,140 cpu_clk_unhalted.thread
sshd-23111 807,827 cpu_clk_unhalted.thread
gmain-2700 375,535 cpu_clk_unhalted.thread
sshd-23058 194,071 cpu_clk_unhalted.thread
kworker/u16:1-18249 114,306 cpu_clk_unhalted.thread
rtkit-daemon-3288 103,547 cpu_clk_unhalted.thread
kworker/0:2-19991 46,550 cpu_clk_unhalted.thread
rcu_sched-8 18,855 cpu_clk_unhalted.thread
rtkit-daemon-3289 17,549 cpu_clk_unhalted.thread
kworker/4:1-15354 8,812 cpu_clk_unhalted.thread
kworker/5:2-31362 6,812 cpu_clk_unhalted.thread
kworker/4:1H-1887 5,270 cpu_clk_unhalted.thread
kworker/6:0-17528 5,111 cpu_clk_unhalted.thread
kworker/7:2-23145 4,667 cpu_clk_unhalted.thread
watchdog/0-11 4,663 cpu_clk_unhalted.thread
watchdog/1-14 4,663 cpu_clk_unhalted.thread
watchdog/4-32 4,626 cpu_clk_unhalted.thread
watchdog/5-38 4,403 cpu_clk_unhalted.thread
watchdog/3-26 3,936 cpu_clk_unhalted.thread
watchdog/2-20 3,850 cpu_clk_unhalted.thread
kworker/u16:2-23146 2,654 cpu_clk_unhalted.thread
watchdog/6-44 2,017 cpu_clk_unhalted.thread
watchdog/7-50 2,017 cpu_clk_unhalted.thread
vmstat-23127 2,000,783 inst_retired.any # 0.7 CPI
thermald-2841 1,472,670 inst_retired.any # 0.8 CPI
sshd-23111 977,374 inst_retired.any # 0.8 CPI
perf-24163 495,037 inst_retired.any # 4.7 CPI
gmain-2700 341,213 inst_retired.any # 1.1 CPI
sshd-23058 148,891 inst_retired.any # 1.3 CPI
rtkit-daemon-3288 71,210 inst_retired.any # 1.5 CPI
kworker/u16:1-18249 39,562 inst_retired.any # 2.9 CPI
rcu_sched-8 14,474 inst_retired.any # 1.3 CPI
kworker/0:2-19991 7,659 inst_retired.any # 6.1 CPI
kworker/4:1-15354 6,714 inst_retired.any # 1.3 CPI
rtkit-daemon-3289 4,839 inst_retired.any # 3.6 CPI
kworker/6:0-17528 3,321 inst_retired.any # 1.5 CPI
kworker/5:2-31362 3,215 inst_retired.any # 2.1 CPI
kworker/7:2-23145 3,173 inst_retired.any # 1.5 CPI
kworker/4:1H-1887 1,719 inst_retired.any # 3.1 CPI
watchdog/0-11 1,479 inst_retired.any # 3.2 CPI
watchdog/1-14 1,479 inst_retired.any # 3.2 CPI
watchdog/2-20 1,479 inst_retired.any # 2.6 CPI
watchdog/3-26 1,479 inst_retired.any # 2.7 CPI
watchdog/4-32 1,479 inst_retired.any # 3.1 CPI
watchdog/5-38 1,479 inst_retired.any # 3.0 CPI
watchdog/6-44 1,479 inst_retired.any # 1.4 CPI
watchdog/7-50 1,479 inst_retired.any # 1.4 CPI
kworker/u16:2-23146 1,408 inst_retired.any # 1.9 CPI
perf-24163 2,302,323 cycles
vmstat-23127 1,352,455 cycles
thermald-2841 1,161,140 cycles
sshd-23111 807,827 cycles
gmain-2700 375,535 cycles
sshd-23058 194,071 cycles
kworker/u16:1-18249 114,306 cycles
rtkit-daemon-3288 103,547 cycles
kworker/0:2-19991 46,550 cycles
rcu_sched-8 18,855 cycles
rtkit-daemon-3289 17,549 cycles
kworker/4:1-15354 8,812 cycles
kworker/5:2-31362 6,812 cycles
kworker/4:1H-1887 5,270 cycles
kworker/6:0-17528 5,111 cycles
kworker/7:2-23145 4,667 cycles
watchdog/0-11 4,663 cycles
watchdog/1-14 4,663 cycles
watchdog/4-32 4,626 cycles
watchdog/5-38 4,403 cycles
watchdog/3-26 3,936 cycles
watchdog/2-20 3,850 cycles
kworker/u16:2-23146 2,654 cycles
watchdog/6-44 2,017 cycles
watchdog/7-50 2,017 cycles
2.175726600 seconds time elapsed
Jin Yao (9):
perf util: Create rblist__reset() function
perf util: Define a structure for runtime shadow metrics stats
perf util: Reconstruct rblist for supporting per-thread shadow stats
perf util: Update and print per-thread shadow stats
perf util: Remove a set of shadow stats static variables
perf stat: Allocate shadow stats buffer for threads
perf util: Reuse thread_map__new_by_uid to enumerate threads from
/proc
perf stat: Remove --per-thread pid/tid limitation
perf stat: Resort '--per-thread' result
tools/perf/builtin-stat.c | 161 +++++++++++++---
tools/perf/tests/thread-map.c | 2 +-
tools/perf/util/evlist.c | 3 +-
tools/perf/util/rblist.c | 24 ++-
tools/perf/util/rblist.h | 1 +
tools/perf/util/stat-shadow.c | 433 ++++++++++++++++++++++++++----------------
tools/perf/util/stat.c | 15 +-
tools/perf/util/stat.h | 63 +++++-
tools/perf/util/target.h | 7 +
tools/perf/util/thread_map.c | 19 +-
tools/perf/util/thread_map.h | 3 +-
11 files changed, 519 insertions(+), 212 deletions(-)
--
2.7.4
* [PATCH v1 1/9] perf util: Create rblist__reset() function
2017-11-20 14:43 [PATCH v1 0/9] perf stat: Enable '--per-thread' on all threads Jin Yao
2017-11-20 9:26 ` Jiri Olsa
@ 2017-11-20 14:43 ` Jin Yao
2017-11-20 14:43 ` [PATCH v1 2/9] perf util: Define a structure for runtime shadow metrics stats Jin Yao
` (7 subsequent siblings)
9 siblings, 0 replies; 49+ messages in thread
From: Jin Yao @ 2017-11-20 14:43 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
Currently we have rblist__delete(), which is used to delete an rblist
and which frees the rblist pointer itself at the end.
That is inconvenient when the user wants to tear down an rblist that
was not allocated by something like malloc(), for example an rblist
embedded in another data structure.
This patch creates a new function, rblist__reset(), which behaves like
rblist__delete() but does not free the rblist itself.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/util/rblist.c | 24 +++++++++++++++++-------
tools/perf/util/rblist.h | 1 +
2 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/tools/perf/util/rblist.c b/tools/perf/util/rblist.c
index 0dfe27d..eafa663 100644
--- a/tools/perf/util/rblist.c
+++ b/tools/perf/util/rblist.c
@@ -101,20 +101,30 @@ void rblist__init(struct rblist *rblist)
return;
}
+static void remove_nodes(struct rblist *rblist)
+{
+ struct rb_node *pos, *next = rb_first(&rblist->entries);
+
+ while (next) {
+ pos = next;
+ next = rb_next(pos);
+ rblist__remove_node(rblist, pos);
+ }
+}
+
void rblist__delete(struct rblist *rblist)
{
if (rblist != NULL) {
- struct rb_node *pos, *next = rb_first(&rblist->entries);
-
- while (next) {
- pos = next;
- next = rb_next(pos);
- rblist__remove_node(rblist, pos);
- }
+ remove_nodes(rblist);
free(rblist);
}
}
+void rblist__reset(struct rblist *rblist)
+{
+ remove_nodes(rblist);
+}
+
struct rb_node *rblist__entry(const struct rblist *rblist, unsigned int idx)
{
struct rb_node *node;
diff --git a/tools/perf/util/rblist.h b/tools/perf/util/rblist.h
index 4c8638a..048c285 100644
--- a/tools/perf/util/rblist.h
+++ b/tools/perf/util/rblist.h
@@ -35,6 +35,7 @@ void rblist__remove_node(struct rblist *rblist, struct rb_node *rb_node);
struct rb_node *rblist__find(struct rblist *rblist, const void *entry);
struct rb_node *rblist__findnew(struct rblist *rblist, const void *entry);
struct rb_node *rblist__entry(const struct rblist *rblist, unsigned int idx);
+void rblist__reset(struct rblist *rblist);
static inline bool rblist__empty(const struct rblist *rblist)
{
--
2.7.4
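The ownership difference between rblist__delete() and the new rblist__reset() can be illustrated with a self-contained toy list. The toy_* names below are illustrative stand-ins, not perf's real rblist API: "delete" empties the structure and frees its header, while "reset" only empties it, which is what an rblist embedded in a larger struct needs.

```c
#include <stdlib.h>

/* Toy stand-in for perf's rblist: a singly linked list of nodes. */
struct toy_node { struct toy_node *next; };
struct toy_list { struct toy_node *head; int len; };

void toy_list__remove_nodes(struct toy_list *l)
{
	struct toy_node *pos = l->head, *next;

	while (pos) {
		next = pos->next;
		free(pos);
		pos = next;
	}
	l->head = NULL;
	l->len = 0;
}

/* Like rblist__delete(): only valid for heap-allocated lists,
 * because it frees the list header itself. */
void toy_list__delete(struct toy_list *l)
{
	if (l != NULL) {
		toy_list__remove_nodes(l);
		free(l);
	}
}

/* Like rblist__reset(): safe for a list embedded in another struct. */
void toy_list__reset(struct toy_list *l)
{
	toy_list__remove_nodes(l);
}

struct owner {			/* the list is embedded, not malloc'ed */
	struct toy_list list;
	int other_state;
};

int demo_reset_embedded(void)
{
	struct owner o = { .list = { NULL, 0 }, .other_state = 42 };
	struct toy_node *n = malloc(sizeof(*n));

	n->next = NULL;
	o.list.head = n;
	o.list.len = 1;

	toy_list__reset(&o.list);	/* must NOT free &o.list itself */
	return o.list.len == 0 && o.other_state == 42;
}
```

Calling toy_list__delete(&o.list) here would be a bug (freeing stack memory), which is exactly the situation the patch's rblist__reset() addresses.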
* [PATCH v1 2/9] perf util: Define a structure for runtime shadow metrics stats
2017-11-20 14:43 [PATCH v1 0/9] perf stat: Enable '--per-thread' on all threads Jin Yao
2017-11-20 9:26 ` Jiri Olsa
2017-11-20 14:43 ` [PATCH v1 1/9] perf util: Create rblist__reset() function Jin Yao
@ 2017-11-20 14:43 ` Jin Yao
2017-11-21 15:18 ` Jiri Olsa
2017-11-20 14:43 ` [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats Jin Yao
` (6 subsequent siblings)
9 siblings, 1 reply; 49+ messages in thread
From: Jin Yao @ 2017-11-20 14:43 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
Perf uses a set of static variables to record the runtime shadow
metrics stats.
These static variables are a limitation if we want to record the
runtime shadow stats per thread. This patch creates a structure to
hold the stats; the next patches will use this structure to update
the runtime shadow stats per thread.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/util/stat-shadow.c | 11 -----------
tools/perf/util/stat.h | 46 ++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 45 insertions(+), 12 deletions(-)
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 855e35c..5853901 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -9,17 +9,6 @@
#include "expr.h"
#include "metricgroup.h"
-enum {
- CTX_BIT_USER = 1 << 0,
- CTX_BIT_KERNEL = 1 << 1,
- CTX_BIT_HV = 1 << 2,
- CTX_BIT_HOST = 1 << 3,
- CTX_BIT_IDLE = 1 << 4,
- CTX_BIT_MAX = 1 << 5,
-};
-
-#define NUM_CTX CTX_BIT_MAX
-
/*
* AGGR_GLOBAL: Use CPU 0
* AGGR_SOCKET: Use first CPU of socket
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index eefca5c..61fd2e0 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -5,6 +5,8 @@
#include <linux/types.h>
#include <stdio.h>
#include "xyarray.h"
+#include "evsel.h"
+#include "rblist.h"
struct stats
{
@@ -43,11 +45,54 @@ enum aggr_mode {
AGGR_UNSET,
};
+enum {
+ CTX_BIT_USER = 1 << 0,
+ CTX_BIT_KERNEL = 1 << 1,
+ CTX_BIT_HV = 1 << 2,
+ CTX_BIT_HOST = 1 << 3,
+ CTX_BIT_IDLE = 1 << 4,
+ CTX_BIT_MAX = 1 << 5,
+};
+
+#define NUM_CTX CTX_BIT_MAX
+
+enum stat_type {
+ STAT_NONE = 0,
+ STAT_NSECS,
+ STAT_CYCLES,
+ STAT_STALLED_CYCLES_FRONT,
+ STAT_STALLED_CYCLES_BACK,
+ STAT_BRANCHES,
+ STAT_CACHEREFS,
+ STAT_L1_DCACHE,
+ STAT_L1_ICACHE,
+ STAT_LL_CACHE,
+ STAT_ITLB_CACHE,
+ STAT_DTLB_CACHE,
+ STAT_CYCLES_IN_TX,
+ STAT_TRANSACTION,
+ STAT_ELISION,
+ STAT_TOPDOWN_TOTAL_SLOTS,
+ STAT_TOPDOWN_SLOTS_ISSUED,
+ STAT_TOPDOWN_SLOTS_RETIRED,
+ STAT_TOPDOWN_FETCH_BUBBLES,
+ STAT_TOPDOWN_RECOVERY_BUBBLES,
+ STAT_SMI_NUM,
+ STAT_APERF,
+ STAT_MAX
+};
+
+struct runtime_stat {
+ struct rblist value_list;
+};
+
struct perf_stat_config {
enum aggr_mode aggr_mode;
bool scale;
FILE *output;
unsigned int interval;
+ struct runtime_stat *stats;
+ int stat_num;
};
void update_stats(struct stats *stats, u64 val);
@@ -92,7 +137,6 @@ struct perf_stat_output_ctx {
bool force_header;
};
-struct rblist;
void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
double avg, int cpu,
struct perf_stat_output_ctx *out,
--
2.7.4
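To make the moved CTX_BIT_* encoding concrete, here is a self-contained sketch modeled on perf's evsel_context(): each exclude_* attribute of an event sets one bit of the context index used to select a shadow-stats slot. The shadow_ctx() helper and its bool parameters are illustrative stand-ins for the perf_event_attr exclude_* fields, not perf's actual function.

```c
#include <stdbool.h>

/* The CTX_BIT_* values from the patch, reproduced so this sketch is
 * self-contained. */
enum {
	CTX_BIT_USER	= 1 << 0,
	CTX_BIT_KERNEL	= 1 << 1,
	CTX_BIT_HV	= 1 << 2,
	CTX_BIT_HOST	= 1 << 3,
	CTX_BIT_IDLE	= 1 << 4,
	CTX_BIT_MAX	= 1 << 5,
};

/* Derive a context index from an event's exclusion flags. */
int shadow_ctx(bool exclude_user, bool exclude_kernel, bool exclude_hv,
	       bool exclude_host, bool exclude_idle)
{
	int ctx = 0;

	if (exclude_kernel)
		ctx |= CTX_BIT_KERNEL;
	if (exclude_user)
		ctx |= CTX_BIT_USER;
	if (exclude_hv)
		ctx |= CTX_BIT_HV;
	if (exclude_host)
		ctx |= CTX_BIT_HOST;
	if (exclude_idle)
		ctx |= CTX_BIT_IDLE;

	return ctx;
}
```

With five independent bits, NUM_CTX (== CTX_BIT_MAX == 32) bounds the number of distinct contexts a stats array must cover.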
* [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats
2017-11-20 14:43 [PATCH v1 0/9] perf stat: Enable '--per-thread' on all threads Jin Yao
` (2 preceding siblings ...)
2017-11-20 14:43 ` [PATCH v1 2/9] perf util: Define a structure for runtime shadow metrics stats Jin Yao
@ 2017-11-20 14:43 ` Jin Yao
2017-11-21 15:17 ` Jiri Olsa
` (6 more replies)
2017-11-20 14:43 ` [PATCH v1 4/9] perf util: Update and print " Jin Yao
` (5 subsequent siblings)
9 siblings, 7 replies; 49+ messages in thread
From: Jin Yao @ 2017-11-20 14:43 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
In the current stat-shadow.c, rblist node deletion is not implemented.
The patch restructures the rblist init/free code and adds an
implementation for the rblist's node_delete method.
This patch also does the following:
1. Adds ctx/type/stat to the rbtree keys, because this rbtree will
maintain the shadow metrics in place of the original set of static
arrays, to support per-thread shadow stats.
2. Creates a static runtime_stat variable 'rt_stat', which logs the
shadow metrics by default.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/util/stat-shadow.c | 62 ++++++++++++++++++++++++++++++++++++++++---
tools/perf/util/stat.h | 2 ++
2 files changed, 60 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 5853901..045e129 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -40,12 +40,16 @@ static struct stats runtime_aperf_stats[NUM_CTX][MAX_NR_CPUS];
static struct rblist runtime_saved_values;
static bool have_frontend_stalled;
+static struct runtime_stat rt_stat;
struct stats walltime_nsecs_stats;
struct saved_value {
struct rb_node rb_node;
struct perf_evsel *evsel;
+ enum stat_type type;
+ int ctx;
int cpu;
+ struct runtime_stat *stat;
struct stats stats;
};
@@ -58,6 +62,23 @@ static int saved_value_cmp(struct rb_node *rb_node, const void *entry)
if (a->cpu != b->cpu)
return a->cpu - b->cpu;
+
+ if (a->type != b->type)
+ return a->type - b->type;
+
+ if (a->ctx != b->ctx)
+ return a->ctx - b->ctx;
+
+ if (a->evsel == NULL && b->evsel == NULL) {
+ if (a->stat == b->stat)
+ return 0;
+
+ if ((char *)a->stat < (char *)b->stat)
+ return -1;
+
+ return 1;
+ }
+
if (a->evsel == b->evsel)
return 0;
if ((char *)a->evsel < (char *)b->evsel)
@@ -76,6 +97,17 @@ static struct rb_node *saved_value_new(struct rblist *rblist __maybe_unused,
return &nd->rb_node;
}
+static void saved_value_delete(struct rblist *rblist __maybe_unused,
+ struct rb_node *rb_node)
+{
+ struct saved_value *v = container_of(rb_node,
+ struct saved_value,
+ rb_node);
+
+ if (v)
+ free(v);
+}
+
static struct saved_value *saved_value_lookup(struct perf_evsel *evsel,
int cpu,
bool create)
@@ -97,13 +129,35 @@ static struct saved_value *saved_value_lookup(struct perf_evsel *evsel,
return NULL;
}
+static void init_saved_rblist(struct rblist *rblist)
+{
+ rblist__init(rblist);
+ rblist->node_cmp = saved_value_cmp;
+ rblist->node_new = saved_value_new;
+ rblist->node_delete = saved_value_delete;
+}
+
+static void free_saved_rblist(struct rblist *rblist)
+{
+ rblist__reset(rblist);
+}
+
+void perf_stat__init_runtime_stat(struct runtime_stat *stat)
+{
+ memset(stat, 0, sizeof(struct runtime_stat));
+ init_saved_rblist(&stat->value_list);
+}
+
+void perf_stat__free_runtime_stat(struct runtime_stat *stat)
+{
+ free_saved_rblist(&stat->value_list);
+}
+
void perf_stat__init_shadow_stats(void)
{
have_frontend_stalled = pmu_have_event("cpu", "stalled-cycles-frontend");
- rblist__init(&runtime_saved_values);
- runtime_saved_values.node_cmp = saved_value_cmp;
- runtime_saved_values.node_new = saved_value_new;
- /* No delete for now */
+ memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats));
+ perf_stat__init_runtime_stat(&rt_stat);
}
static int evsel_context(struct perf_evsel *evsel)
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 61fd2e0..4eb081d 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -126,6 +126,8 @@ typedef void (*print_metric_t)(void *ctx, const char *color, const char *unit,
const char *fmt, double val);
typedef void (*new_line_t )(void *ctx);
+void perf_stat__init_runtime_stat(struct runtime_stat *stat);
+void perf_stat__free_runtime_stat(struct runtime_stat *stat);
void perf_stat__init_shadow_stats(void);
void perf_stat__reset_shadow_stats(void);
void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
--
2.7.4
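The ordering rules added to saved_value_cmp() above can be condensed into a standalone comparator. This sketch uses a trimmed stand-in struct (sv_key, with void pointers in place of perf's perf_evsel and runtime_stat) purely to illustrate the composite key: compare (cpu, type, ctx) first, then break ties on the stat pointer for evsel-less entries, otherwise on the evsel pointer.

```c
#include <stddef.h>

/* Trimmed stand-in for the patch's struct saved_value key fields. */
struct sv_key {
	void *evsel;	/* stands in for struct perf_evsel * */
	int   type;	/* stands in for enum stat_type */
	int   ctx;
	int   cpu;
	void *stat;	/* stands in for struct runtime_stat * */
};

int sv_key_cmp(const struct sv_key *a, const struct sv_key *b)
{
	if (a->cpu != b->cpu)
		return a->cpu - b->cpu;

	if (a->type != b->type)
		return a->type - b->type;

	if (a->ctx != b->ctx)
		return a->ctx - b->ctx;

	/* Entries not tied to an evsel are distinguished by which
	 * runtime_stat instance they belong to. */
	if (a->evsel == NULL && b->evsel == NULL) {
		if (a->stat == b->stat)
			return 0;
		return (char *)a->stat < (char *)b->stat ? -1 : 1;
	}

	if (a->evsel == b->evsel)
		return 0;
	return (char *)a->evsel < (char *)b->evsel ? -1 : 1;
}
```

Because the pointer comparisons only break ties, the tree stays totally ordered while letting one rbtree hold both per-evsel values and the new evsel-less per-runtime_stat values.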
* [PATCH v1 4/9] perf util: Update and print per-thread shadow stats
2017-11-20 14:43 [PATCH v1 0/9] perf stat: Enable '--per-thread' on all threads Jin Yao
` (3 preceding siblings ...)
2017-11-20 14:43 ` [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats Jin Yao
@ 2017-11-20 14:43 ` Jin Yao
2017-11-21 15:17 ` Jiri Olsa
2017-11-21 15:18 ` Jiri Olsa
2017-11-20 14:43 ` [PATCH v1 5/9] perf util: Remove a set of shadow stats static variables Jin Yao
` (4 subsequent siblings)
9 siblings, 2 replies; 49+ messages in thread
From: Jin Yao @ 2017-11-20 14:43 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
The functions perf_stat__update_shadow_stats() and
perf_stat__print_shadow_stats() update and print the
shadow stats using a set of static variables.
But these static variables are a limitation for supporting
per-thread shadow stats.
This patch teaches perf_stat__update_shadow_stats() to update
the shadow stats in an input parameter 'stat', using
update_runtime_stat() rather than writing the static
variables directly as before.
It also teaches perf_stat__print_shadow_stats() to print
the shadow stats from an input parameter 'stat'.
Instead of reading values from the static variables directly, it now
uses runtime_stat_avg() and runtime_stat_n() to fetch and compute the
values.
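The indirection introduced by this patch can be sketched as follows. This is a simplified, hypothetical model only: the real patch keys an rblist of 'struct saved_value' by (evsel, type, ctx, cpu) inside 'struct runtime_stat', whereas this sketch uses a fixed array, and update_stats() here keeps just a running mean rather than perf's full mean/variance bookkeeping.

```c
#include <stddef.h>

/* Simplified stand-ins for the perf-tools types. In the real code these
 * lookups go through saved_value_lookup() on stat->value_list. */
enum stat_type { STAT_NSECS, STAT_CYCLES, STAT_BRANCHES, STAT_TYPE_MAX };

#define NUM_CTX  8
#define MAX_CPUS 4

struct stats {
	double mean;
	unsigned long n;
};

/* One container per aggregation domain (e.g. one per thread),
 * replacing the former global static arrays. */
struct runtime_stat {
	struct stats val[STAT_TYPE_MAX][NUM_CTX][MAX_CPUS];
};

static void update_stats(struct stats *s, double count)
{
	/* incremental running mean */
	s->n++;
	s->mean += (count - s->mean) / s->n;
}

/* Update path: callers pass the container instead of touching statics. */
static void update_runtime_stat(struct runtime_stat *st, enum stat_type type,
				int ctx, int cpu, double count)
{
	update_stats(&st->val[type][ctx][cpu], count);
}

/* Read path: the print code fetches averages and sample counts
 * from the same container. */
static double runtime_stat_avg(struct runtime_stat *st, enum stat_type type,
			       int ctx, int cpu)
{
	return st->val[type][ctx][cpu].mean;
}

static unsigned long runtime_stat_n(struct runtime_stat *st, enum stat_type type,
				    int ctx, int cpu)
{
	return st->val[type][ctx][cpu].n;
}
```

With this shape, AGGR_THREAD can hand each thread its own 'struct runtime_stat' (config->stats[thread]), while a NULL container falls back to the single global rt_stat.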
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/builtin-stat.c | 25 ++--
tools/perf/util/stat-shadow.c | 296 +++++++++++++++++++++++++++---------------
tools/perf/util/stat.c | 15 ++-
tools/perf/util/stat.h | 5 +-
4 files changed, 222 insertions(+), 119 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 59af5a8..2ad5f4a 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1151,7 +1151,8 @@ static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
}
static void printout(int id, int nr, struct perf_evsel *counter, double uval,
- char *prefix, u64 run, u64 ena, double noise)
+ char *prefix, u64 run, u64 ena, double noise,
+ struct runtime_stat *stat)
{
struct perf_stat_output_ctx out;
struct outstate os = {
@@ -1244,7 +1245,8 @@ static void printout(int id, int nr, struct perf_evsel *counter, double uval,
perf_stat__print_shadow_stats(counter, uval,
first_shadow_cpu(counter, id),
- &out, &metric_events);
+ &out, &metric_events,
+ stat);
if (!csv_output && !metric_only) {
print_noise(counter, noise);
print_running(run, ena);
@@ -1268,7 +1270,7 @@ static void aggr_update_shadow(void)
val += perf_counts(counter->counts, cpu, 0)->val;
}
perf_stat__update_shadow_stats(counter, val,
- first_shadow_cpu(counter, id));
+ first_shadow_cpu(counter, id), NULL);
}
}
}
@@ -1388,7 +1390,8 @@ static void print_aggr(char *prefix)
fprintf(output, "%s", prefix);
uval = val * counter->scale;
- printout(id, nr, counter, uval, prefix, run, ena, 1.0);
+ printout(id, nr, counter, uval, prefix, run, ena, 1.0,
+ NULL);
if (!metric_only)
fputc('\n', output);
}
@@ -1418,7 +1421,8 @@ static void print_aggr_thread(struct perf_evsel *counter, char *prefix)
fprintf(output, "%s", prefix);
uval = val * counter->scale;
- printout(thread, 0, counter, uval, prefix, run, ena, 1.0);
+ printout(thread, 0, counter, uval, prefix, run, ena, 1.0,
+ &stat_config.stats[thread]);
fputc('\n', output);
}
}
@@ -1455,7 +1459,8 @@ static void print_counter_aggr(struct perf_evsel *counter, char *prefix)
fprintf(output, "%s", prefix);
uval = cd.avg * counter->scale;
- printout(-1, 0, counter, uval, prefix, cd.avg_running, cd.avg_enabled, cd.avg);
+ printout(-1, 0, counter, uval, prefix, cd.avg_running, cd.avg_enabled,
+ cd.avg, NULL);
if (!metric_only)
fprintf(output, "\n");
}
@@ -1494,7 +1499,7 @@ static void print_counter(struct perf_evsel *counter, char *prefix)
fprintf(output, "%s", prefix);
uval = val * counter->scale;
- printout(cpu, 0, counter, uval, prefix, run, ena, 1.0);
+ printout(cpu, 0, counter, uval, prefix, run, ena, 1.0, NULL);
fputc('\n', output);
}
@@ -1526,7 +1531,8 @@ static void print_no_aggr_metric(char *prefix)
run = perf_counts(counter->counts, cpu, 0)->run;
uval = val * counter->scale;
- printout(cpu, 0, counter, uval, prefix, run, ena, 1.0);
+ printout(cpu, 0, counter, uval, prefix, run, ena, 1.0,
+ NULL);
}
fputc('\n', stat_config.output);
}
@@ -1582,7 +1588,8 @@ static void print_metric_headers(const char *prefix, bool no_indent)
perf_stat__print_shadow_stats(counter, 0,
0,
&out,
- &metric_events);
+ &metric_events,
+ NULL);
}
fputc('\n', stat_config.output);
}
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 045e129..6f28782 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -110,19 +110,32 @@ static void saved_value_delete(struct rblist *rblist __maybe_unused,
static struct saved_value *saved_value_lookup(struct perf_evsel *evsel,
int cpu,
- bool create)
+ bool create,
+ enum stat_type type,
+ int ctx,
+ struct runtime_stat *stat)
{
+ struct rblist *rblist;
struct rb_node *nd;
struct saved_value dm = {
.cpu = cpu,
.evsel = evsel,
+ .type = type,
+ .ctx = ctx,
+ .stat = stat,
};
- nd = rblist__find(&runtime_saved_values, &dm);
+
+ if (stat)
+ rblist = &stat->value_list;
+ else
+ rblist = &rt_stat.value_list;
+
+ nd = rblist__find(rblist, &dm);
if (nd)
return container_of(nd, struct saved_value, rb_node);
if (create) {
- rblist__add_node(&runtime_saved_values, &dm);
- nd = rblist__find(&runtime_saved_values, &dm);
+ rblist__add_node(rblist, &dm);
+ nd = rblist__find(rblist, &dm);
if (nd)
return container_of(nd, struct saved_value, rb_node);
}
@@ -217,13 +230,24 @@ void perf_stat__reset_shadow_stats(void)
}
}
+static void update_runtime_stat(struct runtime_stat *stat,
+ enum stat_type type,
+ int ctx, int cpu, u64 count)
+{
+ struct saved_value *v = saved_value_lookup(NULL, cpu, true,
+ type, ctx, stat);
+
+ if (v)
+ update_stats(&v->stats, count);
+}
+
/*
* Update various tracking values we maintain to print
* more semantic information such as miss/hit ratios,
* instruction rates, etc:
*/
void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
- int cpu)
+ int cpu, struct runtime_stat *stat)
{
int ctx = evsel_context(counter);
@@ -231,50 +255,58 @@ void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
if (perf_evsel__match(counter, SOFTWARE, SW_TASK_CLOCK) ||
perf_evsel__match(counter, SOFTWARE, SW_CPU_CLOCK))
- update_stats(&runtime_nsecs_stats[cpu], count);
+ update_runtime_stat(stat, STAT_NSECS, 0, cpu, count);
else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
- update_stats(&runtime_cycles_stats[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_CYCLES, ctx, cpu, count);
else if (perf_stat_evsel__is(counter, CYCLES_IN_TX))
- update_stats(&runtime_cycles_in_tx_stats[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_CYCLES_IN_TX, ctx, cpu, count);
else if (perf_stat_evsel__is(counter, TRANSACTION_START))
- update_stats(&runtime_transaction_stats[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_TRANSACTION, ctx, cpu, count);
else if (perf_stat_evsel__is(counter, ELISION_START))
- update_stats(&runtime_elision_stats[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_ELISION, ctx, cpu, count);
else if (perf_stat_evsel__is(counter, TOPDOWN_TOTAL_SLOTS))
- update_stats(&runtime_topdown_total_slots[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_TOPDOWN_TOTAL_SLOTS,
+ ctx, cpu, count);
else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_ISSUED))
- update_stats(&runtime_topdown_slots_issued[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_TOPDOWN_SLOTS_ISSUED,
+ ctx, cpu, count);
else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_RETIRED))
- update_stats(&runtime_topdown_slots_retired[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_TOPDOWN_SLOTS_RETIRED,
+ ctx, cpu, count);
else if (perf_stat_evsel__is(counter, TOPDOWN_FETCH_BUBBLES))
- update_stats(&runtime_topdown_fetch_bubbles[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_TOPDOWN_FETCH_BUBBLES,
+ ctx, cpu, count);
else if (perf_stat_evsel__is(counter, TOPDOWN_RECOVERY_BUBBLES))
- update_stats(&runtime_topdown_recovery_bubbles[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_TOPDOWN_RECOVERY_BUBBLES,
+ ctx, cpu, count);
else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
- update_stats(&runtime_stalled_cycles_front_stats[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_STALLED_CYCLES_FRONT,
+ ctx, cpu, count);
else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_BACKEND))
- update_stats(&runtime_stalled_cycles_back_stats[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_STALLED_CYCLES_BACK,
+ ctx, cpu, count);
else if (perf_evsel__match(counter, HARDWARE, HW_BRANCH_INSTRUCTIONS))
- update_stats(&runtime_branches_stats[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_BRANCHES, ctx, cpu, count);
else if (perf_evsel__match(counter, HARDWARE, HW_CACHE_REFERENCES))
- update_stats(&runtime_cacherefs_stats[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_CACHEREFS, ctx, cpu, count);
else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_L1D))
- update_stats(&runtime_l1_dcache_stats[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_L1_DCACHE, ctx, cpu, count);
else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_L1I))
- update_stats(&runtime_ll_cache_stats[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_L1_ICACHE, ctx, cpu, count);
else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_LL))
- update_stats(&runtime_ll_cache_stats[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_LL_CACHE, ctx, cpu, count);
else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_DTLB))
- update_stats(&runtime_dtlb_cache_stats[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_DTLB_CACHE, ctx, cpu, count);
else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_ITLB))
- update_stats(&runtime_itlb_cache_stats[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_ITLB_CACHE, ctx, cpu, count);
else if (perf_stat_evsel__is(counter, SMI_NUM))
- update_stats(&runtime_smi_num_stats[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_SMI_NUM, ctx, cpu, count);
else if (perf_stat_evsel__is(counter, APERF))
- update_stats(&runtime_aperf_stats[ctx][cpu], count);
+ update_runtime_stat(stat, STAT_APERF, ctx, cpu, count);
if (counter->collect_stat) {
- struct saved_value *v = saved_value_lookup(counter, cpu, true);
+ struct saved_value *v = saved_value_lookup(counter, cpu, true,
+ STAT_NONE, 0, stat);
update_stats(&v->stats, count);
}
}
@@ -395,15 +427,40 @@ void perf_stat__collect_metric_expr(struct perf_evlist *evsel_list)
}
}
+static double runtime_stat_avg(struct runtime_stat *stat,
+ enum stat_type type, int ctx, int cpu)
+{
+ struct saved_value *v;
+
+ v = saved_value_lookup(NULL, cpu, false, type, ctx, stat);
+ if (!v)
+ return 0.0;
+
+ return avg_stats(&v->stats);
+}
+
+static double runtime_stat_n(struct runtime_stat *stat,
+ enum stat_type type, int ctx, int cpu)
+{
+ struct saved_value *v;
+
+ v = saved_value_lookup(NULL, cpu, false, type, ctx, stat);
+ if (!v)
+ return 0.0;
+
+ return v->stats.n;
+}
+
static void print_stalled_cycles_frontend(int cpu,
struct perf_evsel *evsel, double avg,
- struct perf_stat_output_ctx *out)
+ struct perf_stat_output_ctx *out,
+ struct runtime_stat *stat)
{
double total, ratio = 0.0;
const char *color;
int ctx = evsel_context(evsel);
- total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
+ total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
if (total)
ratio = avg / total * 100.0;
@@ -419,13 +476,14 @@ static void print_stalled_cycles_frontend(int cpu,
static void print_stalled_cycles_backend(int cpu,
struct perf_evsel *evsel, double avg,
- struct perf_stat_output_ctx *out)
+ struct perf_stat_output_ctx *out,
+ struct runtime_stat *stat)
{
double total, ratio = 0.0;
const char *color;
int ctx = evsel_context(evsel);
- total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
+ total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
if (total)
ratio = avg / total * 100.0;
@@ -438,13 +496,14 @@ static void print_stalled_cycles_backend(int cpu,
static void print_branch_misses(int cpu,
struct perf_evsel *evsel,
double avg,
- struct perf_stat_output_ctx *out)
+ struct perf_stat_output_ctx *out,
+ struct runtime_stat *stat)
{
double total, ratio = 0.0;
const char *color;
int ctx = evsel_context(evsel);
- total = avg_stats(&runtime_branches_stats[ctx][cpu]);
+ total = runtime_stat_avg(stat, STAT_BRANCHES, ctx, cpu);
if (total)
ratio = avg / total * 100.0;
@@ -457,13 +516,15 @@ static void print_branch_misses(int cpu,
static void print_l1_dcache_misses(int cpu,
struct perf_evsel *evsel,
double avg,
- struct perf_stat_output_ctx *out)
+ struct perf_stat_output_ctx *out,
+ struct runtime_stat *stat)
+
{
double total, ratio = 0.0;
const char *color;
int ctx = evsel_context(evsel);
- total = avg_stats(&runtime_l1_dcache_stats[ctx][cpu]);
+ total = runtime_stat_avg(stat, STAT_L1_DCACHE, ctx, cpu);
if (total)
ratio = avg / total * 100.0;
@@ -476,13 +537,15 @@ static void print_l1_dcache_misses(int cpu,
static void print_l1_icache_misses(int cpu,
struct perf_evsel *evsel,
double avg,
- struct perf_stat_output_ctx *out)
+ struct perf_stat_output_ctx *out,
+ struct runtime_stat *stat)
+
{
double total, ratio = 0.0;
const char *color;
int ctx = evsel_context(evsel);
- total = avg_stats(&runtime_l1_icache_stats[ctx][cpu]);
+ total = runtime_stat_avg(stat, STAT_L1_ICACHE, ctx, cpu);
if (total)
ratio = avg / total * 100.0;
@@ -494,13 +557,14 @@ static void print_l1_icache_misses(int cpu,
static void print_dtlb_cache_misses(int cpu,
struct perf_evsel *evsel,
double avg,
- struct perf_stat_output_ctx *out)
+ struct perf_stat_output_ctx *out,
+ struct runtime_stat *stat)
{
double total, ratio = 0.0;
const char *color;
int ctx = evsel_context(evsel);
- total = avg_stats(&runtime_dtlb_cache_stats[ctx][cpu]);
+ total = runtime_stat_avg(stat, STAT_DTLB_CACHE, ctx, cpu);
if (total)
ratio = avg / total * 100.0;
@@ -512,13 +576,14 @@ static void print_dtlb_cache_misses(int cpu,
static void print_itlb_cache_misses(int cpu,
struct perf_evsel *evsel,
double avg,
- struct perf_stat_output_ctx *out)
+ struct perf_stat_output_ctx *out,
+ struct runtime_stat *stat)
{
double total, ratio = 0.0;
const char *color;
int ctx = evsel_context(evsel);
- total = avg_stats(&runtime_itlb_cache_stats[ctx][cpu]);
+ total = runtime_stat_avg(stat, STAT_ITLB_CACHE, ctx, cpu);
if (total)
ratio = avg / total * 100.0;
@@ -530,13 +595,14 @@ static void print_itlb_cache_misses(int cpu,
static void print_ll_cache_misses(int cpu,
struct perf_evsel *evsel,
double avg,
- struct perf_stat_output_ctx *out)
+ struct perf_stat_output_ctx *out,
+ struct runtime_stat *stat)
{
double total, ratio = 0.0;
const char *color;
int ctx = evsel_context(evsel);
- total = avg_stats(&runtime_ll_cache_stats[ctx][cpu]);
+ total = runtime_stat_avg(stat, STAT_LL_CACHE, ctx, cpu);
if (total)
ratio = avg / total * 100.0;
@@ -594,68 +660,72 @@ static double sanitize_val(double x)
return x;
}
-static double td_total_slots(int ctx, int cpu)
+static double td_total_slots(int ctx, int cpu, struct runtime_stat *stat)
{
- return avg_stats(&runtime_topdown_total_slots[ctx][cpu]);
+ return runtime_stat_avg(stat, STAT_TOPDOWN_TOTAL_SLOTS, ctx, cpu);
}
-static double td_bad_spec(int ctx, int cpu)
+static double td_bad_spec(int ctx, int cpu, struct runtime_stat *stat)
{
double bad_spec = 0;
double total_slots;
double total;
- total = avg_stats(&runtime_topdown_slots_issued[ctx][cpu]) -
- avg_stats(&runtime_topdown_slots_retired[ctx][cpu]) +
- avg_stats(&runtime_topdown_recovery_bubbles[ctx][cpu]);
- total_slots = td_total_slots(ctx, cpu);
+ total = runtime_stat_avg(stat, STAT_TOPDOWN_SLOTS_ISSUED, ctx, cpu) -
+ runtime_stat_avg(stat, STAT_TOPDOWN_SLOTS_RETIRED, ctx, cpu) +
+ runtime_stat_avg(stat, STAT_TOPDOWN_RECOVERY_BUBBLES, ctx, cpu);
+
+ total_slots = td_total_slots(ctx, cpu, stat);
if (total_slots)
bad_spec = total / total_slots;
return sanitize_val(bad_spec);
}
-static double td_retiring(int ctx, int cpu)
+static double td_retiring(int ctx, int cpu, struct runtime_stat *stat)
{
double retiring = 0;
- double total_slots = td_total_slots(ctx, cpu);
- double ret_slots = avg_stats(&runtime_topdown_slots_retired[ctx][cpu]);
+ double total_slots = td_total_slots(ctx, cpu, stat);
+ double ret_slots = runtime_stat_avg(stat, STAT_TOPDOWN_SLOTS_RETIRED,
+ ctx, cpu);
if (total_slots)
retiring = ret_slots / total_slots;
return retiring;
}
-static double td_fe_bound(int ctx, int cpu)
+static double td_fe_bound(int ctx, int cpu, struct runtime_stat *stat)
{
double fe_bound = 0;
- double total_slots = td_total_slots(ctx, cpu);
- double fetch_bub = avg_stats(&runtime_topdown_fetch_bubbles[ctx][cpu]);
+ double total_slots = td_total_slots(ctx, cpu, stat);
+ double fetch_bub = runtime_stat_avg(stat, STAT_TOPDOWN_FETCH_BUBBLES,
+ ctx, cpu);
if (total_slots)
fe_bound = fetch_bub / total_slots;
return fe_bound;
}
-static double td_be_bound(int ctx, int cpu)
+static double td_be_bound(int ctx, int cpu, struct runtime_stat *stat)
{
- double sum = (td_fe_bound(ctx, cpu) +
- td_bad_spec(ctx, cpu) +
- td_retiring(ctx, cpu));
+ double sum = (td_fe_bound(ctx, cpu, stat) +
+ td_bad_spec(ctx, cpu, stat) +
+ td_retiring(ctx, cpu, stat));
if (sum == 0)
return 0;
return sanitize_val(1.0 - sum);
}
static void print_smi_cost(int cpu, struct perf_evsel *evsel,
- struct perf_stat_output_ctx *out)
+ struct perf_stat_output_ctx *out,
+ struct runtime_stat *stat)
{
double smi_num, aperf, cycles, cost = 0.0;
int ctx = evsel_context(evsel);
const char *color = NULL;
- smi_num = avg_stats(&runtime_smi_num_stats[ctx][cpu]);
- aperf = avg_stats(&runtime_aperf_stats[ctx][cpu]);
- cycles = avg_stats(&runtime_cycles_stats[ctx][cpu]);
+ smi_num = runtime_stat_avg(stat, STAT_SMI_NUM, ctx, cpu);
+ aperf = runtime_stat_avg(stat, STAT_APERF, ctx, cpu);
+ cycles = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
if ((cycles == 0) || (aperf == 0))
return;
@@ -675,7 +745,8 @@ static void generic_metric(const char *metric_expr,
const char *metric_name,
double avg,
int cpu,
- struct perf_stat_output_ctx *out)
+ struct perf_stat_output_ctx *out,
+ struct runtime_stat *stat)
{
print_metric_t print_metric = out->print_metric;
struct parse_ctx pctx;
@@ -694,7 +765,8 @@ static void generic_metric(const char *metric_expr,
stats = &walltime_nsecs_stats;
scale = 1e-9;
} else {
- v = saved_value_lookup(metric_events[i], cpu, false);
+ v = saved_value_lookup(metric_events[i], cpu, false,
+ STAT_NONE, 0, stat);
if (!v)
break;
stats = &v->stats;
@@ -722,7 +794,8 @@ static void generic_metric(const char *metric_expr,
void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
double avg, int cpu,
struct perf_stat_output_ctx *out,
- struct rblist *metric_events)
+ struct rblist *metric_events,
+ struct runtime_stat *stat)
{
void *ctxp = out->ctx;
print_metric_t print_metric = out->print_metric;
@@ -733,7 +806,8 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
int num = 1;
if (perf_evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) {
- total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
+ total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
+
if (total) {
ratio = avg / total;
print_metric(ctxp, NULL, "%7.2f ",
@@ -741,8 +815,13 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
} else {
print_metric(ctxp, NULL, NULL, "insn per cycle", 0);
}
- total = avg_stats(&runtime_stalled_cycles_front_stats[ctx][cpu]);
- total = max(total, avg_stats(&runtime_stalled_cycles_back_stats[ctx][cpu]));
+
+ total = runtime_stat_avg(stat, STAT_STALLED_CYCLES_FRONT,
+ ctx, cpu);
+
+ total = max(total, runtime_stat_avg(stat,
+ STAT_STALLED_CYCLES_BACK,
+ ctx, cpu));
if (total && avg) {
out->new_line(ctxp);
@@ -755,8 +834,8 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
"stalled cycles per insn", 0);
}
} else if (perf_evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES)) {
- if (runtime_branches_stats[ctx][cpu].n != 0)
- print_branch_misses(cpu, evsel, avg, out);
+ if (runtime_stat_n(stat, STAT_BRANCHES, ctx, cpu) != 0)
+ print_branch_misses(cpu, evsel, avg, out, stat);
else
print_metric(ctxp, NULL, NULL, "of all branches", 0);
} else if (
@@ -764,8 +843,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
evsel->attr.config == ( PERF_COUNT_HW_CACHE_L1D |
((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
- if (runtime_l1_dcache_stats[ctx][cpu].n != 0)
- print_l1_dcache_misses(cpu, evsel, avg, out);
+
+ if (runtime_stat_n(stat, STAT_L1_DCACHE, ctx, cpu) != 0)
+ print_l1_dcache_misses(cpu, evsel, avg, out, stat);
else
print_metric(ctxp, NULL, NULL, "of all L1-dcache hits", 0);
} else if (
@@ -773,8 +853,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
evsel->attr.config == ( PERF_COUNT_HW_CACHE_L1I |
((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
- if (runtime_l1_icache_stats[ctx][cpu].n != 0)
- print_l1_icache_misses(cpu, evsel, avg, out);
+
+ if (runtime_stat_n(stat, STAT_L1_ICACHE, ctx, cpu) != 0)
+ print_l1_icache_misses(cpu, evsel, avg, out, stat);
else
print_metric(ctxp, NULL, NULL, "of all L1-icache hits", 0);
} else if (
@@ -782,8 +863,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
evsel->attr.config == ( PERF_COUNT_HW_CACHE_DTLB |
((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
- if (runtime_dtlb_cache_stats[ctx][cpu].n != 0)
- print_dtlb_cache_misses(cpu, evsel, avg, out);
+
+ if (runtime_stat_n(stat, STAT_DTLB_CACHE, ctx, cpu) != 0)
+ print_dtlb_cache_misses(cpu, evsel, avg, out, stat);
else
print_metric(ctxp, NULL, NULL, "of all dTLB cache hits", 0);
} else if (
@@ -791,8 +873,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
evsel->attr.config == ( PERF_COUNT_HW_CACHE_ITLB |
((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
- if (runtime_itlb_cache_stats[ctx][cpu].n != 0)
- print_itlb_cache_misses(cpu, evsel, avg, out);
+
+ if (runtime_stat_n(stat, STAT_ITLB_CACHE, ctx, cpu) != 0)
+ print_itlb_cache_misses(cpu, evsel, avg, out, stat);
else
print_metric(ctxp, NULL, NULL, "of all iTLB cache hits", 0);
} else if (
@@ -800,27 +883,28 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
evsel->attr.config == ( PERF_COUNT_HW_CACHE_LL |
((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
- if (runtime_ll_cache_stats[ctx][cpu].n != 0)
- print_ll_cache_misses(cpu, evsel, avg, out);
+
+ if (runtime_stat_n(stat, STAT_LL_CACHE, ctx, cpu) != 0)
+ print_ll_cache_misses(cpu, evsel, avg, out, stat);
else
print_metric(ctxp, NULL, NULL, "of all LL-cache hits", 0);
} else if (perf_evsel__match(evsel, HARDWARE, HW_CACHE_MISSES)) {
- total = avg_stats(&runtime_cacherefs_stats[ctx][cpu]);
+ total = runtime_stat_avg(stat, STAT_CACHEREFS, ctx, cpu);
if (total)
ratio = avg * 100 / total;
- if (runtime_cacherefs_stats[ctx][cpu].n != 0)
+ if (runtime_stat_n(stat, STAT_CACHEREFS, ctx, cpu) != 0)
print_metric(ctxp, NULL, "%8.3f %%",
"of all cache refs", ratio);
else
print_metric(ctxp, NULL, NULL, "of all cache refs", 0);
} else if (perf_evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_FRONTEND)) {
- print_stalled_cycles_frontend(cpu, evsel, avg, out);
+ print_stalled_cycles_frontend(cpu, evsel, avg, out, stat);
} else if (perf_evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND)) {
- print_stalled_cycles_backend(cpu, evsel, avg, out);
+ print_stalled_cycles_backend(cpu, evsel, avg, out, stat);
} else if (perf_evsel__match(evsel, HARDWARE, HW_CPU_CYCLES)) {
- total = avg_stats(&runtime_nsecs_stats[cpu]);
+ total = runtime_stat_avg(stat, STAT_NSECS, 0, cpu);
if (total) {
ratio = avg / total;
@@ -829,7 +913,8 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
print_metric(ctxp, NULL, NULL, "Ghz", 0);
}
} else if (perf_stat_evsel__is(evsel, CYCLES_IN_TX)) {
- total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
+ total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
+
if (total)
print_metric(ctxp, NULL,
"%7.2f%%", "transactional cycles",
@@ -838,8 +923,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
print_metric(ctxp, NULL, NULL, "transactional cycles",
0);
} else if (perf_stat_evsel__is(evsel, CYCLES_IN_TX_CP)) {
- total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
- total2 = avg_stats(&runtime_cycles_in_tx_stats[ctx][cpu]);
+ total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
+ total2 = runtime_stat_avg(stat, STAT_CYCLES_IN_TX, ctx, cpu);
+
if (total2 < avg)
total2 = avg;
if (total)
@@ -848,19 +934,21 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
else
print_metric(ctxp, NULL, NULL, "aborted cycles", 0);
} else if (perf_stat_evsel__is(evsel, TRANSACTION_START)) {
- total = avg_stats(&runtime_cycles_in_tx_stats[ctx][cpu]);
+ total = runtime_stat_avg(stat, STAT_CYCLES_IN_TX,
+ ctx, cpu);
if (avg)
ratio = total / avg;
- if (runtime_cycles_in_tx_stats[ctx][cpu].n != 0)
+ if (runtime_stat_n(stat, STAT_CYCLES_IN_TX, ctx, cpu) != 0)
print_metric(ctxp, NULL, "%8.0f",
"cycles / transaction", ratio);
else
print_metric(ctxp, NULL, NULL, "cycles / transaction",
- 0);
+ 0);
} else if (perf_stat_evsel__is(evsel, ELISION_START)) {
- total = avg_stats(&runtime_cycles_in_tx_stats[ctx][cpu]);
+ total = runtime_stat_avg(stat, STAT_CYCLES_IN_TX,
+ ctx, cpu);
if (avg)
ratio = total / avg;
@@ -874,28 +962,28 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
else
print_metric(ctxp, NULL, NULL, "CPUs utilized", 0);
} else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_BUBBLES)) {
- double fe_bound = td_fe_bound(ctx, cpu);
+ double fe_bound = td_fe_bound(ctx, cpu, stat);
if (fe_bound > 0.2)
color = PERF_COLOR_RED;
print_metric(ctxp, color, "%8.1f%%", "frontend bound",
fe_bound * 100.);
} else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_RETIRED)) {
- double retiring = td_retiring(ctx, cpu);
+ double retiring = td_retiring(ctx, cpu, stat);
if (retiring > 0.7)
color = PERF_COLOR_GREEN;
print_metric(ctxp, color, "%8.1f%%", "retiring",
retiring * 100.);
} else if (perf_stat_evsel__is(evsel, TOPDOWN_RECOVERY_BUBBLES)) {
- double bad_spec = td_bad_spec(ctx, cpu);
+ double bad_spec = td_bad_spec(ctx, cpu, stat);
if (bad_spec > 0.1)
color = PERF_COLOR_RED;
print_metric(ctxp, color, "%8.1f%%", "bad speculation",
bad_spec * 100.);
} else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_ISSUED)) {
- double be_bound = td_be_bound(ctx, cpu);
+ double be_bound = td_be_bound(ctx, cpu, stat);
const char *name = "backend bound";
static int have_recovery_bubbles = -1;
@@ -908,19 +996,19 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
if (be_bound > 0.2)
color = PERF_COLOR_RED;
- if (td_total_slots(ctx, cpu) > 0)
+ if (td_total_slots(ctx, cpu, stat) > 0)
print_metric(ctxp, color, "%8.1f%%", name,
be_bound * 100.);
else
print_metric(ctxp, NULL, NULL, name, 0);
} else if (evsel->metric_expr) {
generic_metric(evsel->metric_expr, evsel->metric_events, evsel->name,
- evsel->metric_name, avg, cpu, out);
- } else if (runtime_nsecs_stats[cpu].n != 0) {
+ evsel->metric_name, avg, cpu, out, stat);
+ } else if (runtime_stat_n(stat, STAT_NSECS, 0, cpu) != 0) {
char unit = 'M';
char unit_buf[10];
- total = avg_stats(&runtime_nsecs_stats[cpu]);
+ total = runtime_stat_avg(stat, STAT_NSECS, 0, cpu);
if (total)
ratio = 1000.0 * avg / total;
@@ -931,7 +1019,7 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit);
print_metric(ctxp, NULL, "%8.3f", unit_buf, ratio);
} else if (perf_stat_evsel__is(evsel, SMI_NUM)) {
- print_smi_cost(cpu, evsel, out);
+ print_smi_cost(cpu, evsel, out, stat);
} else {
num = 0;
}
@@ -944,7 +1032,7 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
out->new_line(ctxp);
generic_metric(mexp->metric_expr, mexp->metric_events,
evsel->name, mexp->metric_name,
- avg, cpu, out);
+ avg, cpu, out, stat);
}
}
if (num == 0)
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 151e9ef..50bb16d 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -278,9 +278,16 @@ process_counter_values(struct perf_stat_config *config, struct perf_evsel *evsel
perf_evsel__compute_deltas(evsel, cpu, thread, count);
perf_counts_values__scale(count, config->scale, NULL);
if (config->aggr_mode == AGGR_NONE)
- perf_stat__update_shadow_stats(evsel, count->val, cpu);
- if (config->aggr_mode == AGGR_THREAD)
- perf_stat__update_shadow_stats(evsel, count->val, 0);
+ perf_stat__update_shadow_stats(evsel, count->val, cpu,
+ NULL);
+ if (config->aggr_mode == AGGR_THREAD) {
+ if (config->stats)
+ perf_stat__update_shadow_stats(evsel,
+ count->val, 0, &config->stats[thread]);
+ else
+ perf_stat__update_shadow_stats(evsel,
+ count->val, 0, NULL);
+ }
break;
case AGGR_GLOBAL:
aggr->val += count->val;
@@ -362,7 +369,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
/*
* Save the full runtime - to allow normalization during printout:
*/
- perf_stat__update_shadow_stats(counter, *count, 0);
+ perf_stat__update_shadow_stats(counter, *count, 0, NULL);
return 0;
}
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 4eb081d..92671ed 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -131,7 +131,7 @@ void perf_stat__free_runtime_stat(struct runtime_stat *stat);
void perf_stat__init_shadow_stats(void);
void perf_stat__reset_shadow_stats(void);
void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
- int cpu);
+ int cpu, struct runtime_stat *stat);
struct perf_stat_output_ctx {
void *ctx;
print_metric_t print_metric;
@@ -142,7 +142,8 @@ struct perf_stat_output_ctx {
void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
double avg, int cpu,
struct perf_stat_output_ctx *out,
- struct rblist *metric_events);
+ struct rblist *metric_events,
+ struct runtime_stat *stat);
void perf_stat__collect_metric_expr(struct perf_evlist *);
int perf_evlist__alloc_stats(struct perf_evlist *evlist, bool alloc_raw);
--
2.7.4
* [PATCH v1 5/9] perf util: Remove a set of shadow stats static variables
2017-11-20 14:43 [PATCH v1 0/9] perf stat: Enable '--per-thread' on all threads Jin Yao
` (4 preceding siblings ...)
2017-11-20 14:43 ` [PATCH v1 4/9] perf util: Update and print " Jin Yao
@ 2017-11-20 14:43 ` Jin Yao
2017-11-21 15:17 ` Jiri Olsa
2017-11-20 14:43 ` [PATCH v1 6/9] perf stat: Allocate shadow stats buffer for threads Jin Yao
` (3 subsequent siblings)
9 siblings, 1 reply; 49+ messages in thread
From: Jin Yao @ 2017-11-20 14:43 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
The previous patches reconstructed the code so that it no longer
accesses the static variables directly.
This patch now removes those static variables.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/util/stat-shadow.c | 64 ++++++++++---------------------------------
tools/perf/util/stat.h | 1 +
2 files changed, 16 insertions(+), 49 deletions(-)
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 6f28782..74bcc4d 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -16,28 +16,6 @@
* AGGR_NONE: Use matching CPU
* AGGR_THREAD: Not supported?
*/
-static struct stats runtime_nsecs_stats[MAX_NR_CPUS];
-static struct stats runtime_cycles_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_stalled_cycles_front_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_stalled_cycles_back_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_branches_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_cacherefs_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_l1_dcache_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_l1_icache_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_ll_cache_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_itlb_cache_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_dtlb_cache_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_cycles_in_tx_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_transaction_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_elision_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_topdown_total_slots[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_topdown_slots_issued[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_topdown_slots_retired[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_topdown_fetch_bubbles[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_topdown_recovery_bubbles[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_smi_num_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_aperf_stats[NUM_CTX][MAX_NR_CPUS];
-static struct rblist runtime_saved_values;
static bool have_frontend_stalled;
static struct runtime_stat rt_stat;
@@ -191,36 +169,13 @@ static int evsel_context(struct perf_evsel *evsel)
return ctx;
}
-void perf_stat__reset_shadow_stats(void)
+static void reset_stat(struct runtime_stat *stat)
{
+ struct rblist *rblist;
struct rb_node *pos, *next;
- memset(runtime_nsecs_stats, 0, sizeof(runtime_nsecs_stats));
- memset(runtime_cycles_stats, 0, sizeof(runtime_cycles_stats));
- memset(runtime_stalled_cycles_front_stats, 0, sizeof(runtime_stalled_cycles_front_stats));
- memset(runtime_stalled_cycles_back_stats, 0, sizeof(runtime_stalled_cycles_back_stats));
- memset(runtime_branches_stats, 0, sizeof(runtime_branches_stats));
- memset(runtime_cacherefs_stats, 0, sizeof(runtime_cacherefs_stats));
- memset(runtime_l1_dcache_stats, 0, sizeof(runtime_l1_dcache_stats));
- memset(runtime_l1_icache_stats, 0, sizeof(runtime_l1_icache_stats));
- memset(runtime_ll_cache_stats, 0, sizeof(runtime_ll_cache_stats));
- memset(runtime_itlb_cache_stats, 0, sizeof(runtime_itlb_cache_stats));
- memset(runtime_dtlb_cache_stats, 0, sizeof(runtime_dtlb_cache_stats));
- memset(runtime_cycles_in_tx_stats, 0,
- sizeof(runtime_cycles_in_tx_stats));
- memset(runtime_transaction_stats, 0,
- sizeof(runtime_transaction_stats));
- memset(runtime_elision_stats, 0, sizeof(runtime_elision_stats));
- memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats));
- memset(runtime_topdown_total_slots, 0, sizeof(runtime_topdown_total_slots));
- memset(runtime_topdown_slots_retired, 0, sizeof(runtime_topdown_slots_retired));
- memset(runtime_topdown_slots_issued, 0, sizeof(runtime_topdown_slots_issued));
- memset(runtime_topdown_fetch_bubbles, 0, sizeof(runtime_topdown_fetch_bubbles));
- memset(runtime_topdown_recovery_bubbles, 0, sizeof(runtime_topdown_recovery_bubbles));
- memset(runtime_smi_num_stats, 0, sizeof(runtime_smi_num_stats));
- memset(runtime_aperf_stats, 0, sizeof(runtime_aperf_stats));
-
- next = rb_first(&runtime_saved_values.entries);
+ rblist = &stat->value_list;
+ next = rb_first(&rblist->entries);
while (next) {
pos = next;
next = rb_next(pos);
@@ -230,6 +185,17 @@ void perf_stat__reset_shadow_stats(void)
}
}
+void perf_stat__reset_shadow_stats(void)
+{
+ reset_stat(&rt_stat);
+ memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats));
+}
+
+void perf_stat__reset_shadow_per_stat(struct runtime_stat *stat)
+{
+ reset_stat(stat);
+}
+
static void update_runtime_stat(struct runtime_stat *stat,
enum stat_type type,
int ctx, int cpu, u64 count)
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 92671ed..7ed77b8 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -130,6 +130,7 @@ void perf_stat__init_runtime_stat(struct runtime_stat *stat);
void perf_stat__free_runtime_stat(struct runtime_stat *stat);
void perf_stat__init_shadow_stats(void);
void perf_stat__reset_shadow_stats(void);
+void perf_stat__reset_shadow_per_stat(struct runtime_stat *stat);
void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
int cpu, struct runtime_stat *stat);
struct perf_stat_output_ctx {
--
2.7.4
* [PATCH v1 6/9] perf stat: Allocate shadow stats buffer for threads
2017-11-20 14:43 [PATCH v1 0/9] perf stat: Enable '--per-thread' on all threads Jin Yao
` (5 preceding siblings ...)
2017-11-20 14:43 ` [PATCH v1 5/9] perf util: Remove a set of shadow stats static variables Jin Yao
@ 2017-11-20 14:43 ` Jin Yao
2017-11-20 14:43 ` [PATCH v1 7/9] perf util: Reuse thread_map__new_by_uid to enumerate threads from /proc Jin Yao
` (2 subsequent siblings)
9 siblings, 0 replies; 49+ messages in thread
From: Jin Yao @ 2017-11-20 14:43 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
After perf_evlist__create_maps() is executed, we can get all
threads from /proc, and thread_map__nr() gives us the number
of threads.
With the number of threads, the patch allocates a buffer which
records the shadow stats for these threads.
The buffer pointer is saved in stat_config.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/builtin-stat.c | 46 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 45 insertions(+), 1 deletion(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 2ad5f4a..9eec145 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -214,8 +214,13 @@ static inline void diff_timespec(struct timespec *r, struct timespec *a,
static void perf_stat__reset_stats(void)
{
+ int i;
+
perf_evlist__reset_stats(evsel_list);
perf_stat__reset_shadow_stats();
+
+ for (i = 0; i < stat_config.stat_num; i++)
+ perf_stat__reset_shadow_per_stat(&stat_config.stats[i]);
}
static int create_perf_stat_counter(struct perf_evsel *evsel)
@@ -2548,6 +2553,35 @@ int process_cpu_map_event(struct perf_tool *tool,
return set_maps(st);
}
+static int runtime_stat_alloc(struct perf_stat_config *config, int nthreads)
+{
+ int i;
+
+ config->stats = calloc(nthreads, sizeof(struct runtime_stat));
+ if (!config->stats)
+ return -1;
+
+ config->stat_num = nthreads;
+
+ for (i = 0; i < nthreads; i++)
+ perf_stat__init_runtime_stat(&config->stats[i]);
+
+ return 0;
+}
+
+static void runtime_stat_free(struct perf_stat_config *config)
+{
+ int i;
+
+ if (!config->stats)
+ return;
+
+ for (i = 0; i < config->stat_num; i++)
+ perf_stat__free_runtime_stat(&config->stats[i]);
+
+ free(config->stats);
+}
+
static const char * const stat_report_usage[] = {
"perf stat report [<options>]",
NULL,
@@ -2803,8 +2837,15 @@ int cmd_stat(int argc, const char **argv)
* Initialize thread_map with comm names,
* so we could print it out on output.
*/
- if (stat_config.aggr_mode == AGGR_THREAD)
+ if (stat_config.aggr_mode == AGGR_THREAD) {
thread_map__read_comms(evsel_list->threads);
+ if (target.system_wide) {
+ if (runtime_stat_alloc(&stat_config,
+ thread_map__nr(evsel_list->threads))) {
+ goto out;
+ }
+ }
+ }
if (interval && interval < 100) {
if (interval < 10) {
@@ -2894,5 +2935,8 @@ int cmd_stat(int argc, const char **argv)
sysfs__write_int(FREEZE_ON_SMI_PATH, 0);
perf_evlist__delete(evsel_list);
+
+ runtime_stat_free(&stat_config);
+
return status;
}
--
2.7.4
* [PATCH v1 7/9] perf util: Reuse thread_map__new_by_uid to enumerate threads from /proc
2017-11-20 14:43 [PATCH v1 0/9] perf stat: Enable '--per-thread' on all threads Jin Yao
` (6 preceding siblings ...)
2017-11-20 14:43 ` [PATCH v1 6/9] perf stat: Allocate shadow stats buffer for threads Jin Yao
@ 2017-11-20 14:43 ` Jin Yao
2017-11-20 14:43 ` [PATCH v1 8/9] perf stat: Remove --per-thread pid/tid limitation Jin Yao
2017-11-20 14:43 ` [PATCH v1 9/9] perf stat: Resort '--per-thread' result Jin Yao
9 siblings, 0 replies; 49+ messages in thread
From: Jin Yao @ 2017-11-20 14:43 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
Perf already has a function thread_map__new_by_uid() which can
enumerate all threads from /proc by uid.
This patch creates a static function enumerate_threads() which
reuses the common code in thread_map__new_by_uid() to enumerate
threads from /proc.
enumerate_threads() is shared by thread_map__new_by_uid()
and a new function, thread_map__new_threads().
The new function thread_map__new_threads() is called to enumerate
all threads from /proc.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/tests/thread-map.c | 2 +-
tools/perf/util/evlist.c | 3 ++-
tools/perf/util/thread_map.c | 19 ++++++++++++++++---
tools/perf/util/thread_map.h | 3 ++-
4 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/tools/perf/tests/thread-map.c b/tools/perf/tests/thread-map.c
index dbcb6a1..4de1939 100644
--- a/tools/perf/tests/thread-map.c
+++ b/tools/perf/tests/thread-map.c
@@ -105,7 +105,7 @@ int test__thread_map_remove(struct test *test __maybe_unused, int subtest __mayb
TEST_ASSERT_VAL("failed to allocate map string",
asprintf(&str, "%d,%d", getpid(), getppid()) >= 0);
- threads = thread_map__new_str(str, NULL, 0);
+ threads = thread_map__new_str(str, NULL, 0, false);
TEST_ASSERT_VAL("failed to allocate thread_map",
threads);
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 199bb82..05b8f2b 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1102,7 +1102,8 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
struct cpu_map *cpus;
struct thread_map *threads;
- threads = thread_map__new_str(target->pid, target->tid, target->uid);
+ threads = thread_map__new_str(target->pid, target->tid, target->uid,
+ target->per_thread);
if (!threads)
return -1;
diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index be0d5a7..5672268 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -92,7 +92,7 @@ struct thread_map *thread_map__new_by_tid(pid_t tid)
return threads;
}
-struct thread_map *thread_map__new_by_uid(uid_t uid)
+static struct thread_map *enumerate_threads(uid_t uid)
{
DIR *proc;
int max_threads = 32, items, i;
@@ -124,7 +124,7 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
if (stat(path, &st) != 0)
continue;
- if (st.st_uid != uid)
+ if ((uid != UINT_MAX) && (st.st_uid != uid))
continue;
snprintf(path, sizeof(path), "/proc/%d/task", pid);
@@ -178,6 +178,16 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
goto out_closedir;
}
+struct thread_map *thread_map__new_by_uid(uid_t uid)
+{
+ return enumerate_threads(uid);
+}
+
+struct thread_map *thread_map__new_threads(void)
+{
+ return enumerate_threads(UINT_MAX);
+}
+
struct thread_map *thread_map__new(pid_t pid, pid_t tid, uid_t uid)
{
if (pid != -1)
@@ -313,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char *tid_str)
}
struct thread_map *thread_map__new_str(const char *pid, const char *tid,
- uid_t uid)
+ uid_t uid, bool per_thread)
{
if (pid)
return thread_map__new_by_pid_str(pid);
@@ -321,6 +331,9 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid,
if (!tid && uid != UINT_MAX)
return thread_map__new_by_uid(uid);
+ if (per_thread)
+ return thread_map__new_threads();
+
return thread_map__new_by_tid_str(tid);
}
diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h
index f158039..dc07543 100644
--- a/tools/perf/util/thread_map.h
+++ b/tools/perf/util/thread_map.h
@@ -23,6 +23,7 @@ struct thread_map *thread_map__new_dummy(void);
struct thread_map *thread_map__new_by_pid(pid_t pid);
struct thread_map *thread_map__new_by_tid(pid_t tid);
struct thread_map *thread_map__new_by_uid(uid_t uid);
+struct thread_map *thread_map__new_threads(void);
struct thread_map *thread_map__new(pid_t pid, pid_t tid, uid_t uid);
struct thread_map *thread_map__new_event(struct thread_map_event *event);
@@ -30,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map);
void thread_map__put(struct thread_map *map);
struct thread_map *thread_map__new_str(const char *pid,
- const char *tid, uid_t uid);
+ const char *tid, uid_t uid, bool per_thread);
struct thread_map *thread_map__new_by_tid_str(const char *tid_str);
--
2.7.4
* [PATCH v1 8/9] perf stat: Remove --per-thread pid/tid limitation
2017-11-20 14:43 [PATCH v1 0/9] perf stat: Enable '--per-thread' on all threads Jin Yao
` (7 preceding siblings ...)
2017-11-20 14:43 ` [PATCH v1 7/9] perf util: Reuse thread_map__new_by_uid to enumerate threads from /proc Jin Yao
@ 2017-11-20 14:43 ` Jin Yao
2017-11-21 15:18 ` Jiri Olsa
` (2 more replies)
2017-11-20 14:43 ` [PATCH v1 9/9] perf stat: Resort '--per-thread' result Jin Yao
9 siblings, 3 replies; 49+ messages in thread
From: Jin Yao @ 2017-11-20 14:43 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
Currently, if we execute 'perf stat --per-thread' without specifying
a pid/tid, perf returns an error.
root@skl:/tmp# perf stat --per-thread
The --per-thread option is only available when monitoring via -p -t options.
-p, --pid <pid> stat events on existing process id
-t, --tid <tid> stat events on existing thread id
This patch removes this limitation. If no pid/tid is specified,
perf enumerates all threads from /proc.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/builtin-stat.c | 23 +++++++++++++++--------
tools/perf/util/target.h | 7 +++++++
2 files changed, 22 insertions(+), 8 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 9eec145..2d718f7 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -277,7 +277,7 @@ static int create_perf_stat_counter(struct perf_evsel *evsel)
attr->enable_on_exec = 1;
}
- if (target__has_cpu(&target))
+ if (target__has_cpu(&target) && !target__has_per_thread(&target))
return perf_evsel__open_per_cpu(evsel, perf_evsel__cpus(evsel));
return perf_evsel__open_per_thread(evsel, evsel_list->threads);
@@ -340,7 +340,7 @@ static int read_counter(struct perf_evsel *counter)
int nthreads = thread_map__nr(evsel_list->threads);
int ncpus, cpu, thread;
- if (target__has_cpu(&target))
+ if (target__has_cpu(&target) && !target__has_per_thread(&target))
ncpus = perf_evsel__nr_cpus(counter);
else
ncpus = 1;
@@ -2791,12 +2791,16 @@ int cmd_stat(int argc, const char **argv)
run_count = 1;
}
- if ((stat_config.aggr_mode == AGGR_THREAD) && !target__has_task(&target)) {
- fprintf(stderr, "The --per-thread option is only available "
- "when monitoring via -p -t options.\n");
- parse_options_usage(NULL, stat_options, "p", 1);
- parse_options_usage(NULL, stat_options, "t", 1);
- goto out;
+ if ((stat_config.aggr_mode == AGGR_THREAD) &&
+ !target__has_task(&target)) {
+ if (!target.system_wide || target.cpu_list) {
+ fprintf(stderr, "The --per-thread option is only "
+ "available when monitoring via -p -t "
+ "options.\n");
+ parse_options_usage(NULL, stat_options, "p", 1);
+ parse_options_usage(NULL, stat_options, "t", 1);
+ goto out;
+ }
}
/*
@@ -2820,6 +2824,9 @@ int cmd_stat(int argc, const char **argv)
target__validate(&target);
+ if ((stat_config.aggr_mode == AGGR_THREAD) && (target.system_wide))
+ target.per_thread = true;
+
if (perf_evlist__create_maps(evsel_list, &target) < 0) {
if (target__has_task(&target)) {
pr_err("Problems finding threads of monitor\n");
diff --git a/tools/perf/util/target.h b/tools/perf/util/target.h
index 446aa7a..6ef01a8 100644
--- a/tools/perf/util/target.h
+++ b/tools/perf/util/target.h
@@ -64,6 +64,11 @@ static inline bool target__none(struct target *target)
return !target__has_task(target) && !target__has_cpu(target);
}
+static inline bool target__has_per_thread(struct target *target)
+{
+ return target->system_wide && target->per_thread;
+}
+
static inline bool target__uses_dummy_map(struct target *target)
{
bool use_dummy = false;
@@ -73,6 +78,8 @@ static inline bool target__uses_dummy_map(struct target *target)
else if (target__has_task(target) ||
(!target__has_cpu(target) && !target->uses_mmap))
use_dummy = true;
+ else if (target__has_per_thread(target))
+ use_dummy = true;
return use_dummy;
}
--
2.7.4
* [PATCH v1 9/9] perf stat: Resort '--per-thread' result
2017-11-20 14:43 [PATCH v1 0/9] perf stat: Enable '--per-thread' on all threads Jin Yao
` (8 preceding siblings ...)
2017-11-20 14:43 ` [PATCH v1 8/9] perf stat: Remove --per-thread pid/tid limitation Jin Yao
@ 2017-11-20 14:43 ` Jin Yao
9 siblings, 0 replies; 49+ messages in thread
From: Jin Yao @ 2017-11-20 14:43 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
Many threads are reported if we enable '--per-thread'
globally.
1. Most of the threads are not counted or have a count of 0.
This patch removes these threads from the output.
2. We also re-sort the displayed threads according to the
counting value, making it easy for users to spot the hottest
threads.
For example, the new results would be:
root@skl:/tmp# perf stat --per-thread
^C
Performance counter stats for 'system wide':
perf-24165 4.302433 cpu-clock (msec) # 0.001 CPUs utilized
vmstat-23127 1.562215 cpu-clock (msec) # 0.000 CPUs utilized
irqbalance-2780 0.827851 cpu-clock (msec) # 0.000 CPUs utilized
sshd-23111 0.278308 cpu-clock (msec) # 0.000 CPUs utilized
thermald-2841 0.230880 cpu-clock (msec) # 0.000 CPUs utilized
sshd-23058 0.207306 cpu-clock (msec) # 0.000 CPUs utilized
kworker/0:2-19991 0.133983 cpu-clock (msec) # 0.000 CPUs utilized
kworker/u16:1-18249 0.125636 cpu-clock (msec) # 0.000 CPUs utilized
rcu_sched-8 0.085533 cpu-clock (msec) # 0.000 CPUs utilized
kworker/u16:2-23146 0.077139 cpu-clock (msec) # 0.000 CPUs utilized
gmain-2700 0.041789 cpu-clock (msec) # 0.000 CPUs utilized
kworker/4:1-15354 0.028370 cpu-clock (msec) # 0.000 CPUs utilized
kworker/6:0-17528 0.023895 cpu-clock (msec) # 0.000 CPUs utilized
kworker/4:1H-1887 0.013209 cpu-clock (msec) # 0.000 CPUs utilized
kworker/5:2-31362 0.011627 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/0-11 0.010892 cpu-clock (msec) # 0.000 CPUs utilized
kworker/3:2-12870 0.010220 cpu-clock (msec) # 0.000 CPUs utilized
ksoftirqd/0-7 0.008869 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/1-14 0.008476 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/7-50 0.002944 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/3-26 0.002893 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/4-32 0.002759 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/2-20 0.002429 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/6-44 0.001491 cpu-clock (msec) # 0.000 CPUs utilized
watchdog/5-38 0.001477 cpu-clock (msec) # 0.000 CPUs utilized
rcu_sched-8 10 context-switches # 0.117 M/sec
kworker/u16:1-18249 7 context-switches # 0.056 M/sec
sshd-23111 4 context-switches # 0.014 M/sec
vmstat-23127 4 context-switches # 0.003 M/sec
perf-24165 4 context-switches # 0.930 K/sec
kworker/0:2-19991 3 context-switches # 0.022 M/sec
kworker/u16:2-23146 3 context-switches # 0.039 M/sec
kworker/4:1-15354 2 context-switches # 0.070 M/sec
kworker/6:0-17528 2 context-switches # 0.084 M/sec
sshd-23058 2 context-switches # 0.010 M/sec
ksoftirqd/0-7 1 context-switches # 0.113 M/sec
watchdog/0-11 1 context-switches # 0.092 M/sec
watchdog/1-14 1 context-switches # 0.118 M/sec
watchdog/2-20 1 context-switches # 0.412 M/sec
watchdog/3-26 1 context-switches # 0.346 M/sec
watchdog/4-32 1 context-switches # 0.362 M/sec
watchdog/5-38 1 context-switches # 0.677 M/sec
watchdog/6-44 1 context-switches # 0.671 M/sec
watchdog/7-50 1 context-switches # 0.340 M/sec
kworker/4:1H-1887 1 context-switches # 0.076 M/sec
thermald-2841 1 context-switches # 0.004 M/sec
gmain-2700 1 context-switches # 0.024 M/sec
irqbalance-2780 1 context-switches # 0.001 M/sec
kworker/3:2-12870 1 context-switches # 0.098 M/sec
kworker/5:2-31362 1 context-switches # 0.086 M/sec
kworker/u16:1-18249 2 cpu-migrations # 0.016 M/sec
kworker/u16:2-23146 2 cpu-migrations # 0.026 M/sec
rcu_sched-8 1 cpu-migrations # 0.012 M/sec
sshd-23058 1 cpu-migrations # 0.005 M/sec
perf-24165 8,833,385 cycles # 2.053 GHz
vmstat-23127 1,702,699 cycles # 1.090 GHz
irqbalance-2780 739,847 cycles # 0.894 GHz
sshd-23111 269,506 cycles # 0.968 GHz
thermald-2841 204,556 cycles # 0.886 GHz
sshd-23058 158,780 cycles # 0.766 GHz
kworker/0:2-19991 112,981 cycles # 0.843 GHz
kworker/u16:1-18249 100,926 cycles # 0.803 GHz
rcu_sched-8 74,024 cycles # 0.865 GHz
kworker/u16:2-23146 55,984 cycles # 0.726 GHz
gmain-2700 34,278 cycles # 0.820 GHz
kworker/4:1-15354 20,665 cycles # 0.728 GHz
kworker/6:0-17528 16,445 cycles # 0.688 GHz
kworker/5:2-31362 9,492 cycles # 0.816 GHz
watchdog/3-26 8,695 cycles # 3.006 GHz
kworker/4:1H-1887 8,238 cycles # 0.624 GHz
watchdog/4-32 7,580 cycles # 2.747 GHz
kworker/3:2-12870 7,306 cycles # 0.715 GHz
watchdog/2-20 7,274 cycles # 2.995 GHz
watchdog/0-11 6,988 cycles # 0.642 GHz
ksoftirqd/0-7 6,376 cycles # 0.719 GHz
watchdog/1-14 5,340 cycles # 0.630 GHz
watchdog/5-38 4,061 cycles # 2.749 GHz
watchdog/6-44 3,976 cycles # 2.667 GHz
watchdog/7-50 3,418 cycles # 1.161 GHz
vmstat-23127 2,511,699 instructions # 1.48 insn per cycle
perf-24165 1,829,908 instructions # 0.21 insn per cycle
irqbalance-2780 1,190,204 instructions # 1.61 insn per cycle
thermald-2841 143,544 instructions # 0.70 insn per cycle
sshd-23111 128,138 instructions # 0.48 insn per cycle
sshd-23058 57,654 instructions # 0.36 insn per cycle
rcu_sched-8 44,063 instructions # 0.60 insn per cycle
kworker/u16:1-18249 42,551 instructions # 0.42 insn per cycle
kworker/0:2-19991 25,873 instructions # 0.23 insn per cycle
kworker/u16:2-23146 21,407 instructions # 0.38 insn per cycle
gmain-2700 13,691 instructions # 0.40 insn per cycle
kworker/4:1-15354 12,964 instructions # 0.63 insn per cycle
kworker/6:0-17528 10,034 instructions # 0.61 insn per cycle
kworker/5:2-31362 5,203 instructions # 0.55 insn per cycle
kworker/3:2-12870 4,866 instructions # 0.67 insn per cycle
kworker/4:1H-1887 3,586 instructions # 0.44 insn per cycle
ksoftirqd/0-7 3,463 instructions # 0.54 insn per cycle
watchdog/0-11 3,135 instructions # 0.45 insn per cycle
watchdog/1-14 3,135 instructions # 0.59 insn per cycle
watchdog/2-20 3,135 instructions # 0.43 insn per cycle
watchdog/3-26 3,135 instructions # 0.36 insn per cycle
watchdog/4-32 3,135 instructions # 0.41 insn per cycle
watchdog/5-38 3,135 instructions # 0.77 insn per cycle
watchdog/6-44 3,135 instructions # 0.79 insn per cycle
watchdog/7-50 3,135 instructions # 0.92 insn per cycle
vmstat-23127 539,181 branches # 345.139 M/sec
perf-24165 375,364 branches # 87.245 M/sec
irqbalance-2780 262,092 branches # 316.593 M/sec
thermald-2841 31,611 branches # 136.915 M/sec
sshd-23111 21,874 branches # 78.596 M/sec
sshd-23058 10,682 branches # 51.528 M/sec
rcu_sched-8 8,693 branches # 101.633 M/sec
kworker/u16:1-18249 7,891 branches # 62.808 M/sec
kworker/0:2-19991 5,761 branches # 42.998 M/sec
kworker/u16:2-23146 4,099 branches # 53.138 M/sec
kworker/4:1-15354 2,755 branches # 97.110 M/sec
gmain-2700 2,638 branches # 63.127 M/sec
kworker/6:0-17528 2,216 branches # 92.739 M/sec
kworker/5:2-31362 1,132 branches # 97.360 M/sec
kworker/3:2-12870 1,081 branches # 105.773 M/sec
kworker/4:1H-1887 725 branches # 54.887 M/sec
ksoftirqd/0-7 707 branches # 79.716 M/sec
watchdog/0-11 652 branches # 59.860 M/sec
watchdog/1-14 652 branches # 76.923 M/sec
watchdog/2-20 652 branches # 268.423 M/sec
watchdog/3-26 652 branches # 225.372 M/sec
watchdog/4-32 652 branches # 236.318 M/sec
watchdog/5-38 652 branches # 441.435 M/sec
watchdog/6-44 652 branches # 437.290 M/sec
watchdog/7-50 652 branches # 221.467 M/sec
vmstat-23127 8,960 branch-misses # 1.66% of all branches
irqbalance-2780 3,047 branch-misses # 1.16% of all branches
perf-24165 2,876 branch-misses # 0.77% of all branches
sshd-23111 1,843 branch-misses # 8.43% of all branches
thermald-2841 1,444 branch-misses # 4.57% of all branches
sshd-23058 1,379 branch-misses # 12.91% of all branches
kworker/u16:1-18249 982 branch-misses # 12.44% of all branches
rcu_sched-8 893 branch-misses # 10.27% of all branches
kworker/u16:2-23146 578 branch-misses # 14.10% of all branches
kworker/0:2-19991 376 branch-misses # 6.53% of all branches
gmain-2700 280 branch-misses # 10.61% of all branches
kworker/6:0-17528 196 branch-misses # 8.84% of all branches
kworker/4:1-15354 187 branch-misses # 6.79% of all branches
kworker/5:2-31362 123 branch-misses # 10.87% of all branches
watchdog/0-11 95 branch-misses # 14.57% of all branches
watchdog/4-32 89 branch-misses # 13.65% of all branches
kworker/3:2-12870 80 branch-misses # 7.40% of all branches
watchdog/3-26 61 branch-misses # 9.36% of all branches
kworker/4:1H-1887 60 branch-misses # 8.28% of all branches
watchdog/2-20 52 branch-misses # 7.98% of all branches
ksoftirqd/0-7 47 branch-misses # 6.65% of all branches
watchdog/1-14 46 branch-misses # 7.06% of all branches
watchdog/7-50 13 branch-misses # 1.99% of all branches
watchdog/5-38 8 branch-misses # 1.23% of all branches
watchdog/6-44 7 branch-misses # 1.07% of all branches
3.695150786 seconds time elapsed
root@skl:/tmp# perf stat --per-thread -M IPC,CPI
^C
Performance counter stats for 'system wide':
vmstat-23127 2,000,783 inst_retired.any # 1.5 IPC
thermald-2841 1,472,670 inst_retired.any # 1.3 IPC
sshd-23111 977,374 inst_retired.any # 1.2 IPC
perf-24163 483,779 inst_retired.any # 0.2 IPC
gmain-2700 341,213 inst_retired.any # 0.9 IPC
sshd-23058 148,891 inst_retired.any # 0.8 IPC
rtkit-daemon-3288 71,210 inst_retired.any # 0.7 IPC
kworker/u16:1-18249 39,562 inst_retired.any # 0.3 IPC
rcu_sched-8 14,474 inst_retired.any # 0.8 IPC
kworker/0:2-19991 7,659 inst_retired.any # 0.2 IPC
kworker/4:1-15354 6,714 inst_retired.any # 0.8 IPC
rtkit-daemon-3289 4,839 inst_retired.any # 0.3 IPC
kworker/6:0-17528 3,321 inst_retired.any # 0.6 IPC
kworker/5:2-31362 3,215 inst_retired.any # 0.5 IPC
kworker/7:2-23145 3,173 inst_retired.any # 0.7 IPC
kworker/4:1H-1887 1,719 inst_retired.any # 0.3 IPC
watchdog/0-11 1,479 inst_retired.any # 0.3 IPC
watchdog/1-14 1,479 inst_retired.any # 0.3 IPC
watchdog/2-20 1,479 inst_retired.any # 0.4 IPC
watchdog/3-26 1,479 inst_retired.any # 0.4 IPC
watchdog/4-32 1,479 inst_retired.any # 0.3 IPC
watchdog/5-38 1,479 inst_retired.any # 0.3 IPC
watchdog/6-44 1,479 inst_retired.any # 0.7 IPC
watchdog/7-50 1,479 inst_retired.any # 0.7 IPC
kworker/u16:2-23146 1,408 inst_retired.any # 0.5 IPC
perf-24163 2,249,872 cpu_clk_unhalted.thread
vmstat-23127 1,352,455 cpu_clk_unhalted.thread
thermald-2841 1,161,140 cpu_clk_unhalted.thread
sshd-23111 807,827 cpu_clk_unhalted.thread
gmain-2700 375,535 cpu_clk_unhalted.thread
sshd-23058 194,071 cpu_clk_unhalted.thread
kworker/u16:1-18249 114,306 cpu_clk_unhalted.thread
rtkit-daemon-3288 103,547 cpu_clk_unhalted.thread
kworker/0:2-19991 46,550 cpu_clk_unhalted.thread
rcu_sched-8 18,855 cpu_clk_unhalted.thread
rtkit-daemon-3289 17,549 cpu_clk_unhalted.thread
kworker/4:1-15354 8,812 cpu_clk_unhalted.thread
kworker/5:2-31362 6,812 cpu_clk_unhalted.thread
kworker/4:1H-1887 5,270 cpu_clk_unhalted.thread
kworker/6:0-17528 5,111 cpu_clk_unhalted.thread
kworker/7:2-23145 4,667 cpu_clk_unhalted.thread
watchdog/0-11 4,663 cpu_clk_unhalted.thread
watchdog/1-14 4,663 cpu_clk_unhalted.thread
watchdog/4-32 4,626 cpu_clk_unhalted.thread
watchdog/5-38 4,403 cpu_clk_unhalted.thread
watchdog/3-26 3,936 cpu_clk_unhalted.thread
watchdog/2-20 3,850 cpu_clk_unhalted.thread
kworker/u16:2-23146 2,654 cpu_clk_unhalted.thread
watchdog/6-44 2,017 cpu_clk_unhalted.thread
watchdog/7-50 2,017 cpu_clk_unhalted.thread
vmstat-23127 2,000,783 inst_retired.any # 0.7 CPI
thermald-2841 1,472,670 inst_retired.any # 0.8 CPI
sshd-23111 977,374 inst_retired.any # 0.8 CPI
perf-24163 495,037 inst_retired.any # 4.7 CPI
gmain-2700 341,213 inst_retired.any # 1.1 CPI
sshd-23058 148,891 inst_retired.any # 1.3 CPI
rtkit-daemon-3288 71,210 inst_retired.any # 1.5 CPI
kworker/u16:1-18249 39,562 inst_retired.any # 2.9 CPI
rcu_sched-8 14,474 inst_retired.any # 1.3 CPI
kworker/0:2-19991 7,659 inst_retired.any # 6.1 CPI
kworker/4:1-15354 6,714 inst_retired.any # 1.3 CPI
rtkit-daemon-3289 4,839 inst_retired.any # 3.6 CPI
kworker/6:0-17528 3,321 inst_retired.any # 1.5 CPI
kworker/5:2-31362 3,215 inst_retired.any # 2.1 CPI
kworker/7:2-23145 3,173 inst_retired.any # 1.5 CPI
kworker/4:1H-1887 1,719 inst_retired.any # 3.1 CPI
watchdog/0-11 1,479 inst_retired.any # 3.2 CPI
watchdog/1-14 1,479 inst_retired.any # 3.2 CPI
watchdog/2-20 1,479 inst_retired.any # 2.6 CPI
watchdog/3-26 1,479 inst_retired.any # 2.7 CPI
watchdog/4-32 1,479 inst_retired.any # 3.1 CPI
watchdog/5-38 1,479 inst_retired.any # 3.0 CPI
watchdog/6-44 1,479 inst_retired.any # 1.4 CPI
watchdog/7-50 1,479 inst_retired.any # 1.4 CPI
kworker/u16:2-23146 1,408 inst_retired.any # 1.9 CPI
perf-24163 2,302,323 cycles
vmstat-23127 1,352,455 cycles
thermald-2841 1,161,140 cycles
sshd-23111 807,827 cycles
gmain-2700 375,535 cycles
sshd-23058 194,071 cycles
kworker/u16:1-18249 114,306 cycles
rtkit-daemon-3288 103,547 cycles
kworker/0:2-19991 46,550 cycles
rcu_sched-8 18,855 cycles
rtkit-daemon-3289 17,549 cycles
kworker/4:1-15354 8,812 cycles
kworker/5:2-31362 6,812 cycles
kworker/4:1H-1887 5,270 cycles
kworker/6:0-17528 5,111 cycles
kworker/7:2-23145 4,667 cycles
watchdog/0-11 4,663 cycles
watchdog/1-14 4,663 cycles
watchdog/4-32 4,626 cycles
watchdog/5-38 4,403 cycles
watchdog/3-26 3,936 cycles
watchdog/2-20 3,850 cycles
kworker/u16:2-23146 2,654 cycles
watchdog/6-44 2,017 cycles
watchdog/7-50 2,017 cycles
2.175726600 seconds time elapsed
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/builtin-stat.c | 71 +++++++++++++++++++++++++++++++++++++++++------
tools/perf/util/stat.h | 9 ++++++
2 files changed, 72 insertions(+), 8 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 2d718f7..25ccf4e 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1405,13 +1405,24 @@ static void print_aggr(char *prefix)
}
}
-static void print_aggr_thread(struct perf_evsel *counter, char *prefix)
+static int cmp_val(const void *a, const void *b)
{
- FILE *output = stat_config.output;
- int nthreads = thread_map__nr(counter->threads);
- int ncpus = cpu_map__nr(counter->cpus);
- int cpu, thread;
+ return ((struct perf_aggr_thread_value *)b)->val -
+ ((struct perf_aggr_thread_value *)a)->val;
+}
+
+static struct perf_aggr_thread_value *sort_aggr_thread(
+ struct perf_evsel *counter,
+ int nthreads, int ncpus,
+ int *ret)
+{
+ int cpu, thread, i = 0;
double uval;
+ struct perf_aggr_thread_value *buf;
+
+ buf = calloc(nthreads, sizeof(struct perf_aggr_thread_value));
+ if (!buf)
+ return NULL;
for (thread = 0; thread < nthreads; thread++) {
u64 ena = 0, run = 0, val = 0;
@@ -1422,14 +1433,58 @@ static void print_aggr_thread(struct perf_evsel *counter, char *prefix)
run += perf_counts(counter->counts, cpu, thread)->run;
}
+ uval = val * counter->scale;
+
+ /*
+ * Skip value 0 when enabling --per-thread globally,
+ * otherwise too many 0 output.
+ */
+ if (uval == 0.0 && target__has_per_thread(&target))
+ continue;
+
+ buf[i].counter = counter;
+ buf[i].id = thread;
+ buf[i].uval = uval;
+ buf[i].val = val;
+ buf[i].run = run;
+ buf[i].ena = ena;
+ i++;
+ }
+
+ qsort(buf, i, sizeof(struct perf_aggr_thread_value), cmp_val);
+
+ if (ret)
+ *ret = i;
+
+ return buf;
+}
+
+static void print_aggr_thread(struct perf_evsel *counter, char *prefix)
+{
+ FILE *output = stat_config.output;
+ int nthreads = thread_map__nr(counter->threads);
+ int ncpus = cpu_map__nr(counter->cpus);
+ int thread, sorted_threads, id;
+ struct perf_aggr_thread_value *buf;
+
+ buf = sort_aggr_thread(counter, nthreads, ncpus, &sorted_threads);
+ if (!buf) {
+ perror("cannot sort aggr thread");
+ return;
+ }
+
+ for (thread = 0; thread < sorted_threads; thread++) {
if (prefix)
fprintf(output, "%s", prefix);
- uval = val * counter->scale;
- printout(thread, 0, counter, uval, prefix, run, ena, 1.0,
- &stat_config.stats[thread]);
+ id = buf[thread].id;
+ printout(id, 0, buf[thread].counter, buf[thread].uval,
+ prefix, buf[thread].run, buf[thread].ena, 1.0,
+ &stat_config.stats[id]);
fputc('\n', output);
}
+
+ free(buf);
}
struct caggr_data {
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 7ed77b8..b80f9a6 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -112,6 +112,15 @@ static inline void init_stats(struct stats *stats)
struct perf_evsel;
struct perf_evlist;
+struct perf_aggr_thread_value {
+ struct perf_evsel *counter;
+ int id;
+ double uval;
+ u64 val;
+ u64 run;
+ u64 ena;
+};
+
bool __perf_evsel_stat__is(struct perf_evsel *evsel,
enum perf_stat_evsel_id id);
--
2.7.4
* Re: [PATCH v1 0/9] perf stat: Enable '--per-thread' on all threads
2017-11-20 12:29 ` Jiri Olsa
@ 2017-11-20 15:50 ` Andi Kleen
0 siblings, 0 replies; 49+ messages in thread
From: Andi Kleen @ 2017-11-20 15:50 UTC (permalink / raw)
To: Jiri Olsa
Cc: Jin, Yao, acme, jolsa, peterz, mingo, alexander.shishkin,
Linux-kernel, kan.liang, yao.jin
> > Hi Jiri,
> >
> > This patch set is based on the latest perf/core branch.
> >
> > I pulled the branch just now and tried the build again; the build is OK.
> >
> > Could you tell me which branch you are testing on?
>
> ugh.. I was over Andi's changes.. I'll recheck
Yes, the two patch kits conflict. Whichever gets merged first, the
other has to rebase.
-Andi
* Re: [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats
2017-11-20 14:43 ` [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats Jin Yao
@ 2017-11-21 15:17 ` Jiri Olsa
2017-11-22 1:29 ` Jin, Yao
2017-11-21 15:17 ` Jiri Olsa
` (5 subsequent siblings)
6 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2017-11-21 15:17 UTC (permalink / raw)
To: Jin Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Mon, Nov 20, 2017 at 10:43:38PM +0800, Jin Yao wrote:
SNIP
>
> +static void init_saved_rblist(struct rblist *rblist)
> +{
> + rblist__init(rblist);
> + rblist->node_cmp = saved_value_cmp;
> + rblist->node_new = saved_value_new;
> + rblist->node_delete = saved_value_delete;
> +}
> +
> +static void free_saved_rblist(struct rblist *rblist)
> +{
> + rblist__reset(rblist);
> +}
> +
> +void perf_stat__init_runtime_stat(struct runtime_stat *stat)
> +{
> + memset(stat, 0, sizeof(struct runtime_stat));
what's this memset for? that struct has only rb_list
jirka
* Re: [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats
2017-11-20 14:43 ` [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats Jin Yao
2017-11-21 15:17 ` Jiri Olsa
@ 2017-11-21 15:17 ` Jiri Olsa
2017-11-22 1:35 ` Jin, Yao
2017-11-21 15:17 ` Jiri Olsa
` (4 subsequent siblings)
6 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2017-11-21 15:17 UTC (permalink / raw)
To: Jin Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Mon, Nov 20, 2017 at 10:43:38PM +0800, Jin Yao wrote:
SNIP
>
> +static void init_saved_rblist(struct rblist *rblist)
> +{
> + rblist__init(rblist);
> + rblist->node_cmp = saved_value_cmp;
> + rblist->node_new = saved_value_new;
> + rblist->node_delete = saved_value_delete;
> +}
> +
> +static void free_saved_rblist(struct rblist *rblist)
> +{
> + rblist__reset(rblist);
> +}
I don't see a reason for this code to be in separate
functions.. could go directly in:
perf_stat__init_runtime_stat
perf_stat__free_runtime_stat
jirka
> +
> +void perf_stat__init_runtime_stat(struct runtime_stat *stat)
> +{
> + memset(stat, 0, sizeof(struct runtime_stat));
> + init_saved_rblist(&stat->value_list);
> +}
> +
> +void perf_stat__free_runtime_stat(struct runtime_stat *stat)
> +{
> + free_saved_rblist(&stat->value_list);
> +}
> +
SNIP
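Folding the two helpers into the init/free pair, as the review suggests, would give roughly the following shape. The rblist and saved_value_* definitions below are minimal stand-ins, just enough to make the sketch compile; the real ones live in perf's rblist.h and stat-shadow.c:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-ins for perf's types (illustration only). */
struct rb_node;
struct rblist {
	int (*node_cmp)(struct rb_node *node, const void *entry);
	struct rb_node *(*node_new)(struct rblist *rl, const void *entry);
	void (*node_delete)(struct rblist *rl, struct rb_node *node);
	unsigned int nr_entries;
};

static void rblist__init(struct rblist *rl)
{
	memset(rl, 0, sizeof(*rl));
}

static void rblist__reset(struct rblist *rl)
{
	rl->nr_entries = 0; /* the real version walks and deletes nodes */
}

/* Stub callbacks standing in for the real saved_value_* functions. */
static int saved_value_cmp(struct rb_node *n, const void *e)
{
	(void)n; (void)e;
	return 0;
}

static struct rb_node *saved_value_new(struct rblist *rl, const void *e)
{
	(void)rl; (void)e;
	return NULL;
}

static void saved_value_delete(struct rblist *rl, struct rb_node *n)
{
	(void)rl; (void)n;
}

struct runtime_stat {
	struct rblist value_list;
};

/* The init/free pair with the helpers folded in, per the review. */
void perf_stat__init_runtime_stat(struct runtime_stat *st)
{
	struct rblist *rblist = &st->value_list;

	rblist__init(rblist);
	rblist->node_cmp = saved_value_cmp;
	rblist->node_new = saved_value_new;
	rblist->node_delete = saved_value_delete;
}

void perf_stat__free_runtime_stat(struct runtime_stat *st)
{
	rblist__reset(&st->value_list);
}
```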
* Re: [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats
2017-11-20 14:43 ` [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats Jin Yao
2017-11-21 15:17 ` Jiri Olsa
2017-11-21 15:17 ` Jiri Olsa
@ 2017-11-21 15:17 ` Jiri Olsa
2017-11-22 1:45 ` Jin, Yao
2017-11-21 15:17 ` Jiri Olsa
` (3 subsequent siblings)
6 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2017-11-21 15:17 UTC (permalink / raw)
To: Jin Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Mon, Nov 20, 2017 at 10:43:38PM +0800, Jin Yao wrote:
> In the current stat-shadow.c code, rblist deletion is ignored.
>
> This patch reconstructs the rblist init/free code and adds an
> implementation for the rblist's node_delete method.
>
> This patch also does the following:
>
> 1. Adds ctx/type/stat to the rbtree keys, because this rbtree will
> be used to maintain the shadow metrics, replacing the original set
> of static arrays, in order to support per-thread shadow stats.
>
> 2. Creates a static runtime_stat variable, 'rt_stat', which logs
> the shadow metrics by default.
please make those separate patches then,
one patch - one logical change
thanks,
jirka
* Re: [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats
2017-11-20 14:43 ` [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats Jin Yao
` (2 preceding siblings ...)
2017-11-21 15:17 ` Jiri Olsa
@ 2017-11-21 15:17 ` Jiri Olsa
2017-11-22 2:11 ` Jin, Yao
2017-11-21 15:17 ` Jiri Olsa
` (2 subsequent siblings)
6 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2017-11-21 15:17 UTC (permalink / raw)
To: Jin Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Mon, Nov 20, 2017 at 10:43:38PM +0800, Jin Yao wrote:
SNIP
> +static void init_saved_rblist(struct rblist *rblist)
> +{
> + rblist__init(rblist);
> + rblist->node_cmp = saved_value_cmp;
> + rblist->node_new = saved_value_new;
> + rblist->node_delete = saved_value_delete;
> +}
> +
> +static void free_saved_rblist(struct rblist *rblist)
> +{
> + rblist__reset(rblist);
> +}
> +
> +void perf_stat__init_runtime_stat(struct runtime_stat *stat)
> +{
> + memset(stat, 0, sizeof(struct runtime_stat));
> + init_saved_rblist(&stat->value_list);
> +}
> +
> +void perf_stat__free_runtime_stat(struct runtime_stat *stat)
> +{
> + free_saved_rblist(&stat->value_list);
> +}
> +
> void perf_stat__init_shadow_stats(void)
> {
> have_frontend_stalled = pmu_have_event("cpu", "stalled-cycles-frontend");
> - rblist__init(&runtime_saved_values);
> - runtime_saved_values.node_cmp = saved_value_cmp;
> - runtime_saved_values.node_new = saved_value_new;
> - /* No delete for now */
> + memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats));
why do you zero walltime_nsecs_stats in here?
jirka
* Re: [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats
2017-11-20 14:43 ` [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats Jin Yao
` (3 preceding siblings ...)
2017-11-21 15:17 ` Jiri Olsa
@ 2017-11-21 15:17 ` Jiri Olsa
2017-11-22 2:19 ` Jin, Yao
2017-11-21 15:17 ` Jiri Olsa
2017-11-22 6:31 ` Ravi Bangoria
6 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2017-11-21 15:17 UTC (permalink / raw)
To: Jin Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Mon, Nov 20, 2017 at 10:43:38PM +0800, Jin Yao wrote:
> In the current stat-shadow.c code, rblist deletion is ignored.
>
> This patch reconstructs the rblist init/free code and adds an
> implementation for the rblist's node_delete method.
>
> This patch also does the following:
>
> 1. Adds ctx/type/stat to the rbtree keys, because this rbtree will
> be used to maintain the shadow metrics, replacing the original set
> of static arrays, in order to support per-thread shadow stats.
>
> 2. Creates a static runtime_stat variable, 'rt_stat', which logs
> the shadow metrics by default.
>
> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> ---
> tools/perf/util/stat-shadow.c | 62 ++++++++++++++++++++++++++++++++++++++++---
> tools/perf/util/stat.h | 2 ++
> 2 files changed, 60 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
> index 5853901..045e129 100644
> --- a/tools/perf/util/stat-shadow.c
> +++ b/tools/perf/util/stat-shadow.c
> @@ -40,12 +40,16 @@ static struct stats runtime_aperf_stats[NUM_CTX][MAX_NR_CPUS];
> static struct rblist runtime_saved_values;
> static bool have_frontend_stalled;
>
> +static struct runtime_stat rt_stat;
> struct stats walltime_nsecs_stats;
>
> struct saved_value {
> struct rb_node rb_node;
> struct perf_evsel *evsel;
> + enum stat_type type;
> + int ctx;
> int cpu;
> + struct runtime_stat *stat;
> struct stats stats;
> };
>
> @@ -58,6 +62,23 @@ static int saved_value_cmp(struct rb_node *rb_node, const void *entry)
>
> if (a->cpu != b->cpu)
> return a->cpu - b->cpu;
> +
> + if (a->type != b->type)
> + return a->type - b->type;
> +
> + if (a->ctx != b->ctx)
> + return a->ctx - b->ctx;
> +
could you please comment in here on the cases where
evsel is defined and when not
jirka
> + if (a->evsel == NULL && b->evsel == NULL) {
> + if (a->stat == b->stat)
> + return 0;
> +
> + if ((char *)a->stat < (char *)b->stat)
> + return -1;
> +
> + return 1;
> + }
> +
SNIP
* Re: [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats
2017-11-20 14:43 ` [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats Jin Yao
` (4 preceding siblings ...)
2017-11-21 15:17 ` Jiri Olsa
@ 2017-11-21 15:17 ` Jiri Olsa
2017-11-22 2:20 ` Jin, Yao
2017-11-22 6:31 ` Ravi Bangoria
6 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2017-11-21 15:17 UTC (permalink / raw)
To: Jin Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Mon, Nov 20, 2017 at 10:43:38PM +0800, Jin Yao wrote:
SNIP
> static int evsel_context(struct perf_evsel *evsel)
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index 61fd2e0..4eb081d 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -126,6 +126,8 @@ typedef void (*print_metric_t)(void *ctx, const char *color, const char *unit,
> const char *fmt, double val);
> typedef void (*new_line_t )(void *ctx);
>
> +void perf_stat__init_runtime_stat(struct runtime_stat *stat);
> +void perf_stat__free_runtime_stat(struct runtime_stat *stat);
maybe that could be:
runtime_stat__init(struct runtime_stat *stat);
runtime_stat__free(struct runtime_stat *stat);
jirka
* Re: [PATCH v1 5/9] perf util: Remove a set of shadow stats static variables
2017-11-20 14:43 ` [PATCH v1 5/9] perf util: Remove a set of shadow stats static variables Jin Yao
@ 2017-11-21 15:17 ` Jiri Olsa
2017-11-21 18:03 ` Andi Kleen
0 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2017-11-21 15:17 UTC (permalink / raw)
To: Jin Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Mon, Nov 20, 2017 at 10:43:40PM +0800, Jin Yao wrote:
> In previous patches, we reconstructed the code so that it no longer
> accesses the static variables directly.
>
> This patch removes these static variables.
>
> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> ---
> tools/perf/util/stat-shadow.c | 64 ++++++++++---------------------------------
> tools/perf/util/stat.h | 1 +
> 2 files changed, 16 insertions(+), 49 deletions(-)
>
> diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
> index 6f28782..74bcc4d 100644
> --- a/tools/perf/util/stat-shadow.c
> +++ b/tools/perf/util/stat-shadow.c
> @@ -16,28 +16,6 @@
> * AGGR_NONE: Use matching CPU
> * AGGR_THREAD: Not supported?
> */
> -static struct stats runtime_nsecs_stats[MAX_NR_CPUS];
> -static struct stats runtime_cycles_stats[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_stalled_cycles_front_stats[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_stalled_cycles_back_stats[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_branches_stats[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_cacherefs_stats[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_l1_dcache_stats[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_l1_icache_stats[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_ll_cache_stats[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_itlb_cache_stats[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_dtlb_cache_stats[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_cycles_in_tx_stats[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_transaction_stats[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_elision_stats[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_topdown_total_slots[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_topdown_slots_issued[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_topdown_slots_retired[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_topdown_fetch_bubbles[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_topdown_recovery_bubbles[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_smi_num_stats[NUM_CTX][MAX_NR_CPUS];
> -static struct stats runtime_aperf_stats[NUM_CTX][MAX_NR_CPUS];
> -static struct rblist runtime_saved_values;
> static bool have_frontend_stalled;
all this is about switching from array to rb_list for the --per-thread case,
which can be considered as a special use case.. how much do we suffer in
performance with new code? how about the "perf stat -I 100", would it scale
ok for extreme cases (many events in -e or -dddd..)
jirka
* Re: [PATCH v1 4/9] perf util: Update and print per-thread shadow stats
2017-11-20 14:43 ` [PATCH v1 4/9] perf util: Update and print " Jin Yao
@ 2017-11-21 15:17 ` Jiri Olsa
2017-11-22 2:42 ` Jin, Yao
2017-11-21 15:18 ` Jiri Olsa
1 sibling, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2017-11-21 15:17 UTC (permalink / raw)
To: Jin Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Mon, Nov 20, 2017 at 10:43:39PM +0800, Jin Yao wrote:
SNIP
> diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
> index 045e129..6f28782 100644
> --- a/tools/perf/util/stat-shadow.c
> +++ b/tools/perf/util/stat-shadow.c
> @@ -110,19 +110,32 @@ static void saved_value_delete(struct rblist *rblist __maybe_unused,
>
> static struct saved_value *saved_value_lookup(struct perf_evsel *evsel,
> int cpu,
> - bool create)
> + bool create,
> + enum stat_type type,
> + int ctx,
> + struct runtime_stat *stat)
> {
> + struct rblist *rblist;
> struct rb_node *nd;
> struct saved_value dm = {
> .cpu = cpu,
> .evsel = evsel,
> + .type = type,
> + .ctx = ctx,
> + .stat = stat,
> };
> - nd = rblist__find(&runtime_saved_values, &dm);
> +
> + if (stat)
> + rblist = &stat->value_list;
> + else
> + rblist = &rt_stat.value_list;
please pass the correct 'struct runtime_stat *stat',
I don't see a reason not to pass &rt_stat directly below:
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 151e9ef..50bb16d 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -278,9 +278,16 @@ process_counter_values(struct perf_stat_config *config, struct perf_evsel *evsel
> perf_evsel__compute_deltas(evsel, cpu, thread, count);
> perf_counts_values__scale(count, config->scale, NULL);
> if (config->aggr_mode == AGGR_NONE)
> - perf_stat__update_shadow_stats(evsel, count->val, cpu);
> - if (config->aggr_mode == AGGR_THREAD)
> - perf_stat__update_shadow_stats(evsel, count->val, 0);
> + perf_stat__update_shadow_stats(evsel, count->val, cpu,
> + NULL);
> + if (config->aggr_mode == AGGR_THREAD) {
> + if (config->stats)
> + perf_stat__update_shadow_stats(evsel,
> + count->val, 0, &config->stats[thread]);
> + else
> + perf_stat__update_shadow_stats(evsel,
> + count->val, 0, NULL);
here
> + }
> break;
> case AGGR_GLOBAL:
> aggr->val += count->val;
> @@ -362,7 +369,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
> /*
> * Save the full runtime - to allow normalization during printout:
> */
> - perf_stat__update_shadow_stats(counter, *count, 0);
> + perf_stat__update_shadow_stats(counter, *count, 0, NULL);
and here
thanks,
jirka
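The request above boils down to removing the NULL-to-rt_stat translation from saved_value_lookup() and making the callers pass &rt_stat themselves. Stripped of the perf internals (types simplified; rt_stat stands in for the file-static default instance), the two shapes compare like this:

```c
#include <assert.h>
#include <stddef.h>

struct rblist { unsigned int nr_entries; };
struct runtime_stat { struct rblist value_list; };

/* Stand-in for the file-static default instance. */
static struct runtime_stat rt_stat;

/* Shape in the patch: NULL means "use the default rt_stat". */
static struct rblist *pick_rblist_v1(struct runtime_stat *stat)
{
	return stat ? &stat->value_list : &rt_stat.value_list;
}

/* Shape the review asks for: callers pass &rt_stat explicitly,
 * so no branch (and no NULL special case) is needed. */
static struct rblist *pick_rblist_v2(struct runtime_stat *stat)
{
	return &stat->value_list;
}
```

Both variants return the same list for the default case; the second makes the dependency on rt_stat visible at every call site.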
* Re: [PATCH v1 4/9] perf util: Update and print per-thread shadow stats
2017-11-20 14:43 ` [PATCH v1 4/9] perf util: Update and print " Jin Yao
2017-11-21 15:17 ` Jiri Olsa
@ 2017-11-21 15:18 ` Jiri Olsa
2017-11-22 3:10 ` Jin, Yao
1 sibling, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2017-11-21 15:18 UTC (permalink / raw)
To: Jin Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Mon, Nov 20, 2017 at 10:43:39PM +0800, Jin Yao wrote:
SNIP
> if (num == 0)
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 151e9ef..50bb16d 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -278,9 +278,16 @@ process_counter_values(struct perf_stat_config *config, struct perf_evsel *evsel
> perf_evsel__compute_deltas(evsel, cpu, thread, count);
> perf_counts_values__scale(count, config->scale, NULL);
> if (config->aggr_mode == AGGR_NONE)
> - perf_stat__update_shadow_stats(evsel, count->val, cpu);
> - if (config->aggr_mode == AGGR_THREAD)
> - perf_stat__update_shadow_stats(evsel, count->val, 0);
> + perf_stat__update_shadow_stats(evsel, count->val, cpu,
> + NULL);
> + if (config->aggr_mode == AGGR_THREAD) {
> + if (config->stats)
> + perf_stat__update_shadow_stats(evsel,
> + count->val, 0, &config->stats[thread]);
please add this part together with config->stats allocation/usage
and keep this change only about adding the struct runtime_stat *stat
into the code path with no functional change
thanks,
jirka
* Re: [PATCH v1 2/9] perf util: Define a structure for runtime shadow metrics stats
2017-11-20 14:43 ` [PATCH v1 2/9] perf util: Define a structure for runtime shadow metrics stats Jin Yao
@ 2017-11-21 15:18 ` Jiri Olsa
2017-11-22 3:11 ` Jin, Yao
0 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2017-11-21 15:18 UTC (permalink / raw)
To: Jin Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Mon, Nov 20, 2017 at 10:43:37PM +0800, Jin Yao wrote:
SNIP
> + STAT_SMI_NUM,
> + STAT_APERF,
> + STAT_MAX
> +};
> +
> +struct runtime_stat {
> + struct rblist value_list;
> +};
> +
> struct perf_stat_config {
> enum aggr_mode aggr_mode;
> bool scale;
> FILE *output;
> unsigned int interval;
> + struct runtime_stat *stats;
> + int stat_num;
s/stat_num/stats_num/ or s/stats/stat/
thanks,
jirka
* Re: [PATCH v1 8/9] perf stat: Remove --per-thread pid/tid limitation
2017-11-20 14:43 ` [PATCH v1 8/9] perf stat: Remove --per-thread pid/tid limitation Jin Yao
@ 2017-11-21 15:18 ` Jiri Olsa
2017-11-22 3:42 ` Jin, Yao
2017-11-21 15:18 ` Jiri Olsa
2017-11-21 15:18 ` Jiri Olsa
2 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2017-11-21 15:18 UTC (permalink / raw)
To: Jin Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Mon, Nov 20, 2017 at 10:43:43PM +0800, Jin Yao wrote:
SNIP
> - if ((stat_config.aggr_mode == AGGR_THREAD) && !target__has_task(&target)) {
> - fprintf(stderr, "The --per-thread option is only available "
> - "when monitoring via -p -t options.\n");
> - parse_options_usage(NULL, stat_options, "p", 1);
> - parse_options_usage(NULL, stat_options, "t", 1);
> - goto out;
> + if ((stat_config.aggr_mode == AGGR_THREAD) &&
> + !target__has_task(&target)) {
> + if (!target.system_wide || target.cpu_list) {
> + fprintf(stderr, "The --per-thread option is only "
> + "available when monitoring via -p -t "
> + "options.\n");
the message should be updated to mention the '-a' option you just added.
also, why don't we support target.cpu_list? it should work, no?
jirka
* Re: [PATCH v1 8/9] perf stat: Remove --per-thread pid/tid limitation
2017-11-20 14:43 ` [PATCH v1 8/9] perf stat: Remove --per-thread pid/tid limitation Jin Yao
2017-11-21 15:18 ` Jiri Olsa
@ 2017-11-21 15:18 ` Jiri Olsa
2017-11-22 5:34 ` Jin, Yao
2017-11-21 15:18 ` Jiri Olsa
2 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2017-11-21 15:18 UTC (permalink / raw)
To: Jin Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Mon, Nov 20, 2017 at 10:43:43PM +0800, Jin Yao wrote:
SNIP
> diff --git a/tools/perf/util/target.h b/tools/perf/util/target.h
> index 446aa7a..6ef01a8 100644
> --- a/tools/perf/util/target.h
> +++ b/tools/perf/util/target.h
> @@ -64,6 +64,11 @@ static inline bool target__none(struct target *target)
> return !target__has_task(target) && !target__has_cpu(target);
> }
>
> +static inline bool target__has_per_thread(struct target *target)
> +{
> + return target->system_wide && target->per_thread;
> +}
this is confusing.. has_per_thread depends on system_wide?
> +
> static inline bool target__uses_dummy_map(struct target *target)
> {
> bool use_dummy = false;
> @@ -73,6 +78,8 @@ static inline bool target__uses_dummy_map(struct target *target)
> else if (target__has_task(target) ||
> (!target__has_cpu(target) && !target->uses_mmap))
> use_dummy = true;
> + else if (target__has_per_thread(target))
> + use_dummy = true;
why do we need dummy_map for this? please comment
thanks,
jirka
* Re: [PATCH v1 8/9] perf stat: Remove --per-thread pid/tid limitation
2017-11-20 14:43 ` [PATCH v1 8/9] perf stat: Remove --per-thread pid/tid limitation Jin Yao
2017-11-21 15:18 ` Jiri Olsa
2017-11-21 15:18 ` Jiri Olsa
@ 2017-11-21 15:18 ` Jiri Olsa
2017-11-22 5:38 ` Jin, Yao
2 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2017-11-21 15:18 UTC (permalink / raw)
To: Jin Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Mon, Nov 20, 2017 at 10:43:43PM +0800, Jin Yao wrote:
> Currently, if we execute 'perf stat --per-thread' without specifying
> pid/tid, perf will return an error.
>
> root@skl:/tmp# perf stat --per-thread
> The --per-thread option is only available when monitoring via -p -t options.
> -p, --pid <pid> stat events on existing process id
> -t, --tid <tid> stat events on existing thread id
>
> This patch removes this limitation. If no pid/tid is specified, it
> monitors all threads (the thread list is read from /proc).
>
> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> ---
> tools/perf/builtin-stat.c | 23 +++++++++++++++--------
> tools/perf/util/target.h | 7 +++++++
> 2 files changed, 22 insertions(+), 8 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 9eec145..2d718f7 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -277,7 +277,7 @@ static int create_perf_stat_counter(struct perf_evsel *evsel)
> attr->enable_on_exec = 1;
> }
>
> - if (target__has_cpu(&target))
> + if (target__has_cpu(&target) && !target__has_per_thread(&target))
please add comment on why this is needed..
> return perf_evsel__open_per_cpu(evsel, perf_evsel__cpus(evsel));
>
> return perf_evsel__open_per_thread(evsel, evsel_list->threads);
> @@ -340,7 +340,7 @@ static int read_counter(struct perf_evsel *counter)
> int nthreads = thread_map__nr(evsel_list->threads);
> int ncpus, cpu, thread;
>
> - if (target__has_cpu(&target))
> + if (target__has_cpu(&target) && !target__has_per_thread(&target))
same here
thanks,
jirka
* Re: [PATCH v1 5/9] perf util: Remove a set of shadow stats static variables
2017-11-21 15:17 ` Jiri Olsa
@ 2017-11-21 18:03 ` Andi Kleen
2017-11-21 21:19 ` Jiri Olsa
0 siblings, 1 reply; 49+ messages in thread
From: Andi Kleen @ 2017-11-21 18:03 UTC (permalink / raw)
To: Jiri Olsa
Cc: Jin Yao, acme, jolsa, peterz, mingo, alexander.shishkin,
Linux-kernel, kan.liang, yao.jin
> all this is about switching from array to rb_list for the --per-thread case,
> which can be considered as a special use case.. how much do we suffer in
> performance with new code? how about the "perf stat -I 100", would it scale
> ok for extreme cases (many events in -e or -dddd..)
rbtrees scale as O(log N), with N being the number of entries in the tree.
Even in extreme cases, let's say 10000 events and 1000 cpus it would
need only 8 memory accesses and comparisons for each look up.
Even if we assume cache misses for all of the memory lookups,
at ~200ns per cache miss it's still only 1us per event, which
is negligible.
In practice not all memory accesses will be misses because
the upper levels of the tree are almost certainly cached
from earlier accesses.
-Andi
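As a rough illustration of the logarithmic-scaling argument above (the specific counts here are derived from the balanced-tree bound, not figures from the thread):

```c
#include <assert.h>

/* Worst-case node visits to find an entry in a perfectly balanced
 * binary search tree of n entries, i.e. ceil(log2(n + 1)) computed
 * by repeated halving. A red-black tree stays within a factor of
 * two of this bound, so the exact constants differ, but the slow
 * logarithmic growth is the point of the argument. */
static int lookups_per_find(long n)
{
	int depth = 0;

	while (n > 0) {
		n >>= 1;
		depth++;
	}
	return depth;
}
```

For example, growing from a thousand saved values to ten million only roughly doubles the per-lookup comparison count in the balanced case (about 10 versus about 24 visits).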
* Re: [PATCH v1 5/9] perf util: Remove a set of shadow stats static variables
2017-11-21 18:03 ` Andi Kleen
@ 2017-11-21 21:19 ` Jiri Olsa
0 siblings, 0 replies; 49+ messages in thread
From: Jiri Olsa @ 2017-11-21 21:19 UTC (permalink / raw)
To: Andi Kleen
Cc: Jin Yao, acme, jolsa, peterz, mingo, alexander.shishkin,
Linux-kernel, kan.liang, yao.jin
On Tue, Nov 21, 2017 at 10:03:50AM -0800, Andi Kleen wrote:
> > all this is about switching from array to rb_list for the --per-thread case,
> > which can be considered as a special use case.. how much do we suffer in
> > performance with new code? how about the "perf stat -I 100", would it scale
> > ok for extreme cases (many events in -e or -dddd..)
>
> rbtrees scale as O(log N), with N being the number of entries in the tree.
>
> Even in extreme cases, let's say 10000 events and 1000 cpus it would
> need only 8 memory accesses and comparisons for each look up.
> Even if we assume cache misses for all of the memory lookups,
> at ~200ns per cache miss it's still only 1us per event, which
> is negligible.
>
> In practice not all memory accesses will be misses because
> the upper levels of the tree are almost certainly cached
> from earlier accesses.
sounds good, thanks
jirka
* Re: [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats
2017-11-21 15:17 ` Jiri Olsa
@ 2017-11-22 1:29 ` Jin, Yao
2017-11-22 8:30 ` Jiri Olsa
0 siblings, 1 reply; 49+ messages in thread
From: Jin, Yao @ 2017-11-22 1:29 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On 11/21/2017 11:17 PM, Jiri Olsa wrote:
> On Mon, Nov 20, 2017 at 10:43:38PM +0800, Jin Yao wrote:
>
> SNIP
>
>>
>> +static void init_saved_rblist(struct rblist *rblist)
>> +{
>> + rblist__init(rblist);
>> + rblist->node_cmp = saved_value_cmp;
>> + rblist->node_new = saved_value_new;
>> + rblist->node_delete = saved_value_delete;
>> +}
>> +
>> +static void free_saved_rblist(struct rblist *rblist)
>> +{
>> + rblist__reset(rblist);
>> +}
>> +
>> +void perf_stat__init_runtime_stat(struct runtime_stat *stat)
>> +{
>> + memset(stat, 0, sizeof(struct runtime_stat));
>
> what's this memset for? that struct has only rb_list
>
> jirka
>
Currently this struct has only the rblist. I use memset to zero the
whole struct to cover fields that may be added to it in the future.
If that's not good, I will remove the memset.
Thanks
Jin Yao
* Re: [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats
2017-11-21 15:17 ` Jiri Olsa
@ 2017-11-22 1:35 ` Jin, Yao
0 siblings, 0 replies; 49+ messages in thread
From: Jin, Yao @ 2017-11-22 1:35 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On 11/21/2017 11:17 PM, Jiri Olsa wrote:
> On Mon, Nov 20, 2017 at 10:43:38PM +0800, Jin Yao wrote:
>
> SNIP
>
>>
>> +static void init_saved_rblist(struct rblist *rblist)
>> +{
>> + rblist__init(rblist);
>> + rblist->node_cmp = saved_value_cmp;
>> + rblist->node_new = saved_value_new;
>> + rblist->node_delete = saved_value_delete;
>> +}
>> +
>> +static void free_saved_rblist(struct rblist *rblist)
>> +{
>> + rblist__reset(rblist);
>> +}
>
> I don't see a reason for this code to be in separate
> functions.. could go directly in:
> perf_stat__init_runtime_stat
> perf_stat__free_runtime_stat
>
> jirka
>
Agree, that's better.
I will put this code in perf_stat__init_runtime_stat and
perf_stat__free_runtime_stat.
Thanks
Jin Yao
>> +
>> +void perf_stat__init_runtime_stat(struct runtime_stat *stat)
>> +{
>> + memset(stat, 0, sizeof(struct runtime_stat));
>> + init_saved_rblist(&stat->value_list);
>> +}
>> +
>> +void perf_stat__free_runtime_stat(struct runtime_stat *stat)
>> +{
>> + free_saved_rblist(&stat->value_list);
>> +}
>> +
>
> SNIP
>
* Re: [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats
2017-11-21 15:17 ` Jiri Olsa
@ 2017-11-22 1:45 ` Jin, Yao
0 siblings, 0 replies; 49+ messages in thread
From: Jin, Yao @ 2017-11-22 1:45 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On 11/21/2017 11:17 PM, Jiri Olsa wrote:
> On Mon, Nov 20, 2017 at 10:43:38PM +0800, Jin Yao wrote:
>> In the current stat-shadow.c code, rblist deletion is ignored.
>>
>> This patch reconstructs the rblist init/free code and adds an
>> implementation for the rblist's node_delete method.
>>
>> This patch also does the following:
>>
>> 1. Adds ctx/type/stat to the rbtree keys, because this rbtree will
>> be used to maintain the shadow metrics, replacing the original set
>> of static arrays, in order to support per-thread shadow stats.
>>
>> 2. Creates a static runtime_stat variable, 'rt_stat', which logs
>> the shadow metrics by default.
>
> please make that separate patches then,
> one patch - one logical change
>
> thanks,
> jirka
>
OK, I will separate the patches.
Thanks
Jin Yao
* Re: [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats
2017-11-21 15:17 ` Jiri Olsa
@ 2017-11-22 2:11 ` Jin, Yao
0 siblings, 0 replies; 49+ messages in thread
From: Jin, Yao @ 2017-11-22 2:11 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On 11/21/2017 11:17 PM, Jiri Olsa wrote:
> On Mon, Nov 20, 2017 at 10:43:38PM +0800, Jin Yao wrote:
>
> SNIP
>
>> +static void init_saved_rblist(struct rblist *rblist)
>> +{
>> + rblist__init(rblist);
>> + rblist->node_cmp = saved_value_cmp;
>> + rblist->node_new = saved_value_new;
>> + rblist->node_delete = saved_value_delete;
>> +}
>> +
>> +static void free_saved_rblist(struct rblist *rblist)
>> +{
>> + rblist__reset(rblist);
>> +}
>> +
>> +void perf_stat__init_runtime_stat(struct runtime_stat *stat)
>> +{
>> + memset(stat, 0, sizeof(struct runtime_stat));
>> + init_saved_rblist(&stat->value_list);
>> +}
>> +
>> +void perf_stat__free_runtime_stat(struct runtime_stat *stat)
>> +{
>> + free_saved_rblist(&stat->value_list);
>> +}
>> +
>> void perf_stat__init_shadow_stats(void)
>> {
>> have_frontend_stalled = pmu_have_event("cpu", "stalled-cycles-frontend");
>> - rblist__init(&runtime_saved_values);
>> - runtime_saved_values.node_cmp = saved_value_cmp;
>> - runtime_saved_values.node_new = saved_value_new;
>> - /* No delete for now */
>> + memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats));
>
> why do you zero walltime_nsecs_stats in here?
>
> jirka
>
walltime_nsecs_stats is initialized in process_interval().
init_stats(&walltime_nsecs_stats);
Yes, zeroing walltime_nsecs_stats in perf_stat__init_shadow_stats
looks a bit redundant. I will remove it.
Thanks
Jin Yao
* Re: [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats
2017-11-21 15:17 ` Jiri Olsa
@ 2017-11-22 2:19 ` Jin, Yao
0 siblings, 0 replies; 49+ messages in thread
From: Jin, Yao @ 2017-11-22 2:19 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On 11/21/2017 11:17 PM, Jiri Olsa wrote:
> On Mon, Nov 20, 2017 at 10:43:38PM +0800, Jin Yao wrote:
>> In the current stat-shadow.c code, rblist deletion is ignored.
>>
>> This patch reconstructs the rblist init/free code and adds an
>> implementation for the rblist's node_delete method.
>>
>> This patch also does the following:
>>
>> 1. Adds ctx/type/stat to the rbtree keys, because this rbtree will
>> be used to maintain the shadow metrics, replacing the original set
>> of static arrays, in order to support per-thread shadow stats.
>>
>> 2. Creates a static runtime_stat variable, 'rt_stat', which logs
>> the shadow metrics by default.
>>
>> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
>> ---
>> tools/perf/util/stat-shadow.c | 62 ++++++++++++++++++++++++++++++++++++++++---
>> tools/perf/util/stat.h | 2 ++
>> 2 files changed, 60 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
>> index 5853901..045e129 100644
>> --- a/tools/perf/util/stat-shadow.c
>> +++ b/tools/perf/util/stat-shadow.c
>> @@ -40,12 +40,16 @@ static struct stats runtime_aperf_stats[NUM_CTX][MAX_NR_CPUS];
>> static struct rblist runtime_saved_values;
>> static bool have_frontend_stalled;
>>
>> +static struct runtime_stat rt_stat;
>> struct stats walltime_nsecs_stats;
>>
>> struct saved_value {
>> struct rb_node rb_node;
>> struct perf_evsel *evsel;
>> + enum stat_type type;
>> + int ctx;
>> int cpu;
>> + struct runtime_stat *stat;
>> struct stats stats;
>> };
>>
>> @@ -58,6 +62,23 @@ static int saved_value_cmp(struct rb_node *rb_node, const void *entry)
>>
>> if (a->cpu != b->cpu)
>> return a->cpu - b->cpu;
>> +
>> + if (a->type != b->type)
>> + return a->type - b->type;
>> +
>> + if (a->ctx != b->ctx)
>> + return a->ctx - b->ctx;
>> +
>
> could you please comment in here on the cases where
> evsel is defined and when not
>
> jirka
>
Thanks for the reminder.
I will add comments describing the cases in which evsel is set and when
it is not.
Thanks
Jin Yao
>> + if (a->evsel == NULL && b->evsel == NULL) {
>> + if (a->stat == b->stat)
>> + return 0;
>> +
>> + if ((char *)a->stat < (char *)b->stat)
>> + return -1;
>> +
>> + return 1;
>> + }
>> +
>
> SNIP
>
* Re: [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats
2017-11-21 15:17 ` Jiri Olsa
@ 2017-11-22 2:20 ` Jin, Yao
0 siblings, 0 replies; 49+ messages in thread
From: Jin, Yao @ 2017-11-22 2:20 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On 11/21/2017 11:17 PM, Jiri Olsa wrote:
> On Mon, Nov 20, 2017 at 10:43:38PM +0800, Jin Yao wrote:
>
> SNIP
>
>> static int evsel_context(struct perf_evsel *evsel)
>> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
>> index 61fd2e0..4eb081d 100644
>> --- a/tools/perf/util/stat.h
>> +++ b/tools/perf/util/stat.h
>> @@ -126,6 +126,8 @@ typedef void (*print_metric_t)(void *ctx, const char *color, const char *unit,
>> const char *fmt, double val);
>> typedef void (*new_line_t )(void *ctx);
>>
>> +void perf_stat__init_runtime_stat(struct runtime_stat *stat);
>> +void perf_stat__free_runtime_stat(struct runtime_stat *stat);
>
> maybe that could be:
> runtime_stat__init(struct runtime_stat *stat);
> runtime_stat__free(struct runtime_stat *stat);
>
> jirka
>
Sure, I will update according to your suggestion.
Thanks
Jin Yao
* Re: [PATCH v1 4/9] perf util: Update and print per-thread shadow stats
2017-11-21 15:17 ` Jiri Olsa
@ 2017-11-22 2:42 ` Jin, Yao
0 siblings, 0 replies; 49+ messages in thread
From: Jin, Yao @ 2017-11-22 2:42 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On 11/21/2017 11:17 PM, Jiri Olsa wrote:
> On Mon, Nov 20, 2017 at 10:43:39PM +0800, Jin Yao wrote:
>
> SNIP
>
>> diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
>> index 045e129..6f28782 100644
>> --- a/tools/perf/util/stat-shadow.c
>> +++ b/tools/perf/util/stat-shadow.c
>> @@ -110,19 +110,32 @@ static void saved_value_delete(struct rblist *rblist __maybe_unused,
>>
>> static struct saved_value *saved_value_lookup(struct perf_evsel *evsel,
>> int cpu,
>> - bool create)
>> + bool create,
>> + enum stat_type type,
>> + int ctx,
>> + struct runtime_stat *stat)
>> {
>> + struct rblist *rblist;
>> struct rb_node *nd;
>> struct saved_value dm = {
>> .cpu = cpu,
>> .evsel = evsel,
>> + .type = type,
>> + .ctx = ctx,
>> + .stat = stat,
>> };
>> - nd = rblist__find(&runtime_saved_values, &dm);
>> +
>> + if (stat)
>> + rblist = &stat->value_list;
>> + else
>> + rblist = &rt_stat.value_list;
>
> please pass the correct 'struct runtime_stat *stat',
>
> I don't see a reason noot to pass &rt_stat directly below:
>
>
>> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
>> index 151e9ef..50bb16d 100644
>> --- a/tools/perf/util/stat.c
>> +++ b/tools/perf/util/stat.c
>> @@ -278,9 +278,16 @@ process_counter_values(struct perf_stat_config *config, struct perf_evsel *evsel
>> perf_evsel__compute_deltas(evsel, cpu, thread, count);
>> perf_counts_values__scale(count, config->scale, NULL);
>> if (config->aggr_mode == AGGR_NONE)
>> - perf_stat__update_shadow_stats(evsel, count->val, cpu);
>> - if (config->aggr_mode == AGGR_THREAD)
>> - perf_stat__update_shadow_stats(evsel, count->val, 0);
>> + perf_stat__update_shadow_stats(evsel, count->val, cpu,
>> + NULL);
>> + if (config->aggr_mode == AGGR_THREAD) {
>> + if (config->stats)
>> + perf_stat__update_shadow_stats(evsel,
>> + count->val, 0, &config->stats[thread]);
>> + else
>> + perf_stat__update_shadow_stats(evsel,
>> + count->val, 0, NULL);
>
> here
>
>> + }
>> break;
>> case AGGR_GLOBAL:
>> aggr->val += count->val;
>> @@ -362,7 +369,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
>> /*
>> * Save the full runtime - to allow normalization during printout:
>> */
>> - perf_stat__update_shadow_stats(counter, *count, 0);
>> + perf_stat__update_shadow_stats(counter, *count, 0, NULL);
>
> and here
>
> thanks,
> jirka
>
Fine, I will pass &rt_stat to perf_stat__update_shadow_stats instead of
the NULL.
That change may require making rt_stat a global variable.
Thanks
Jin Yao
* Re: [PATCH v1 4/9] perf util: Update and print per-thread shadow stats
2017-11-21 15:18 ` Jiri Olsa
@ 2017-11-22 3:10 ` Jin, Yao
2017-11-22 8:37 ` Jiri Olsa
0 siblings, 1 reply; 49+ messages in thread
From: Jin, Yao @ 2017-11-22 3:10 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On 11/21/2017 11:18 PM, Jiri Olsa wrote:
> On Mon, Nov 20, 2017 at 10:43:39PM +0800, Jin Yao wrote:
>
> SNIP
>
>> if (num == 0)
>> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
>> index 151e9ef..50bb16d 100644
>> --- a/tools/perf/util/stat.c
>> +++ b/tools/perf/util/stat.c
>> @@ -278,9 +278,16 @@ process_counter_values(struct perf_stat_config *config, struct perf_evsel *evsel
>> perf_evsel__compute_deltas(evsel, cpu, thread, count);
>> perf_counts_values__scale(count, config->scale, NULL);
>> if (config->aggr_mode == AGGR_NONE)
>> - perf_stat__update_shadow_stats(evsel, count->val, cpu);
>> - if (config->aggr_mode == AGGR_THREAD)
>> - perf_stat__update_shadow_stats(evsel, count->val, 0);
>> + perf_stat__update_shadow_stats(evsel, count->val, cpu,
>> + NULL);
>> + if (config->aggr_mode == AGGR_THREAD) {
>> + if (config->stats)
>> + perf_stat__update_shadow_stats(evsel,
>> + count->val, 0, &config->stats[thread]);
>
> please add this part together with config->stats allocation/usage
> and keep this change only about adding the struct runtime_stat *stat
> into the code path with no functional change
>
> thanks,
> jirka
>
The interface of perf_stat__update_shadow_stats() changes (it gains a
new parameter, 'struct runtime_stat *stat').
If I move the part above to another patch (e.g. to
v1-0006-perf-stat-Allocate-shadow-stats-buffer-for-thread.patch), the
build would fail with only this patch applied. That's not good for git
bisect.
Thanks
Jin Yao
* Re: [PATCH v1 2/9] perf util: Define a structure for runtime shadow metrics stats
2017-11-21 15:18 ` Jiri Olsa
@ 2017-11-22 3:11 ` Jin, Yao
0 siblings, 0 replies; 49+ messages in thread
From: Jin, Yao @ 2017-11-22 3:11 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On 11/21/2017 11:18 PM, Jiri Olsa wrote:
> On Mon, Nov 20, 2017 at 10:43:37PM +0800, Jin Yao wrote:
>
> SNIP
>
>> + STAT_SMI_NUM,
>> + STAT_APERF,
>> + STAT_MAX
>> +};
>> +
>> +struct runtime_stat {
>> + struct rblist value_list;
>> +};
>> +
>> struct perf_stat_config {
>> enum aggr_mode aggr_mode;
>> bool scale;
>> FILE *output;
>> unsigned int interval;
>> + struct runtime_stat *stats;
>> + int stat_num;
>
> s/stat_num/stats_num/ or s/stats/stat/
>
> thanks,
> jirka
>
Fine, I will update according to your suggestion.
Thanks
Jin Yao
* Re: [PATCH v1 8/9] perf stat: Remove --per-thread pid/tid limitation
2017-11-21 15:18 ` Jiri Olsa
@ 2017-11-22 3:42 ` Jin, Yao
2017-11-22 8:35 ` Jiri Olsa
0 siblings, 1 reply; 49+ messages in thread
From: Jin, Yao @ 2017-11-22 3:42 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On 11/21/2017 11:18 PM, Jiri Olsa wrote:
> On Mon, Nov 20, 2017 at 10:43:43PM +0800, Jin Yao wrote:
>
> SNIP
>
>> - if ((stat_config.aggr_mode == AGGR_THREAD) && !target__has_task(&target)) {
>> - fprintf(stderr, "The --per-thread option is only available "
>> - "when monitoring via -p -t options.\n");
>> - parse_options_usage(NULL, stat_options, "p", 1);
>> - parse_options_usage(NULL, stat_options, "t", 1);
>> - goto out;
>> + if ((stat_config.aggr_mode == AGGR_THREAD) &&
>> + !target__has_task(&target)) {
>> + if (!target.system_wide || target.cpu_list) {
>> + fprintf(stderr, "The --per-thread option is only "
>> + "available when monitoring via -p -t "
>> + "options.\n");
>
> the message should be updated with '-a' option, that you just added,
OK. Could I update the message like this?
"The --per-thread option is only "
"available when monitoring via -p -t -a"
"options or only --per-thread without any other option"
> also why dont we support target.cpu_list, it should work no?
>
Currently it doesn't support cpu_list.
I just think this patch series is too big, and I'd like to add support
for cpu_list, cgroups and others in follow-up patches.
Is that OK?
Thanks
Jin Yao
> jirka
>
* Re: [PATCH v1 8/9] perf stat: Remove --per-thread pid/tid limitation
2017-11-21 15:18 ` Jiri Olsa
@ 2017-11-22 5:34 ` Jin, Yao
0 siblings, 0 replies; 49+ messages in thread
From: Jin, Yao @ 2017-11-22 5:34 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On 11/21/2017 11:18 PM, Jiri Olsa wrote:
> On Mon, Nov 20, 2017 at 10:43:43PM +0800, Jin Yao wrote:
>
> SNIP
>
>> diff --git a/tools/perf/util/target.h b/tools/perf/util/target.h
>> index 446aa7a..6ef01a8 100644
>> --- a/tools/perf/util/target.h
>> +++ b/tools/perf/util/target.h
>> @@ -64,6 +64,11 @@ static inline bool target__none(struct target *target)
>> return !target__has_task(target) && !target__has_cpu(target);
>> }
>>
>> +static inline bool target__has_per_thread(struct target *target)
>> +{
>> + return target->system_wide && target->per_thread;
>> +}
>
> this is confusing.. has_per_thread depends on system_wide?
>
This patch series only supports collecting per-thread data for the
whole system, so I check system_wide here.
>> +
>> static inline bool target__uses_dummy_map(struct target *target)
>> {
>> bool use_dummy = false;
>> @@ -73,6 +78,8 @@ static inline bool target__uses_dummy_map(struct target *target)
>> else if (target__has_task(target) ||
>> (!target__has_cpu(target) && !target->uses_mmap))
>> use_dummy = true;
>> + else if (target__has_per_thread(target))
>> + use_dummy = true;
>
> why do we need dummy_map for this? please comment
>
We need a dummy map here. That's similar to the handling of
'--per-thread -p -t'.
The dummy map helps us aggregate counts per thread.
Thanks
Jin Yao
> thanks,
> jirka
>
* Re: [PATCH v1 8/9] perf stat: Remove --per-thread pid/tid limitation
2017-11-21 15:18 ` Jiri Olsa
@ 2017-11-22 5:38 ` Jin, Yao
0 siblings, 0 replies; 49+ messages in thread
From: Jin, Yao @ 2017-11-22 5:38 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On 11/21/2017 11:18 PM, Jiri Olsa wrote:
> On Mon, Nov 20, 2017 at 10:43:43PM +0800, Jin Yao wrote:
>> Currently, if we execute 'perf stat --per-thread' without specifying
>> pid/tid, perf will return error.
>>
>> root@skl:/tmp# perf stat --per-thread
>> The --per-thread option is only available when monitoring via -p -t options.
>> -p, --pid <pid> stat events on existing process id
>> -t, --tid <tid> stat events on existing thread id
>>
>> This patch removes this limitation. If no pid/tid specified, it returns
>> all threads (get threads from /proc).
>>
>> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
>> ---
>> tools/perf/builtin-stat.c | 23 +++++++++++++++--------
>> tools/perf/util/target.h | 7 +++++++
>> 2 files changed, 22 insertions(+), 8 deletions(-)
>>
>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>> index 9eec145..2d718f7 100644
>> --- a/tools/perf/builtin-stat.c
>> +++ b/tools/perf/builtin-stat.c
>> @@ -277,7 +277,7 @@ static int create_perf_stat_counter(struct perf_evsel *evsel)
>> attr->enable_on_exec = 1;
>> }
>>
>> - if (target__has_cpu(&target))
>> + if (target__has_cpu(&target) && !target__has_per_thread(&target))
>
> please add comment on why this is needed..
>
>> return perf_evsel__open_per_cpu(evsel, perf_evsel__cpus(evsel));
>>
>> return perf_evsel__open_per_thread(evsel, evsel_list->threads);
>> @@ -340,7 +340,7 @@ static int read_counter(struct perf_evsel *counter)
>> int nthreads = thread_map__nr(evsel_list->threads);
>> int ncpus, cpu, thread;
>>
>> - if (target__has_cpu(&target))
>> + if (target__has_cpu(&target) && !target__has_per_thread(&target))
>
> same here
>
That's because this patch series doesn't support cpu_list yet, so the
cpu_list case is skipped.
I plan to add cpu_list support in a follow-up patch to avoid adding too
much to this series.
Thanks
Jin Yao
> thanks,
> jirka
>
* Re: [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats
2017-11-20 14:43 ` [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats Jin Yao
` (5 preceding siblings ...)
2017-11-21 15:17 ` Jiri Olsa
@ 2017-11-22 6:31 ` Ravi Bangoria
2017-11-22 6:57 ` Jin, Yao
6 siblings, 1 reply; 49+ messages in thread
From: Ravi Bangoria @ 2017-11-22 6:31 UTC (permalink / raw)
To: Jin Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin, Ravi Bangoria
On 11/20/2017 08:13 PM, Jin Yao wrote:
> @@ -76,6 +97,17 @@ static struct rb_node *saved_value_new(struct rblist *rblist __maybe_unused,
> return &nd->rb_node;
> }
>
> +static void saved_value_delete(struct rblist *rblist __maybe_unused,
> + struct rb_node *rb_node)
> +{
> + struct saved_value *v = container_of(rb_node,
> + struct saved_value,
> + rb_node);
> +
> + if (v)
> + free(v);
> +}
Do we really need if(v) ?
Thanks,
Ravi
* Re: [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats
2017-11-22 6:31 ` Ravi Bangoria
@ 2017-11-22 6:57 ` Jin, Yao
2017-11-22 8:32 ` Jiri Olsa
0 siblings, 1 reply; 49+ messages in thread
From: Jin, Yao @ 2017-11-22 6:57 UTC (permalink / raw)
To: Ravi Bangoria
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On 11/22/2017 2:31 PM, Ravi Bangoria wrote:
>
> On 11/20/2017 08:13 PM, Jin Yao wrote:
>> @@ -76,6 +97,17 @@ static struct rb_node *saved_value_new(struct
>> rblist *rblist __maybe_unused,
>> return &nd->rb_node;
>> }
>>
>> +static void saved_value_delete(struct rblist *rblist __maybe_unused,
>> + struct rb_node *rb_node)
>> +{
>> + struct saved_value *v = container_of(rb_node,
>> + struct saved_value,
>> + rb_node);
>> +
>> + if (v)
>> + free(v);
>> +}
>
> Do we really need if(v) ?
>
> Thanks,
> Ravi
>
Hi Ravi,
It looks like the if (v) isn't needed.
I put the if (v) there out of coding habit (checking a pointer before
freeing it).
I'm fine with removing it if you think the check should go.
Thanks
Jin Yao
* Re: [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats
2017-11-22 1:29 ` Jin, Yao
@ 2017-11-22 8:30 ` Jiri Olsa
0 siblings, 0 replies; 49+ messages in thread
From: Jiri Olsa @ 2017-11-22 8:30 UTC (permalink / raw)
To: Jin, Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Wed, Nov 22, 2017 at 09:29:26AM +0800, Jin, Yao wrote:
>
>
> On 11/21/2017 11:17 PM, Jiri Olsa wrote:
> > On Mon, Nov 20, 2017 at 10:43:38PM +0800, Jin Yao wrote:
> >
> > SNIP
> >
> > > +static void init_saved_rblist(struct rblist *rblist)
> > > +{
> > > + rblist__init(rblist);
> > > + rblist->node_cmp = saved_value_cmp;
> > > + rblist->node_new = saved_value_new;
> > > + rblist->node_delete = saved_value_delete;
> > > +}
> > > +
> > > +static void free_saved_rblist(struct rblist *rblist)
> > > +{
> > > + rblist__reset(rblist);
> > > +}
> > > +
> > > +void perf_stat__init_runtime_stat(struct runtime_stat *stat)
> > > +{
> > > + memset(stat, 0, sizeof(struct runtime_stat));
> >
> > what's this memset for? that struct has only rb_list
> >
> > jirka
> >
>
> Currently this struct has only rblist. I use memset to zero this struct is
> for future potential new added fields to this struct.
>
> If it's not good, I will remove this memset.
ok, np
jirka
* Re: [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats
2017-11-22 6:57 ` Jin, Yao
@ 2017-11-22 8:32 ` Jiri Olsa
2017-11-22 12:03 ` Jin, Yao
0 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2017-11-22 8:32 UTC (permalink / raw)
To: Jin, Yao
Cc: Ravi Bangoria, acme, jolsa, peterz, mingo, alexander.shishkin,
Linux-kernel, ak, kan.liang, yao.jin
On Wed, Nov 22, 2017 at 02:57:12PM +0800, Jin, Yao wrote:
>
>
> On 11/22/2017 2:31 PM, Ravi Bangoria wrote:
> >
> > On 11/20/2017 08:13 PM, Jin Yao wrote:
> > > @@ -76,6 +97,17 @@ static struct rb_node *saved_value_new(struct
> > > rblist *rblist __maybe_unused,
> > > return &nd->rb_node;
> > > }
> > >
> > > +static void saved_value_delete(struct rblist *rblist __maybe_unused,
> > > + struct rb_node *rb_node)
> > > +{
> > > + struct saved_value *v = container_of(rb_node,
> > > + struct saved_value,
> > > + rb_node);
> > > +
> > > + if (v)
> > > + free(v);
> > > +}
> >
> > Do we really need if(v) ?
> >
> > Thanks,
> > Ravi
> >
>
> Hi Ravi,
>
> Looks it doesn't need if(v).
>
> I put if(v) here is from my coding habits (checking pointer before free).
>
> It's OK for me if you think the code should be removed.
you could add BUG_ON(!rb_node);
jirka
* Re: [PATCH v1 8/9] perf stat: Remove --per-thread pid/tid limitation
2017-11-22 3:42 ` Jin, Yao
@ 2017-11-22 8:35 ` Jiri Olsa
0 siblings, 0 replies; 49+ messages in thread
From: Jiri Olsa @ 2017-11-22 8:35 UTC (permalink / raw)
To: Jin, Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Wed, Nov 22, 2017 at 11:42:05AM +0800, Jin, Yao wrote:
>
>
> On 11/21/2017 11:18 PM, Jiri Olsa wrote:
> > On Mon, Nov 20, 2017 at 10:43:43PM +0800, Jin Yao wrote:
> >
> > SNIP
> >
> > > - if ((stat_config.aggr_mode == AGGR_THREAD) && !target__has_task(&target)) {
> > > - fprintf(stderr, "The --per-thread option is only available "
> > > - "when monitoring via -p -t options.\n");
> > > - parse_options_usage(NULL, stat_options, "p", 1);
> > > - parse_options_usage(NULL, stat_options, "t", 1);
> > > - goto out;
> > > + if ((stat_config.aggr_mode == AGGR_THREAD) &&
> > > + !target__has_task(&target)) {
> > > + if (!target.system_wide || target.cpu_list) {
> > > + fprintf(stderr, "The --per-thread option is only "
> > > + "available when monitoring via -p -t "
> > > + "options.\n");
> >
> > the message should be updated with '-a' option, that you just added,
>
> OK. Could I update the message like this?
>
> "The --per-thread option is only "
> "available when monitoring via -p -t -a"
> "options or only --per-thread without any other option"
>
> > also why dont we support target.cpu_list, it should work no?
> >
>
> Currently it doesn't support cpu_list.
>
> I just think this patch series is too big and I wish to add supporting for
> cpu_list, cgroup or others in follow up patches.
>
> Is that OK?
ok, thanks
jirka
* Re: [PATCH v1 4/9] perf util: Update and print per-thread shadow stats
2017-11-22 3:10 ` Jin, Yao
@ 2017-11-22 8:37 ` Jiri Olsa
2017-11-22 12:06 ` Jin, Yao
0 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2017-11-22 8:37 UTC (permalink / raw)
To: Jin, Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Wed, Nov 22, 2017 at 11:10:37AM +0800, Jin, Yao wrote:
>
>
> On 11/21/2017 11:18 PM, Jiri Olsa wrote:
> > On Mon, Nov 20, 2017 at 10:43:39PM +0800, Jin Yao wrote:
> >
> > SNIP
> >
> > > if (num == 0)
> > > diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> > > index 151e9ef..50bb16d 100644
> > > --- a/tools/perf/util/stat.c
> > > +++ b/tools/perf/util/stat.c
> > > @@ -278,9 +278,16 @@ process_counter_values(struct perf_stat_config *config, struct perf_evsel *evsel
> > > perf_evsel__compute_deltas(evsel, cpu, thread, count);
> > > perf_counts_values__scale(count, config->scale, NULL);
> > > if (config->aggr_mode == AGGR_NONE)
> > > - perf_stat__update_shadow_stats(evsel, count->val, cpu);
> > > - if (config->aggr_mode == AGGR_THREAD)
> > > - perf_stat__update_shadow_stats(evsel, count->val, 0);
> > > + perf_stat__update_shadow_stats(evsel, count->val, cpu,
> > > + NULL);
> > > + if (config->aggr_mode == AGGR_THREAD) {
> > > + if (config->stats)
> > > + perf_stat__update_shadow_stats(evsel,
> > > + count->val, 0, &config->stats[thread]);
> >
> > please add this part together with config->stats allocation/usage
> > and keep this change only about adding the struct runtime_stat *stat
> > into the code path with no functional change
> >
> > thanks,
> > jirka
> >
>
> The interface of perf_stat__update_shadow_stats() is changed (added with a
> new parameter 'struct runtime_stat *stat').
>
> If I move above part to another patch (e.g. move to
> v1-0006-perf-stat-Allocate-shadow-stats-buffer-for-thread.patch), the
> compilation would be failed if only applying with this patch. It's not good
> for git bisect.
but you could make the patch just to add the 'struct runtime_stat *stat'
as an argument through the code path.. as a base for the stats, no?
jirka
* Re: [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats
2017-11-22 8:32 ` Jiri Olsa
@ 2017-11-22 12:03 ` Jin, Yao
0 siblings, 0 replies; 49+ messages in thread
From: Jin, Yao @ 2017-11-22 12:03 UTC (permalink / raw)
To: Jiri Olsa
Cc: Ravi Bangoria, acme, jolsa, peterz, mingo, alexander.shishkin,
Linux-kernel, ak, kan.liang, yao.jin
On 11/22/2017 4:32 PM, Jiri Olsa wrote:
> On Wed, Nov 22, 2017 at 02:57:12PM +0800, Jin, Yao wrote:
>>
>>
>> On 11/22/2017 2:31 PM, Ravi Bangoria wrote:
>>>
>>> On 11/20/2017 08:13 PM, Jin Yao wrote:
>>>> @@ -76,6 +97,17 @@ static struct rb_node *saved_value_new(struct
>>>> rblist *rblist __maybe_unused,
>>>> return &nd->rb_node;
>>>> }
>>>>
>>>> +static void saved_value_delete(struct rblist *rblist __maybe_unused,
>>>> + struct rb_node *rb_node)
>>>> +{
>>>> + struct saved_value *v = container_of(rb_node,
>>>> + struct saved_value,
>>>> + rb_node);
>>>> +
>>>> + if (v)
>>>> + free(v);
>>>> +}
>>>
>>> Do we really need if(v) ?
>>>
>>> Thanks,
>>> Ravi
>>>
>>
>> Hi Ravi,
>>
>> Looks it doesn't need if(v).
>>
>> I put if(v) here is from my coding habits (checking pointer before free).
>>
>> It's OK for me if you think the code should be removed.
>
> you could add BUG_ON(!rb_node);
>
> jirka
>
Good idea! I will add a BUG_ON check there.
Thanks
Jin Yao
* Re: [PATCH v1 4/9] perf util: Update and print per-thread shadow stats
2017-11-22 8:37 ` Jiri Olsa
@ 2017-11-22 12:06 ` Jin, Yao
0 siblings, 0 replies; 49+ messages in thread
From: Jin, Yao @ 2017-11-22 12:06 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On 11/22/2017 4:37 PM, Jiri Olsa wrote:
> On Wed, Nov 22, 2017 at 11:10:37AM +0800, Jin, Yao wrote:
>>
>>
>> On 11/21/2017 11:18 PM, Jiri Olsa wrote:
>>> On Mon, Nov 20, 2017 at 10:43:39PM +0800, Jin Yao wrote:
>>>
>>> SNIP
>>>
>>>> if (num == 0)
>>>> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
>>>> index 151e9ef..50bb16d 100644
>>>> --- a/tools/perf/util/stat.c
>>>> +++ b/tools/perf/util/stat.c
>>>> @@ -278,9 +278,16 @@ process_counter_values(struct perf_stat_config *config, struct perf_evsel *evsel
>>>> perf_evsel__compute_deltas(evsel, cpu, thread, count);
>>>> perf_counts_values__scale(count, config->scale, NULL);
>>>> if (config->aggr_mode == AGGR_NONE)
>>>> - perf_stat__update_shadow_stats(evsel, count->val, cpu);
>>>> - if (config->aggr_mode == AGGR_THREAD)
>>>> - perf_stat__update_shadow_stats(evsel, count->val, 0);
>>>> + perf_stat__update_shadow_stats(evsel, count->val, cpu,
>>>> + NULL);
>>>> + if (config->aggr_mode == AGGR_THREAD) {
>>>> + if (config->stats)
>>>> + perf_stat__update_shadow_stats(evsel,
>>>> + count->val, 0, &config->stats[thread]);
>>>
>>> please add this part together with config->stats allocation/usage
>>> and keep this change only about adding the struct runtime_stat *stat
>>> into the code path with no functional change
>>>
>>> thanks,
>>> jirka
>>>
>>
>> The interface of perf_stat__update_shadow_stats() is changed (added with a
>> new parameter 'struct runtime_stat *stat').
>>
>> If I move above part to another patch (e.g. move to
>> v1-0006-perf-stat-Allocate-shadow-stats-buffer-for-thread.patch), the
>> compilation would be failed if only applying with this patch. It's not good
>> for git bisect.
>
> but you could make the patch just to add the 'struct runtime_stat *stat'
> as an argument through the code path.. as a base for the stats, no?
>
> jirka
>
Yeah, that might be OK. Let me have a try.
Thanks
Jin Yao
Thread overview: 49+ messages
2017-11-20 14:43 [PATCH v1 0/9] perf stat: Enable '--per-thread' on all threads Jin Yao
2017-11-20 9:26 ` Jiri Olsa
2017-11-20 12:15 ` Jin, Yao
2017-11-20 12:29 ` Jiri Olsa
2017-11-20 15:50 ` Andi Kleen
2017-11-20 14:43 ` [PATCH v1 1/9] perf util: Create rblist__reset() function Jin Yao
2017-11-20 14:43 ` [PATCH v1 2/9] perf util: Define a structure for runtime shadow metrics stats Jin Yao
2017-11-21 15:18 ` Jiri Olsa
2017-11-22 3:11 ` Jin, Yao
2017-11-20 14:43 ` [PATCH v1 3/9] perf util: Reconstruct rblist for supporting per-thread shadow stats Jin Yao
2017-11-21 15:17 ` Jiri Olsa
2017-11-22 1:29 ` Jin, Yao
2017-11-22 8:30 ` Jiri Olsa
2017-11-21 15:17 ` Jiri Olsa
2017-11-22 1:35 ` Jin, Yao
2017-11-21 15:17 ` Jiri Olsa
2017-11-22 1:45 ` Jin, Yao
2017-11-21 15:17 ` Jiri Olsa
2017-11-22 2:11 ` Jin, Yao
2017-11-21 15:17 ` Jiri Olsa
2017-11-22 2:19 ` Jin, Yao
2017-11-21 15:17 ` Jiri Olsa
2017-11-22 2:20 ` Jin, Yao
2017-11-22 6:31 ` Ravi Bangoria
2017-11-22 6:57 ` Jin, Yao
2017-11-22 8:32 ` Jiri Olsa
2017-11-22 12:03 ` Jin, Yao
2017-11-20 14:43 ` [PATCH v1 4/9] perf util: Update and print " Jin Yao
2017-11-21 15:17 ` Jiri Olsa
2017-11-22 2:42 ` Jin, Yao
2017-11-21 15:18 ` Jiri Olsa
2017-11-22 3:10 ` Jin, Yao
2017-11-22 8:37 ` Jiri Olsa
2017-11-22 12:06 ` Jin, Yao
2017-11-20 14:43 ` [PATCH v1 5/9] perf util: Remove a set of shadow stats static variables Jin Yao
2017-11-21 15:17 ` Jiri Olsa
2017-11-21 18:03 ` Andi Kleen
2017-11-21 21:19 ` Jiri Olsa
2017-11-20 14:43 ` [PATCH v1 6/9] perf stat: Allocate shadow stats buffer for threads Jin Yao
2017-11-20 14:43 ` [PATCH v1 7/9] perf util: Reuse thread_map__new_by_uid to enumerate threads from /proc Jin Yao
2017-11-20 14:43 ` [PATCH v1 8/9] perf stat: Remove --per-thread pid/tid limitation Jin Yao
2017-11-21 15:18 ` Jiri Olsa
2017-11-22 3:42 ` Jin, Yao
2017-11-22 8:35 ` Jiri Olsa
2017-11-21 15:18 ` Jiri Olsa
2017-11-22 5:34 ` Jin, Yao
2017-11-21 15:18 ` Jiri Olsa
2017-11-22 5:38 ` Jin, Yao
2017-11-20 14:43 ` [PATCH v1 9/9] perf stat: Resort '--per-thread' result Jin Yao