linux-kernel.vger.kernel.org archive mirror
* [PATCH v5 00/12] perf stat: Enable '--per-thread' on all thread
@ 2017-12-01 10:57 Jin Yao
  2017-12-01 10:57 ` [PATCH v5 01/12] perf util: Create rblist__exit() function Jin Yao
                   ` (11 more replies)
  0 siblings, 12 replies; 27+ messages in thread
From: Jin Yao @ 2017-12-01 10:57 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

v5:
---
Update according to Arnaldo's comments.

Rename '_reset' to '_exit' and '_free' to '_delete':

1. In 'perf util: Create rblist__exit() function', 
rblist__reset() -> rblist__exit()

2. In 'perf util: Create the runtime_stat init/exit function'
runtime_stat__free() -> runtime_stat__exit()

I didn't rename it to runtime_stat__delete() because this function
doesn't free the object itself, so '__exit' is the better fit.

3. In 'perf stat: Allocate shadow stats buffer for threads'
runtime_stat_alloc() -> runtime_stat_new()
runtime_stat_free() -> runtime_stat_delete()
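The naming convention behind these renames can be sketched as follows. This is a minimal illustration, not the actual perf code: '__init'/'__exit' manage only the contents of a caller-owned object, while the new/delete pair additionally owns the object's allocation.

```c
#include <stdlib.h>

struct runtime_stat {
	int *buf;	/* stand-in for the real shadow-stat storage */
};

/* __init/__exit: the caller owns 'st' (it may live on the stack or be
 * embedded in another structure); only the contents are managed. */
static int runtime_stat__init(struct runtime_stat *st)
{
	st->buf = calloc(16, sizeof(int));
	return st->buf ? 0 : -1;
}

static void runtime_stat__exit(struct runtime_stat *st)
{
	free(st->buf);
	st->buf = NULL;
}

/* new/delete: also allocate and free the object itself. */
static struct runtime_stat *runtime_stat_new(void)
{
	struct runtime_stat *st = malloc(sizeof(*st));

	if (st && runtime_stat__init(st)) {
		free(st);
		st = NULL;
	}
	return st;
}

static void runtime_stat_delete(struct runtime_stat *st)
{
	if (st) {
		runtime_stat__exit(st);
		free(st);
	}
}
```

Under this split, an '__exit' suits a runtime_stat embedded in a larger structure, while the new/delete pair suits the heap-allocated per-thread buffers.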

v4:
---
Update according to Jiri's comments. The major modification is:

Move struct perf_stat_config::*stats|stats_num to
'perf stat: Allocate shadow stats buffer for threads'.

I also move the code of updating/printing per-thread stats
from 'perf util: Update and print per-thread shadow stats'
and 'perf stat: Allocate shadow stats buffer for threads'
to a new patch 'perf stat: Update or print per-thread stats'.
That should make the patches easier to review.

Impacted patch:
---------------
  perf util: Define a structure for runtime shadow stats
  perf util: Update and print per-thread shadow stats
  perf stat: Allocate shadow stats buffer for threads
  perf stat: Update or print per-thread stats
  perf stat: Resort '--per-thread' result

v3:
---
Update according to Jiri's comments. The major modifications are:

1. Fix the crash when performing git bisect.
   Move the removal of runtime_saved_values to the switching point
   (perf util: Remove a set of shadow stats static variables).

2. Still add struct perf_stat_config::*stats|stats_num in an earlier
   patch because 'stats' will be used in
   'perf util: Update and print per-thread shadow stats'.
   If it were moved to 'perf stat: Allocate shadow stats buffer for threads',
   the compilation would fail.

3. Collapse the multi-line call

   v = container_of(rb_node,
		    struct saved_value,
		    rb_node);

   onto a single line.
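For reference, container_of() recovers a pointer to the enclosing structure from a pointer to one of its members. A minimal stand-alone version behaves like this (the kernel's real macro adds type checking, and the structures here are simplified stand-ins for perf's rb_node/saved_value):

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct rb_node {
	struct rb_node *left, *right;	/* simplified rbtree linkage */
};

/* saved_value embeds its rbtree node rather than being pointed to by it */
struct saved_value {
	double val;
	struct rb_node rb_node;
};
```

Given a struct rb_node *node obtained while walking the tree, container_of(node, struct saved_value, rb_node) yields the saved_value containing that node.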

Impacted patch:
---------------
  perf util: Add rbtree node_delete ops
  perf util: Create the runtime_stat init/free function
  perf util: Update and print per-thread shadow stats
  perf util: Remove a set of shadow stats static variables

v2:
---
Update according to Jiri's comments. The major modifications are:

1. Remove unnecessary memset for runtime_stat and
   walltime_nsecs_stats.

2. Remove init_saved_rblist/free_saved_rblist and move the code
   to runtime_stat__init/runtime_stat__free.

3. Rename perf_stat__init_runtime_stat/perf_stat__free_runtime_stat
   to runtime_stat__init/runtime_stat__free.

4. Create a new patch 'perf util: Extend rbtree to support shadow stats'
   to add the new ctx/type/stat rbtree keys that support per-thread
   shadow stats. It also comments on the cases where evsel is
   defined and where it is not.

5. Pass &rt_stat directly to perf_stat__update_shadow_stats,
   replacing the original NULL parameter.

6. Move the per-thread shadow stat updating code to the config->stats
   allocation/usage patch.

7. Add BUG_ON(!rb_node) in saved_value_delete.

8. s/stat_num/stats_num/ or s/stats/stat/ in struct perf_stat_config.

9. Rebase to latest perf/core branch.

Impacted patch:
---------------
  perf util: Define a structure for runtime shadow metrics stats
  perf util: Extend rbtree to support shadow stats
  perf util: Add rbtree node_delete ops
  perf util: Update and print per-thread shadow stats
  perf stat: Allocate shadow stats buffer for threads
  perf stat: Remove --per-thread pid/tid limitation
  perf stat: Resort '--per-thread' result

Initial post:
-------------

perf stat --per-thread is useful to break down data per thread.
But it currently requires specifying --pid/--tid to limit it to a process.

For analysis it would be useful to do it globally for the whole system.

1. Currently, running 'perf stat --per-thread' without pid/tid makes
perf return an error:

root@skl:/tmp# perf stat --per-thread
The --per-thread option is only available when monitoring via -p -t options.
    -p, --pid <pid>       stat events on existing process id
    -t, --tid <tid>       stat events on existing thread id

2. With this patch series, it returns data per thread with shadow metrics.
   ("vmstat 1" was running during the following example)

    root@skl:/tmp# perf stat --per-thread
    ^C
     Performance counter stats for 'system wide':

                perf-24165              4.302433      cpu-clock (msec)          #    0.001 CPUs utilized
              vmstat-23127              1.562215      cpu-clock (msec)          #    0.000 CPUs utilized
          irqbalance-2780               0.827851      cpu-clock (msec)          #    0.000 CPUs utilized
                sshd-23111              0.278308      cpu-clock (msec)          #    0.000 CPUs utilized
            thermald-2841               0.230880      cpu-clock (msec)          #    0.000 CPUs utilized
                sshd-23058              0.207306      cpu-clock (msec)          #    0.000 CPUs utilized
         kworker/0:2-19991              0.133983      cpu-clock (msec)          #    0.000 CPUs utilized
       kworker/u16:1-18249              0.125636      cpu-clock (msec)          #    0.000 CPUs utilized
           rcu_sched-8                  0.085533      cpu-clock (msec)          #    0.000 CPUs utilized
       kworker/u16:2-23146              0.077139      cpu-clock (msec)          #    0.000 CPUs utilized
               gmain-2700               0.041789      cpu-clock (msec)          #    0.000 CPUs utilized
         kworker/4:1-15354              0.028370      cpu-clock (msec)          #    0.000 CPUs utilized
         kworker/6:0-17528              0.023895      cpu-clock (msec)          #    0.000 CPUs utilized
        kworker/4:1H-1887               0.013209      cpu-clock (msec)          #    0.000 CPUs utilized
         kworker/5:2-31362              0.011627      cpu-clock (msec)          #    0.000 CPUs utilized
          watchdog/0-11                 0.010892      cpu-clock (msec)          #    0.000 CPUs utilized
         kworker/3:2-12870              0.010220      cpu-clock (msec)          #    0.000 CPUs utilized
         ksoftirqd/0-7                  0.008869      cpu-clock (msec)          #    0.000 CPUs utilized
          watchdog/1-14                 0.008476      cpu-clock (msec)          #    0.000 CPUs utilized
          watchdog/7-50                 0.002944      cpu-clock (msec)          #    0.000 CPUs utilized
          watchdog/3-26                 0.002893      cpu-clock (msec)          #    0.000 CPUs utilized
          watchdog/4-32                 0.002759      cpu-clock (msec)          #    0.000 CPUs utilized
          watchdog/2-20                 0.002429      cpu-clock (msec)          #    0.000 CPUs utilized
          watchdog/6-44                 0.001491      cpu-clock (msec)          #    0.000 CPUs utilized
          watchdog/5-38                 0.001477      cpu-clock (msec)          #    0.000 CPUs utilized
           rcu_sched-8                        10      context-switches          #    0.117 M/sec
       kworker/u16:1-18249                     7      context-switches          #    0.056 M/sec
                sshd-23111                     4      context-switches          #    0.014 M/sec
              vmstat-23127                     4      context-switches          #    0.003 M/sec
                perf-24165                     4      context-switches          #    0.930 K/sec
         kworker/0:2-19991                     3      context-switches          #    0.022 M/sec
       kworker/u16:2-23146                     3      context-switches          #    0.039 M/sec
         kworker/4:1-15354                     2      context-switches          #    0.070 M/sec
         kworker/6:0-17528                     2      context-switches          #    0.084 M/sec
                sshd-23058                     2      context-switches          #    0.010 M/sec
         ksoftirqd/0-7                         1      context-switches          #    0.113 M/sec
          watchdog/0-11                        1      context-switches          #    0.092 M/sec
          watchdog/1-14                        1      context-switches          #    0.118 M/sec
          watchdog/2-20                        1      context-switches          #    0.412 M/sec
          watchdog/3-26                        1      context-switches          #    0.346 M/sec
          watchdog/4-32                        1      context-switches          #    0.362 M/sec
          watchdog/5-38                        1      context-switches          #    0.677 M/sec
          watchdog/6-44                        1      context-switches          #    0.671 M/sec
          watchdog/7-50                        1      context-switches          #    0.340 M/sec
        kworker/4:1H-1887                      1      context-switches          #    0.076 M/sec
            thermald-2841                      1      context-switches          #    0.004 M/sec
               gmain-2700                      1      context-switches          #    0.024 M/sec
          irqbalance-2780                      1      context-switches          #    0.001 M/sec
         kworker/3:2-12870                     1      context-switches          #    0.098 M/sec
         kworker/5:2-31362                     1      context-switches          #    0.086 M/sec
       kworker/u16:1-18249                     2      cpu-migrations            #    0.016 M/sec
       kworker/u16:2-23146                     2      cpu-migrations            #    0.026 M/sec
           rcu_sched-8                         1      cpu-migrations            #    0.012 M/sec
                sshd-23058                     1      cpu-migrations            #    0.005 M/sec
                perf-24165             8,833,385      cycles                    #    2.053 GHz
              vmstat-23127             1,702,699      cycles                    #    1.090 GHz
          irqbalance-2780                739,847      cycles                    #    0.894 GHz
                sshd-23111               269,506      cycles                    #    0.968 GHz
            thermald-2841                204,556      cycles                    #    0.886 GHz
                sshd-23058               158,780      cycles                    #    0.766 GHz
         kworker/0:2-19991               112,981      cycles                    #    0.843 GHz
       kworker/u16:1-18249               100,926      cycles                    #    0.803 GHz
           rcu_sched-8                    74,024      cycles                    #    0.865 GHz
       kworker/u16:2-23146                55,984      cycles                    #    0.726 GHz
               gmain-2700                 34,278      cycles                    #    0.820 GHz
         kworker/4:1-15354                20,665      cycles                    #    0.728 GHz
         kworker/6:0-17528                16,445      cycles                    #    0.688 GHz
         kworker/5:2-31362                 9,492      cycles                    #    0.816 GHz
          watchdog/3-26                    8,695      cycles                    #    3.006 GHz
        kworker/4:1H-1887                  8,238      cycles                    #    0.624 GHz
          watchdog/4-32                    7,580      cycles                    #    2.747 GHz
         kworker/3:2-12870                 7,306      cycles                    #    0.715 GHz
          watchdog/2-20                    7,274      cycles                    #    2.995 GHz
          watchdog/0-11                    6,988      cycles                    #    0.642 GHz
         ksoftirqd/0-7                     6,376      cycles                    #    0.719 GHz
          watchdog/1-14                    5,340      cycles                    #    0.630 GHz
          watchdog/5-38                    4,061      cycles                    #    2.749 GHz
          watchdog/6-44                    3,976      cycles                    #    2.667 GHz
          watchdog/7-50                    3,418      cycles                    #    1.161 GHz
              vmstat-23127             2,511,699      instructions              #    1.48  insn per cycle
                perf-24165             1,829,908      instructions              #    0.21  insn per cycle
          irqbalance-2780              1,190,204      instructions              #    1.61  insn per cycle
            thermald-2841                143,544      instructions              #    0.70  insn per cycle
                sshd-23111               128,138      instructions              #    0.48  insn per cycle
                sshd-23058                57,654      instructions              #    0.36  insn per cycle
           rcu_sched-8                    44,063      instructions              #    0.60  insn per cycle
       kworker/u16:1-18249                42,551      instructions              #    0.42  insn per cycle
         kworker/0:2-19991                25,873      instructions              #    0.23  insn per cycle
       kworker/u16:2-23146                21,407      instructions              #    0.38  insn per cycle
               gmain-2700                 13,691      instructions              #    0.40  insn per cycle
         kworker/4:1-15354                12,964      instructions              #    0.63  insn per cycle
         kworker/6:0-17528                10,034      instructions              #    0.61  insn per cycle
         kworker/5:2-31362                 5,203      instructions              #    0.55  insn per cycle
         kworker/3:2-12870                 4,866      instructions              #    0.67  insn per cycle
        kworker/4:1H-1887                  3,586      instructions              #    0.44  insn per cycle
         ksoftirqd/0-7                     3,463      instructions              #    0.54  insn per cycle
          watchdog/0-11                    3,135      instructions              #    0.45  insn per cycle
          watchdog/1-14                    3,135      instructions              #    0.59  insn per cycle
          watchdog/2-20                    3,135      instructions              #    0.43  insn per cycle
          watchdog/3-26                    3,135      instructions              #    0.36  insn per cycle
          watchdog/4-32                    3,135      instructions              #    0.41  insn per cycle
          watchdog/5-38                    3,135      instructions              #    0.77  insn per cycle
          watchdog/6-44                    3,135      instructions              #    0.79  insn per cycle
          watchdog/7-50                    3,135      instructions              #    0.92  insn per cycle
              vmstat-23127               539,181      branches                  #  345.139 M/sec
                perf-24165               375,364      branches                  #   87.245 M/sec
          irqbalance-2780                262,092      branches                  #  316.593 M/sec
            thermald-2841                 31,611      branches                  #  136.915 M/sec
                sshd-23111                21,874      branches                  #   78.596 M/sec
                sshd-23058                10,682      branches                  #   51.528 M/sec
           rcu_sched-8                     8,693      branches                  #  101.633 M/sec
       kworker/u16:1-18249                 7,891      branches                  #   62.808 M/sec
         kworker/0:2-19991                 5,761      branches                  #   42.998 M/sec
       kworker/u16:2-23146                 4,099      branches                  #   53.138 M/sec
         kworker/4:1-15354                 2,755      branches                  #   97.110 M/sec
               gmain-2700                  2,638      branches                  #   63.127 M/sec
         kworker/6:0-17528                 2,216      branches                  #   92.739 M/sec
         kworker/5:2-31362                 1,132      branches                  #   97.360 M/sec
         kworker/3:2-12870                 1,081      branches                  #  105.773 M/sec
        kworker/4:1H-1887                    725      branches                  #   54.887 M/sec
         ksoftirqd/0-7                       707      branches                  #   79.716 M/sec
          watchdog/0-11                      652      branches                  #   59.860 M/sec
          watchdog/1-14                      652      branches                  #   76.923 M/sec
          watchdog/2-20                      652      branches                  #  268.423 M/sec
          watchdog/3-26                      652      branches                  #  225.372 M/sec
          watchdog/4-32                      652      branches                  #  236.318 M/sec
          watchdog/5-38                      652      branches                  #  441.435 M/sec
          watchdog/6-44                      652      branches                  #  437.290 M/sec
          watchdog/7-50                      652      branches                  #  221.467 M/sec
              vmstat-23127                 8,960      branch-misses             #    1.66% of all branches
          irqbalance-2780                  3,047      branch-misses             #    1.16% of all branches
                perf-24165                 2,876      branch-misses             #    0.77% of all branches
                sshd-23111                 1,843      branch-misses             #    8.43% of all branches
            thermald-2841                  1,444      branch-misses             #    4.57% of all branches
                sshd-23058                 1,379      branch-misses             #   12.91% of all branches
       kworker/u16:1-18249                   982      branch-misses             #   12.44% of all branches
           rcu_sched-8                       893      branch-misses             #   10.27% of all branches
       kworker/u16:2-23146                   578      branch-misses             #   14.10% of all branches
         kworker/0:2-19991                   376      branch-misses             #    6.53% of all branches
               gmain-2700                    280      branch-misses             #   10.61% of all branches
         kworker/6:0-17528                   196      branch-misses             #    8.84% of all branches
         kworker/4:1-15354                   187      branch-misses             #    6.79% of all branches
         kworker/5:2-31362                   123      branch-misses             #   10.87% of all branches
          watchdog/0-11                       95      branch-misses             #   14.57% of all branches
          watchdog/4-32                       89      branch-misses             #   13.65% of all branches
         kworker/3:2-12870                    80      branch-misses             #    7.40% of all branches
          watchdog/3-26                       61      branch-misses             #    9.36% of all branches
        kworker/4:1H-1887                     60      branch-misses             #    8.28% of all branches
          watchdog/2-20                       52      branch-misses             #    7.98% of all branches
         ksoftirqd/0-7                        47      branch-misses             #    6.65% of all branches
          watchdog/1-14                       46      branch-misses             #    7.06% of all branches
          watchdog/7-50                       13      branch-misses             #    1.99% of all branches
          watchdog/5-38                        8      branch-misses             #    1.23% of all branches
          watchdog/6-44                        7      branch-misses             #    1.07% of all branches

           3.695150786 seconds time elapsed

    root@skl:/tmp# perf stat --per-thread -M IPC,CPI
    ^C

     Performance counter stats for 'system wide':

              vmstat-23127             2,000,783      inst_retired.any          #      1.5 IPC
            thermald-2841              1,472,670      inst_retired.any          #      1.3 IPC
                sshd-23111               977,374      inst_retired.any          #      1.2 IPC
                perf-24163               483,779      inst_retired.any          #      0.2 IPC
               gmain-2700                341,213      inst_retired.any          #      0.9 IPC
                sshd-23058               148,891      inst_retired.any          #      0.8 IPC
        rtkit-daemon-3288                 71,210      inst_retired.any          #      0.7 IPC
       kworker/u16:1-18249                39,562      inst_retired.any          #      0.3 IPC
           rcu_sched-8                    14,474      inst_retired.any          #      0.8 IPC
         kworker/0:2-19991                 7,659      inst_retired.any          #      0.2 IPC
         kworker/4:1-15354                 6,714      inst_retired.any          #      0.8 IPC
        rtkit-daemon-3289                  4,839      inst_retired.any          #      0.3 IPC
         kworker/6:0-17528                 3,321      inst_retired.any          #      0.6 IPC
         kworker/5:2-31362                 3,215      inst_retired.any          #      0.5 IPC
         kworker/7:2-23145                 3,173      inst_retired.any          #      0.7 IPC
        kworker/4:1H-1887                  1,719      inst_retired.any          #      0.3 IPC
          watchdog/0-11                    1,479      inst_retired.any          #      0.3 IPC
          watchdog/1-14                    1,479      inst_retired.any          #      0.3 IPC
          watchdog/2-20                    1,479      inst_retired.any          #      0.4 IPC
          watchdog/3-26                    1,479      inst_retired.any          #      0.4 IPC
          watchdog/4-32                    1,479      inst_retired.any          #      0.3 IPC
          watchdog/5-38                    1,479      inst_retired.any          #      0.3 IPC
          watchdog/6-44                    1,479      inst_retired.any          #      0.7 IPC
          watchdog/7-50                    1,479      inst_retired.any          #      0.7 IPC
       kworker/u16:2-23146                 1,408      inst_retired.any          #      0.5 IPC
                perf-24163             2,249,872      cpu_clk_unhalted.thread
              vmstat-23127             1,352,455      cpu_clk_unhalted.thread
            thermald-2841              1,161,140      cpu_clk_unhalted.thread
                sshd-23111               807,827      cpu_clk_unhalted.thread
               gmain-2700                375,535      cpu_clk_unhalted.thread
                sshd-23058               194,071      cpu_clk_unhalted.thread
       kworker/u16:1-18249               114,306      cpu_clk_unhalted.thread
        rtkit-daemon-3288                103,547      cpu_clk_unhalted.thread
         kworker/0:2-19991                46,550      cpu_clk_unhalted.thread
           rcu_sched-8                    18,855      cpu_clk_unhalted.thread
        rtkit-daemon-3289                 17,549      cpu_clk_unhalted.thread
         kworker/4:1-15354                 8,812      cpu_clk_unhalted.thread
         kworker/5:2-31362                 6,812      cpu_clk_unhalted.thread
        kworker/4:1H-1887                  5,270      cpu_clk_unhalted.thread
         kworker/6:0-17528                 5,111      cpu_clk_unhalted.thread
         kworker/7:2-23145                 4,667      cpu_clk_unhalted.thread
          watchdog/0-11                    4,663      cpu_clk_unhalted.thread
          watchdog/1-14                    4,663      cpu_clk_unhalted.thread
          watchdog/4-32                    4,626      cpu_clk_unhalted.thread
          watchdog/5-38                    4,403      cpu_clk_unhalted.thread
          watchdog/3-26                    3,936      cpu_clk_unhalted.thread
          watchdog/2-20                    3,850      cpu_clk_unhalted.thread
       kworker/u16:2-23146                 2,654      cpu_clk_unhalted.thread
          watchdog/6-44                    2,017      cpu_clk_unhalted.thread
          watchdog/7-50                    2,017      cpu_clk_unhalted.thread
              vmstat-23127             2,000,783      inst_retired.any          #      0.7 CPI
            thermald-2841              1,472,670      inst_retired.any          #      0.8 CPI
                sshd-23111               977,374      inst_retired.any          #      0.8 CPI
                perf-24163               495,037      inst_retired.any          #      4.7 CPI
               gmain-2700                341,213      inst_retired.any          #      1.1 CPI
                sshd-23058               148,891      inst_retired.any          #      1.3 CPI
        rtkit-daemon-3288                 71,210      inst_retired.any          #      1.5 CPI
       kworker/u16:1-18249                39,562      inst_retired.any          #      2.9 CPI
           rcu_sched-8                    14,474      inst_retired.any          #      1.3 CPI
         kworker/0:2-19991                 7,659      inst_retired.any          #      6.1 CPI
         kworker/4:1-15354                 6,714      inst_retired.any          #      1.3 CPI
        rtkit-daemon-3289                  4,839      inst_retired.any          #      3.6 CPI
         kworker/6:0-17528                 3,321      inst_retired.any          #      1.5 CPI
         kworker/5:2-31362                 3,215      inst_retired.any          #      2.1 CPI
         kworker/7:2-23145                 3,173      inst_retired.any          #      1.5 CPI
        kworker/4:1H-1887                  1,719      inst_retired.any          #      3.1 CPI
          watchdog/0-11                    1,479      inst_retired.any          #      3.2 CPI
          watchdog/1-14                    1,479      inst_retired.any          #      3.2 CPI
          watchdog/2-20                    1,479      inst_retired.any          #      2.6 CPI
          watchdog/3-26                    1,479      inst_retired.any          #      2.7 CPI
          watchdog/4-32                    1,479      inst_retired.any          #      3.1 CPI
          watchdog/5-38                    1,479      inst_retired.any          #      3.0 CPI
          watchdog/6-44                    1,479      inst_retired.any          #      1.4 CPI
          watchdog/7-50                    1,479      inst_retired.any          #      1.4 CPI
       kworker/u16:2-23146                 1,408      inst_retired.any          #      1.9 CPI
                perf-24163             2,302,323      cycles
              vmstat-23127             1,352,455      cycles
            thermald-2841              1,161,140      cycles
                sshd-23111               807,827      cycles
               gmain-2700                375,535      cycles
                sshd-23058               194,071      cycles
       kworker/u16:1-18249               114,306      cycles
        rtkit-daemon-3288                103,547      cycles
         kworker/0:2-19991                46,550      cycles
           rcu_sched-8                    18,855      cycles
        rtkit-daemon-3289                 17,549      cycles
         kworker/4:1-15354                 8,812      cycles
         kworker/5:2-31362                 6,812      cycles
        kworker/4:1H-1887                  5,270      cycles
         kworker/6:0-17528                 5,111      cycles
         kworker/7:2-23145                 4,667      cycles
          watchdog/0-11                    4,663      cycles
          watchdog/1-14                    4,663      cycles
          watchdog/4-32                    4,626      cycles
          watchdog/5-38                    4,403      cycles
          watchdog/3-26                    3,936      cycles
          watchdog/2-20                    3,850      cycles
       kworker/u16:2-23146                 2,654      cycles
          watchdog/6-44                    2,017      cycles
          watchdog/7-50                    2,017      cycles

           2.175726600 seconds time elapsed

Jin Yao (12):
  perf util: Create rblist__exit() function
  perf util: Define a structure for runtime shadow stats
  perf util: Extend rbtree to support shadow stats
  perf util: Add rbtree node_delete ops
  perf util: Create the runtime_stat init/exit function
  perf util: Update and print per-thread shadow stats
  perf util: Remove a set of shadow stats static variables
  perf stat: Allocate shadow stats buffer for threads
  perf stat: Update or print per-thread stats
  perf util: Reuse thread_map__new_by_uid to enumerate threads from
    /proc
  perf stat: Remove --per-thread pid/tid limitation
  perf stat: Resort '--per-thread' result

 tools/perf/builtin-script.c   |   6 +-
 tools/perf/builtin-stat.c     | 168 ++++++++++++++---
 tools/perf/tests/thread-map.c |   2 +-
 tools/perf/util/evlist.c      |   3 +-
 tools/perf/util/rblist.c      |  19 +-
 tools/perf/util/rblist.h      |   1 +
 tools/perf/util/stat-shadow.c | 426 +++++++++++++++++++++++++-----------------
 tools/perf/util/stat.c        |  15 +-
 tools/perf/util/stat.h        |  64 ++++++-
 tools/perf/util/target.h      |   7 +
 tools/perf/util/thread_map.c  |  19 +-
 tools/perf/util/thread_map.h  |   3 +-
 12 files changed, 519 insertions(+), 214 deletions(-)

-- 
2.7.4


* [PATCH v5 01/12] perf util: Create rblist__exit() function
  2017-12-01 10:57 [PATCH v5 00/12] perf stat: Enable '--per-thread' on all thread Jin Yao
@ 2017-12-01 10:57 ` Jin Yao
  2017-12-06 16:36   ` [tip:perf/core] perf rblist: " tip-bot for Jin Yao
  2017-12-01 10:57 ` [PATCH v5 02/12] perf util: Define a structure for runtime shadow stats Jin Yao
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Jin Yao @ 2017-12-01 10:57 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

Currently we have rblist__delete(), which is used to tear down an
rblist. At the end, rblist__delete() also frees the rblist pointer
itself.

That is inconvenient when the rblist was not allocated with something
like malloc(), for example when it is embedded in another data
structure.

This patch creates a new function, rblist__exit(), which is similar to
rblist__delete() but does not free the rblist pointer.
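The ownership split this introduces can be mirrored in a stand-alone sketch. A singly linked list stands in for the rbtree here, since rebuilding the real rblist would drag in the rest of tools/perf; the point is who frees what:

```c
#include <stdlib.h>

struct node { struct node *next; };

struct rblist { struct node *head; };	/* stand-in for the rbtree-backed rblist */

/* Like rblist__exit(): release the entries, but never the rblist
 * itself, so it also works for an rblist embedded in a structure. */
static void rblist__exit(struct rblist *rblist)
{
	struct node *pos, *next = rblist->head;

	while (next) {
		pos = next;
		next = pos->next;
		free(pos);
	}
	rblist->head = NULL;
}

/* rblist__delete() keeps its old contract: tear down, then free. */
static void rblist__delete(struct rblist *rblist)
{
	if (rblist != NULL) {
		rblist__exit(rblist);
		free(rblist);
	}
}
```

A caller with an embedded rblist calls rblist__exit() at teardown; only a heap-allocated rblist goes through rblist__delete().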

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/rblist.c | 19 ++++++++++++-------
 tools/perf/util/rblist.h |  1 +
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/rblist.c b/tools/perf/util/rblist.c
index 0dfe27d..0efc325 100644
--- a/tools/perf/util/rblist.c
+++ b/tools/perf/util/rblist.c
@@ -101,16 +101,21 @@ void rblist__init(struct rblist *rblist)
 	return;
 }
 
+void rblist__exit(struct rblist *rblist)
+{
+	struct rb_node *pos, *next = rb_first(&rblist->entries);
+
+	while (next) {
+		pos = next;
+		next = rb_next(pos);
+		rblist__remove_node(rblist, pos);
+	}
+}
+
 void rblist__delete(struct rblist *rblist)
 {
 	if (rblist != NULL) {
-		struct rb_node *pos, *next = rb_first(&rblist->entries);
-
-		while (next) {
-			pos = next;
-			next = rb_next(pos);
-			rblist__remove_node(rblist, pos);
-		}
+		rblist__exit(rblist);
 		free(rblist);
 	}
 }
diff --git a/tools/perf/util/rblist.h b/tools/perf/util/rblist.h
index 4c8638a..76df15c 100644
--- a/tools/perf/util/rblist.h
+++ b/tools/perf/util/rblist.h
@@ -29,6 +29,7 @@ struct rblist {
 };
 
 void rblist__init(struct rblist *rblist);
+void rblist__exit(struct rblist *rblist);
 void rblist__delete(struct rblist *rblist);
 int rblist__add_node(struct rblist *rblist, const void *new_entry);
 void rblist__remove_node(struct rblist *rblist, struct rb_node *rb_node);
-- 
2.7.4

* [PATCH v5 02/12] perf util: Define a structure for runtime shadow stats
  2017-12-01 10:57 [PATCH v5 00/12] perf stat: Enable '--per-thread' on all thread Jin Yao
  2017-12-01 10:57 ` [PATCH v5 01/12] perf util: Create rblist__exit() function Jin Yao
@ 2017-12-01 10:57 ` Jin Yao
  2017-12-01 14:02   ` Arnaldo Carvalho de Melo
  2017-12-01 10:57 ` [PATCH v5 03/12] perf util: Extend rbtree to support " Jin Yao
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Jin Yao @ 2017-12-01 10:57 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

Perf uses a set of static variables to record the runtime shadow
metric stats.

Those static variables are a limitation when we want to record the
runtime shadow stats per thread. This patch creates a structure, and
the next patches will use it to update the runtime shadow stats per
thread.
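
The direction of the refactor can be sketched as follows; the struct
and field names here are illustrative stand-ins, not the actual perf
definitions:

```c
#include <assert.h>

#define NUM_CTX 4

struct stats {
	double sum;
	int n;
};

/* Before: one file-scope copy, implicitly shared by all callers. */
static struct stats runtime_cycles_stats_global[NUM_CTX];

/* After: the stats live inside a struct, so each thread (or any
 * other aggregation unit) can own an independent instance. */
struct runtime_stat {
	struct stats cycles[NUM_CTX];
};

static void record_cycles(struct runtime_stat *st, int ctx, double val)
{
	st->cycles[ctx].sum += val;
	st->cycles[ctx].n++;
}
```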

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/stat-shadow.c | 11 -----------
 tools/perf/util/stat.h        | 44 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 855e35c..5853901 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -9,17 +9,6 @@
 #include "expr.h"
 #include "metricgroup.h"
 
-enum {
-	CTX_BIT_USER	= 1 << 0,
-	CTX_BIT_KERNEL	= 1 << 1,
-	CTX_BIT_HV	= 1 << 2,
-	CTX_BIT_HOST	= 1 << 3,
-	CTX_BIT_IDLE	= 1 << 4,
-	CTX_BIT_MAX	= 1 << 5,
-};
-
-#define NUM_CTX CTX_BIT_MAX
-
 /*
  * AGGR_GLOBAL: Use CPU 0
  * AGGR_SOCKET: Use first CPU of socket
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index eefca5c..290c51e 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -5,6 +5,8 @@
 #include <linux/types.h>
 #include <stdio.h>
 #include "xyarray.h"
+#include "evsel.h"
+#include "rblist.h"
 
 struct stats
 {
@@ -43,6 +45,47 @@ enum aggr_mode {
 	AGGR_UNSET,
 };
 
+enum {
+	CTX_BIT_USER	= 1 << 0,
+	CTX_BIT_KERNEL	= 1 << 1,
+	CTX_BIT_HV	= 1 << 2,
+	CTX_BIT_HOST	= 1 << 3,
+	CTX_BIT_IDLE	= 1 << 4,
+	CTX_BIT_MAX	= 1 << 5,
+};
+
+#define NUM_CTX CTX_BIT_MAX
+
+enum stat_type {
+	STAT_NONE = 0,
+	STAT_NSECS,
+	STAT_CYCLES,
+	STAT_STALLED_CYCLES_FRONT,
+	STAT_STALLED_CYCLES_BACK,
+	STAT_BRANCHES,
+	STAT_CACHEREFS,
+	STAT_L1_DCACHE,
+	STAT_L1_ICACHE,
+	STAT_LL_CACHE,
+	STAT_ITLB_CACHE,
+	STAT_DTLB_CACHE,
+	STAT_CYCLES_IN_TX,
+	STAT_TRANSACTION,
+	STAT_ELISION,
+	STAT_TOPDOWN_TOTAL_SLOTS,
+	STAT_TOPDOWN_SLOTS_ISSUED,
+	STAT_TOPDOWN_SLOTS_RETIRED,
+	STAT_TOPDOWN_FETCH_BUBBLES,
+	STAT_TOPDOWN_RECOVERY_BUBBLES,
+	STAT_SMI_NUM,
+	STAT_APERF,
+	STAT_MAX
+};
+
+struct runtime_stat {
+	struct rblist value_list;
+};
+
 struct perf_stat_config {
 	enum aggr_mode	aggr_mode;
 	bool		scale;
@@ -92,7 +135,6 @@ struct perf_stat_output_ctx {
 	bool force_header;
 };
 
-struct rblist;
 void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 				   double avg, int cpu,
 				   struct perf_stat_output_ctx *out,
-- 
2.7.4

* [PATCH v5 03/12] perf util: Extend rbtree to support shadow stats
  2017-12-01 10:57 [PATCH v5 00/12] perf stat: Enable '--per-thread' on all thread Jin Yao
  2017-12-01 10:57 ` [PATCH v5 01/12] perf util: Create rblist__exit() function Jin Yao
  2017-12-01 10:57 ` [PATCH v5 02/12] perf util: Define a structure for runtime shadow stats Jin Yao
@ 2017-12-01 10:57 ` Jin Yao
  2017-12-01 14:10   ` Arnaldo Carvalho de Melo
  2017-12-01 10:57 ` [PATCH v5 04/12] perf util: Add rbtree node_delete ops Jin Yao
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Jin Yao @ 2017-12-01 10:57 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

Previously the rbtree was used only to link generic metrics.

This patch adds ctx/type/stat to the rbtree keys because we will use
this rbtree to maintain the shadow metrics, replacing the original set
of static arrays, in order to support per-thread shadow stats.
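
A compound-key comparator of this shape can be sketched as below; the
struct is simplified for illustration and is not the actual perf code:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the extended saved_value key. */
struct key {
	int cpu;
	int type;
	int ctx;
	const void *evsel;	/* NULL for shadow-stat entries */
	const void *stat;
};

static int key_cmp(const struct key *a, const struct key *b)
{
	/* Compare field by field; the first difference decides. */
	if (a->cpu != b->cpu)
		return a->cpu - b->cpu;
	if (a->type != b->type)
		return a->type - b->type;
	if (a->ctx != b->ctx)
		return a->ctx - b->ctx;

	/* Shadow-stat entries (evsel == NULL) are ordered by the stat
	 * pointer; pointers are compared via casts rather than
	 * subtracted, since subtracting unrelated pointers is
	 * undefined behavior. */
	if (!a->evsel && !b->evsel) {
		if (a->stat == b->stat)
			return 0;
		return (const char *)a->stat < (const char *)b->stat ? -1 : 1;
	}

	if (a->evsel == b->evsel)
		return 0;
	return (const char *)a->evsel < (const char *)b->evsel ? -1 : 1;
}
```

Because unused key fields are 0 or NULL in both entry kinds, the old
generic-metric entries and the new shadow-stat entries can share one
tree without colliding.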

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/stat-shadow.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 5853901..c53b80d 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -45,7 +45,10 @@ struct stats walltime_nsecs_stats;
 struct saved_value {
 	struct rb_node rb_node;
 	struct perf_evsel *evsel;
+	enum stat_type type;
+	int ctx;
 	int cpu;
+	struct runtime_stat *stat;
 	struct stats stats;
 };
 
@@ -58,6 +61,30 @@ static int saved_value_cmp(struct rb_node *rb_node, const void *entry)
 
 	if (a->cpu != b->cpu)
 		return a->cpu - b->cpu;
+
+	/*
+	 * Previously the rbtree was used to link generic metrics.
+	 * The keys were evsel/cpu. Now the rbtree is extended to support
+	 * per-thread shadow stats. For shadow stats case, the keys
+	 * are cpu/type/ctx/stat (evsel is NULL). For generic metrics
+	 * case, the keys are still evsel/cpu (type/ctx/stat are 0 or NULL).
+	 */
+	if (a->type != b->type)
+		return a->type - b->type;
+
+	if (a->ctx != b->ctx)
+		return a->ctx - b->ctx;
+
+	if (a->evsel == NULL && b->evsel == NULL) {
+		if (a->stat == b->stat)
+			return 0;
+
+		if ((char *)a->stat < (char *)b->stat)
+			return -1;
+
+		return 1;
+	}
+
 	if (a->evsel == b->evsel)
 		return 0;
 	if ((char *)a->evsel < (char *)b->evsel)
-- 
2.7.4

* [PATCH v5 04/12] perf util: Add rbtree node_delete ops
  2017-12-01 10:57 [PATCH v5 00/12] perf stat: Enable '--per-thread' on all thread Jin Yao
                   ` (2 preceding siblings ...)
  2017-12-01 10:57 ` [PATCH v5 03/12] perf util: Extend rbtree to support " Jin Yao
@ 2017-12-01 10:57 ` Jin Yao
  2017-12-01 14:14   ` Arnaldo Carvalho de Melo
  2017-12-06 16:37   ` [tip:perf/core] perf stat: Add rbtree node_delete op tip-bot for Jin Yao
  2017-12-01 10:57 ` [PATCH v5 05/12] perf util: Create the runtime_stat init/exit function Jin Yao
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 27+ messages in thread
From: Jin Yao @ 2017-12-01 10:57 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

In the current stat-shadow.c, rbtree node deletion is not
implemented.

This patch adds an implementation for the node_delete method of the
rblist.
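
The delete callback follows the usual container_of() pattern: recover
the enclosing object from the embedded node, then free it. A minimal
self-contained sketch of that pattern (simplified, not the actual perf
code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace rendering of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct node {
	struct node *next;	/* stand-in for struct rb_node */
};

struct saved_value {
	int payload;
	struct node n;		/* embedded node, not at offset 0 */
};

static int deleted;

static void saved_value_delete(struct node *nd)
{
	/* Step back from the embedded member to the enclosing object. */
	struct saved_value *v = container_of(nd, struct saved_value, n);

	deleted = v->payload;
	free(v);
}
```

The rblist only hands the callback the embedded rb_node, so the
callback must do this recovery itself before it can free the
allocation made by node_new.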

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/stat-shadow.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index c53b80d..528be3e 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -103,6 +103,16 @@ static struct rb_node *saved_value_new(struct rblist *rblist __maybe_unused,
 	return &nd->rb_node;
 }
 
+static void saved_value_delete(struct rblist *rblist __maybe_unused,
+			       struct rb_node *rb_node)
+{
+	struct saved_value *v;
+
+	BUG_ON(!rb_node);
+	v = container_of(rb_node, struct saved_value, rb_node);
+	free(v);
+}
+
 static struct saved_value *saved_value_lookup(struct perf_evsel *evsel,
 					      int cpu,
 					      bool create)
@@ -130,7 +140,7 @@ void perf_stat__init_shadow_stats(void)
 	rblist__init(&runtime_saved_values);
 	runtime_saved_values.node_cmp = saved_value_cmp;
 	runtime_saved_values.node_new = saved_value_new;
-	/* No delete for now */
+	runtime_saved_values.node_delete = saved_value_delete;
 }
 
 static int evsel_context(struct perf_evsel *evsel)
-- 
2.7.4

* [PATCH v5 05/12] perf util: Create the runtime_stat init/exit function
  2017-12-01 10:57 [PATCH v5 00/12] perf stat: Enable '--per-thread' on all thread Jin Yao
                   ` (3 preceding siblings ...)
  2017-12-01 10:57 ` [PATCH v5 04/12] perf util: Add rbtree node_delete ops Jin Yao
@ 2017-12-01 10:57 ` Jin Yao
  2017-12-01 10:57 ` [PATCH v5 06/12] perf util: Update and print per-thread shadow stats Jin Yao
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Jin Yao @ 2017-12-01 10:57 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

It mainly initializes and releases the rblist which is defined in
struct runtime_stat.

The original rblist 'runtime_saved_values' is kept for now so that the
series stays bisectable; it will be removed in a later patch at the
switch-over point.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/stat-shadow.c | 17 +++++++++++++++++
 tools/perf/util/stat.h        |  3 +++
 2 files changed, 20 insertions(+)

diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 528be3e..e60c321 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -40,6 +40,7 @@ static struct stats runtime_aperf_stats[NUM_CTX][MAX_NR_CPUS];
 static struct rblist runtime_saved_values;
 static bool have_frontend_stalled;
 
+struct runtime_stat rt_stat;
 struct stats walltime_nsecs_stats;
 
 struct saved_value {
@@ -134,6 +135,21 @@ static struct saved_value *saved_value_lookup(struct perf_evsel *evsel,
 	return NULL;
 }
 
+void runtime_stat__init(struct runtime_stat *stat)
+{
+	struct rblist *rblist = &stat->value_list;
+
+	rblist__init(rblist);
+	rblist->node_cmp = saved_value_cmp;
+	rblist->node_new = saved_value_new;
+	rblist->node_delete = saved_value_delete;
+}
+
+void runtime_stat__exit(struct runtime_stat *stat)
+{
+	rblist__exit(&stat->value_list);
+}
+
 void perf_stat__init_shadow_stats(void)
 {
 	have_frontend_stalled = pmu_have_event("cpu", "stalled-cycles-frontend");
@@ -141,6 +157,7 @@ void perf_stat__init_shadow_stats(void)
 	runtime_saved_values.node_cmp = saved_value_cmp;
 	runtime_saved_values.node_new = saved_value_new;
 	runtime_saved_values.node_delete = saved_value_delete;
+	runtime_stat__init(&rt_stat);
 }
 
 static int evsel_context(struct perf_evsel *evsel)
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 290c51e..1e2b761 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -118,12 +118,15 @@ bool __perf_evsel_stat__is(struct perf_evsel *evsel,
 
 void perf_stat_evsel_id_init(struct perf_evsel *evsel);
 
+extern struct runtime_stat rt_stat;
 extern struct stats walltime_nsecs_stats;
 
 typedef void (*print_metric_t)(void *ctx, const char *color, const char *unit,
 			       const char *fmt, double val);
 typedef void (*new_line_t )(void *ctx);
 
+void runtime_stat__init(struct runtime_stat *stat);
+void runtime_stat__exit(struct runtime_stat *stat);
 void perf_stat__init_shadow_stats(void);
 void perf_stat__reset_shadow_stats(void);
 void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
-- 
2.7.4

* [PATCH v5 06/12] perf util: Update and print per-thread shadow stats
  2017-12-01 10:57 [PATCH v5 00/12] perf stat: Enable '--per-thread' on all thread Jin Yao
                   ` (4 preceding siblings ...)
  2017-12-01 10:57 ` [PATCH v5 05/12] perf util: Create the runtime_stat init/exit function Jin Yao
@ 2017-12-01 10:57 ` Jin Yao
  2017-12-01 14:21   ` Arnaldo Carvalho de Melo
  2017-12-01 10:57 ` [PATCH v5 07/12] perf util: Remove a set of shadow stats static variables Jin Yao
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Jin Yao @ 2017-12-01 10:57 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

The functions perf_stat__update_shadow_stats() and
perf_stat__print_shadow_stats() are called to update and print the
shadow stats kept in a set of static variables.

But those static variables are a limitation for supporting per-thread
shadow stats.

This patch lets perf_stat__update_shadow_stats() update the shadow
stats in an input parameter 'stat', using update_runtime_stat(); it no
longer updates the static variables directly as before.

It also lets perf_stat__print_shadow_stats() print the shadow stats
from an input parameter 'stat'. Instead of reading the static
variables directly, it now uses runtime_stat_avg() and
runtime_stat_n() to get and compute the values.
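
The struct stats bookkeeping behind these helpers is an incremental
running average; a simplified sketch is below (field names are
illustrative, and perf's real update_stats() also tracks more than the
mean, e.g. min/max and variance):

```c
#include <assert.h>

/* Minimal running-average accumulator in the spirit of perf's
 * struct stats / update_stats() / avg_stats(). */
struct stats {
	double mean;
	unsigned long n;
};

static void update_stats(struct stats *s, double val)
{
	double delta;

	/* Incremental mean: mean += (val - mean) / n */
	s->n++;
	delta = val - s->mean;
	s->mean += delta / s->n;
}

static double avg_stats(const struct stats *s)
{
	return s->mean;
}
```

runtime_stat_avg() and runtime_stat_n() then reduce to looking up the
saved_value for (type, ctx, cpu) in the rblist and reading avg_stats()
or n from its embedded struct stats.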

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/builtin-script.c   |   6 +-
 tools/perf/builtin-stat.c     |  27 ++--
 tools/perf/util/stat-shadow.c | 293 +++++++++++++++++++++++++++---------------
 tools/perf/util/stat.c        |   8 +-
 tools/perf/util/stat.h        |   5 +-
 5 files changed, 219 insertions(+), 120 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 39d8b55..fac6f05 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1548,7 +1548,8 @@ static void perf_sample__fprint_metric(struct perf_script *script,
 	val = sample->period * evsel->scale;
 	perf_stat__update_shadow_stats(evsel,
 				       val,
-				       sample->cpu);
+				       sample->cpu,
+				       &rt_stat);
 	evsel_script(evsel)->val = val;
 	if (evsel_script(evsel->leader)->gnum == evsel->leader->nr_members) {
 		for_each_group_member (ev2, evsel->leader) {
@@ -1556,7 +1557,8 @@ static void perf_sample__fprint_metric(struct perf_script *script,
 						      evsel_script(ev2)->val,
 						      sample->cpu,
 						      &ctx,
-						      NULL);
+						      NULL,
+						      &rt_stat);
 		}
 		evsel_script(evsel->leader)->gnum = 0;
 	}
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a027b47..1edc082 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1097,7 +1097,8 @@ static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
 }
 
 static void printout(int id, int nr, struct perf_evsel *counter, double uval,
-		     char *prefix, u64 run, u64 ena, double noise)
+		     char *prefix, u64 run, u64 ena, double noise,
+		     struct runtime_stat *stat)
 {
 	struct perf_stat_output_ctx out;
 	struct outstate os = {
@@ -1190,7 +1191,8 @@ static void printout(int id, int nr, struct perf_evsel *counter, double uval,
 
 	perf_stat__print_shadow_stats(counter, uval,
 				first_shadow_cpu(counter, id),
-				&out, &metric_events);
+				&out, &metric_events,
+				stat);
 	if (!csv_output && !metric_only) {
 		print_noise(counter, noise);
 		print_running(run, ena);
@@ -1214,7 +1216,8 @@ static void aggr_update_shadow(void)
 				val += perf_counts(counter->counts, cpu, 0)->val;
 			}
 			perf_stat__update_shadow_stats(counter, val,
-						       first_shadow_cpu(counter, id));
+					first_shadow_cpu(counter, id),
+					&rt_stat);
 		}
 	}
 }
@@ -1334,7 +1337,8 @@ static void print_aggr(char *prefix)
 				fprintf(output, "%s", prefix);
 
 			uval = val * counter->scale;
-			printout(id, nr, counter, uval, prefix, run, ena, 1.0);
+			printout(id, nr, counter, uval, prefix, run, ena, 1.0,
+				 &rt_stat);
 			if (!metric_only)
 				fputc('\n', output);
 		}
@@ -1364,7 +1368,8 @@ static void print_aggr_thread(struct perf_evsel *counter, char *prefix)
 			fprintf(output, "%s", prefix);
 
 		uval = val * counter->scale;
-		printout(thread, 0, counter, uval, prefix, run, ena, 1.0);
+		printout(thread, 0, counter, uval, prefix, run, ena, 1.0,
+			 &rt_stat);
 		fputc('\n', output);
 	}
 }
@@ -1401,7 +1406,8 @@ static void print_counter_aggr(struct perf_evsel *counter, char *prefix)
 		fprintf(output, "%s", prefix);
 
 	uval = cd.avg * counter->scale;
-	printout(-1, 0, counter, uval, prefix, cd.avg_running, cd.avg_enabled, cd.avg);
+	printout(-1, 0, counter, uval, prefix, cd.avg_running, cd.avg_enabled,
+		 cd.avg, &rt_stat);
 	if (!metric_only)
 		fprintf(output, "\n");
 }
@@ -1440,7 +1446,8 @@ static void print_counter(struct perf_evsel *counter, char *prefix)
 			fprintf(output, "%s", prefix);
 
 		uval = val * counter->scale;
-		printout(cpu, 0, counter, uval, prefix, run, ena, 1.0);
+		printout(cpu, 0, counter, uval, prefix, run, ena, 1.0,
+			 &rt_stat);
 
 		fputc('\n', output);
 	}
@@ -1472,7 +1479,8 @@ static void print_no_aggr_metric(char *prefix)
 			run = perf_counts(counter->counts, cpu, 0)->run;
 
 			uval = val * counter->scale;
-			printout(cpu, 0, counter, uval, prefix, run, ena, 1.0);
+			printout(cpu, 0, counter, uval, prefix, run, ena, 1.0,
+				 &rt_stat);
 		}
 		fputc('\n', stat_config.output);
 	}
@@ -1528,7 +1536,8 @@ static void print_metric_headers(const char *prefix, bool no_indent)
 		perf_stat__print_shadow_stats(counter, 0,
 					      0,
 					      &out,
-					      &metric_events);
+					      &metric_events,
+					      &rt_stat);
 	}
 	fputc('\n', stat_config.output);
 }
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index e60c321..0d34d5e 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -116,19 +116,29 @@ static void saved_value_delete(struct rblist *rblist __maybe_unused,
 
 static struct saved_value *saved_value_lookup(struct perf_evsel *evsel,
 					      int cpu,
-					      bool create)
+					      bool create,
+					      enum stat_type type,
+					      int ctx,
+					      struct runtime_stat *stat)
 {
+	struct rblist *rblist;
 	struct rb_node *nd;
 	struct saved_value dm = {
 		.cpu = cpu,
 		.evsel = evsel,
+		.type = type,
+		.ctx = ctx,
+		.stat = stat,
 	};
-	nd = rblist__find(&runtime_saved_values, &dm);
+
+	rblist = &stat->value_list;
+
+	nd = rblist__find(rblist, &dm);
 	if (nd)
 		return container_of(nd, struct saved_value, rb_node);
 	if (create) {
-		rblist__add_node(&runtime_saved_values, &dm);
-		nd = rblist__find(&runtime_saved_values, &dm);
+		rblist__add_node(rblist, &dm);
+		nd = rblist__find(rblist, &dm);
 		if (nd)
 			return container_of(nd, struct saved_value, rb_node);
 	}
@@ -217,13 +227,24 @@ void perf_stat__reset_shadow_stats(void)
 	}
 }
 
+static void update_runtime_stat(struct runtime_stat *stat,
+				enum stat_type type,
+				int ctx, int cpu, u64 count)
+{
+	struct saved_value *v = saved_value_lookup(NULL, cpu, true,
+						   type, ctx, stat);
+
+	if (v)
+		update_stats(&v->stats, count);
+}
+
 /*
  * Update various tracking values we maintain to print
  * more semantic information such as miss/hit ratios,
  * instruction rates, etc:
  */
 void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
-				    int cpu)
+				    int cpu, struct runtime_stat *stat)
 {
 	int ctx = evsel_context(counter);
 
@@ -231,50 +252,58 @@ void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
 
 	if (perf_evsel__match(counter, SOFTWARE, SW_TASK_CLOCK) ||
 	    perf_evsel__match(counter, SOFTWARE, SW_CPU_CLOCK))
-		update_stats(&runtime_nsecs_stats[cpu], count);
+		update_runtime_stat(stat, STAT_NSECS, 0, cpu, count);
 	else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
-		update_stats(&runtime_cycles_stats[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_CYCLES, ctx, cpu, count);
 	else if (perf_stat_evsel__is(counter, CYCLES_IN_TX))
-		update_stats(&runtime_cycles_in_tx_stats[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_CYCLES_IN_TX, ctx, cpu, count);
 	else if (perf_stat_evsel__is(counter, TRANSACTION_START))
-		update_stats(&runtime_transaction_stats[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_TRANSACTION, ctx, cpu, count);
 	else if (perf_stat_evsel__is(counter, ELISION_START))
-		update_stats(&runtime_elision_stats[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_ELISION, ctx, cpu, count);
 	else if (perf_stat_evsel__is(counter, TOPDOWN_TOTAL_SLOTS))
-		update_stats(&runtime_topdown_total_slots[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_TOPDOWN_TOTAL_SLOTS,
+				    ctx, cpu, count);
 	else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_ISSUED))
-		update_stats(&runtime_topdown_slots_issued[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_TOPDOWN_SLOTS_ISSUED,
+				    ctx, cpu, count);
 	else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_RETIRED))
-		update_stats(&runtime_topdown_slots_retired[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_TOPDOWN_SLOTS_RETIRED,
+				    ctx, cpu, count);
 	else if (perf_stat_evsel__is(counter, TOPDOWN_FETCH_BUBBLES))
-		update_stats(&runtime_topdown_fetch_bubbles[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_TOPDOWN_FETCH_BUBBLES,
+				    ctx, cpu, count);
 	else if (perf_stat_evsel__is(counter, TOPDOWN_RECOVERY_BUBBLES))
-		update_stats(&runtime_topdown_recovery_bubbles[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_TOPDOWN_RECOVERY_BUBBLES,
+				    ctx, cpu, count);
 	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
-		update_stats(&runtime_stalled_cycles_front_stats[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_STALLED_CYCLES_FRONT,
+				    ctx, cpu, count);
 	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_BACKEND))
-		update_stats(&runtime_stalled_cycles_back_stats[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_STALLED_CYCLES_BACK,
+				    ctx, cpu, count);
 	else if (perf_evsel__match(counter, HARDWARE, HW_BRANCH_INSTRUCTIONS))
-		update_stats(&runtime_branches_stats[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_BRANCHES, ctx, cpu, count);
 	else if (perf_evsel__match(counter, HARDWARE, HW_CACHE_REFERENCES))
-		update_stats(&runtime_cacherefs_stats[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_CACHEREFS, ctx, cpu, count);
 	else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_L1D))
-		update_stats(&runtime_l1_dcache_stats[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_L1_DCACHE, ctx, cpu, count);
 	else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_L1I))
-		update_stats(&runtime_ll_cache_stats[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_L1_ICACHE, ctx, cpu, count);
 	else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_LL))
-		update_stats(&runtime_ll_cache_stats[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_LL_CACHE, ctx, cpu, count);
 	else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_DTLB))
-		update_stats(&runtime_dtlb_cache_stats[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_DTLB_CACHE, ctx, cpu, count);
 	else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_ITLB))
-		update_stats(&runtime_itlb_cache_stats[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_ITLB_CACHE, ctx, cpu, count);
 	else if (perf_stat_evsel__is(counter, SMI_NUM))
-		update_stats(&runtime_smi_num_stats[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_SMI_NUM, ctx, cpu, count);
 	else if (perf_stat_evsel__is(counter, APERF))
-		update_stats(&runtime_aperf_stats[ctx][cpu], count);
+		update_runtime_stat(stat, STAT_APERF, ctx, cpu, count);
 
 	if (counter->collect_stat) {
-		struct saved_value *v = saved_value_lookup(counter, cpu, true);
+		struct saved_value *v = saved_value_lookup(counter, cpu, true,
+							   STAT_NONE, 0, stat);
 		update_stats(&v->stats, count);
 	}
 }
@@ -395,15 +424,40 @@ void perf_stat__collect_metric_expr(struct perf_evlist *evsel_list)
 	}
 }
 
+static double runtime_stat_avg(struct runtime_stat *stat,
+			       enum stat_type type, int ctx, int cpu)
+{
+	struct saved_value *v;
+
+	v = saved_value_lookup(NULL, cpu, false, type, ctx, stat);
+	if (!v)
+		return 0.0;
+
+	return avg_stats(&v->stats);
+}
+
+static double runtime_stat_n(struct runtime_stat *stat,
+			     enum stat_type type, int ctx, int cpu)
+{
+	struct saved_value *v;
+
+	v = saved_value_lookup(NULL, cpu, false, type, ctx, stat);
+	if (!v)
+		return 0.0;
+
+	return v->stats.n;
+}
+
 static void print_stalled_cycles_frontend(int cpu,
 					  struct perf_evsel *evsel, double avg,
-					  struct perf_stat_output_ctx *out)
+					  struct perf_stat_output_ctx *out,
+					  struct runtime_stat *stat)
 {
 	double total, ratio = 0.0;
 	const char *color;
 	int ctx = evsel_context(evsel);
 
-	total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
+	total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
 
 	if (total)
 		ratio = avg / total * 100.0;
@@ -419,13 +473,14 @@ static void print_stalled_cycles_frontend(int cpu,
 
 static void print_stalled_cycles_backend(int cpu,
 					 struct perf_evsel *evsel, double avg,
-					 struct perf_stat_output_ctx *out)
+					 struct perf_stat_output_ctx *out,
+					 struct runtime_stat *stat)
 {
 	double total, ratio = 0.0;
 	const char *color;
 	int ctx = evsel_context(evsel);
 
-	total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
+	total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
 
 	if (total)
 		ratio = avg / total * 100.0;
@@ -438,13 +493,14 @@ static void print_stalled_cycles_backend(int cpu,
 static void print_branch_misses(int cpu,
 				struct perf_evsel *evsel,
 				double avg,
-				struct perf_stat_output_ctx *out)
+				struct perf_stat_output_ctx *out,
+				struct runtime_stat *stat)
 {
 	double total, ratio = 0.0;
 	const char *color;
 	int ctx = evsel_context(evsel);
 
-	total = avg_stats(&runtime_branches_stats[ctx][cpu]);
+	total = runtime_stat_avg(stat, STAT_BRANCHES, ctx, cpu);
 
 	if (total)
 		ratio = avg / total * 100.0;
@@ -457,13 +513,15 @@ static void print_branch_misses(int cpu,
 static void print_l1_dcache_misses(int cpu,
 				   struct perf_evsel *evsel,
 				   double avg,
-				   struct perf_stat_output_ctx *out)
+				   struct perf_stat_output_ctx *out,
+				   struct runtime_stat *stat)
+
 {
 	double total, ratio = 0.0;
 	const char *color;
 	int ctx = evsel_context(evsel);
 
-	total = avg_stats(&runtime_l1_dcache_stats[ctx][cpu]);
+	total = runtime_stat_avg(stat, STAT_L1_DCACHE, ctx, cpu);
 
 	if (total)
 		ratio = avg / total * 100.0;
@@ -476,13 +534,15 @@ static void print_l1_dcache_misses(int cpu,
 static void print_l1_icache_misses(int cpu,
 				   struct perf_evsel *evsel,
 				   double avg,
-				   struct perf_stat_output_ctx *out)
+				   struct perf_stat_output_ctx *out,
+				   struct runtime_stat *stat)
+
 {
 	double total, ratio = 0.0;
 	const char *color;
 	int ctx = evsel_context(evsel);
 
-	total = avg_stats(&runtime_l1_icache_stats[ctx][cpu]);
+	total = runtime_stat_avg(stat, STAT_L1_ICACHE, ctx, cpu);
 
 	if (total)
 		ratio = avg / total * 100.0;
@@ -494,13 +554,14 @@ static void print_l1_icache_misses(int cpu,
 static void print_dtlb_cache_misses(int cpu,
 				    struct perf_evsel *evsel,
 				    double avg,
-				    struct perf_stat_output_ctx *out)
+				    struct perf_stat_output_ctx *out,
+				    struct runtime_stat *stat)
 {
 	double total, ratio = 0.0;
 	const char *color;
 	int ctx = evsel_context(evsel);
 
-	total = avg_stats(&runtime_dtlb_cache_stats[ctx][cpu]);
+	total = runtime_stat_avg(stat, STAT_DTLB_CACHE, ctx, cpu);
 
 	if (total)
 		ratio = avg / total * 100.0;
@@ -512,13 +573,14 @@ static void print_dtlb_cache_misses(int cpu,
 static void print_itlb_cache_misses(int cpu,
 				    struct perf_evsel *evsel,
 				    double avg,
-				    struct perf_stat_output_ctx *out)
+				    struct perf_stat_output_ctx *out,
+				    struct runtime_stat *stat)
 {
 	double total, ratio = 0.0;
 	const char *color;
 	int ctx = evsel_context(evsel);
 
-	total = avg_stats(&runtime_itlb_cache_stats[ctx][cpu]);
+	total = runtime_stat_avg(stat, STAT_ITLB_CACHE, ctx, cpu);
 
 	if (total)
 		ratio = avg / total * 100.0;
@@ -530,13 +592,14 @@ static void print_itlb_cache_misses(int cpu,
 static void print_ll_cache_misses(int cpu,
 				  struct perf_evsel *evsel,
 				  double avg,
-				  struct perf_stat_output_ctx *out)
+				  struct perf_stat_output_ctx *out,
+				  struct runtime_stat *stat)
 {
 	double total, ratio = 0.0;
 	const char *color;
 	int ctx = evsel_context(evsel);
 
-	total = avg_stats(&runtime_ll_cache_stats[ctx][cpu]);
+	total = runtime_stat_avg(stat, STAT_LL_CACHE, ctx, cpu);
 
 	if (total)
 		ratio = avg / total * 100.0;
@@ -594,68 +657,72 @@ static double sanitize_val(double x)
 	return x;
 }
 
-static double td_total_slots(int ctx, int cpu)
+static double td_total_slots(int ctx, int cpu, struct runtime_stat *stat)
 {
-	return avg_stats(&runtime_topdown_total_slots[ctx][cpu]);
+	return runtime_stat_avg(stat, STAT_TOPDOWN_TOTAL_SLOTS, ctx, cpu);
 }
 
-static double td_bad_spec(int ctx, int cpu)
+static double td_bad_spec(int ctx, int cpu, struct runtime_stat *stat)
 {
 	double bad_spec = 0;
 	double total_slots;
 	double total;
 
-	total = avg_stats(&runtime_topdown_slots_issued[ctx][cpu]) -
-		avg_stats(&runtime_topdown_slots_retired[ctx][cpu]) +
-		avg_stats(&runtime_topdown_recovery_bubbles[ctx][cpu]);
-	total_slots = td_total_slots(ctx, cpu);
+	total = runtime_stat_avg(stat, STAT_TOPDOWN_SLOTS_ISSUED, ctx, cpu) -
+		runtime_stat_avg(stat, STAT_TOPDOWN_SLOTS_RETIRED, ctx, cpu) +
+		runtime_stat_avg(stat, STAT_TOPDOWN_RECOVERY_BUBBLES, ctx, cpu);
+
+	total_slots = td_total_slots(ctx, cpu, stat);
 	if (total_slots)
 		bad_spec = total / total_slots;
 	return sanitize_val(bad_spec);
 }
 
-static double td_retiring(int ctx, int cpu)
+static double td_retiring(int ctx, int cpu, struct runtime_stat *stat)
 {
 	double retiring = 0;
-	double total_slots = td_total_slots(ctx, cpu);
-	double ret_slots = avg_stats(&runtime_topdown_slots_retired[ctx][cpu]);
+	double total_slots = td_total_slots(ctx, cpu, stat);
+	double ret_slots = runtime_stat_avg(stat, STAT_TOPDOWN_SLOTS_RETIRED,
+					    ctx, cpu);
 
 	if (total_slots)
 		retiring = ret_slots / total_slots;
 	return retiring;
 }
 
-static double td_fe_bound(int ctx, int cpu)
+static double td_fe_bound(int ctx, int cpu, struct runtime_stat *stat)
 {
 	double fe_bound = 0;
-	double total_slots = td_total_slots(ctx, cpu);
-	double fetch_bub = avg_stats(&runtime_topdown_fetch_bubbles[ctx][cpu]);
+	double total_slots = td_total_slots(ctx, cpu, stat);
+	double fetch_bub = runtime_stat_avg(stat, STAT_TOPDOWN_FETCH_BUBBLES,
+					    ctx, cpu);
 
 	if (total_slots)
 		fe_bound = fetch_bub / total_slots;
 	return fe_bound;
 }
 
-static double td_be_bound(int ctx, int cpu)
+static double td_be_bound(int ctx, int cpu, struct runtime_stat *stat)
 {
-	double sum = (td_fe_bound(ctx, cpu) +
-		      td_bad_spec(ctx, cpu) +
-		      td_retiring(ctx, cpu));
+	double sum = (td_fe_bound(ctx, cpu, stat) +
+		      td_bad_spec(ctx, cpu, stat) +
+		      td_retiring(ctx, cpu, stat));
 	if (sum == 0)
 		return 0;
 	return sanitize_val(1.0 - sum);
 }
 
 static void print_smi_cost(int cpu, struct perf_evsel *evsel,
-			   struct perf_stat_output_ctx *out)
+			   struct perf_stat_output_ctx *out,
+			   struct runtime_stat *stat)
 {
 	double smi_num, aperf, cycles, cost = 0.0;
 	int ctx = evsel_context(evsel);
 	const char *color = NULL;
 
-	smi_num = avg_stats(&runtime_smi_num_stats[ctx][cpu]);
-	aperf = avg_stats(&runtime_aperf_stats[ctx][cpu]);
-	cycles = avg_stats(&runtime_cycles_stats[ctx][cpu]);
+	smi_num = runtime_stat_avg(stat, STAT_SMI_NUM, ctx, cpu);
+	aperf = runtime_stat_avg(stat, STAT_APERF, ctx, cpu);
+	cycles = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
 
 	if ((cycles == 0) || (aperf == 0))
 		return;
@@ -675,7 +742,8 @@ static void generic_metric(const char *metric_expr,
 			   const char *metric_name,
 			   double avg,
 			   int cpu,
-			   struct perf_stat_output_ctx *out)
+			   struct perf_stat_output_ctx *out,
+			   struct runtime_stat *stat)
 {
 	print_metric_t print_metric = out->print_metric;
 	struct parse_ctx pctx;
@@ -694,7 +762,8 @@ static void generic_metric(const char *metric_expr,
 			stats = &walltime_nsecs_stats;
 			scale = 1e-9;
 		} else {
-			v = saved_value_lookup(metric_events[i], cpu, false);
+			v = saved_value_lookup(metric_events[i], cpu, false,
+					       STAT_NONE, 0, stat);
 			if (!v)
 				break;
 			stats = &v->stats;
@@ -722,7 +791,8 @@ static void generic_metric(const char *metric_expr,
 void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 				   double avg, int cpu,
 				   struct perf_stat_output_ctx *out,
-				   struct rblist *metric_events)
+				   struct rblist *metric_events,
+				   struct runtime_stat *stat)
 {
 	void *ctxp = out->ctx;
 	print_metric_t print_metric = out->print_metric;
@@ -733,7 +803,8 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 	int num = 1;
 
 	if (perf_evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) {
-		total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
+		total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
+
 		if (total) {
 			ratio = avg / total;
 			print_metric(ctxp, NULL, "%7.2f ",
@@ -741,8 +812,13 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 		} else {
 			print_metric(ctxp, NULL, NULL, "insn per cycle", 0);
 		}
-		total = avg_stats(&runtime_stalled_cycles_front_stats[ctx][cpu]);
-		total = max(total, avg_stats(&runtime_stalled_cycles_back_stats[ctx][cpu]));
+
+		total = runtime_stat_avg(stat, STAT_STALLED_CYCLES_FRONT,
+					 ctx, cpu);
+
+		total = max(total, runtime_stat_avg(stat,
+						    STAT_STALLED_CYCLES_BACK,
+						    ctx, cpu));
 
 		if (total && avg) {
 			out->new_line(ctxp);
@@ -755,8 +831,8 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 				     "stalled cycles per insn", 0);
 		}
 	} else if (perf_evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES)) {
-		if (runtime_branches_stats[ctx][cpu].n != 0)
-			print_branch_misses(cpu, evsel, avg, out);
+		if (runtime_stat_n(stat, STAT_BRANCHES, ctx, cpu) != 0)
+			print_branch_misses(cpu, evsel, avg, out, stat);
 		else
 			print_metric(ctxp, NULL, NULL, "of all branches", 0);
 	} else if (
@@ -764,8 +840,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 		evsel->attr.config ==  ( PERF_COUNT_HW_CACHE_L1D |
 					((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
 					 ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
-		if (runtime_l1_dcache_stats[ctx][cpu].n != 0)
-			print_l1_dcache_misses(cpu, evsel, avg, out);
+
+		if (runtime_stat_n(stat, STAT_L1_DCACHE, ctx, cpu) != 0)
+			print_l1_dcache_misses(cpu, evsel, avg, out, stat);
 		else
 			print_metric(ctxp, NULL, NULL, "of all L1-dcache hits", 0);
 	} else if (
@@ -773,8 +850,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 		evsel->attr.config ==  ( PERF_COUNT_HW_CACHE_L1I |
 					((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
 					 ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
-		if (runtime_l1_icache_stats[ctx][cpu].n != 0)
-			print_l1_icache_misses(cpu, evsel, avg, out);
+
+		if (runtime_stat_n(stat, STAT_L1_ICACHE, ctx, cpu) != 0)
+			print_l1_icache_misses(cpu, evsel, avg, out, stat);
 		else
 			print_metric(ctxp, NULL, NULL, "of all L1-icache hits", 0);
 	} else if (
@@ -782,8 +860,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 		evsel->attr.config ==  ( PERF_COUNT_HW_CACHE_DTLB |
 					((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
 					 ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
-		if (runtime_dtlb_cache_stats[ctx][cpu].n != 0)
-			print_dtlb_cache_misses(cpu, evsel, avg, out);
+
+		if (runtime_stat_n(stat, STAT_DTLB_CACHE, ctx, cpu) != 0)
+			print_dtlb_cache_misses(cpu, evsel, avg, out, stat);
 		else
 			print_metric(ctxp, NULL, NULL, "of all dTLB cache hits", 0);
 	} else if (
@@ -791,8 +870,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 		evsel->attr.config ==  ( PERF_COUNT_HW_CACHE_ITLB |
 					((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
 					 ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
-		if (runtime_itlb_cache_stats[ctx][cpu].n != 0)
-			print_itlb_cache_misses(cpu, evsel, avg, out);
+
+		if (runtime_stat_n(stat, STAT_ITLB_CACHE, ctx, cpu) != 0)
+			print_itlb_cache_misses(cpu, evsel, avg, out, stat);
 		else
 			print_metric(ctxp, NULL, NULL, "of all iTLB cache hits", 0);
 	} else if (
@@ -800,27 +880,28 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 		evsel->attr.config ==  ( PERF_COUNT_HW_CACHE_LL |
 					((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
 					 ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
-		if (runtime_ll_cache_stats[ctx][cpu].n != 0)
-			print_ll_cache_misses(cpu, evsel, avg, out);
+
+		if (runtime_stat_n(stat, STAT_LL_CACHE, ctx, cpu) != 0)
+			print_ll_cache_misses(cpu, evsel, avg, out, stat);
 		else
 			print_metric(ctxp, NULL, NULL, "of all LL-cache hits", 0);
 	} else if (perf_evsel__match(evsel, HARDWARE, HW_CACHE_MISSES)) {
-		total = avg_stats(&runtime_cacherefs_stats[ctx][cpu]);
+		total = runtime_stat_avg(stat, STAT_CACHEREFS, ctx, cpu);
 
 		if (total)
 			ratio = avg * 100 / total;
 
-		if (runtime_cacherefs_stats[ctx][cpu].n != 0)
+		if (runtime_stat_n(stat, STAT_CACHEREFS, ctx, cpu) != 0)
 			print_metric(ctxp, NULL, "%8.3f %%",
 				     "of all cache refs", ratio);
 		else
 			print_metric(ctxp, NULL, NULL, "of all cache refs", 0);
 	} else if (perf_evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_FRONTEND)) {
-		print_stalled_cycles_frontend(cpu, evsel, avg, out);
+		print_stalled_cycles_frontend(cpu, evsel, avg, out, stat);
 	} else if (perf_evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND)) {
-		print_stalled_cycles_backend(cpu, evsel, avg, out);
+		print_stalled_cycles_backend(cpu, evsel, avg, out, stat);
 	} else if (perf_evsel__match(evsel, HARDWARE, HW_CPU_CYCLES)) {
-		total = avg_stats(&runtime_nsecs_stats[cpu]);
+		total = runtime_stat_avg(stat, STAT_NSECS, 0, cpu);
 
 		if (total) {
 			ratio = avg / total;
@@ -829,7 +910,8 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 			print_metric(ctxp, NULL, NULL, "Ghz", 0);
 		}
 	} else if (perf_stat_evsel__is(evsel, CYCLES_IN_TX)) {
-		total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
+		total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
+
 		if (total)
 			print_metric(ctxp, NULL,
 					"%7.2f%%", "transactional cycles",
@@ -838,8 +920,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 			print_metric(ctxp, NULL, NULL, "transactional cycles",
 				     0);
 	} else if (perf_stat_evsel__is(evsel, CYCLES_IN_TX_CP)) {
-		total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
-		total2 = avg_stats(&runtime_cycles_in_tx_stats[ctx][cpu]);
+		total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
+		total2 = runtime_stat_avg(stat, STAT_CYCLES_IN_TX, ctx, cpu);
+
 		if (total2 < avg)
 			total2 = avg;
 		if (total)
@@ -848,19 +931,21 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 		else
 			print_metric(ctxp, NULL, NULL, "aborted cycles", 0);
 	} else if (perf_stat_evsel__is(evsel, TRANSACTION_START)) {
-		total = avg_stats(&runtime_cycles_in_tx_stats[ctx][cpu]);
+		total = runtime_stat_avg(stat, STAT_CYCLES_IN_TX,
+					 ctx, cpu);
 
 		if (avg)
 			ratio = total / avg;
 
-		if (runtime_cycles_in_tx_stats[ctx][cpu].n != 0)
+		if (runtime_stat_n(stat, STAT_CYCLES_IN_TX, ctx, cpu) != 0)
 			print_metric(ctxp, NULL, "%8.0f",
 				     "cycles / transaction", ratio);
 		else
 			print_metric(ctxp, NULL, NULL, "cycles / transaction",
-				     0);
+				      0);
 	} else if (perf_stat_evsel__is(evsel, ELISION_START)) {
-		total = avg_stats(&runtime_cycles_in_tx_stats[ctx][cpu]);
+		total = runtime_stat_avg(stat, STAT_CYCLES_IN_TX,
+					 ctx, cpu);
 
 		if (avg)
 			ratio = total / avg;
@@ -874,28 +959,28 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 		else
 			print_metric(ctxp, NULL, NULL, "CPUs utilized", 0);
 	} else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_BUBBLES)) {
-		double fe_bound = td_fe_bound(ctx, cpu);
+		double fe_bound = td_fe_bound(ctx, cpu, stat);
 
 		if (fe_bound > 0.2)
 			color = PERF_COLOR_RED;
 		print_metric(ctxp, color, "%8.1f%%", "frontend bound",
 				fe_bound * 100.);
 	} else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_RETIRED)) {
-		double retiring = td_retiring(ctx, cpu);
+		double retiring = td_retiring(ctx, cpu, stat);
 
 		if (retiring > 0.7)
 			color = PERF_COLOR_GREEN;
 		print_metric(ctxp, color, "%8.1f%%", "retiring",
 				retiring * 100.);
 	} else if (perf_stat_evsel__is(evsel, TOPDOWN_RECOVERY_BUBBLES)) {
-		double bad_spec = td_bad_spec(ctx, cpu);
+		double bad_spec = td_bad_spec(ctx, cpu, stat);
 
 		if (bad_spec > 0.1)
 			color = PERF_COLOR_RED;
 		print_metric(ctxp, color, "%8.1f%%", "bad speculation",
 				bad_spec * 100.);
 	} else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_ISSUED)) {
-		double be_bound = td_be_bound(ctx, cpu);
+		double be_bound = td_be_bound(ctx, cpu, stat);
 		const char *name = "backend bound";
 		static int have_recovery_bubbles = -1;
 
@@ -908,19 +993,19 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 
 		if (be_bound > 0.2)
 			color = PERF_COLOR_RED;
-		if (td_total_slots(ctx, cpu) > 0)
+		if (td_total_slots(ctx, cpu, stat) > 0)
 			print_metric(ctxp, color, "%8.1f%%", name,
 					be_bound * 100.);
 		else
 			print_metric(ctxp, NULL, NULL, name, 0);
 	} else if (evsel->metric_expr) {
 		generic_metric(evsel->metric_expr, evsel->metric_events, evsel->name,
-				evsel->metric_name, avg, cpu, out);
-	} else if (runtime_nsecs_stats[cpu].n != 0) {
+				evsel->metric_name, avg, cpu, out, stat);
+	} else if (runtime_stat_n(stat, STAT_NSECS, 0, cpu) != 0) {
 		char unit = 'M';
 		char unit_buf[10];
 
-		total = avg_stats(&runtime_nsecs_stats[cpu]);
+		total = runtime_stat_avg(stat, STAT_NSECS, 0, cpu);
 
 		if (total)
 			ratio = 1000.0 * avg / total;
@@ -931,7 +1016,7 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 		snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit);
 		print_metric(ctxp, NULL, "%8.3f", unit_buf, ratio);
 	} else if (perf_stat_evsel__is(evsel, SMI_NUM)) {
-		print_smi_cost(cpu, evsel, out);
+		print_smi_cost(cpu, evsel, out, stat);
 	} else {
 		num = 0;
 	}
@@ -944,7 +1029,7 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 				out->new_line(ctxp);
 			generic_metric(mexp->metric_expr, mexp->metric_events,
 					evsel->name, mexp->metric_name,
-					avg, cpu, out);
+					avg, cpu, out, stat);
 		}
 	}
 	if (num == 0)
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 151e9ef..78abfd4 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -278,9 +278,11 @@ process_counter_values(struct perf_stat_config *config, struct perf_evsel *evsel
 			perf_evsel__compute_deltas(evsel, cpu, thread, count);
 		perf_counts_values__scale(count, config->scale, NULL);
 		if (config->aggr_mode == AGGR_NONE)
-			perf_stat__update_shadow_stats(evsel, count->val, cpu);
+			perf_stat__update_shadow_stats(evsel, count->val, cpu,
+						       &rt_stat);
 		if (config->aggr_mode == AGGR_THREAD)
-			perf_stat__update_shadow_stats(evsel, count->val, 0);
+			perf_stat__update_shadow_stats(evsel, count->val, 0,
+						       &rt_stat);
 		break;
 	case AGGR_GLOBAL:
 		aggr->val += count->val;
@@ -362,7 +364,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 	/*
 	 * Save the full runtime - to allow normalization during printout:
 	 */
-	perf_stat__update_shadow_stats(counter, *count, 0);
+	perf_stat__update_shadow_stats(counter, *count, 0, &rt_stat);
 
 	return 0;
 }
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 1e2b761..b8448b1 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -130,7 +130,7 @@ void runtime_stat__exit(struct runtime_stat *stat);
 void perf_stat__init_shadow_stats(void);
 void perf_stat__reset_shadow_stats(void);
 void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
-				    int cpu);
+				    int cpu, struct runtime_stat *stat);
 struct perf_stat_output_ctx {
 	void *ctx;
 	print_metric_t print_metric;
@@ -141,7 +141,8 @@ struct perf_stat_output_ctx {
 void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 				   double avg, int cpu,
 				   struct perf_stat_output_ctx *out,
-				   struct rblist *metric_events);
+				   struct rblist *metric_events,
+				   struct runtime_stat *stat);
 void perf_stat__collect_metric_expr(struct perf_evlist *);
 
 int perf_evlist__alloc_stats(struct perf_evlist *evlist, bool alloc_raw);
-- 
2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v5 07/12] perf util: Remove a set of shadow stats static variables
  2017-12-01 10:57 [PATCH v5 00/12] perf stat: Enable '--per-thread' on all thread Jin Yao
                   ` (5 preceding siblings ...)
  2017-12-01 10:57 ` [PATCH v5 06/12] perf util: Update and print per-thread shadow stats Jin Yao
@ 2017-12-01 10:57 ` Jin Yao
  2017-12-01 10:57 ` [PATCH v5 08/12] perf stat: Allocate shadow stats buffer for threads Jin Yao
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Jin Yao @ 2017-12-01 10:57 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

Previous patches reworked the code so that it no longer accesses
these static variables directly.

This patch removes the now-unused static variables.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/stat-shadow.c | 68 ++++++++++---------------------------------
 tools/perf/util/stat.h        |  1 +
 2 files changed, 16 insertions(+), 53 deletions(-)

diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 0d34d5e..3b929fb 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -16,28 +16,6 @@
  * AGGR_NONE: Use matching CPU
  * AGGR_THREAD: Not supported?
  */
-static struct stats runtime_nsecs_stats[MAX_NR_CPUS];
-static struct stats runtime_cycles_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_stalled_cycles_front_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_stalled_cycles_back_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_branches_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_cacherefs_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_l1_dcache_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_l1_icache_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_ll_cache_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_itlb_cache_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_dtlb_cache_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_cycles_in_tx_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_transaction_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_elision_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_topdown_total_slots[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_topdown_slots_issued[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_topdown_slots_retired[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_topdown_fetch_bubbles[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_topdown_recovery_bubbles[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_smi_num_stats[NUM_CTX][MAX_NR_CPUS];
-static struct stats runtime_aperf_stats[NUM_CTX][MAX_NR_CPUS];
-static struct rblist runtime_saved_values;
 static bool have_frontend_stalled;
 
 struct runtime_stat rt_stat;
@@ -163,10 +141,6 @@ void runtime_stat__exit(struct runtime_stat *stat)
 void perf_stat__init_shadow_stats(void)
 {
 	have_frontend_stalled = pmu_have_event("cpu", "stalled-cycles-frontend");
-	rblist__init(&runtime_saved_values);
-	runtime_saved_values.node_cmp = saved_value_cmp;
-	runtime_saved_values.node_new = saved_value_new;
-	runtime_saved_values.node_delete = saved_value_delete;
 	runtime_stat__init(&rt_stat);
 }
 
@@ -188,36 +162,13 @@ static int evsel_context(struct perf_evsel *evsel)
 	return ctx;
 }
 
-void perf_stat__reset_shadow_stats(void)
+static void reset_stat(struct runtime_stat *stat)
 {
+	struct rblist *rblist;
 	struct rb_node *pos, *next;
 
-	memset(runtime_nsecs_stats, 0, sizeof(runtime_nsecs_stats));
-	memset(runtime_cycles_stats, 0, sizeof(runtime_cycles_stats));
-	memset(runtime_stalled_cycles_front_stats, 0, sizeof(runtime_stalled_cycles_front_stats));
-	memset(runtime_stalled_cycles_back_stats, 0, sizeof(runtime_stalled_cycles_back_stats));
-	memset(runtime_branches_stats, 0, sizeof(runtime_branches_stats));
-	memset(runtime_cacherefs_stats, 0, sizeof(runtime_cacherefs_stats));
-	memset(runtime_l1_dcache_stats, 0, sizeof(runtime_l1_dcache_stats));
-	memset(runtime_l1_icache_stats, 0, sizeof(runtime_l1_icache_stats));
-	memset(runtime_ll_cache_stats, 0, sizeof(runtime_ll_cache_stats));
-	memset(runtime_itlb_cache_stats, 0, sizeof(runtime_itlb_cache_stats));
-	memset(runtime_dtlb_cache_stats, 0, sizeof(runtime_dtlb_cache_stats));
-	memset(runtime_cycles_in_tx_stats, 0,
-			sizeof(runtime_cycles_in_tx_stats));
-	memset(runtime_transaction_stats, 0,
-		sizeof(runtime_transaction_stats));
-	memset(runtime_elision_stats, 0, sizeof(runtime_elision_stats));
-	memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats));
-	memset(runtime_topdown_total_slots, 0, sizeof(runtime_topdown_total_slots));
-	memset(runtime_topdown_slots_retired, 0, sizeof(runtime_topdown_slots_retired));
-	memset(runtime_topdown_slots_issued, 0, sizeof(runtime_topdown_slots_issued));
-	memset(runtime_topdown_fetch_bubbles, 0, sizeof(runtime_topdown_fetch_bubbles));
-	memset(runtime_topdown_recovery_bubbles, 0, sizeof(runtime_topdown_recovery_bubbles));
-	memset(runtime_smi_num_stats, 0, sizeof(runtime_smi_num_stats));
-	memset(runtime_aperf_stats, 0, sizeof(runtime_aperf_stats));
-
-	next = rb_first(&runtime_saved_values.entries);
+	rblist = &stat->value_list;
+	next = rb_first(&rblist->entries);
 	while (next) {
 		pos = next;
 		next = rb_next(pos);
@@ -227,6 +178,17 @@ void perf_stat__reset_shadow_stats(void)
 	}
 }
 
+void perf_stat__reset_shadow_stats(void)
+{
+	reset_stat(&rt_stat);
+	memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats));
+}
+
+void perf_stat__reset_shadow_per_stat(struct runtime_stat *stat)
+{
+	reset_stat(stat);
+}
+
 static void update_runtime_stat(struct runtime_stat *stat,
 				enum stat_type type,
 				int ctx, int cpu, u64 count)
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index b8448b1..c639c3e 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -129,6 +129,7 @@ void runtime_stat__init(struct runtime_stat *stat);
 void runtime_stat__exit(struct runtime_stat *stat);
 void perf_stat__init_shadow_stats(void);
 void perf_stat__reset_shadow_stats(void);
+void perf_stat__reset_shadow_per_stat(struct runtime_stat *stat);
 void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
 				    int cpu, struct runtime_stat *stat);
 struct perf_stat_output_ctx {
-- 
2.7.4

* [PATCH v5 08/12] perf stat: Allocate shadow stats buffer for threads
  2017-12-01 10:57 [PATCH v5 00/12] perf stat: Enable '--per-thread' on all thread Jin Yao
                   ` (6 preceding siblings ...)
  2017-12-01 10:57 ` [PATCH v5 07/12] perf util: Remove a set of shadow stats static variables Jin Yao
@ 2017-12-01 10:57 ` Jin Yao
  2017-12-01 10:57 ` [PATCH v5 09/12] perf stat: Update or print per-thread stats Jin Yao
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Jin Yao @ 2017-12-01 10:57 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

After perf_evlist__create_maps() has been executed, all threads are
available from /proc, and thread_map__nr() returns the number of
threads.

With that thread count, this patch allocates a buffer which records
the shadow stats for these threads.

The buffer pointer is saved in stat_config.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/builtin-stat.c | 46 +++++++++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/stat.h    |  2 ++
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 1edc082..8ff3348 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -214,8 +214,13 @@ static inline void diff_timespec(struct timespec *r, struct timespec *a,
 
 static void perf_stat__reset_stats(void)
 {
+	int i;
+
 	perf_evlist__reset_stats(evsel_list);
 	perf_stat__reset_shadow_stats();
+
+	for (i = 0; i < stat_config.stats_num; i++)
+		perf_stat__reset_shadow_per_stat(&stat_config.stats[i]);
 }
 
 static int create_perf_stat_counter(struct perf_evsel *evsel)
@@ -2496,6 +2501,35 @@ int process_cpu_map_event(struct perf_tool *tool,
 	return set_maps(st);
 }
 
+static int runtime_stat_new(struct perf_stat_config *config, int nthreads)
+{
+	int i;
+
+	config->stats = calloc(nthreads, sizeof(struct runtime_stat));
+	if (!config->stats)
+		return -1;
+
+	config->stats_num = nthreads;
+
+	for (i = 0; i < nthreads; i++)
+		runtime_stat__init(&config->stats[i]);
+
+	return 0;
+}
+
+static void runtime_stat_delete(struct perf_stat_config *config)
+{
+	int i;
+
+	if (!config->stats)
+		return;
+
+	for (i = 0; i < config->stats_num; i++)
+		runtime_stat__exit(&config->stats[i]);
+
+	free(config->stats);
+}
+
 static const char * const stat_report_usage[] = {
 	"perf stat report [<options>]",
 	NULL,
@@ -2751,8 +2785,15 @@ int cmd_stat(int argc, const char **argv)
 	 * Initialize thread_map with comm names,
 	 * so we could print it out on output.
 	 */
-	if (stat_config.aggr_mode == AGGR_THREAD)
+	if (stat_config.aggr_mode == AGGR_THREAD) {
 		thread_map__read_comms(evsel_list->threads);
+		if (target.system_wide) {
+			if (runtime_stat_new(&stat_config,
+				thread_map__nr(evsel_list->threads))) {
+				goto out;
+			}
+		}
+	}
 
 	if (interval && interval < 100) {
 		if (interval < 10) {
@@ -2842,5 +2883,8 @@ int cmd_stat(int argc, const char **argv)
 		sysfs__write_int(FREEZE_ON_SMI_PATH, 0);
 
 	perf_evlist__delete(evsel_list);
+
+	runtime_stat_delete(&stat_config);
+
 	return status;
 }
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index c639c3e..762a239 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -91,6 +91,8 @@ struct perf_stat_config {
 	bool		scale;
 	FILE		*output;
 	unsigned int	interval;
+	struct runtime_stat *stats;
+	int		stats_num;
 };
 
 void update_stats(struct stats *stats, u64 val);
-- 
2.7.4

* [PATCH v5 09/12] perf stat: Update or print per-thread stats
  2017-12-01 10:57 [PATCH v5 00/12] perf stat: Enable '--per-thread' on all thread Jin Yao
                   ` (7 preceding siblings ...)
  2017-12-01 10:57 ` [PATCH v5 08/12] perf stat: Allocate shadow stats buffer for threads Jin Yao
@ 2017-12-01 10:57 ` Jin Yao
  2017-12-01 10:57 ` [PATCH v5 10/12] perf util: Reuse thread_map__new_by_uid to enumerate threads from /proc Jin Yao
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Jin Yao @ 2017-12-01 10:57 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

If the 'stats' pointer in the stat_config structure is not NULL, the
per-thread stats are updated in, and printed from, that buffer;
otherwise the global rt_stat is used.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/builtin-stat.c |  9 +++++++--
 tools/perf/util/stat.c    | 11 ++++++++---
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 8ff3348..23d5618 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1373,8 +1373,13 @@ static void print_aggr_thread(struct perf_evsel *counter, char *prefix)
 			fprintf(output, "%s", prefix);
 
 		uval = val * counter->scale;
-		printout(thread, 0, counter, uval, prefix, run, ena, 1.0,
-			 &rt_stat);
+
+		if (stat_config.stats)
+			printout(thread, 0, counter, uval, prefix, run, ena,
+				 1.0, &stat_config.stats[thread]);
+		else
+			printout(thread, 0, counter, uval, prefix, run, ena,
+				 1.0, &rt_stat);
 		fputc('\n', output);
 	}
 }
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 78abfd4..32235657 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -280,9 +280,14 @@ process_counter_values(struct perf_stat_config *config, struct perf_evsel *evsel
 		if (config->aggr_mode == AGGR_NONE)
 			perf_stat__update_shadow_stats(evsel, count->val, cpu,
 						       &rt_stat);
-		if (config->aggr_mode == AGGR_THREAD)
-			perf_stat__update_shadow_stats(evsel, count->val, 0,
-						       &rt_stat);
+		if (config->aggr_mode == AGGR_THREAD) {
+			if (config->stats)
+				perf_stat__update_shadow_stats(evsel,
+					count->val, 0, &config->stats[thread]);
+			else
+				perf_stat__update_shadow_stats(evsel,
+					count->val, 0, &rt_stat);
+		}
 		break;
 	case AGGR_GLOBAL:
 		aggr->val += count->val;
-- 
2.7.4

* [PATCH v5 10/12] perf util: Reuse thread_map__new_by_uid to enumerate threads from /proc
  2017-12-01 10:57 [PATCH v5 00/12] perf stat: Enable '--per-thread' on all thread Jin Yao
                   ` (8 preceding siblings ...)
  2017-12-01 10:57 ` [PATCH v5 09/12] perf stat: Update or print per-thread stats Jin Yao
@ 2017-12-01 10:57 ` Jin Yao
  2017-12-01 14:44   ` Arnaldo Carvalho de Melo
  2017-12-01 10:57 ` [PATCH v5 11/12] perf stat: Remove --per-thread pid/tid limitation Jin Yao
  2017-12-01 10:57 ` [PATCH v5 12/12] perf stat: Resort '--per-thread' result Jin Yao
  11 siblings, 1 reply; 27+ messages in thread
From: Jin Yao @ 2017-12-01 10:57 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

Perf already has a function, thread_map__new_by_uid(), which can
enumerate all threads from /proc by uid.

This patch creates a static function, enumerate_threads(), which
reuses the common code of thread_map__new_by_uid() to enumerate
threads from /proc.

enumerate_threads() is shared by thread_map__new_by_uid() and a new
function, thread_map__new_threads(), which is called to enumerate
all threads from /proc regardless of uid.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/tests/thread-map.c |  2 +-
 tools/perf/util/evlist.c      |  3 ++-
 tools/perf/util/thread_map.c  | 19 ++++++++++++++++---
 tools/perf/util/thread_map.h  |  3 ++-
 4 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/tools/perf/tests/thread-map.c b/tools/perf/tests/thread-map.c
index dbcb6a1..4de1939 100644
--- a/tools/perf/tests/thread-map.c
+++ b/tools/perf/tests/thread-map.c
@@ -105,7 +105,7 @@ int test__thread_map_remove(struct test *test __maybe_unused, int subtest __mayb
 	TEST_ASSERT_VAL("failed to allocate map string",
 			asprintf(&str, "%d,%d", getpid(), getppid()) >= 0);
 
-	threads = thread_map__new_str(str, NULL, 0);
+	threads = thread_map__new_str(str, NULL, 0, false);
 
 	TEST_ASSERT_VAL("failed to allocate thread_map",
 			threads);
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 199bb82..05b8f2b 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1102,7 +1102,8 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
 	struct cpu_map *cpus;
 	struct thread_map *threads;
 
-	threads = thread_map__new_str(target->pid, target->tid, target->uid);
+	threads = thread_map__new_str(target->pid, target->tid, target->uid,
+				      target->per_thread);
 
 	if (!threads)
 		return -1;
diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index be0d5a7..5672268 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -92,7 +92,7 @@ struct thread_map *thread_map__new_by_tid(pid_t tid)
 	return threads;
 }
 
-struct thread_map *thread_map__new_by_uid(uid_t uid)
+static struct thread_map *enumerate_threads(uid_t uid)
 {
 	DIR *proc;
 	int max_threads = 32, items, i;
@@ -124,7 +124,7 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
 		if (stat(path, &st) != 0)
 			continue;
 
-		if (st.st_uid != uid)
+		if ((uid != UINT_MAX) && (st.st_uid != uid))
 			continue;
 
 		snprintf(path, sizeof(path), "/proc/%d/task", pid);
@@ -178,6 +178,16 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
 	goto out_closedir;
 }
 
+struct thread_map *thread_map__new_by_uid(uid_t uid)
+{
+	return enumerate_threads(uid);
+}
+
+struct thread_map *thread_map__new_threads(void)
+{
+	return enumerate_threads(UINT_MAX);
+}
+
 struct thread_map *thread_map__new(pid_t pid, pid_t tid, uid_t uid)
 {
 	if (pid != -1)
@@ -313,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char *tid_str)
 }
 
 struct thread_map *thread_map__new_str(const char *pid, const char *tid,
-				       uid_t uid)
+				       uid_t uid, bool per_thread)
 {
 	if (pid)
 		return thread_map__new_by_pid_str(pid);
@@ -321,6 +331,9 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid,
 	if (!tid && uid != UINT_MAX)
 		return thread_map__new_by_uid(uid);
 
+	if (per_thread)
+		return thread_map__new_threads();
+
 	return thread_map__new_by_tid_str(tid);
 }
 
diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h
index f158039..dc07543 100644
--- a/tools/perf/util/thread_map.h
+++ b/tools/perf/util/thread_map.h
@@ -23,6 +23,7 @@ struct thread_map *thread_map__new_dummy(void);
 struct thread_map *thread_map__new_by_pid(pid_t pid);
 struct thread_map *thread_map__new_by_tid(pid_t tid);
 struct thread_map *thread_map__new_by_uid(uid_t uid);
+struct thread_map *thread_map__new_threads(void);
 struct thread_map *thread_map__new(pid_t pid, pid_t tid, uid_t uid);
 struct thread_map *thread_map__new_event(struct thread_map_event *event);
 
@@ -30,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map);
 void thread_map__put(struct thread_map *map);
 
 struct thread_map *thread_map__new_str(const char *pid,
-		const char *tid, uid_t uid);
+		const char *tid, uid_t uid, bool per_thread);
 
 struct thread_map *thread_map__new_by_tid_str(const char *tid_str);
 
-- 
2.7.4

* [PATCH v5 11/12] perf stat: Remove --per-thread pid/tid limitation
  2017-12-01 10:57 [PATCH v5 00/12] perf stat: Enable '--per-thread' on all thread Jin Yao
                   ` (9 preceding siblings ...)
  2017-12-01 10:57 ` [PATCH v5 10/12] perf util: Reuse thread_map__new_by_uid to enumerate threads from /proc Jin Yao
@ 2017-12-01 10:57 ` Jin Yao
  2017-12-01 10:57 ` [PATCH v5 12/12] perf stat: Resort '--per-thread' result Jin Yao
  11 siblings, 0 replies; 27+ messages in thread
From: Jin Yao @ 2017-12-01 10:57 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

Currently, executing 'perf stat --per-thread' without specifying a
pid/tid makes perf return an error:

root@skl:/tmp# perf stat --per-thread
The --per-thread option is only available when monitoring via -p -t options.
    -p, --pid <pid>       stat events on existing process id
    -t, --tid <tid>       stat events on existing thread id

This patch removes that limitation. If no pid/tid is specified, all
threads are monitored (the thread list is read from /proc).

Note that a cpu_list is not supported yet, so that case is still
rejected.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/builtin-stat.c | 23 +++++++++++++++--------
 tools/perf/util/target.h  |  7 +++++++
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 23d5618..167c35c 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -277,7 +277,7 @@ static int create_perf_stat_counter(struct perf_evsel *evsel)
 			attr->enable_on_exec = 1;
 	}
 
-	if (target__has_cpu(&target))
+	if (target__has_cpu(&target) && !target__has_per_thread(&target))
 		return perf_evsel__open_per_cpu(evsel, perf_evsel__cpus(evsel));
 
 	return perf_evsel__open_per_thread(evsel, evsel_list->threads);
@@ -340,7 +340,7 @@ static int read_counter(struct perf_evsel *counter)
 	int nthreads = thread_map__nr(evsel_list->threads);
 	int ncpus, cpu, thread;
 
-	if (target__has_cpu(&target))
+	if (target__has_cpu(&target) && !target__has_per_thread(&target))
 		ncpus = perf_evsel__nr_cpus(counter);
 	else
 		ncpus = 1;
@@ -2744,12 +2744,16 @@ int cmd_stat(int argc, const char **argv)
 		run_count = 1;
 	}
 
-	if ((stat_config.aggr_mode == AGGR_THREAD) && !target__has_task(&target)) {
-		fprintf(stderr, "The --per-thread option is only available "
-			"when monitoring via -p -t options.\n");
-		parse_options_usage(NULL, stat_options, "p", 1);
-		parse_options_usage(NULL, stat_options, "t", 1);
-		goto out;
+	if ((stat_config.aggr_mode == AGGR_THREAD) &&
+		!target__has_task(&target)) {
+		if (!target.system_wide || target.cpu_list) {
+			fprintf(stderr, "The --per-thread option is only "
+				"available when monitoring via -p -t -a "
+				"options or only --per-thread.\n");
+			parse_options_usage(NULL, stat_options, "p", 1);
+			parse_options_usage(NULL, stat_options, "t", 1);
+			goto out;
+		}
 	}
 
 	/*
@@ -2773,6 +2777,9 @@ int cmd_stat(int argc, const char **argv)
 
 	target__validate(&target);
 
+	if ((stat_config.aggr_mode == AGGR_THREAD) && (target.system_wide))
+		target.per_thread = true;
+
 	if (perf_evlist__create_maps(evsel_list, &target) < 0) {
 		if (target__has_task(&target)) {
 			pr_err("Problems finding threads of monitor\n");
diff --git a/tools/perf/util/target.h b/tools/perf/util/target.h
index 446aa7a..6ef01a8 100644
--- a/tools/perf/util/target.h
+++ b/tools/perf/util/target.h
@@ -64,6 +64,11 @@ static inline bool target__none(struct target *target)
 	return !target__has_task(target) && !target__has_cpu(target);
 }
 
+static inline bool target__has_per_thread(struct target *target)
+{
+	return target->system_wide && target->per_thread;
+}
+
 static inline bool target__uses_dummy_map(struct target *target)
 {
 	bool use_dummy = false;
@@ -73,6 +78,8 @@ static inline bool target__uses_dummy_map(struct target *target)
 	else if (target__has_task(target) ||
 	         (!target__has_cpu(target) && !target->uses_mmap))
 		use_dummy = true;
+	else if (target__has_per_thread(target))
+		use_dummy = true;
 
 	return use_dummy;
 }
-- 
2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v5 12/12] perf stat: Resort '--per-thread' result
  2017-12-01 10:57 [PATCH v5 00/12] perf stat: Enable '--per-thread' on all thread Jin Yao
                   ` (10 preceding siblings ...)
  2017-12-01 10:57 ` [PATCH v5 11/12] perf stat: Remove --per-thread pid/tid limitation Jin Yao
@ 2017-12-01 10:57 ` Jin Yao
  11 siblings, 0 replies; 27+ messages in thread
From: Jin Yao @ 2017-12-01 10:57 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

There are many threads reported when '--per-thread' is enabled
globally.

1. Most of these threads are either not counted or have a counting
value of 0. This patch removes them from the output.

2. The threads in the display are also resorted according to the
counting value, making it easy for the user to spot the hottest
threads.

For example, the new results would be:

root@skl:/tmp# perf stat --per-thread
^C
 Performance counter stats for 'system wide':

            perf-24165              4.302433      cpu-clock (msec)          #    0.001 CPUs utilized
          vmstat-23127              1.562215      cpu-clock (msec)          #    0.000 CPUs utilized
      irqbalance-2780               0.827851      cpu-clock (msec)          #    0.000 CPUs utilized
            sshd-23111              0.278308      cpu-clock (msec)          #    0.000 CPUs utilized
        thermald-2841               0.230880      cpu-clock (msec)          #    0.000 CPUs utilized
            sshd-23058              0.207306      cpu-clock (msec)          #    0.000 CPUs utilized
     kworker/0:2-19991              0.133983      cpu-clock (msec)          #    0.000 CPUs utilized
   kworker/u16:1-18249              0.125636      cpu-clock (msec)          #    0.000 CPUs utilized
       rcu_sched-8                  0.085533      cpu-clock (msec)          #    0.000 CPUs utilized
   kworker/u16:2-23146              0.077139      cpu-clock (msec)          #    0.000 CPUs utilized
           gmain-2700               0.041789      cpu-clock (msec)          #    0.000 CPUs utilized
     kworker/4:1-15354              0.028370      cpu-clock (msec)          #    0.000 CPUs utilized
     kworker/6:0-17528              0.023895      cpu-clock (msec)          #    0.000 CPUs utilized
    kworker/4:1H-1887               0.013209      cpu-clock (msec)          #    0.000 CPUs utilized
     kworker/5:2-31362              0.011627      cpu-clock (msec)          #    0.000 CPUs utilized
      watchdog/0-11                 0.010892      cpu-clock (msec)          #    0.000 CPUs utilized
     kworker/3:2-12870              0.010220      cpu-clock (msec)          #    0.000 CPUs utilized
     ksoftirqd/0-7                  0.008869      cpu-clock (msec)          #    0.000 CPUs utilized
      watchdog/1-14                 0.008476      cpu-clock (msec)          #    0.000 CPUs utilized
      watchdog/7-50                 0.002944      cpu-clock (msec)          #    0.000 CPUs utilized
      watchdog/3-26                 0.002893      cpu-clock (msec)          #    0.000 CPUs utilized
      watchdog/4-32                 0.002759      cpu-clock (msec)          #    0.000 CPUs utilized
      watchdog/2-20                 0.002429      cpu-clock (msec)          #    0.000 CPUs utilized
      watchdog/6-44                 0.001491      cpu-clock (msec)          #    0.000 CPUs utilized
      watchdog/5-38                 0.001477      cpu-clock (msec)          #    0.000 CPUs utilized
       rcu_sched-8                        10      context-switches          #    0.117 M/sec
   kworker/u16:1-18249                     7      context-switches          #    0.056 M/sec
            sshd-23111                     4      context-switches          #    0.014 M/sec
          vmstat-23127                     4      context-switches          #    0.003 M/sec
            perf-24165                     4      context-switches          #    0.930 K/sec
     kworker/0:2-19991                     3      context-switches          #    0.022 M/sec
   kworker/u16:2-23146                     3      context-switches          #    0.039 M/sec
     kworker/4:1-15354                     2      context-switches          #    0.070 M/sec
     kworker/6:0-17528                     2      context-switches          #    0.084 M/sec
            sshd-23058                     2      context-switches          #    0.010 M/sec
     ksoftirqd/0-7                         1      context-switches          #    0.113 M/sec
      watchdog/0-11                        1      context-switches          #    0.092 M/sec
      watchdog/1-14                        1      context-switches          #    0.118 M/sec
      watchdog/2-20                        1      context-switches          #    0.412 M/sec
      watchdog/3-26                        1      context-switches          #    0.346 M/sec
      watchdog/4-32                        1      context-switches          #    0.362 M/sec
      watchdog/5-38                        1      context-switches          #    0.677 M/sec
      watchdog/6-44                        1      context-switches          #    0.671 M/sec
      watchdog/7-50                        1      context-switches          #    0.340 M/sec
    kworker/4:1H-1887                      1      context-switches          #    0.076 M/sec
        thermald-2841                      1      context-switches          #    0.004 M/sec
           gmain-2700                      1      context-switches          #    0.024 M/sec
      irqbalance-2780                      1      context-switches          #    0.001 M/sec
     kworker/3:2-12870                     1      context-switches          #    0.098 M/sec
     kworker/5:2-31362                     1      context-switches          #    0.086 M/sec
   kworker/u16:1-18249                     2      cpu-migrations            #    0.016 M/sec
   kworker/u16:2-23146                     2      cpu-migrations            #    0.026 M/sec
       rcu_sched-8                         1      cpu-migrations            #    0.012 M/sec
            sshd-23058                     1      cpu-migrations            #    0.005 M/sec
            perf-24165             8,833,385      cycles                    #    2.053 GHz
          vmstat-23127             1,702,699      cycles                    #    1.090 GHz
      irqbalance-2780                739,847      cycles                    #    0.894 GHz
            sshd-23111               269,506      cycles                    #    0.968 GHz
        thermald-2841                204,556      cycles                    #    0.886 GHz
            sshd-23058               158,780      cycles                    #    0.766 GHz
     kworker/0:2-19991               112,981      cycles                    #    0.843 GHz
   kworker/u16:1-18249               100,926      cycles                    #    0.803 GHz
       rcu_sched-8                    74,024      cycles                    #    0.865 GHz
   kworker/u16:2-23146                55,984      cycles                    #    0.726 GHz
           gmain-2700                 34,278      cycles                    #    0.820 GHz
     kworker/4:1-15354                20,665      cycles                    #    0.728 GHz
     kworker/6:0-17528                16,445      cycles                    #    0.688 GHz
     kworker/5:2-31362                 9,492      cycles                    #    0.816 GHz
      watchdog/3-26                    8,695      cycles                    #    3.006 GHz
    kworker/4:1H-1887                  8,238      cycles                    #    0.624 GHz
      watchdog/4-32                    7,580      cycles                    #    2.747 GHz
     kworker/3:2-12870                 7,306      cycles                    #    0.715 GHz
      watchdog/2-20                    7,274      cycles                    #    2.995 GHz
      watchdog/0-11                    6,988      cycles                    #    0.642 GHz
     ksoftirqd/0-7                     6,376      cycles                    #    0.719 GHz
      watchdog/1-14                    5,340      cycles                    #    0.630 GHz
      watchdog/5-38                    4,061      cycles                    #    2.749 GHz
      watchdog/6-44                    3,976      cycles                    #    2.667 GHz
      watchdog/7-50                    3,418      cycles                    #    1.161 GHz
          vmstat-23127             2,511,699      instructions              #    1.48  insn per cycle
            perf-24165             1,829,908      instructions              #    0.21  insn per cycle
      irqbalance-2780              1,190,204      instructions              #    1.61  insn per cycle
        thermald-2841                143,544      instructions              #    0.70  insn per cycle
            sshd-23111               128,138      instructions              #    0.48  insn per cycle
            sshd-23058                57,654      instructions              #    0.36  insn per cycle
       rcu_sched-8                    44,063      instructions              #    0.60  insn per cycle
   kworker/u16:1-18249                42,551      instructions              #    0.42  insn per cycle
     kworker/0:2-19991                25,873      instructions              #    0.23  insn per cycle
   kworker/u16:2-23146                21,407      instructions              #    0.38  insn per cycle
           gmain-2700                 13,691      instructions              #    0.40  insn per cycle
     kworker/4:1-15354                12,964      instructions              #    0.63  insn per cycle
     kworker/6:0-17528                10,034      instructions              #    0.61  insn per cycle
     kworker/5:2-31362                 5,203      instructions              #    0.55  insn per cycle
     kworker/3:2-12870                 4,866      instructions              #    0.67  insn per cycle
    kworker/4:1H-1887                  3,586      instructions              #    0.44  insn per cycle
     ksoftirqd/0-7                     3,463      instructions              #    0.54  insn per cycle
      watchdog/0-11                    3,135      instructions              #    0.45  insn per cycle
      watchdog/1-14                    3,135      instructions              #    0.59  insn per cycle
      watchdog/2-20                    3,135      instructions              #    0.43  insn per cycle
      watchdog/3-26                    3,135      instructions              #    0.36  insn per cycle
      watchdog/4-32                    3,135      instructions              #    0.41  insn per cycle
      watchdog/5-38                    3,135      instructions              #    0.77  insn per cycle
      watchdog/6-44                    3,135      instructions              #    0.79  insn per cycle
      watchdog/7-50                    3,135      instructions              #    0.92  insn per cycle
          vmstat-23127               539,181      branches                  #  345.139 M/sec
            perf-24165               375,364      branches                  #   87.245 M/sec
      irqbalance-2780                262,092      branches                  #  316.593 M/sec
        thermald-2841                 31,611      branches                  #  136.915 M/sec
            sshd-23111                21,874      branches                  #   78.596 M/sec
            sshd-23058                10,682      branches                  #   51.528 M/sec
       rcu_sched-8                     8,693      branches                  #  101.633 M/sec
   kworker/u16:1-18249                 7,891      branches                  #   62.808 M/sec
     kworker/0:2-19991                 5,761      branches                  #   42.998 M/sec
   kworker/u16:2-23146                 4,099      branches                  #   53.138 M/sec
     kworker/4:1-15354                 2,755      branches                  #   97.110 M/sec
           gmain-2700                  2,638      branches                  #   63.127 M/sec
     kworker/6:0-17528                 2,216      branches                  #   92.739 M/sec
     kworker/5:2-31362                 1,132      branches                  #   97.360 M/sec
     kworker/3:2-12870                 1,081      branches                  #  105.773 M/sec
    kworker/4:1H-1887                    725      branches                  #   54.887 M/sec
     ksoftirqd/0-7                       707      branches                  #   79.716 M/sec
      watchdog/0-11                      652      branches                  #   59.860 M/sec
      watchdog/1-14                      652      branches                  #   76.923 M/sec
      watchdog/2-20                      652      branches                  #  268.423 M/sec
      watchdog/3-26                      652      branches                  #  225.372 M/sec
      watchdog/4-32                      652      branches                  #  236.318 M/sec
      watchdog/5-38                      652      branches                  #  441.435 M/sec
      watchdog/6-44                      652      branches                  #  437.290 M/sec
      watchdog/7-50                      652      branches                  #  221.467 M/sec
          vmstat-23127                 8,960      branch-misses             #    1.66% of all branches
      irqbalance-2780                  3,047      branch-misses             #    1.16% of all branches
            perf-24165                 2,876      branch-misses             #    0.77% of all branches
            sshd-23111                 1,843      branch-misses             #    8.43% of all branches
        thermald-2841                  1,444      branch-misses             #    4.57% of all branches
            sshd-23058                 1,379      branch-misses             #   12.91% of all branches
   kworker/u16:1-18249                   982      branch-misses             #   12.44% of all branches
       rcu_sched-8                       893      branch-misses             #   10.27% of all branches
   kworker/u16:2-23146                   578      branch-misses             #   14.10% of all branches
     kworker/0:2-19991                   376      branch-misses             #    6.53% of all branches
           gmain-2700                    280      branch-misses             #   10.61% of all branches
     kworker/6:0-17528                   196      branch-misses             #    8.84% of all branches
     kworker/4:1-15354                   187      branch-misses             #    6.79% of all branches
     kworker/5:2-31362                   123      branch-misses             #   10.87% of all branches
      watchdog/0-11                       95      branch-misses             #   14.57% of all branches
      watchdog/4-32                       89      branch-misses             #   13.65% of all branches
     kworker/3:2-12870                    80      branch-misses             #    7.40% of all branches
      watchdog/3-26                       61      branch-misses             #    9.36% of all branches
    kworker/4:1H-1887                     60      branch-misses             #    8.28% of all branches
      watchdog/2-20                       52      branch-misses             #    7.98% of all branches
     ksoftirqd/0-7                        47      branch-misses             #    6.65% of all branches
      watchdog/1-14                       46      branch-misses             #    7.06% of all branches
      watchdog/7-50                       13      branch-misses             #    1.99% of all branches
      watchdog/5-38                        8      branch-misses             #    1.23% of all branches
      watchdog/6-44                        7      branch-misses             #    1.07% of all branches

       3.695150786 seconds time elapsed

root@skl:/tmp# perf stat --per-thread -M IPC,CPI
^C

 Performance counter stats for 'system wide':

          vmstat-23127             2,000,783      inst_retired.any          #      1.5 IPC
        thermald-2841              1,472,670      inst_retired.any          #      1.3 IPC
            sshd-23111               977,374      inst_retired.any          #      1.2 IPC
            perf-24163               483,779      inst_retired.any          #      0.2 IPC
           gmain-2700                341,213      inst_retired.any          #      0.9 IPC
            sshd-23058               148,891      inst_retired.any          #      0.8 IPC
    rtkit-daemon-3288                 71,210      inst_retired.any          #      0.7 IPC
   kworker/u16:1-18249                39,562      inst_retired.any          #      0.3 IPC
       rcu_sched-8                    14,474      inst_retired.any          #      0.8 IPC
     kworker/0:2-19991                 7,659      inst_retired.any          #      0.2 IPC
     kworker/4:1-15354                 6,714      inst_retired.any          #      0.8 IPC
    rtkit-daemon-3289                  4,839      inst_retired.any          #      0.3 IPC
     kworker/6:0-17528                 3,321      inst_retired.any          #      0.6 IPC
     kworker/5:2-31362                 3,215      inst_retired.any          #      0.5 IPC
     kworker/7:2-23145                 3,173      inst_retired.any          #      0.7 IPC
    kworker/4:1H-1887                  1,719      inst_retired.any          #      0.3 IPC
      watchdog/0-11                    1,479      inst_retired.any          #      0.3 IPC
      watchdog/1-14                    1,479      inst_retired.any          #      0.3 IPC
      watchdog/2-20                    1,479      inst_retired.any          #      0.4 IPC
      watchdog/3-26                    1,479      inst_retired.any          #      0.4 IPC
      watchdog/4-32                    1,479      inst_retired.any          #      0.3 IPC
      watchdog/5-38                    1,479      inst_retired.any          #      0.3 IPC
      watchdog/6-44                    1,479      inst_retired.any          #      0.7 IPC
      watchdog/7-50                    1,479      inst_retired.any          #      0.7 IPC
   kworker/u16:2-23146                 1,408      inst_retired.any          #      0.5 IPC
            perf-24163             2,249,872      cpu_clk_unhalted.thread
          vmstat-23127             1,352,455      cpu_clk_unhalted.thread
        thermald-2841              1,161,140      cpu_clk_unhalted.thread
            sshd-23111               807,827      cpu_clk_unhalted.thread
           gmain-2700                375,535      cpu_clk_unhalted.thread
            sshd-23058               194,071      cpu_clk_unhalted.thread
   kworker/u16:1-18249               114,306      cpu_clk_unhalted.thread
    rtkit-daemon-3288                103,547      cpu_clk_unhalted.thread
     kworker/0:2-19991                46,550      cpu_clk_unhalted.thread
       rcu_sched-8                    18,855      cpu_clk_unhalted.thread
    rtkit-daemon-3289                 17,549      cpu_clk_unhalted.thread
     kworker/4:1-15354                 8,812      cpu_clk_unhalted.thread
     kworker/5:2-31362                 6,812      cpu_clk_unhalted.thread
    kworker/4:1H-1887                  5,270      cpu_clk_unhalted.thread
     kworker/6:0-17528                 5,111      cpu_clk_unhalted.thread
     kworker/7:2-23145                 4,667      cpu_clk_unhalted.thread
      watchdog/0-11                    4,663      cpu_clk_unhalted.thread
      watchdog/1-14                    4,663      cpu_clk_unhalted.thread
      watchdog/4-32                    4,626      cpu_clk_unhalted.thread
      watchdog/5-38                    4,403      cpu_clk_unhalted.thread
      watchdog/3-26                    3,936      cpu_clk_unhalted.thread
      watchdog/2-20                    3,850      cpu_clk_unhalted.thread
   kworker/u16:2-23146                 2,654      cpu_clk_unhalted.thread
      watchdog/6-44                    2,017      cpu_clk_unhalted.thread
      watchdog/7-50                    2,017      cpu_clk_unhalted.thread
          vmstat-23127             2,000,783      inst_retired.any          #      0.7 CPI
        thermald-2841              1,472,670      inst_retired.any          #      0.8 CPI
            sshd-23111               977,374      inst_retired.any          #      0.8 CPI
            perf-24163               495,037      inst_retired.any          #      4.7 CPI
           gmain-2700                341,213      inst_retired.any          #      1.1 CPI
            sshd-23058               148,891      inst_retired.any          #      1.3 CPI
    rtkit-daemon-3288                 71,210      inst_retired.any          #      1.5 CPI
   kworker/u16:1-18249                39,562      inst_retired.any          #      2.9 CPI
       rcu_sched-8                    14,474      inst_retired.any          #      1.3 CPI
     kworker/0:2-19991                 7,659      inst_retired.any          #      6.1 CPI
     kworker/4:1-15354                 6,714      inst_retired.any          #      1.3 CPI
    rtkit-daemon-3289                  4,839      inst_retired.any          #      3.6 CPI
     kworker/6:0-17528                 3,321      inst_retired.any          #      1.5 CPI
     kworker/5:2-31362                 3,215      inst_retired.any          #      2.1 CPI
     kworker/7:2-23145                 3,173      inst_retired.any          #      1.5 CPI
    kworker/4:1H-1887                  1,719      inst_retired.any          #      3.1 CPI
      watchdog/0-11                    1,479      inst_retired.any          #      3.2 CPI
      watchdog/1-14                    1,479      inst_retired.any          #      3.2 CPI
      watchdog/2-20                    1,479      inst_retired.any          #      2.6 CPI
      watchdog/3-26                    1,479      inst_retired.any          #      2.7 CPI
      watchdog/4-32                    1,479      inst_retired.any          #      3.1 CPI
      watchdog/5-38                    1,479      inst_retired.any          #      3.0 CPI
      watchdog/6-44                    1,479      inst_retired.any          #      1.4 CPI
      watchdog/7-50                    1,479      inst_retired.any          #      1.4 CPI
   kworker/u16:2-23146                 1,408      inst_retired.any          #      1.9 CPI
            perf-24163             2,302,323      cycles
          vmstat-23127             1,352,455      cycles
        thermald-2841              1,161,140      cycles
            sshd-23111               807,827      cycles
           gmain-2700                375,535      cycles
            sshd-23058               194,071      cycles
   kworker/u16:1-18249               114,306      cycles
    rtkit-daemon-3288                103,547      cycles
     kworker/0:2-19991                46,550      cycles
       rcu_sched-8                    18,855      cycles
    rtkit-daemon-3289                 17,549      cycles
     kworker/4:1-15354                 8,812      cycles
     kworker/5:2-31362                 6,812      cycles
    kworker/4:1H-1887                  5,270      cycles
     kworker/6:0-17528                 5,111      cycles
     kworker/7:2-23145                 4,667      cycles
      watchdog/0-11                    4,663      cycles
      watchdog/1-14                    4,663      cycles
      watchdog/4-32                    4,626      cycles
      watchdog/5-38                    4,403      cycles
      watchdog/3-26                    3,936      cycles
      watchdog/2-20                    3,850      cycles
   kworker/u16:2-23146                 2,654      cycles
      watchdog/6-44                    2,017      cycles
      watchdog/7-50                    2,017      cycles

       2.175726600 seconds time elapsed

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/builtin-stat.c | 77 ++++++++++++++++++++++++++++++++++++++++-------
 tools/perf/util/stat.h    |  9 ++++++
 2 files changed, 75 insertions(+), 11 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 167c35c..466acab 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1352,13 +1352,24 @@ static void print_aggr(char *prefix)
 	}
 }
 
-static void print_aggr_thread(struct perf_evsel *counter, char *prefix)
+static int cmp_val(const void *a, const void *b)
 {
-	FILE *output = stat_config.output;
-	int nthreads = thread_map__nr(counter->threads);
-	int ncpus = cpu_map__nr(counter->cpus);
-	int cpu, thread;
+	return ((struct perf_aggr_thread_value *)b)->val -
+		((struct perf_aggr_thread_value *)a)->val;
+}
+
+static struct perf_aggr_thread_value *sort_aggr_thread(
+					struct perf_evsel *counter,
+					int nthreads, int ncpus,
+					int *ret)
+{
+	int cpu, thread, i = 0;
 	double uval;
+	struct perf_aggr_thread_value *buf;
+
+	buf = calloc(nthreads, sizeof(struct perf_aggr_thread_value));
+	if (!buf)
+		return NULL;
 
 	for (thread = 0; thread < nthreads; thread++) {
 		u64 ena = 0, run = 0, val = 0;
@@ -1369,19 +1380,63 @@ static void print_aggr_thread(struct perf_evsel *counter, char *prefix)
 			run += perf_counts(counter->counts, cpu, thread)->run;
 		}
 
+		uval = val * counter->scale;
+
+		/*
+		 * Skip value 0 when enabling --per-thread globally,
+		 * otherwise too many 0 output.
+		 */
+		if (uval == 0.0 && target__has_per_thread(&target))
+			continue;
+
+		buf[i].counter = counter;
+		buf[i].id = thread;
+		buf[i].uval = uval;
+		buf[i].val = val;
+		buf[i].run = run;
+		buf[i].ena = ena;
+		i++;
+	}
+
+	qsort(buf, i, sizeof(struct perf_aggr_thread_value), cmp_val);
+
+	if (ret)
+		*ret = i;
+
+	return buf;
+}
+
+static void print_aggr_thread(struct perf_evsel *counter, char *prefix)
+{
+	FILE *output = stat_config.output;
+	int nthreads = thread_map__nr(counter->threads);
+	int ncpus = cpu_map__nr(counter->cpus);
+	int thread, sorted_threads, id;
+	struct perf_aggr_thread_value *buf;
+
+	buf = sort_aggr_thread(counter, nthreads, ncpus, &sorted_threads);
+	if (!buf) {
+		perror("cannot sort aggr thread");
+		return;
+	}
+
+	for (thread = 0; thread < sorted_threads; thread++) {
 		if (prefix)
 			fprintf(output, "%s", prefix);
 
-		uval = val * counter->scale;
-
+		id = buf[thread].id;
 		if (stat_config.stats)
-			printout(thread, 0, counter, uval, prefix, run, ena,
-				 1.0, &stat_config.stats[thread]);
+			printout(id, 0, buf[thread].counter, buf[thread].uval,
+				 prefix, buf[thread].run, buf[thread].ena, 1.0,
+				 &stat_config.stats[id]);
 		else
-			printout(thread, 0, counter, uval, prefix, run, ena,
-				 1.0, &rt_stat);
+			printout(id, 0, buf[thread].counter, buf[thread].uval,
+				 prefix, buf[thread].run, buf[thread].ena, 1.0,
+				 &rt_stat);
 		fputc('\n', output);
 	}
+
+	free(buf);
 }
 
 struct caggr_data {
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 762a239..85bd638 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -112,6 +112,15 @@ static inline void init_stats(struct stats *stats)
 struct perf_evsel;
 struct perf_evlist;
 
+struct perf_aggr_thread_value {
+	struct perf_evsel *counter;
+	int id;
+	double uval;
+	u64 val;
+	u64 run;
+	u64 ena;
+};
+
 bool __perf_evsel_stat__is(struct perf_evsel *evsel,
 			   enum perf_stat_evsel_id id);
 
-- 
2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 02/12] perf util: Define a structure for runtime shadow stats
  2017-12-01 10:57 ` [PATCH v5 02/12] perf util: Define a structure for runtime shadow stats Jin Yao
@ 2017-12-01 14:02   ` Arnaldo Carvalho de Melo
  2017-12-02  4:39     ` Jin, Yao
  0 siblings, 1 reply; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-12-01 14:02 UTC (permalink / raw)
  To: Jin Yao
  Cc: jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

Em Fri, Dec 01, 2017 at 06:57:26PM +0800, Jin Yao escreveu:
> Perf has a set of static variables to record the runtime shadow
> metrics stats.
> 
> However, those static variables become a limitation when we want to
> record the runtime shadow stats per thread. This patch creates a
> structure, and the next patches will use it to update the per-thread
> runtime shadow stats.
> 
> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> ---
>  tools/perf/util/stat-shadow.c | 11 -----------
>  tools/perf/util/stat.h        | 44 ++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 43 insertions(+), 12 deletions(-)
> 
> diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
> index 855e35c..5853901 100644
> --- a/tools/perf/util/stat-shadow.c
> +++ b/tools/perf/util/stat-shadow.c
> @@ -9,17 +9,6 @@
>  #include "expr.h"
>  #include "metricgroup.h"
>  
> -enum {
> -	CTX_BIT_USER	= 1 << 0,
> -	CTX_BIT_KERNEL	= 1 << 1,
> -	CTX_BIT_HV	= 1 << 2,
> -	CTX_BIT_HOST	= 1 << 3,
> -	CTX_BIT_IDLE	= 1 << 4,
> -	CTX_BIT_MAX	= 1 << 5,
> -};
> -
> -#define NUM_CTX CTX_BIT_MAX
> -
>  /*
>   * AGGR_GLOBAL: Use CPU 0
>   * AGGR_SOCKET: Use first CPU of socket
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index eefca5c..290c51e 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -5,6 +5,8 @@
>  #include <linux/types.h>
>  #include <stdio.h>
>  #include "xyarray.h"
> +#include "evsel.h"

What is this for? You don't add anything in this patch that uses things
from evsel.h.

I'm removing it; I will fix it up later if it becomes needed.

- Arnaldo

> +#include "rblist.h"
>  
>  struct stats
>  {
> @@ -43,6 +45,47 @@ enum aggr_mode {
>  	AGGR_UNSET,
>  };
>  
> +enum {
> +	CTX_BIT_USER	= 1 << 0,
> +	CTX_BIT_KERNEL	= 1 << 1,
> +	CTX_BIT_HV	= 1 << 2,
> +	CTX_BIT_HOST	= 1 << 3,
> +	CTX_BIT_IDLE	= 1 << 4,
> +	CTX_BIT_MAX	= 1 << 5,
> +};
> +
> +#define NUM_CTX CTX_BIT_MAX
> +
> +enum stat_type {
> +	STAT_NONE = 0,
> +	STAT_NSECS,
> +	STAT_CYCLES,
> +	STAT_STALLED_CYCLES_FRONT,
> +	STAT_STALLED_CYCLES_BACK,
> +	STAT_BRANCHES,
> +	STAT_CACHEREFS,
> +	STAT_L1_DCACHE,
> +	STAT_L1_ICACHE,
> +	STAT_LL_CACHE,
> +	STAT_ITLB_CACHE,
> +	STAT_DTLB_CACHE,
> +	STAT_CYCLES_IN_TX,
> +	STAT_TRANSACTION,
> +	STAT_ELISION,
> +	STAT_TOPDOWN_TOTAL_SLOTS,
> +	STAT_TOPDOWN_SLOTS_ISSUED,
> +	STAT_TOPDOWN_SLOTS_RETIRED,
> +	STAT_TOPDOWN_FETCH_BUBBLES,
> +	STAT_TOPDOWN_RECOVERY_BUBBLES,
> +	STAT_SMI_NUM,
> +	STAT_APERF,
> +	STAT_MAX
> +};
> +
> +struct runtime_stat {
> +	struct rblist value_list;
> +};
> +
>  struct perf_stat_config {
>  	enum aggr_mode	aggr_mode;
>  	bool		scale;
> @@ -92,7 +135,6 @@ struct perf_stat_output_ctx {
>  	bool force_header;
>  };
>  
> -struct rblist;
>  void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>  				   double avg, int cpu,
>  				   struct perf_stat_output_ctx *out,
> -- 
> 2.7.4
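Each node in the runtime_stat rblist carries a 'struct stats' accumulator that is averaged when the shadow metrics are printed. As a rough sketch of that bookkeeping (simplified: perf's real 'struct stats' and update_stats() also track variance, min and max; the names below mirror the perf ones, but the code is illustrative only):

```c
/* Simplified running-average accumulator. */
struct stats {
	double mean;
	unsigned long n;
};

/* Incremental mean: mean += (val - mean) / n, so no sum overflow
 * and no second pass over the samples is needed. */
void update_stats(struct stats *s, double val)
{
	double delta;

	s->n++;
	delta = val - s->mean;
	s->mean += delta / (double)s->n;
}

double avg_stats(const struct stats *s)
{
	return s->mean;
}
```

With one such accumulator per (type, ctx, cpu/thread) key in the rblist, the per-thread case falls out of the same lookup path as the global one.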

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 03/12] perf util: Extend rbtree to support shadow stats
  2017-12-01 10:57 ` [PATCH v5 03/12] perf util: Extend rbtree to support " Jin Yao
@ 2017-12-01 14:10   ` Arnaldo Carvalho de Melo
  2017-12-02  4:40     ` Jin, Yao
  0 siblings, 1 reply; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-12-01 14:10 UTC (permalink / raw)
  To: Jin Yao
  Cc: jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

Em Fri, Dec 01, 2017 at 06:57:27PM +0800, Jin Yao escreveu:
> Previously the rbtree was used to link generic metrics.

Try to make the one-line subject more descriptive; I'm changing it to:

perf stat: Extend rbtree to support per-thread shadow stats

- Arnaldo
 
> This patch adds new ctx/type/stat fields to the rbtree keys because
> we will use this rbtree to maintain the shadow metrics, replacing
> the original set of static arrays, in order to support per-thread
> shadow stats.
> 
> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> ---
>  tools/perf/util/stat-shadow.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
> index 5853901..c53b80d 100644
> --- a/tools/perf/util/stat-shadow.c
> +++ b/tools/perf/util/stat-shadow.c
> @@ -45,7 +45,10 @@ struct stats walltime_nsecs_stats;
>  struct saved_value {
>  	struct rb_node rb_node;
>  	struct perf_evsel *evsel;
> +	enum stat_type type;
> +	int ctx;
>  	int cpu;
> +	struct runtime_stat *stat;
>  	struct stats stats;
>  };
>  
> @@ -58,6 +61,30 @@ static int saved_value_cmp(struct rb_node *rb_node, const void *entry)
>  
>  	if (a->cpu != b->cpu)
>  		return a->cpu - b->cpu;
> +
> +	/*
> +	 * Previously the rbtree was used to link generic metrics.
> +	 * The keys were evsel/cpu. Now the rbtree is extended to support
> +	 * per-thread shadow stats. For shadow stats case, the keys
> +	 * are cpu/type/ctx/stat (evsel is NULL). For generic metrics
> +	 * case, the keys are still evsel/cpu (type/ctx/stat are 0 or NULL).
> +	 */
> +	if (a->type != b->type)
> +		return a->type - b->type;
> +
> +	if (a->ctx != b->ctx)
> +		return a->ctx - b->ctx;
> +
> +	if (a->evsel == NULL && b->evsel == NULL) {
> +		if (a->stat == b->stat)
> +			return 0;
> +
> +		if ((char *)a->stat < (char *)b->stat)
> +			return -1;
> +
> +		return 1;
> +	}
> +
>  	if (a->evsel == b->evsel)
>  		return 0;
>  	if ((char *)a->evsel < (char *)b->evsel)
> -- 
> 2.7.4
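The comparison function above is a lexicographic multi-key compare: check each key in order and return on the first difference, falling back to pointer identity for the evsel/stat fields. A minimal, self-contained sketch of the pattern ('struct key' and key_cmp are hypothetical names, not the perf saved_value code):

```c
/* Hypothetical composite key mirroring the shape of the extended
 * saved_value keys: two int keys plus an owning-object pointer. */
struct key {
	int cpu;
	int type;
	int ctx;
	const void *owner;	/* evsel or runtime_stat pointer */
};

int key_cmp(const struct key *a, const struct key *b)
{
	if (a->cpu != b->cpu)
		return a->cpu - b->cpu;
	if (a->type != b->type)
		return a->type - b->type;
	if (a->ctx != b->ctx)
		return a->ctx - b->ctx;
	/* Pointers only carry identity, so compare addresses to get
	 * a stable (if arbitrary) total order, as the patch does. */
	if (a->owner == b->owner)
		return 0;
	return (const char *)a->owner < (const char *)b->owner ? -1 : 1;
}
```

Comparing the cheap integer keys first keeps the common no-match case fast, and ordering the fields consistently is what keeps old evsel/cpu lookups and new type/ctx/stat lookups coexisting in one tree.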

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 04/12] perf util: Add rbtree node_delete ops
  2017-12-01 10:57 ` [PATCH v5 04/12] perf util: Add rbtree node_delete ops Jin Yao
@ 2017-12-01 14:14   ` Arnaldo Carvalho de Melo
  2017-12-01 18:29     ` Andi Kleen
  2017-12-06 16:37   ` [tip:perf/core] perf stat: Add rbtree node_delete op tip-bot for Jin Yao
  1 sibling, 1 reply; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-12-01 14:14 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jin Yao, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel,
	ak, kan.liang, yao.jin

Em Fri, Dec 01, 2017 at 06:57:28PM +0800, Jin Yao escreveu:
> @@ -130,7 +140,7 @@ void perf_stat__init_shadow_stats(void)
>  	rblist__init(&runtime_saved_values);
>  	runtime_saved_values.node_cmp = saved_value_cmp;
>  	runtime_saved_values.node_new = saved_value_new;
> -	/* No delete for now */
> +	runtime_saved_values.node_delete = saved_value_delete;
>  }

Andi, was there some reason behind that comment? Is it safe to add it now?

- Arnaldo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 06/12] perf util: Update and print per-thread shadow stats
  2017-12-01 10:57 ` [PATCH v5 06/12] perf util: Update and print per-thread shadow stats Jin Yao
@ 2017-12-01 14:21   ` Arnaldo Carvalho de Melo
  2017-12-02  4:46     ` Jin, Yao
  0 siblings, 1 reply; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-12-01 14:21 UTC (permalink / raw)
  To: Jin Yao
  Cc: jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

Em Fri, Dec 01, 2017 at 06:57:30PM +0800, Jin Yao escreveu:
> The functions perf_stat__update_shadow_stats() and
> perf_stat__print_shadow_stats() are called to update
> and print the shadow stats kept in a set of static variables.
> 
> But the static variables are a limitation for supporting
> per-thread shadow stats.
> 
> This patch lets perf_stat__update_shadow_stats() update
> the shadow stats in an input parameter 'stat', using
> update_runtime_stat() to update the stats. It no longer
> directly updates the static variables as before.
> 
> This patch also lets perf_stat__print_shadow_stats()

When 'also' appears in a patch description, it usually means the patch
should be split in two: one for the things up to the 'also' and another
for the remaining parts.

A patch that has these stats:

5 files changed, 219 insertions(+), 120 deletions(-)

raises eyebrows :-\

I'm trying now to break it into at least two, one for printing and the
other for the rest.

- Arnaldo

> print the shadow stats from an input parameter 'stat'.
> 
> It no longer gets the values directly from the static variables.
> Instead, it now uses runtime_stat_avg() and runtime_stat_n() to
> fetch and compute the values.
> 
> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> ---
>  tools/perf/builtin-script.c   |   6 +-
>  tools/perf/builtin-stat.c     |  27 ++--
>  tools/perf/util/stat-shadow.c | 293 +++++++++++++++++++++++++++---------------
>  tools/perf/util/stat.c        |   8 +-
>  tools/perf/util/stat.h        |   5 +-
>  5 files changed, 219 insertions(+), 120 deletions(-)
> 
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 39d8b55..fac6f05 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -1548,7 +1548,8 @@ static void perf_sample__fprint_metric(struct perf_script *script,
>  	val = sample->period * evsel->scale;
>  	perf_stat__update_shadow_stats(evsel,
>  				       val,
> -				       sample->cpu);
> +				       sample->cpu,
> +				       &rt_stat);
>  	evsel_script(evsel)->val = val;
>  	if (evsel_script(evsel->leader)->gnum == evsel->leader->nr_members) {
>  		for_each_group_member (ev2, evsel->leader) {
> @@ -1556,7 +1557,8 @@ static void perf_sample__fprint_metric(struct perf_script *script,
>  						      evsel_script(ev2)->val,
>  						      sample->cpu,
>  						      &ctx,
> -						      NULL);
> +						      NULL,
> +						      &rt_stat);
>  		}
>  		evsel_script(evsel->leader)->gnum = 0;
>  	}
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index a027b47..1edc082 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1097,7 +1097,8 @@ static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
>  }
>  
>  static void printout(int id, int nr, struct perf_evsel *counter, double uval,
> -		     char *prefix, u64 run, u64 ena, double noise)
> +		     char *prefix, u64 run, u64 ena, double noise,
> +		     struct runtime_stat *stat)
>  {
>  	struct perf_stat_output_ctx out;
>  	struct outstate os = {
> @@ -1190,7 +1191,8 @@ static void printout(int id, int nr, struct perf_evsel *counter, double uval,
>  
>  	perf_stat__print_shadow_stats(counter, uval,
>  				first_shadow_cpu(counter, id),
> -				&out, &metric_events);
> +				&out, &metric_events,
> +				stat);
>  	if (!csv_output && !metric_only) {
>  		print_noise(counter, noise);
>  		print_running(run, ena);
> @@ -1214,7 +1216,8 @@ static void aggr_update_shadow(void)
>  				val += perf_counts(counter->counts, cpu, 0)->val;
>  			}
>  			perf_stat__update_shadow_stats(counter, val,
> -						       first_shadow_cpu(counter, id));
> +					first_shadow_cpu(counter, id),
> +					&rt_stat);
>  		}
>  	}
>  }
> @@ -1334,7 +1337,8 @@ static void print_aggr(char *prefix)
>  				fprintf(output, "%s", prefix);
>  
>  			uval = val * counter->scale;
> -			printout(id, nr, counter, uval, prefix, run, ena, 1.0);
> +			printout(id, nr, counter, uval, prefix, run, ena, 1.0,
> +				 &rt_stat);
>  			if (!metric_only)
>  				fputc('\n', output);
>  		}
> @@ -1364,7 +1368,8 @@ static void print_aggr_thread(struct perf_evsel *counter, char *prefix)
>  			fprintf(output, "%s", prefix);
>  
>  		uval = val * counter->scale;
> -		printout(thread, 0, counter, uval, prefix, run, ena, 1.0);
> +		printout(thread, 0, counter, uval, prefix, run, ena, 1.0,
> +			 &rt_stat);
>  		fputc('\n', output);
>  	}
>  }
> @@ -1401,7 +1406,8 @@ static void print_counter_aggr(struct perf_evsel *counter, char *prefix)
>  		fprintf(output, "%s", prefix);
>  
>  	uval = cd.avg * counter->scale;
> -	printout(-1, 0, counter, uval, prefix, cd.avg_running, cd.avg_enabled, cd.avg);
> +	printout(-1, 0, counter, uval, prefix, cd.avg_running, cd.avg_enabled,
> +		 cd.avg, &rt_stat);
>  	if (!metric_only)
>  		fprintf(output, "\n");
>  }
> @@ -1440,7 +1446,8 @@ static void print_counter(struct perf_evsel *counter, char *prefix)
>  			fprintf(output, "%s", prefix);
>  
>  		uval = val * counter->scale;
> -		printout(cpu, 0, counter, uval, prefix, run, ena, 1.0);
> +		printout(cpu, 0, counter, uval, prefix, run, ena, 1.0,
> +			 &rt_stat);
>  
>  		fputc('\n', output);
>  	}
> @@ -1472,7 +1479,8 @@ static void print_no_aggr_metric(char *prefix)
>  			run = perf_counts(counter->counts, cpu, 0)->run;
>  
>  			uval = val * counter->scale;
> -			printout(cpu, 0, counter, uval, prefix, run, ena, 1.0);
> +			printout(cpu, 0, counter, uval, prefix, run, ena, 1.0,
> +				 &rt_stat);
>  		}
>  		fputc('\n', stat_config.output);
>  	}
> @@ -1528,7 +1536,8 @@ static void print_metric_headers(const char *prefix, bool no_indent)
>  		perf_stat__print_shadow_stats(counter, 0,
>  					      0,
>  					      &out,
> -					      &metric_events);
> +					      &metric_events,
> +					      &rt_stat);
>  	}
>  	fputc('\n', stat_config.output);
>  }
> diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
> index e60c321..0d34d5e 100644
> --- a/tools/perf/util/stat-shadow.c
> +++ b/tools/perf/util/stat-shadow.c
> @@ -116,19 +116,29 @@ static void saved_value_delete(struct rblist *rblist __maybe_unused,
>  
>  static struct saved_value *saved_value_lookup(struct perf_evsel *evsel,
>  					      int cpu,
> -					      bool create)
> +					      bool create,
> +					      enum stat_type type,
> +					      int ctx,
> +					      struct runtime_stat *stat)
>  {
> +	struct rblist *rblist;
>  	struct rb_node *nd;
>  	struct saved_value dm = {
>  		.cpu = cpu,
>  		.evsel = evsel,
> +		.type = type,
> +		.ctx = ctx,
> +		.stat = stat,
>  	};
> -	nd = rblist__find(&runtime_saved_values, &dm);
> +
> +	rblist = &stat->value_list;
> +
> +	nd = rblist__find(rblist, &dm);
>  	if (nd)
>  		return container_of(nd, struct saved_value, rb_node);
>  	if (create) {
> -		rblist__add_node(&runtime_saved_values, &dm);
> -		nd = rblist__find(&runtime_saved_values, &dm);
> +		rblist__add_node(rblist, &dm);
> +		nd = rblist__find(rblist, &dm);
>  		if (nd)
>  			return container_of(nd, struct saved_value, rb_node);
>  	}
> @@ -217,13 +227,24 @@ void perf_stat__reset_shadow_stats(void)
>  	}
>  }
>  
> +static void update_runtime_stat(struct runtime_stat *stat,
> +				enum stat_type type,
> +				int ctx, int cpu, u64 count)
> +{
> +	struct saved_value *v = saved_value_lookup(NULL, cpu, true,
> +						   type, ctx, stat);
> +
> +	if (v)
> +		update_stats(&v->stats, count);
> +}
> +
>  /*
>   * Update various tracking values we maintain to print
>   * more semantic information such as miss/hit ratios,
>   * instruction rates, etc:
>   */
>  void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
> -				    int cpu)
> +				    int cpu, struct runtime_stat *stat)
>  {
>  	int ctx = evsel_context(counter);
>  
> @@ -231,50 +252,58 @@ void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
>  
>  	if (perf_evsel__match(counter, SOFTWARE, SW_TASK_CLOCK) ||
>  	    perf_evsel__match(counter, SOFTWARE, SW_CPU_CLOCK))
> -		update_stats(&runtime_nsecs_stats[cpu], count);
> +		update_runtime_stat(stat, STAT_NSECS, 0, cpu, count);
>  	else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
> -		update_stats(&runtime_cycles_stats[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_CYCLES, ctx, cpu, count);
>  	else if (perf_stat_evsel__is(counter, CYCLES_IN_TX))
> -		update_stats(&runtime_cycles_in_tx_stats[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_CYCLES_IN_TX, ctx, cpu, count);
>  	else if (perf_stat_evsel__is(counter, TRANSACTION_START))
> -		update_stats(&runtime_transaction_stats[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_TRANSACTION, ctx, cpu, count);
>  	else if (perf_stat_evsel__is(counter, ELISION_START))
> -		update_stats(&runtime_elision_stats[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_ELISION, ctx, cpu, count);
>  	else if (perf_stat_evsel__is(counter, TOPDOWN_TOTAL_SLOTS))
> -		update_stats(&runtime_topdown_total_slots[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_TOPDOWN_TOTAL_SLOTS,
> +				    ctx, cpu, count);
>  	else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_ISSUED))
> -		update_stats(&runtime_topdown_slots_issued[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_TOPDOWN_SLOTS_ISSUED,
> +				    ctx, cpu, count);
>  	else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_RETIRED))
> -		update_stats(&runtime_topdown_slots_retired[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_TOPDOWN_SLOTS_RETIRED,
> +				    ctx, cpu, count);
>  	else if (perf_stat_evsel__is(counter, TOPDOWN_FETCH_BUBBLES))
> -		update_stats(&runtime_topdown_fetch_bubbles[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_TOPDOWN_FETCH_BUBBLES,
> +				    ctx, cpu, count);
>  	else if (perf_stat_evsel__is(counter, TOPDOWN_RECOVERY_BUBBLES))
> -		update_stats(&runtime_topdown_recovery_bubbles[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_TOPDOWN_RECOVERY_BUBBLES,
> +				    ctx, cpu, count);
>  	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
> -		update_stats(&runtime_stalled_cycles_front_stats[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_STALLED_CYCLES_FRONT,
> +				    ctx, cpu, count);
>  	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_BACKEND))
> -		update_stats(&runtime_stalled_cycles_back_stats[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_STALLED_CYCLES_BACK,
> +				    ctx, cpu, count);
>  	else if (perf_evsel__match(counter, HARDWARE, HW_BRANCH_INSTRUCTIONS))
> -		update_stats(&runtime_branches_stats[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_BRANCHES, ctx, cpu, count);
>  	else if (perf_evsel__match(counter, HARDWARE, HW_CACHE_REFERENCES))
> -		update_stats(&runtime_cacherefs_stats[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_CACHEREFS, ctx, cpu, count);
>  	else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_L1D))
> -		update_stats(&runtime_l1_dcache_stats[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_L1_DCACHE, ctx, cpu, count);
>  	else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_L1I))
> -		update_stats(&runtime_ll_cache_stats[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_L1_ICACHE, ctx, cpu, count);
>  	else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_LL))
> -		update_stats(&runtime_ll_cache_stats[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_LL_CACHE, ctx, cpu, count);
>  	else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_DTLB))
> -		update_stats(&runtime_dtlb_cache_stats[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_DTLB_CACHE, ctx, cpu, count);
>  	else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_ITLB))
> -		update_stats(&runtime_itlb_cache_stats[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_ITLB_CACHE, ctx, cpu, count);
>  	else if (perf_stat_evsel__is(counter, SMI_NUM))
> -		update_stats(&runtime_smi_num_stats[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_SMI_NUM, ctx, cpu, count);
>  	else if (perf_stat_evsel__is(counter, APERF))
> -		update_stats(&runtime_aperf_stats[ctx][cpu], count);
> +		update_runtime_stat(stat, STAT_APERF, ctx, cpu, count);
>  
>  	if (counter->collect_stat) {
> -		struct saved_value *v = saved_value_lookup(counter, cpu, true);
> +		struct saved_value *v = saved_value_lookup(counter, cpu, true,
> +							   STAT_NONE, 0, stat);
>  		update_stats(&v->stats, count);
>  	}
>  }
> @@ -395,15 +424,40 @@ void perf_stat__collect_metric_expr(struct perf_evlist *evsel_list)
>  	}
>  }
>  
> +static double runtime_stat_avg(struct runtime_stat *stat,
> +			       enum stat_type type, int ctx, int cpu)
> +{
> +	struct saved_value *v;
> +
> +	v = saved_value_lookup(NULL, cpu, false, type, ctx, stat);
> +	if (!v)
> +		return 0.0;
> +
> +	return avg_stats(&v->stats);
> +}
> +
> +static double runtime_stat_n(struct runtime_stat *stat,
> +			     enum stat_type type, int ctx, int cpu)
> +{
> +	struct saved_value *v;
> +
> +	v = saved_value_lookup(NULL, cpu, false, type, ctx, stat);
> +	if (!v)
> +		return 0.0;
> +
> +	return v->stats.n;
> +}
> +
>  static void print_stalled_cycles_frontend(int cpu,
>  					  struct perf_evsel *evsel, double avg,
> -					  struct perf_stat_output_ctx *out)
> +					  struct perf_stat_output_ctx *out,
> +					  struct runtime_stat *stat)
>  {
>  	double total, ratio = 0.0;
>  	const char *color;
>  	int ctx = evsel_context(evsel);
>  
> -	total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
> +	total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
>  
>  	if (total)
>  		ratio = avg / total * 100.0;
> @@ -419,13 +473,14 @@ static void print_stalled_cycles_frontend(int cpu,
>  
>  static void print_stalled_cycles_backend(int cpu,
>  					 struct perf_evsel *evsel, double avg,
> -					 struct perf_stat_output_ctx *out)
> +					 struct perf_stat_output_ctx *out,
> +					 struct runtime_stat *stat)
>  {
>  	double total, ratio = 0.0;
>  	const char *color;
>  	int ctx = evsel_context(evsel);
>  
> -	total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
> +	total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
>  
>  	if (total)
>  		ratio = avg / total * 100.0;
> @@ -438,13 +493,14 @@ static void print_stalled_cycles_backend(int cpu,
>  static void print_branch_misses(int cpu,
>  				struct perf_evsel *evsel,
>  				double avg,
> -				struct perf_stat_output_ctx *out)
> +				struct perf_stat_output_ctx *out,
> +				struct runtime_stat *stat)
>  {
>  	double total, ratio = 0.0;
>  	const char *color;
>  	int ctx = evsel_context(evsel);
>  
> -	total = avg_stats(&runtime_branches_stats[ctx][cpu]);
> +	total = runtime_stat_avg(stat, STAT_BRANCHES, ctx, cpu);
>  
>  	if (total)
>  		ratio = avg / total * 100.0;
> @@ -457,13 +513,15 @@ static void print_branch_misses(int cpu,
>  static void print_l1_dcache_misses(int cpu,
>  				   struct perf_evsel *evsel,
>  				   double avg,
> -				   struct perf_stat_output_ctx *out)
> +				   struct perf_stat_output_ctx *out,
> +				   struct runtime_stat *stat)
> +
>  {
>  	double total, ratio = 0.0;
>  	const char *color;
>  	int ctx = evsel_context(evsel);
>  
> -	total = avg_stats(&runtime_l1_dcache_stats[ctx][cpu]);
> +	total = runtime_stat_avg(stat, STAT_L1_DCACHE, ctx, cpu);
>  
>  	if (total)
>  		ratio = avg / total * 100.0;
> @@ -476,13 +534,15 @@ static void print_l1_dcache_misses(int cpu,
>  static void print_l1_icache_misses(int cpu,
>  				   struct perf_evsel *evsel,
>  				   double avg,
> -				   struct perf_stat_output_ctx *out)
> +				   struct perf_stat_output_ctx *out,
> +				   struct runtime_stat *stat)
> +
>  {
>  	double total, ratio = 0.0;
>  	const char *color;
>  	int ctx = evsel_context(evsel);
>  
> -	total = avg_stats(&runtime_l1_icache_stats[ctx][cpu]);
> +	total = runtime_stat_avg(stat, STAT_L1_ICACHE, ctx, cpu);
>  
>  	if (total)
>  		ratio = avg / total * 100.0;
> @@ -494,13 +554,14 @@ static void print_l1_icache_misses(int cpu,
>  static void print_dtlb_cache_misses(int cpu,
>  				    struct perf_evsel *evsel,
>  				    double avg,
> -				    struct perf_stat_output_ctx *out)
> +				    struct perf_stat_output_ctx *out,
> +				    struct runtime_stat *stat)
>  {
>  	double total, ratio = 0.0;
>  	const char *color;
>  	int ctx = evsel_context(evsel);
>  
> -	total = avg_stats(&runtime_dtlb_cache_stats[ctx][cpu]);
> +	total = runtime_stat_avg(stat, STAT_DTLB_CACHE, ctx, cpu);
>  
>  	if (total)
>  		ratio = avg / total * 100.0;
> @@ -512,13 +573,14 @@ static void print_dtlb_cache_misses(int cpu,
>  static void print_itlb_cache_misses(int cpu,
>  				    struct perf_evsel *evsel,
>  				    double avg,
> -				    struct perf_stat_output_ctx *out)
> +				    struct perf_stat_output_ctx *out,
> +				    struct runtime_stat *stat)
>  {
>  	double total, ratio = 0.0;
>  	const char *color;
>  	int ctx = evsel_context(evsel);
>  
> -	total = avg_stats(&runtime_itlb_cache_stats[ctx][cpu]);
> +	total = runtime_stat_avg(stat, STAT_ITLB_CACHE, ctx, cpu);
>  
>  	if (total)
>  		ratio = avg / total * 100.0;
> @@ -530,13 +592,14 @@ static void print_itlb_cache_misses(int cpu,
>  static void print_ll_cache_misses(int cpu,
>  				  struct perf_evsel *evsel,
>  				  double avg,
> -				  struct perf_stat_output_ctx *out)
> +				  struct perf_stat_output_ctx *out,
> +				  struct runtime_stat *stat)
>  {
>  	double total, ratio = 0.0;
>  	const char *color;
>  	int ctx = evsel_context(evsel);
>  
> -	total = avg_stats(&runtime_ll_cache_stats[ctx][cpu]);
> +	total = runtime_stat_avg(stat, STAT_LL_CACHE, ctx, cpu);
>  
>  	if (total)
>  		ratio = avg / total * 100.0;
> @@ -594,68 +657,72 @@ static double sanitize_val(double x)
>  	return x;
>  }
>  
> -static double td_total_slots(int ctx, int cpu)
> +static double td_total_slots(int ctx, int cpu, struct runtime_stat *stat)
>  {
> -	return avg_stats(&runtime_topdown_total_slots[ctx][cpu]);
> +	return runtime_stat_avg(stat, STAT_TOPDOWN_TOTAL_SLOTS, ctx, cpu);
>  }
>  
> -static double td_bad_spec(int ctx, int cpu)
> +static double td_bad_spec(int ctx, int cpu, struct runtime_stat *stat)
>  {
>  	double bad_spec = 0;
>  	double total_slots;
>  	double total;
>  
> -	total = avg_stats(&runtime_topdown_slots_issued[ctx][cpu]) -
> -		avg_stats(&runtime_topdown_slots_retired[ctx][cpu]) +
> -		avg_stats(&runtime_topdown_recovery_bubbles[ctx][cpu]);
> -	total_slots = td_total_slots(ctx, cpu);
> +	total = runtime_stat_avg(stat, STAT_TOPDOWN_SLOTS_ISSUED, ctx, cpu) -
> +		runtime_stat_avg(stat, STAT_TOPDOWN_SLOTS_RETIRED, ctx, cpu) +
> +		runtime_stat_avg(stat, STAT_TOPDOWN_RECOVERY_BUBBLES, ctx, cpu);
> +
> +	total_slots = td_total_slots(ctx, cpu, stat);
>  	if (total_slots)
>  		bad_spec = total / total_slots;
>  	return sanitize_val(bad_spec);
>  }
>  
> -static double td_retiring(int ctx, int cpu)
> +static double td_retiring(int ctx, int cpu, struct runtime_stat *stat)
>  {
>  	double retiring = 0;
> -	double total_slots = td_total_slots(ctx, cpu);
> -	double ret_slots = avg_stats(&runtime_topdown_slots_retired[ctx][cpu]);
> +	double total_slots = td_total_slots(ctx, cpu, stat);
> +	double ret_slots = runtime_stat_avg(stat, STAT_TOPDOWN_SLOTS_RETIRED,
> +					    ctx, cpu);
>  
>  	if (total_slots)
>  		retiring = ret_slots / total_slots;
>  	return retiring;
>  }
>  
> -static double td_fe_bound(int ctx, int cpu)
> +static double td_fe_bound(int ctx, int cpu, struct runtime_stat *stat)
>  {
>  	double fe_bound = 0;
> -	double total_slots = td_total_slots(ctx, cpu);
> -	double fetch_bub = avg_stats(&runtime_topdown_fetch_bubbles[ctx][cpu]);
> +	double total_slots = td_total_slots(ctx, cpu, stat);
> +	double fetch_bub = runtime_stat_avg(stat, STAT_TOPDOWN_FETCH_BUBBLES,
> +					    ctx, cpu);
>  
>  	if (total_slots)
>  		fe_bound = fetch_bub / total_slots;
>  	return fe_bound;
>  }
>  
> -static double td_be_bound(int ctx, int cpu)
> +static double td_be_bound(int ctx, int cpu, struct runtime_stat *stat)
>  {
> -	double sum = (td_fe_bound(ctx, cpu) +
> -		      td_bad_spec(ctx, cpu) +
> -		      td_retiring(ctx, cpu));
> +	double sum = (td_fe_bound(ctx, cpu, stat) +
> +		      td_bad_spec(ctx, cpu, stat) +
> +		      td_retiring(ctx, cpu, stat));
>  	if (sum == 0)
>  		return 0;
>  	return sanitize_val(1.0 - sum);
>  }
>  
>  static void print_smi_cost(int cpu, struct perf_evsel *evsel,
> -			   struct perf_stat_output_ctx *out)
> +			   struct perf_stat_output_ctx *out,
> +			   struct runtime_stat *stat)
>  {
>  	double smi_num, aperf, cycles, cost = 0.0;
>  	int ctx = evsel_context(evsel);
>  	const char *color = NULL;
>  
> -	smi_num = avg_stats(&runtime_smi_num_stats[ctx][cpu]);
> -	aperf = avg_stats(&runtime_aperf_stats[ctx][cpu]);
> -	cycles = avg_stats(&runtime_cycles_stats[ctx][cpu]);
> +	smi_num = runtime_stat_avg(stat, STAT_SMI_NUM, ctx, cpu);
> +	aperf = runtime_stat_avg(stat, STAT_APERF, ctx, cpu);
> +	cycles = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
>  
>  	if ((cycles == 0) || (aperf == 0))
>  		return;
> @@ -675,7 +742,8 @@ static void generic_metric(const char *metric_expr,
>  			   const char *metric_name,
>  			   double avg,
>  			   int cpu,
> -			   struct perf_stat_output_ctx *out)
> +			   struct perf_stat_output_ctx *out,
> +			   struct runtime_stat *stat)
>  {
>  	print_metric_t print_metric = out->print_metric;
>  	struct parse_ctx pctx;
> @@ -694,7 +762,8 @@ static void generic_metric(const char *metric_expr,
>  			stats = &walltime_nsecs_stats;
>  			scale = 1e-9;
>  		} else {
> -			v = saved_value_lookup(metric_events[i], cpu, false);
> +			v = saved_value_lookup(metric_events[i], cpu, false,
> +					       STAT_NONE, 0, stat);
>  			if (!v)
>  				break;
>  			stats = &v->stats;
> @@ -722,7 +791,8 @@ static void generic_metric(const char *metric_expr,
>  void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>  				   double avg, int cpu,
>  				   struct perf_stat_output_ctx *out,
> -				   struct rblist *metric_events)
> +				   struct rblist *metric_events,
> +				   struct runtime_stat *stat)
>  {
>  	void *ctxp = out->ctx;
>  	print_metric_t print_metric = out->print_metric;
> @@ -733,7 +803,8 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>  	int num = 1;
>  
>  	if (perf_evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) {
> -		total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
> +		total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
> +
>  		if (total) {
>  			ratio = avg / total;
>  			print_metric(ctxp, NULL, "%7.2f ",
> @@ -741,8 +812,13 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>  		} else {
>  			print_metric(ctxp, NULL, NULL, "insn per cycle", 0);
>  		}
> -		total = avg_stats(&runtime_stalled_cycles_front_stats[ctx][cpu]);
> -		total = max(total, avg_stats(&runtime_stalled_cycles_back_stats[ctx][cpu]));
> +
> +		total = runtime_stat_avg(stat, STAT_STALLED_CYCLES_FRONT,
> +					 ctx, cpu);
> +
> +		total = max(total, runtime_stat_avg(stat,
> +						    STAT_STALLED_CYCLES_BACK,
> +						    ctx, cpu));
>  
>  		if (total && avg) {
>  			out->new_line(ctxp);
> @@ -755,8 +831,8 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>  				     "stalled cycles per insn", 0);
>  		}
>  	} else if (perf_evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES)) {
> -		if (runtime_branches_stats[ctx][cpu].n != 0)
> -			print_branch_misses(cpu, evsel, avg, out);
> +		if (runtime_stat_n(stat, STAT_BRANCHES, ctx, cpu) != 0)
> +			print_branch_misses(cpu, evsel, avg, out, stat);
>  		else
>  			print_metric(ctxp, NULL, NULL, "of all branches", 0);
>  	} else if (
> @@ -764,8 +840,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>  		evsel->attr.config ==  ( PERF_COUNT_HW_CACHE_L1D |
>  					((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
>  					 ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
> -		if (runtime_l1_dcache_stats[ctx][cpu].n != 0)
> -			print_l1_dcache_misses(cpu, evsel, avg, out);
> +
> +		if (runtime_stat_n(stat, STAT_L1_DCACHE, ctx, cpu) != 0)
> +			print_l1_dcache_misses(cpu, evsel, avg, out, stat);
>  		else
>  			print_metric(ctxp, NULL, NULL, "of all L1-dcache hits", 0);
>  	} else if (
> @@ -773,8 +850,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>  		evsel->attr.config ==  ( PERF_COUNT_HW_CACHE_L1I |
>  					((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
>  					 ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
> -		if (runtime_l1_icache_stats[ctx][cpu].n != 0)
> -			print_l1_icache_misses(cpu, evsel, avg, out);
> +
> +		if (runtime_stat_n(stat, STAT_L1_ICACHE, ctx, cpu) != 0)
> +			print_l1_icache_misses(cpu, evsel, avg, out, stat);
>  		else
>  			print_metric(ctxp, NULL, NULL, "of all L1-icache hits", 0);
>  	} else if (
> @@ -782,8 +860,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>  		evsel->attr.config ==  ( PERF_COUNT_HW_CACHE_DTLB |
>  					((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
>  					 ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
> -		if (runtime_dtlb_cache_stats[ctx][cpu].n != 0)
> -			print_dtlb_cache_misses(cpu, evsel, avg, out);
> +
> +		if (runtime_stat_n(stat, STAT_DTLB_CACHE, ctx, cpu) != 0)
> +			print_dtlb_cache_misses(cpu, evsel, avg, out, stat);
>  		else
>  			print_metric(ctxp, NULL, NULL, "of all dTLB cache hits", 0);
>  	} else if (
> @@ -791,8 +870,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>  		evsel->attr.config ==  ( PERF_COUNT_HW_CACHE_ITLB |
>  					((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
>  					 ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
> -		if (runtime_itlb_cache_stats[ctx][cpu].n != 0)
> -			print_itlb_cache_misses(cpu, evsel, avg, out);
> +
> +		if (runtime_stat_n(stat, STAT_ITLB_CACHE, ctx, cpu) != 0)
> +			print_itlb_cache_misses(cpu, evsel, avg, out, stat);
>  		else
>  			print_metric(ctxp, NULL, NULL, "of all iTLB cache hits", 0);
>  	} else if (
> @@ -800,27 +880,28 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>  		evsel->attr.config ==  ( PERF_COUNT_HW_CACHE_LL |
>  					((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
>  					 ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
> -		if (runtime_ll_cache_stats[ctx][cpu].n != 0)
> -			print_ll_cache_misses(cpu, evsel, avg, out);
> +
> +		if (runtime_stat_n(stat, STAT_LL_CACHE, ctx, cpu) != 0)
> +			print_ll_cache_misses(cpu, evsel, avg, out, stat);
>  		else
>  			print_metric(ctxp, NULL, NULL, "of all LL-cache hits", 0);
>  	} else if (perf_evsel__match(evsel, HARDWARE, HW_CACHE_MISSES)) {
> -		total = avg_stats(&runtime_cacherefs_stats[ctx][cpu]);
> +		total = runtime_stat_avg(stat, STAT_CACHEREFS, ctx, cpu);
>  
>  		if (total)
>  			ratio = avg * 100 / total;
>  
> -		if (runtime_cacherefs_stats[ctx][cpu].n != 0)
> +		if (runtime_stat_n(stat, STAT_CACHEREFS, ctx, cpu) != 0)
>  			print_metric(ctxp, NULL, "%8.3f %%",
>  				     "of all cache refs", ratio);
>  		else
>  			print_metric(ctxp, NULL, NULL, "of all cache refs", 0);
>  	} else if (perf_evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_FRONTEND)) {
> -		print_stalled_cycles_frontend(cpu, evsel, avg, out);
> +		print_stalled_cycles_frontend(cpu, evsel, avg, out, stat);
>  	} else if (perf_evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND)) {
> -		print_stalled_cycles_backend(cpu, evsel, avg, out);
> +		print_stalled_cycles_backend(cpu, evsel, avg, out, stat);
>  	} else if (perf_evsel__match(evsel, HARDWARE, HW_CPU_CYCLES)) {
> -		total = avg_stats(&runtime_nsecs_stats[cpu]);
> +		total = runtime_stat_avg(stat, STAT_NSECS, 0, cpu);
>  
>  		if (total) {
>  			ratio = avg / total;
> @@ -829,7 +910,8 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>  			print_metric(ctxp, NULL, NULL, "Ghz", 0);
>  		}
>  	} else if (perf_stat_evsel__is(evsel, CYCLES_IN_TX)) {
> -		total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
> +		total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
> +
>  		if (total)
>  			print_metric(ctxp, NULL,
>  					"%7.2f%%", "transactional cycles",
> @@ -838,8 +920,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>  			print_metric(ctxp, NULL, NULL, "transactional cycles",
>  				     0);
>  	} else if (perf_stat_evsel__is(evsel, CYCLES_IN_TX_CP)) {
> -		total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
> -		total2 = avg_stats(&runtime_cycles_in_tx_stats[ctx][cpu]);
> +		total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
> +		total2 = runtime_stat_avg(stat, STAT_CYCLES_IN_TX, ctx, cpu);
> +
>  		if (total2 < avg)
>  			total2 = avg;
>  		if (total)
> @@ -848,19 +931,21 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>  		else
>  			print_metric(ctxp, NULL, NULL, "aborted cycles", 0);
>  	} else if (perf_stat_evsel__is(evsel, TRANSACTION_START)) {
> -		total = avg_stats(&runtime_cycles_in_tx_stats[ctx][cpu]);
> +		total = runtime_stat_avg(stat, STAT_CYCLES_IN_TX,
> +					 ctx, cpu);
>  
>  		if (avg)
>  			ratio = total / avg;
>  
> -		if (runtime_cycles_in_tx_stats[ctx][cpu].n != 0)
> +		if (runtime_stat_n(stat, STAT_CYCLES_IN_TX, ctx, cpu) != 0)
>  			print_metric(ctxp, NULL, "%8.0f",
>  				     "cycles / transaction", ratio);
>  		else
>  			print_metric(ctxp, NULL, NULL, "cycles / transaction",
> -				     0);
> +				      0);
>  	} else if (perf_stat_evsel__is(evsel, ELISION_START)) {
> -		total = avg_stats(&runtime_cycles_in_tx_stats[ctx][cpu]);
> +		total = runtime_stat_avg(stat, STAT_CYCLES_IN_TX,
> +					 ctx, cpu);
>  
>  		if (avg)
>  			ratio = total / avg;
> @@ -874,28 +959,28 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>  		else
>  			print_metric(ctxp, NULL, NULL, "CPUs utilized", 0);
>  	} else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_BUBBLES)) {
> -		double fe_bound = td_fe_bound(ctx, cpu);
> +		double fe_bound = td_fe_bound(ctx, cpu, stat);
>  
>  		if (fe_bound > 0.2)
>  			color = PERF_COLOR_RED;
>  		print_metric(ctxp, color, "%8.1f%%", "frontend bound",
>  				fe_bound * 100.);
>  	} else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_RETIRED)) {
> -		double retiring = td_retiring(ctx, cpu);
> +		double retiring = td_retiring(ctx, cpu, stat);
>  
>  		if (retiring > 0.7)
>  			color = PERF_COLOR_GREEN;
>  		print_metric(ctxp, color, "%8.1f%%", "retiring",
>  				retiring * 100.);
>  	} else if (perf_stat_evsel__is(evsel, TOPDOWN_RECOVERY_BUBBLES)) {
> -		double bad_spec = td_bad_spec(ctx, cpu);
> +		double bad_spec = td_bad_spec(ctx, cpu, stat);
>  
>  		if (bad_spec > 0.1)
>  			color = PERF_COLOR_RED;
>  		print_metric(ctxp, color, "%8.1f%%", "bad speculation",
>  				bad_spec * 100.);
>  	} else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_ISSUED)) {
> -		double be_bound = td_be_bound(ctx, cpu);
> +		double be_bound = td_be_bound(ctx, cpu, stat);
>  		const char *name = "backend bound";
>  		static int have_recovery_bubbles = -1;
>  
> @@ -908,19 +993,19 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>  
>  		if (be_bound > 0.2)
>  			color = PERF_COLOR_RED;
> -		if (td_total_slots(ctx, cpu) > 0)
> +		if (td_total_slots(ctx, cpu, stat) > 0)
>  			print_metric(ctxp, color, "%8.1f%%", name,
>  					be_bound * 100.);
>  		else
>  			print_metric(ctxp, NULL, NULL, name, 0);
>  	} else if (evsel->metric_expr) {
>  		generic_metric(evsel->metric_expr, evsel->metric_events, evsel->name,
> -				evsel->metric_name, avg, cpu, out);
> -	} else if (runtime_nsecs_stats[cpu].n != 0) {
> +				evsel->metric_name, avg, cpu, out, stat);
> +	} else if (runtime_stat_n(stat, STAT_NSECS, 0, cpu) != 0) {
>  		char unit = 'M';
>  		char unit_buf[10];
>  
> -		total = avg_stats(&runtime_nsecs_stats[cpu]);
> +		total = runtime_stat_avg(stat, STAT_NSECS, 0, cpu);
>  
>  		if (total)
>  			ratio = 1000.0 * avg / total;
> @@ -931,7 +1016,7 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>  		snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit);
>  		print_metric(ctxp, NULL, "%8.3f", unit_buf, ratio);
>  	} else if (perf_stat_evsel__is(evsel, SMI_NUM)) {
> -		print_smi_cost(cpu, evsel, out);
> +		print_smi_cost(cpu, evsel, out, stat);
>  	} else {
>  		num = 0;
>  	}
> @@ -944,7 +1029,7 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>  				out->new_line(ctxp);
>  			generic_metric(mexp->metric_expr, mexp->metric_events,
>  					evsel->name, mexp->metric_name,
> -					avg, cpu, out);
> +					avg, cpu, out, stat);
>  		}
>  	}
>  	if (num == 0)
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 151e9ef..78abfd4 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -278,9 +278,11 @@ process_counter_values(struct perf_stat_config *config, struct perf_evsel *evsel
>  			perf_evsel__compute_deltas(evsel, cpu, thread, count);
>  		perf_counts_values__scale(count, config->scale, NULL);
>  		if (config->aggr_mode == AGGR_NONE)
> -			perf_stat__update_shadow_stats(evsel, count->val, cpu);
> +			perf_stat__update_shadow_stats(evsel, count->val, cpu,
> +						       &rt_stat);
>  		if (config->aggr_mode == AGGR_THREAD)
> -			perf_stat__update_shadow_stats(evsel, count->val, 0);
> +			perf_stat__update_shadow_stats(evsel, count->val, 0,
> +						       &rt_stat);
>  		break;
>  	case AGGR_GLOBAL:
>  		aggr->val += count->val;
> @@ -362,7 +364,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
>  	/*
>  	 * Save the full runtime - to allow normalization during printout:
>  	 */
> -	perf_stat__update_shadow_stats(counter, *count, 0);
> +	perf_stat__update_shadow_stats(counter, *count, 0, &rt_stat);
>  
>  	return 0;
>  }
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index 1e2b761..b8448b1 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -130,7 +130,7 @@ void runtime_stat__exit(struct runtime_stat *stat);
>  void perf_stat__init_shadow_stats(void);
>  void perf_stat__reset_shadow_stats(void);
>  void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
> -				    int cpu);
> +				    int cpu, struct runtime_stat *stat);
>  struct perf_stat_output_ctx {
>  	void *ctx;
>  	print_metric_t print_metric;
> @@ -141,7 +141,8 @@ struct perf_stat_output_ctx {
>  void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>  				   double avg, int cpu,
>  				   struct perf_stat_output_ctx *out,
> -				   struct rblist *metric_events);
> +				   struct rblist *metric_events,
> +				   struct runtime_stat *stat);
>  void perf_stat__collect_metric_expr(struct perf_evlist *);
>  
>  int perf_evlist__alloc_stats(struct perf_evlist *evlist, bool alloc_raw);
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 10/12] perf util: Reuse thread_map__new_by_uid to enumerate threads from /proc
  2017-12-01 10:57 ` [PATCH v5 10/12] perf util: Reuse thread_map__new_by_uid to enumerate threads from /proc Jin Yao
@ 2017-12-01 14:44   ` Arnaldo Carvalho de Melo
  2017-12-01 15:02     ` Arnaldo Carvalho de Melo
  2017-12-02  4:47     ` Jin, Yao
  0 siblings, 2 replies; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-12-01 14:44 UTC (permalink / raw)
  To: Jin Yao
  Cc: jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

Em Fri, Dec 01, 2017 at 06:57:34PM +0800, Jin Yao escreveu:
> Perf already has a function thread_map__new_by_uid() which can
> enumerate all threads from /proc by uid.
> 
> This patch creates a static function enumerate_threads() which
> reuses the common code in thread_map__new_by_uid() to enumerate
> threads from /proc.
> 
> The enumerate_threads() is shared by thread_map__new_by_uid()
> and a new function thread_map__new_threads().
> 
> The new function thread_map__new_threads() is called to enumerate
> all threads from /proc.
> 
> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> ---
>  tools/perf/tests/thread-map.c |  2 +-
>  tools/perf/util/evlist.c      |  3 ++-
>  tools/perf/util/thread_map.c  | 19 ++++++++++++++++---
>  tools/perf/util/thread_map.h  |  3 ++-
>  4 files changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/perf/tests/thread-map.c b/tools/perf/tests/thread-map.c
> index dbcb6a1..4de1939 100644
> --- a/tools/perf/tests/thread-map.c
> +++ b/tools/perf/tests/thread-map.c
> @@ -105,7 +105,7 @@ int test__thread_map_remove(struct test *test __maybe_unused, int subtest __mayb
>  	TEST_ASSERT_VAL("failed to allocate map string",
>  			asprintf(&str, "%d,%d", getpid(), getppid()) >= 0);
>  
> -	threads = thread_map__new_str(str, NULL, 0);
> +	threads = thread_map__new_str(str, NULL, 0, false);
>  
>  	TEST_ASSERT_VAL("failed to allocate thread_map",
>  			threads);
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 199bb82..05b8f2b 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -1102,7 +1102,8 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
>  	struct cpu_map *cpus;
>  	struct thread_map *threads;
>  
> -	threads = thread_map__new_str(target->pid, target->tid, target->uid);
> +	threads = thread_map__new_str(target->pid, target->tid, target->uid,
> +				      target->per_thread);
>  
>  	if (!threads)
>  		return -1;
> diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
> index be0d5a7..5672268 100644
> --- a/tools/perf/util/thread_map.c
> +++ b/tools/perf/util/thread_map.c
> @@ -92,7 +92,7 @@ struct thread_map *thread_map__new_by_tid(pid_t tid)
>  	return threads;
>  }
>  
> -struct thread_map *thread_map__new_by_uid(uid_t uid)
> +static struct thread_map *enumerate_threads(uid_t uid)
>  {
>  	DIR *proc;
>  	int max_threads = 32, items, i;
> @@ -124,7 +124,7 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
>  		if (stat(path, &st) != 0)
>  			continue;

Look, for the case where you want all threads enumerated you will incur
the above stat() cost for all of them and will not use it at all...

And new_threads() seems vague, so I'm using the term used by 'perf record'
for system-wide sampling; see the patch below:

diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index be0d5a736dea..79d11bd4543a 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -92,7 +92,7 @@ struct thread_map *thread_map__new_by_tid(pid_t tid)
 	return threads;
 }
 
-struct thread_map *thread_map__new_by_uid(uid_t uid)
+static struct thread_map *__thread_map__new_all_cpus(uid_t uid)
 {
 	DIR *proc;
 	int max_threads = 32, items, i;
@@ -113,7 +113,6 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
 	while ((dirent = readdir(proc)) != NULL) {
 		char *end;
 		bool grow = false;
-		struct stat st;
 		pid_t pid = strtol(dirent->d_name, &end, 10);
 
 		if (*end) /* only interested in proper numerical dirents */
@@ -121,11 +120,12 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
 
 		snprintf(path, sizeof(path), "/proc/%s", dirent->d_name);
 
-		if (stat(path, &st) != 0)
-			continue;
+		if (uid != UID_MAX) {
+			struct stat st;
 
-		if (st.st_uid != uid)
-			continue;
+			if (stat(path, &st) != 0 || st.st_uid != uid)
+				continue;
+		}
 
 		snprintf(path, sizeof(path), "/proc/%d/task", pid);
 		items = scandir(path, &namelist, filter, NULL);
@@ -178,6 +178,16 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
 	goto out_closedir;
 }
 
+struct thread_map *thread_map__new_all_cpus(void)
+{
+	return __thread__new_all_cpus(UID_MAX);
+}
+
+struct thread_map *thread_map__new_by_uid(uid_t uid)
+{
+	return __thread__new_all_cpus(uid);
+}
+
 struct thread_map *thread_map__new(pid_t pid, pid_t tid, uid_t uid)
 {
 	if (pid != -1)
diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h
index f15803985435..07a765fb22bb 100644
--- a/tools/perf/util/thread_map.h
+++ b/tools/perf/util/thread_map.h
@@ -23,6 +23,7 @@ struct thread_map *thread_map__new_dummy(void);
 struct thread_map *thread_map__new_by_pid(pid_t pid);
 struct thread_map *thread_map__new_by_tid(pid_t tid);
 struct thread_map *thread_map__new_by_uid(uid_t uid);
+struct thread_map *thread_map__new_all_cpus(void);
 struct thread_map *thread_map__new(pid_t pid, pid_t tid, uid_t uid);
 struct thread_map *thread_map__new_event(struct thread_map_event *event);
 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 10/12] perf util: Reuse thread_map__new_by_uid to enumerate threads from /proc
  2017-12-01 14:44   ` Arnaldo Carvalho de Melo
@ 2017-12-01 15:02     ` Arnaldo Carvalho de Melo
  2017-12-02  4:53       ` Jin, Yao
  2017-12-02  4:47     ` Jin, Yao
  1 sibling, 1 reply; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-12-01 15:02 UTC (permalink / raw)
  To: Jin Yao
  Cc: jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

Em Fri, Dec 01, 2017 at 11:44:25AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, Dec 01, 2017 at 06:57:34PM +0800, Jin Yao escreveu:
> > Perf already has a function thread_map__new_by_uid() which can
> > enumerate all threads from /proc by uid.
> > 
> > This patch creates a static function enumerate_threads() which
> > reuses the common code in thread_map__new_by_uid() to enumerate
> > threads from /proc.
> > 
> > The enumerate_threads() is shared by thread_map__new_by_uid()
> > and a new function thread_map__new_threads().
> > 
> > The new function thread_map__new_threads() is called to enumerate
> > all threads from /proc.
> > 
> > Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> > ---
> >  tools/perf/tests/thread-map.c |  2 +-
> >  tools/perf/util/evlist.c      |  3 ++-
> >  tools/perf/util/thread_map.c  | 19 ++++++++++++++++---
> >  tools/perf/util/thread_map.h  |  3 ++-
> >  4 files changed, 21 insertions(+), 6 deletions(-)
> > 
> > diff --git a/tools/perf/tests/thread-map.c b/tools/perf/tests/thread-map.c
> > index dbcb6a1..4de1939 100644
> > --- a/tools/perf/tests/thread-map.c
> > +++ b/tools/perf/tests/thread-map.c
> > @@ -105,7 +105,7 @@ int test__thread_map_remove(struct test *test __maybe_unused, int subtest __mayb
> >  	TEST_ASSERT_VAL("failed to allocate map string",
> >  			asprintf(&str, "%d,%d", getpid(), getppid()) >= 0);
> >  
> > -	threads = thread_map__new_str(str, NULL, 0);
> > +	threads = thread_map__new_str(str, NULL, 0, false);
> >  
> >  	TEST_ASSERT_VAL("failed to allocate thread_map",
> >  			threads);
> > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> > index 199bb82..05b8f2b 100644
> > --- a/tools/perf/util/evlist.c
> > +++ b/tools/perf/util/evlist.c
> > @@ -1102,7 +1102,8 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
> >  	struct cpu_map *cpus;
> >  	struct thread_map *threads;
> >  
> > -	threads = thread_map__new_str(target->pid, target->tid, target->uid);
> > +	threads = thread_map__new_str(target->pid, target->tid, target->uid,
> > +				      target->per_thread);
> >  
> >  	if (!threads)
> >  		return -1;
> > diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
> > index be0d5a7..5672268 100644
> > --- a/tools/perf/util/thread_map.c
> > +++ b/tools/perf/util/thread_map.c
> > @@ -92,7 +92,7 @@ struct thread_map *thread_map__new_by_tid(pid_t tid)
> >  	return threads;
> >  }
> >  
> > -struct thread_map *thread_map__new_by_uid(uid_t uid)
> > +static struct thread_map *enumerate_threads(uid_t uid)
> >  {
> >  	DIR *proc;
> >  	int max_threads = 32, items, i;
> > @@ -124,7 +124,7 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
> >  		if (stat(path, &st) != 0)
> >  			continue;
> 
> Look, for the case where you want all threads enumerated you will incur
> the above stat() cost for all of them and will not use it at all...
> 
> And new_threads() seems vague, so I'm using the term used by 'perf record'
> for system-wide sampling; see the patch below:
> 

The one below even compiles, I'll push what I merged already and we can
continue from there:

[acme@jouet linux]$ git log --oneline -3
5cc4fc8994eb (HEAD -> perf/core) perf thread_map: Add method to map all threads in the system
9e49c22bd4d8 perf stat: Add rbtree node_delete op
1c92e3226546 perf rblist: Create rblist__exit() function
[acme@jouet linux]$


commit 5cc4fc8994eb3f3af1950f3b726b6008900c7b06
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date:   Fri Dec 1 11:44:30 2017 -0300

    perf thread_map: Add method to map all threads in the system
    
    Reusing the thread_map__new_by_uid() proc scanning already in place to
    return a map with all threads in the system.
    
    Based-on-a-patch-by: Jin Yao <yao.jin@linux.intel.com>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Kan Liang <kan.liang@intel.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Link: https://lkml.kernel.org/n/tip-khh28q0wwqbqtrk32bfe07hd@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index be0d5a736dea..2b653853eec2 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -92,7 +92,7 @@ struct thread_map *thread_map__new_by_tid(pid_t tid)
 	return threads;
 }
 
-struct thread_map *thread_map__new_by_uid(uid_t uid)
+static struct thread_map *__thread_map__new_all_cpus(uid_t uid)
 {
 	DIR *proc;
 	int max_threads = 32, items, i;
@@ -113,7 +113,6 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
 	while ((dirent = readdir(proc)) != NULL) {
 		char *end;
 		bool grow = false;
-		struct stat st;
 		pid_t pid = strtol(dirent->d_name, &end, 10);
 
 		if (*end) /* only interested in proper numerical dirents */
@@ -121,11 +120,12 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
 
 		snprintf(path, sizeof(path), "/proc/%s", dirent->d_name);
 
-		if (stat(path, &st) != 0)
-			continue;
+		if (uid != UINT_MAX) {
+			struct stat st;
 
-		if (st.st_uid != uid)
-			continue;
+			if (stat(path, &st) != 0 || st.st_uid != uid)
+				continue;
+		}
 
 		snprintf(path, sizeof(path), "/proc/%d/task", pid);
 		items = scandir(path, &namelist, filter, NULL);
@@ -178,6 +178,16 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
 	goto out_closedir;
 }
 
+struct thread_map *thread_map__new_all_cpus(void)
+{
+	return __thread_map__new_all_cpus(UINT_MAX);
+}
+
+struct thread_map *thread_map__new_by_uid(uid_t uid)
+{
+	return __thread_map__new_all_cpus(uid);
+}
+
 struct thread_map *thread_map__new(pid_t pid, pid_t tid, uid_t uid)
 {
 	if (pid != -1)
diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h
index f15803985435..07a765fb22bb 100644
--- a/tools/perf/util/thread_map.h
+++ b/tools/perf/util/thread_map.h
@@ -23,6 +23,7 @@ struct thread_map *thread_map__new_dummy(void);
 struct thread_map *thread_map__new_by_pid(pid_t pid);
 struct thread_map *thread_map__new_by_tid(pid_t tid);
 struct thread_map *thread_map__new_by_uid(uid_t uid);
+struct thread_map *thread_map__new_all_cpus(void);
 struct thread_map *thread_map__new(pid_t pid, pid_t tid, uid_t uid);
 struct thread_map *thread_map__new_event(struct thread_map_event *event);
 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 04/12] perf util: Add rbtree node_delete ops
  2017-12-01 14:14   ` Arnaldo Carvalho de Melo
@ 2017-12-01 18:29     ` Andi Kleen
  0 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2017-12-01 18:29 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andi Kleen, Jin Yao, jolsa, peterz, mingo, alexander.shishkin,
	Linux-kernel, ak, kan.liang, yao.jin

On Fri, Dec 01, 2017 at 11:14:52AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Fri, Dec 01, 2017 at 06:57:28PM +0800, Jin Yao escreveu:
> > @@ -130,7 +140,7 @@ void perf_stat__init_shadow_stats(void)
> >  	rblist__init(&runtime_saved_values);
> >  	runtime_saved_values.node_cmp = saved_value_cmp;
> >  	runtime_saved_values.node_new = saved_value_new;
> > -	/* No delete for now */
> > +	runtime_saved_values.node_delete = saved_value_delete;
> >  }
> 
> Andi, was there some reason behind that comment? Is it safe to add it now?

It is safe to add.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 02/12] perf util: Define a structure for runtime shadow stats
  2017-12-01 14:02   ` Arnaldo Carvalho de Melo
@ 2017-12-02  4:39     ` Jin, Yao
  0 siblings, 0 replies; 27+ messages in thread
From: Jin, Yao @ 2017-12-02  4:39 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin



On 12/1/2017 10:02 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Dec 01, 2017 at 06:57:26PM +0800, Jin Yao escreveu:
>> Perf has a set of static variables to record the runtime shadow
>> metrics stats.
>>
>> But if we want to record the runtime shadow stats per thread, these
>> static variables become a limitation. This patch creates a structure,
>> and the next patches will use it to update the runtime shadow stats
>> per thread.
>>
>> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
>> ---
>>   tools/perf/util/stat-shadow.c | 11 -----------
>>   tools/perf/util/stat.h        | 44 ++++++++++++++++++++++++++++++++++++++++++-
>>   2 files changed, 43 insertions(+), 12 deletions(-)
>>
>> diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
>> index 855e35c..5853901 100644
>> --- a/tools/perf/util/stat-shadow.c
>> +++ b/tools/perf/util/stat-shadow.c
>> @@ -9,17 +9,6 @@
>>   #include "expr.h"
>>   #include "metricgroup.h"
>>   
>> -enum {
>> -	CTX_BIT_USER	= 1 << 0,
>> -	CTX_BIT_KERNEL	= 1 << 1,
>> -	CTX_BIT_HV	= 1 << 2,
>> -	CTX_BIT_HOST	= 1 << 3,
>> -	CTX_BIT_IDLE	= 1 << 4,
>> -	CTX_BIT_MAX	= 1 << 5,
>> -};
>> -
>> -#define NUM_CTX CTX_BIT_MAX
>> -
>>   /*
>>    * AGGR_GLOBAL: Use CPU 0
>>    * AGGR_SOCKET: Use first CPU of socket
>> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
>> index eefca5c..290c51e 100644
>> --- a/tools/perf/util/stat.h
>> +++ b/tools/perf/util/stat.h
>> @@ -5,6 +5,8 @@
>>   #include <linux/types.h>
>>   #include <stdio.h>
>>   #include "xyarray.h"
>> +#include "evsel.h"
> 
> What is this for? You don't add anything in this patch that uses things
> from evsel.h.
> 
> I'm removing it, will fixup if later this becomes needed.
> 
> - Arnaldo
> 


That's fine, thanks Arnaldo!

Thanks
Jin Yao

>> +#include "rblist.h"
>>   
>>   struct stats
>>   {
>> @@ -43,6 +45,47 @@ enum aggr_mode {
>>   	AGGR_UNSET,
>>   };
>>   
>> +enum {
>> +	CTX_BIT_USER	= 1 << 0,
>> +	CTX_BIT_KERNEL	= 1 << 1,
>> +	CTX_BIT_HV	= 1 << 2,
>> +	CTX_BIT_HOST	= 1 << 3,
>> +	CTX_BIT_IDLE	= 1 << 4,
>> +	CTX_BIT_MAX	= 1 << 5,
>> +};
>> +
>> +#define NUM_CTX CTX_BIT_MAX
>> +
>> +enum stat_type {
>> +	STAT_NONE = 0,
>> +	STAT_NSECS,
>> +	STAT_CYCLES,
>> +	STAT_STALLED_CYCLES_FRONT,
>> +	STAT_STALLED_CYCLES_BACK,
>> +	STAT_BRANCHES,
>> +	STAT_CACHEREFS,
>> +	STAT_L1_DCACHE,
>> +	STAT_L1_ICACHE,
>> +	STAT_LL_CACHE,
>> +	STAT_ITLB_CACHE,
>> +	STAT_DTLB_CACHE,
>> +	STAT_CYCLES_IN_TX,
>> +	STAT_TRANSACTION,
>> +	STAT_ELISION,
>> +	STAT_TOPDOWN_TOTAL_SLOTS,
>> +	STAT_TOPDOWN_SLOTS_ISSUED,
>> +	STAT_TOPDOWN_SLOTS_RETIRED,
>> +	STAT_TOPDOWN_FETCH_BUBBLES,
>> +	STAT_TOPDOWN_RECOVERY_BUBBLES,
>> +	STAT_SMI_NUM,
>> +	STAT_APERF,
>> +	STAT_MAX
>> +};
>> +
>> +struct runtime_stat {
>> +	struct rblist value_list;
>> +};
>> +
>>   struct perf_stat_config {
>>   	enum aggr_mode	aggr_mode;
>>   	bool		scale;
>> @@ -92,7 +135,6 @@ struct perf_stat_output_ctx {
>>   	bool force_header;
>>   };
>>   
>> -struct rblist;
>>   void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>>   				   double avg, int cpu,
>>   				   struct perf_stat_output_ctx *out,
>> -- 
>> 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 03/12] perf util: Extend rbtree to support shadow stats
  2017-12-01 14:10   ` Arnaldo Carvalho de Melo
@ 2017-12-02  4:40     ` Jin, Yao
  0 siblings, 0 replies; 27+ messages in thread
From: Jin, Yao @ 2017-12-02  4:40 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin



On 12/1/2017 10:10 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Dec 01, 2017 at 06:57:27PM +0800, Jin Yao escreveu:
>> Previously the rbtree was used to link generic metrics.
> 
> Try to make the one line subject more descriptive, I'm changing it to:
> 
> perf stat: Extend rbtree to support per-thread shadow stats
> 
> - Arnaldo
>   

Yes, this new subject is better.

Thanks
Jin Yao

>> This patch adds new ctx/type/stat fields to the rbtree keys, because
>> we will use this rbtree to maintain shadow metrics, replacing the
>> original set of static arrays, to support per-thread shadow stats.
>>
>> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
>> ---
>>   tools/perf/util/stat-shadow.c | 27 +++++++++++++++++++++++++++
>>   1 file changed, 27 insertions(+)
>>
>> diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
>> index 5853901..c53b80d 100644
>> --- a/tools/perf/util/stat-shadow.c
>> +++ b/tools/perf/util/stat-shadow.c
>> @@ -45,7 +45,10 @@ struct stats walltime_nsecs_stats;
>>   struct saved_value {
>>   	struct rb_node rb_node;
>>   	struct perf_evsel *evsel;
>> +	enum stat_type type;
>> +	int ctx;
>>   	int cpu;
>> +	struct runtime_stat *stat;
>>   	struct stats stats;
>>   };
>>   
>> @@ -58,6 +61,30 @@ static int saved_value_cmp(struct rb_node *rb_node, const void *entry)
>>   
>>   	if (a->cpu != b->cpu)
>>   		return a->cpu - b->cpu;
>> +
>> +	/*
>> +	 * Previously the rbtree was used to link generic metrics.
>> +	 * The keys were evsel/cpu. Now the rbtree is extended to support
>> +	 * per-thread shadow stats. For shadow stats case, the keys
>> +	 * are cpu/type/ctx/stat (evsel is NULL). For generic metrics
>> +	 * case, the keys are still evsel/cpu (type/ctx/stat are 0 or NULL).
>> +	 */
>> +	if (a->type != b->type)
>> +		return a->type - b->type;
>> +
>> +	if (a->ctx != b->ctx)
>> +		return a->ctx - b->ctx;
>> +
>> +	if (a->evsel == NULL && b->evsel == NULL) {
>> +		if (a->stat == b->stat)
>> +			return 0;
>> +
>> +		if ((char *)a->stat < (char *)b->stat)
>> +			return -1;
>> +
>> +		return 1;
>> +	}
>> +
>>   	if (a->evsel == b->evsel)
>>   		return 0;
>>   	if ((char *)a->evsel < (char *)b->evsel)
>> -- 
>> 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 06/12] perf util: Update and print per-thread shadow stats
  2017-12-01 14:21   ` Arnaldo Carvalho de Melo
@ 2017-12-02  4:46     ` Jin, Yao
  0 siblings, 0 replies; 27+ messages in thread
From: Jin, Yao @ 2017-12-02  4:46 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin



On 12/1/2017 10:21 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Dec 01, 2017 at 06:57:30PM +0800, Jin Yao escreveu:
>> The functions perf_stat__update_shadow_stats() and
>> perf_stat__print_shadow_stats() are called to update
>> and print the shadow stats on a set of static variables.
>>
>> But the static variables are a limitation for supporting
>> per-thread shadow stats.
>>
>> This patch lets perf_stat__update_shadow_stats() update the
>> shadow stats on an input parameter 'stat', using
>> update_runtime_stat(). It no longer updates the static
>> variables directly as before.
>>
>> And this patch also lets perf_stat__print_shadow_stats()
> 
> When 'also' appears on a patch usually it means it should be split in
> two, one for the things up to the 'also' and another for the remaining
> parts.
> 
> A patch that has these stats:
> 
> 5 files changed, 219 insertions(+), 120 deletions(-)
> 
> raises eyebrows :-\
> 
> I'm trying now to break it into at least two, one for printing and the
> other for the rest.
> 
> - Arnaldo
> 

Yes, too much in this patch.

Actually I also wanted to split it into more patches, but I found it a
little difficult because of the dependencies between them.

If you need me to do anything on this patch (e.g. 
refine/split/reorg/...), I'd like to, please let me know.

Thanks
Jin Yao

>> support printing the shadow stats from an input parameter 'stat'.
>>
>> It no longer reads values directly from the static variables.
>> Instead, it uses runtime_stat_avg() and runtime_stat_n() to fetch
>> and compute the values.
>>
>> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
>> ---
>>   tools/perf/builtin-script.c   |   6 +-
>>   tools/perf/builtin-stat.c     |  27 ++--
>>   tools/perf/util/stat-shadow.c | 293 +++++++++++++++++++++++++++---------------
>>   tools/perf/util/stat.c        |   8 +-
>>   tools/perf/util/stat.h        |   5 +-
>>   5 files changed, 219 insertions(+), 120 deletions(-)
>>
>> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
>> index 39d8b55..fac6f05 100644
>> --- a/tools/perf/builtin-script.c
>> +++ b/tools/perf/builtin-script.c
>> @@ -1548,7 +1548,8 @@ static void perf_sample__fprint_metric(struct perf_script *script,
>>   	val = sample->period * evsel->scale;
>>   	perf_stat__update_shadow_stats(evsel,
>>   				       val,
>> -				       sample->cpu);
>> +				       sample->cpu,
>> +				       &rt_stat);
>>   	evsel_script(evsel)->val = val;
>>   	if (evsel_script(evsel->leader)->gnum == evsel->leader->nr_members) {
>>   		for_each_group_member (ev2, evsel->leader) {
>> @@ -1556,7 +1557,8 @@ static void perf_sample__fprint_metric(struct perf_script *script,
>>   						      evsel_script(ev2)->val,
>>   						      sample->cpu,
>>   						      &ctx,
>> -						      NULL);
>> +						      NULL,
>> +						      &rt_stat);
>>   		}
>>   		evsel_script(evsel->leader)->gnum = 0;
>>   	}
>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>> index a027b47..1edc082 100644
>> --- a/tools/perf/builtin-stat.c
>> +++ b/tools/perf/builtin-stat.c
>> @@ -1097,7 +1097,8 @@ static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
>>   }
>>   
>>   static void printout(int id, int nr, struct perf_evsel *counter, double uval,
>> -		     char *prefix, u64 run, u64 ena, double noise)
>> +		     char *prefix, u64 run, u64 ena, double noise,
>> +		     struct runtime_stat *stat)
>>   {
>>   	struct perf_stat_output_ctx out;
>>   	struct outstate os = {
>> @@ -1190,7 +1191,8 @@ static void printout(int id, int nr, struct perf_evsel *counter, double uval,
>>   
>>   	perf_stat__print_shadow_stats(counter, uval,
>>   				first_shadow_cpu(counter, id),
>> -				&out, &metric_events);
>> +				&out, &metric_events,
>> +				stat);
>>   	if (!csv_output && !metric_only) {
>>   		print_noise(counter, noise);
>>   		print_running(run, ena);
>> @@ -1214,7 +1216,8 @@ static void aggr_update_shadow(void)
>>   				val += perf_counts(counter->counts, cpu, 0)->val;
>>   			}
>>   			perf_stat__update_shadow_stats(counter, val,
>> -						       first_shadow_cpu(counter, id));
>> +					first_shadow_cpu(counter, id),
>> +					&rt_stat);
>>   		}
>>   	}
>>   }
>> @@ -1334,7 +1337,8 @@ static void print_aggr(char *prefix)
>>   				fprintf(output, "%s", prefix);
>>   
>>   			uval = val * counter->scale;
>> -			printout(id, nr, counter, uval, prefix, run, ena, 1.0);
>> +			printout(id, nr, counter, uval, prefix, run, ena, 1.0,
>> +				 &rt_stat);
>>   			if (!metric_only)
>>   				fputc('\n', output);
>>   		}
>> @@ -1364,7 +1368,8 @@ static void print_aggr_thread(struct perf_evsel *counter, char *prefix)
>>   			fprintf(output, "%s", prefix);
>>   
>>   		uval = val * counter->scale;
>> -		printout(thread, 0, counter, uval, prefix, run, ena, 1.0);
>> +		printout(thread, 0, counter, uval, prefix, run, ena, 1.0,
>> +			 &rt_stat);
>>   		fputc('\n', output);
>>   	}
>>   }
>> @@ -1401,7 +1406,8 @@ static void print_counter_aggr(struct perf_evsel *counter, char *prefix)
>>   		fprintf(output, "%s", prefix);
>>   
>>   	uval = cd.avg * counter->scale;
>> -	printout(-1, 0, counter, uval, prefix, cd.avg_running, cd.avg_enabled, cd.avg);
>> +	printout(-1, 0, counter, uval, prefix, cd.avg_running, cd.avg_enabled,
>> +		 cd.avg, &rt_stat);
>>   	if (!metric_only)
>>   		fprintf(output, "\n");
>>   }
>> @@ -1440,7 +1446,8 @@ static void print_counter(struct perf_evsel *counter, char *prefix)
>>   			fprintf(output, "%s", prefix);
>>   
>>   		uval = val * counter->scale;
>> -		printout(cpu, 0, counter, uval, prefix, run, ena, 1.0);
>> +		printout(cpu, 0, counter, uval, prefix, run, ena, 1.0,
>> +			 &rt_stat);
>>   
>>   		fputc('\n', output);
>>   	}
>> @@ -1472,7 +1479,8 @@ static void print_no_aggr_metric(char *prefix)
>>   			run = perf_counts(counter->counts, cpu, 0)->run;
>>   
>>   			uval = val * counter->scale;
>> -			printout(cpu, 0, counter, uval, prefix, run, ena, 1.0);
>> +			printout(cpu, 0, counter, uval, prefix, run, ena, 1.0,
>> +				 &rt_stat);
>>   		}
>>   		fputc('\n', stat_config.output);
>>   	}
>> @@ -1528,7 +1536,8 @@ static void print_metric_headers(const char *prefix, bool no_indent)
>>   		perf_stat__print_shadow_stats(counter, 0,
>>   					      0,
>>   					      &out,
>> -					      &metric_events);
>> +					      &metric_events,
>> +					      &rt_stat);
>>   	}
>>   	fputc('\n', stat_config.output);
>>   }
>> diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
>> index e60c321..0d34d5e 100644
>> --- a/tools/perf/util/stat-shadow.c
>> +++ b/tools/perf/util/stat-shadow.c
>> @@ -116,19 +116,29 @@ static void saved_value_delete(struct rblist *rblist __maybe_unused,
>>   
>>   static struct saved_value *saved_value_lookup(struct perf_evsel *evsel,
>>   					      int cpu,
>> -					      bool create)
>> +					      bool create,
>> +					      enum stat_type type,
>> +					      int ctx,
>> +					      struct runtime_stat *stat)
>>   {
>> +	struct rblist *rblist;
>>   	struct rb_node *nd;
>>   	struct saved_value dm = {
>>   		.cpu = cpu,
>>   		.evsel = evsel,
>> +		.type = type,
>> +		.ctx = ctx,
>> +		.stat = stat,
>>   	};
>> -	nd = rblist__find(&runtime_saved_values, &dm);
>> +
>> +	rblist = &stat->value_list;
>> +
>> +	nd = rblist__find(rblist, &dm);
>>   	if (nd)
>>   		return container_of(nd, struct saved_value, rb_node);
>>   	if (create) {
>> -		rblist__add_node(&runtime_saved_values, &dm);
>> -		nd = rblist__find(&runtime_saved_values, &dm);
>> +		rblist__add_node(rblist, &dm);
>> +		nd = rblist__find(rblist, &dm);
>>   		if (nd)
>>   			return container_of(nd, struct saved_value, rb_node);
>>   	}
>> @@ -217,13 +227,24 @@ void perf_stat__reset_shadow_stats(void)
>>   	}
>>   }
>>   
>> +static void update_runtime_stat(struct runtime_stat *stat,
>> +				enum stat_type type,
>> +				int ctx, int cpu, u64 count)
>> +{
>> +	struct saved_value *v = saved_value_lookup(NULL, cpu, true,
>> +						   type, ctx, stat);
>> +
>> +	if (v)
>> +		update_stats(&v->stats, count);
>> +}
>> +
>>   /*
>>    * Update various tracking values we maintain to print
>>    * more semantic information such as miss/hit ratios,
>>    * instruction rates, etc:
>>    */
>>   void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
>> -				    int cpu)
>> +				    int cpu, struct runtime_stat *stat)
>>   {
>>   	int ctx = evsel_context(counter);
>>   
>> @@ -231,50 +252,58 @@ void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
>>   
>>   	if (perf_evsel__match(counter, SOFTWARE, SW_TASK_CLOCK) ||
>>   	    perf_evsel__match(counter, SOFTWARE, SW_CPU_CLOCK))
>> -		update_stats(&runtime_nsecs_stats[cpu], count);
>> +		update_runtime_stat(stat, STAT_NSECS, 0, cpu, count);
>>   	else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
>> -		update_stats(&runtime_cycles_stats[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_CYCLES, ctx, cpu, count);
>>   	else if (perf_stat_evsel__is(counter, CYCLES_IN_TX))
>> -		update_stats(&runtime_cycles_in_tx_stats[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_CYCLES_IN_TX, ctx, cpu, count);
>>   	else if (perf_stat_evsel__is(counter, TRANSACTION_START))
>> -		update_stats(&runtime_transaction_stats[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_TRANSACTION, ctx, cpu, count);
>>   	else if (perf_stat_evsel__is(counter, ELISION_START))
>> -		update_stats(&runtime_elision_stats[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_ELISION, ctx, cpu, count);
>>   	else if (perf_stat_evsel__is(counter, TOPDOWN_TOTAL_SLOTS))
>> -		update_stats(&runtime_topdown_total_slots[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_TOPDOWN_TOTAL_SLOTS,
>> +				    ctx, cpu, count);
>>   	else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_ISSUED))
>> -		update_stats(&runtime_topdown_slots_issued[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_TOPDOWN_SLOTS_ISSUED,
>> +				    ctx, cpu, count);
>>   	else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_RETIRED))
>> -		update_stats(&runtime_topdown_slots_retired[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_TOPDOWN_SLOTS_RETIRED,
>> +				    ctx, cpu, count);
>>   	else if (perf_stat_evsel__is(counter, TOPDOWN_FETCH_BUBBLES))
>> -		update_stats(&runtime_topdown_fetch_bubbles[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_TOPDOWN_FETCH_BUBBLES,
>> +				    ctx, cpu, count);
>>   	else if (perf_stat_evsel__is(counter, TOPDOWN_RECOVERY_BUBBLES))
>> -		update_stats(&runtime_topdown_recovery_bubbles[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_TOPDOWN_RECOVERY_BUBBLES,
>> +				    ctx, cpu, count);
>>   	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
>> -		update_stats(&runtime_stalled_cycles_front_stats[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_STALLED_CYCLES_FRONT,
>> +				    ctx, cpu, count);
>>   	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_BACKEND))
>> -		update_stats(&runtime_stalled_cycles_back_stats[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_STALLED_CYCLES_BACK,
>> +				    ctx, cpu, count);
>>   	else if (perf_evsel__match(counter, HARDWARE, HW_BRANCH_INSTRUCTIONS))
>> -		update_stats(&runtime_branches_stats[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_BRANCHES, ctx, cpu, count);
>>   	else if (perf_evsel__match(counter, HARDWARE, HW_CACHE_REFERENCES))
>> -		update_stats(&runtime_cacherefs_stats[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_CACHEREFS, ctx, cpu, count);
>>   	else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_L1D))
>> -		update_stats(&runtime_l1_dcache_stats[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_L1_DCACHE, ctx, cpu, count);
>>   	else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_L1I))
>> -		update_stats(&runtime_ll_cache_stats[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_L1_ICACHE, ctx, cpu, count);
>>   	else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_LL))
>> -		update_stats(&runtime_ll_cache_stats[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_LL_CACHE, ctx, cpu, count);
>>   	else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_DTLB))
>> -		update_stats(&runtime_dtlb_cache_stats[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_DTLB_CACHE, ctx, cpu, count);
>>   	else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_ITLB))
>> -		update_stats(&runtime_itlb_cache_stats[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_ITLB_CACHE, ctx, cpu, count);
>>   	else if (perf_stat_evsel__is(counter, SMI_NUM))
>> -		update_stats(&runtime_smi_num_stats[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_SMI_NUM, ctx, cpu, count);
>>   	else if (perf_stat_evsel__is(counter, APERF))
>> -		update_stats(&runtime_aperf_stats[ctx][cpu], count);
>> +		update_runtime_stat(stat, STAT_APERF, ctx, cpu, count);
>>   
>>   	if (counter->collect_stat) {
>> -		struct saved_value *v = saved_value_lookup(counter, cpu, true);
>> +		struct saved_value *v = saved_value_lookup(counter, cpu, true,
>> +							   STAT_NONE, 0, stat);
>>   		update_stats(&v->stats, count);
>>   	}
>>   }
>> @@ -395,15 +424,40 @@ void perf_stat__collect_metric_expr(struct perf_evlist *evsel_list)
>>   	}
>>   }
>>   
>> +static double runtime_stat_avg(struct runtime_stat *stat,
>> +			       enum stat_type type, int ctx, int cpu)
>> +{
>> +	struct saved_value *v;
>> +
>> +	v = saved_value_lookup(NULL, cpu, false, type, ctx, stat);
>> +	if (!v)
>> +		return 0.0;
>> +
>> +	return avg_stats(&v->stats);
>> +}
>> +
>> +static double runtime_stat_n(struct runtime_stat *stat,
>> +			     enum stat_type type, int ctx, int cpu)
>> +{
>> +	struct saved_value *v;
>> +
>> +	v = saved_value_lookup(NULL, cpu, false, type, ctx, stat);
>> +	if (!v)
>> +		return 0.0;
>> +
>> +	return v->stats.n;
>> +}
>> +
>>   static void print_stalled_cycles_frontend(int cpu,
>>   					  struct perf_evsel *evsel, double avg,
>> -					  struct perf_stat_output_ctx *out)
>> +					  struct perf_stat_output_ctx *out,
>> +					  struct runtime_stat *stat)
>>   {
>>   	double total, ratio = 0.0;
>>   	const char *color;
>>   	int ctx = evsel_context(evsel);
>>   
>> -	total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
>> +	total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
>>   
>>   	if (total)
>>   		ratio = avg / total * 100.0;
>> @@ -419,13 +473,14 @@ static void print_stalled_cycles_frontend(int cpu,
>>   
>>   static void print_stalled_cycles_backend(int cpu,
>>   					 struct perf_evsel *evsel, double avg,
>> -					 struct perf_stat_output_ctx *out)
>> +					 struct perf_stat_output_ctx *out,
>> +					 struct runtime_stat *stat)
>>   {
>>   	double total, ratio = 0.0;
>>   	const char *color;
>>   	int ctx = evsel_context(evsel);
>>   
>> -	total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
>> +	total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
>>   
>>   	if (total)
>>   		ratio = avg / total * 100.0;
>> @@ -438,13 +493,14 @@ static void print_stalled_cycles_backend(int cpu,
>>   static void print_branch_misses(int cpu,
>>   				struct perf_evsel *evsel,
>>   				double avg,
>> -				struct perf_stat_output_ctx *out)
>> +				struct perf_stat_output_ctx *out,
>> +				struct runtime_stat *stat)
>>   {
>>   	double total, ratio = 0.0;
>>   	const char *color;
>>   	int ctx = evsel_context(evsel);
>>   
>> -	total = avg_stats(&runtime_branches_stats[ctx][cpu]);
>> +	total = runtime_stat_avg(stat, STAT_BRANCHES, ctx, cpu);
>>   
>>   	if (total)
>>   		ratio = avg / total * 100.0;
>> @@ -457,13 +513,15 @@ static void print_branch_misses(int cpu,
>>   static void print_l1_dcache_misses(int cpu,
>>   				   struct perf_evsel *evsel,
>>   				   double avg,
>> -				   struct perf_stat_output_ctx *out)
>> +				   struct perf_stat_output_ctx *out,
>> +				   struct runtime_stat *stat)
>> +
>>   {
>>   	double total, ratio = 0.0;
>>   	const char *color;
>>   	int ctx = evsel_context(evsel);
>>   
>> -	total = avg_stats(&runtime_l1_dcache_stats[ctx][cpu]);
>> +	total = runtime_stat_avg(stat, STAT_L1_DCACHE, ctx, cpu);
>>   
>>   	if (total)
>>   		ratio = avg / total * 100.0;
>> @@ -476,13 +534,15 @@ static void print_l1_dcache_misses(int cpu,
>>   static void print_l1_icache_misses(int cpu,
>>   				   struct perf_evsel *evsel,
>>   				   double avg,
>> -				   struct perf_stat_output_ctx *out)
>> +				   struct perf_stat_output_ctx *out,
>> +				   struct runtime_stat *stat)
>> +
>>   {
>>   	double total, ratio = 0.0;
>>   	const char *color;
>>   	int ctx = evsel_context(evsel);
>>   
>> -	total = avg_stats(&runtime_l1_icache_stats[ctx][cpu]);
>> +	total = runtime_stat_avg(stat, STAT_L1_ICACHE, ctx, cpu);
>>   
>>   	if (total)
>>   		ratio = avg / total * 100.0;
>> @@ -494,13 +554,14 @@ static void print_l1_icache_misses(int cpu,
>>   static void print_dtlb_cache_misses(int cpu,
>>   				    struct perf_evsel *evsel,
>>   				    double avg,
>> -				    struct perf_stat_output_ctx *out)
>> +				    struct perf_stat_output_ctx *out,
>> +				    struct runtime_stat *stat)
>>   {
>>   	double total, ratio = 0.0;
>>   	const char *color;
>>   	int ctx = evsel_context(evsel);
>>   
>> -	total = avg_stats(&runtime_dtlb_cache_stats[ctx][cpu]);
>> +	total = runtime_stat_avg(stat, STAT_DTLB_CACHE, ctx, cpu);
>>   
>>   	if (total)
>>   		ratio = avg / total * 100.0;
>> @@ -512,13 +573,14 @@ static void print_dtlb_cache_misses(int cpu,
>>   static void print_itlb_cache_misses(int cpu,
>>   				    struct perf_evsel *evsel,
>>   				    double avg,
>> -				    struct perf_stat_output_ctx *out)
>> +				    struct perf_stat_output_ctx *out,
>> +				    struct runtime_stat *stat)
>>   {
>>   	double total, ratio = 0.0;
>>   	const char *color;
>>   	int ctx = evsel_context(evsel);
>>   
>> -	total = avg_stats(&runtime_itlb_cache_stats[ctx][cpu]);
>> +	total = runtime_stat_avg(stat, STAT_ITLB_CACHE, ctx, cpu);
>>   
>>   	if (total)
>>   		ratio = avg / total * 100.0;
>> @@ -530,13 +592,14 @@ static void print_itlb_cache_misses(int cpu,
>>   static void print_ll_cache_misses(int cpu,
>>   				  struct perf_evsel *evsel,
>>   				  double avg,
>> -				  struct perf_stat_output_ctx *out)
>> +				  struct perf_stat_output_ctx *out,
>> +				  struct runtime_stat *stat)
>>   {
>>   	double total, ratio = 0.0;
>>   	const char *color;
>>   	int ctx = evsel_context(evsel);
>>   
>> -	total = avg_stats(&runtime_ll_cache_stats[ctx][cpu]);
>> +	total = runtime_stat_avg(stat, STAT_LL_CACHE, ctx, cpu);
>>   
>>   	if (total)
>>   		ratio = avg / total * 100.0;
>> @@ -594,68 +657,72 @@ static double sanitize_val(double x)
>>   	return x;
>>   }
>>   
>> -static double td_total_slots(int ctx, int cpu)
>> +static double td_total_slots(int ctx, int cpu, struct runtime_stat *stat)
>>   {
>> -	return avg_stats(&runtime_topdown_total_slots[ctx][cpu]);
>> +	return runtime_stat_avg(stat, STAT_TOPDOWN_TOTAL_SLOTS, ctx, cpu);
>>   }
>>   
>> -static double td_bad_spec(int ctx, int cpu)
>> +static double td_bad_spec(int ctx, int cpu, struct runtime_stat *stat)
>>   {
>>   	double bad_spec = 0;
>>   	double total_slots;
>>   	double total;
>>   
>> -	total = avg_stats(&runtime_topdown_slots_issued[ctx][cpu]) -
>> -		avg_stats(&runtime_topdown_slots_retired[ctx][cpu]) +
>> -		avg_stats(&runtime_topdown_recovery_bubbles[ctx][cpu]);
>> -	total_slots = td_total_slots(ctx, cpu);
>> +	total = runtime_stat_avg(stat, STAT_TOPDOWN_SLOTS_ISSUED, ctx, cpu) -
>> +		runtime_stat_avg(stat, STAT_TOPDOWN_SLOTS_RETIRED, ctx, cpu) +
>> +		runtime_stat_avg(stat, STAT_TOPDOWN_RECOVERY_BUBBLES, ctx, cpu);
>> +
>> +	total_slots = td_total_slots(ctx, cpu, stat);
>>   	if (total_slots)
>>   		bad_spec = total / total_slots;
>>   	return sanitize_val(bad_spec);
>>   }
>>   
>> -static double td_retiring(int ctx, int cpu)
>> +static double td_retiring(int ctx, int cpu, struct runtime_stat *stat)
>>   {
>>   	double retiring = 0;
>> -	double total_slots = td_total_slots(ctx, cpu);
>> -	double ret_slots = avg_stats(&runtime_topdown_slots_retired[ctx][cpu]);
>> +	double total_slots = td_total_slots(ctx, cpu, stat);
>> +	double ret_slots = runtime_stat_avg(stat, STAT_TOPDOWN_SLOTS_RETIRED,
>> +					    ctx, cpu);
>>   
>>   	if (total_slots)
>>   		retiring = ret_slots / total_slots;
>>   	return retiring;
>>   }
>>   
>> -static double td_fe_bound(int ctx, int cpu)
>> +static double td_fe_bound(int ctx, int cpu, struct runtime_stat *stat)
>>   {
>>   	double fe_bound = 0;
>> -	double total_slots = td_total_slots(ctx, cpu);
>> -	double fetch_bub = avg_stats(&runtime_topdown_fetch_bubbles[ctx][cpu]);
>> +	double total_slots = td_total_slots(ctx, cpu, stat);
>> +	double fetch_bub = runtime_stat_avg(stat, STAT_TOPDOWN_FETCH_BUBBLES,
>> +					    ctx, cpu);
>>   
>>   	if (total_slots)
>>   		fe_bound = fetch_bub / total_slots;
>>   	return fe_bound;
>>   }
>>   
>> -static double td_be_bound(int ctx, int cpu)
>> +static double td_be_bound(int ctx, int cpu, struct runtime_stat *stat)
>>   {
>> -	double sum = (td_fe_bound(ctx, cpu) +
>> -		      td_bad_spec(ctx, cpu) +
>> -		      td_retiring(ctx, cpu));
>> +	double sum = (td_fe_bound(ctx, cpu, stat) +
>> +		      td_bad_spec(ctx, cpu, stat) +
>> +		      td_retiring(ctx, cpu, stat));
>>   	if (sum == 0)
>>   		return 0;
>>   	return sanitize_val(1.0 - sum);
>>   }
>>   
>>   static void print_smi_cost(int cpu, struct perf_evsel *evsel,
>> -			   struct perf_stat_output_ctx *out)
>> +			   struct perf_stat_output_ctx *out,
>> +			   struct runtime_stat *stat)
>>   {
>>   	double smi_num, aperf, cycles, cost = 0.0;
>>   	int ctx = evsel_context(evsel);
>>   	const char *color = NULL;
>>   
>> -	smi_num = avg_stats(&runtime_smi_num_stats[ctx][cpu]);
>> -	aperf = avg_stats(&runtime_aperf_stats[ctx][cpu]);
>> -	cycles = avg_stats(&runtime_cycles_stats[ctx][cpu]);
>> +	smi_num = runtime_stat_avg(stat, STAT_SMI_NUM, ctx, cpu);
>> +	aperf = runtime_stat_avg(stat, STAT_APERF, ctx, cpu);
>> +	cycles = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
>>   
>>   	if ((cycles == 0) || (aperf == 0))
>>   		return;
>> @@ -675,7 +742,8 @@ static void generic_metric(const char *metric_expr,
>>   			   const char *metric_name,
>>   			   double avg,
>>   			   int cpu,
>> -			   struct perf_stat_output_ctx *out)
>> +			   struct perf_stat_output_ctx *out,
>> +			   struct runtime_stat *stat)
>>   {
>>   	print_metric_t print_metric = out->print_metric;
>>   	struct parse_ctx pctx;
>> @@ -694,7 +762,8 @@ static void generic_metric(const char *metric_expr,
>>   			stats = &walltime_nsecs_stats;
>>   			scale = 1e-9;
>>   		} else {
>> -			v = saved_value_lookup(metric_events[i], cpu, false);
>> +			v = saved_value_lookup(metric_events[i], cpu, false,
>> +					       STAT_NONE, 0, stat);
>>   			if (!v)
>>   				break;
>>   			stats = &v->stats;
>> @@ -722,7 +791,8 @@ static void generic_metric(const char *metric_expr,
>>   void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>>   				   double avg, int cpu,
>>   				   struct perf_stat_output_ctx *out,
>> -				   struct rblist *metric_events)
>> +				   struct rblist *metric_events,
>> +				   struct runtime_stat *stat)
>>   {
>>   	void *ctxp = out->ctx;
>>   	print_metric_t print_metric = out->print_metric;
>> @@ -733,7 +803,8 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>>   	int num = 1;
>>   
>>   	if (perf_evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) {
>> -		total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
>> +		total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
>> +
>>   		if (total) {
>>   			ratio = avg / total;
>>   			print_metric(ctxp, NULL, "%7.2f ",
>> @@ -741,8 +812,13 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>>   		} else {
>>   			print_metric(ctxp, NULL, NULL, "insn per cycle", 0);
>>   		}
>> -		total = avg_stats(&runtime_stalled_cycles_front_stats[ctx][cpu]);
>> -		total = max(total, avg_stats(&runtime_stalled_cycles_back_stats[ctx][cpu]));
>> +
>> +		total = runtime_stat_avg(stat, STAT_STALLED_CYCLES_FRONT,
>> +					 ctx, cpu);
>> +
>> +		total = max(total, runtime_stat_avg(stat,
>> +						    STAT_STALLED_CYCLES_BACK,
>> +						    ctx, cpu));
>>   
>>   		if (total && avg) {
>>   			out->new_line(ctxp);
>> @@ -755,8 +831,8 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>>   				     "stalled cycles per insn", 0);
>>   		}
>>   	} else if (perf_evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES)) {
>> -		if (runtime_branches_stats[ctx][cpu].n != 0)
>> -			print_branch_misses(cpu, evsel, avg, out);
>> +		if (runtime_stat_n(stat, STAT_BRANCHES, ctx, cpu) != 0)
>> +			print_branch_misses(cpu, evsel, avg, out, stat);
>>   		else
>>   			print_metric(ctxp, NULL, NULL, "of all branches", 0);
>>   	} else if (
>> @@ -764,8 +840,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>>   		evsel->attr.config ==  ( PERF_COUNT_HW_CACHE_L1D |
>>   					((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
>>   					 ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
>> -		if (runtime_l1_dcache_stats[ctx][cpu].n != 0)
>> -			print_l1_dcache_misses(cpu, evsel, avg, out);
>> +
>> +		if (runtime_stat_n(stat, STAT_L1_DCACHE, ctx, cpu) != 0)
>> +			print_l1_dcache_misses(cpu, evsel, avg, out, stat);
>>   		else
>>   			print_metric(ctxp, NULL, NULL, "of all L1-dcache hits", 0);
>>   	} else if (
>> @@ -773,8 +850,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>>   		evsel->attr.config ==  ( PERF_COUNT_HW_CACHE_L1I |
>>   					((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
>>   					 ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
>> -		if (runtime_l1_icache_stats[ctx][cpu].n != 0)
>> -			print_l1_icache_misses(cpu, evsel, avg, out);
>> +
>> +		if (runtime_stat_n(stat, STAT_L1_ICACHE, ctx, cpu) != 0)
>> +			print_l1_icache_misses(cpu, evsel, avg, out, stat);
>>   		else
>>   			print_metric(ctxp, NULL, NULL, "of all L1-icache hits", 0);
>>   	} else if (
>> @@ -782,8 +860,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>>   		evsel->attr.config ==  ( PERF_COUNT_HW_CACHE_DTLB |
>>   					((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
>>   					 ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
>> -		if (runtime_dtlb_cache_stats[ctx][cpu].n != 0)
>> -			print_dtlb_cache_misses(cpu, evsel, avg, out);
>> +
>> +		if (runtime_stat_n(stat, STAT_DTLB_CACHE, ctx, cpu) != 0)
>> +			print_dtlb_cache_misses(cpu, evsel, avg, out, stat);
>>   		else
>>   			print_metric(ctxp, NULL, NULL, "of all dTLB cache hits", 0);
>>   	} else if (
>> @@ -791,8 +870,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>>   		evsel->attr.config ==  ( PERF_COUNT_HW_CACHE_ITLB |
>>   					((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
>>   					 ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
>> -		if (runtime_itlb_cache_stats[ctx][cpu].n != 0)
>> -			print_itlb_cache_misses(cpu, evsel, avg, out);
>> +
>> +		if (runtime_stat_n(stat, STAT_ITLB_CACHE, ctx, cpu) != 0)
>> +			print_itlb_cache_misses(cpu, evsel, avg, out, stat);
>>   		else
>>   			print_metric(ctxp, NULL, NULL, "of all iTLB cache hits", 0);
>>   	} else if (
>> @@ -800,27 +880,28 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>>   		evsel->attr.config ==  ( PERF_COUNT_HW_CACHE_LL |
>>   					((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
>>   					 ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16))) {
>> -		if (runtime_ll_cache_stats[ctx][cpu].n != 0)
>> -			print_ll_cache_misses(cpu, evsel, avg, out);
>> +
>> +		if (runtime_stat_n(stat, STAT_LL_CACHE, ctx, cpu) != 0)
>> +			print_ll_cache_misses(cpu, evsel, avg, out, stat);
>>   		else
>>   			print_metric(ctxp, NULL, NULL, "of all LL-cache hits", 0);
>>   	} else if (perf_evsel__match(evsel, HARDWARE, HW_CACHE_MISSES)) {
>> -		total = avg_stats(&runtime_cacherefs_stats[ctx][cpu]);
>> +		total = runtime_stat_avg(stat, STAT_CACHEREFS, ctx, cpu);
>>   
>>   		if (total)
>>   			ratio = avg * 100 / total;
>>   
>> -		if (runtime_cacherefs_stats[ctx][cpu].n != 0)
>> +		if (runtime_stat_n(stat, STAT_CACHEREFS, ctx, cpu) != 0)
>>   			print_metric(ctxp, NULL, "%8.3f %%",
>>   				     "of all cache refs", ratio);
>>   		else
>>   			print_metric(ctxp, NULL, NULL, "of all cache refs", 0);
>>   	} else if (perf_evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_FRONTEND)) {
>> -		print_stalled_cycles_frontend(cpu, evsel, avg, out);
>> +		print_stalled_cycles_frontend(cpu, evsel, avg, out, stat);
>>   	} else if (perf_evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND)) {
>> -		print_stalled_cycles_backend(cpu, evsel, avg, out);
>> +		print_stalled_cycles_backend(cpu, evsel, avg, out, stat);
>>   	} else if (perf_evsel__match(evsel, HARDWARE, HW_CPU_CYCLES)) {
>> -		total = avg_stats(&runtime_nsecs_stats[cpu]);
>> +		total = runtime_stat_avg(stat, STAT_NSECS, 0, cpu);
>>   
>>   		if (total) {
>>   			ratio = avg / total;
>> @@ -829,7 +910,8 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>>   			print_metric(ctxp, NULL, NULL, "Ghz", 0);
>>   		}
>>   	} else if (perf_stat_evsel__is(evsel, CYCLES_IN_TX)) {
>> -		total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
>> +		total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
>> +
>>   		if (total)
>>   			print_metric(ctxp, NULL,
>>   					"%7.2f%%", "transactional cycles",
>> @@ -838,8 +920,9 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>>   			print_metric(ctxp, NULL, NULL, "transactional cycles",
>>   				     0);
>>   	} else if (perf_stat_evsel__is(evsel, CYCLES_IN_TX_CP)) {
>> -		total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
>> -		total2 = avg_stats(&runtime_cycles_in_tx_stats[ctx][cpu]);
>> +		total = runtime_stat_avg(stat, STAT_CYCLES, ctx, cpu);
>> +		total2 = runtime_stat_avg(stat, STAT_CYCLES_IN_TX, ctx, cpu);
>> +
>>   		if (total2 < avg)
>>   			total2 = avg;
>>   		if (total)
>> @@ -848,19 +931,21 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>>   		else
>>   			print_metric(ctxp, NULL, NULL, "aborted cycles", 0);
>>   	} else if (perf_stat_evsel__is(evsel, TRANSACTION_START)) {
>> -		total = avg_stats(&runtime_cycles_in_tx_stats[ctx][cpu]);
>> +		total = runtime_stat_avg(stat, STAT_CYCLES_IN_TX,
>> +					 ctx, cpu);
>>   
>>   		if (avg)
>>   			ratio = total / avg;
>>   
>> -		if (runtime_cycles_in_tx_stats[ctx][cpu].n != 0)
>> +		if (runtime_stat_n(stat, STAT_CYCLES_IN_TX, ctx, cpu) != 0)
>>   			print_metric(ctxp, NULL, "%8.0f",
>>   				     "cycles / transaction", ratio);
>>   		else
>>   			print_metric(ctxp, NULL, NULL, "cycles / transaction",
>> -				     0);
>> +				      0);
>>   	} else if (perf_stat_evsel__is(evsel, ELISION_START)) {
>> -		total = avg_stats(&runtime_cycles_in_tx_stats[ctx][cpu]);
>> +		total = runtime_stat_avg(stat, STAT_CYCLES_IN_TX,
>> +					 ctx, cpu);
>>   
>>   		if (avg)
>>   			ratio = total / avg;
>> @@ -874,28 +959,28 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>>   		else
>>   			print_metric(ctxp, NULL, NULL, "CPUs utilized", 0);
>>   	} else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_BUBBLES)) {
>> -		double fe_bound = td_fe_bound(ctx, cpu);
>> +		double fe_bound = td_fe_bound(ctx, cpu, stat);
>>   
>>   		if (fe_bound > 0.2)
>>   			color = PERF_COLOR_RED;
>>   		print_metric(ctxp, color, "%8.1f%%", "frontend bound",
>>   				fe_bound * 100.);
>>   	} else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_RETIRED)) {
>> -		double retiring = td_retiring(ctx, cpu);
>> +		double retiring = td_retiring(ctx, cpu, stat);
>>   
>>   		if (retiring > 0.7)
>>   			color = PERF_COLOR_GREEN;
>>   		print_metric(ctxp, color, "%8.1f%%", "retiring",
>>   				retiring * 100.);
>>   	} else if (perf_stat_evsel__is(evsel, TOPDOWN_RECOVERY_BUBBLES)) {
>> -		double bad_spec = td_bad_spec(ctx, cpu);
>> +		double bad_spec = td_bad_spec(ctx, cpu, stat);
>>   
>>   		if (bad_spec > 0.1)
>>   			color = PERF_COLOR_RED;
>>   		print_metric(ctxp, color, "%8.1f%%", "bad speculation",
>>   				bad_spec * 100.);
>>   	} else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_ISSUED)) {
>> -		double be_bound = td_be_bound(ctx, cpu);
>> +		double be_bound = td_be_bound(ctx, cpu, stat);
>>   		const char *name = "backend bound";
>>   		static int have_recovery_bubbles = -1;
>>   
>> @@ -908,19 +993,19 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>>   
>>   		if (be_bound > 0.2)
>>   			color = PERF_COLOR_RED;
>> -		if (td_total_slots(ctx, cpu) > 0)
>> +		if (td_total_slots(ctx, cpu, stat) > 0)
>>   			print_metric(ctxp, color, "%8.1f%%", name,
>>   					be_bound * 100.);
>>   		else
>>   			print_metric(ctxp, NULL, NULL, name, 0);
>>   	} else if (evsel->metric_expr) {
>>   		generic_metric(evsel->metric_expr, evsel->metric_events, evsel->name,
>> -				evsel->metric_name, avg, cpu, out);
>> -	} else if (runtime_nsecs_stats[cpu].n != 0) {
>> +				evsel->metric_name, avg, cpu, out, stat);
>> +	} else if (runtime_stat_n(stat, STAT_NSECS, 0, cpu) != 0) {
>>   		char unit = 'M';
>>   		char unit_buf[10];
>>   
>> -		total = avg_stats(&runtime_nsecs_stats[cpu]);
>> +		total = runtime_stat_avg(stat, STAT_NSECS, 0, cpu);
>>   
>>   		if (total)
>>   			ratio = 1000.0 * avg / total;
>> @@ -931,7 +1016,7 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>>   		snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit);
>>   		print_metric(ctxp, NULL, "%8.3f", unit_buf, ratio);
>>   	} else if (perf_stat_evsel__is(evsel, SMI_NUM)) {
>> -		print_smi_cost(cpu, evsel, out);
>> +		print_smi_cost(cpu, evsel, out, stat);
>>   	} else {
>>   		num = 0;
>>   	}
>> @@ -944,7 +1029,7 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>>   				out->new_line(ctxp);
>>   			generic_metric(mexp->metric_expr, mexp->metric_events,
>>   					evsel->name, mexp->metric_name,
>> -					avg, cpu, out);
>> +					avg, cpu, out, stat);
>>   		}
>>   	}
>>   	if (num == 0)
>> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
>> index 151e9ef..78abfd4 100644
>> --- a/tools/perf/util/stat.c
>> +++ b/tools/perf/util/stat.c
>> @@ -278,9 +278,11 @@ process_counter_values(struct perf_stat_config *config, struct perf_evsel *evsel
>>   			perf_evsel__compute_deltas(evsel, cpu, thread, count);
>>   		perf_counts_values__scale(count, config->scale, NULL);
>>   		if (config->aggr_mode == AGGR_NONE)
>> -			perf_stat__update_shadow_stats(evsel, count->val, cpu);
>> +			perf_stat__update_shadow_stats(evsel, count->val, cpu,
>> +						       &rt_stat);
>>   		if (config->aggr_mode == AGGR_THREAD)
>> -			perf_stat__update_shadow_stats(evsel, count->val, 0);
>> +			perf_stat__update_shadow_stats(evsel, count->val, 0,
>> +						       &rt_stat);
>>   		break;
>>   	case AGGR_GLOBAL:
>>   		aggr->val += count->val;
>> @@ -362,7 +364,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
>>   	/*
>>   	 * Save the full runtime - to allow normalization during printout:
>>   	 */
>> -	perf_stat__update_shadow_stats(counter, *count, 0);
>> +	perf_stat__update_shadow_stats(counter, *count, 0, &rt_stat);
>>   
>>   	return 0;
>>   }
>> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
>> index 1e2b761..b8448b1 100644
>> --- a/tools/perf/util/stat.h
>> +++ b/tools/perf/util/stat.h
>> @@ -130,7 +130,7 @@ void runtime_stat__exit(struct runtime_stat *stat);
>>   void perf_stat__init_shadow_stats(void);
>>   void perf_stat__reset_shadow_stats(void);
>>   void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
>> -				    int cpu);
>> +				    int cpu, struct runtime_stat *stat);
>>   struct perf_stat_output_ctx {
>>   	void *ctx;
>>   	print_metric_t print_metric;
>> @@ -141,7 +141,8 @@ struct perf_stat_output_ctx {
>>   void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
>>   				   double avg, int cpu,
>>   				   struct perf_stat_output_ctx *out,
>> -				   struct rblist *metric_events);
>> +				   struct rblist *metric_events,
>> +				   struct runtime_stat *stat);
>>   void perf_stat__collect_metric_expr(struct perf_evlist *);
>>   
>>   int perf_evlist__alloc_stats(struct perf_evlist *evlist, bool alloc_raw);
>> -- 
>> 2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 10/12] perf util: Reuse thread_map__new_by_uid to enumerate threads from /proc
  2017-12-01 14:44   ` Arnaldo Carvalho de Melo
  2017-12-01 15:02     ` Arnaldo Carvalho de Melo
@ 2017-12-02  4:47     ` Jin, Yao
  1 sibling, 0 replies; 27+ messages in thread
From: Jin, Yao @ 2017-12-02  4:47 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin



On 12/1/2017 10:44 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Dec 01, 2017 at 06:57:34PM +0800, Jin Yao escreveu:
>> Perf already has a function thread_map__new_by_uid() which can
>> enumerate all threads from /proc by uid.
>>
>> This patch creates a static function enumerate_threads() which
>> reuses the common code in thread_map__new_by_uid() to enumerate
>> threads from /proc.
>>
>> The enumerate_threads() is shared by thread_map__new_by_uid()
>> and a new function thread_map__new_threads().
>>
>> The new function thread_map__new_threads() is called to enumerate
>> all threads from /proc.
>>
>> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
>> ---
>>   tools/perf/tests/thread-map.c |  2 +-
>>   tools/perf/util/evlist.c      |  3 ++-
>>   tools/perf/util/thread_map.c  | 19 ++++++++++++++++---
>>   tools/perf/util/thread_map.h  |  3 ++-
>>   4 files changed, 21 insertions(+), 6 deletions(-)
>>
>> diff --git a/tools/perf/tests/thread-map.c b/tools/perf/tests/thread-map.c
>> index dbcb6a1..4de1939 100644
>> --- a/tools/perf/tests/thread-map.c
>> +++ b/tools/perf/tests/thread-map.c
>> @@ -105,7 +105,7 @@ int test__thread_map_remove(struct test *test __maybe_unused, int subtest __mayb
>>   	TEST_ASSERT_VAL("failed to allocate map string",
>>   			asprintf(&str, "%d,%d", getpid(), getppid()) >= 0);
>>   
>> -	threads = thread_map__new_str(str, NULL, 0);
>> +	threads = thread_map__new_str(str, NULL, 0, false);
>>   
>>   	TEST_ASSERT_VAL("failed to allocate thread_map",
>>   			threads);
>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>> index 199bb82..05b8f2b 100644
>> --- a/tools/perf/util/evlist.c
>> +++ b/tools/perf/util/evlist.c
>> @@ -1102,7 +1102,8 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
>>   	struct cpu_map *cpus;
>>   	struct thread_map *threads;
>>   
>> -	threads = thread_map__new_str(target->pid, target->tid, target->uid);
>> +	threads = thread_map__new_str(target->pid, target->tid, target->uid,
>> +				      target->per_thread);
>>   
>>   	if (!threads)
>>   		return -1;
>> diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
>> index be0d5a7..5672268 100644
>> --- a/tools/perf/util/thread_map.c
>> +++ b/tools/perf/util/thread_map.c
>> @@ -92,7 +92,7 @@ struct thread_map *thread_map__new_by_tid(pid_t tid)
>>   	return threads;
>>   }
>>   
>> -struct thread_map *thread_map__new_by_uid(uid_t uid)
>> +static struct thread_map *enumerate_threads(uid_t uid)
>>   {
>>   	DIR *proc;
>>   	int max_threads = 32, items, i;
>> @@ -124,7 +124,7 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
>>   		if (stat(path, &st) != 0)
>>   			continue;
> 
> Look, for the case where you want all threads enumerated you will incur
> the above stat() cost for all of them and will not use it at all...
> 
> And new_threads() seems vague, so I'm using the term used by 'perf record'
> for system-wide sampling, see the patch below:
> 
> diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
> index be0d5a736dea..79d11bd4543a 100644
> --- a/tools/perf/util/thread_map.c
> +++ b/tools/perf/util/thread_map.c
> @@ -92,7 +92,7 @@ struct thread_map *thread_map__new_by_tid(pid_t tid)
>   	return threads;
>   }
>   
> -struct thread_map *thread_map__new_by_uid(uid_t uid)
> +static struct thread_map *__thread_map__new_all_cpus(uid_t uid)
>   {
>   	DIR *proc;
>   	int max_threads = 32, items, i;
> @@ -113,7 +113,6 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
>   	while ((dirent = readdir(proc)) != NULL) {
>   		char *end;
>   		bool grow = false;
> -		struct stat st;
>   		pid_t pid = strtol(dirent->d_name, &end, 10);
>   
>   		if (*end) /* only interested in proper numerical dirents */
> @@ -121,11 +120,12 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
>   
>   		snprintf(path, sizeof(path), "/proc/%s", dirent->d_name);
>   
> -		if (stat(path, &st) != 0)
> -			continue;
> +		if (uid != UID_MAX) {
> +			struct stat st;
>   
> -		if (st.st_uid != uid)
> -			continue;
> +			if (stat(path, &st) != 0 || st.st_uid != uid)
> +				continue;
> +		}
>   
>   		snprintf(path, sizeof(path), "/proc/%d/task", pid);
>   		items = scandir(path, &namelist, filter, NULL);
> @@ -178,6 +178,16 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
>   	goto out_closedir;
>   }
>   
> +struct thread_map *thread_map__new_all_cpus(void)
> +{
> +	return __thread__new_all_cpus(UID_MAX);
> +}
> +
> +struct thread_map *thread_map__new_by_uid(uid_t uid)
> +{
> +	return __thread__new_all_cpus(uid);
> +}
> +
>   struct thread_map *thread_map__new(pid_t pid, pid_t tid, uid_t uid)
>   {
>   	if (pid != -1)
> diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h
> index f15803985435..07a765fb22bb 100644
> --- a/tools/perf/util/thread_map.h
> +++ b/tools/perf/util/thread_map.h
> @@ -23,6 +23,7 @@ struct thread_map *thread_map__new_dummy(void);
>   struct thread_map *thread_map__new_by_pid(pid_t pid);
>   struct thread_map *thread_map__new_by_tid(pid_t tid);
>   struct thread_map *thread_map__new_by_uid(uid_t uid);
> +struct thread_map *thread_map__new_all_cpus(void);
>   struct thread_map *thread_map__new(pid_t pid, pid_t tid, uid_t uid);
>   struct thread_map *thread_map__new_event(struct thread_map_event *event);
>   
> 

It looks good, thanks Arnaldo!

Thanks
Jin Yao


* Re: [PATCH v5 10/12] perf util: Reuse thread_map__new_by_uid to enumerate threads from /proc
  2017-12-01 15:02     ` Arnaldo Carvalho de Melo
@ 2017-12-02  4:53       ` Jin, Yao
  0 siblings, 0 replies; 27+ messages in thread
From: Jin, Yao @ 2017-12-02  4:53 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin



On 12/1/2017 11:02 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Dec 01, 2017 at 11:44:25AM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Fri, Dec 01, 2017 at 06:57:34PM +0800, Jin Yao escreveu:
>>> Perf already has a function thread_map__new_by_uid() which can
>>> enumerate all threads from /proc by uid.
>>>
>>> This patch creates a static function enumerate_threads() which
>>> reuses the common code in thread_map__new_by_uid() to enumerate
>>> threads from /proc.
>>>
>>> The enumerate_threads() is shared by thread_map__new_by_uid()
>>> and a new function thread_map__new_threads().
>>>
>>> The new function thread_map__new_threads() is called to enumerate
>>> all threads from /proc.
>>>
>>> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
>>> ---
>>>   tools/perf/tests/thread-map.c |  2 +-
>>>   tools/perf/util/evlist.c      |  3 ++-
>>>   tools/perf/util/thread_map.c  | 19 ++++++++++++++++---
>>>   tools/perf/util/thread_map.h  |  3 ++-
>>>   4 files changed, 21 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/tools/perf/tests/thread-map.c b/tools/perf/tests/thread-map.c
>>> index dbcb6a1..4de1939 100644
>>> --- a/tools/perf/tests/thread-map.c
>>> +++ b/tools/perf/tests/thread-map.c
>>> @@ -105,7 +105,7 @@ int test__thread_map_remove(struct test *test __maybe_unused, int subtest __mayb
>>>   	TEST_ASSERT_VAL("failed to allocate map string",
>>>   			asprintf(&str, "%d,%d", getpid(), getppid()) >= 0);
>>>   
>>> -	threads = thread_map__new_str(str, NULL, 0);
>>> +	threads = thread_map__new_str(str, NULL, 0, false);
>>>   
>>>   	TEST_ASSERT_VAL("failed to allocate thread_map",
>>>   			threads);
>>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>>> index 199bb82..05b8f2b 100644
>>> --- a/tools/perf/util/evlist.c
>>> +++ b/tools/perf/util/evlist.c
>>> @@ -1102,7 +1102,8 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
>>>   	struct cpu_map *cpus;
>>>   	struct thread_map *threads;
>>>   
>>> -	threads = thread_map__new_str(target->pid, target->tid, target->uid);
>>> +	threads = thread_map__new_str(target->pid, target->tid, target->uid,
>>> +				      target->per_thread);
>>>   
>>>   	if (!threads)
>>>   		return -1;
>>> diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
>>> index be0d5a7..5672268 100644
>>> --- a/tools/perf/util/thread_map.c
>>> +++ b/tools/perf/util/thread_map.c
>>> @@ -92,7 +92,7 @@ struct thread_map *thread_map__new_by_tid(pid_t tid)
>>>   	return threads;
>>>   }
>>>   
>>> -struct thread_map *thread_map__new_by_uid(uid_t uid)
>>> +static struct thread_map *enumerate_threads(uid_t uid)
>>>   {
>>>   	DIR *proc;
>>>   	int max_threads = 32, items, i;
>>> @@ -124,7 +124,7 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
>>>   		if (stat(path, &st) != 0)
>>>   			continue;
>>
>> Look, for the case where you want all threads enumerated you will incur
>> the above stat() cost for all of them and will not use it at all...
>>
>> And new_threads() seems vague, so I'm using the term used by 'perf record'
>> for system-wide sampling, see the patch below:
>>
> 
> The one below even compiles, I'll push what I merged already and we can
> continue from there:
> 
> [acme@jouet linux]$ git log --oneline -3
> 5cc4fc8994eb (HEAD -> perf/core) perf thread_map: Add method to map all threads in the system
> 9e49c22bd4d8 perf stat: Add rbtree node_delete op
> 1c92e3226546 perf rblist: Create rblist__exit() function
> [acme@jouet linux]$
> 
> 

Yes, we can continue from these commits.

I just pulled the perf/core branch from 
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git, but it 
looks like the new commits have not been synced yet.

Maybe I will wait until next week to see if the new commits are synced and 
then continue the work.

Thanks
Jin Yao

> commit 5cc4fc8994eb3f3af1950f3b726b6008900c7b06
> Author: Arnaldo Carvalho de Melo <acme@redhat.com>
> Date:   Fri Dec 1 11:44:30 2017 -0300
> 
>      perf thread_map: Add method to map all threads in the system
>      
>      Reusing the thread_map__new_by_uid() proc scanning already in place to
>      return a map with all threads in the system.
>      
>      Based-on-a-patch-by: Jin Yao <yao.jin@linux.intel.com>
>      Acked-by: Jiri Olsa <jolsa@kernel.org>
>      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>      Cc: Andi Kleen <ak@linux.intel.com>
>      Cc: Kan Liang <kan.liang@intel.com>
>      Cc: Peter Zijlstra <peterz@infradead.org>
>      Link: https://lkml.kernel.org/n/tip-khh28q0wwqbqtrk32bfe07hd@git.kernel.org
>      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
> index be0d5a736dea..2b653853eec2 100644
> --- a/tools/perf/util/thread_map.c
> +++ b/tools/perf/util/thread_map.c
> @@ -92,7 +92,7 @@ struct thread_map *thread_map__new_by_tid(pid_t tid)
>   	return threads;
>   }
>   
> -struct thread_map *thread_map__new_by_uid(uid_t uid)
> +static struct thread_map *__thread_map__new_all_cpus(uid_t uid)
>   {
>   	DIR *proc;
>   	int max_threads = 32, items, i;
> @@ -113,7 +113,6 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
>   	while ((dirent = readdir(proc)) != NULL) {
>   		char *end;
>   		bool grow = false;
> -		struct stat st;
>   		pid_t pid = strtol(dirent->d_name, &end, 10);
>   
>   		if (*end) /* only interested in proper numerical dirents */
> @@ -121,11 +120,12 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
>   
>   		snprintf(path, sizeof(path), "/proc/%s", dirent->d_name);
>   
> -		if (stat(path, &st) != 0)
> -			continue;
> +		if (uid != UINT_MAX) {
> +			struct stat st;
>   
> -		if (st.st_uid != uid)
> -			continue;
> +			if (stat(path, &st) != 0 || st.st_uid != uid)
> +				continue;
> +		}
>   
>   		snprintf(path, sizeof(path), "/proc/%d/task", pid);
>   		items = scandir(path, &namelist, filter, NULL);
> @@ -178,6 +178,16 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
>   	goto out_closedir;
>   }
>   
> +struct thread_map *thread_map__new_all_cpus(void)
> +{
> +	return __thread_map__new_all_cpus(UINT_MAX);
> +}
> +
> +struct thread_map *thread_map__new_by_uid(uid_t uid)
> +{
> +	return __thread_map__new_all_cpus(uid);
> +}
> +
>   struct thread_map *thread_map__new(pid_t pid, pid_t tid, uid_t uid)
>   {
>   	if (pid != -1)
> diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h
> index f15803985435..07a765fb22bb 100644
> --- a/tools/perf/util/thread_map.h
> +++ b/tools/perf/util/thread_map.h
> @@ -23,6 +23,7 @@ struct thread_map *thread_map__new_dummy(void);
>   struct thread_map *thread_map__new_by_pid(pid_t pid);
>   struct thread_map *thread_map__new_by_tid(pid_t tid);
>   struct thread_map *thread_map__new_by_uid(uid_t uid);
> +struct thread_map *thread_map__new_all_cpus(void);
>   struct thread_map *thread_map__new(pid_t pid, pid_t tid, uid_t uid);
>   struct thread_map *thread_map__new_event(struct thread_map_event *event);
>   
> 
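> 
> [Editor's note: the key control-flow change above is the UINT_MAX
> sentinel, which lets the all-threads case skip the per-entry stat()
> entirely. A self-contained toy sketch of that flow follows; the
> struct entry table simulates /proc dirents and their stat()ed uids,
> and is hypothetical data, not from the patch.]
> 
> ```c
> #include <limits.h>
> #include <stdio.h>
> #include <stdlib.h>
> 
> /* Toy stand-ins for /proc dirents: a name plus the owning uid that
>  * stat() would report for /proc/<name>. */
> struct entry { const char *name; unsigned uid; };
> 
> /* Mirrors the patch's loop: UINT_MAX means "match any uid", so the
>  * (simulated) stat() lookup is skipped in that case. */
> static int count_matches(const struct entry *e, int n, unsigned uid)
> {
> 	int matched = 0;
> 
> 	for (int i = 0; i < n; i++) {
> 		char *end;
> 		long pid = strtol(e[i].name, &end, 10);
> 
> 		if (*end)	/* only proper numerical dirents */
> 			continue;
> 		if (uid != UINT_MAX && e[i].uid != uid)
> 			continue;	/* per-uid filter only when asked */
> 		matched++;
> 		(void)pid;
> 	}
> 	return matched;
> }
> 
> int main(void)
> {
> 	struct entry proc[] = {
> 		{ "1",    0 }, { "42",  1000 },
> 		{ "self", 0 }, { "314", 1000 },
> 	};
> 
> 	printf("all threads: %d\n", count_matches(proc, 4, UINT_MAX));
> 	printf("uid 1000: %d\n", count_matches(proc, 4, 1000));
> 	return 0;
> }
> ```
> 
> Running it, "self" is rejected by the numeric-dirent check in both
> modes, and the uid filter only applies when a real uid is given.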


* [tip:perf/core] perf rblist: Create rblist__exit() function
  2017-12-01 10:57 ` [PATCH v5 01/12] perf util: Create rblist__exit() function Jin Yao
@ 2017-12-06 16:36   ` tip-bot for Jin Yao
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Jin Yao @ 2017-12-06 16:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, yao.jin, ak, acme, tglx, mingo, linux-kernel, jolsa,
	kan.liang, alexander.shishkin, peterz

Commit-ID:  33fec3e393dc1c55737cfb9c876b5c0da0d6f380
Gitweb:     https://git.kernel.org/tip/33fec3e393dc1c55737cfb9c876b5c0da0d6f380
Author:     Jin Yao <yao.jin@linux.intel.com>
AuthorDate: Fri, 1 Dec 2017 18:57:25 +0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 5 Dec 2017 10:24:31 -0300

perf rblist: Create rblist__exit() function

Currently we have rblist__delete(), which tears down an rblist and then
frees the rblist pointer itself at the end.

That is inconvenient when the rblist was not allocated with something
like malloc(), for example when it is embedded in a larger data
structure.

This patch creates a new function, rblist__exit(), which is similar to
rblist__delete() but does not free the rblist pointer.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512125856-22056-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/rblist.c | 19 ++++++++++++-------
 tools/perf/util/rblist.h |  1 +
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/rblist.c b/tools/perf/util/rblist.c
index 0dfe27d..0efc325 100644
--- a/tools/perf/util/rblist.c
+++ b/tools/perf/util/rblist.c
@@ -101,16 +101,21 @@ void rblist__init(struct rblist *rblist)
 	return;
 }
 
+void rblist__exit(struct rblist *rblist)
+{
+	struct rb_node *pos, *next = rb_first(&rblist->entries);
+
+	while (next) {
+		pos = next;
+		next = rb_next(pos);
+		rblist__remove_node(rblist, pos);
+	}
+}
+
 void rblist__delete(struct rblist *rblist)
 {
 	if (rblist != NULL) {
-		struct rb_node *pos, *next = rb_first(&rblist->entries);
-
-		while (next) {
-			pos = next;
-			next = rb_next(pos);
-			rblist__remove_node(rblist, pos);
-		}
+		rblist__exit(rblist);
 		free(rblist);
 	}
 }
diff --git a/tools/perf/util/rblist.h b/tools/perf/util/rblist.h
index 4c8638a..76df15c 100644
--- a/tools/perf/util/rblist.h
+++ b/tools/perf/util/rblist.h
@@ -29,6 +29,7 @@ struct rblist {
 };
 
 void rblist__init(struct rblist *rblist);
+void rblist__exit(struct rblist *rblist);
 void rblist__delete(struct rblist *rblist);
 int rblist__add_node(struct rblist *rblist, const void *new_entry);
 void rblist__remove_node(struct rblist *rblist, struct rb_node *rb_node);


* [tip:perf/core] perf stat: Add rbtree node_delete op
  2017-12-01 10:57 ` [PATCH v5 04/12] perf util: Add rbtree node_delete ops Jin Yao
  2017-12-01 14:14   ` Arnaldo Carvalho de Melo
@ 2017-12-06 16:37   ` tip-bot for Jin Yao
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot for Jin Yao @ 2017-12-06 16:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, jolsa, linux-kernel, kan.liang, hpa, alexander.shishkin,
	acme, yao.jin, tglx, ak, mingo

Commit-ID:  b984aff7811bbac75b3f05931643d815067cf45c
Gitweb:     https://git.kernel.org/tip/b984aff7811bbac75b3f05931643d815067cf45c
Author:     Jin Yao <yao.jin@linux.intel.com>
AuthorDate: Fri, 1 Dec 2017 18:57:28 +0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 5 Dec 2017 10:24:31 -0300

perf stat: Add rbtree node_delete op

In the current stat-shadow.c, rbtree node deletion is ignored.

This patch adds an implementation for the node_delete method of the
rblist.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1512125856-22056-5-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/stat-shadow.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 855e35c..57ec225 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -87,6 +87,16 @@ static struct rb_node *saved_value_new(struct rblist *rblist __maybe_unused,
 	return &nd->rb_node;
 }
 
+static void saved_value_delete(struct rblist *rblist __maybe_unused,
+			       struct rb_node *rb_node)
+{
+	struct saved_value *v;
+
+	BUG_ON(!rb_node);
+	v = container_of(rb_node, struct saved_value, rb_node);
+	free(v);
+}
+
 static struct saved_value *saved_value_lookup(struct perf_evsel *evsel,
 					      int cpu,
 					      bool create)
@@ -114,7 +124,7 @@ void perf_stat__init_shadow_stats(void)
 	rblist__init(&runtime_saved_values);
 	runtime_saved_values.node_cmp = saved_value_cmp;
 	runtime_saved_values.node_new = saved_value_new;
-	/* No delete for now */
+	runtime_saved_values.node_delete = saved_value_delete;
 }
 
 static int evsel_context(struct perf_evsel *evsel)


end of thread, other threads:[~2017-12-06 16:42 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-01 10:57 [PATCH v5 00/12] perf stat: Enable '--per-thread' on all thread Jin Yao
2017-12-01 10:57 ` [PATCH v5 01/12] perf util: Create rblist__exit() function Jin Yao
2017-12-06 16:36   ` [tip:perf/core] perf rblist: " tip-bot for Jin Yao
2017-12-01 10:57 ` [PATCH v5 02/12] perf util: Define a structure for runtime shadow stats Jin Yao
2017-12-01 14:02   ` Arnaldo Carvalho de Melo
2017-12-02  4:39     ` Jin, Yao
2017-12-01 10:57 ` [PATCH v5 03/12] perf util: Extend rbtree to support " Jin Yao
2017-12-01 14:10   ` Arnaldo Carvalho de Melo
2017-12-02  4:40     ` Jin, Yao
2017-12-01 10:57 ` [PATCH v5 04/12] perf util: Add rbtree node_delete ops Jin Yao
2017-12-01 14:14   ` Arnaldo Carvalho de Melo
2017-12-01 18:29     ` Andi Kleen
2017-12-06 16:37   ` [tip:perf/core] perf stat: Add rbtree node_delete op tip-bot for Jin Yao
2017-12-01 10:57 ` [PATCH v5 05/12] perf util: Create the runtime_stat init/exit function Jin Yao
2017-12-01 10:57 ` [PATCH v5 06/12] perf util: Update and print per-thread shadow stats Jin Yao
2017-12-01 14:21   ` Arnaldo Carvalho de Melo
2017-12-02  4:46     ` Jin, Yao
2017-12-01 10:57 ` [PATCH v5 07/12] perf util: Remove a set of shadow stats static variables Jin Yao
2017-12-01 10:57 ` [PATCH v5 08/12] perf stat: Allocate shadow stats buffer for threads Jin Yao
2017-12-01 10:57 ` [PATCH v5 09/12] perf stat: Update or print per-thread stats Jin Yao
2017-12-01 10:57 ` [PATCH v5 10/12] perf util: Reuse thread_map__new_by_uid to enumerate threads from /proc Jin Yao
2017-12-01 14:44   ` Arnaldo Carvalho de Melo
2017-12-01 15:02     ` Arnaldo Carvalho de Melo
2017-12-02  4:53       ` Jin, Yao
2017-12-02  4:47     ` Jin, Yao
2017-12-01 10:57 ` [PATCH v5 11/12] perf stat: Remove --per-thread pid/tid limitation Jin Yao
2017-12-01 10:57 ` [PATCH v5 12/12] perf stat: Resort '--per-thread' result Jin Yao
