LKML Archive on lore.kernel.org
 help / color / Atom feed
* Add top down metrics to perf stat
@ 2016-04-27 20:00 Andi Kleen
  2016-04-27 20:00 ` [PATCH 01/11] x86, perf: Support sysfs files depending on SMT status Andi Kleen
                   ` (10 more replies)
  0 siblings, 11 replies; 31+ messages in thread
From: Andi Kleen @ 2016-04-27 20:00 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, linux-kernel

Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

[v2: Address review feedback.
Metrics are now always printed, but colored when crossing threshold.
--topdown implies --metric-only.
Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
Clean up old HT workaround.
Improve documentation.
Various smaller fixes, see individual patches.]
[v4: Rebased on latest tree]
[v5: Rebased on latest tree. Move debug messages to -vv]

This patchkit adds support for TopDown measurements to perf stat
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle/
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads, 
due to out of order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipe line
bottlenecks using standardized formulas. The measurement
can be all done with 5 counters (one fixed counter)

The result are four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methology has many hierarchical metrics.
This implementation only supports level 1 which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev.
(http://github.com/andikleen/pmu-tools)

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should be also implementable
on other out of order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):
    
topdown-total-slots   Available slots in the pipeline
topdown-slots-issued          Slots issued into the pipeline
topdown-slots-retired         Slots successfully retired
topdown-fetch-bubbles         Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
                          from misspeculation
    
These metrics then allow to compute four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.
    
The formulas to compute the metrics are generic, they
only change based on the availability on the abstracted
input values.
    
The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.


Example output:

$ perf stat --topdown -I 1000 cmd
     1.000735655                   frontend bound       retiring             bad speculation      backend bound        
     1.000735655 S0-C0           2    47.84%              11.69%               8.37%              32.10%           
     1.000735655 S0-C1           2    45.53%              11.39%               8.52%              34.56%           
     2.003978563 S0-C0           2    49.47%              12.22%               8.65%              29.66%           
     2.003978563 S0-C1           2    47.21%              12.98%               8.77%              31.04%           
     3.004901432 S0-C0           2    49.35%              12.26%               8.68%              29.70%           
     3.004901432 S0-C1           2    47.23%              12.67%               8.76%              31.35%           
     4.005766611 S0-C0           2    48.44%              12.14%               8.59%              30.82%           
     4.005766611 S0-C1           2    46.07%              12.41%               8.67%              32.85%           
     5.006580592 S0-C0           2    47.91%              12.08%               8.57%              31.44%           
     5.006580592 S0-C1           2    45.57%              12.27%               8.63%              33.53%           
     6.007545125 S0-C0           2    47.45%              12.02%               8.57%              31.96%           
     6.007545125 S0-C1           2    45.13%              12.17%               8.57%              34.14%           
     7.008539347 S0-C0           2    47.07%              12.03%               8.61%              32.29%           
...


 
For Level 1 Top Down computes metrics per core instead of per logical CPU
on Core CPUs (On Atom CPUs there is no Hyper Threading and TopDown 
is per thread)

In this case perf stat automatically enables --per-core mode and also requires
global mode (-a) and avoiding other filters (no cgroup mode)

When Hyper Threading is off this can be overriden with the --single-thread
option. When Hyper Threading is on it is enforced, the only way to
not require -a here is to off line the logical CPUs of the second
threads.

One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting. 

Full tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-19

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Add top down metrics to perf stat
@ 2016-05-20  0:09 Andi Kleen
  0 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2016-05-20  0:09 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, linux-kernel, mingo

Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

[v2: Address review feedback.
Metrics are now always printed, but colored when crossing threshold.
--topdown implies --metric-only.
Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
Clean up old HT workaround.
Improve documentation.
Various smaller fixes, see individual patches.]
[v4: Rebased on latest tree]
[v5: Rebased on latest tree. Move debug messages to -vv]
[v6: Rebased. Remove .aggr-per-core and --single-thread to not
break old perf binaries. Put SMT enumeration into 
generic topology API.]
[v7: Address review comments. Change patch title headers.]
[v8: Avoid -0.00 output]

This patchkit adds support for TopDown measurements to perf stat
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle/
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads, 
due to out of order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipe line
bottlenecks using standardized formulas. The measurement
can be all done with 5 counters (one fixed counter)

The result are four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methology has many hierarchical metrics.
This implementation only supports level 1 which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev.
(http://github.com/andikleen/pmu-tools)

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should be also implementable
on other out of order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):
    
topdown-total-slots   Available slots in the pipeline
topdown-slots-issued          Slots issued into the pipeline
topdown-slots-retired         Slots successfully retired
topdown-fetch-bubbles         Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
                          from misspeculation
    
These metrics then allow to compute four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.
    
The formulas to compute the metrics are generic, they
only change based on the availability on the abstracted
input values.
    
The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.


Example output:

$ perf stat --topdown -I 1000 cmd
     1.000735655                   frontend bound       retiring             bad speculation      backend bound        
     1.000735655 S0-C0           2    47.84%              11.69%               8.37%              32.10%           
     1.000735655 S0-C1           2    45.53%              11.39%               8.52%              34.56%           
     2.003978563 S0-C0           2    49.47%              12.22%               8.65%              29.66%           
     2.003978563 S0-C1           2    47.21%              12.98%               8.77%              31.04%           
     3.004901432 S0-C0           2    49.35%              12.26%               8.68%              29.70%           
     3.004901432 S0-C1           2    47.23%              12.67%               8.76%              31.35%           
     4.005766611 S0-C0           2    48.44%              12.14%               8.59%              30.82%           
     4.005766611 S0-C1           2    46.07%              12.41%               8.67%              32.85%           
     5.006580592 S0-C0           2    47.91%              12.08%               8.57%              31.44%           
     5.006580592 S0-C1           2    45.57%              12.27%               8.63%              33.53%           
     6.007545125 S0-C0           2    47.45%              12.02%               8.57%              31.96%           
     6.007545125 S0-C1           2    45.13%              12.17%               8.57%              34.14%           
     7.008539347 S0-C0           2    47.07%              12.03%               8.61%              32.29%           
...


 
For Level 1 Top Down computes metrics per core instead of per logical CPU
on Core CPUs (On Atom CPUs there is no Hyper Threading and TopDown 
is per thread)

In this case perf stat automatically enables --per-core mode and also requires
global mode (-a) and avoiding other filters (no cgroup mode)
One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting. 

Full tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-22

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Add top down metrics to perf stat
@ 2016-05-14  1:44 Andi Kleen
  2016-05-16 12:58 ` Jiri Olsa
  2016-05-20 10:24 ` Jiri Olsa
  0 siblings, 2 replies; 31+ messages in thread
From: Andi Kleen @ 2016-05-14  1:44 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, mingo, linux-kernel

Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

[v2: Address review feedback.
Metrics are now always printed, but colored when crossing threshold.
--topdown implies --metric-only.
Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
Clean up old HT workaround.
Improve documentation.
Various smaller fixes, see individual patches.]
[v4: Rebased on latest tree]
[v5: Rebased on latest tree. Move debug messages to -vv]
[v6: Rebased. Remove .aggr-per-core and --single-thread to not
break old perf binaries. Put SMT enumeration into 
generic topology API.]
[v7: Address review comments. Change patch title headers.]

This patchkit adds support for TopDown measurements to perf stat
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle/
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads, 
due to out of order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipe line
bottlenecks using standardized formulas. The measurement
can be all done with 5 counters (one fixed counter)

The result are four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methology has many hierarchical metrics.
This implementation only supports level 1 which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev.
(http://github.com/andikleen/pmu-tools)

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should be also implementable
on other out of order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):
    
topdown-total-slots   Available slots in the pipeline
topdown-slots-issued          Slots issued into the pipeline
topdown-slots-retired         Slots successfully retired
topdown-fetch-bubbles         Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
                          from misspeculation
    
These metrics then allow to compute four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.
    
The formulas to compute the metrics are generic, they
only change based on the availability on the abstracted
input values.
    
The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.


Example output:

$ perf stat --topdown -I 1000 cmd
     1.000735655                   frontend bound       retiring             bad speculation      backend bound        
     1.000735655 S0-C0           2    47.84%              11.69%               8.37%              32.10%           
     1.000735655 S0-C1           2    45.53%              11.39%               8.52%              34.56%           
     2.003978563 S0-C0           2    49.47%              12.22%               8.65%              29.66%           
     2.003978563 S0-C1           2    47.21%              12.98%               8.77%              31.04%           
     3.004901432 S0-C0           2    49.35%              12.26%               8.68%              29.70%           
     3.004901432 S0-C1           2    47.23%              12.67%               8.76%              31.35%           
     4.005766611 S0-C0           2    48.44%              12.14%               8.59%              30.82%           
     4.005766611 S0-C1           2    46.07%              12.41%               8.67%              32.85%           
     5.006580592 S0-C0           2    47.91%              12.08%               8.57%              31.44%           
     5.006580592 S0-C1           2    45.57%              12.27%               8.63%              33.53%           
     6.007545125 S0-C0           2    47.45%              12.02%               8.57%              31.96%           
     6.007545125 S0-C1           2    45.13%              12.17%               8.57%              34.14%           
     7.008539347 S0-C0           2    47.07%              12.03%               8.61%              32.29%           
...


 
For Level 1 Top Down computes metrics per core instead of per logical CPU
on Core CPUs (On Atom CPUs there is no Hyper Threading and TopDown 
is per thread)

In this case perf stat automatically enables --per-core mode and also requires
global mode (-a) and avoiding other filters (no cgroup mode)
One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting. 

Full tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-21

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Add top down metrics to perf stat
@ 2016-05-05 23:03 Andi Kleen
  2016-05-12  7:47 ` Jiri Olsa
  0 siblings, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2016-05-05 23:03 UTC (permalink / raw)
  To: acme, peterz; +Cc: jolsa, linux-kernel

Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

[v2: Address review feedback.
Metrics are now always printed, but colored when crossing threshold.
--topdown implies --metric-only.
Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
Clean up old HT workaround.
Improve documentation.
Various smaller fixes, see individual patches.]
[v4: Rebased on latest tree]
[v5: Rebased on latest tree. Move debug messages to -vv]
[v6: Rebased. Remove .aggr-per-core and --single-thread to not
break old perf binaries. Put SMT enumeration into 
generic topology API.]

This patchkit adds support for TopDown measurements to perf stat
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle/
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads, 
due to out of order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipe line
bottlenecks using standardized formulas. The measurement
can be all done with 5 counters (one fixed counter)

The result are four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methology has many hierarchical metrics.
This implementation only supports level 1 which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev.
(http://github.com/andikleen/pmu-tools)

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should be also implementable
on other out of order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):
    
topdown-total-slots   Available slots in the pipeline
topdown-slots-issued          Slots issued into the pipeline
topdown-slots-retired         Slots successfully retired
topdown-fetch-bubbles         Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
                          from misspeculation
    
These metrics then allow to compute four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.
    
The formulas to compute the metrics are generic, they
only change based on the availability on the abstracted
input values.
    
The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.


Example output:

$ perf stat --topdown -I 1000 cmd
     1.000735655                   frontend bound       retiring             bad speculation      backend bound        
     1.000735655 S0-C0           2    47.84%              11.69%               8.37%              32.10%           
     1.000735655 S0-C1           2    45.53%              11.39%               8.52%              34.56%           
     2.003978563 S0-C0           2    49.47%              12.22%               8.65%              29.66%           
     2.003978563 S0-C1           2    47.21%              12.98%               8.77%              31.04%           
     3.004901432 S0-C0           2    49.35%              12.26%               8.68%              29.70%           
     3.004901432 S0-C1           2    47.23%              12.67%               8.76%              31.35%           
     4.005766611 S0-C0           2    48.44%              12.14%               8.59%              30.82%           
     4.005766611 S0-C1           2    46.07%              12.41%               8.67%              32.85%           
     5.006580592 S0-C0           2    47.91%              12.08%               8.57%              31.44%           
     5.006580592 S0-C1           2    45.57%              12.27%               8.63%              33.53%           
     6.007545125 S0-C0           2    47.45%              12.02%               8.57%              31.96%           
     6.007545125 S0-C1           2    45.13%              12.17%               8.57%              34.14%           
     7.008539347 S0-C0           2    47.07%              12.03%               8.61%              32.29%           
...


 
For Level 1 Top Down computes metrics per core instead of per logical CPU
on Core CPUs (On Atom CPUs there is no Hyper Threading and TopDown 
is per thread)

In this case perf stat automatically enables --per-core mode and also requires
global mode (-a) and avoiding other filters (no cgroup mode)
One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting. 

Full tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-20

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Add top down metrics to perf stat
@ 2016-04-04 20:41 Andi Kleen
  0 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2016-04-04 20:41 UTC (permalink / raw)
  To: peterz, acme; +Cc: jolsa, linux-kernel, mingo

Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

[v2: Address review feedback.
Metrics are now always printed, but colored when crossing threshold.
--topdown implies --metric-only.
Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
Clean up old HT workaround.
Improve documentation.
Various smaller fixes, see individual patches.]
[v4: Rebased on latest tree]

This patchkit adds support for TopDown measurements to perf stat
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle/
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads, 
due to out of order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipe line
bottlenecks using standardized formulas. The measurement
can be all done with 5 counters (one fixed counter)

The result are four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methology has many hierarchical metrics.
This implementation only supports level 1 which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev.
(http://github.com/andikleen/pmu-tools)

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should be also implementable
on other out of order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):
    
topdown-total-slots   Available slots in the pipeline
topdown-slots-issued          Slots issued into the pipeline
topdown-slots-retired         Slots successfully retired
topdown-fetch-bubbles         Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
                          from misspeculation
    
These metrics then allow to compute four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.
    
The formulas to compute the metrics are generic, they
only change based on the availability on the abstracted
input values.
    
The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.


Example output:

$ perf stat --topdown -I 1000 cmd
     1.000735655                   frontend bound       retiring             bad speculation      backend bound        
     1.000735655 S0-C0           2    47.84%              11.69%               8.37%              32.10%           
     1.000735655 S0-C1           2    45.53%              11.39%               8.52%              34.56%           
     2.003978563 S0-C0           2    49.47%              12.22%               8.65%              29.66%           
     2.003978563 S0-C1           2    47.21%              12.98%               8.77%              31.04%           
     3.004901432 S0-C0           2    49.35%              12.26%               8.68%              29.70%           
     3.004901432 S0-C1           2    47.23%              12.67%               8.76%              31.35%           
     4.005766611 S0-C0           2    48.44%              12.14%               8.59%              30.82%           
     4.005766611 S0-C1           2    46.07%              12.41%               8.67%              32.85%           
     5.006580592 S0-C0           2    47.91%              12.08%               8.57%              31.44%           
     5.006580592 S0-C1           2    45.57%              12.27%               8.63%              33.53%           
     6.007545125 S0-C0           2    47.45%              12.02%               8.57%              31.96%           
     6.007545125 S0-C1           2    45.13%              12.17%               8.57%              34.14%           
     7.008539347 S0-C0           2    47.07%              12.03%               8.61%              32.29%           
...


 
For Level 1 Top Down computes metrics per core instead of per logical CPU
on Core CPUs (On Atom CPUs there is no Hyper Threading and TopDown 
is per thread)

In this case perf stat automatically enables --per-core mode and also requires
global mode (-a) and avoiding other filters (no cgroup mode)

When Hyper Threading is off this can be overriden with the --single-thread
option. When Hyper Threading is on it is enforced, the only way to
not require -a here is to off line the logical CPUs of the second
threads.

One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting. 

Full tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-17

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Add top down metrics to perf stat
@ 2016-03-22 23:08 Andi Kleen
  2016-03-27 11:27 ` Jiri Olsa
  0 siblings, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2016-03-22 23:08 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, mingo, linux-kernel

[v2: Address review feedback.
Metrics are now always printed, but colored when crossing threshold.
--topdown implies --metric-only.
Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
Clean up old HT workaround.
Improve documentation.
Various smaller fixes, see individual patches.]

Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

This patchkit adds support for TopDown measurements to perf stat
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle/
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads, 
due to out of order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipe line
bottlenecks using standardized formulas. The measurement
can be all done with 5 counters (one fixed counter)

The result are four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methology has many hierarchical metrics.
This implementation only supports level 1 which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev.
(http://github.com/andikleen/pmu-tools)

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should be also implementable
on other out of order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):
    
topdown-total-slots   Available slots in the pipeline
topdown-slots-issued          Slots issued into the pipeline
topdown-slots-retired         Slots successfully retired
topdown-fetch-bubbles         Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
                          from misspeculation
    
These metrics then allow to compute four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.
    
The formulas to compute the metrics are generic, they
only change based on the availability on the abstracted
input values.
    
The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.


Example output:

$ perf stat --topdown -I 1000 cmd
     1.000735655                   frontend bound       retiring             bad speculation      backend bound        
     1.000735655 S0-C0           2    47.84%              11.69%               8.37%              32.10%           
     1.000735655 S0-C1           2    45.53%              11.39%               8.52%              34.56%           
     2.003978563 S0-C0           2    49.47%              12.22%               8.65%              29.66%           
     2.003978563 S0-C1           2    47.21%              12.98%               8.77%              31.04%           
     3.004901432 S0-C0           2    49.35%              12.26%               8.68%              29.70%           
     3.004901432 S0-C1           2    47.23%              12.67%               8.76%              31.35%           
     4.005766611 S0-C0           2    48.44%              12.14%               8.59%              30.82%           
     4.005766611 S0-C1           2    46.07%              12.41%               8.67%              32.85%           
     5.006580592 S0-C0           2    47.91%              12.08%               8.57%              31.44%           
     5.006580592 S0-C1           2    45.57%              12.27%               8.63%              33.53%           
     6.007545125 S0-C0           2    47.45%              12.02%               8.57%              31.96%           
     6.007545125 S0-C1           2    45.13%              12.17%               8.57%              34.14%           
     7.008539347 S0-C0           2    47.07%              12.03%               8.61%              32.29%           
...


 
For Level 1 Top Down computes metrics per core instead of per logical CPU
on Core CPUs (On Atom CPUs there is no Hyper Threading and TopDown 
is per thread)

In this case perf stat automatically enables --per-core mode and also requires
global mode (-a) and avoiding other filters (no cgroup mode)

When Hyper Threading is off this can be overriden with the --single-thread
option. When Hyper Threading is on it is enforced, the only way to
not require -a here is to off line the logical CPUs of the second
threads.

One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting. 

Full tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-16

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Add top down metrics to perf stat
@ 2016-01-20  2:27 Andi Kleen
  0 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2016-01-20  2:27 UTC (permalink / raw)
  To: acme; +Cc: jolsa, mingo, linux-kernel, eranian

[v2: Address review feedback.
Metrics are now always printed, but colored when crossing threshold.
--topdown implies --metric-only.
Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
Clean up old HT workaround.
Improve documentation.
Various smaller fixes, see individual patches.

Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the end.

This patchkit adds support for TopDown measurements to perf stat
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle/
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads, 
due to out of order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipe line
bottlenecks using standardized formulas. The measurement
can be all done with 5 counters (one fixed counter)

The result are four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methology has many hierarchical metrics.
This implementation only supports level 1 which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev.
(http://github.com/andikleen/pmu-tools)

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should be also implementable
on other out of order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):
    
topdown-total-slots   Available slots in the pipeline
topdown-slots-issued          Slots issued into the pipeline
topdown-slots-retired         Slots successfully retired
topdown-fetch-bubbles         Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
                          from misspeculation
    
These metrics then allow to compute four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.
    
The formulas to compute the metrics are generic, they
only change based on the availability on the abstracted
input values.
    
The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.


Example output:

$ perf stat --topdown -I 1000 cmd
     1.000735655                   frontend bound       retiring             bad speculation      backend bound        
     1.000735655 S0-C0           2    47.84%              11.69%               8.37%              32.10%           
     1.000735655 S0-C1           2    45.53%              11.39%               8.52%              34.56%           
     2.003978563 S0-C0           2    49.47%              12.22%               8.65%              29.66%           
     2.003978563 S0-C1           2    47.21%              12.98%               8.77%              31.04%           
     3.004901432 S0-C0           2    49.35%              12.26%               8.68%              29.70%           
     3.004901432 S0-C1           2    47.23%              12.67%               8.76%              31.35%           
     4.005766611 S0-C0           2    48.44%              12.14%               8.59%              30.82%           
     4.005766611 S0-C1           2    46.07%              12.41%               8.67%              32.85%           
     5.006580592 S0-C0           2    47.91%              12.08%               8.57%              31.44%           
     5.006580592 S0-C1           2    45.57%              12.27%               8.63%              33.53%           
     6.007545125 S0-C0           2    47.45%              12.02%               8.57%              31.96%           
     6.007545125 S0-C1           2    45.13%              12.17%               8.57%              34.14%           
     7.008539347 S0-C0           2    47.07%              12.03%               8.61%              32.29%           
...


 
For Level 1 Top Down computes metrics per core instead of per logical CPU
on Core CPUs (On Atom CPUs there is no Hyper Threading and TopDown 
is per thread)

In this case perf stat automatically enables --per-core mode and also requires
global mode (-a) and avoiding other filters (no cgroup mode)

When Hyper Threading is off this can be overriden with the --single-thread
option. When Hyper Threading is on it is enforced, the only way to
not require -a here is to off line the logical CPUs of the second
threads.

One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting. 

Full tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-13

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Add top down metrics to perf stat
@ 2016-01-16  1:12 Andi Kleen
  0 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2016-01-16  1:12 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, linux-kernel, mingo

[v2: Address review feedback.
Metrics are now always printed, but colored when crossing threshold.
--topdown implies --metric-only.
Various smaller fixes, see individual patches]

Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the end.

This patchkit adds support for TopDown measurements to perf stat
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle/
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads, 
due to out of order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipe line
bottlenecks using standardized formulas. The measurement
can be all done with 5 counters (one fixed counter)

The result are four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methology has many hierarchical metrics.
This implementation only supports level 1 which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev.
(http://github.com/andikleen/pmu-tools)

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should be also implementable
on other out of order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):
    
topdown-total-slots   Available slots in the pipeline
topdown-slots-issued          Slots issued into the pipeline
topdown-slots-retired         Slots successfully retired
topdown-fetch-bubbles         Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
                          from misspeculation
    
These metrics then allow to compute four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.
    
The formulas to compute the metrics are generic, they
only change based on the availability on the abstracted
input values.
    
The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.


Example output:

$ perf stat --topdown -I 100 ./BC1s
     0.100576098 frontend bound           retiring                 bad speculation          backend bound            
     0.100576098     8.83%                  48.93%                  35.24%                   7.00%               
     0.200800845     8.84%                  48.49%                  35.53%                   7.13%               
     0.300905983     8.73%                  48.64%                  35.58%                   7.05%            
...


 
On Hyper Threaded CPUs Top Down computes metrics per core instead of per logical CPU.
In this case perf stat automatically enables --per-core mode and also requires
global mode (-a) and avoiding other filters (no cgroup mode)

One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting.  

On systems without Hyper Threading it can be used per process.

Full tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-11

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Add top down metrics to perf stat
@ 2015-08-08  1:06 Andi Kleen
  0 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2015-08-08  1:06 UTC (permalink / raw)
  To: acme; +Cc: jolsa, linux-kernel, eranian, namhyung, peterz, mingo

This patchkit adds support for TopDown to perf stat
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle/
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads, 
due to out of order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipe line
bottlenecks using standardized formulas. The measurement
can be all done with 5 counters (one fixed counter)

The result are four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methology has many hierarchical metrics.
This implementation only supports level 1 which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev.
(http://github.com/andikleen/pmu-tools)

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should be also implementable
on other out of order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):
    
topdown-total-slots   Available slots in the pipeline
topdown-slots-issued          Slots issued into the pipeline
topdown-slots-retired         Slots successfully retired
topdown-fetch-bubbles         Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
                          from misspeculation
    
These metrics then allow to compute four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.
    
The formulas to compute the metrics are generic, they
only change based on the availability on the abstracted
input values.
    
The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.


Example output:

$ ./perf stat --topdown -a ./BC1s 

 Performance counter stats for 'system wide':

S0-C0           2           19650790      topdown-total-slots                                           (100.00%)
S0-C0           2         4445680.00      topdown-fetch-bubbles     #    22.62% frontend bound          (100.00%)
S0-C0           2         1743552.00      topdown-slots-retired                                         (100.00%)
S0-C0           2             622954      topdown-recovery-bubbles                                      (100.00%)
S0-C0           2         2025498.00      topdown-slots-issued      #    63.90% backend bound         
S0-C1           2        16685216540      topdown-total-slots                                           (100.00%)
S0-C1           2       962557931.00      topdown-fetch-bubbles                                         (100.00%)
S0-C1           2      4175583320.00      topdown-slots-retired                                         (100.00%)
S0-C1           2         1743329246      topdown-recovery-bubbles  #    22.22% bad speculation         (100.00%)
S0-C1           2      6138901193.50      topdown-slots-issued      #    46.99% backend bound         

       1.535832673 seconds time elapsed
 
On Hyper Threaded CPUs Top Down computes metrics per core instead of per logical CPU.
In this case perf stat automatically enables --per-core mode and also requires
global mode (-a) and avoiding other filters (no cgroup mode)

One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting.  

On systems without Hyper Threading it can be used per process.

Full tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-2


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, back to index

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-27 20:00 Add top down metrics to perf stat Andi Kleen
2016-04-27 20:00 ` [PATCH 01/11] x86, perf: Support sysfs files depending on SMT status Andi Kleen
2016-04-28  8:06   ` Peter Zijlstra
2016-04-28  8:17     ` Peter Zijlstra
2016-04-27 20:00 ` [PATCH 02/11] x86, perf: Add Top Down events to Intel Core Andi Kleen
2016-04-27 20:00 ` [PATCH 03/11] x86, perf: Add Top Down events to Intel Atom Andi Kleen
2016-04-27 20:00 ` [PATCH 04/11] x86, perf: Use new ht_on flag in HT leak workaround Andi Kleen
2016-04-27 20:00 ` [PATCH 05/11] perf, tools: Parse an .aggr-per-core event attribute Andi Kleen
2016-04-27 20:00 ` [PATCH 06/11] perf, tools, stat: Force --per-core mode for .aggr-per-core aliases Andi Kleen
2016-04-27 20:00 ` [PATCH 07/11] perf, tools, stat: Avoid fractional digits for integer scales Andi Kleen
2016-04-27 20:00 ` [PATCH 08/11] perf, tools, stat: Scale values by unit before metrics Andi Kleen
2016-04-27 20:00 ` [PATCH 09/11] perf, tools, stat: Basic support for TopDown in perf stat Andi Kleen
2016-04-27 20:00 ` [PATCH 10/11] perf, tools, stat: Add computation of TopDown formulas Andi Kleen
2016-04-27 20:00 ` [PATCH 11/11] perf, tools, stat: Add extra output of counter values with -vv Andi Kleen
2016-05-07  4:57   ` [tip:perf/core] perf " tip-bot for Andi Kleen
  -- strict thread matches above, loose matches on Subject: below --
2016-05-20  0:09 Add top down metrics to perf stat Andi Kleen
2016-05-14  1:44 Andi Kleen
2016-05-16 12:58 ` Jiri Olsa
2016-05-19 23:51   ` Andi Kleen
2016-05-20  9:59     ` Jiri Olsa
2016-05-20 14:18       ` Andi Kleen
2016-05-20 10:24 ` Jiri Olsa
2016-05-05 23:03 Andi Kleen
2016-05-12  7:47 ` Jiri Olsa
2016-04-04 20:41 Andi Kleen
2016-03-22 23:08 Andi Kleen
2016-03-27 11:27 ` Jiri Olsa
2016-03-27 15:22   ` Andi Kleen
2016-01-20  2:27 Andi Kleen
2016-01-16  1:12 Andi Kleen
2015-08-08  1:06 Andi Kleen

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git
	git clone --mirror https://lore.kernel.org/lkml/10 lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git