* [PATCH RFC 0/6] Add metrics for neoverse-n2
@ 2022-10-31 11:11 ` Jing Zhang
  0 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-10-31 11:11 UTC (permalink / raw)
  To: linux-arm-kernel, linux-perf-users, linux-kernel
  Cc: John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song

This series adds six metric groups for neoverse-n2. The formula for
topdown L1 is taken from the document:
https://documentation-service.arm.com/static/60250c7395978b529036da86?token=

Since neoverse-n2 does not yet support topdown L2, metric groups such
as Cache, TLB, Branch, InstructionMix, and PEutilization are added to
help further analyze performance bottlenecks.

With this series applied on neoverse-n2:

$./perf list metricgroup

List of pre-defined events (to be used in -e):


Metric Groups:

Branch
Cache
InstructionMix
PEutilization
TLB
TopDownL1


$./perf list

...
Metric Groups:

Branch:
  branch_miss_pred_rate
       [The rate of branches mis-predicted to the overall branches]
  branch_mpki
       [The rate of branches mis-predicted per kilo instructions]
  branch_pki
       [The rate of branches retired per kilo instructions]
Cache:
  l1d_cache_miss_rate
       [The rate of L1 D-Cache misses to the overall L1 D-Cache]
  l1d_cache_mpki
       [The rate of L1 D-Cache misses per kilo instructions]
...


$sudo ./perf stat -a -M TLB sleep 1

 Performance counter stats for 'system wide':

        35,861,936      L1I_TLB                          #     0.00 itlb_walk_rate           (74.91%)
             5,661      ITLB_WALK                                                            (74.91%)
        97,279,240      INST_RETIRED                     #     0.07 itlb_mpki                (74.91%)
             6,851      ITLB_WALK                                                            (74.91%)
            26,391      DTLB_WALK                        #     0.00 dtlb_walk_rate           (75.07%)
        35,585,545      L1D_TLB                                                              (75.07%)
        85,923,244      INST_RETIRED                     #     0.35 dtlb_mpki                (75.11%)
            29,992      DTLB_WALK                                                            (75.11%)

       1.003450755 seconds time elapsed
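
The metric values in the output above can be cross-checked by hand. The
sketch below is not part of the series. It assumes each metric is computed
from the two counts printed next to it (the same event shows up twice with
different counts, presumably because the counters were multiplexed), and
the helper names are illustrative only.

```python
# Counts copied from the `perf stat -a -M TLB sleep 1` output above.
# Assumption: each metric uses the pair of counts printed beside it.

def per_kilo(events, instructions):
    """Events per 1000 retired instructions (the *_mpki form)."""
    return events / instructions * 1000


def ratio(events, total):
    """Plain event ratio (the *_walk_rate form)."""
    return events / total


itlb_walk_rate = ratio(5_661, 35_861_936)     # ITLB_WALK / L1I_TLB
itlb_mpki = per_kilo(6_851, 97_279_240)       # ITLB_WALK / INST_RETIRED * 1000
dtlb_walk_rate = ratio(26_391, 35_585_545)    # DTLB_WALK / L1D_TLB
dtlb_mpki = per_kilo(29_992, 85_923_244)      # DTLB_WALK / INST_RETIRED * 1000

for name, value in [
    ("itlb_walk_rate", itlb_walk_rate),
    ("itlb_mpki", itlb_mpki),
    ("dtlb_walk_rate", dtlb_walk_rate),
    ("dtlb_mpki", dtlb_mpki),
]:
    # perf prints metric values with two decimals in this output
    print(f"{name:15s} {value:.2f}")
```

Rounded to two decimals, these come out as 0.00, 0.07, 0.00 and 0.35,
matching the values perf printed.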
       

Jing Zhang (6):
  perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  perf vendor events arm64: Add TLB metrics for neoverse-n2
  perf vendor events arm64: Add cache metrics for neoverse-n2
  perf vendor events arm64: Add branch metrics for neoverse-n2
  perf vendor events arm64: Add PE utilization metrics for neoverse-n2
  perf vendor events arm64: Add instruction mix metrics for neoverse-n2

 .../arch/arm64/arm/neoverse-n2/metrics.json        | 247 +++++++++++++++++++++
 1 file changed, 247 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json

-- 
1.8.3.1



* [PATCH RFC 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  2022-10-31 11:11 ` Jing Zhang
@ 2022-10-31 11:11   ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-10-31 11:11 UTC (permalink / raw)
  To: linux-arm-kernel, linux-perf-users, linux-kernel
  Cc: John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song

The formula for topdown L1 is taken from the document:
https://documentation-service.arm.com/static/60250c7395978b529036da86?token=

Since neoverse-n2 does not yet support topdown L2, metric groups such
as Cache, TLB, Branch, InstructionMix, and PEutilization are added in
the following patches to help further analyze performance bottlenecks.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 30 ++++++++++++++++++++++
 1 file changed, 30 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
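
For reference, the four TopDownL1 expressions in the diff below can be
evaluated by hand as in this sketch. It is an illustration only, not part
of the patch: the counter values are made up, the function name is
hypothetical, and reading the constant 5 as the number of slots per cycle
is an interpretation of the formula rather than something stated in this
patch. The sample also assumes stall_slot equals stall_slot_frontend +
stall_slot_backend, in which case the four fractions sum to 1.

```python
SLOTS_PER_CYCLE = 5  # the constant 5 in the MetricExpr strings (interpretation)


def topdown_l1(cpu_cycles, stall_slot_frontend, stall_slot_backend,
               stall_slot, op_retired, op_spec):
    """Evaluate the four TopDownL1 MetricExpr formulas from raw counts."""
    total_slots = SLOTS_PER_CYCLE * cpu_cycles
    not_stalled = 1 - stall_slot / total_slots
    retire_ratio = op_retired / op_spec
    return {
        "frontend_bound": stall_slot_frontend / total_slots,
        "backend_bound": stall_slot_backend / total_slots,
        "retiring": retire_ratio * not_stalled,
        "wasted": (1 - retire_ratio) * not_stalled,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
sample = topdown_l1(
    cpu_cycles=1_000_000,
    stall_slot_frontend=1_000_000,
    stall_slot_backend=2_000_000,
    stall_slot=3_000_000,  # assumed to be frontend + backend
    op_retired=1_800_000,
    op_spec=2_000_000,
)

for name, value in sample.items():
    print(f"{name:15s} {value:.2%}")
print(f"{'sum':15s} {sum(sample.values()):.2%}")
```

With these made-up inputs, 20% of slots are frontend bound, 40% backend
bound, 36% retiring and 4% wasted.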

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
new file mode 100644
index 0000000..b6b3b19
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -0,0 +1,30 @@
+[
+    {
+        "MetricExpr": "stall_slot_frontend / (5 * cpu_cycles)",
+        "PublicDescription": "Frontend bound L1 topdown metric",
+        "BriefDescription": "Frontend bound L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "frontend_bound"
+    },
+    {
+        "MetricExpr": "(1 - op_retired / op_spec) * (1 - stall_slot / (5 * cpu_cycles))",
+        "PublicDescription": "Wasted L1 topdown metric",
+        "BriefDescription": "Wasted L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "wasted"
+    },
+    {
+        "MetricExpr": "(op_retired / op_spec) * (1 - stall_slot / (5 * cpu_cycles))",
+        "PublicDescription": "Retiring L1 topdown metric",
+        "BriefDescription": "Retiring L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "retiring"
+    },
+    {
+        "MetricExpr": "stall_slot_backend / (5 * cpu_cycles)",
+        "PublicDescription": "Backend Bound L1 topdown metric",
+        "BriefDescription": "Backend Bound L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "backend_bound"
+    }
+]
\ No newline at end of file
-- 
1.8.3.1



* [PATCH RFC 2/6] perf vendor events arm64: Add TLB metrics for neoverse-n2
  2022-10-31 11:11 ` Jing Zhang
@ 2022-10-31 11:11   ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-10-31 11:11 UTC (permalink / raw)
  To: linux-arm-kernel, linux-perf-users, linux-kernel
  Cc: John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song

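Add TLB-related metrics: TLB walks per kilo instructions (mpki) and the
walk rate, for both the instruction side and the data side.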

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 28 ++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index b6b3b19..066d905 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -26,5 +26,33 @@
         "BriefDescription": "Backend Bound L1 topdown metric",
         "MetricGroup": "TopDownL1",
         "MetricName": "backend_bound"
+    },
+    {
+        "MetricExpr": "ITLB_WALK / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of TLB Walks per kilo instructions for instruction accesses",
+        "BriefDescription": "The rate of TLB Walks per kilo instructions for instruction accesses",
+        "MetricGroup": "TLB",
+        "MetricName": "itlb_mpki"
+    },
+    {
+        "MetricExpr": "ITLB_WALK / L1I_TLB",
+        "PublicDescription": "The rate of ITLB Walks to the overall TLB lookups initiated from the instruction side",
+        "BriefDescription": "The rate of ITLB Walks to the overall TLB lookups",
+        "MetricGroup": "TLB",
+        "MetricName": "itlb_walk_rate"
+    },
+    {
+        "MetricExpr": "DTLB_WALK / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of TLB Walks per kilo instructions for data accesses",
+        "BriefDescription": "The rate of TLB Walks per kilo instructions for data accesses",
+        "MetricGroup": "TLB",
+        "MetricName": "dtlb_mpki"
+    },
+    {
+        "MetricExpr": "DTLB_WALK / L1D_TLB",
+        "PublicDescription": "The rate of DTLB Walks to the overall TLB lookups made by the program",
+        "BriefDescription": "The rate of DTLB Walks to the overall TLB lookups",
+        "MetricGroup": "TLB",
+        "MetricName": "dtlb_walk_rate"
     }
 ]
\ No newline at end of file
-- 
1.8.3.1



* [PATCH RFC 3/6] perf vendor events arm64: Add cache metrics for neoverse-n2
  2022-10-31 11:11 ` Jing Zhang
@ 2022-10-31 11:11   ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-10-31 11:11 UTC (permalink / raw)
  To: linux-arm-kernel, linux-perf-users, linux-kernel
  Cc: John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song

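Add cache-related metrics: misses per kilo instructions (mpki) and miss
rate for the L1 I-Cache, L1 D-Cache, L2 D-Cache, L3 D-Cache, and LL cache
reads, plus the LL cache read hit rate.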

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 77 ++++++++++++++++++++++
 1 file changed, 77 insertions(+)
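
The cache metrics in the diff below follow two patterns: refills (or read
misses) per kilo instructions, and refills divided by accesses. The sketch
below works through the LL cache read case with made-up counts; the
function names are hypothetical and nothing here is part of the patch.

```python
def mpki(misses, inst_retired):
    """Misses per 1000 retired instructions, e.g. LL_CACHE_MISS_RD / INST_RETIRED * 1000."""
    return misses / inst_retired * 1000


def miss_rate(misses, accesses):
    """Fraction of accesses that miss, e.g. LL_CACHE_MISS_RD / LL_CACHE_RD."""
    return misses / accesses


def hit_rate(misses, accesses):
    """Fraction of accesses that hit, e.g. (LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD."""
    return (accesses - misses) / accesses


# Hypothetical counts for illustration only.
inst_retired = 10_000_000
ll_cache_rd = 200_000
ll_cache_miss_rd = 50_000

ll_cache_read_mpki = mpki(ll_cache_miss_rd, inst_retired)
ll_cache_read_miss_rate = miss_rate(ll_cache_miss_rd, ll_cache_rd)
ll_cache_read_hit_rate = hit_rate(ll_cache_miss_rd, ll_cache_rd)

print(f"ll_cache_read_mpki      {ll_cache_read_mpki:.2f}")
print(f"ll_cache_read_miss_rate {ll_cache_read_miss_rate:.2f}")
print(f"ll_cache_read_hit_rate  {ll_cache_read_hit_rate:.2f}")
```

By construction, ll_cache_read_miss_rate and ll_cache_read_hit_rate add up
to 1 when both are computed from the same pair of counts.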

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index 066d905..2dc6d9e 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -54,5 +54,82 @@
         "BriefDescription": "The rate of DTLB Walks to the overall TLB lookups",
         "MetricGroup": "TLB",
         "MetricName": "dtlb_walk_rate"
+    },
+    {
+        "MetricExpr": "L1I_CACHE_REFILL / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of L1 I-Cache misses per kilo instructions",
+        "BriefDescription": "The rate of L1 I-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l1i_cache_mpki"
+    },
+    {
+        "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE",
+        "PublicDescription": "The rate of L1 I-Cache misses to the overall L1 I-Cache",
+        "BriefDescription": "The rate of L1 I-Cache misses to the overall L1 I-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l1i_cache_miss_rate"
+    },
+    {
+        "MetricExpr": "L1D_CACHE_REFILL / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of L1 D-Cache misses per kilo instructions",
+        "BriefDescription": "The rate of L1 D-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l1d_cache_mpki"
+    },
+    {
+        "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
+        "PublicDescription": "The rate of L1 D-Cache misses to the overall L1 D-Cache",
+        "BriefDescription": "The rate of L1 D-Cache misses to the overall L1 D-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l1d_cache_miss_rate"
+    },
+    {
+        "MetricExpr": "L2D_CACHE_REFILL / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of L2 D-Cache misses per kilo instructions",
+        "BriefDescription": "The rate of L2 D-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l2d_cache_mpki"
+    },
+    {
+        "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE",
+        "PublicDescription": "The rate of L2 D-Cache misses to the overall L2 D-Cache",
+        "BriefDescription": "The rate of L2 D-Cache misses to the overall L2 D-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l2d_cache_miss_rate"
+    },
+    {
+        "MetricExpr": "L3D_CACHE_REFILL / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of L3 D-Cache misses per kilo instructions",
+        "BriefDescription": "The rate of L3 D-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l3d_cache_mpki"
+    },
+    {
+        "MetricExpr": "L3D_CACHE_REFILL / L3D_CACHE",
+        "PublicDescription": "The rate of L3 D-Cache misses to the overall L3 D-Cache",
+        "BriefDescription": "The rate of L3 D-Cache misses to the overall L3 D-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l3d_cache_miss_rate"
+    },
+    {
+        "MetricExpr": "LL_CACHE_MISS_RD / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of LL Cache read misses per kilo instructions",
+        "BriefDescription": "The rate of LL Cache read misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "ll_cache_read_mpki"
+    },
+    {
+        "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD",
+        "PublicDescription": "The rate of LL Cache read misses to the overall LL Cache read",
+        "BriefDescription": "The rate of LL Cache read misses to the overall LL Cache read",
+        "MetricGroup": "Cache",
+        "MetricName": "ll_cache_read_miss_rate"
+    },
+    {
+        "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD",
+        "PublicDescription": "The rate of LL Cache read hit to the overall LL Cache read",
+        "BriefDescription": "The rate of LL Cache read hit to the overall LL Cache read",
+        "MetricGroup": "Cache",
+        "MetricName": "ll_cache_read_hit_rate"
     }
 ]
\ No newline at end of file
-- 
1.8.3.1



* [PATCH RFC 4/6] perf vendor events arm64: Add branch metrics for neoverse-n2
  2022-10-31 11:11 ` Jing Zhang
@ 2022-10-31 11:11   ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-10-31 11:11 UTC (permalink / raw)
  To: linux-arm-kernel, linux-perf-users, linux-kernel
  Cc: John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song

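Add branch-related metrics: mis-predicted branches per kilo instructions,
branches per kilo instructions, and the branch mis-prediction rate.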

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 .../arch/arm64/arm/neoverse-n2/metrics.json         | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
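
As an illustration of how the three expressions in the diff below relate,
here is a small sketch with made-up counts. It is not part of the patch and
the variable names simply mirror the metric names. Note that, as written,
branch_mpki and branch_pki add up to BR_PRED / INST_RETIRED * 1000.

```python
# Hypothetical event counts, for illustration only.
BR_PRED = 1_000_000
BR_MIS_PRED = 20_000
INST_RETIRED = 5_000_000

# The three MetricExpr formulas, transcribed directly from the diff.
branch_mpki = BR_MIS_PRED / INST_RETIRED * 1000
branch_pki = (BR_PRED - BR_MIS_PRED) / INST_RETIRED * 1000
branch_miss_pred_rate = BR_MIS_PRED / BR_PRED

print(f"branch_mpki           {branch_mpki:.2f}")
print(f"branch_pki            {branch_pki:.2f}")
print(f"branch_miss_pred_rate {branch_miss_pred_rate:.2f}")
```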

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index 2dc6d9e..6b5aaf7 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -131,5 +131,26 @@
         "BriefDescription": "The rate of LL Cache read hit to the overall LL Cache read",
         "MetricGroup": "Cache",
         "MetricName": "ll_cache_read_hit_rate"
+    },
+    {
+        "MetricExpr": "BR_MIS_PRED / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of branches mis-predicted per kilo instructions",
+        "BriefDescription": "The rate of branches mis-predicted per kilo instructions",
+        "MetricGroup": "Branch",
+        "MetricName": "branch_mpki"
+    },
+    {
+        "MetricExpr": "(BR_PRED - BR_MIS_PRED) / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of branches retired per kilo instructions",
+        "BriefDescription": "The rate of branches retired per kilo instructions",
+        "MetricGroup": "Branch",
+        "MetricName": "branch_pki"
+    },
+    {
+        "MetricExpr": "BR_MIS_PRED / BR_PRED",
+        "PublicDescription": "The rate of branches mis-predicted to the overall branches",
+        "BriefDescription": "The rate of branches mis-predicted to the overall branches",
+        "MetricGroup": "Branch",
+        "MetricName": "branch_miss_pred_rate"
     }
 ]
\ No newline at end of file
-- 
1.8.3.1



* [PATCH RFC 5/6] perf vendor events arm64: Add PE utilization metrics for neoverse-n2
  2022-10-31 11:11 ` Jing Zhang
@ 2022-10-31 11:11   ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-10-31 11:11 UTC (permalink / raw)
  To: linux-arm-kernel, linux-perf-users, linux-kernel
  Cc: John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song

Add PE utilization related metrics.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
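The four expressions added below can be checked by hand with a short
Python sketch. The function name and the input counts are hypothetical
and only illustrate the arithmetic; the formulas, including the constant
5, are copied from the MetricExpr strings without interpretation.

```python
def pe_utilization(instructions, cpu_cycles, op_retired, op_spec, stall_slot):
    """Evaluate the four PEutilization MetricExpr strings from this patch."""
    return {
        # instructions / CPU_CYCLES
        "ipc": instructions / cpu_cycles,
        # OP_RETIRED / OP_SPEC
        "retired_rate": op_retired / op_spec,
        # 1 - OP_RETIRED / OP_SPEC
        "wasted_rate": 1 - op_retired / op_spec,
        # (1 - STALL_SLOT / (CPU_CYCLES * 5)) * OP_RETIRED / OP_SPEC
        "cpu_utilization": (1 - stall_slot / (cpu_cycles * 5)) * op_retired / op_spec,
    }


# Hypothetical counts chosen for easy arithmetic, not measured on hardware.
pe = pe_utilization(instructions=1_800, cpu_cycles=1_000,
                    op_retired=2_000, op_spec=2_500, stall_slot=2_000)
for name, value in pe.items():
    print(f"{name}: {value:.2f}")
```

retired_rate and wasted_rate sum to 1 by construction, so either one is
enough to recover the other.
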
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 28 ++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index 6b5aaf7..230e93a 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -152,5 +152,33 @@
         "BriefDescription": "The rate of branches mis-predicted to the overall branches",
         "MetricGroup": "Branch",
         "MetricName": "branch_miss_pred_rate"
+    },
+    {
+        "MetricExpr": "instructions / CPU_CYCLES",
+        "PublicDescription": "The average number of instructions executed for each cycle.",
+        "BriefDescription": "Instructions per cycle",
+        "MetricGroup": "PEutilization",
+        "MetricName": "ipc"
+    },
+    {
+        "MetricExpr": "OP_RETIRED / OP_SPEC",
+        "PublicDescription": "Fraction of operations retired",
+        "BriefDescription": "Fraction of operations retired",
+        "MetricGroup": "PEutilization",
+        "MetricName": "retired_rate"
+    },
+    {
+        "MetricExpr": "1 - OP_RETIRED / OP_SPEC",
+        "PublicDescription": "Fraction of operations wasted",
+        "BriefDescription": "Fraction of operations wasted",
+        "MetricGroup": "PEutilization",
+        "MetricName": "wasted_rate"
+    },
+    {
+        "MetricExpr": "(1 - STALL_SLOT / (CPU_CYCLES * 5)) * OP_RETIRED / OP_SPEC",
+        "PublicDescription": "Utilization of CPU",
+        "BriefDescription": "Utilization of CPU",
+        "MetricGroup": "PEutilization",
+        "MetricName": "cpu_utilization"
     }
 ]
\ No newline at end of file
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH RFC 6/6] perf vendor events arm64: Add instruction mix metrics for neoverse-n2
  2022-10-31 11:11 ` Jing Zhang
@ 2022-10-31 11:11   ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-10-31 11:11 UTC (permalink / raw)
  To: linux-arm-kernel, linux-perf-users, linux-kernel
  Cc: John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song

Add instruction mix related metrics.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
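Every metric added below has the same shape: one *_SPEC event divided by
INST_SPEC. A minimal Python sketch of that pattern follows. The event to
metric mapping is copied from the patch (metric names as spelled there);
the function name and the counts are hypothetical.

```python
# Event name -> MetricName, as paired in this patch.
MIX = {
    "LD_SPEC": "load_spec_rate",
    "ST_SPEC": "store_spec_rate",
    "DP_SPEC": "date_process_spec_rate",
    "ASE_SPEC": "advanced_simd_spec_rate",
    "VFP_SPEC": "float_point_spec_rate",
    "CRYPTO_SPEC": "crypto_spec_rate",
    "BR_IMMED_SPEC": "branch_immed_spec_rate",
    "BR_RETURN_SPEC": "branch_return_spec_rate",
    "BR_INDIRECT_SPEC": "branch_indirect_spec_rate",
}


def instruction_mix(counts):
    """Return each <EVENT> / INST_SPEC ratio keyed by its metric name."""
    inst_spec = counts["INST_SPEC"]
    return {metric: counts[event] / inst_spec for event, metric in MIX.items()}


# Hypothetical counts chosen for easy arithmetic, not measured on hardware.
sample = {
    "INST_SPEC": 1_000, "LD_SPEC": 250, "ST_SPEC": 100, "DP_SPEC": 400,
    "ASE_SPEC": 50, "VFP_SPEC": 40, "CRYPTO_SPEC": 0,
    "BR_IMMED_SPEC": 120, "BR_RETURN_SPEC": 20, "BR_INDIRECT_SPEC": 20,
}
mix = instruction_mix(sample)
for name, value in mix.items():
    print(f"{name}: {value:.3f}")
```
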
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 63 ++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index 230e93a..2a3e50d 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -180,5 +180,68 @@
         "BriefDescription": "Utilization of CPU",
         "MetricGroup": "PEutilization",
         "MetricName": "cpu_utilization"
+    },
+    {
+        "MetricExpr": "LD_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of load instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of load instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "load_spec_rate"
+    },
+    {
+        "MetricExpr": "ST_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of store instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of store instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "store_spec_rate"
+    },
+    {
+        "MetricExpr": "DP_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of integer data-processing instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of integer data-processing instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "date_process_spec_rate"
+    },
+    {
+        "MetricExpr": "ASE_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of advanced SIMD instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of advanced SIMD instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "advanced_simd_spec_rate"
+    },
+    {
+        "MetricExpr": "VFP_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of floating point instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of floating point instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "float_point_spec_rate"
+    },
+    {
+        "MetricExpr": "CRYPTO_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of crypto instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of crypto instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "crypto_spec_rate"
+    },
+    {
+        "MetricExpr": "BR_IMMED_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of branch immediate instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of branch immediate instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "branch_immed_spec_rate"
+    },
+    {
+        "MetricExpr": "BR_RETURN_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of procedure return instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of procedure return instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "branch_return_spec_rate"
+    },
+    {
+        "MetricExpr": "BR_INDIRECT_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of indirect branch instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of indirect branch instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "branch_indirect_spec_rate"
     }
 ]
\ No newline at end of file
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [RFC PATCH v2 0/6] Add metrics for neoverse-n2
  2022-10-31 11:11 ` Jing Zhang
@ 2022-11-14  7:41   ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-14  7:41 UTC (permalink / raw)
  To: linux-arm-kernel, linux-perf-users, linux-kernel, John Garry,
	Will Deacon, James Clark, Mike Leach, Leo Yan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song, Jing Zhang

Changes since v1: 
- Corrected formula for topdown L1 due to wrong counts for stall_slot and
  stall_slot_frontend; 
- Link: https://lore.kernel.org/all/1667214694-89839-1-git-send-email-renyu.zj@linux.alibaba.com/

This series adds six metric groups for neoverse-n2. The topdown L1 formulas
are taken from the document:
https://documentation-service.arm.com/static/60250c7395978b529036da86?token=

On neoverse-n2 the stall_slot and stall_slot_frontend events overcount by
cpu_cycles, so the topdown L1 metrics subtract cpu_cycles from both counts
before using them.

Since neoverse-n2 does not yet support topdown L2, the Cache, TLB, Branch,
InstructionMix, and PEutilization metric groups are added to help further
analysis of performance bottlenecks.

With this series applied on neoverse-n2:

$./perf list

...
Metric Groups:

Branch:
  branch_miss_pred_rate
       [The rate of branches mis-predited to the overall branches]
  branch_mpki
       [The rate of branches mis-predicted per kilo instructions]
  branch_pki
       [The rate of branches retired per kilo instructions]
Cache:
  l1d_cache_miss_rate
       [The rate of L1 D-Cache misses to the overall L1 D-Cache]
  l1d_cache_mpki
       [The rate of L1 D-Cache misses per kilo instructions]
...


$sudo ./perf stat -a -M TLB sleep 1

 Performance counter stats for 'system wide':

        35,861,936      L1I_TLB                          #     0.00 itlb_walk_rate           (74.91%)
             5,661      ITLB_WALK                                                            (74.91%)
        97,279,240      INST_RETIRED                     #     0.07 itlb_mpki                (74.91%)
             6,851      ITLB_WALK                                                            (74.91%)
            26,391      DTLB_WALK                        #     0.00 dtlb_walk_rate           (75.07%)
        35,585,545      L1D_TLB                                                              (75.07%)
        85,923,244      INST_RETIRED                     #     0.35 dtlb_mpki                (75.11%)
            29,992      DTLB_WALK                                                            (75.11%)

       1.003450755 seconds time elapsed


$sudo ./perf stat -M TopDownL1 false_sharing 2

Performance counter stats for 'false_sharing 2':

     3,388,884,713      cpu_cycles                       #     0.05 retiring
                                                  #     0.00 wasted                   (66.59%)
    19,495,064,576      stall_slot                                                           (66.59%)
       838,235,126      op_spec                                                              (66.59%)
       836,787,162      op_retired                                                           (66.59%)
     3,380,520,038      cpu_cycles                       #     0.29 frontend_bound           (67.15%)
     8,267,545,049      stall_slot_frontend                                                  (67.15%)
     3,389,138,804      cpu_cycles                       #     0.67 backend_bound            (66.66%)
    11,337,766,816      stall_slot_backend                                                   (66.66%)

       0.442572628 seconds time elapsed

       1.235153000 seconds user
       0.000000000 seconds sys

Jing Zhang (6):
  perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  perf vendor events arm64: Add TLB metrics for neoverse-n2
  perf vendor events arm64: Add cache metrics for neoverse-n2
  perf vendor events arm64: Add branch metrics for neoverse-n2
  perf vendor events arm64: Add PE utilization metrics for neoverse-n2
  perf vendor events arm64: Add instruction mix metrics for neoverse-n2

 .../arch/arm64/arm/neoverse-n2/metrics.json        | 247 +++++++++++++++++++++
 1 file changed, 247 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [RFC PATCH v2 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  2022-10-31 11:11 ` Jing Zhang
@ 2022-11-14  7:41   ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-14  7:41 UTC (permalink / raw)
  To: linux-arm-kernel, linux-perf-users, linux-kernel, John Garry,
	Will Deacon, James Clark, Mike Leach, Leo Yan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song, Jing Zhang

The topdown L1 formulas are taken from the document:
https://documentation-service.arm.com/static/60250c7395978b529036da86?token=

However, on neoverse-n2 the stall_slot and stall_slot_frontend events
overcount by cpu_cycles, so the topdown L1 metrics subtract cpu_cycles
from both counts before using them.

Since neoverse-n2 does not yet support topdown L2, metric groups such
as Cache, TLB, Branch, InstructionMix, and PEutilization are added in
the following patches to help further analysis of performance
bottlenecks.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
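To make the corrected formulas easier to check, the Python sketch below
applies them to the counts printed by "perf stat -M TopDownL1
false_sharing 2" in the cover letter. Each metric uses the cpu_cycles
reading printed in its own group. The function names and the helper
usable_slot_fraction() are hypothetical; the formulas and the constant 5
are copied from the MetricExpr strings in this patch.

```python
SLOTS = 5  # constant copied from the MetricExpr strings in this patch


def usable_slot_fraction(stall_slot, cpu_cycles):
    """1 - (stall_slot - cpu_cycles) / (5 * cpu_cycles), shared by two metrics."""
    return 1 - (stall_slot - cpu_cycles) / (SLOTS * cpu_cycles)


def frontend_bound(stall_slot_frontend, cpu_cycles):
    return (stall_slot_frontend - cpu_cycles) / (SLOTS * cpu_cycles)


def backend_bound(stall_slot_backend, cpu_cycles):
    # stall_slot_backend is used uncorrected, as in the patch.
    return stall_slot_backend / (SLOTS * cpu_cycles)


def retiring(op_retired, op_spec, stall_slot, cpu_cycles):
    return (op_retired / op_spec) * usable_slot_fraction(stall_slot, cpu_cycles)


def wasted(op_retired, op_spec, stall_slot, cpu_cycles):
    return (1 - op_retired / op_spec) * usable_slot_fraction(stall_slot, cpu_cycles)


# Counts copied from the cover letter's perf stat output.
topdown = {
    "retiring": retiring(836_787_162, 838_235_126, 19_495_064_576, 3_388_884_713),
    "wasted": wasted(836_787_162, 838_235_126, 19_495_064_576, 3_388_884_713),
    "frontend_bound": frontend_bound(8_267_545_049, 3_380_520_038),
    "backend_bound": backend_bound(11_337_766_816, 3_389_138_804),
}
for name, value in topdown.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, these should match the values perf prints next
to each metric name in that output.
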
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 30 ++++++++++++++++++++++
 1 file changed, 30 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
new file mode 100644
index 0000000..0048dfe
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -0,0 +1,30 @@
+[
+    {
+        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)",
+        "PublicDescription": "Frontend bound L1 topdown metric",
+        "BriefDescription": "Frontend bound L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "frontend_bound"
+    },
+    {
+        "MetricExpr": "(1 - op_retired / op_spec) * (1 - (stall_slot - cpu_cycles) / (5 * cpu_cycles))",
+        "PublicDescription": "Wasted L1 topdown metric",
+        "BriefDescription": "Wasted L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "wasted"
+    },
+    {
+        "MetricExpr": "(op_retired / op_spec) * (1 - (stall_slot - cpu_cycles) / (5 * cpu_cycles))",
+        "PublicDescription": "Retiring L1 topdown metric",
+        "BriefDescription": "Retiring L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "retiring"
+    },
+    {
+        "MetricExpr": "stall_slot_backend / (5 * cpu_cycles)",
+        "PublicDescription": "Backend Bound L1 topdown metric",
+        "BriefDescription": "Backend Bound L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "backend_bound"
+    }
+]
\ No newline at end of file
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [RFC PATCH v2 2/6] perf vendor events arm64: Add TLB metrics for neoverse-n2
  2022-10-31 11:11 ` Jing Zhang
@ 2022-11-14  7:41   ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-14  7:41 UTC (permalink / raw)
  To: linux-arm-kernel, linux-perf-users, linux-kernel, John Garry,
	Will Deacon, James Clark, Mike Leach, Leo Yan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song, Jing Zhang

Add TLB related metrics.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
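The four TLB expressions added below can be checked against the
"perf stat -a -M TLB sleep 1" output in the cover letter. The Python
sketch pairs each metric with the two counts printed in its group. The
function names are hypothetical; the formulas are copied from the
MetricExpr strings in this patch.

```python
def walks_per_kilo_instructions(walks, inst_retired):
    """<X>TLB_WALK / INST_RETIRED * 1000 (itlb_mpki, dtlb_mpki)."""
    return walks / inst_retired * 1000


def walk_rate(walks, lookups):
    """<X>TLB_WALK / L1<X>_TLB (itlb_walk_rate, dtlb_walk_rate)."""
    return walks / lookups


# Counts copied from the cover letter's perf stat output.
tlb = {
    "itlb_walk_rate": walk_rate(5_661, 35_861_936),
    "itlb_mpki": walks_per_kilo_instructions(6_851, 97_279_240),
    "dtlb_walk_rate": walk_rate(26_391, 35_585_545),
    "dtlb_mpki": walks_per_kilo_instructions(29_992, 85_923_244),
}
for name, value in tlb.items():
    print(f"{name}: {value:.2f}")
```

The walk rates are small enough that two decimals show them as 0.00,
which is also how they appear in the perf output.
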
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 28 ++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index 0048dfe..324ca12 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -26,5 +26,33 @@
         "BriefDescription": "Backend Bound L1 topdown metric",
         "MetricGroup": "TopDownL1",
         "MetricName": "backend_bound"
+    },
+    {
+        "MetricExpr": "ITLB_WALK / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of TLB Walks per kilo instructions for instruction accesses",
+        "BriefDescription": "The rate of TLB Walks per kilo instructions for instruction accesses",
+        "MetricGroup": "TLB",
+        "MetricName": "itlb_mpki"
+    },
+    {
+        "MetricExpr": "ITLB_WALK / L1I_TLB",
+        "PublicDescription": "The rate of ITLB Walks to the overall TLB lookups initiated from the instruction side",
+        "BriefDescription": "The rate of ITLB Walks to the overall TLB lookups",
+        "MetricGroup": "TLB",
+        "MetricName": "itlb_walk_rate"
+    },
+    {
+        "MetricExpr": "DTLB_WALK / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of TLB Walks per kilo instructions for data accesses",
+        "BriefDescription": "The rate of TLB Walks per kilo instructions for data accesses",
+        "MetricGroup": "TLB",
+        "MetricName": "dtlb_mpki"
+    },
+    {
+        "MetricExpr": "DTLB_WALK / L1D_TLB",
+        "PublicDescription": "The rate of DTLB Walks to the overall TLB lookups made by the program",
+        "BriefDescription": "The rate of DTLB Walks to the overall TLB lookups",
+        "MetricGroup": "TLB",
+        "MetricName": "dtlb_walk_rate"
     }
 ]
\ No newline at end of file
-- 
1.8.3.1
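
As a sanity check of the two ITLB expressions, the sketch below recomputes them in Python from the counter values printed in the cover letter's `perf stat -a -M TLB sleep 1` sample. The helper names are hypothetical. Pairing each ITLB_WALK reading with the counter printed next to it is an assumption about how that output should be read.

```python
def per_kilo_instructions(events, instructions):
    # Mirrors "ITLB_WALK / INST_RETIRED * 1000".
    return events / instructions * 1000


def ratio(events, total):
    # Mirrors "ITLB_WALK / L1I_TLB".
    return events / total


# Counter values copied from the cover letter sample. The two ITLB_WALK
# readings differ there; each is used with its neighbouring counter
# (an assumption about the output layout).
itlb_mpki = per_kilo_instructions(6_851, 97_279_240)
itlb_walk_rate = ratio(5_661, 35_861_936)

print(f"itlb_mpki      = {itlb_mpki:.2f}")
print(f"itlb_walk_rate = {itlb_walk_rate:.2f}")
```

Rounded to two decimals these agree with the 0.07 and 0.00 shown in the cover letter.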


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [RFC PATCH v2 3/6] perf vendor events arm64: Add cache metrics for neoverse-n2
  2022-10-31 11:11 ` Jing Zhang
@ 2022-11-14  7:41   ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-14  7:41 UTC (permalink / raw)
  To: linux-arm-kernel, linux-perf-users, linux-kernel, John Garry,
	Will Deacon, James Clark, Mike Leach, Leo Yan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song, Jing Zhang

Add cache related metrics.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 77 ++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index 324ca12..1690ef6 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -54,5 +54,82 @@
         "BriefDescription": "The rate of DTLB Walks to the overall TLB lookups",
         "MetricGroup": "TLB",
         "MetricName": "dtlb_walk_rate"
+    },
+    {
+        "MetricExpr": "L1I_CACHE_REFILL / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of L1 I-Cache misses per kilo instructions",
+        "BriefDescription": "The rate of L1 I-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l1i_cache_mpki"
+    },
+    {
+        "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE",
+        "PublicDescription": "The rate of L1 I-Cache misses to the overall L1 I-Cache",
+        "BriefDescription": "The rate of L1 I-Cache misses to the overall L1 I-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l1i_cache_miss_rate"
+    },
+    {
+        "MetricExpr": "L1D_CACHE_REFILL / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of L1 D-Cache misses per kilo instructions",
+        "BriefDescription": "The rate of L1 D-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l1d_cache_mpki"
+    },
+    {
+        "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
+        "PublicDescription": "The rate of L1 D-Cache misses to the overall L1 D-Cache",
+        "BriefDescription": "The rate of L1 D-Cache misses to the overall L1 D-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l1d_cache_miss_rate"
+    },
+    {
+        "MetricExpr": "L2D_CACHE_REFILL / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of L2 D-Cache misses per kilo instructions",
+        "BriefDescription": "The rate of L2 D-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l2d_cache_mpki"
+    },
+    {
+        "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE",
+        "PublicDescription": "The rate of L2 D-Cache misses to the overall L2 D-Cache",
+        "BriefDescription": "The rate of L2 D-Cache misses to the overall L2 D-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l2d_cache_miss_rate"
+    },
+    {
+        "MetricExpr": "L3D_CACHE_REFILL / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of L3 D-Cache misses per kilo instructions",
+        "BriefDescription": "The rate of L3 D-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l3d_cache_mpki"
+    },
+    {
+        "MetricExpr": "L3D_CACHE_REFILL / L3D_CACHE",
+        "PublicDescription": "The rate of L3 D-Cache misses to the overall L3 D-Cache",
+        "BriefDescription": "The rate of L3 D-Cache misses to the overall L3 D-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l3d_cache_miss_rate"
+    },
+    {
+        "MetricExpr": "LL_CACHE_MISS_RD / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of LL Cache read misses per kilo instructions",
+        "BriefDescription": "The rate of LL Cache read misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "ll_cache_read_mpki"
+    },
+    {
+        "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD",
+        "PublicDescription": "The rate of LL Cache read misses to the overall LL Cache read",
+        "BriefDescription": "The rate of LL Cache read misses to the overall LL Cache read",
+        "MetricGroup": "Cache",
+        "MetricName": "ll_cache_read_miss_rate"
+    },
+    {
+        "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD",
+        "PublicDescription": "The rate of LL Cache read hit to the overall LL Cache read",
+        "BriefDescription": "The rate of LL Cache read hit to the overall LL Cache read",
+        "MetricGroup": "Cache",
+        "MetricName": "ll_cache_read_hit_rate"
     }
 ]
\ No newline at end of file
-- 
1.8.3.1
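
The two LL cache read ratios added above are complements of each other. A minimal Python sketch with hypothetical counter values (not measured data) and a hypothetical function name makes that explicit:

```python
def ll_cache_read_rates(ll_cache_rd, ll_cache_miss_rd):
    # Same arithmetic as ll_cache_read_miss_rate and ll_cache_read_hit_rate.
    miss_rate = ll_cache_miss_rd / ll_cache_rd
    hit_rate = (ll_cache_rd - ll_cache_miss_rd) / ll_cache_rd
    return miss_rate, hit_rate


# Hypothetical readings: 200,000 LL cache reads, 50,000 of which missed.
miss, hit = ll_cache_read_rates(ll_cache_rd=200_000, ll_cache_miss_rd=50_000)
print(f"miss rate = {miss}, hit rate = {hit}")
```

Because the hit rate is always one minus the miss rate, reporting both is a convenience rather than extra information.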


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [RFC PATCH v2 4/6] perf vendor events arm64: Add branch metrics for neoverse-n2
  2022-10-31 11:11 ` Jing Zhang
@ 2022-11-14  7:41   ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-14  7:41 UTC (permalink / raw)
  To: linux-arm-kernel, linux-perf-users, linux-kernel, John Garry,
	Will Deacon, James Clark, Mike Leach, Leo Yan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song, Jing Zhang

Add branch related metrics.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 .../arch/arm64/arm/neoverse-n2/metrics.json         | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index 1690ef6..e960a66 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -131,5 +131,26 @@
         "BriefDescription": "The rate of LL Cache read hit to the overall LL Cache read",
         "MetricGroup": "Cache",
         "MetricName": "ll_cache_read_hit_rate"
+    },
+    {
+        "MetricExpr": "BR_MIS_PRED / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of branches mis-predicted per kilo instructions",
+        "BriefDescription": "The rate of branches mis-predicted per kilo instructions",
+        "MetricGroup": "Branch",
+        "MetricName": "branch_mpki"
+    },
+    {
+        "MetricExpr": "(BR_PRED - BR_MIS_PRED) / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of branches retired per kilo instructions",
+        "BriefDescription": "The rate of branches retired per kilo instructions",
+        "MetricGroup": "Branch",
+        "MetricName": "branch_pki"
+    },
+    {
+        "MetricExpr": "BR_MIS_PRED / BR_PRED",
+        "PublicDescription": "The rate of branches mis-predicted to the overall branches",
+        "BriefDescription": "The rate of branches mis-predicted to the overall branches",
+        "MetricGroup": "Branch",
+        "MetricName": "branch_miss_pred_rate"
     }
 ]
\ No newline at end of file
-- 
1.8.3.1
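
To make the three branch expressions concrete, here is a small Python sketch that evaluates them as written in the diff. The counter values are hypothetical and the function name is not part of perf. Note that, as the expression is written, branch_pki is computed from BR_PRED minus BR_MIS_PRED.

```python
def branch_metrics(br_pred, br_mis_pred, inst_retired):
    return {
        # "BR_MIS_PRED / INST_RETIRED * 1000"
        "branch_mpki": br_mis_pred / inst_retired * 1000,
        # "(BR_PRED - BR_MIS_PRED) / INST_RETIRED * 1000"
        "branch_pki": (br_pred - br_mis_pred) / inst_retired * 1000,
        # "BR_MIS_PRED / BR_PRED"
        "branch_miss_pred_rate": br_mis_pred / br_pred,
    }


# Hypothetical counter readings, not measured on real hardware.
branch = branch_metrics(br_pred=1_000_000, br_mis_pred=20_000, inst_retired=5_000_000)
for name, value in branch.items():
    print(f"{name}: {value:.3f}")
```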


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [RFC PATCH v2 5/6] perf vendor events arm64: Add PE utilization metrics for neoverse-n2
  2022-10-31 11:11 ` Jing Zhang
@ 2022-11-14  7:41   ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-14  7:41 UTC (permalink / raw)
  To: linux-arm-kernel, linux-perf-users, linux-kernel, John Garry,
	Will Deacon, James Clark, Mike Leach, Leo Yan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song, Jing Zhang

Add PE utilization related metrics.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 28 ++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index e960a66..3c14971 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -152,5 +152,33 @@
         "BriefDescription": "The rate of branches mis-predicted to the overall branches",
         "MetricGroup": "Branch",
         "MetricName": "branch_miss_pred_rate"
+    },
+    {
+        "MetricExpr": "instructions / CPU_CYCLES",
+        "PublicDescription": "The average number of instructions executed for each cycle.",
+        "BriefDescription": "Instructions per cycle",
+        "MetricGroup": "PEutilization",
+        "MetricName": "ipc"
+    },
+    {
+        "MetricExpr": "OP_RETIRED / OP_SPEC",
+        "PublicDescription": "Fraction of operations retired",
+        "BriefDescription": "Fraction of operations retired",
+        "MetricGroup": "PEutilization",
+        "MetricName": "retired_rate"
+    },
+    {
+        "MetricExpr": "1 - OP_RETIRED / OP_SPEC",
+        "PublicDescription": "Fraction of operations wasted",
+        "BriefDescription": "Fraction of operations wasted",
+        "MetricGroup": "PEutilization",
+        "MetricName": "wasted_rate"
+    },
+    {
+        "MetricExpr": "(1 - STALL_SLOT / (CPU_CYCLES * 5)) * OP_RETIRED / OP_SPEC",
+        "PublicDescription": "Utilization of CPU",
+        "BriefDescription": "Utilization of CPU",
+        "MetricGroup": "PEutilization",
+        "MetricName": "cpu_utilization"
     }
 ]
\ No newline at end of file
-- 
1.8.3.1
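
The PEutilization expressions can be checked the same way. The sketch below uses hypothetical counter values and a hypothetical function name; the constant 5 is taken verbatim from the cpu_utilization expression and is presumably the number of slots per cycle, which is an assumption here.

```python
def pe_utilization(op_retired, op_spec, stall_slot, cpu_cycles):
    retired_rate = op_retired / op_spec
    return {
        "retired_rate": retired_rate,
        "wasted_rate": 1 - retired_rate,
        # "(1 - STALL_SLOT / (CPU_CYCLES * 5)) * OP_RETIRED / OP_SPEC"
        "cpu_utilization": (1 - stall_slot / (cpu_cycles * 5)) * op_retired / op_spec,
    }


# Hypothetical counter readings, chosen only to keep the arithmetic easy.
pe = pe_utilization(op_retired=800, op_spec=1000, stall_slot=1500, cpu_cycles=1000)
for name, value in pe.items():
    print(f"{name}: {value:.2f}")
```

retired_rate and wasted_rate sum to one by construction, so cpu_utilization is retired_rate scaled by the fraction of slots that were not stalled.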


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [RFC PATCH v2 6/6] perf vendor events arm64: Add instruction mix metrics for neoverse-n2
  2022-10-31 11:11 ` Jing Zhang
@ 2022-11-14  7:42   ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-14  7:42 UTC (permalink / raw)
  To: linux-arm-kernel, linux-perf-users, linux-kernel, John Garry,
	Will Deacon, James Clark, Mike Leach, Leo Yan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song, Jing Zhang

Add instruction mix related metrics.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 63 ++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index 3c14971..8ff1dfe 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -180,5 +180,68 @@
         "BriefDescription": "Utilization of CPU",
         "MetricGroup": "PEutilization",
         "MetricName": "cpu_utilization"
+    },
+    {
+        "MetricExpr": "LD_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of load instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of load instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "load_spec_rate"
+    },
+    {
+        "MetricExpr": "ST_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of store instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of store instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "store_spec_rate"
+    },
+    {
+        "MetricExpr": "DP_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of integer data-processing instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of integer data-processing instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "date_process_spec_rate"
+    },
+    {
+        "MetricExpr": "ASE_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of advanced SIMD instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of advanced SIMD instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "advanced_simd_spec_rate"
+    },
+    {
+        "MetricExpr": "VFP_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of floating point instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of floating point instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "float_point_spec_rate"
+    },
+    {
+        "MetricExpr": "CRYPTO_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of crypto instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of crypto instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "crypto_spec_rate"
+    },
+    {
+        "MetricExpr": "BR_IMMED_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of branch immediate instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of branch immediate instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "branch_immed_spec_rate"
+    },
+    {
+        "MetricExpr": "BR_RETURN_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of procedure return instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of procedure return instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "branch_return_spec_rate"
+    },
+    {
+        "MetricExpr": "BR_INDIRECT_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of indirect branch instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of indirect branch instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "branch_indirect_spec_rate"
     }
 ]
\ No newline at end of file
-- 
1.8.3.1
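
Since every InstructionMix entry has the same shape (one event count divided by INST_SPEC), a generic evaluator is enough to try them out. The Python sketch below parses three of the entries from the diff above, trimmed to the two fields it needs, and evaluates each MetricExpr against hypothetical counter values. It uses Python's `eval` purely for illustration; perf has its own expression parser, and nothing here reflects how perf evaluates metrics internally.

```python
import json

# Three entries from the patch, reduced to the fields used below.
# The metric names are spelled as they appear in the patch.
METRICS_JSON = """
[
    {"MetricExpr": "LD_SPEC / INST_SPEC", "MetricName": "load_spec_rate"},
    {"MetricExpr": "ST_SPEC / INST_SPEC", "MetricName": "store_spec_rate"},
    {"MetricExpr": "DP_SPEC / INST_SPEC", "MetricName": "date_process_spec_rate"}
]
"""

# Hypothetical counter readings, not measured on real hardware.
COUNTS = {"INST_SPEC": 1000, "LD_SPEC": 250, "ST_SPEC": 125, "DP_SPEC": 500}


def evaluate(expr, counts):
    # Event names are valid Python identifiers, so they can be looked up as
    # local variables. Builtins are disabled because only arithmetic is needed.
    return eval(expr, {"__builtins__": {}}, dict(counts))


results = {
    metric["MetricName"]: evaluate(metric["MetricExpr"], COUNTS)
    for metric in json.loads(METRICS_JSON)
}
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```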


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [RFC PATCH v2 6/6] perf vendor events arm64: Add instruction mix metrics for neoverse-n2
@ 2022-11-14  7:42   ` Jing Zhang
  0 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-14  7:42 UTC (permalink / raw)
  To: linux-arm-kernel, linux-perf-users, linux-kernel, John Garry,
	Will Deacon, James Clark, Mike Leach, Leo Yan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song, Jing Zhang

Add instruction mix related metrics.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 63 ++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index 3c14971..8ff1dfe 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -180,5 +180,68 @@
         "BriefDescription": "Utilization of CPU",
         "MetricGroup": "PEutilization",
         "MetricName": "cpu_utilization"
+    },
+    {
+        "MetricExpr": "LD_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of load instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of load instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "load_spec_rate"
+    },
+    {
+        "MetricExpr": "ST_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of store instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of store instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "store_spec_rate"
+    },
+    {
+        "MetricExpr": "DP_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of integer data-processing instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of integer data-processing instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "date_process_spec_rate"
+    },
+    {
+        "MetricExpr": "ASE_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of advanced SIMD instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of advanced SIMD instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "advanced_simd_spec_rate"
+    },
+    {
+        "MetricExpr": "VFP_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of floating point instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of floating point instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "float_point_spec_rate"
+    },
+    {
+        "MetricExpr": "CRYPTO_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of crypto instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of crypto instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "crypto_spec_rate"
+    },
+    {
+        "MetricExpr": "BR_IMMED_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of branch immediate instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of branch immediate instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "branch_immed_spec_rate"
+    },
+    {
+        "MetricExpr": "BR_RETURN_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of procedure return instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of procedure return instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "branch_return_spec_rate"
+    },
+    {
+        "MetricExpr": "BR_INDIRECT_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of indirect branch instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of indirect branch instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "branch_indirect_spec_rate"
     }
 ]
\ No newline at end of file
-- 
1.8.3.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* Re: [RFC PATCH v2 3/6] perf vendor events arm64: Add cache metrics for neoverse-n2
  2022-11-14  7:41   ` Jing Zhang
@ 2022-11-14  8:35     ` Xing Zhengjun
  -1 siblings, 0 replies; 96+ messages in thread
From: Xing Zhengjun @ 2022-11-14  8:35 UTC (permalink / raw)
  To: Jing Zhang, linux-arm-kernel, linux-perf-users, linux-kernel,
	John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song



On 11/14/2022 3:41 PM, Jing Zhang wrote:
> Add cache related metrics.
> 
> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
> ---
>   .../arch/arm64/arm/neoverse-n2/metrics.json        | 77 ++++++++++++++++++++++
>   1 file changed, 77 insertions(+)
> 
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> index 324ca12..1690ef6 100644
> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> @@ -54,5 +54,82 @@
>           "BriefDescription": "The rate of DTLB Walks to the overall TLB lookups",
>           "MetricGroup": "TLB",
>           "MetricName": "dtlb_walk_rate"
> +    },
> +    {
> +        "MetricExpr": "L1I_CACHE_REFILL / INST_RETIRED * 1000",
> +        "PublicDescription": "The rate of L1 I-Cache misses per kilo instructions",
> +        "BriefDescription": "The rate of L1 I-Cache misses per kilo instructions",
> +        "MetricGroup": "Cache",
> +        "MetricName": "l1i_cache_mpki"
> +    },
> +    {
> +        "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE",
> +        "PublicDescription": "The rate of L1 I-Cache misses to the overall L1 I-Cache",
> +        "BriefDescription": "The rate of L1 I-Cache misses to the overall L1 I-Cache",
> +        "MetricGroup": "Cache",
> +        "MetricName": "l1i_cache_miss_rate"
> +    },
> +    {
> +        "MetricExpr": "L1D_CACHE_REFILL / INST_RETIRED * 1000",
> +        "PublicDescription": "The rate of L1 D-Cache misses per kilo instructions",
> +        "BriefDescription": "The rate of L1 D-Cache misses per kilo instructions",
> +        "MetricGroup": "Cache",
> +        "MetricName": "l1d_cache_mpki"
> +    },
> +    {
> +        "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
> +        "PublicDescription": "The rate of L1 D-Cache misses to the overall L1 D-Cache",
> +        "BriefDescription": "The rate of L1 D-Cache misses to the overall L1 D-Cache",
> +        "MetricGroup": "Cache",
> +        "MetricName": "l1d_cache_miss_rate"
> +    },
> +    {
> +        "MetricExpr": "L2D_CACHE_REFILL / INST_RETIRED * 1000",
> +        "PublicDescription": "The rate of L2 D-Cache misses per kilo instructions",
> +        "BriefDescription": "The rate of L2 D-Cache misses per kilo instructions",
> +        "MetricGroup": "Cache",
> +        "MetricName": "l2d_cache_mpki"
> +    },
> +    {
> +        "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE",
> +        "PublicDescription": "The rate of L2 D-Cache misses to the overall L2 D-Cache",
> +        "BriefDescription": "The rate of L2 D-Cache misses to the overall L2 D-Cache",
> +        "MetricGroup": "Cache",
> +        "MetricName": "l2d_cache_miss_rate"
> +    },
> +    {
> +        "MetricExpr": "L3D_CACHE_REFILL / INST_RETIRED * 1000",
> +        "PublicDescription": "The rate of L3 D-Cache misses per kilo instructions",
> +        "BriefDescription": "The rate of L3 D-Cache misses per kilo instructions",
> +        "MetricGroup": "Cache",
> +        "MetricName": "l3d_cache_mpki"
> +    },
> +    {
> +        "MetricExpr": "L3D_CACHE_REFILL / L3D_CACHE",
> +        "PublicDescription": "The rate of L3 D-Cache misses to the overall L3 D-Cache",
> +        "BriefDescription": "The rate of L3 D-Cache misses to the overall L3 D-Cache",
> +        "MetricGroup": "Cache",
> +        "MetricName": "l3d_cache_miss_rate"
> +    },
> +    {
> +        "MetricExpr": "LL_CACHE_MISS_RD / INST_RETIRED * 1000",
> +        "PublicDescription": "The rate of LL Cache read misses per kilo instructions",
> +        "BriefDescription": "The rate of LL Cache read misses per kilo instructions",
> +        "MetricGroup": "Cache",
> +        "MetricName": "ll_cache_read_mpki"
> +    },
> +    {
> +        "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD",
> +        "PublicDescription": "The rate of LL Cache read misses to the overall LL Cache read",
> +        "BriefDescription": "The rate of LL Cache read misses to the overall LL Cache read",
> +        "MetricGroup": "Cache",
> +        "MetricName": "ll_cache_read_miss_rate"
> +    },
> +    {
> +        "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD",
> +        "PublicDescription": "The rate of LL Cache read hit to the overall LL Cache read",
> +        "BriefDescription": "The rate of LL Cache read hit to the overall LL Cache read",
> +        "MetricGroup": "Cache",
> +        "MetricName": "ll_cache_read_hit_rate"
>       }
>   ]
> \ No newline at end of file

It is better to fix this by adding a newline at the end of the file.
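
A check for the missing newline can be sketched as follows. This is only an illustration: the helper names ends_with_newline and ensure_trailing_newline are hypothetical, and the demo runs on a throwaway temporary file rather than the real metrics.json.

```python
import os
import tempfile


def ends_with_newline(path: str) -> bool:
    """Return True if the file is non-empty and its last byte is a newline."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"


def ensure_trailing_newline(path: str) -> None:
    """Append a newline if the file does not already end with one."""
    if not ends_with_newline(path):
        with open(path, "ab") as f:
            f.write(b"\n")


# Demonstrate on a temporary file that ends with "]" and no newline.
with tempfile.NamedTemporaryFile("wb", suffix=".json", delete=False) as tmp:
    tmp.write(b"[\n]")
demo_path = tmp.name

before = ends_with_newline(demo_path)
ensure_trailing_newline(demo_path)
after = ends_with_newline(demo_path)
os.remove(demo_path)
print(before, after)
```

With the newline in place, git no longer emits the "\ No newline at end of file" marker for that file.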

-- 
Zhengjun Xing

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [External] : [RFC PATCH v2 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  2022-11-14  7:41   ` Jing Zhang
@ 2022-11-14 12:59     ` John Garry
  -1 siblings, 0 replies; 96+ messages in thread
From: John Garry @ 2022-11-14 12:59 UTC (permalink / raw)
  To: Jing Zhang, linux-arm-kernel, linux-perf-users, linux-kernel,
	John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song

On 14/11/2022 07:41, Jing Zhang wrote:
> The calculation formula of topdown L1 is from the document:
> https://urldefense.com/v3/__https://documentation-service.arm.com/static/60250c7395978b529036da86?token=__;!!ACWV5N9M2RV99hQ!Ll-Jgvfs0LitTCU-hC6i6BKBVJfhke-pbQq2VoO-gmuSAcglQ3ZqMVMd2r0An_5a3ZDPYmn8zXuCrpUbehwnLHplVQ$  

So since this is from a "standard" document, did you consider putting 
these as an arch std event? I think arch std events would work for 
metrics, like they do for regular events.

> 
> However, due to the wrong count of stall_slot and stall_slot_frontend
> in neoverse-n2, the real stall_slot and real stall_slot_frontend need
> to subtract cpu_cycles, so when calculating the topdownL1 metrics,
> stall_slot and stall_slot_frontend are corrected.

Is there a reference to this? It would indeed be useful to include a link 
to the n2 doc, as these metrics are not part of the arm64 ARM. At least I 
assume that they are not there.

> 
> Since neoverse-n2 does not yet support topdown L2, metricgroups such
> as Cache, TLB, Branch, InstructionsMix, and PEutilization will be
> added to further analysis of performance bottlenecks in the following
> patches.
> 


Thanks,
John


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [RFC PATCH v2 3/6] perf vendor events arm64: Add cache metrics for neoverse-n2
  2022-11-14  8:35     ` Xing Zhengjun
@ 2022-11-15  6:28       ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-15  6:28 UTC (permalink / raw)
  To: Xing Zhengjun, linux-arm-kernel, linux-perf-users, linux-kernel,
	John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song



On 2022/11/14 4:35 PM, Xing Zhengjun wrote:
> 
> 
> On 11/14/2022 3:41 PM, Jing Zhang wrote:
>> Add cache related metrics.
>>
>> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
>> ---
>>   .../arch/arm64/arm/neoverse-n2/metrics.json        | 77 ++++++++++++++++++++++
>>   1 file changed, 77 insertions(+)
>>
>> diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> index 324ca12..1690ef6 100644
>> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> @@ -54,5 +54,82 @@
>>           "BriefDescription": "The rate of DTLB Walks to the overall TLB lookups",
>>           "MetricGroup": "TLB",
>>           "MetricName": "dtlb_walk_rate"
>> +    },
>> +    {
>> +        "MetricExpr": "L1I_CACHE_REFILL / INST_RETIRED * 1000",
>> +        "PublicDescription": "The rate of L1 I-Cache misses per kilo instructions",
>> +        "BriefDescription": "The rate of L1 I-Cache misses per kilo instructions",
>> +        "MetricGroup": "Cache",
>> +        "MetricName": "l1i_cache_mpki"
>> +    },
>> +    {
>> +        "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE",
>> +        "PublicDescription": "The rate of L1 I-Cache misses to the overall L1 I-Cache",
>> +        "BriefDescription": "The rate of L1 I-Cache misses to the overall L1 I-Cache",
>> +        "MetricGroup": "Cache",
>> +        "MetricName": "l1i_cache_miss_rate"
>> +    },
>> +    {
>> +        "MetricExpr": "L1D_CACHE_REFILL / INST_RETIRED * 1000",
>> +        "PublicDescription": "The rate of L1 D-Cache misses per kilo instructions",
>> +        "BriefDescription": "The rate of L1 D-Cache misses per kilo instructions",
>> +        "MetricGroup": "Cache",
>> +        "MetricName": "l1d_cache_mpki"
>> +    },
>> +    {
>> +        "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
>> +        "PublicDescription": "The rate of L1 D-Cache misses to the overall L1 D-Cache",
>> +        "BriefDescription": "The rate of L1 D-Cache misses to the overall L1 D-Cache",
>> +        "MetricGroup": "Cache",
>> +        "MetricName": "l1d_cache_miss_rate"
>> +    },
>> +    {
>> +        "MetricExpr": "L2D_CACHE_REFILL / INST_RETIRED * 1000",
>> +        "PublicDescription": "The rate of L2 D-Cache misses per kilo instructions",
>> +        "BriefDescription": "The rate of L2 D-Cache misses per kilo instructions",
>> +        "MetricGroup": "Cache",
>> +        "MetricName": "l2d_cache_mpki"
>> +    },
>> +    {
>> +        "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE",
>> +        "PublicDescription": "The rate of L2 D-Cache misses to the overall L2 D-Cache",
>> +        "BriefDescription": "The rate of L2 D-Cache misses to the overall L2 D-Cache",
>> +        "MetricGroup": "Cache",
>> +        "MetricName": "l2d_cache_miss_rate"
>> +    },
>> +    {
>> +        "MetricExpr": "L3D_CACHE_REFILL / INST_RETIRED * 1000",
>> +        "PublicDescription": "The rate of L3 D-Cache misses per kilo instructions",
>> +        "BriefDescription": "The rate of L3 D-Cache misses per kilo instructions",
>> +        "MetricGroup": "Cache",
>> +        "MetricName": "l3d_cache_mpki"
>> +    },
>> +    {
>> +        "MetricExpr": "L3D_CACHE_REFILL / L3D_CACHE",
>> +        "PublicDescription": "The rate of L3 D-Cache misses to the overall L3 D-Cache",
>> +        "BriefDescription": "The rate of L3 D-Cache misses to the overall L3 D-Cache",
>> +        "MetricGroup": "Cache",
>> +        "MetricName": "l3d_cache_miss_rate"
>> +    },
>> +    {
>> +        "MetricExpr": "LL_CACHE_MISS_RD / INST_RETIRED * 1000",
>> +        "PublicDescription": "The rate of LL Cache read misses per kilo instructions",
>> +        "BriefDescription": "The rate of LL Cache read misses per kilo instructions",
>> +        "MetricGroup": "Cache",
>> +        "MetricName": "ll_cache_read_mpki"
>> +    },
>> +    {
>> +        "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD",
>> +        "PublicDescription": "The rate of LL Cache read misses to the overall LL Cache read",
>> +        "BriefDescription": "The rate of LL Cache read misses to the overall LL Cache read",
>> +        "MetricGroup": "Cache",
>> +        "MetricName": "ll_cache_read_miss_rate"
>> +    },
>> +    {
>> +        "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD",
>> +        "PublicDescription": "The rate of LL Cache read hit to the overall LL Cache read",
>> +        "BriefDescription": "The rate of LL Cache read hit to the overall LL Cache read",
>> +        "MetricGroup": "Cache",
>> +        "MetricName": "ll_cache_read_hit_rate"
>>       }
>>   ]
>> \ No newline at end of file
> 
> It is better to fix this by adding a newline at the end of the file.
> 
OK, thanks for pointing it out.


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [External] : [RFC PATCH v2 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  2022-11-14 12:59     ` John Garry
@ 2022-11-15  8:43       ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-15  8:43 UTC (permalink / raw)
  To: John Garry, linux-arm-kernel, linux-perf-users, linux-kernel,
	John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song



On 2022/11/14 8:59 PM, John Garry wrote:
> On 14/11/2022 07:41, Jing Zhang wrote:
>> The calculation formula of topdown L1 is from the document:
>> https://urldefense.com/v3/__https://documentation-service.arm.com/static/60250c7395978b529036da86?token=__;!!ACWV5N9M2RV99hQ!Ll-Jgvfs0LitTCU-hC6i6BKBVJfhke-pbQq2VoO-gmuSAcglQ3ZqMVMd2r0An_5a3ZDPYmn8zXuCrpUbehwnLHplVQ$  
> 
> So since this is from a "standard" document, did you consider putting these as an arch std event? I think arch std events would work for metrics, like they do for regular events.
> 

I couldn't find how to define a metric as an arch std event. It would help if you could point me to an example in the upstream code.
Thank you very much.

>>
>> However, due to the wrong count of stall_slot and stall_slot_frontend
>> in neoverse-n2, the real stall_slot and real stall_slot_frontend need
>> to subtract cpu_cycles, so when calculating the topdownL1 metrics,
>> stall_slot and stall_slot_frontend are corrected.
> 
> Is there a reference to this? It would indeed be useful to include a link to the n2 doc, as these metrics are not part of the arm64 ARM. At least I assume that they are not there.
> 

You are right, I need to add a doc link. ARM has released the n2 errata document, which describes the incorrect counts of stall_slot and stall_slot_frontend
and provides a workaround to get the correct values.
Link: https://developer.arm.com/documentation/SDEN1982442/1200/?lang=en
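
As a rough sketch of the correction described in the commit message (the real stall_slot and stall_slot_frontend are obtained by subtracting cpu_cycles), the arithmetic looks like this. The readings are made up for illustration, and the helper name corrected_stall is hypothetical; the errata document remains the authoritative description.

```python
def corrected_stall(raw_stall: int, cpu_cycles: int) -> int:
    """Apply the correction described in this thread: subtract cpu_cycles."""
    return raw_stall - cpu_cycles


# Hypothetical raw readings, chosen only to make the arithmetic easy to follow.
cpu_cycles = 1_000
raw_stall_slot = 3_500
raw_stall_slot_frontend = 1_800

stall_slot = corrected_stall(raw_stall_slot, cpu_cycles)
stall_slot_frontend = corrected_stall(raw_stall_slot_frontend, cpu_cycles)
print(stall_slot, stall_slot_frontend)
```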

>>
>> Since neoverse-n2 does not yet support topdown L2, metricgroups such
>> as Cache, TLB, Branch, InstructionsMix, and PEutilization will be
>> added to further analysis of performance bottlenecks in the following
>> patches.
>>
> 
> 
> Thanks,
> John
Best Regards,
Jing

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [External] : [RFC PATCH v2 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  2022-11-15  8:43       ` Jing Zhang
@ 2022-11-15 11:19         ` John Garry
  -1 siblings, 0 replies; 96+ messages in thread
From: John Garry @ 2022-11-15 11:19 UTC (permalink / raw)
  To: Jing Zhang, linux-arm-kernel, linux-perf-users, linux-kernel,
	John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song

On 15/11/2022 08:43, Jing Zhang wrote:
> I didn't find out how to put the metric as an arch std event, it would be best if you could provide me with an example in the upstream code,
> thank you very much.

As things stand, I don't think it's supported. We only support regular 
events for std arch events (and not metrics).

However we could expand support for metrics.

For the example of hip08 and FRONTEND_BOUND, we would have:

--->8---

diff --git a/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json b/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
index 6443a061e22a..5b1ca45224de 100644
--- a/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
@@ -1,10 +1,6 @@
  [
      {
-        "MetricExpr": "FETCH_BUBBLE / (4 * CPU_CYCLES)",
-        "PublicDescription": "Frontend bound L1 topdown metric",
-        "BriefDescription": "Frontend bound L1 topdown metric",
-        "MetricGroup": "TopDownL1",
-        "MetricName": "frontend_bound"
+        "ArchStdEvent": "FRONTEND_BOUND"
      },
      {
          "MetricExpr": "(INST_SPEC - INST_RETIRED) / (4 * CPU_CYCLES)",
diff --git a/tools/perf/pmu-events/arch/arm64/sbsa.json b/tools/perf/pmu-events/arch/arm64/sbsa.json
new file mode 100644
index 000000000000..10b9c0cccc40
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
@@ -0,0 +1,9 @@
+[
+    {
+        "MetricExpr": "FETCH_BUBBLE / (4 * CPU_CYCLES)",
+        "PublicDescription": "Frontend bound L1 topdown metric",
+        "BriefDescription": "Frontend bound L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "FRONTEND_BOUND"
+    }
+]
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 0daa3e007528..77049853c0bf 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -352,6 +352,8 @@ def preprocess_arch_std_files(archpath: str) -> None:
        for event in read_json_events(item.path, topic=''):
          if event.name:
            _arch_std_events[event.name.lower()] = event
+        if event.metric_name:
+          _arch_std_events[event.metric_name.lower()] = event


  def print_events_table_prefix(tblname: str) -> None:
-- 
2.35.3

Note that this is for illustration only. The frontend bound metric for 
hip08 does not really belong in sbsa.json as it does not adhere to that 
spec. But for platforms which do adhere to the spec, we could pick up 
the metrics values from sbsa.json (or whatever we want to call it).

> 
>>> However, due to the wrong count of stall_slot and stall_slot_frontend
>>> in neoverse-n2, the real stall_slot and real stall_slot_frontend need
>>> to subtract cpu_cycles, so when calculating the topdownL1 metrics,
>>> stall_slot and stall_slot_frontend are corrected.
>> Is there a reference to this? It would be indeed useful to pass a link to the n2 doc as these metrics are not part of the arm64 arm. At least I assume that they are not there.
>>
> You are right, I need to add a doc link. ARM has released the n2 ERRATA document about the incorrect count of stall_slot and stall_slot_frontend,
> and provides a workaround to get the correct value.
> Link: https://developer.arm.com/documentation/SDEN1982442/1200/?lang=en
> 

Note that std arch event support is such that we can still override
individual std values in the platform-specific json (or at least we used
to be able to - I have not checked recently). So for the n2 case of
stall_slot, we could use std arch events in the n2 json but override
the metric expression, like:

+++ b/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
@@ -1,10 +1,6 @@
  [
      {
          "ArchStdEvent": "FRONTEND_BOUND",
          "MetricExpr": "<insert n2 specific expression>"
      },
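
As a rough sketch of the idea (not the actual jevents.py code), the
lookup-and-override behaviour could look like the following. The helper
names `build_arch_std_table` and `resolve_entry`, the plain-dict
representation, and the n2 override expression with a symbolic SLOTS are
all hypothetical and for illustration only; the standard entry reuses the
hip08 example from the diff above.

```python
from typing import Dict, List

Entry = Dict[str, str]


def build_arch_std_table(std_entries: List[Entry]) -> Dict[str, Entry]:
    """Index standard entries by lower-cased event name or metric name."""
    table: Dict[str, Entry] = {}
    for entry in std_entries:
        for key in ("EventName", "MetricName"):
            if key in entry:
                table[entry[key].lower()] = entry
    return table


def resolve_entry(platform_entry: Entry, table: Dict[str, Entry]) -> Entry:
    """Expand "ArchStdEvent" from the table; platform fields override standard ones."""
    std_name = platform_entry.get("ArchStdEvent")
    if std_name is None:
        return dict(platform_entry)
    resolved = dict(table[std_name.lower()])  # KeyError if the name is unknown
    for field, value in platform_entry.items():
        if field != "ArchStdEvent":
            resolved[field] = value
    return resolved


std = [{
    "MetricExpr": "FETCH_BUBBLE / (4 * CPU_CYCLES)",
    "BriefDescription": "Frontend bound L1 topdown metric",
    "MetricGroup": "TopDownL1",
    "MetricName": "FRONTEND_BOUND",
}]
n2 = {
    "ArchStdEvent": "FRONTEND_BOUND",
    # Hypothetical platform-specific override, not taken from the patch.
    "MetricExpr": "(STALL_SLOT_FRONTEND - CPU_CYCLES) / (SLOTS * CPU_CYCLES)",
}

table = build_arch_std_table(std)
resolved = resolve_entry(n2, table)
print(resolved["MetricExpr"])   # the platform expression wins
print(resolved["MetricGroup"])  # inherited from the standard entry
```

The point of the design is that a platform json only carries what differs
from the standard definition; everything else is inherited.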

Thanks,
John

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* Re: [PATCH RFC 0/6] Add metrics for neoverse-n2
  2022-10-31 11:11 ` Jing Zhang
@ 2022-11-16 11:19   ` James Clark
  -1 siblings, 0 replies; 96+ messages in thread
From: James Clark @ 2022-11-16 11:19 UTC (permalink / raw)
  To: Jing Zhang, nick Forrington, Jumana MP, John Garry
  Cc: Will Deacon, Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Andrew Kilroy, Shuai Xue, Zhuo Song,
	linux-arm-kernel, linux-perf-users, linux-kernel



On 31/10/2022 11:11, Jing Zhang wrote:
> This series add six metricgroups for neoverse-n2, among which, the
> formula of topdown L1 is from the document:
> https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
> 
> Since neoverse-n2 does not yet support topdown L2, metricgroups such
> as Cache, TLB, Branch, InstructionsMix, and PEutilization are added to
> help further analysis of performance bottlenecks.
> 

Hi Jing,

Thanks for working on this, these metrics look ok to me in general,
although we're currently working on publishing standardised metrics
across all new cores as part of a new project in Arm. This will include
N2, and our ones are very similar (or almost identical) to yours,
barring slightly different group names, metric names, and differences in
things like outputting topdown metrics as percentages.

We plan to publish our standard metrics some time in the next 2 months.
Would you consider holding off on merging this change so that we have
consistent group names and units going forward? Otherwise N2 would be
the odd one out. I will send you the metrics when they are ready, and we
will have a script to generate perf jsons from them, so you can review.

We also have a slightly different formula for one of the top down
metrics which I think would be slightly more accurate. We don't have
anything for your "PE utilization" metrics, which I can raise
internally. It could always be added to perf on top of the standardised
ones if we don't add it to our standard ones.

Thanks
James

> with this series on neoverse-n2:
> 
> $./perf list metricgroup
> 
> List of pre-defined events (to be used in -e):
> 
> 
> Metric Groups:
> 
> Branch
> Cache
> InstructionMix
> PEutilization
> TLB
> TopDownL1
> 
> 
> $./perf list
> 
> ...
> Metric Groups:
> 
> Branch:
>   branch_miss_pred_rate
>        [The rate of branches mis-predited to the overall branches]
>   branch_mpki
>        [The rate of branches mis-predicted per kilo instructions]
>   branch_pki
>        [The rate of branches retired per kilo instructions]
> Cache:
>   l1d_cache_miss_rate
>        [The rate of L1 D-Cache misses to the overall L1 D-Cache]
>   l1d_cache_mpki
>        [The rate of L1 D-Cache misses per kilo instructions]
> ...
> 
> 
> $sudo ./perf stat -a -M TLB sleep 1
> 
>  Performance counter stats for 'system wide':
> 
>         35,861,936      L1I_TLB                          #     0.00 itlb_walk_rate           (74.91%)
>              5,661      ITLB_WALK                                                            (74.91%)
>         97,279,240      INST_RETIRED                     #     0.07 itlb_mpki                (74.91%)
>              6,851      ITLB_WALK                                                            (74.91%)
>             26,391      DTLB_WALK                        #     0.00 dtlb_walk_rate           (75.07%)
>         35,585,545      L1D_TLB                                                              (75.07%)
>         85,923,244      INST_RETIRED                     #     0.35 dtlb_mpki                (75.11%)
>             29,992      DTLB_WALK                                                            (75.11%)
> 
>        1.003450755 seconds time elapsed
>        
> 
> Jing Zhang (6):
>   perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
>   perf vendor events arm64: Add TLB metrics for neoverse-n2
>   perf vendor events arm64: Add cache metrics for neoverse-n2
>   perf vendor events arm64: Add branch metrics for neoverse-n2
>   perf vendor events arm64: Add PE utilization metrics for neoverse-n2
>   perf vendor events arm64: Add instruction mix metrics for neoverse-n2
> 
>  .../arch/arm64/arm/neoverse-n2/metrics.json        | 247 +++++++++++++++++++++
>  1 file changed, 247 insertions(+)
>  create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> 
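
As a sanity check on the quoted `perf stat -M TLB` output, the figures
can be recomputed from the raw counts. The formulas below are assumptions
inferred from the metric names (walks per kilo instructions, and walks
divided by TLB accesses); they are not copied from the patch.

```python
def per_kilo_instructions(events: int, instructions: int) -> float:
    """Events per 1000 retired instructions (assumed definition of *_mpki)."""
    return events / instructions * 1000


def rate(events: int, accesses: int) -> float:
    """Events divided by accesses (assumed definition of *_walk_rate)."""
    return events / accesses


# Counter values taken from the quoted perf stat output.
itlb_mpki = per_kilo_instructions(6_851, 97_279_240)
dtlb_mpki = per_kilo_instructions(29_992, 85_923_244)
itlb_walk_rate = rate(5_661, 35_861_936)
dtlb_walk_rate = rate(26_391, 35_585_545)

print(f"itlb_mpki={itlb_mpki:.2f} dtlb_mpki={dtlb_mpki:.2f}")
print(f"itlb_walk_rate={itlb_walk_rate:.2f} dtlb_walk_rate={dtlb_walk_rate:.2f}")
```

Under these assumed definitions the results round to 0.07, 0.35, 0.00 and
0.00, which agrees with the values printed in the quoted output.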

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH RFC 0/6] Add metrics for neoverse-n2
  2022-11-16 11:19   ` James Clark
@ 2022-11-16 15:26     ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-16 15:26 UTC (permalink / raw)
  To: James Clark, nick Forrington, Jumana MP, John Garry
  Cc: Will Deacon, Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Andrew Kilroy, Shuai Xue, Zhuo Song,
	linux-arm-kernel, linux-perf-users, linux-kernel



On 2022/11/16 7:19 PM, James Clark wrote:
> 
> 
> On 31/10/2022 11:11, Jing Zhang wrote:
>> This series add six metricgroups for neoverse-n2, among which, the
>> formula of topdown L1 is from the document:
>> https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
>>
>> Since neoverse-n2 does not yet support topdown L2, metricgroups such
>> as Cache, TLB, Branch, InstructionsMix, and PEutilization are added to
>> help further analysis of performance bottlenecks.
>>
> 
> Hi Jing,
> 
> Thanks for working on this, these metrics look ok to me in general,
> although we're currently working on publishing standardised metrics
> across all new cores as part of a new project in Arm. This will include
> N2, and our ones are very similar (or almost identical) to yours,
> barring slightly different group names, metric names, and differences in
> things like outputting topdown metrics as percentages.
> 
> We plan to publish our standard metrics some time in the next 2 months.
> Would you consider holding off on merging this change so that we have
> consistent group names and units going forward? Otherwise N2 would be
> the odd one out. I will send you the metrics when they are ready, and we
> will have a script to generate perf jsons from them, so you can review.
> 

Do you mean that once you release the new standard metrics, I should rework
my patch to follow them, for example by using consistent group names and units?


> We also have a slightly different formula for one of the top down
> metrics which I think would be slightly more accurate. We don't have


The v2 version of the patchset updated the formula of topdown L1.
Link: https://lore.kernel.org/all/1668411720-3581-1-git-send-email-renyu.zj@linux.alibaba.com/

The formula of the v2 version is more accurate than v1, and it has been
verified in our test environment. Can you share your formula first and we
can discuss it together? :)

Thanks,
Jing


> anything for your "PE utilization" metrics, which I can raise
> internally. It could always be added to perf on top of the standardised
> ones if we don't add it to our standard ones.
> 
> Thanks
> James
> 
>> with this series on neoverse-n2:
>>
>> $./perf list metricgroup
>>
>> List of pre-defined events (to be used in -e):
>>
>>
>> Metric Groups:
>>
>> Branch
>> Cache
>> InstructionMix
>> PEutilization
>> TLB
>> TopDownL1
>>
>>
>> $./perf list
>>
>> ...
>> Metric Groups:
>>
>> Branch:
>>   branch_miss_pred_rate
>>        [The rate of branches mis-predited to the overall branches]
>>   branch_mpki
>>        [The rate of branches mis-predicted per kilo instructions]
>>   branch_pki
>>        [The rate of branches retired per kilo instructions]
>> Cache:
>>   l1d_cache_miss_rate
>>        [The rate of L1 D-Cache misses to the overall L1 D-Cache]
>>   l1d_cache_mpki
>>        [The rate of L1 D-Cache misses per kilo instructions]
>> ...
>>
>>
>> $sudo ./perf stat -a -M TLB sleep 1
>>
>>  Performance counter stats for 'system wide':
>>
>>         35,861,936      L1I_TLB                          #     0.00 itlb_walk_rate           (74.91%)
>>              5,661      ITLB_WALK                                                            (74.91%)
>>         97,279,240      INST_RETIRED                     #     0.07 itlb_mpki                (74.91%)
>>              6,851      ITLB_WALK                                                            (74.91%)
>>             26,391      DTLB_WALK                        #     0.00 dtlb_walk_rate           (75.07%)
>>         35,585,545      L1D_TLB                                                              (75.07%)
>>         85,923,244      INST_RETIRED                     #     0.35 dtlb_mpki                (75.11%)
>>             29,992      DTLB_WALK                                                            (75.11%)
>>
>>        1.003450755 seconds time elapsed
>>        
>>
>> Jing Zhang (6):
>>   perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
>>   perf vendor events arm64: Add TLB metrics for neoverse-n2
>>   perf vendor events arm64: Add cache metrics for neoverse-n2
>>   perf vendor events arm64: Add branch metrics for neoverse-n2
>>   perf vendor events arm64: Add PE utilization metrics for neoverse-n2
>>   perf vendor events arm64: Add instruction mix metrics for neoverse-n2
>>
>>  .../arch/arm64/arm/neoverse-n2/metrics.json        | 247 +++++++++++++++++++++
>>  1 file changed, 247 insertions(+)
>>  create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH RFC 0/6] Add metrics for neoverse-n2
  2022-11-16 11:19   ` James Clark
@ 2022-11-19  3:30     ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-19  3:30 UTC (permalink / raw)
  To: James Clark, nick Forrington, Jumana MP, John Garry
  Cc: Will Deacon, Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Andrew Kilroy, Shuai Xue, Zhuo Song,
	linux-arm-kernel, linux-perf-users, linux-kernel



On 2022/11/16 7:19 PM, James Clark wrote:
> 
> 
> On 31/10/2022 11:11, Jing Zhang wrote:
>> This series add six metricgroups for neoverse-n2, among which, the
>> formula of topdown L1 is from the document:
>> https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
>>
>> Since neoverse-n2 does not yet support topdown L2, metricgroups such
>> as Cache, TLB, Branch, InstructionsMix, and PEutilization are added to
>> help further analysis of performance bottlenecks.
>>
> 
> Hi Jing,
> 
> Thanks for working on this, these metrics look ok to me in general,
> although we're currently working on publishing standardised metrics
> across all new cores as part of a new project in Arm. This will include
> N2, and our ones are very similar (or almost identical) to yours,
> barring slightly different group names, metric names, and differences in
> things like outputting topdown metrics as percentages.
> 
> We plan to publish our standard metrics some time in the next 2 months.
> Would you consider holding off on merging this change so that we have
> consistent group names and units going forward? Otherwise N2 would be
> the odd one out. I will send you the metrics when they are ready, and we
> will have a script to generate perf jsons from them, so you can review.
> 
> We also have a slightly different formula for one of the top down
> metrics which I think would be slightly more accurate. We don't have
> anything for your "PE utilization" metrics, which I can raise
> internally. It could always be added to perf on top of the standardised
> ones if we don't add it to our standard ones.
> 
> Thanks
> James
> 

Hi James,

Regarding the Arm N2 standard metrics we discussed last time: is my understanding
correct, and does it match what you meant? If so, may I ask when you will send me
the standard metrics, so that I can align my patchset with them in time?
Please let me know so that we can coordinate our schedules.

Thanks,
Jing


>> with this series on neoverse-n2:
>>
>> $./perf list metricgroup
>>
>> List of pre-defined events (to be used in -e):
>>
>>
>> Metric Groups:
>>
>> Branch
>> Cache
>> InstructionMix
>> PEutilization
>> TLB
>> TopDownL1
>>
>>
>> $./perf list
>>
>> ...
>> Metric Groups:
>>
>> Branch:
>>   branch_miss_pred_rate
>>        [The rate of branches mis-predited to the overall branches]
>>   branch_mpki
>>        [The rate of branches mis-predicted per kilo instructions]
>>   branch_pki
>>        [The rate of branches retired per kilo instructions]
>> Cache:
>>   l1d_cache_miss_rate
>>        [The rate of L1 D-Cache misses to the overall L1 D-Cache]
>>   l1d_cache_mpki
>>        [The rate of L1 D-Cache misses per kilo instructions]
>> ...
>>
>>
>> $sudo ./perf stat -a -M TLB sleep 1
>>
>>  Performance counter stats for 'system wide':
>>
>>         35,861,936      L1I_TLB                          #     0.00 itlb_walk_rate           (74.91%)
>>              5,661      ITLB_WALK                                                            (74.91%)
>>         97,279,240      INST_RETIRED                     #     0.07 itlb_mpki                (74.91%)
>>              6,851      ITLB_WALK                                                            (74.91%)
>>             26,391      DTLB_WALK                        #     0.00 dtlb_walk_rate           (75.07%)
>>         35,585,545      L1D_TLB                                                              (75.07%)
>>         85,923,244      INST_RETIRED                     #     0.35 dtlb_mpki                (75.11%)
>>             29,992      DTLB_WALK                                                            (75.11%)
>>
>>        1.003450755 seconds time elapsed
>>        
>>
>> Jing Zhang (6):
>>   perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
>>   perf vendor events arm64: Add TLB metrics for neoverse-n2
>>   perf vendor events arm64: Add cache metrics for neoverse-n2
>>   perf vendor events arm64: Add branch metrics for neoverse-n2
>>   perf vendor events arm64: Add PE utilization metrics for neoverse-n2
>>   perf vendor events arm64: Add instruction mix metrics for neoverse-n2
>>
>>  .../arch/arm64/arm/neoverse-n2/metrics.json        | 247 +++++++++++++++++++++
>>  1 file changed, 247 insertions(+)
>>  create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH RFC 0/6] Add metrics for neoverse-n2
       [not found]     ` <CAP-5=fW+Z_Tc3BfK1bRKUeKWfxtPfoZXL9D2BhcU1SzNOruSsg@mail.gmail.com>
@ 2022-11-20  3:49         ` Jing Zhang
  2022-11-21 11:55         ` James Clark
  1 sibling, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-20  3:49 UTC (permalink / raw)
  To: Ian Rogers
  Cc: James Clark, nick Forrington, Jumana MP, John Garry, Will Deacon,
	Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Andrew Kilroy, Shuai Xue, Zhuo Song,
	Linux ARM, linux-perf-users, LKML



On 2022/11/20 at 5:46 AM, Ian Rogers wrote:
> On Fri, Nov 18, 2022 at 7:30 PM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>>
>>
>> On 2022/11/16 at 7:19 PM, James Clark wrote:
>> >
>> >
>> > On 31/10/2022 11:11, Jing Zhang wrote:
>> >> This series add six metricgroups for neoverse-n2, among which, the
>> >> formula of topdown L1 is from the document:
>> >> https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
>> >>
>> >> Since neoverse-n2 does not yet support topdown L2, metricgroups such
>> >> as Cache, TLB, Branch, InstructionsMix, and PEutilization are added to
>> >> help further analysis of performance bottlenecks.
>> >>
>> >
>> > Hi Jing,
>> >
>> > Thanks for working on this, these metrics look ok to me in general,
>> > although we're currently working on publishing standardised metrics
>> > across all new cores as part of a new project in Arm. This will include
>> > N2, and our ones are very similar (or almost identical) to yours,
>> > barring slightly different group names, metric names, and differences in
>> > things like outputting topdown metrics as percentages.
>> >
>> > We plan to publish our standard metrics some time in the next 2 months.
>> > Would you consider holding off on merging this change so that we have
>> > consistent group names and units going forward? Otherwise N2 would be
>> > the odd one out. I will send you the metrics when they are ready, and we
>> > will have a script to generate perf jsons from them, so you can review.
>> >
>> > We also have a slightly different formula for one of the top down
>> > metrics which I think would be slightly more accurate. We don't have
>> > anything for your "PE utilization" metrics, which I can raise
>> > internally. It could always be added to perf on top of the standardised
>> > ones if we don't add it to our standard ones.
>> >
>> > Thanks
>> > James
>> >
>>
>> Hi James,
>>
>> Regarding the arm n2 standard metrics last time, is my understanding correct,
>> and does it meet your meaning? If so, may I ask when you will send me the
>> standards you formulate so that I can align with you in time over my patchset.
>> Please communicate this matter so that we can understand each other's schedule.
>>
>> Thanks,
>> Jing
> 
> Hi,
> 
> In past versions of the perf tool the metrics have been pretty broken. If we have something that is good we shouldn't be holding it to a bar of being perfect, we can merge what we have and improve over time. In this case what Jing has prepared may arrive in time for Linux 6.2 whilst the standard metrics may arrive in time for 6.3. I'd suggest merging Jing's work and then improving on it with the standard metrics.
> 
> In terms of the metrics themselves, could we add ScaleUnit? For example:
> 
> +    {
> +        "MetricExpr": "LD_SPEC / INST_SPEC",
> +        "PublicDescription": "The rate of load instructions speculatively executed to overall instructions speclatively executed",
> +        "BriefDescription": "The rate of load instructions speculatively executed to overall instructions speclatively executed",
> +        "MetricGroup": "InstructionMix",
> +        "MetricName": "load_spec_rate"
> +    },
> 
> A ScaleUnit of "100%" would likely make things more readable.
> 

OK, I'll add ScaleUnit as you suggest to make the output more readable, and continue with the series.
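
As a rough sketch of the effect (this is not perf's actual formatting code, and
the helper names split_scale_unit and format_metric are made up for
illustration), applying a ScaleUnit of "100%" to the itlb_walk_rate counts from
the cover letter turns a ratio that prints as 0.00 into a small but visible
percentage:

```python
def split_scale_unit(scale_unit: str) -> tuple[float, str]:
    """Split a string such as "100%" into (100.0, "%").

    Hypothetical helper: takes the longest leading prefix that parses as a
    number and treats the remainder as the unit.
    """
    for end in range(len(scale_unit), 0, -1):
        try:
            return float(scale_unit[:end]), scale_unit[end:]
        except ValueError:
            continue
    return 1.0, scale_unit


def format_metric(value: float, scale_unit: str = "") -> str:
    """Format a metric value with two decimals, applying an optional scale/unit."""
    scale, unit = split_scale_unit(scale_unit) if scale_unit else (1.0, "")
    return f"{value * scale:.2f}{unit}"


# Counts taken from the 'perf stat -a -M TLB' output in the cover letter.
itlb_walk_rate = 5_661 / 35_861_936

print(format_metric(itlb_walk_rate))          # unscaled ratio
print(format_metric(itlb_walk_rate, "100%"))  # same ratio shown as a percentage
```

In the metrics JSON itself this would presumably be a single extra
"ScaleUnit": "100%" field on each ratio-style metric.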

Thanks,
Jing

> Thanks,
> Ian
> 
>> >> with this series on neoverse-n2:
>> >>
>> >> $./perf list metricgroup
>> >>
>> >> List of pre-defined events (to be used in -e):
>> >>
>> >>
>> >> Metric Groups:
>> >>
>> >> Branch
>> >> Cache
>> >> InstructionMix
>> >> PEutilization
>> >> TLB
>> >> TopDownL1
>> >>
>> >>
>> >> $./perf list
>> >>
>> >> ...
>> >> Metric Groups:
>> >>
>> >> Branch:
>> >>   branch_miss_pred_rate
>> >>        [The rate of branches mis-predited to the overall branches]
>> >>   branch_mpki
>> >>        [The rate of branches mis-predicted per kilo instructions]
>> >>   branch_pki
>> >>        [The rate of branches retired per kilo instructions]
>> >> Cache:
>> >>   l1d_cache_miss_rate
>> >>        [The rate of L1 D-Cache misses to the overall L1 D-Cache]
>> >>   l1d_cache_mpki
>> >>        [The rate of L1 D-Cache misses per kilo instructions]
>> >> ...
>> >>
>> >>
>> >> $sudo ./perf stat -a -M TLB sleep 1
>> >>
>> >>  Performance counter stats for 'system wide':
>> >>
>> >>         35,861,936      L1I_TLB                          #     0.00 itlb_walk_rate           (74.91%)
>> >>              5,661      ITLB_WALK                                                            (74.91%)
>> >>         97,279,240      INST_RETIRED                     #     0.07 itlb_mpki                (74.91%)
>> >>              6,851      ITLB_WALK                                                            (74.91%)
>> >>             26,391      DTLB_WALK                        #     0.00 dtlb_walk_rate           (75.07%)
>> >>         35,585,545      L1D_TLB                                                              (75.07%)
>> >>         85,923,244      INST_RETIRED                     #     0.35 dtlb_mpki                (75.11%)
>> >>             29,992      DTLB_WALK                                                            (75.11%)
>> >>
>> >>        1.003450755 seconds time elapsed
>> >>     
>> >>
>> >> Jing Zhang (6):
>> >>   perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
>> >>   perf vendor events arm64: Add TLB metrics for neoverse-n2
>> >>   perf vendor events arm64: Add cache metrics for neoverse-n2
>> >>   perf vendor events arm64: Add branch metrics for neoverse-n2
>> >>   perf vendor events arm64: Add PE utilization metrics for neoverse-n2
>> >>   perf vendor events arm64: Add instruction mix metrics for neoverse-n2
>> >>
>> >>  .../arch/arm64/arm/neoverse-n2/metrics.json        | 247 +++++++++++++++++++++
>> >>  1 file changed, 247 insertions(+)
>> >>  create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> >>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [External] : [RFC PATCH v2 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  2022-11-15 11:19         ` John Garry
@ 2022-11-21  9:53           ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-21  9:53 UTC (permalink / raw)
  To: John Garry, linux-arm-kernel, linux-perf-users, linux-kernel,
	John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan,
	Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song



On 2022/11/15 at 7:19 PM, John Garry wrote:
> On 15/11/2022 08:43, Jing Zhang wrote:
>> I didn't find out how to put the metric as an arch std event, it would be best if you could provide me with an example in the upstream code,
>> thank you very much.
> 
> As things stand, I don't think it's supported. We only support regular events for std arch events (and not metrics).
> 
> However we could expand support for metrics.
> 
> For the example of hip08 and FRONTEND_BOUND, we would have:
> 
> --->8---
> 
> diff --git a/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json b/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
> index 6443a061e22a..5b1ca45224de 100644
> --- a/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
> +++ b/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
> @@ -1,10 +1,6 @@
>  [
>      {
> -        "MetricExpr": "FETCH_BUBBLE / (4 * CPU_CYCLES)",
> -        "PublicDescription": "Frontend bound L1 topdown metric",
> -        "BriefDescription": "Frontend bound L1 topdown metric",
> -        "MetricGroup": "TopDownL1",
> -        "MetricName": "frontend_bound"
> +        "ArchStdEvent": "FRONTEND_BOUND"
>      },
>      {
>          "MetricExpr": "(INST_SPEC - INST_RETIRED) / (4 * CPU_CYCLES)",
> diff --git a/tools/perf/pmu-events/arch/arm64/sbsa.json b/tools/perf/pmu-events/arch/arm64/sbsa.json
> new file mode 100644
> index 000000000000..10b9c0cccc40
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
> @@ -0,0 +1,9 @@
> +[
> +    {
> +        "MetricExpr": "FETCH_BUBBLE / (4 * CPU_CYCLES)",
> +        "PublicDescription": "Frontend bound L1 topdown metric",
> +        "BriefDescription": "Frontend bound L1 topdown metric",
> +        "MetricGroup": "TopDownL1",
> +        "MetricName": "FRONTEND_BOUND"
> +    }
> +]
> diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
> index 0daa3e007528..77049853c0bf 100755
> --- a/tools/perf/pmu-events/jevents.py
> +++ b/tools/perf/pmu-events/jevents.py
> @@ -352,6 +352,8 @@ def preprocess_arch_std_files(archpath: str) -> None:
>        for event in read_json_events(item.path, topic=''):
>          if event.name:
>            _arch_std_events[event.name.lower()] = event
> +        if event.metric_name:
> +          _arch_std_events[event.metric_name.lower()] = event
> 
> 
>  def print_events_table_prefix(tblname: str) -> None:


Sorry for the slow response.

I tried the method you provided, but it didn't work. Are there any other steps I am missing,
or is this method not currently supported?


diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics
index 8ff1dfe..2ad30ec 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -1,10 +1,6 @@
 [
     {
-        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)",
-        "PublicDescription": "Frontend bound L1 topdown metric",
-        "BriefDescription": "Frontend bound L1 topdown metric",
-        "MetricGroup": "TopDownL1",
-        "MetricName": "frontend_bound"
+        "ArchStdEvent": "FRONTEND_BOUND"
     },


diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeli
index f9fae15..e8536e2 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
@@ -6,9 +6,6 @@
     {
         "ArchStdEvent": "STALL_BACKEND_MEM"
-    }
+    },
+    {
+        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)",
+        "PublicDescription": "Frontend bound L1 topdown metric",
+        "BriefDescription": "Frontend bound L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "FRONTEND_BOUND"
+    }
 ]


diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 0daa3e0..7704985 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -352,6 +352,8 @@ def preprocess_arch_std_files(archpath: str) -> None:
       for event in read_json_events(item.path, topic=''):
         if event.name:
           _arch_std_events[event.name.lower()] = event
+        if event.metric_name:
+          _arch_std_events[event.metric_name.lower()] = event
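
My understanding of what the jevents.py change is meant to do can be sketched
as below. This is a simplified stand-in, not the real script: plain dicts
replace the event objects, and the names build_arch_std_table and
resolve_arch_std are hypothetical. The table is keyed by the lowercased event
or metric name, and an {"ArchStdEvent": ...} stub is expanded from it:

```python
def build_arch_std_table(entries: list[dict]) -> dict[str, dict]:
    """Index shared definitions by lowercased event name and metric name."""
    table = {}
    for entry in entries:
        if entry.get("EventName"):
            table[entry["EventName"].lower()] = entry
        # The extra step from the patch: also register metrics by name.
        if entry.get("MetricName"):
            table[entry["MetricName"].lower()] = entry
    return table


def resolve_arch_std(entry: dict, table: dict[str, dict]) -> dict:
    """Expand an ArchStdEvent stub; entries without one are returned unchanged."""
    name = entry.get("ArchStdEvent")
    if name is None:
        return entry
    key = name.lower()
    if key not in table:
        raise KeyError(f"cannot find arch std entry: {name}")
    resolved = dict(table[key])
    # Fields given next to ArchStdEvent take precedence over the shared ones.
    resolved.update({k: v for k, v in entry.items() if k != "ArchStdEvent"})
    return resolved


shared = [{
    "MetricExpr": "FETCH_BUBBLE / (4 * CPU_CYCLES)",
    "BriefDescription": "Frontend bound L1 topdown metric",
    "MetricGroup": "TopDownL1",
    "MetricName": "FRONTEND_BOUND",
}]
table = build_arch_std_table(shared)
print(resolve_arch_std({"ArchStdEvent": "FRONTEND_BOUND"}, table)["MetricExpr"])
```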


#./perf stat -e FRONTEND_BOUND sleep 1
event syntax error: 'FRONTEND_BOUND'
                     \___ parser error
Run 'perf list' for a list of valid events

 Usage: perf stat [<options>] [<command>]

    -e, --event <event>   event selector. use 'perf list' to list available events



diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeli
index f9fae15..1089ca0 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
@@ -6,18 +6,24 @@
         "ArchStdEvent": "STALL_BACKEND"
     },
     {
-        "ArchStdEvent": "STALL_SLOT_FRONTEND"
+        "ArchStdEvent": "STALL_SLOT_FRONTEND",
+        "MetricExpr": "STALL_SLOT_FRONTEND - CPU_CYCLES"
     },
     {

#./perf stat -e stall_slot_frontend sleep 1
Add CPU_CYCLES event to groups to get metric expression for stall_slot_frontend

 Performance counter stats for 'sleep 1':

         5,125,457      stall_slot_frontend  //it's still the original value.

       1.001017680 seconds time elapsed

       0.001162000 seconds user
       0.000000000 seconds sys
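
For comparison, the value I would expect the frontend_bound metric to report can
be worked out by hand. A minimal sketch, assuming the 5 in the expression is the
number of slots per cycle; the cycle count below is a made-up placeholder,
since CPU_CYCLES was not counted in the run above:

```python
def frontend_bound(stall_slot_frontend: int, cpu_cycles: int, slots: int = 5) -> float:
    """Evaluate (stall_slot_frontend - cpu_cycles) / (slots * cpu_cycles)."""
    if cpu_cycles <= 0:
        raise ValueError("cpu_cycles must be positive")
    return (stall_slot_frontend - cpu_cycles) / (slots * cpu_cycles)


# stall_slot_frontend is the count from the run above;
# cpu_cycles is a hypothetical value used only for illustration.
print(frontend_bound(stall_slot_frontend=5_125_457, cpu_cycles=2_000_000))
```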


Thanks,
Jing

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* Re: [External] : [RFC PATCH v2 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
@ 2022-11-21  9:53           ` Jing Zhang
  0 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-21  9:53 UTC (permalink / raw)
  To: John Garry, linux-arm-kernel, linux-perf-users, linux-kernel,
	John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan,
	Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song



On 2022/11/15 at 7:19 PM, John Garry wrote:
> On 15/11/2022 08:43, Jing Zhang wrote:
>> I didn't find out how to put the metric as an arch std event, it would be best if you could provide me with an example in the upstream code,
>> thank you very much.
> 
> As things stand, I don't think it's supported. We only support regular events for std arch events (and not metrics).
> 
> However we could expand support for metrics.
> 
> For the example of hip08 and FRONTEND_BOUND, we would have:
> 
> --->8---
> 
> diff --git a/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json b/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
> index 6443a061e22a..5b1ca45224de 100644
> --- a/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
> +++ b/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
> @@ -1,10 +1,6 @@
>  [
>      {
> -        "MetricExpr": "FETCH_BUBBLE / (4 * CPU_CYCLES)",
> -        "PublicDescription": "Frontend bound L1 topdown metric",
> -        "BriefDescription": "Frontend bound L1 topdown metric",
> -        "MetricGroup": "TopDownL1",
> -        "MetricName": "frontend_bound"
> +        "ArchStdEvent": "FRONTEND_BOUND"
>      },
>      {
>          "MetricExpr": "(INST_SPEC - INST_RETIRED) / (4 * CPU_CYCLES)",
> diff --git a/tools/perf/pmu-events/arch/arm64/sbsa.json b/tools/perf/pmu-events/arch/arm64/sbsa.json
> new file mode 100644
> index 000000000000..10b9c0cccc40
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
> @@ -0,0 +1,9 @@
> +[
> +    {
> +        "MetricExpr": "FETCH_BUBBLE / (4 * CPU_CYCLES)",
> +        "PublicDescription": "Frontend bound L1 topdown metric",
> +        "BriefDescription": "Frontend bound L1 topdown metric",
> +        "MetricGroup": "TopDownL1",
> +        "MetricName": "FRONTEND_BOUND"
> +    }
> +]
> diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
> index 0daa3e007528..77049853c0bf 100755
> --- a/tools/perf/pmu-events/jevents.py
> +++ b/tools/perf/pmu-events/jevents.py
> @@ -352,6 +352,8 @@ def preprocess_arch_std_files(archpath: str) -> None:
>        for event in read_json_events(item.path, topic=''):
>          if event.name:
>            _arch_std_events[event.name.lower()] = event
> +        if event.metric_name:
> +          _arch_std_events[event.metric_name.lower()] = event
> 
> 
>  def print_events_table_prefix(tblname: str) -> None:


Sorry for the slow response.

I tried the method you provided, but it didn't work. Are there any other steps I am missing,
or is this method not currently supported?


diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics
index 8ff1dfe..2ad30ec 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -1,10 +1,6 @@
 [
     {
-        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)",
-        "PublicDescription": "Frontend bound L1 topdown metric",
-        "BriefDescription": "Frontend bound L1 topdown metric",
-        "MetricGroup": "TopDownL1",
-        "MetricName": "frontend_bound"
+        "ArchStdEvent": "FRONTEND_BOUND"
     },


diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeli
index f9fae15..e8536e2 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
@@ -6,9 +6,6 @@
     {
         "ArchStdEvent": "STALL_BACKEND_MEM"
-    }
+    },
+    {
+        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)",
+        "PublicDescription": "Frontend bound L1 topdown metric",
+        "BriefDescription": "Frontend bound L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "FRONTEND_BOUND"
+    }
 ]


diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 0daa3e0..7704985 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -352,6 +352,8 @@ def preprocess_arch_std_files(archpath: str) -> None:
       for event in read_json_events(item.path, topic=''):
         if event.name:
           _arch_std_events[event.name.lower()] = event
+        if event.metric_name:
+          _arch_std_events[event.metric_name.lower()] = event


#./perf stat -e FRONTEND_BOUND sleep 1
event syntax error: 'FRONTEND_BOUND'
                     \___ parser error
Run 'perf list' for a list of valid events

 Usage: perf stat [<options>] [<command>]

    -e, --event <event>   event selector. use 'perf list' to list available events



diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeli
index f9fae15..1089ca0 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
@@ -6,18 +6,24 @@
         "ArchStdEvent": "STALL_BACKEND"
     },
     {
-        "ArchStdEvent": "STALL_SLOT_FRONTEND"
+        "ArchStdEvent": "STALL_SLOT_FRONTEND",
+        "MetricExpr": "STALL_SLOT_FRONTEND - CPU_CYCLES"
     },
     {

#./perf stat -e stall_slot_frontend sleep 1
Add CPU_CYCLES event to groups to get metric expression for stall_slot_frontend

 Performance counter stats for 'sleep 1':

         5,125,457      stall_slot_frontend  //it's still the original value.

       1.001017680 seconds time elapsed

       0.001162000 seconds user
       0.000000000 seconds sys


Thanks,
Jing

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* Re: [External] : [RFC PATCH v2 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  2022-11-21  9:53           ` Jing Zhang
@ 2022-11-21 10:22             ` John Garry
  -1 siblings, 0 replies; 96+ messages in thread
From: John Garry @ 2022-11-21 10:22 UTC (permalink / raw)
  To: Jing Zhang, linux-arm-kernel, linux-perf-users, linux-kernel,
	John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan,
	Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song


> 
> 
> #./perf stat -e FRONTEND_BOUND sleep 1
> event syntax error: 'FRONTEND_BOUND'

For metrics, use -M, not -e

If this doesn't help, verify that the generated pmu-events/pmu-events.c is
still the same after you make the change to use std arch events for metrics.
Note that I never tested running my change.
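
One way to do that comparison is sketched below in Python. The two file
names are placeholders (a copy of pmu-events.c saved before the change, and
the freshly generated file), so adjust them to your build tree.

```python
import difflib
from pathlib import Path
from typing import List


def diff_generated(before: str, after: str) -> List[str]:
    """Return unified-diff lines between two generated files.

    An empty list means the two files are identical.
    """
    old = Path(before).read_text().splitlines(keepends=True)
    new = Path(after).read_text().splitlines(keepends=True)
    return list(difflib.unified_diff(old, new, fromfile=before, tofile=after))


# Hypothetical usage, with placeholder paths:
#   delta = diff_generated("pmu-events.c.orig", "pmu-events/pmu-events.c")
#   print("identical" if not delta else "".join(delta))
```

An empty result would mean the ArchStdEvent form generates exactly the same
table as the inline metric definition.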

Thanks,
John

>                       \___ parser error
> Run 'perf list' for a list of valid events
> 
>   Usage: perf stat [<options>] [<command>]
> 
>      -e, --event <event>   event selector. use 'perf list' to list available events
> 
> 
> 
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeli
> index f9fae15..1089ca0 100644
> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
> @@ -6,18 +6,24 @@
>           "ArchStdEvent": "STALL_BACKEND"
>       },
>       {
> -        "ArchStdEvent": "STALL_SLOT_FRONTEND"
> +        "ArchStdEvent": "STALL_SLOT_FRONTEND",
> +        "MetricExpr": "STALL_SLOT_FRONTEND - CPU_CYCLES"
>       },
>       {
> 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH RFC 0/6] Add metrics for neoverse-n2
  2022-11-16 15:26     ` Jing Zhang
@ 2022-11-21 11:51       ` James Clark
  -1 siblings, 0 replies; 96+ messages in thread
From: James Clark @ 2022-11-21 11:51 UTC (permalink / raw)
  To: Jing Zhang, nick Forrington, Jumana MP, John Garry
  Cc: Will Deacon, Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Andrew Kilroy, Shuai Xue, Zhuo Song,
	linux-arm-kernel, linux-perf-users, linux-kernel



On 16/11/2022 15:26, Jing Zhang wrote:
> 
> 
> On 2022/11/16 7:19 PM, James Clark wrote:
>>
>>
>> On 31/10/2022 11:11, Jing Zhang wrote:
>>> This series add six metricgroups for neoverse-n2, among which, the
>>> formula of topdown L1 is from the document:
>>> https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
>>>
>>> Since neoverse-n2 does not yet support topdown L2, metricgroups such
>>> as Cache, TLB, Branch, InstructionsMix, and PEutilization are added to
>>> help further analysis of performance bottlenecks.
>>>
>>
>> Hi Jing,
>>
>> Thanks for working on this, these metrics look ok to me in general,
>> although we're currently working on publishing standardised metrics
>> across all new cores as part of a new project in Arm. This will include
>> N2, and our ones are very similar (or almost identical) to yours,
>> barring slightly different group names, metric names, and differences in
>> things like outputting topdown metrics as percentages.
>>
>> We plan to publish our standard metrics some time in the next 2 months.
>> Would you consider holding off on merging this change so that we have
>> consistent group names and units going forward? Otherwise N2 would be
>> the odd one out. I will send you the metrics when they are ready, and we
>> will have a script to generate perf jsons from them, so you can review.
>>
> 
> Do you mean that after you release the new standard metrics, I remake my
> patch referring to them, such as consistent group names and unit?

Hi Jing,

I was planning to submit the patch myself, but there will be a script to
generate perf json files, so no manual work would be needed. Although
this is complicated by the fact that we won't be publishing the fixed
TopdownL1 metrics that you have for the existing N2 silicon so there
would be a one time copy paste to fix that part.

> 
> 
>> We also have a slightly different formula for one of the top down
>> metrics which I think would be slightly more accurate. We don't have
> 
> 
> The v2 version of the patchset updated the formula of topdown L1.
> Link: https://lore.kernel.org/all/1668411720-3581-1-git-send-email-renyu.zj@linux.alibaba.com/
> 
> The formula of the v2 version is more accurate than v1, and it has been
> verified in our test environment. Can you share your formula first and we
> can discuss it together? :)

I was looking at v2 but replied to the root of the thread by mistake. I
also had it the wrong way round. So your version corrects for the errata
on the current version of N2 (as you mentioned in the commit message).
Our version would be if there is a future new silicon revision with that
fixed, but it does have an extra improvement by subtracting the branch
mispredicts.

Perf doesn't currently match the jsons based on silicon revision, so
we'd have to add something in for that if a fixed silicon version is
released. But this is another problem for another time.

This is the frontend bound metric we have for future revisions:

	"100 * ( (STALL_SLOT_FRONTEND/(CPU_CYCLES * 5)) - ((BR_MIS_PRED * 4)/CPU_CYCLES) )"
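
As a worked example of the frontend bound formula above, here is a minimal
Python sketch. The counter values are made up for illustration, and the
function and parameter names are not part of perf.

```python
def frontend_bound_pct(stall_slot_frontend: float, br_mis_pred: float,
                       cpu_cycles: float, slots: int = 5) -> float:
    """Frontend bound as a percentage of slots, following the formula above.

    slots=5 and the factor of 4 on BR_MIS_PRED are taken from the formula.
    """
    frontend_stalls = stall_slot_frontend / (cpu_cycles * slots)
    mispredict_share = (br_mis_pred * 4) / cpu_cycles
    return 100 * (frontend_stalls - mispredict_share)


# Made-up counts, for illustration only.
print(frontend_bound_pct(stall_slot_frontend=2_000_000,
                         br_mis_pred=10_000,
                         cpu_cycles=1_000_000))
```

With these numbers, 40% of slots are frontend stalls and 4% are attributed
to branch mispredicts, so the metric comes out at about 36%.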

Other changes are, for example, that where you have a 'wasted' metric we have
'bad_speculation', without the cycles subtraction:

	100 * ( ((1 - (OP_RETIRED/OP_SPEC)) * (1 - (STALL_SLOT/(CPU_CYCLES * 5)))) + ((BR_MIS_PRED * 4)/CPU_CYCLES) )

And some more details filled in around the units, for example:

    {
        "MetricName": "bad_speculation",
        "MetricExpr": "100 * ( ((1 - (OP_RETIRED/OP_SPEC)) * (1 - (STALL_SLOT/(CPU_CYCLES * 5)))) + ((BR_MIS_PRED * 4)/CPU_CYCLES) )",
        "BriefDescription": "Bad Speculation",
        "PublicDescription": "This metric is the percentage of total slots that executed operations and didn't retire due to a pipeline flush.\nThis indicates cycles that were utilized but inefficiently.",
        "MetricGroup": "TopdownL1",
        "ScaleUnit": "1percent of slots"
    },
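
To sanity-check an entry like the one above, the expression can be evaluated
against sample counts. The sketch below is only an illustration: the counter
values are made up, and Python's eval() stands in for perf's own expression
parser, which works here only because this MetricExpr is plain arithmetic.

```python
import json

entry = json.loads(r'''
{
    "MetricName": "bad_speculation",
    "MetricExpr": "100 * ( ((1 - (OP_RETIRED/OP_SPEC)) * (1 - (STALL_SLOT/(CPU_CYCLES * 5)))) + ((BR_MIS_PRED * 4)/CPU_CYCLES) )",
    "MetricGroup": "TopdownL1",
    "ScaleUnit": "1percent of slots"
}
''')

# Made-up counter values, for illustration only.
counts = {
    "OP_RETIRED": 900_000,
    "OP_SPEC": 1_000_000,
    "STALL_SLOT": 2_500_000,
    "CPU_CYCLES": 1_000_000,
    "BR_MIS_PRED": 10_000,
}


def evaluate(expr: str, values: dict) -> float:
    # eval() is a stand-in for perf's parser; no builtins are exposed and
    # the event names resolve only through the supplied counts.
    return eval(expr, {"__builtins__": {}}, dict(values))


value = evaluate(entry["MetricExpr"], counts)
print(f'{entry["MetricName"]}: {value:.2f}')
```

Here 10% of speculated ops do not retire, half of the slots are not stalled,
and the mispredict term adds 4%, giving roughly 9% of slots.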

So ignoring the errata issue, the main reason to hold off is consistency
and avoiding churn, because metrics in this format will be released for
all cores going forward.

Thanks
James



^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH RFC 0/6] Add metrics for neoverse-n2
       [not found]     ` <CAP-5=fW+Z_Tc3BfK1bRKUeKWfxtPfoZXL9D2BhcU1SzNOruSsg@mail.gmail.com>
@ 2022-11-21 11:55         ` James Clark
  2022-11-21 11:55         ` James Clark
  1 sibling, 0 replies; 96+ messages in thread
From: James Clark @ 2022-11-21 11:55 UTC (permalink / raw)
  To: Ian Rogers, Jing Zhang
  Cc: nick Forrington, Jumana MP, John Garry, Will Deacon, Mike Leach,
	Leo Yan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song, Linux ARM, linux-perf-users,
	LKML



On 19/11/2022 21:46, Ian Rogers wrote:
> On Fri, Nov 18, 2022 at 7:30 PM Jing Zhang <renyu.zj@linux.alibaba.com>
> wrote:
>>
>>
>> On 2022/11/16 7:19 PM, James Clark wrote:
>>>
>>>
>>> On 31/10/2022 11:11, Jing Zhang wrote:
>>>> This series add six metricgroups for neoverse-n2, among which, the
>>>> formula of topdown L1 is from the document:
>>>> https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
>>>>
>>>> Since neoverse-n2 does not yet support topdown L2, metricgroups such
>>>> as Cache, TLB, Branch, InstructionsMix, and PEutilization are added to
>>>> help further analysis of performance bottlenecks.
>>>>
>>>
>>> Hi Jing,
>>>
>>> Thanks for working on this, these metrics look ok to me in general,
>>> although we're currently working on publishing standardised metrics
>>> across all new cores as part of a new project in Arm. This will include
>>> N2, and our ones are very similar (or almost identical) to yours,
>>> barring slightly different group names, metric names, and differences in
>>> things like outputting topdown metrics as percentages.
>>>
>>> We plan to publish our standard metrics some time in the next 2 months.
>>> Would you consider holding off on merging this change so that we have
>>> consistent group names and units going forward? Otherwise N2 would be
>>> the odd one out. I will send you the metrics when they are ready, and we
>>> will have a script to generate perf jsons from them, so you can review.
>>>
>>> We also have a slightly different formula for one of the top down
>>> metrics which I think would be slightly more accurate. We don't have
>>> anything for your "PE utilization" metrics, which I can raise
>>> internally. It could always be added to perf on top of the standardised
>>> ones if we don't add it to our standard ones.
>>>
>>> Thanks
>>> James
>>>
>>
>> Hi James,
>>
>> Regarding the arm n2 standard metrics last time, is my understanding correct,
>> and does it meet your meaning? If so, may I ask when you will send me the
>> standards you formulate so that I can align with you in time over my patchset.
>> Please communicate this matter so that we can understand each other's schedule.
>>
>> Thanks,
>> Jing
> 
> Hi,
> 
> In past versions of the perf tool the metrics have been pretty broken. If
> we have something that is good we shouldn't be holding it to a bar of being
> perfect, we can merge what we have and improve over time. In this case what
> Jing has prepared may arrive in time for Linux 6.2 whilst the standard
> metrics may arrive in time for 6.3. I'd suggest merging Jing's work and
> then improving on it with the standard metrics.
>

I'm not completely opposed to this, I was just worried about the churn
because ours will be generated from a script, and that it would end up
looking like a mass replacement of these that would have only recently
been added.

But maybe that's fine like you say.

> In terms of the metrics themselves, could we add ScaleUnit? For example:
> 
> +    {
> +        "MetricExpr": "LD_SPEC / INST_SPEC",
> +        "PublicDescription": "The rate of load instructions speculatively executed to overall instructions speculatively executed",
> +        "BriefDescription": "The rate of load instructions speculatively executed to overall instructions speculatively executed",
> +        "MetricGroup": "InstructionMix",
> +        "MetricName": "load_spec_rate"
> +    },
> 
> A ScaleUnit of "100%" would likely make things more readable.
> 
> Thanks,
> Ian
> 
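
To illustrate the ScaleUnit suggestion quoted above: as I understand it, the
leading number is read as a multiplier and the rest of the string as the unit
label. A minimal Python sketch of that reading follows. It is not perf's
actual parser, and the helper names are made up.

```python
import re


def split_scale_unit(scale_unit: str):
    """Split e.g. "100%" into (100.0, "%"): a leading number, then the unit."""
    m = re.match(r"\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)(.*)", scale_unit)
    if not m:
        raise ValueError(f"no leading number in {scale_unit!r}")
    return float(m.group(1)), m.group(2).strip()


def display(raw: float, scale_unit: str) -> str:
    """Apply the scale to a raw metric value and append the unit label."""
    scale, unit = split_scale_unit(scale_unit)
    return f"{raw * scale:.2f}{unit}"


# A raw ratio such as LD_SPEC / INST_SPEC = 0.25 would then read as a percentage.
print(display(0.25, "100%"))
```

Under that assumption, a ratio-style metric such as load_spec_rate would be
shown as a percentage instead of a bare fraction.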

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [External] : [RFC PATCH v2 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  2022-11-21 10:22             ` John Garry
@ 2022-11-21 15:17               ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-21 15:17 UTC (permalink / raw)
  To: John Garry, linux-arm-kernel, linux-perf-users, linux-kernel,
	John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan,
	Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song



On 2022/11/21 6:22 PM, John Garry wrote:
> 
>>
>>
>> #./perf stat -e FRONTEND_BOUND sleep 1
>> event syntax error: 'FRONTEND_BOUND'
> 
> For metrics, use -M, not -e
> 
> If this doesn't help, verify generated pmu-events/pmu-events.c is same after you make the change to try to use std arch events for metrics. Note that I never tested running my change.
> 
> Thanks,
> John
> 
>>                       \___ parser error
>> Run 'perf list' for a list of valid events
>>
>>   Usage: perf stat [<options>] [<command>]
>>
>>      -e, --event <event>   event selector. use 'perf list' to list available events
>>
>>
>>
>> diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeli
>> index f9fae15..1089ca0 100644
>> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
>> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
>> @@ -6,18 +6,24 @@
>>           "ArchStdEvent": "STALL_BACKEND"
>>       },
>>       {
>> -        "ArchStdEvent": "STALL_SLOT_FRONTEND"
>> +        "ArchStdEvent": "STALL_SLOT_FRONTEND",
>> +        "MetricExpr": "STALL_SLOT_FRONTEND - CPU_CYCLES"
>>       },
>>       {
>>


Sorry, I misunderstood the purpose of defining the metric as an arch_std_event at first.
It works now after I made the change you suggested.

But I still have a few questions:

1. The number of slots used in the topdownL1 formulas differs between architectures; for example,
it is 5 on neoverse-n2. If I define the topdownL1 metrics as arch_std_events, then I need to
set the slot count to 5 for n2. I can specify the slot value as a metric like below, but is
there a more concise way to do this?

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index 8ff1dfe..b473baf 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -1,4 +1,23 @@
[
+       {
+               "MetricExpr": "5",
+               "PublicDescription": "A pipeline slot represents the hardware resources needed to process one uOp",
+               "BriefDescription": "A pipeline slot represents the hardware resources needed to process one uOp",
+               "MetricName": "slot"
+       },
+       {
+               "ArchStdEvent": "FRONTEND_BOUND"
+       },
+       {
+               "ArchStdEvent": "BACKEND_BOUND"
+       },
+       {
+               "ArchStdEvent": "WASTED"
+       },
+       {
+               "ArchStdEvent": "RETIRING"
+       },


2. Should I add the topdownL1 metric to tools/perf/pmu-event/recommended.json,
or create a new json file to place the general metric?
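
The "slot" idea in question 1 can be sketched as follows. This is only a
Python illustration of one metric referring to another by name. The metric
table and the expand() helper are hypothetical, and the frontend_bound
expression is the v2 formula with the literal 5 replaced by "slot".

```python
import re

# Hypothetical metric table: one constant metric that other expressions
# refer to by name.
metrics = {
    "slot": "5",
    "frontend_bound": "(stall_slot_frontend - cpu_cycles) / (slot * cpu_cycles)",
}


def expand(expr: str, table: dict) -> str:
    """Replace references to other metrics with their parenthesised expressions."""
    def repl(match):
        name = match.group(0)
        # Identifiers that are not metric names (i.e. events) are left alone.
        return f"({expand(table[name], table)})" if name in table else name
    return re.sub(r"[A-Za-z_][A-Za-z0-9_]*", repl, expr)


expanded = expand(metrics["frontend_bound"], metrics)
print(expanded)  # (stall_slot_frontend - cpu_cycles) / ((5) * cpu_cycles)
```

Only the per-CPU "slot" entry would need to change for a core with a
different slot count.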

Looking forward to your reply.

Thanks,
Jing

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* Re: [External] : [RFC PATCH v2 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  2022-11-21 15:17               ` Jing Zhang
@ 2022-11-21 17:55                 ` John Garry
  -1 siblings, 0 replies; 96+ messages in thread
From: John Garry @ 2022-11-21 17:55 UTC (permalink / raw)
  To: Jing Zhang, linux-arm-kernel, linux-perf-users, linux-kernel,
	Will Deacon, James Clark, Mike Leach, Leo Yan, Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song

On 21/11/2022 15:17, Jing Zhang wrote:
> Sorry, I misunderstood the purpose of defining the metric as an arch_std_event at first.
> It works now after I made the change you suggested.
> 
> But I still have a few questions:
> 
> 1. The number of slots used in the topdownL1 formulas differs between architectures; for example,
> it is 5 on neoverse-n2. If I define the topdownL1 metrics as arch_std_events, then I need to
> set the slot count to 5 for n2. I can specify the slot value as a metric like below, but is
> there a more concise way to do this?
> 
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> index 8ff1dfe..b473baf 100644
> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> @@ -1,4 +1,23 @@
> [
> +       {
> +               "MetricExpr": "5",
> +               "PublicDescription": "A pipeline slot represents the hardware resources needed to process one uOp",
> +               "BriefDescription": "A pipeline slot represents the hardware resources needed to process one uOp",
> +               "MetricName": "slot"

Ehhh....I'm not sure if that is a good idea. Ian or anyone else have an 
opinion on this? It is possible to reuse metrics, so it should work, but...

One problem is that "slot" would show up as a metric, which you would 
not want.

Alternatively, I was going to suggest that you can override specific std 
arch event attributes. So, for example, for frontend_bound you could have:

+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -0,0 +1,30 @@
[
     {
         "ArchStdEvent": "FRONTEND_BOUND",
         "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)"
     },

> +       },
> +       {
> +               "ArchStdEvent": "FRONTEND_BOUND"
> +       },
> +       {
> +               "ArchStdEvent": "BACKEND_BOUND"
> +       },
> +       {
> +               "ArchStdEvent": "WASTED"
> +       },
> +       {
> +               "ArchStdEvent": "RETIRING"
> +       },
> 
> 
> 2. Should I add the topdownL1 metric to tools/perf/pmu-event/recommended.json,
> or create a new json file to place the general metric?

It would not belong in recommended.json, as that is specifically for 
arch-recommended events. It would really just depend on where the value 
comes from, i.e. the Arm ARM or the SBSA.

> 
> Looking forward to your reply.
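
A minimal Python sketch of the override approach discussed above, in which
each core's metrics.json carries its own slot count inside the MetricExpr
instead of exposing a separate "slot" metric. The helper name
frontend_bound_override and the idea of generating the entry from a script
are assumptions made for illustration; only the ArchStdEvent name and the
expression come from the example in this mail.

```python
import json


def frontend_bound_override(slots: int) -> dict:
    """Build a per-core override entry for the FRONTEND_BOUND metric.

    Hypothetical helper: the slot count is written into the expression
    as a literal, so no standalone "slot" metric needs to be defined.
    """
    return {
        "ArchStdEvent": "FRONTEND_BOUND",
        "MetricExpr": f"(stall_slot_frontend - cpu_cycles) / ({slots} * cpu_cycles)",
    }


if __name__ == "__main__":
    # 5 is the slot count quoted for neoverse-n2 in this thread.
    print(json.dumps([frontend_bound_override(5)], indent=4))
```

Another core would call the same helper with its own slot count, so the
formula is shared while the constant stays per-core.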


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH RFC 0/6] Add metrics for neoverse-n2
  2022-11-21 11:51       ` James Clark
@ 2022-11-22  7:11         ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-22  7:11 UTC (permalink / raw)
  To: James Clark, nick Forrington, Jumana MP, John Garry, Ian Rogers
  Cc: Will Deacon, Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Andrew Kilroy, Shuai Xue, Zhuo Song,
	linux-arm-kernel, linux-perf-users, linux-kernel



On 2022/11/21 7:51 PM, James Clark wrote:
> 
> 
> On 16/11/2022 15:26, Jing Zhang wrote:
>>
>>
>> On 2022/11/16 7:19 PM, James Clark wrote:
>>>
>>>
>>> On 31/10/2022 11:11, Jing Zhang wrote:
>>>> This series adds six metricgroups for neoverse-n2, among which the
>>>> formula of topdown L1 is from the document:
>>>> https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
>>>>
>>>> Since neoverse-n2 does not yet support topdown L2, metricgroups such
>>>> as Cache, TLB, Branch, InstructionMix, and PEutilization are added to
>>>> help further analysis of performance bottlenecks.
>>>>
>>>
>>> Hi Jing,
>>>
>>> Thanks for working on this, these metrics look ok to me in general,
>>> although we're currently working on publishing standardised metrics
>>> across all new cores as part of a new project in Arm. This will include
>>> N2, and our ones are very similar (or almost identical) to yours,
>>> barring slightly different group names, metric names, and differences in
>>> things like outputting topdown metrics as percentages.
>>>
>>> We plan to publish our standard metrics some time in the next 2 months.
>>> Would you consider holding off on merging this change so that we have
>>> consistent group names and units going forward? Otherwise N2 would be
>>> the odd one out. I will send you the metrics when they are ready, and we
>>> will have a script to generate perf jsons from them, so you can review.
>>>
>>
>> Do you mean that after you release the new standard metrics, I should rework my
>> patch to follow them, for example with consistent group names and units?
> 
> Hi Jing,
> 
> I was planning to submit the patch myself, but there will be a script to
> generate perf json files, so no manual work would be needed. Although
> this is complicated by the fact that we won't be publishing the fixed
> TopdownL1 metrics that you have for the existing N2 silicon so there
> would be a one time copy paste to fix that part.
> 
>>
>>
>>> We also have a slightly different formula for one of the top down
>>> metrics which I think would be slightly more accurate. We don't have
>>
>>
>> The v2 version of the patchset updated the formula of topdown L1.
>> Link: https://lore.kernel.org/all/1668411720-3581-1-git-send-email-renyu.zj@linux.alibaba.com/
>>
>> The formula of the v2 version is more accurate than v1, and it has been
>> verified in our test environment. Can you share your formula first and we
>> can discuss it together? :)
> 
> I was looking at v2 but replied to the root of the thread by mistake. I
> also had it the wrong way round. So your version corrects for the errata
> on the current version of N2 (as you mentioned in the commit message).
> Our version would be if there is a future new silicon revision with that
> fixed, but it does have an extra improvement by subtracting the branch
> mispredicts.
> 
> Perf doesn't currently match the jsons based on silicon revision, so
> we'd have to add something in for that if a fixed silicon version is
> released. But this is another problem for another time.
> 

Hi James,

Let's do what Ian said, and you can improve it later with the standard metrics,
after the fixed silicon version is released.


> This is the frontend bound metric we have for future revisions:
> 
> 	"100 * ( (STALL_SLOT_FRONTEND/(CPU_CYCLES * 5)) - ((BR_MIS_PRED * 4)/CPU_CYCLES) )"
> 
> Among the other changes, for example, where you have the 'wasted' metric we have
> 'bad_speculation', and without the cycles subtraction:
> 
> 	100 * ( ((1 - (OP_RETIRED/OP_SPEC)) * (1 - (STALL_SLOT/(CPU_CYCLES * 5)))) + ((BR_MIS_PRED * 4)/CPU_CYCLES) )
> 

Thanks for sharing your version of the metrics. But I still wonder: is BR_MIS_PRED not classified
as frontend bound? How do you judge the extra improvement from subtracting branch mispredicts?

> And some more details filled in around the units, for example:
> 
>     {
>         "MetricName": "bad_speculation",
>         "MetricExpr": "100 * ( ((1 - (OP_RETIRED/OP_SPEC)) * (1 - (STALL_SLOT/(CPU_CYCLES * 5)))) + ((BR_MIS_PRED * 4)/CPU_CYCLES) )",
>         "BriefDescription": "Bad Speculation",
>         "PublicDescription": "This metric is the percentage of total slots that executed operations and didn't retire due to a pipeline flush.\nThis indicates cycles that were utilized but inefficiently.",
>         "MetricGroup": "TopdownL1",
>         "ScaleUnit": "1percent of slots"
>     },
> 

I renamed the metric to "wasted" to follow the description in the Arm documentation; it was originally
"bad_speculation". I will change "wasted" back to "bad_speculation" if you wish.


Thanks,
Jing


> So ignoring the errata issue, the main reason to hold off is for
> consistency and to avoid churn, because these metrics in this format will be
> released for all cores going forwards.
> 
> Thanks
> James
> 
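
To make the two expressions quoted above concrete, here is a small Python
sketch that evaluates them on made-up counter values. The function names and
the sample counts are assumptions for illustration only, not measurements
from N2 hardware; the arithmetic follows the quoted expressions term by term.

```python
SLOTS = 5  # slot count quoted for neoverse-n2 in this thread


def frontend_bound(stall_slot_frontend, cpu_cycles, br_mis_pred):
    """Percent of slots, following the quoted 'future revisions' expression."""
    return 100 * (
        stall_slot_frontend / (cpu_cycles * SLOTS)
        - (br_mis_pred * 4) / cpu_cycles
    )


def bad_speculation(op_retired, op_spec, stall_slot, cpu_cycles, br_mis_pred):
    """Percent of slots, following the quoted bad_speculation expression."""
    return 100 * (
        (1 - op_retired / op_spec) * (1 - stall_slot / (cpu_cycles * SLOTS))
        + (br_mis_pred * 4) / cpu_cycles
    )


if __name__ == "__main__":
    # Hypothetical counter values, chosen only to keep the arithmetic easy.
    cycles = 1_000_000
    print(round(frontend_bound(1_500_000, cycles, 10_000), 2))
    print(round(bad_speculation(900_000, 1_000_000, 3_000_000, cycles, 10_000), 2))
```

With these inputs the mispredict term is 4 * 10000 / 1000000 = 0.04, so
frontend_bound comes to about 26 and bad_speculation to about 8 (percent of
slots).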

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [External] : [RFC PATCH v2 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  2022-11-21 17:55                 ` John Garry
@ 2022-11-22  9:24                   ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-22  9:24 UTC (permalink / raw)
  To: John Garry, linux-arm-kernel, linux-perf-users, linux-kernel,
	Will Deacon, James Clark, Mike Leach, Leo Yan, Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song



On 2022/11/22 1:55 AM, John Garry wrote:
> On 21/11/2022 15:17, Jing Zhang wrote:
>> I'm sorry, I misunderstood at first the purpose of defining a metric as an arch_std_event.
>> It works now after making the change you suggested.
>>
>> But I still have a few questions:
>>
>> 1. The slot count used by topdownL1 varies between architectures; for example,
>> it is 5 on neoverse-n2. If I define the topdownL1 metrics as arch_std_events, then I need to
>> set the slot count to 5 for n2. I can specify the slot value as a metric, like below, but is
>> there a more concise way to do this?
>>
>> diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> index 8ff1dfe..b473baf 100644
>> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> @@ -1,4 +1,23 @@
>> [
>> +       {
>> +               "MetricExpr": "5",
>> +               "PublicDescription": "A pipeline slot represents the hardware resources needed to process one uOp",
>> +               "BriefDescription": "A pipeline slot represents the hardware resources needed to process one uOp",
>> +               "MetricName": "slot"
> 
> Ehhh....I'm not sure if that is a good idea. Ian or anyone else have an opinion on this? It is possible to reuse metrics, so it should work, but...
> 
> One problem is that "slot" would show up as a metric, which you would not want.
> 
> Alternatively, I was going to suggest that you can override specific std arch event attributes. So, for example, for frontend_bound you could have:
> 
> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> @@ -0,0 +1,30 @@
> [
>     {
>         "ArchStdEvent": "FRONTEND_BOUND",
>         "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)"
>     },
> 
>> +       },
>> +       {
>> +               "ArchStdEvent": "FRONTEND_BOUND"
>> +       },
>> +       {
>> +               "ArchStdEvent": "BACKEND_BOUND"
>> +       },
>> +       {
>> +               "ArchStdEvent": "WASTED"
>> +       },
>> +       {
>> +               "ArchStdEvent": "RETIRING"
>> +       },
>>
>>
>> 2. Should I add the topdownL1 metric to tools/perf/pmu-event/recommended.json,
>> or create a new json file to place the general metric?
> 
> It would not belong in recommended.json, as that is specifically for arch-recommended events. It would really just depend on where the value comes from, i.e. the Arm ARM or the SBSA.
> 


Thanks for your suggestion, I will send the next patchset as you suggested.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH RFC 0/6] Add metrics for neoverse-n2
  2022-11-22  7:11         ` Jing Zhang
@ 2022-11-22 11:53           ` James Clark
  -1 siblings, 0 replies; 96+ messages in thread
From: James Clark @ 2022-11-22 11:53 UTC (permalink / raw)
  To: Jing Zhang, nick Forrington, Jumana MP, John Garry, Ian Rogers
  Cc: Will Deacon, Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Andrew Kilroy, Shuai Xue, Zhuo Song,
	linux-arm-kernel, linux-perf-users, linux-kernel



On 22/11/2022 07:11, Jing Zhang wrote:
> 
> 
> On 2022/11/21 7:51 PM, James Clark wrote:
>>
>>
>> On 16/11/2022 15:26, Jing Zhang wrote:
>>>
>>>
>>> On 2022/11/16 7:19 PM, James Clark wrote:
>>>>
>>>>
>>>> On 31/10/2022 11:11, Jing Zhang wrote:
>>>>> This series adds six metricgroups for neoverse-n2, among which the
>>>>> formula of topdown L1 is from the document:
>>>>> https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
>>>>>
>>>>> Since neoverse-n2 does not yet support topdown L2, metricgroups such
>>>>> as Cache, TLB, Branch, InstructionMix, and PEutilization are added to
>>>>> help further analysis of performance bottlenecks.
>>>>>
>>>>
>>>> Hi Jing,
>>>>
>>>> Thanks for working on this, these metrics look ok to me in general,
>>>> although we're currently working on publishing standardised metrics
>>>> across all new cores as part of a new project in Arm. This will include
>>>> N2, and our ones are very similar (or almost identical) to yours,
>>>> barring slightly different group names, metric names, and differences in
>>>> things like outputting topdown metrics as percentages.
>>>>
>>>> We plan to publish our standard metrics some time in the next 2 months.
>>>> Would you consider holding off on merging this change so that we have
>>>> consistent group names and units going forward? Otherwise N2 would be
>>>> the odd one out. I will send you the metrics when they are ready, and we
>>>> will have a script to generate perf jsons from them, so you can review.
>>>>
>>>
>>> Do you mean that after you release the new standard metrics, I should rework my
>>> patch to follow them, for example with consistent group names and units?
>>
>> Hi Jing,
>>
>> I was planning to submit the patch myself, but there will be a script to
>> generate perf json files, so no manual work would be needed. Although
>> this is complicated by the fact that we won't be publishing the fixed
>> TopdownL1 metrics that you have for the existing N2 silicon so there
>> would be a one time copy paste to fix that part.
>>
>>>
>>>
>>>> We also have a slightly different formula for one of the top down
>>>> metrics which I think would be slightly more accurate. We don't have
>>>
>>>
>>> The v2 version of the patchset updated the formula of topdown L1.
>>> Link: https://lore.kernel.org/all/1668411720-3581-1-git-send-email-renyu.zj@linux.alibaba.com/
>>>
>>> The formula of the v2 version is more accurate than v1, and it has been
>>> verified in our test environment. Can you share your formula first and we
>>> can discuss it together? :)
>>
>> I was looking at v2 but replied to the root of the thread by mistake. I
>> also had it the wrong way round. So your version corrects for the errata
>> on the current version of N2 (as you mentioned in the commit message).
>> Our version would be if there is a future new silicon revision with that
>> fixed, but it does have an extra improvement by subtracting the branch
>> mispredicts.
>>
>> Perf doesn't currently match the jsons based on silicon revision, so
>> we'd have to add something in for that if a fixed silicon version is
>> released. But this is another problem for another time.
>>
> 
> Hi James,
> 
> Let's do what Ian said, and you can improve it later with the standard metrics,
> after the fixed silicon version is released.
> 

Ok that's fine by me. I do have one update about our publishing progress
to share. This is the (currently empty) repo that we will be holding our
metrics in: https://gitlab.arm.com/telemetry-solution/telemetry-solution

We'll also have the conversion script in there as well. So there has at
least been some progress and we're getting close. I will keep you
updated when it is populated.

> 
>> This is the frontend bound metric we have for future revisions:
>>
>> 	"100 * ( (STALL_SLOT_FRONTEND/(CPU_CYCLES * 5)) - ((BR_MIS_PRED * 4)/CPU_CYCLES) )"
>>
>> Among the other changes, for example, where you have the 'wasted' metric we have
>> 'bad_speculation', and without the cycles subtraction:
>>
>> 	100 * ( ((1 - (OP_RETIRED/OP_SPEC)) * (1 - (STALL_SLOT/(CPU_CYCLES * 5)))) + ((BR_MIS_PRED * 4)/CPU_CYCLES) )
>>
> 
> Thanks for sharing your version of the metrics. But I still wonder: is BR_MIS_PRED not classified
> as frontend bound?

We're counting branch mispredicts as an extra cost, so we subtract that
cost from frontend_bound: branch-related stalls are covered by
bad_speculation, where we have added the BR_MIS_PRED term instead of
subtracting it.
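
The accounting described above can be checked numerically: the same
BR_MIS_PRED term is removed from frontend_bound and added to
bad_speculation, so their sum does not depend on it. The Python sketch below
uses hypothetical counter values and a hypothetical helper name
(topdown_pair); the expressions are the ones quoted earlier in this mail.

```python
import math

SLOTS = 5  # slot count quoted for neoverse-n2 in this thread


def topdown_pair(c, factor):
    """Return (frontend_bound, bad_speculation) in percent of slots.

    `c` is a dict of raw counts; `factor` scales BR_MIS_PRED (4 in the
    quoted expressions, 0 to drop the mispredict term altogether).
    """
    mispredict = (c["BR_MIS_PRED"] * factor) / c["CPU_CYCLES"]
    frontend = c["STALL_SLOT_FRONTEND"] / (c["CPU_CYCLES"] * SLOTS) - mispredict
    bad_spec = (1 - c["OP_RETIRED"] / c["OP_SPEC"]) * (
        1 - c["STALL_SLOT"] / (c["CPU_CYCLES"] * SLOTS)
    ) + mispredict
    return 100 * frontend, 100 * bad_spec


# Hypothetical counts, not measurements.
counts = {
    "CPU_CYCLES": 1_000_000,
    "STALL_SLOT_FRONTEND": 1_500_000,
    "STALL_SLOT": 3_000_000,
    "OP_RETIRED": 900_000,
    "OP_SPEC": 1_000_000,
    "BR_MIS_PRED": 10_000,
}

with_term = topdown_pair(counts, factor=4)
without_term = topdown_pair(counts, factor=0)

# The term only moves slots from one category to the other.
assert math.isclose(sum(with_term), sum(without_term))
print(with_term, without_term)
```

This only shows that the term is a reclassification between the two
categories; whether the reclassified split is more accurate is a separate
question that needs validation on real workloads.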

Unfortunately I'm just the middle man here, I didn't actually work
directly on producing these metrics so I hope nothing gets lost in my
explanation.

> How do you judge the extra improvement from subtracting branch mispredicts?

As far as I know the repo that I mentioned above will have some
benchmarks and tooling that were used to validate our version. So it
should be apparent by running those.

> 
>> And some more details filled in around the units, for example:
>>
>>     {
>>         "MetricName": "bad_speculation",
>>         "MetricExpr": "100 * ( ((1 - (OP_RETIRED/OP_SPEC)) * (1 - (STALL_SLOT/(CPU_CYCLES * 5)))) + ((BR_MIS_PRED * 4)/CPU_CYCLES) )",
>>         "BriefDescription": "Bad Speculation",
>>         "PublicDescription": "This metric is the percentage of total slots that executed operations and didn't retire due to a pipeline flush.\nThis indicates cycles that were utilized but inefficiently.",
>>         "MetricGroup": "TopdownL1",
>>         "ScaleUnit": "1percent of slots"
>>     },
>>
> 
> I renamed the metric to "wasted" to follow the description in the Arm documentation; it was originally
> "bad_speculation". I will change "wasted" back to "bad_speculation" if you wish.

Yeah that would be good. I think since that document we've tried to
align names more to what was already out there and bad_speculation was
probably judged to be a better description. For example it's already
used in tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json

> 
> 
> Thanks,
> Jing
> 
> 
>> So ignoring the errata issue, the main reason to hold off is for
>> consistency and to avoid churn, because these metrics in this format will be
>> released for all cores going forwards.
>>
>> Thanks
>> James
>>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH RFC 0/6] Add metrics for neoverse-n2
@ 2022-11-22 11:53           ` James Clark
  0 siblings, 0 replies; 96+ messages in thread
From: James Clark @ 2022-11-22 11:53 UTC (permalink / raw)
  To: Jing Zhang, nick Forrington, Jumana MP, John Garry, Ian Rogers
  Cc: Will Deacon, Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Andrew Kilroy, Shuai Xue, Zhuo Song,
	linux-arm-kernel, linux-perf-users, linux-kernel



On 22/11/2022 07:11, Jing Zhang wrote:
> 
> 
> On 2022/11/21 7:51 PM, James Clark wrote:
>>
>>
>> On 16/11/2022 15:26, Jing Zhang wrote:
>>>
>>>
>>> On 2022/11/16 7:19 PM, James Clark wrote:
>>>>
>>>>
>>>> On 31/10/2022 11:11, Jing Zhang wrote:
>>>>> This series adds six metricgroups for neoverse-n2, among which the
>>>>> formula of topdown L1 is from the document:
>>>>> https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
>>>>>
>>>>> Since neoverse-n2 does not yet support topdown L2, metricgroups such
>>>>> as Cache, TLB, Branch, InstructionMix, and PEutilization are added to
>>>>> help further analysis of performance bottlenecks.
>>>>>
>>>>
>>>> Hi Jing,
>>>>
>>>> Thanks for working on this, these metrics look ok to me in general,
>>>> although we're currently working on publishing standardised metrics
>>>> across all new cores as part of a new project in Arm. This will include
>>>> N2, and our ones are very similar (or almost identical) to yours,
>>>> barring slightly different group names, metric names, and differences in
>>>> things like outputting topdown metrics as percentages.
>>>>
>>>> We plan to publish our standard metrics some time in the next 2 months.
>>>> Would you consider holding off on merging this change so that we have
>>>> consistent group names and units going forward? Otherwise N2 would be
>>>> the odd one out. I will send you the metrics when they are ready, and we
>>>> will have a script to generate perf jsons from them, so you can review.
>>>>
>>>
>>> Do you mean that after you release the new standard metrics, I should rework my
>>> patch to follow them, for example with consistent group names and units?
>>
>> Hi Jing,
>>
>> I was planning to submit the patch myself, but there will be a script to
>> generate perf json files, so no manual work would be needed. Although
>> this is complicated by the fact that we won't be publishing the fixed
>> TopdownL1 metrics that you have for the existing N2 silicon so there
>> would be a one time copy paste to fix that part.
>>
>>>
>>>
>>>> We also have a slightly different formula for one of the top down
>>>> metrics which I think would be slightly more accurate. We don't have
>>>
>>>
>>> The v2 version of the patchset updated the formula of topdown L1.
>>> Link: https://lore.kernel.org/all/1668411720-3581-1-git-send-email-renyu.zj@linux.alibaba.com/
>>>
>>> The formula of the v2 version is more accurate than v1, and it has been
>>> verified in our test environment. Can you share your formula first and we
>>> can discuss it together? :)
>>
>> I was looking at v2 but replied to the root of the thread by mistake. I
>> also had it the wrong way round. So your version corrects for the errata
>> on the current version of N2 (as you mentioned in the commit message).
>> Our version would be if there is a future new silicon revision with that
>> fixed, but it does have an extra improvement by subtracting the branch
>> mispredicts.
>>
>> Perf doesn't currently match the jsons based on silicon revision, so
>> we'd have to add something in for that if a fixed silicon version is
>> released. But this is another problem for another time.
>>
> 
> Hi James,
> 
> Let's do what Ian said, and you can improve it later with the standard metrics,
> after the fixed silicon version is released.
> 

Ok that's fine by me. I do have one update about our publishing progress
to share. This is the (currently empty) repo that we will be holding our
metrics in: https://gitlab.arm.com/telemetry-solution/telemetry-solution

The conversion script will live in there as well. So there has at
least been some progress and we're getting close. I will let you
know when it is populated.

> 
>> This is the frontend bound metric we have for future revisions:
>>
>> 	"100 * ( (STALL_SLOT_FRONTEND/(CPU_CYCLES * 5)) - ((BR_MIS_PRED * 4)/CPU_CYCLES) )"
>>
>> Another change, for example: where you have the 'wasted' metric, we have
>> 'bad_speculation', without the cycles subtraction:
>>
>> 	100 * ( ((1 - (OP_RETIRED/OP_SPEC)) * (1 - (STALL_SLOT/(CPU_CYCLES * 5)))) + ((BR_MIS_PRED * 4)/CPU_CYCLES) )
>>
> 
> Thanks for sharing your version of the metric. But I still wonder: is BR_MIS_PRED not classified
> as frontend bound?

We're counting branch mispredicts as an extra cost. We subtract that cost
from frontend_bound because branch-related stalls are already covered by
bad_speculation, where BR_MIS_PRED is added instead of subtracted.
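
A minimal sketch of how the two formulas interact, using made-up counter
values. The function names and the numbers are illustrative assumptions,
not part of the published metrics; only the expressions themselves come
from the formulas quoted above:

```python
SLOTS_PER_CYCLE = 5  # slot count on neoverse-n2, as used in the formulas


def frontend_bound(stall_slot_frontend, br_mis_pred, cpu_cycles):
    """Percentage of slots stalled in the frontend, minus the mispredict cost."""
    stalled = stall_slot_frontend / (cpu_cycles * SLOTS_PER_CYCLE)
    mispredict_cost = (br_mis_pred * 4) / cpu_cycles
    return 100 * (stalled - mispredict_cost)


def bad_speculation(op_retired, op_spec, stall_slot, br_mis_pred, cpu_cycles):
    """Percentage of slots spent on operations that did not retire."""
    not_retired = 1 - op_retired / op_spec
    not_stalled = 1 - stall_slot / (cpu_cycles * SLOTS_PER_CYCLE)
    mispredict_cost = (br_mis_pred * 4) / cpu_cycles
    return 100 * (not_retired * not_stalled + mispredict_cost)


if __name__ == "__main__":
    # Hypothetical counter readings, chosen only to exercise the formulas.
    print(frontend_bound(stall_slot_frontend=1500, br_mis_pred=10, cpu_cycles=1000))
    print(bad_speculation(op_retired=900, op_spec=1000, stall_slot=2500,
                          br_mis_pred=10, cpu_cycles=1000))
```

The same (BR_MIS_PRED * 4)/CPU_CYCLES term is subtracted in one metric
and added in the other, so it does not change the sum of the two.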

Unfortunately I'm just the middle man here, I didn't actually work
directly on producing these metrics so I hope nothing gets lost in my
explanation.

> How do you judge the extra improvement by subtracting branch mispredicts?

As far as I know the repo that I mentioned above will have some
benchmarks and tooling that were used to validate our version. So it
should be apparent by running those.

> 
>> And some more details filled in around the units, for example:
>>
>>     {
>>         "MetricName": "bad_speculation",
>>         "MetricExpr": "100 * ( ((1 - (OP_RETIRED/OP_SPEC)) * (1 - (STALL_SLOT/(CPU_CYCLES * 5)))) + ((BR_MIS_PRED * 4)/CPU_CYCLES) )",
>>         "BriefDescription": "Bad Speculation",
>>         "PublicDescription": "This metric is the percentage of total slots that executed operations and didn't retire due to a pipeline flush.\nThis indicates cycles that were utilized but inefficiently.",
>>         "MetricGroup": "TopdownL1",
>>         "ScaleUnit": "1percent of slots"
>>     },
>>
> 
> I renamed the metric to "wasted" to match the description in the Arm documentation; it was originally
> "bad_speculation". I will change "wasted" back to "bad_speculation" if you wish.

Yeah that would be good. I think since that document we've tried to
align names more to what was already out there and bad_speculation was
probably judged to be a better description. For example it's already
used in tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json

> 
> 
> Thanks,
> Jing
> 
> 
>> So ignoring the errata issue, the main reason to hold off is for
>> consistency and churn because these metrics in this format will be
>> released for all cores going forwards.
>>
>> Thanks
>> James
>>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [External] : [RFC PATCH v2 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  2022-11-21 17:55                 ` John Garry
@ 2022-11-22 14:00                   ` James Clark
  -1 siblings, 0 replies; 96+ messages in thread
From: James Clark @ 2022-11-22 14:00 UTC (permalink / raw)
  To: John Garry, Jing Zhang
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song, linux-arm-kernel,
	linux-perf-users, linux-kernel, Will Deacon, Mike Leach, Leo Yan,
	Ian Rogers



On 21/11/2022 17:55, John Garry wrote:
> On 21/11/2022 15:17, Jing Zhang wrote:
>> I'm sorry that I misunderstood the purpose of putting the metric as
>> arch_std_event at first,
>> and now it works after the modification following your suggestion.
>>
>> But there are also a few questions:
>>
>> 1. The value of the slot in the topdownL1 varies between different
>> architectures, for example,
>> the slot is 5 on neoverse-n2. If I put topdownL1 metric as
>> arch_std_event, then I need to
>> specify the slot to 5 in n2. I can specify slot values in metric like
>> below, but is there any
>> other concise way to do this?
>>
>> diff --git
>> a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> index 8ff1dfe..b473baf 100644
>> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> @@ -1,4 +1,23 @@
>> [
>> +       {
>> +               "MetricExpr": "5",
>> +               "PublicDescription": "A pipeline slot represents the
>> hardware resources needed to process one uOp",
>> +               "BriefDescription": "A pipeline slot represents the
>> hardware resources needed to process one uOp",
>> +               "MetricName": "slot"
> 
> Ehhh....I'm not sure if that is a good idea. Ian or anyone else have an
> opinion on this? It is possible to reuse metrics, so it should work, but...
> 
> One problem is that "slot" would show up as a metric, which you would
> not want.
> 
> Alternatively I was going to suggest that you can overwrite specific std
> arch event attributes. So for example of frontend_bound, you could have:

I would agree with not having this and just hard coding the 5 wherever
it's needed. Once we have a few different sets of metrics in place maybe
we can start to look at deduplication, but for now I don't see the value.

> 
> + b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> @@ -0,0 +1,30 @@
> [
>     {
>     "ArchStdEvent": "FRONTEND_BOUND",
>         "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 *
> cpu_cycles)",
>     },
> 
>> +       }
>> +       {
>> +               "ArchStdEvent": "FRONTEND_BOUND"
>> +       },
>> +       {
>> +               "ArchStdEvent": "BACKEND_BOUND"
>> +       },
>> +       {
>> +               "ArchStdEvent": "WASTED"
>> +       },
>> +       {
>> +               "ArchStdEvent": "RETIRING"
>> +       },
>>
>>
>> 2. Should I add the topdownL1 metric to
>> tools/perf/pmu-event/recommended.json,
>> or create a new json file to place the general metric?
> 
> It would not belong in recommended.json as that is specifically for
> arch-recommended events. It would really just depend on where the value
> comes from, i.e. arm arm or sbsa.
> 

For what we're going to publish shortly we'll be generating a
metrics.json file for each CPU. It will be autogenerated so I don't
think duplication will be an issue and I'm expecting that there will be
differences in the topdown metrics between CPUs anyway. So I would also
vote to not put it in recommended.json
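
To illustrate why hard coding the slot count costs little once the files
are generated, here is a rough sketch of a per-CPU generator. It is not
the conversion script mentioned in this thread (which is not published
yet); CPU_SLOTS, topdown_metrics and render_metrics_json are hypothetical
names, and only the neoverse-n2 slot count of 5 and the frontend_bound
expression are taken from the discussion above:

```python
import json

# Hypothetical per-CPU parameters; only the neoverse-n2 value is from the thread.
CPU_SLOTS = {"neoverse-n2": 5}


def topdown_metrics(slots):
    """Build metric entries with the slot count written out literally."""
    return [
        {
            "MetricName": "frontend_bound",
            "MetricExpr": f"(stall_slot_frontend - cpu_cycles) / ({slots} * cpu_cycles)",
            "MetricGroup": "TopDownL1",
        },
    ]


def render_metrics_json(cpu):
    """Return the text of a metrics.json file for one CPU."""
    return json.dumps(topdown_metrics(CPU_SLOTS[cpu]), indent=4)


if __name__ == "__main__":
    print(render_metrics_json("neoverse-n2"))
```

Each CPU gets its own complete file, so nothing in the output refers to
a shared "slot" definition.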

>>
>> Looking forward to your reply.
> 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [External] : [RFC PATCH v2 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  2022-11-22 14:00                   ` James Clark
@ 2022-11-22 15:41                     ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-22 15:41 UTC (permalink / raw)
  To: James Clark, John Garry
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song, linux-arm-kernel,
	linux-perf-users, linux-kernel, Will Deacon, Mike Leach, Leo Yan,
	Ian Rogers



On 2022/11/22 10:00 PM, James Clark wrote:
> 
> 
> On 21/11/2022 17:55, John Garry wrote:
>> On 21/11/2022 15:17, Jing Zhang wrote:
>>> I'm sorry that I misunderstood the purpose of putting the metric as
>>> arch_std_event at first,
>>> and now it works after the modification following your suggestion.
>>>
>>> But there are also a few questions:
>>>
>>> 1. The value of the slot in the topdownL1 varies between different
>>> architectures, for example,
>>> the slot is 5 on neoverse-n2. If I put topdownL1 metric as
>>> arch_std_event, then I need to
>>> specify the slot to 5 in n2. I can specify slot values in metric like
>>> below, but is there any
>>> other concise way to do this?
>>>
>>> diff --git
>>> a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>> b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>> index 8ff1dfe..b473baf 100644
>>> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>> @@ -1,4 +1,23 @@
>>> [
>>> +       {
>>> +               "MetricExpr": "5",
>>> +               "PublicDescription": "A pipeline slot represents the
>>> hardware resources needed to process one uOp",
>>> +               "BriefDescription": "A pipeline slot represents the
>>> hardware resources needed to process one uOp",
>>> +               "MetricName": "slot"
>>
>> Ehhh....I'm not sure if that is a good idea. Ian or anyone else have an
>> opinion on this? It is possible to reuse metrics, so it should work, but...
>>
>> One problem is that "slot" would show up as a metric, which you would
>> not want.
>>
>> Alternatively I was going to suggest that you can overwrite specific std
>> arch event attributes. So for example of frontend_bound, you could have:
> 
> I would agree with not having this and just hard coding the 5 wherever
> it's needed. Once we have a few different sets of metrics in place maybe
> we can start to look at deduplication, but for now I don't see the value.
> 
>>
>> + b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> @@ -0,0 +1,30 @@
>> [
>>     {
>>     "ArchStdEvent": "FRONTEND_BOUND",
>>         "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 *
>> cpu_cycles)",
>>     },
>>
>>> +       }
>>> +       {
>>> +               "ArchStdEvent": "FRONTEND_BOUND"
>>> +       },
>>> +       {
>>> +               "ArchStdEvent": "BACKEND_BOUND"
>>> +       },
>>> +       {
>>> +               "ArchStdEvent": "WASTED"
>>> +       },
>>> +       {
>>> +               "ArchStdEvent": "RETIRING"
>>> +       },
>>>
>>>
>>> 2. Should I add the topdownL1 metric to
>>> tools/perf/pmu-event/recommended.json,
>>> or create a new json file to place the general metric?
>>
>> It would not belong in recommended.json as that is specifically for
>> arch-recommended events. It would really just depend on where the value
>> comes from, i.e. arm arm or sbsa.
>>
> 
> For what we're going to publish shortly we'll be generating a
> metrics.json file for each CPU. It will be autogenerated so I don't
> think duplication will be an issue and I'm expecting that there will be
> differences in the topdown metrics between CPUs anyway. So I would also
> vote to not put it in recommended.json
> 

I will create a new sbsa.json file in tools/perf/pmu-events/arch/arm64/
to hold metrics that may be common to several CPUs, just like arch_std_event.
If the topdown metrics differ on other CPUs, we can override the
metric expression.

For example:

+++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
@@ -0,0 +1,9 @@
+[
+    {
+        "MetricExpr": "stall_slot_frontend / (slot * cpu_cycles)",
+        "PublicDescription": "Frontend bound L1 topdown metric",
+        "BriefDescription": "Frontend bound L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "FRONTEND_BOUND"
+    }
+]

+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -0,0 +1,30 @@
+[
+    {
+        "ArchStdEvent": "FRONTEND_BOUND",
+        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)"
+    }
+]
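
As a rough model of the override behaviour I have in mind (a sketch only,
not perf's actual jevents code; STD_METRICS and resolve are hypothetical
names): fields set in the CPU file win, and everything else is inherited
from the shared definition.

```python
# Shared definitions keyed by metric name, as proposed for sbsa.json.
STD_METRICS = {
    "FRONTEND_BOUND": {
        "MetricExpr": "stall_slot_frontend / (slot * cpu_cycles)",
        "PublicDescription": "Frontend bound L1 topdown metric",
        "BriefDescription": "Frontend bound L1 topdown metric",
        "MetricGroup": "TopDownL1",
        "MetricName": "FRONTEND_BOUND",
    },
}


def resolve(entry, std=STD_METRICS):
    """Expand an ArchStdEvent reference; fields in the CPU entry take priority."""
    name = entry.get("ArchStdEvent")
    if name is None:
        return dict(entry)
    merged = dict(std[name])
    merged.update({k: v for k, v in entry.items() if k != "ArchStdEvent"})
    return merged


# The neoverse-n2 entry from the example above.
n2_entry = {
    "ArchStdEvent": "FRONTEND_BOUND",
    "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)",
}

if __name__ == "__main__":
    for key, value in resolve(n2_entry).items():
        print(f"{key}: {value}")
```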


In addition, I can add the TLB, Cache, Branch, InstructionMix, PEutilization
and other metric groups to sbsa.json, because they also apply to
neoverse-n1. These metrics are described in the neoverse-n1 documentation:
https://developer.arm.com/documentation/PJDOC-466751330-547673/r4p1/


Thanks,
Jing


>>>
>>> Looking forward to your reply.
>>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [External] : [RFC PATCH v2 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  2022-11-22 15:41                     ` Jing Zhang
@ 2022-11-23 14:26                       ` James Clark
  -1 siblings, 0 replies; 96+ messages in thread
From: James Clark @ 2022-11-23 14:26 UTC (permalink / raw)
  To: Jing Zhang, John Garry
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song, linux-arm-kernel,
	linux-perf-users, linux-kernel, Will Deacon, Mike Leach, Leo Yan,
	Ian Rogers, Nick Forrington



On 22/11/2022 15:41, Jing Zhang wrote:
> 
> 
> On 2022/11/22 10:00 PM, James Clark wrote:
>>
>>
>> On 21/11/2022 17:55, John Garry wrote:
>>> On 21/11/2022 15:17, Jing Zhang wrote:
>>>> I'm sorry that I misunderstood the purpose of putting the metric as
>>>> arch_std_event at first,
>>>> and now it works after the modification following your suggestion.
>>>>
>>>> But there are also a few questions:
>>>>
>>>> 1. The value of the slot in the topdownL1 varies between different
>>>> architectures, for example,
>>>> the slot is 5 on neoverse-n2. If I put topdownL1 metric as
>>>> arch_std_event, then I need to
>>>> specify the slot to 5 in n2. I can specify slot values in metric like
>>>> below, but is there any
>>>> other concise way to do this?
>>>>
>>>> diff --git
>>>> a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>> b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>> index 8ff1dfe..b473baf 100644
>>>> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>> @@ -1,4 +1,23 @@
>>>> [
>>>> +       {
>>>> +               "MetricExpr": "5",
>>>> +               "PublicDescription": "A pipeline slot represents the
>>>> hardware resources needed to process one uOp",
>>>> +               "BriefDescription": "A pipeline slot represents the
>>>> hardware resources needed to process one uOp",
>>>> +               "MetricName": "slot"
>>>
>>> Ehhh....I'm not sure if that is a good idea. Ian or anyone else have an
>>> opinion on this? It is possible to reuse metrics, so it should work, but...
>>>
>>> One problem is that "slot" would show up as a metric, which you would
>>> not want.
>>>
>>> Alternatively I was going to suggest that you can overwrite specific std
>>> arch event attributes. So for example of frontend_bound, you could have:
>>
>> I would agree with not having this and just hard coding the 5 wherever
>> it's needed. Once we have a few different sets of metrics in place maybe
>> we can start to look at deduplication, but for now I don't see the value.
>>
>>>
>>> + b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>> @@ -0,0 +1,30 @@
>>> [
>>>     {
>>>     "ArchStdEvent": "FRONTEND_BOUND",
>>>         "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 *
>>> cpu_cycles)",
>>>     },
>>>
>>>> +       }
>>>> +       {
>>>> +               "ArchStdEvent": "FRONTEND_BOUND"
>>>> +       },
>>>> +       {
>>>> +               "ArchStdEvent": "BACKEND_BOUND"
>>>> +       },
>>>> +       {
>>>> +               "ArchStdEvent": "WASTED"
>>>> +       },
>>>> +       {
>>>> +               "ArchStdEvent": "RETIRING"
>>>> +       },
>>>>
>>>>
>>>> 2. Should I add the topdownL1 metric to
>>>> tools/perf/pmu-event/recommended.json,
>>>> or create a new json file to place the general metric?
>>>
>>> It would not belong in recommended.json as that is specifically for
>>> arch-recommended events. It would really just depend on where the value
>>> comes from, i.e. arm arm or sbsa.
>>>
>>
>> For what we're going to publish shortly we'll be generating a
>> metrics.json file for each CPU. It will be autogenerated so I don't
>> think duplication will be an issue and I'm expecting that there will be
>> differences in the topdown metrics between CPUs anyway. So I would also
>> vote to not put it in recommended.json
>>
> 
> I will create a new sbsa.json file in tools/perf/pmu-events/arch/arm64/
> to place metrics that may be common between some CPUs, just like arch_std_event.

Because this would apply to all CPUs rather than just N2, I still think
it's best to wait for our metrics repo to be published. Otherwise the
metrics that Arm publishes for all future CPUs will have names and group
names that differ from the common ones added as part of this change.

It's something that we've been working on for quite a while and we've
taken care to make sure that it applies to future products and is scalable.

It would be easier to add these right now only for N2, and then
afterwards we can start to look at what is common and could be factored
out into the top level folder.

> If the topdown metrics are different in other CPUs, we can overwrite the
> metric expression.

True, but with different group names and metric names and units it could
get slightly complicated.

> 
> For example:
> 
> +++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
> @@ -0,0 +1,9 @@
> +[
> +    {
> +        "MetricExpr": "stall_slot_frontend / (slot * cpu_cycles)",
> +        "PublicDescription": "Frontend bound L1 topdown metric",
> +        "BriefDescription": "Frontend bound L1 topdown metric",
> +        "MetricGroup": "TopDownL1",
> +        "MetricName": "FRONTEND_BOUND"
> +    }
> +]
> 
> + b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> @@ -0,0 +1,30 @@
> +[
> +   {
> +   	"ArchStdEvent": "FRONTEND_BOUND",
> +        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)",
> +   }
> +]
> 

With the metrics files being autogenerated, I don't really see much
benefit in doing it this way.

You also run into the issue where if a platform happens to define all of
the events required by a metric, will that metric appear automatically,
even if it's not valid?
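
To make the concern concrete, here is a small sketch of name-based
matching. Whether perf actually resolves shared metrics this way is
exactly the open question; referenced_events and resolvable_metrics are
hypothetical helpers, and the event set below is made up:

```python
import re


def referenced_events(metric_expr):
    """Return the identifiers (event names) used in a metric expression."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", metric_expr))


def resolvable_metrics(metrics, platform_events):
    """Names of metrics whose events are all defined by the platform."""
    return [
        m["MetricName"]
        for m in metrics
        if referenced_events(m["MetricExpr"]) <= platform_events
    ]


SHARED_METRICS = [
    {
        "MetricName": "FRONTEND_BOUND",
        # The slot count of 5 is only known to be right for neoverse-n2.
        "MetricExpr": "stall_slot_frontend / (5 * cpu_cycles)",
    },
]

# A made-up platform that defines both events but may have a different
# slot count: matching on event names alone cannot tell the difference.
other_cpu_events = {"stall_slot_frontend", "cpu_cycles", "inst_retired"}

if __name__ == "__main__":
    print(resolvable_metrics(SHARED_METRICS, other_cpu_events))
```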

> 
> In addition, I can also add TLB, Cache, Branch, InstructionMix, PEutilization
> and other metric groups into sbsa.json, because they are also applicable to
> neoverse-n1. Above metrics are described in the documentation of neoverse-n1:
> https://developer.arm.com/documentation/PJDOC-466751330-547673/r4p1/
> 
> 
> Thanks,
> Jing
> 
> 
>>>>
>>>> Looking forward to your reply.
>>>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [External] : [RFC PATCH v2 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
@ 2022-11-23 14:26                       ` James Clark
  0 siblings, 0 replies; 96+ messages in thread
From: James Clark @ 2022-11-23 14:26 UTC (permalink / raw)
  To: Jing Zhang, John Garry
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song, linux-arm-kernel,
	linux-perf-users, linux-kernel, Will Deacon, Mike Leach, Leo Yan,
	Ian Rogers, Nick Forrington



On 22/11/2022 15:41, Jing Zhang wrote:
> 
> 
> On 2022/11/22 10:00 PM, James Clark wrote:
>>
>>
>> On 21/11/2022 17:55, John Garry wrote:
>>> On 21/11/2022 15:17, Jing Zhang wrote:
>>>> I'm sorry that I misunderstood the purpose of putting metric as
>>>> arch_std_event at first,
>>>> and now it works after the modification over your suggestion.
>>>>
>>>> But there are also a few questions:
>>>>
>>>> 1. The value of the slot in the topdownL1 is various in different
>>>> architectures, for example,
>>>> the slot is 5 on neoverse-n2. If I put topdownL1 metric as
>>>> arch_std_event, then I need to
>>>> specify the slot to 5 in n2. I can specify slot values in metric like
>>>> below, but is there any
>>>> other concise way to do this?
>>>>
>>>> diff --git
>>>> a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>> b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>> index 8ff1dfe..b473baf 100644
>>>> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>> @@ -1,4 +1,23 @@
>>>> [
>>>> +       {
>>>> +               "MetricExpr": "5",
>>>> +               "PublicDescription": "A pipeline slot represents the
>>>> hardware resources needed to process one uOp",
>>>> +               "BriefDescription": "A pipeline slot represents the
>>>> hardware resources needed to process one uOp",
>>>> +               "MetricName": "slot"
>>>
>>> Ehhh....I'm not sure if that is a good idea. Ian or anyone else have an
>>> opinion on this? It is possible to reuse metrics, so it should work, but...
>>>
>>> One problem is that "slot" would show up as a metric, which you would
>>> not want.
>>>
>>> Alternatively I was going to suggest that you can overwrite specific std
>>> arch event attributes. So for example of frontend_bound, you could have:
>>
>> I would agree with not having this and just hard coding the 5 wherever
>> it's needed. Once we have a few different sets of metrics in place maybe
>> we can start to look at deduplication, but for now I don't see the value.
>>
>>>
>>> + b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>> @@ -0,0 +1,30 @@
>>> [
>>>     {
>>>         "ArchStdEvent": "FRONTEND_BOUND",
>>>         "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)"
>>>     },
>>>
>>>> +       }
>>>> +       {
>>>> +               "ArchStdEvent": "FRONTEND_BOUND"
>>>> +       },
>>>> +       {
>>>> +               "ArchStdEvent": "BACKEND_BOUND"
>>>> +       },
>>>> +       {
>>>> +               "ArchStdEvent": "WASTED"
>>>> +       },
>>>> +       {
>>>> +               "ArchStdEvent": "RETIRING"
>>>> +       },
>>>>
>>>>
>>>> 2. Should I add the topdownL1 metric to
>>>> tools/perf/pmu-event/recommended.json,
>>>> or create a new json file to place the general metric?
>>>
>>> It would not belong in recommended.json as that is specifically for
>>> arch-recommended events. It would really just depend on where the value
>>> comes from, i.e. arm arm or sbsa.
>>>
>>
>> For what we're going to publish shortly we'll be generating a
>> metrics.json file for each CPU. It will be autogenerated so I don't
>> think duplication will be an issue and I'm expecting that there will be
>> differences in the topdown metrics between CPUs anyway. So I would also
>> vote to not put it in recommended.json
>>
> 
> I will create a new sbsa.json file in tools/perf/pmu-events/arch/arm64/
> to place metrics that may be common between some CPUs, just like arch_std_event.

Because this would apply to all CPUs rather than just N2, I still think
it's best to wait for our metrics repo to be published. Otherwise Arm
will start publishing metrics with names and group names for all future
CPUs that have different names to the common ones added as part of this
change.

It's something that we've been working on for quite a while and we've
taken care to make sure that it applies to future products and is scalable.

It would be easier to add these right now only for N2, and then
afterwards we can start to look at what is common and could be factored
out into the top level folder.

> If the topdown metrics are different in other CPUs, we can overwrite the
> metric expression.

True, but with different group names and metric names and units it could
get slightly complicated.

> 
> For example:
> 
> +++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
> @@ -0,0 +1,9 @@
> +[
> +    {
> +        "MetricExpr": "stall_slot_frontend / (slot * cpu_cycles)",
> +        "PublicDescription": "Frontend bound L1 topdown metric",
> +        "BriefDescription": "Frontend bound L1 topdown metric",
> +        "MetricGroup": "TopDownL1",
> +        "MetricName": "FRONTEND_BOUND"
> +    }
> +]
> 
> + b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> @@ -0,0 +1,30 @@
> +[
> +    {
> +        "ArchStdEvent": "FRONTEND_BOUND",
> +        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)"
> +    }
> +]
> 

With the auto generation of metrics file I don't really see too much
benefit of doing it this way.

You also run into the issue where if a platform happens to define all of
the events required by a metric, will that metric appear automatically,
even if it's not valid?

> 
> In addition, I can also add TLB, Cache, Branch, InstructionMix, PEutilization
> and other metric groups into sbsa.json, because they are also applicable to
> neoverse-n1. Above metrics are described in the documentation of neoverse-n1:
> https://developer.arm.com/documentation/PJDOC-466751330-547673/r4p1/
> 
> 
> Thanks,
> Jing
> 
> 
>>>>
>>>> Looking forward to your reply.
>>>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [External] : [RFC PATCH v2 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  2022-11-23 14:26                       ` James Clark
@ 2022-11-24 16:32                         ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-24 16:32 UTC (permalink / raw)
  To: James Clark, John Garry
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song, linux-arm-kernel,
	linux-perf-users, linux-kernel, Will Deacon, Mike Leach, Leo Yan,
	Ian Rogers, Nick Forrington



On 2022/11/23 at 10:26 PM, James Clark wrote:
> 
> 
> On 22/11/2022 15:41, Jing Zhang wrote:
>>
>>
>> On 2022/11/22 at 10:00 PM, James Clark wrote:
>>>
>>>
>>> On 21/11/2022 17:55, John Garry wrote:
>>>> On 21/11/2022 15:17, Jing Zhang wrote:
>>>>> I'm sorry that I misunderstood the purpose of putting metric as
>>>>> arch_std_event at first,
>>>>> and now it works after the modification over your suggestion.
>>>>>
>>>>> But there are also a few questions:
>>>>>
>>>>> 1. The slot count used by the topdownL1 metrics varies between
>>>>> microarchitectures; for example, it is 5 on neoverse-n2. If I make the
>>>>> topdownL1 metrics arch_std_event entries, I need to set the slot count
>>>>> to 5 for n2. I can define the slot value as a metric, as below, but is
>>>>> there a more concise way to do this?
>>>>>
>>>>> diff --git
>>>>> a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>>> b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>>> index 8ff1dfe..b473baf 100644
>>>>> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>>> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>>> @@ -1,4 +1,23 @@
>>>>> [
>>>>> +       {
>>>>> +               "MetricExpr": "5",
>>>>> +               "PublicDescription": "A pipeline slot represents the
>>>>> hardware resources needed to process one uOp",
>>>>> +               "BriefDescription": "A pipeline slot represents the
>>>>> hardware resources needed to process one uOp",
>>>>> +               "MetricName": "slot"
>>>>
>>>> Ehhh....I'm not sure if that is a good idea. Ian or anyone else have an
>>>> opinion on this? It is possible to reuse metrics, so it should work, but...
>>>>
>>>> One problem is that "slot" would show up as a metric, which you would
>>>> not want.
>>>>
>>>> Alternatively I was going to suggest that you can overwrite specific std
>>>> arch event attributes. So for example of frontend_bound, you could have:
>>>
>>> I would agree with not having this and just hard coding the 5 wherever
>>> it's needed. Once we have a few different sets of metrics in place maybe
>>> we can start to look at deduplication, but for now I don't see the value.
>>>
>>>>
>>>> + b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>> @@ -0,0 +1,30 @@
>>>> [
>>>>     {
>>>>         "ArchStdEvent": "FRONTEND_BOUND",
>>>>         "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)"
>>>>     },
>>>>
>>>>> +       }
>>>>> +       {
>>>>> +               "ArchStdEvent": "FRONTEND_BOUND"
>>>>> +       },
>>>>> +       {
>>>>> +               "ArchStdEvent": "BACKEND_BOUND"
>>>>> +       },
>>>>> +       {
>>>>> +               "ArchStdEvent": "WASTED"
>>>>> +       },
>>>>> +       {
>>>>> +               "ArchStdEvent": "RETIRING"
>>>>> +       },
>>>>>
>>>>>
>>>>> 2. Should I add the topdownL1 metric to
>>>>> tools/perf/pmu-event/recommended.json,
>>>>> or create a new json file to place the general metric?
>>>>
>>>> It would not belong in recommended.json as that is specifically for
>>>> arch-recommended events. It would really just depend on where the value
>>>> comes from, i.e. arm arm or sbsa.
>>>>
>>>
>>> For what we're going to publish shortly we'll be generating a
>>> metrics.json file for each CPU. It will be autogenerated so I don't
>>> think duplication will be an issue and I'm expecting that there will be
>>> differences in the topdown metrics between CPUs anyway. So I would also
>>> vote to not put it in recommended.json
>>>
>>
>> I will create a new sbsa.json file in tools/perf/pmu-events/arch/arm64/
>> to place metrics that may be common between some CPUs, just like arch_std_event.
> 
> Because this would apply to all CPUs rather than just N2, I still think
> it's best to wait for our metrics repo to be published. Otherwise Arm
> will start publishing metrics with names and group names for all future
> CPUs that have different names to the common ones added as part of this
> change.
> 
> It's something that we've been working on for quite a while and we've
> taken care to make sure that it applies to future products and is scalable.
> 
> It would be easier to add these right now only for N2, and then
> afterwards we can start to look at what is common and could be factored
> out into the top level folder.
> 
>> If the topdown metrics are different in other CPUs, we can overwrite the
>> metric expression.
> 
> True, but with different group names and metric names and units it could
> get slightly complicated.
> 
>>
>> For example:
>>
>> +++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
>> @@ -0,0 +1,9 @@
>> +[
>> +    {
>> +        "MetricExpr": "stall_slot_frontend / (slot * cpu_cycles)",
>> +        "PublicDescription": "Frontend bound L1 topdown metric",
>> +        "BriefDescription": "Frontend bound L1 topdown metric",
>> +        "MetricGroup": "TopDownL1",
>> +        "MetricName": "FRONTEND_BOUND"
>> +    }
>> +]
>>
>> + b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> @@ -0,0 +1,30 @@
>> +[
>> +    {
>> +        "ArchStdEvent": "FRONTEND_BOUND",
>> +        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)"
>> +    }
>> +]
>>
> 
> With the auto generation of metrics file I don't really see too much
> benefit of doing it this way.
> 
> You also run into the issue where if a platform happens to define all of
> the events required by a metric, will that metric appear automatically,
> even if it's not valid?
> 

OK, I agree to put the topdown metrics in the n2 metrics file rather than making
them arch_std_event entries. There is currently no unified formula for the topdown
metrics, and the slot count may differ from CPU to CPU.

Once the standard is published, please consider what John said and use the
general metrics as arch_std_event.
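
For reference, the override discussed earlier in this thread (a CPU-specific
entry that names an ArchStdEvent and replaces only MetricExpr) could be
modelled roughly as below. This is an illustrative Python sketch, not perf's
actual jevents code: resolve_entry and the lower-cased lookup key are
assumptions made for the example, and the JSON content is copied from the
sbsa.json / metrics.json proposal quoted above.

```python
def resolve_entry(cpu_entry: dict, std_entries: dict) -> dict:
    """Fill in attributes from the referenced standard entry.

    Attributes present in the CPU-specific entry win over the standard ones.
    """
    std_name = cpu_entry.get("ArchStdEvent")
    if std_name is None:
        # Nothing to inherit: the entry is already complete.
        return dict(cpu_entry)
    base = std_entries[std_name.lower()]  # hypothetical lookup convention
    resolved = {**base, **cpu_entry}
    del resolved["ArchStdEvent"]
    return resolved


# Common definition, as in the proposed sbsa.json.
std = {
    "frontend_bound": {
        "MetricExpr": "stall_slot_frontend / (slot * cpu_cycles)",
        "BriefDescription": "Frontend bound L1 topdown metric",
        "MetricGroup": "TopDownL1",
        "MetricName": "FRONTEND_BOUND",
    }
}

# neoverse-n2 entry that overrides only the expression.
n2 = {
    "ArchStdEvent": "FRONTEND_BOUND",
    "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)",
}

resolved = resolve_entry(n2, std)
print(resolved["MetricExpr"])
print(resolved["MetricGroup"])
```

The group name, metric name and description come from the common entry, which
is the part James expects to become awkward once names and units differ
between CPUs.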

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [External] : [RFC PATCH v2 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  2022-11-24 16:32                         ` Jing Zhang
@ 2022-11-24 16:51                           ` James Clark
  -1 siblings, 0 replies; 96+ messages in thread
From: James Clark @ 2022-11-24 16:51 UTC (permalink / raw)
  To: Jing Zhang, John Garry
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song, linux-arm-kernel,
	linux-perf-users, linux-kernel, Will Deacon, Mike Leach, Leo Yan,
	Ian Rogers, Nick Forrington



On 24/11/2022 16:32, Jing Zhang wrote:
> 
> 
> On 2022/11/23 at 10:26 PM, James Clark wrote:
>>
>>
>> On 22/11/2022 15:41, Jing Zhang wrote:
>>>
>>>
>>> On 2022/11/22 at 10:00 PM, James Clark wrote:
>>>>
>>>>
>>>> On 21/11/2022 17:55, John Garry wrote:
>>>>> On 21/11/2022 15:17, Jing Zhang wrote:
>>>>>> I'm sorry that I misunderstood the purpose of putting metric as
>>>>>> arch_std_event at first,
>>>>>> and now it works after the modification over your suggestion.
>>>>>>
>>>>>> But there are also a few questions:
>>>>>>
>>>>>> 1. The slot count used by the topdownL1 metrics varies between
>>>>>> microarchitectures; for example, it is 5 on neoverse-n2. If I make the
>>>>>> topdownL1 metrics arch_std_event entries, I need to set the slot count
>>>>>> to 5 for n2. I can define the slot value as a metric, as below, but is
>>>>>> there a more concise way to do this?
>>>>>>
>>>>>> diff --git
>>>>>> a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>>>> b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>>>> index 8ff1dfe..b473baf 100644
>>>>>> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>>>> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>>>> @@ -1,4 +1,23 @@
>>>>>> [
>>>>>> +       {
>>>>>> +               "MetricExpr": "5",
>>>>>> +               "PublicDescription": "A pipeline slot represents the
>>>>>> hardware resources needed to process one uOp",
>>>>>> +               "BriefDescription": "A pipeline slot represents the
>>>>>> hardware resources needed to process one uOp",
>>>>>> +               "MetricName": "slot"
>>>>>
>>>>> Ehhh....I'm not sure if that is a good idea. Ian or anyone else have an
>>>>> opinion on this? It is possible to reuse metrics, so it should work, but...
>>>>>
>>>>> One problem is that "slot" would show up as a metric, which you would
>>>>> not want.
>>>>>
>>>>> Alternatively I was going to suggest that you can overwrite specific std
>>>>> arch event attributes. So for example of frontend_bound, you could have:
>>>>
>>>> I would agree with not having this and just hard coding the 5 wherever
>>>> it's needed. Once we have a few different sets of metrics in place maybe
>>>> we can start to look at deduplication, but for now I don't see the value.
>>>>
>>>>>
>>>>> + b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>>> @@ -0,0 +1,30 @@
>>>>> [
>>>>>     {
>>>>>         "ArchStdEvent": "FRONTEND_BOUND",
>>>>>         "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)"
>>>>>     },
>>>>>
>>>>>> +       }
>>>>>> +       {
>>>>>> +               "ArchStdEvent": "FRONTEND_BOUND"
>>>>>> +       },
>>>>>> +       {
>>>>>> +               "ArchStdEvent": "BACKEND_BOUND"
>>>>>> +       },
>>>>>> +       {
>>>>>> +               "ArchStdEvent": "WASTED"
>>>>>> +       },
>>>>>> +       {
>>>>>> +               "ArchStdEvent": "RETIRING"
>>>>>> +       },
>>>>>>
>>>>>>
>>>>>> 2. Should I add the topdownL1 metric to
>>>>>> tools/perf/pmu-event/recommended.json,
>>>>>> or create a new json file to place the general metric?
>>>>>
>>>>> It would not belong in recommended.json as that is specifically for
>>>>> arch-recommended events. It would really just depend on where the value
>>>>> comes from, i.e. arm arm or sbsa.
>>>>>
>>>>
>>>> For what we're going to publish shortly we'll be generating a
>>>> metrics.json file for each CPU. It will be autogenerated so I don't
>>>> think duplication will be an issue and I'm expecting that there will be
>>>> differences in the topdown metrics between CPUs anyway. So I would also
>>>> vote to not put it in recommended.json
>>>>
>>>
>>> I will create a new sbsa.json file in tools/perf/pmu-events/arch/arm64/
>>> to place metrics that may be common between some CPUs, just like arch_std_event.
>>
>> Because this would apply to all CPUs rather than just N2, I still think
>> it's best to wait for our metrics repo to be published. Otherwise Arm
>> will start publishing metrics with names and group names for all future
>> CPUs that have different names to the common ones added as part of this
>> change.
>>
>> It's something that we've been working on for quite a while and we've
>> taken care to make sure that it applies to future products and is scalable.
>>
>> It would be easier to add these right now only for N2, and then
>> afterwards we can start to look at what is common and could be factored
>> out into the top level folder.
>>
>>> If the topdown metrics are different in other CPUs, we can overwrite the
>>> metric expression.
>>
>> True, but with different group names and metric names and units it could
>> get slightly complicated.
>>
>>>
>>> For example:
>>>
>>> +++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
>>> @@ -0,0 +1,9 @@
>>> +[
>>> +    {
>>> +        "MetricExpr": "stall_slot_frontend / (slot * cpu_cycles)",
>>> +        "PublicDescription": "Frontend bound L1 topdown metric",
>>> +        "BriefDescription": "Frontend bound L1 topdown metric",
>>> +        "MetricGroup": "TopDownL1",
>>> +        "MetricName": "FRONTEND_BOUND"
>>> +    }
>>> +]
>>>
>>> + b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>> @@ -0,0 +1,30 @@
>>> +[
>>> +    {
>>> +        "ArchStdEvent": "FRONTEND_BOUND",
>>> +        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)"
>>> +    }
>>> +]
>>>
>>
>> With the auto generation of metrics file I don't really see too much
>> benefit of doing it this way.
>>
>> You also run into the issue where if a platform happens to define all of
>> the events required by a metric, will that metric appear automatically,
>> even if it's not valid?
>>
> 
> OK, I agree to put the topdown metrics in the n2 metrics file rather than making
> them arch_std_event entries. There is currently no unified formula for the topdown
> metrics, and the slot count may differ from CPU to CPU.
> 
> Once the standard is published, please consider what John said and use the
> general metrics as arch_std_event.

Yep that sounds good, will do!


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v3 0/6] Add metrics for neoverse-n2
  2022-11-14  7:41   ` Jing Zhang
@ 2022-11-24 17:14     ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-24 17:14 UTC (permalink / raw)
  To: John Garry, Ian Rogers, Xing Zhengjun, Will Deacon, James Clark,
	Mike Leach, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song, Jing Zhang

Changes since v2:
- Correct the formulas of the Branch metrics;
- Add more PE utilization metrics;
- Add more TLB metrics;
- Add "ScaleUnit" for some metrics;
- Add a newline at the end of the file;
- Link: https://lore.kernel.org/all/1668411720-3581-1-git-send-email-renyu.zj@linux.alibaba.com/

Changes since v1:
- Correct the formulas for topdown L1, because stall_slot and
  stall_slot_frontend are miscounted;
- Link: https://lore.kernel.org/all/1667214694-89839-1-git-send-email-renyu.zj@linux.alibaba.com/

This series adds six metricgroups for neoverse-n2. The topdown L1 formulas
come from the Arm SBSA 7.0 platform design document [0], D37-38.

However, stall_slot and stall_slot_frontend are miscounted on neoverse-n2:
the real stall_slot and stall_slot_frontend values are obtained by
subtracting cpu_cycles, so the topdown metric expressions are corrected
accordingly. See the Arm neoverse-n2 errata notice [1], D117.
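
As a rough illustration of the correction, the frontend-bound expression used
in this thread, (stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles), can be
written out in Python as below. This is only a sketch: the function name and
the sample counter values are made up for the example, and 5 is the slot
count quoted for neoverse-n2 in the discussion.

```python
# Slots per cycle; 5 is the value quoted for neoverse-n2 in this thread.
N2_SLOTS = 5


def frontend_bound(stall_slot_frontend: int, cpu_cycles: int,
                   slots: int = N2_SLOTS) -> float:
    """Fraction of pipeline slots lost to frontend stalls.

    The raw stall_slot_frontend count is corrected by subtracting
    cpu_cycles before it is divided by the total number of slots.
    """
    if cpu_cycles <= 0:
        raise ValueError("cpu_cycles must be positive")
    real_stall_slot_frontend = stall_slot_frontend - cpu_cycles
    return real_stall_slot_frontend / (slots * cpu_cycles)


# Hypothetical counter values, chosen only to make the arithmetic easy:
# (3000 - 1000) / (5 * 1000) = 0.4
print(frontend_bound(stall_slot_frontend=3_000, cpu_cycles=1_000))
```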

Since neoverse-n2 does not yet support topdown L2, metricgroups such as Cache,
TLB, Branch, InstructionMix, and PEutilization are added to help analyse
performance bottlenecks further.
These are based on the Arm PMU guides [2][3].

[0] https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
[1] https://documentation-service.arm.com/static/636a66a64e6cf12278ad89cb?token=
[2] https://documentation-service.arm.com/static/628f8fa3dfaf015c2b76eae8?token=
[3] https://documentation-service.arm.com/static/62cfe21e31ea212bb6627393?token=

$./perf list

...
Metric Groups:

Branch:
  branch_miss_pred_rate
       [The rate of branches mis-predited to the overall branches]
  branch_mpki
       [The rate of branches mis-predicted per kilo instructions]
  branch_pki
       [The rate of branches retired per kilo instructions]
Cache:
  l1d_cache_miss_rate
       [The rate of L1 D-Cache misses to the overall L1 D-Cache]
  l1d_cache_mpki
       [The rate of L1 D-Cache misses per kilo instructions]
...

$sudo ./perf stat -M TLB false_sharing 2

 Performance counter stats for '/home/yaoxing/beetest/beecases/beeusers/spe/false_sharing 2':

            31,561      L2D_TLB                          #     18.8 %  l2_tlb_miss_rate      (43.23%)
             5,944      L2D_TLB_REFILL                                                       (43.23%)
             2,248      L1I_TLB_REFILL                   #      0.1 %  l1i_tlb_miss_rate     (43.85%)
         2,203,195      L1I_TLB                                                              (43.85%)
       328,647,380      L1D_TLB                          #      0.0 %  l1d_tlb_miss_rate     (44.32%)
            26,347      L1D_TLB_REFILL                                                       (44.32%)
           747,319      L1I_TLB                          #      0.0 %  itlb_walk_rate        (43.74%)
               310      ITLB_WALK                                                            (43.74%)
       839,420,454      INST_RETIRED                     #     0.00 itlb_mpki                (42.77%)
               212      ITLB_WALK                                                            (42.77%)
               468      DTLB_WALK                        #      0.0 %  dtlb_walk_rate        (42.28%)
       265,405,802      L1D_TLB                                                              (42.28%)
       790,874,367      INST_RETIRED                     #     0.00 dtlb_mpki                (42.33%)
                23      DTLB_WALK                                                            (42.33%)

       0.515904553 seconds time elapsed

       1.410313000 seconds user
       0.000000000 seconds sys
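
As a cross-check of the percentages in the TLB output above, the miss
rates can be recomputed from the raw counts. This is only a sketch: the
helper name miss_rate_percent is hypothetical and not part of perf, and
the counts are copied by hand from the listing.

```python
# Counter values copied from the `perf stat -M TLB` output above.
counts = {
    "L2D_TLB": 31_561,
    "L2D_TLB_REFILL": 5_944,
    "L1I_TLB": 2_203_195,
    "L1I_TLB_REFILL": 2_248,
    "L1D_TLB": 328_647_380,
    "L1D_TLB_REFILL": 26_347,
}


def miss_rate_percent(refills: int, lookups: int) -> float:
    """Hypothetical helper: refills / lookups, scaled to percent."""
    return 100.0 * refills / lookups


l2_tlb_miss_rate = miss_rate_percent(counts["L2D_TLB_REFILL"], counts["L2D_TLB"])
l1i_tlb_miss_rate = miss_rate_percent(counts["L1I_TLB_REFILL"], counts["L1I_TLB"])
l1d_tlb_miss_rate = miss_rate_percent(counts["L1D_TLB_REFILL"], counts["L1D_TLB"])

print(f"l2_tlb_miss_rate   {l2_tlb_miss_rate:.1f} %")
print(f"l1i_tlb_miss_rate  {l1i_tlb_miss_rate:.1f} %")
print(f"l1d_tlb_miss_rate  {l1d_tlb_miss_rate:.1f} %")
```

Each rate uses the refill and lookup counts from the same counter group,
since the two L1I_TLB readings in the listing come from different groups.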


$sudo ./perf stat -M TopDownL1 false_sharing 2

 Performance counter stats for '/home/yaoxing/beetest/beecases/beeusers/spe/false_sharing 2':

     4,310,905,590      cpu_cycles                       #      0.0 %  bad_speculation
                                                  #      4.0 %  retiring              (66.87%)
    25,009,763,735      stall_slot                                                           (66.87%)
       855,659,327      op_spec                                                              (66.87%)
       854,335,288      op_retired                                                           (66.87%)
     4,330,308,058      cpu_cycles                       #     27.1 %  frontend_bound        (66.99%)
    10,207,186,460      stall_slot_frontend                                                  (66.99%)
     4,316,583,673      cpu_cycles                       #     69.4 %  backend_bound         (66.65%)
    14,979,136,808      stall_slot_backend                                                   (66.65%)

       0.572056818 seconds time elapsed

       1.572143000 seconds user
       0.004010000 seconds sys
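
The TopDownL1 percentages above can be reproduced by hand from the raw
counts, including the cpu_cycles subtraction described in the errata
paragraph. The following is a minimal sketch, not perf code: the function
names are hypothetical, the slot count of 5 is taken from the MetricExpr
strings in patch 1/6, and each metric uses the cpu_cycles reading from its
own counter group in the listing.

```python
SLOTS_PER_CYCLE = 5  # constant used in the MetricExpr strings of patch 1/6


def real_stalls(raw_stall_slots: int, cpu_cycles: int) -> int:
    """Errata correction: the raw stall count minus cpu_cycles."""
    return raw_stall_slots - cpu_cycles


def frontend_bound(stall_slot_frontend: int, cpu_cycles: int) -> float:
    return real_stalls(stall_slot_frontend, cpu_cycles) / (SLOTS_PER_CYCLE * cpu_cycles)


def backend_bound(stall_slot_backend: int, cpu_cycles: int) -> float:
    # stall_slot_backend is used as counted; no correction is applied here.
    return stall_slot_backend / (SLOTS_PER_CYCLE * cpu_cycles)


def not_stalled(stall_slot: int, cpu_cycles: int) -> float:
    return 1 - real_stalls(stall_slot, cpu_cycles) / (SLOTS_PER_CYCLE * cpu_cycles)


def retiring(op_retired: int, op_spec: int, stall_slot: int, cpu_cycles: int) -> float:
    return (op_retired / op_spec) * not_stalled(stall_slot, cpu_cycles)


def bad_speculation(op_retired: int, op_spec: int, stall_slot: int, cpu_cycles: int) -> float:
    return (1 - op_retired / op_spec) * not_stalled(stall_slot, cpu_cycles)


# Counts copied from the `perf stat -M TopDownL1` output above.
fe = frontend_bound(10_207_186_460, 4_330_308_058)
be = backend_bound(14_979_136_808, 4_316_583_673)
ret = retiring(854_335_288, 855_659_327, 25_009_763_735, 4_310_905_590)
bad = bad_speculation(854_335_288, 855_659_327, 25_009_763_735, 4_310_905_590)

for name, value in [("frontend_bound", fe), ("backend_bound", be),
                    ("retiring", ret), ("bad_speculation", bad)]:
    print(f"{name:16s} {100 * value:5.1f} %")
```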


Jing Zhang (6):
  perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  perf vendor events arm64: Add TLB metrics for neoverse-n2
  perf vendor events arm64: Add cache metrics for neoverse-n2
  perf vendor events arm64: Add branch metrics for neoverse-n2
  perf vendor events arm64: Add PE utilization metrics for neoverse-n2
  perf vendor events arm64: Add instruction mix metrics for neoverse-n2

 .../arch/arm64/arm/neoverse-n2/metrics.json        | 310 +++++++++++++++++++++
 1 file changed, 310 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v3 1/6] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  2022-11-14  7:41   ` Jing Zhang
@ 2022-11-24 17:14     ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-24 17:14 UTC (permalink / raw)
  To: John Garry, Ian Rogers, Xing Zhengjun, Will Deacon, James Clark,
	Mike Leach, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song, Jing Zhang

The topdown L1 formulas for neoverse-n2 come from the ARM sbsa7.0 platform
design document [0], D37-38.

However, stall_slot and stall_slot_frontend are miscounted on neoverse-n2:
the real stall_slot and stall_slot_frontend values are the raw counts minus
cpu_cycles, so the topdown metric expressions are corrected accordingly.
See the ARM neoverse-n2 errata notice [1], D117.

Since neoverse-n2 does not yet support topdown L2, metricgroups such as
Cache, TLB, Branch, InstructionMix, and PEutilization will be added in the
following patches to help further analysis of performance bottlenecks.
See the ARM PMU guides [2][3].

[0] https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
[1] https://documentation-service.arm.com/static/636a66a64e6cf12278ad89cb?token=
[2] https://documentation-service.arm.com/static/628f8fa3dfaf015c2b76eae8?token=
[3] https://documentation-service.arm.com/static/62cfe21e31ea212bb6627393?token=

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 34 ++++++++++++++++++++++
 1 file changed, 34 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
new file mode 100644
index 0000000..8628140
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -0,0 +1,34 @@
+[
+    {
+        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)",
+        "PublicDescription": "Frontend bound L1 topdown metric",
+        "BriefDescription": "Frontend bound L1 topdown metric",
+        "MetricGroup": "TopdownL1",
+        "MetricName": "frontend_bound",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "(1 - op_retired / op_spec) * (1 - (stall_slot - cpu_cycles) / (5 * cpu_cycles))",
+        "PublicDescription": "Bad speculation L1 topdown metric",
+        "BriefDescription": "Bad speculation L1 topdown metric",
+        "MetricGroup": "TopdownL1",
+        "MetricName": "bad_speculation",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "(op_retired / op_spec) * (1 - (stall_slot - cpu_cycles) / (5 * cpu_cycles))",
+        "PublicDescription": "Retiring L1 topdown metric",
+        "BriefDescription": "Retiring L1 topdown metric",
+        "MetricGroup": "TopdownL1",
+        "MetricName": "retiring",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "stall_slot_backend / (5 * cpu_cycles)",
+        "PublicDescription": "Backend Bound L1 topdown metric",
+        "BriefDescription": "Backend Bound L1 topdown metric",
+        "MetricGroup": "TopdownL1",
+        "MetricName": "backend_bound",
+        "ScaleUnit": "100%"
+    }
+]
-- 
1.8.3.1
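
For anyone extending metrics.json by hand, a quick structural check can
catch a missing key or a duplicated MetricName before building perf. The
sketch below is not part of the perf build: check_metrics and the
REQUIRED_KEYS set are assumptions made for illustration, and the sample
text reuses two entries from the patch above.

```python
import json

# Assumption for this sketch: every entry should carry at least these keys.
REQUIRED_KEYS = {"MetricExpr", "BriefDescription", "MetricGroup", "MetricName"}

SAMPLE = """
[
    {
        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)",
        "PublicDescription": "Frontend bound L1 topdown metric",
        "BriefDescription": "Frontend bound L1 topdown metric",
        "MetricGroup": "TopdownL1",
        "MetricName": "frontend_bound",
        "ScaleUnit": "100%"
    },
    {
        "MetricExpr": "stall_slot_backend / (5 * cpu_cycles)",
        "PublicDescription": "Backend Bound L1 topdown metric",
        "BriefDescription": "Backend Bound L1 topdown metric",
        "MetricGroup": "TopdownL1",
        "MetricName": "backend_bound",
        "ScaleUnit": "100%"
    }
]
"""


def check_metrics(text: str) -> list:
    """Hypothetical helper: return a list of problems found in the JSON text."""
    problems = []
    seen_names = set()
    for index, entry in enumerate(json.loads(text)):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {index}: missing {sorted(missing)}")
        name = entry.get("MetricName")
        if name in seen_names:
            problems.append(f"entry {index}: duplicate MetricName {name!r}")
        seen_names.add(name)
    return problems


print(check_metrics(SAMPLE) or "no problems found")
```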


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v3 2/6] perf vendor events arm64: Add TLB metrics for neoverse-n2
  2022-11-14  7:41   ` Jing Zhang
@ 2022-11-24 17:14     ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-24 17:14 UTC (permalink / raw)
  To: John Garry, Ian Rogers, Xing Zhengjun, Will Deacon, James Clark,
	Mike Leach, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song, Jing Zhang

Add TLB related metrics.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 54 ++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index 8628140..bb19960 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -30,5 +30,59 @@
         "MetricGroup": "TopdownL1",
         "MetricName": "backend_bound",
         "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L1D_TLB_REFILL / L1D_TLB",
+        "PublicDescription": "The rate of L1D TLB refill to the overall L1D TLB lookups",
+        "BriefDescription": "L1 data TLB miss rate",
+        "MetricGroup": "TLB",
+        "MetricName": "l1d_tlb_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L1I_TLB_REFILL / L1I_TLB",
+        "PublicDescription": "The rate of L1I TLB refill to the overall L1I TLB lookups",
+        "BriefDescription": "L1 instruction TLB miss rate",
+        "MetricGroup": "TLB",
+        "MetricName": "l1i_tlb_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L2D_TLB_REFILL / L2D_TLB",
+        "PublicDescription": "The rate of L2D TLB refill to the overall L2D TLB lookups",
+        "BriefDescription": "L2 TLB miss rate",
+        "MetricGroup": "TLB",
+        "MetricName": "l2_tlb_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "DTLB_WALK / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of TLB Walks per kilo instructions for data accesses",
+        "BriefDescription": "The rate of TLB Walks per kilo instructions for data accesses",
+        "MetricGroup": "TLB",
+        "MetricName": "dtlb_mpki"
+    },
+    {
+        "MetricExpr": "DTLB_WALK / L1D_TLB",
+        "PublicDescription": "The rate of DTLB Walks to the overall L1D TLB lookups",
+        "BriefDescription": "D-side page table walk rate",
+        "MetricGroup": "TLB",
+        "MetricName": "dtlb_walk_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "ITLB_WALK / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of TLB Walks per kilo instructions for instruction accesses",
+        "BriefDescription": "The rate of TLB Walks per kilo instructions for instruction accesses",
+        "MetricGroup": "TLB",
+        "MetricName": "itlb_mpki"
+    },
+    {
+        "MetricExpr": "ITLB_WALK / L1I_TLB",
+        "PublicDescription": "The rate of ITLB Walks to the overall L1I TLB lookups",
+        "BriefDescription": "I-side page table walk rate",
+        "MetricGroup": "TLB",
+        "MetricName": "itlb_walk_rate",
+        "ScaleUnit": "100%"
     }
 ]
-- 
1.8.3.1
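
The patch above defines two kinds of walk metrics: *_mpki divides walks by
retired instructions and scales by 1000, while *_walk_rate divides walks
by L1 TLB lookups and is shown as a percentage. A small sketch of both,
using the counts from the cover letter's `perf stat -M TLB` output; the
helper names per_kilo_instructions and rate_percent are hypothetical.

```python
def per_kilo_instructions(events: int, inst_retired: int) -> float:
    """Events per 1000 retired instructions (the *_mpki metrics)."""
    return events / inst_retired * 1000


def rate_percent(events: int, lookups: int) -> float:
    """Events per lookup, scaled to percent (the *_walk_rate metrics)."""
    return 100.0 * events / lookups


# Counts copied from the cover letter; each pair comes from one counter group.
itlb_mpki = per_kilo_instructions(212, 839_420_454)
dtlb_mpki = per_kilo_instructions(23, 790_874_367)
itlb_walk_rate = rate_percent(310, 747_319)
dtlb_walk_rate = rate_percent(468, 265_405_802)

print(f"itlb_mpki       {itlb_mpki:.2f}")
print(f"dtlb_mpki       {dtlb_mpki:.2f}")
print(f"itlb_walk_rate  {itlb_walk_rate:.1f} %")
print(f"dtlb_walk_rate  {dtlb_walk_rate:.1f} %")
```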


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v3 3/6] perf vendor events arm64: Add cache metrics for neoverse-n2
  2022-11-14  7:41   ` Jing Zhang
@ 2022-11-24 17:14     ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-24 17:14 UTC (permalink / raw)
  To: John Garry, Ian Rogers, Xing Zhengjun, Will Deacon, James Clark,
	Mike Leach, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song, Jing Zhang

Add cache related metrics.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 83 ++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index bb19960..20b5ad1 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -84,5 +84,88 @@
         "MetricGroup": "TLB",
         "MetricName": "itlb_walk_rate",
         "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L1I_CACHE_REFILL / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of L1 I-Cache misses per kilo instructions",
+        "BriefDescription": "The rate of L1 I-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l1i_cache_mpki"
+    },
+    {
+        "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE",
+        "PublicDescription": "The rate of L1 I-Cache misses to the overall L1 I-Cache",
+        "BriefDescription": "The rate of L1 I-Cache misses to the overall L1 I-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l1i_cache_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L1D_CACHE_REFILL / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of L1 D-Cache misses per kilo instructions",
+        "BriefDescription": "The rate of L1 D-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l1d_cache_mpki"
+    },
+    {
+        "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
+        "PublicDescription": "The rate of L1 D-Cache misses to the overall L1 D-Cache",
+        "BriefDescription": "The rate of L1 D-Cache misses to the overall L1 D-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l1d_cache_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L2D_CACHE_REFILL / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of L2 D-Cache misses per kilo instructions",
+        "BriefDescription": "The rate of L2 D-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l2d_cache_mpki"
+    },
+    {
+        "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE",
+        "PublicDescription": "The rate of L2 D-Cache misses to the overall L2 D-Cache",
+        "BriefDescription": "The rate of L2 D-Cache misses to the overall L2 D-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l2d_cache_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L3D_CACHE_REFILL / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of L3 D-Cache misses per kilo instructions",
+        "BriefDescription": "The rate of L3 D-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l3d_cache_mpki"
+    },
+    {
+        "MetricExpr": "L3D_CACHE_REFILL / L3D_CACHE",
+        "PublicDescription": "The rate of L3 D-Cache misses to the overall L3 D-Cache",
+        "BriefDescription": "The rate of L3 D-Cache misses to the overall L3 D-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l3d_cache_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "LL_CACHE_MISS_RD / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of LL Cache read misses per kilo instructions",
+        "BriefDescription": "The rate of LL Cache read misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "ll_cache_read_mpki"
+    },
+    {
+        "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD",
+        "PublicDescription": "The rate of LL Cache read misses to the overall LL Cache read",
+        "BriefDescription": "The rate of LL Cache read misses to the overall LL Cache read",
+        "MetricGroup": "Cache",
+        "MetricName": "ll_cache_read_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD",
+        "PublicDescription": "The rate of LL Cache read hit to the overall LL Cache read",
+        "BriefDescription": "The rate of LL Cache read hit to the overall LL Cache read",
+        "MetricGroup": "Cache",
+        "MetricName": "ll_cache_read_hit_rate",
+        "ScaleUnit": "100%"
     }
 ]
-- 
1.8.3.1
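
The last two LL cache metrics in the patch above are complements:
ll_cache_read_hit_rate is 1 - ll_cache_read_miss_rate by construction.
The sketch below illustrates this with made-up counts (they are not
measurements from neoverse-n2); the function name and the zero-lookup
check are choices of this sketch, not perf behaviour.

```python
def ll_cache_read_metrics(ll_cache_rd: int, ll_cache_miss_rd: int, inst_retired: int) -> dict:
    """Hypothetical helper mirroring the three LL cache MetricExpr strings."""
    if ll_cache_rd == 0 or inst_retired == 0:
        raise ValueError("counts must be non-zero to form a ratio")
    return {
        "ll_cache_read_mpki": ll_cache_miss_rd / inst_retired * 1000,
        "ll_cache_read_miss_rate": ll_cache_miss_rd / ll_cache_rd,
        "ll_cache_read_hit_rate": (ll_cache_rd - ll_cache_miss_rd) / ll_cache_rd,
    }


# Illustrative counts only.
metrics = ll_cache_read_metrics(ll_cache_rd=1_000, ll_cache_miss_rd=250,
                                inst_retired=1_000_000)

for name, value in metrics.items():
    print(f"{name:24s} {value}")
```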


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v3 3/6] perf vendor events arm64: Add cache metrics for neoverse-n2
@ 2022-11-24 17:14     ` Jing Zhang
  0 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-24 17:14 UTC (permalink / raw)
  To: John Garry, Ian Rogers, Xing Zhengjun, Will Deacon, James Clark,
	Mike Leach, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song, Jing Zhang

Add cache related metrics.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 83 ++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index bb19960..20b5ad1 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -84,5 +84,88 @@
         "MetricGroup": "TLB",
         "MetricName": "itlb_walk_rate",
         "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L1I_CACHE_REFILL / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of L1 I-Cache misses per kilo instructions",
+        "BriefDescription": "The rate of L1 I-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l1i_cache_mpki"
+    },
+    {
+        "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE",
+        "PublicDescription": "The rate of L1 I-Cache misses to the overall L1 I-Cache",
+        "BriefDescription": "The rate of L1 I-Cache misses to the overall L1 I-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l1i_cache_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L1D_CACHE_REFILL / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of L1 D-Cache misses per kilo instructions",
+        "BriefDescription": "The rate of L1 D-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l1d_cache_mpki"
+    },
+    {
+        "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
+        "PublicDescription": "The rate of L1 D-Cache misses to the overall L1 D-Cache",
+        "BriefDescription": "The rate of L1 D-Cache misses to the overall L1 D-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l1d_cache_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L2D_CACHE_REFILL / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of L2 D-Cache misses per kilo instructions",
+        "BriefDescription": "The rate of L2 D-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l2d_cache_mpki"
+    },
+    {
+        "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE",
+        "PublicDescription": "The rate of L2 D-Cache misses to the overall L2 D-Cache",
+        "BriefDescription": "The rate of L2 D-Cache misses to the overall L2 D-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l2d_cache_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L3D_CACHE_REFILL / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of L3 D-Cache misses per kilo instructions",
+        "BriefDescription": "The rate of L3 D-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l3d_cache_mpki"
+    },
+    {
+        "MetricExpr": "L3D_CACHE_REFILL / L3D_CACHE",
+        "PublicDescription": "The rate of L3 D-Cache misses to the overall L3 D-Cache",
+        "BriefDescription": "The rate of L3 D-Cache misses to the overall L3 D-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l3d_cache_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "LL_CACHE_MISS_RD / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of LL Cache read misses per kilo instructions",
+        "BriefDescription": "The rate of LL Cache read misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "ll_cache_read_mpki"
+    },
+    {
+        "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD",
+        "PublicDescription": "The rate of LL Cache read misses to the overall LL Cache read",
+        "BriefDescription": "The rate of LL Cache read misses to the overall LL Cache read",
+        "MetricGroup": "Cache",
+        "MetricName": "ll_cache_read_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD",
+        "PublicDescription": "The rate of LL Cache read hit to the overall LL Cache read",
+        "BriefDescription": "The rate of LL Cache read hit to the overall LL Cache read",
+        "MetricGroup": "Cache",
+        "MetricName": "ll_cache_read_hit_rate",
+        "ScaleUnit": "100%"
     }
 ]
-- 
1.8.3.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v3 4/6] perf vendor events arm64: Add branch metrics for neoverse-n2
  2022-11-14  7:41   ` Jing Zhang
@ 2022-11-24 17:14     ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-24 17:14 UTC (permalink / raw)
  To: John Garry, Ian Rogers, Xing Zhengjun, Will Deacon, James Clark,
	Mike Leach, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song, Jing Zhang

Add branch related metrics.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index 20b5ad1..23c7d62 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -167,5 +167,27 @@
         "MetricGroup": "Cache",
         "MetricName": "ll_cache_read_hit_rate",
         "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_MIS_PRED_RETIRED / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of branches mis-predicted per kilo instructions",
+        "BriefDescription": "The rate of branches mis-predicted per kilo instructions",
+        "MetricGroup": "Branch",
+        "MetricName": "branch_mpki"
+    },
+    {
+        "MetricExpr": "BR_RETIRED / INST_RETIRED * 1000",
+        "PublicDescription": "The rate of branches retired per kilo instructions",
+        "BriefDescription": "The rate of branches retired per kilo instructions",
+        "MetricGroup": "Branch",
+        "MetricName": "branch_pki"
+    },
+    {
+        "MetricExpr": "BR_MIS_PRED_RETIRED / BR_RETIRED",
+        "PublicDescription": "The rate of branches mis-predicted to the overall branches",
+        "BriefDescription": "The rate of branches mis-predicted to the overall branches",
+        "MetricGroup": "Branch",
+        "MetricName": "branch_miss_pred_rate",
+        "ScaleUnit": "100%"
     }
 ]
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v3 5/6] perf vendor events arm64: Add PE utilization metrics for neoverse-n2
  2022-11-14  7:41   ` Jing Zhang
@ 2022-11-24 17:14     ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-24 17:14 UTC (permalink / raw)
  To: John Garry, Ian Rogers, Xing Zhengjun, Will Deacon, James Clark,
	Mike Leach, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song, Jing Zhang

Add PE utilization related metrics.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 45 ++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index 23c7d62..7b54819 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -189,5 +189,50 @@
         "MetricGroup": "Branch",
         "MetricName": "branch_miss_pred_rate",
         "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "instructions / CPU_CYCLES",
+        "PublicDescription": "The average number of instructions executed for each cycle.",
+        "BriefDescription": "Instructions per cycle",
+        "MetricGroup": "PEutilization",
+        "MetricName": "ipc"
+    },
+    {
+        "MetricExpr": "INST_RETIRED / CPU_CYCLES",
+        "PublicDescription": "Architecturally executed Instructions Per Cycle (IPC)",
+        "BriefDescription": "Architecturally executed Instructions Per Cycle (IPC)",
+        "MetricGroup": "PEutilization",
+        "MetricName": "retired_ipc"
+    },
+    {
+        "MetricExpr": "INST_SPEC / CPU_CYCLES",
+        "PublicDescription": "Speculatively executed Instructions Per Cycle (IPC)",
+        "BriefDescription": "Speculatively executed Instructions Per Cycle (IPC)",
+        "MetricGroup": "PEutilization",
+        "MetricName": "spec_ipc"
+    },
+    {
+        "MetricExpr": "OP_RETIRED / OP_SPEC",
+        "PublicDescription": "Fraction of operations retired",
+        "BriefDescription": "Fraction of operations retired",
+        "MetricGroup": "PEutilization",
+        "MetricName": "retired_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "1 - OP_RETIRED / OP_SPEC",
+        "PublicDescription": "Fraction of operations wasted",
+        "BriefDescription": "Fraction of operations wasted",
+        "MetricGroup": "PEutilization",
+        "MetricName": "wasted_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "OP_RETIRED / OP_SPEC * (1 - (STALL_SLOT - CPU_CYCLES) / (CPU_CYCLES * 5))",
+        "PublicDescription": "Utilization of CPU",
+        "BriefDescription": "Utilization of CPU",
+        "MetricGroup": "PEutilization",
+        "MetricName": "cpu_utilization",
+        "ScaleUnit": "100%"
     }
 ]
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v3 6/6] perf vendor events arm64: Add instruction mix metrics for neoverse-n2
  2022-11-14  7:41   ` Jing Zhang
@ 2022-11-24 17:14     ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-11-24 17:14 UTC (permalink / raw)
  To: John Garry, Ian Rogers, Xing Zhengjun, Will Deacon, James Clark,
	Mike Leach, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song, Jing Zhang

Add instruction mix related metrics.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 72 ++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index 7b54819..20d46be 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -234,5 +234,77 @@
         "MetricGroup": "PEutilization",
         "MetricName": "cpu_utilization",
         "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "LD_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of load instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of load instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "load_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "ST_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of store instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of store instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "store_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "DP_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of integer data-processing instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of integer data-processing instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "data_process_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "ASE_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of advanced SIMD instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of advanced SIMD instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "advanced_simd_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "VFP_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of floating point instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of floating point instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "float_point_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "CRYPTO_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of crypto instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of crypto instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "crypto_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_IMMED_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of branch immediate instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of branch immediate instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "branch_immed_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_RETURN_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of procedure return instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of procedure return instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "branch_return_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_INDIRECT_SPEC / INST_SPEC",
+        "PublicDescription": "The rate of indirect branch instructions speculatively executed to overall instructions speculatively executed",
+        "BriefDescription": "The rate of indirect branch instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "branch_indirect_spec_rate",
+        "ScaleUnit": "100%"
     }
 ]
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* Re: [PATCH v3 5/6] perf vendor events arm64: Add PE utilization metrics for neoverse-n2
  2022-11-24 17:14     ` Jing Zhang
@ 2022-11-30 18:58       ` Ian Rogers
  -1 siblings, 0 replies; 96+ messages in thread
From: Ian Rogers @ 2022-11-30 18:58 UTC (permalink / raw)
  To: Jing Zhang
  Cc: John Garry, Xing Zhengjun, Will Deacon, James Clark, Mike Leach,
	Leo Yan, linux-arm-kernel, linux-perf-users, linux-kernel,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song

On Thu, Nov 24, 2022 at 9:15 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>
> Add PE utilization related metrics.
>
> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
> ---
>  .../arch/arm64/arm/neoverse-n2/metrics.json        | 45 ++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> index 23c7d62..7b54819 100644
> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> @@ -189,5 +189,50 @@
>          "MetricGroup": "Branch",
>          "MetricName": "branch_miss_pred_rate",
>          "ScaleUnit": "100%"
> +    },
> +    {
> +        "MetricExpr": "instructions / CPU_CYCLES",
> +        "PublicDescription": "The average number of instructions executed for each cycle.",
> +        "BriefDescription": "Instructions per cycle",
> +        "MetricGroup": "PEutilization",
> +        "MetricName": "ipc"
> +    },

A related useful metric is percentage of peak, so if the peak IPC is 8
(usually a constant related to the number of functional units) then
you can just compute the ratio of IPC with this.
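
As a rough sketch of that ratio (Python, with made-up counter values; the
peak of 8 is only the example figure used above and is not a confirmed
number for neoverse-n2, and percent_of_peak is a hypothetical helper name):

```python
def percent_of_peak(inst_retired: int, cpu_cycles: int, peak_ipc: float) -> float:
    """Return measured IPC as a fraction of an assumed peak IPC."""
    if cpu_cycles <= 0 or peak_ipc <= 0:
        raise ValueError("cpu_cycles and peak_ipc must be positive")
    ipc = inst_retired / cpu_cycles  # same ratio as the retired_ipc metric
    return ipc / peak_ipc


# Hypothetical counts, not measured on real hardware.
fraction = percent_of_peak(inst_retired=2_000_000, cpu_cycles=1_000_000, peak_ipc=8)
print(f"{fraction:.1%}")  # 25.0%
```

In metrics.json terms this would presumably be the existing IPC expression
divided by a per-core constant, with "ScaleUnit": "100%".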

> +    {
> +        "MetricExpr": "INST_RETIRED / CPU_CYCLES",
> +        "PublicDescription": "Architecturally executed Instructions Per Cycle (IPC)",
> +        "BriefDescription": "Architecturally executed Instructions Per Cycle (IPC)",


The duplicated descriptions are unnecessary. Drop the public one for
consistency with what we do for Intel:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py#L299

> +        "MetricGroup": "PEutilization",
> +        "MetricName": "retired_ipc"
> +    },
> +    {
> +        "MetricExpr": "INST_SPEC / CPU_CYCLES",
> +        "PublicDescription": "Speculatively executed Instructions Per Cycle (IPC)",
> +        "BriefDescription": "Speculatively executed Instructions Per Cycle (IPC)",
> +        "MetricGroup": "PEutilization",
> +        "MetricName": "spec_ipc"
> +    },
> +    {
> +        "MetricExpr": "OP_RETIRED / OP_SPEC",
> +        "PublicDescription": "Fraction of operations retired",
> +        "BriefDescription": "Fraction of operations retired",

Would instructions be clearer than operations here?

> +        "MetricGroup": "PEutilization",
> +        "MetricName": "retired_rate",
> +        "ScaleUnit": "100%"
> +    },
> +    {
> +        "MetricExpr": "1 - OP_RETIRED / OP_SPEC",

Should OP_RETIRED be greater than OP_SPEC? In which case won't this
metric be negative?
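
To make the concern concrete, here is the arithmetic in Python with
hypothetical counts. Whether OP_RETIRED can actually exceed OP_SPEC on this
core is exactly the open question; the sketch only shows what the expression
does in each case:

```python
def wasted_rate(op_retired: int, op_spec: int) -> float:
    """Evaluate the proposed expression: 1 - OP_RETIRED / OP_SPEC."""
    if op_spec <= 0:
        raise ValueError("op_spec must be positive")
    return 1 - op_retired / op_spec


# Hypothetical counts: fewer operations retired than speculated.
print(wasted_rate(op_retired=750, op_spec=1000))   # 0.25

# Hypothetical counts: more operations retired than speculated.
print(wasted_rate(op_retired=1250, op_spec=1000))  # -0.25
```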

> +        "PublicDescription": "Fraction of operations wasted",
> +        "BriefDescription": "Fraction of operations wasted",
> +        "MetricGroup": "PEutilization",
> +        "MetricName": "wasted_rate",
> +        "ScaleUnit": "100%"
> +    },
> +    {
> +        "MetricExpr": "OP_RETIRED / OP_SPEC * (1 - (STALL_SLOT - CPU_CYCLES) / (CPU_CYCLES * 5))",
> +        "PublicDescription": "Utilization of CPU",
> +        "BriefDescription": "Utilization of CPU",

Some more detail in the description would be useful.
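
For reference while wording that description, a small Python sketch that
evaluates the expression term by term. The counter values are hypothetical,
and reading the constant 5 as "slots per cycle" is an assumption, since the
patch does not say what it stands for:

```python
def cpu_utilization(op_retired: int, op_spec: int, stall_slot: int,
                    cpu_cycles: int, slots_per_cycle: int = 5) -> float:
    """Evaluate OP_RETIRED / OP_SPEC * (1 - (STALL_SLOT - CPU_CYCLES) / (CPU_CYCLES * 5))."""
    if op_spec <= 0 or cpu_cycles <= 0:
        raise ValueError("op_spec and cpu_cycles must be positive")
    retired_fraction = op_retired / op_spec
    # slots_per_cycle mirrors the literal 5 in the MetricExpr (assumed meaning).
    stalled_fraction = (stall_slot - cpu_cycles) / (cpu_cycles * slots_per_cycle)
    return retired_fraction * (1 - stalled_fraction)


# Hypothetical counts, chosen so each intermediate term is easy to follow:
# retired_fraction = 0.75, stalled_fraction = 0.5, result = 0.375
print(cpu_utilization(op_retired=750, op_spec=1000,
                      stall_slot=3500, cpu_cycles=1000))
```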

> +        "MetricGroup": "PEutilization",
> +        "MetricName": "cpu_utilization",
> +        "ScaleUnit": "100%"
>      }
>  ]
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v3 5/6] perf vendor events arm64: Add PE utilization metrics for neoverse-n2
  2022-11-30 18:58       ` Ian Rogers
@ 2022-12-01 11:08         ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-12-01 11:08 UTC (permalink / raw)
  To: Ian Rogers
  Cc: John Garry, Xing Zhengjun, Will Deacon, James Clark, Mike Leach,
	Leo Yan, linux-arm-kernel, linux-perf-users, linux-kernel,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song



On 2022/12/1 at 2:58 AM, Ian Rogers wrote:
> On Thu, Nov 24, 2022 at 9:15 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>>
>> Add PE utilization related metrics.
>>
>> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
>> ---
>>  .../arch/arm64/arm/neoverse-n2/metrics.json        | 45 ++++++++++++++++++++++
>>  1 file changed, 45 insertions(+)
>>
>> diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> index 23c7d62..7b54819 100644
>> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> @@ -189,5 +189,50 @@
>>          "MetricGroup": "Branch",
>>          "MetricName": "branch_miss_pred_rate",
>>          "ScaleUnit": "100%"
>> +    },
>> +    {
>> +        "MetricExpr": "instructions / CPU_CYCLES",
>> +        "PublicDescription": "The average number of instructions executed for each cycle.",
>> +        "BriefDescription": "Instructions per cycle",
>> +        "MetricGroup": "PEutilization",
>> +        "MetricName": "ipc"
>> +    },
> 
> A related useful metric is percentage of peak, so if the peak IPC is 8
> (usually a constant related to the number of functional units) then
> you can just compute the ratio of IPC with this.
> 

Glad to discuss these with you.
The peak ipc value of neoverse-n2 is 5. Maybe I should add an ipc_rate metric?
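
A minimal sketch of what such an ipc_rate metric would compute, written in
Python rather than as a MetricExpr. The function names and the sample counter
values are hypothetical; only the peak IPC of 5 comes from the statement above.

```python
# Peak IPC of 5 for neoverse-n2 is the value stated in this thread.
PEAK_IPC = 5.0


def ipc(inst_retired: int, cpu_cycles: int) -> float:
    """Retired instructions per cycle; 0.0 if no cycles were counted."""
    return inst_retired / cpu_cycles if cpu_cycles else 0.0


def ipc_rate(inst_retired: int, cpu_cycles: int, peak_ipc: float = PEAK_IPC) -> float:
    """IPC as a fraction of the peak IPC (multiply by 100 for a percentage)."""
    return ipc(inst_retired, cpu_cycles) / peak_ipc


# Made-up counter values, for illustration only.
sample_rate = ipc_rate(2_000_000, 1_000_000)
print(f"ipc_rate={sample_rate:.0%}")
```

For the sample values the retired IPC is 2.0, so ipc_rate is 0.4, i.e. 40% of peak.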

>> +    {
>> +        "MetricExpr": "INST_RETIRED / CPU_CYCLES",
>> +        "PublicDescription": "Architecturally executed Instructions Per Cycle (IPC)",
>> +        "BriefDescription": "Architecturally executed Instructions Per Cycle (IPC)",
> 
> 
> The duplicated descriptions are unnecessary. Drop the public one for
> consistency with what we do for Intel:
> https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py#L299
> 

Sounds good, will do.

>> +        "MetricGroup": "PEutilization",
>> +        "MetricName": "retired_ipc"
>> +    },
>> +    {
>> +        "MetricExpr": "INST_SPEC / CPU_CYCLES",
>> +        "PublicDescription": "Speculatively executed Instructions Per Cycle (IPC)",
>> +        "BriefDescription": "Speculatively executed Instructions Per Cycle (IPC)",
>> +        "MetricGroup": "PEutilization",
>> +        "MetricName": "spec_ipc"
>> +    },
>> +    {
>> +        "MetricExpr": "OP_RETIRED / OP_SPEC",
>> +        "PublicDescription": "Fraction of operations retired",
>> +        "BriefDescription": "Fraction of operations retired",
> 
> Would instructions be clearer than operations here?
> 

Operations and instructions are different. OP_RETIRED counts any operation (not instruction)
that has been architecturally executed. For example, speculatively executed operations that
are abandoned after a branch mispredict are not counted. So I think "operation" is
more accurate.

>> +        "MetricGroup": "PEutilization",
>> +        "MetricName": "retired_rate",
>> +        "ScaleUnit": "100%"
>> +    },
>> +    {
>> +        "MetricExpr": "1 - OP_RETIRED / OP_SPEC",
> 
> Should OP_RETIRED be greater than OP_SPEC? In which case won't this
> metric be negative?
> 

OP_RETIRED will not be greater than OP_SPEC. OP_SPEC counts any operation that has been
speculatively executed, so OP_SPEC is a superset of the OP_RETIRED event. Both events are
described in this neoverse-n2 document.
Link: https://documentation-service.arm.com/static/62cfe21e31ea212bb6627393?token=
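
To illustrate why wasted_rate stays non-negative, here is a small Python sketch
of both ratios. It assumes OP_RETIRED <= OP_SPEC as described above; the
function names and counter values are made up for illustration.

```python
def retired_rate(op_retired: int, op_spec: int) -> float:
    """Fraction of speculatively executed operations that were retired."""
    if op_spec == 0:
        return 0.0  # no operations counted; avoid dividing by zero
    return op_retired / op_spec


def wasted_rate(op_retired: int, op_spec: int) -> float:
    """Fraction of speculatively executed operations that were abandoned."""
    if op_spec == 0:
        return 0.0
    return 1.0 - op_retired / op_spec


# Hypothetical counts: 900 of 1000 speculated operations were retired.
retired = retired_rate(900, 1000)
wasted = wasted_rate(900, 1000)
print(f"retired_rate={retired:.1%} wasted_rate={wasted:.1%}")
```

As long as op_retired does not exceed op_spec, both values stay between 0 and 1
and sum to 1.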

>> +        "PublicDescription": "Fraction of operations wasted",
>> +        "BriefDescription": "Fraction of operations wasted",
>> +        "MetricGroup": "PEutilization",
>> +        "MetricName": "wasted_rate",
>> +        "ScaleUnit": "100%"
>> +    },
>> +    {
>> +        "MetricExpr": "OP_RETIRED / OP_SPEC * (1 - (STALL_SLOT - CPU_CYCLES) / (CPU_CYCLES * 5))",
>> +        "PublicDescription": "Utilization of CPU",
>> +        "BriefDescription": "Utilization of CPU",
> 
> Some more detail in the description would be useful.
> 

Ok, I'll describe it in more detail. cpu_utilization reflects the fraction of operations executed
by the CPU that are truly effective, which means that operations lost to misprediction and slots
lost to stalls are excluded. Note that subtracting cpu_cycles from stall_slot corrects for an
error in the stall_slot count.
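
A worked sketch of the cpu_utilization expression in Python, with made-up
counter values. The function and variable names are hypothetical, and reading
the constant 5 as the number of slots per cycle is an assumption.

```python
# The constant 5 is taken from the MetricExpr; treating it as
# "slots per cycle" is an assumption made for this sketch.
SLOTS_PER_CYCLE = 5


def cpu_utilization(op_retired: int, op_spec: int, stall_slot: int,
                    cpu_cycles: int, slots_per_cycle: int = SLOTS_PER_CYCLE) -> float:
    """OP_RETIRED / OP_SPEC * (1 - (STALL_SLOT - CPU_CYCLES) / (CPU_CYCLES * 5))."""
    if op_spec == 0 or cpu_cycles == 0:
        return 0.0  # nothing counted; avoid dividing by zero
    retired_fraction = op_retired / op_spec
    # stall_slot minus cpu_cycles is the correction mentioned above.
    corrected_stall_slots = stall_slot - cpu_cycles
    stalled_fraction = corrected_stall_slots / (cpu_cycles * slots_per_cycle)
    return retired_fraction * (1.0 - stalled_fraction)


# Hypothetical counts: 90% of operations retired, 3M stall slots over 1M cycles.
utilization = cpu_utilization(900, 1000, 3_000_000, 1_000_000)
print(f"cpu_utilization={utilization:.1%}")
```

With these values the corrected stall fraction is (3M - 1M) / 5M = 0.4, so the
result is 0.9 * 0.6 = 0.54.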

>> +        "MetricGroup": "PEutilization",
>> +        "MetricName": "cpu_utilization",
>> +        "ScaleUnit": "100%"
>>      }
>>  ]
>> --
>> 1.8.3.1
>>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v3 5/6] perf vendor events arm64: Add PE utilization metrics for neoverse-n2
  2022-12-01 11:08         ` Jing Zhang
@ 2022-12-02 20:05           ` Ian Rogers
  -1 siblings, 0 replies; 96+ messages in thread
From: Ian Rogers @ 2022-12-02 20:05 UTC (permalink / raw)
  To: Jing Zhang
  Cc: John Garry, Xing Zhengjun, Will Deacon, James Clark, Mike Leach,
	Leo Yan, linux-arm-kernel, linux-perf-users, linux-kernel,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song

On Thu, Dec 1, 2022 at 3:08 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>
>
>
> On 2022/12/1 at 2:58 AM, Ian Rogers wrote:
> > On Thu, Nov 24, 2022 at 9:15 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
> >>
> >> Add PE utilization related metrics.
> >>
> >> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
> >> ---
> >>  .../arch/arm64/arm/neoverse-n2/metrics.json        | 45 ++++++++++++++++++++++
> >>  1 file changed, 45 insertions(+)
> >>
> >> diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> >> index 23c7d62..7b54819 100644
> >> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> >> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> >> @@ -189,5 +189,50 @@
> >>          "MetricGroup": "Branch",
> >>          "MetricName": "branch_miss_pred_rate",
> >>          "ScaleUnit": "100%"
> >> +    },
> >> +    {
> >> +        "MetricExpr": "instructions / CPU_CYCLES",
> >> +        "PublicDescription": "The average number of instructions executed for each cycle.",
> >> +        "BriefDescription": "Instructions per cycle",
> >> +        "MetricGroup": "PEutilization",
> >> +        "MetricName": "ipc"
> >> +    },
> >
> > A related useful metric is percentage of peak, so if the peak IPC is 8
> > (usually a constant related to the number of functional units) then
> > you can just compute the ratio of IPC with this.
> >
>
> Glad to discuss these with you.
> The peak ipc value of neoverse-n2 is 5. Maybe I should add an ipc_rate metric?
>
> >> +    {
> >> +        "MetricExpr": "INST_RETIRED / CPU_CYCLES",
> >> +        "PublicDescription": "Architecturally executed Instructions Per Cycle (IPC)",
> >> +        "BriefDescription": "Architecturally executed Instructions Per Cycle (IPC)",
> >
> >
> > The duplicated descriptions are unnecessary. Drop the public one for
> > consistency with what we do for Intel:
> > https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py#L299
> >
>
> Sounds good, will do.
>
> >> +        "MetricGroup": "PEutilization",
> >> +        "MetricName": "retired_ipc"
> >> +    },
> >> +    {
> >> +        "MetricExpr": "INST_SPEC / CPU_CYCLES",
> >> +        "PublicDescription": "Speculatively executed Instructions Per Cycle (IPC)",
> >> +        "BriefDescription": "Speculatively executed Instructions Per Cycle (IPC)",
> >> +        "MetricGroup": "PEutilization",
> >> +        "MetricName": "spec_ipc"
> >> +    },
> >> +    {
> >> +        "MetricExpr": "OP_RETIRED / OP_SPEC",
> >> +        "PublicDescription": "Fraction of operations retired",
> >> +        "BriefDescription": "Fraction of operations retired",
> >
> > Would instructions be clearer than operations here?
> >
>
> operation and instruction are different. OP_RETIRED counts any operation (not instruction)
> that has been architecturally executed, For example, speculatively executed operations that
> have been abandoned for a branch mispredict will not be counted. So I think operation might
> be more accurate.

Thanks, I see this note in the N2 PMU guide:

"""
For PMU event definitions, some events specifically count
instructions, while other events count micro-operations (which are
referred to as operations). Please be aware of the use of the word
"operations" or "instructions" in the event description.
"""

From your explanation I wasn't sure if operation was a superset of
instruction that included both retired and speculated ones, or whether
operation had another meaning. I don't see operation being used in the
micro-operation sense elsewhere in the ARM perf json, so I think
micro-operation is more consistent and also clearer:
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/pmu-events/arch/arm64/arm/cortex-a75/pipeline.json?h=perf/core#n27

Perhaps the description can be something like:
Of all the micro-operations issued, what percentage were retired. A
lower number indicates bad speculation.

An alternate way to add documentation is the perf wiki's glossary:
https://perf.wiki.kernel.org/index.php/Glossary

I added the Neoverse N2 PMU Guide to:
https://perf.wiki.kernel.org/index.php/Useful_Links#Manuals

Thanks,
Ian

> >> +        "MetricGroup": "PEutilization",
> >> +        "MetricName": "retired_rate",
> >> +        "ScaleUnit": "100%"
> >> +    },
> >> +    {
> >> +        "MetricExpr": "1 - OP_RETIRED / OP_SPEC",
> >
> > Should OP_RETIRED be greater than OP_SPEC? In which case won't this
> > metric be negative?
> >
>
> OP_RETIRED will not be greater than OP_SPEC. OP_SPEC counts any operation that has been
> speculatively executed. OP_SPEC is a superset of the OP_RETIRED event. There is a
> description about OP_SPEC and OP_RETIRED in this neoverse-n2 document.
> Link: https://documentation-service.arm.com/static/62cfe21e31ea212bb6627393?token=
>
> >> +        "PublicDescription": "Fraction of operations wasted",
> >> +        "BriefDescription": "Fraction of operations wasted",
> >> +        "MetricGroup": "PEutilization",
> >> +        "MetricName": "wasted_rate",
> >> +        "ScaleUnit": "100%"
> >> +    },
> >> +    {
> >> +        "MetricExpr": "OP_RETIRED / OP_SPEC * (1 - (STALL_SLOT - CPU_CYCLES) / (CPU_CYCLES * 5))",
> >> +        "PublicDescription": "Utilization of CPU",
> >> +        "BriefDescription": "Utilization of CPU",
> >
> > Some more detail in the description would be useful.
> >
>
> Ok, I'll describe it in more detail. CPU_utilization reflects the truly effective ratio of operation
> executed by the CPU, which means that misprediction and stall are not included. Note that stall_slot
> minus cpu_cycles is a correction to the stall_slot error count.
>
> >> +        "MetricGroup": "PEutilization",
> >> +        "MetricName": "cpu_utilization",
> >> +        "ScaleUnit": "100%"
> >>      }
> >>  ]
> >> --
> >> 1.8.3.1
> >>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v3 5/6] perf vendor events arm64: Add PE utilization metrics for neoverse-n2
  2022-12-02 20:05           ` Ian Rogers
@ 2022-12-04  7:10             ` Jing Zhang
  -1 siblings, 0 replies; 96+ messages in thread
From: Jing Zhang @ 2022-12-04  7:10 UTC (permalink / raw)
  To: Ian Rogers
  Cc: John Garry, Xing Zhengjun, Will Deacon, James Clark, Mike Leach,
	Leo Yan, linux-arm-kernel, linux-perf-users, linux-kernel,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Andrew Kilroy, Shuai Xue, Zhuo Song



On 2022/12/3 at 4:05 AM, Ian Rogers wrote:
> On Thu, Dec 1, 2022 at 3:08 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>>
>>
>>
>> On 2022/12/1 at 2:58 AM, Ian Rogers wrote:
>>> On Thu, Nov 24, 2022 at 9:15 AM Jing Zhang <renyu.zj@linux.alibaba.com> wrote:
>>>>
>>>> Add PE utilization related metrics.
>>>>
>>>> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
>>>> ---
>>>>  .../arch/arm64/arm/neoverse-n2/metrics.json        | 45 ++++++++++++++++++++++
>>>>  1 file changed, 45 insertions(+)
>>>>
>>>> diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>> index 23c7d62..7b54819 100644
>>>> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>> @@ -189,5 +189,50 @@
>>>>          "MetricGroup": "Branch",
>>>>          "MetricName": "branch_miss_pred_rate",
>>>>          "ScaleUnit": "100%"
>>>> +    },
>>>> +    {
>>>> +        "MetricExpr": "instructions / CPU_CYCLES",
>>>> +        "PublicDescription": "The average number of instructions executed for each cycle.",
>>>> +        "BriefDescription": "Instructions per cycle",
>>>> +        "MetricGroup": "PEutilization",
>>>> +        "MetricName": "ipc"
>>>> +    },
>>>
>>> A related useful metric is percentage of peak, so if the peak IPC is 8
>>> (usually a constant related to the number of functional units) then
>>> you can just compute the ratio of IPC with this.
>>>
>>
>> Glad to discuss these with you.
>> The peak ipc value of neoverse-n2 is 5. Maybe I should add an ipc_rate metric?
>>
>>>> +    {
>>>> +        "MetricExpr": "INST_RETIRED / CPU_CYCLES",
>>>> +        "PublicDescription": "Architecturally executed Instructions Per Cycle (IPC)",
>>>> +        "BriefDescription": "Architecturally executed Instructions Per Cycle (IPC)",
>>>
>>>
>>> The duplicated descriptions are unnecessary. Drop the public one for
>>> consistency with what we do for Intel:
>>> https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py#L299
>>>
>>
>> Sounds good, will do.
>>
>>>> +        "MetricGroup": "PEutilization",
>>>> +        "MetricName": "retired_ipc"
>>>> +    },
>>>> +    {
>>>> +        "MetricExpr": "INST_SPEC / CPU_CYCLES",
>>>> +        "PublicDescription": "Speculatively executed Instructions Per Cycle (IPC)",
>>>> +        "BriefDescription": "Speculatively executed Instructions Per Cycle (IPC)",
>>>> +        "MetricGroup": "PEutilization",
>>>> +        "MetricName": "spec_ipc"
>>>> +    },
>>>> +    {
>>>> +        "MetricExpr": "OP_RETIRED / OP_SPEC",
>>>> +        "PublicDescription": "Fraction of operations retired",
>>>> +        "BriefDescription": "Fraction of operations retired",
>>>
>>> Would instructions be clearer than operations here?
>>>
>>
>> operation and instruction are different. OP_RETIRED counts any operation (not instruction)
>> that has been architecturally executed, For example, speculatively executed operations that
>> have been abandoned for a branch mispredict will not be counted. So I think operation might
>> be more accurate.
> 
> Thanks, I see this note in the N2 PMU guide:
> 
> """
> For PMU event definitions, some events specifically count
> instructions, while other events count micro-operations (which are
> referred to as operations). Please be aware of the use of the word
> "operations" or "instructions" in the event description.
> """
> 
> From your explanation I wasn't sure if operation was a superset of
> instruction that included both retired and speculated ones, or whether
> operation had another meaning. I don't see operation being used in the
> micro-operation sense elsewhere in the ARM perf json, I think
> micro-operation is more consistent and also clearer:
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/pmu-events/arch/arm64/arm/cortex-a75/pipeline.json?h=perf/core#n27
> 
> Perhaps the description can be something like:
> Of all the micro-operations issued, what percentage were retired. A
> lower number indicates bad speculation.
> 
> An alternate way to add documentation is the perf wiki's glossary:
> https://perf.wiki.kernel.org/index.php/Glossary
> 
> I added the Neoverse N2 PMU Guide to:
> https://perf.wiki.kernel.org/index.php/Useful_Links#Manuals
> 

Thanks.

The operation here is a micro-operation, so it is probably more accurate to say "micro-operation".

Description of op_retired and op_spec:
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/pmu-events/arch/arm64/common-and-microarch.json?h=perf/core#n315

The op_retired event counts micro-operations architecturally executed. The counter counts each
operation counted by OP_SPEC that would be executed in a simple sequential execution of the program.

The op_spec event counts micro-operations speculatively executed. The counter counts the number
of operations executed by the processing element, including those that are executed speculatively
and would not be executed in a simple sequential execution of the program.

So "op_retired/op_spec" is indeed "of all the micro-operations issued, what percentage were retired".
But I would not say "a lower number indicates bad speculation". I think "retired" here means "committed".

In the N2 PMU guide:
"""
If the branch is mispredicted, and the instructions are speculatively executed, they will not be
considered architecturally executed. The Arm® Architecture Reference Manual also refers to
architecturally executed instructions as “retired” or “committed”. Speculatively executed instructions
that are not architecturally executed will be abandoned; that is, their results will be discarded and
not counted as part of the program flow.
"""

> 
>>>> +        "MetricGroup": "PEutilization",
>>>> +        "MetricName": "retired_rate",
>>>> +        "ScaleUnit": "100%"
>>>> +    },
>>>> +    {
>>>> +        "MetricExpr": "1 - OP_RETIRED / OP_SPEC",
>>>
>>> Should OP_RETIRED be greater than OP_SPEC? In which case won't this
>>> metric be negative?
>>>
>>
>> OP_RETIRED will not be greater than OP_SPEC. OP_SPEC counts any operation that has been
>> speculatively executed. OP_SPEC is a superset of the OP_RETIRED event. There is a
>> description about OP_SPEC and OP_RETIRED in this neoverse-n2 document.
>> Link: https://documentation-service.arm.com/static/62cfe21e31ea212bb6627393?token=
>>
>>>> +        "PublicDescription": "Fraction of operations wasted",
>>>> +        "BriefDescription": "Fraction of operations wasted",
>>>> +        "MetricGroup": "PEutilization",
>>>> +        "MetricName": "wasted_rate",
>>>> +        "ScaleUnit": "100%"
>>>> +    },
>>>> +    {
>>>> +        "MetricExpr": "OP_RETIRED / OP_SPEC * (1 - (STALL_SLOT - CPU_CYCLES) / (CPU_CYCLES * 5))",
>>>> +        "PublicDescription": "Utilization of CPU",
>>>> +        "BriefDescription": "Utilization of CPU",
>>>
>>> Some more detail in the description would be useful.
>>>
>>
>> Ok, I'll describe it in more detail. CPU_utilization reflects the truly effective ratio of operation
>> executed by the CPU, which means that misprediction and stall are not included. Note that stall_slot
>> minus cpu_cycles is a correction to the stall_slot error count.
>>
>>>> +        "MetricGroup": "PEutilization",
>>>> +        "MetricName": "cpu_utilization",
>>>> +        "ScaleUnit": "100%"
>>>>      }
>>>>  ]
>>>> --
>>>> 1.8.3.1
>>>>

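As a rough illustration of the cpu_utilization expression quoted above, the following Python sketch
evaluates "OP_RETIRED / OP_SPEC * (1 - (STALL_SLOT - CPU_CYCLES) / (CPU_CYCLES * 5))" step by step.
The function name and the counter values are hypothetical. The constant 5 is taken directly from the
MetricExpr; treating it as the number of slots per cycle is an assumption based on the peak IPC of 5
mentioned earlier in the thread.

```python
SLOTS_PER_CYCLE = 5  # the literal 5 in the MetricExpr (assumed: slots per cycle)


def cpu_utilization(op_retired: int, op_spec: int,
                    stall_slot: int, cpu_cycles: int,
                    slots_per_cycle: int = SLOTS_PER_CYCLE) -> float:
    """Evaluate the cpu_utilization MetricExpr from raw counter values."""
    if op_spec <= 0 or cpu_cycles <= 0:
        raise ValueError("OP_SPEC and CPU_CYCLES must be positive")

    # Share of executed micro-operations that were not thrown away.
    retired_fraction = op_retired / op_spec

    # STALL_SLOT minus CPU_CYCLES is the correction described in the thread.
    corrected_stall_slots = stall_slot - cpu_cycles

    # Share of all available slots that were stalled.
    stall_fraction = corrected_stall_slots / (cpu_cycles * slots_per_cycle)

    return retired_fraction * (1.0 - stall_fraction)


if __name__ == "__main__":
    # Hypothetical counter values, not measured on real hardware.
    value = cpu_utilization(op_retired=900, op_spec=1000,
                            stall_slot=2000, cpu_cycles=1000)
    print(f"cpu_utilization: {value:.1%}")
```

With these made-up numbers, 20% of the slots are stalled and 90% of the executed micro-operations
are retired, so the metric comes out at 0.9 * 0.8.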

end of thread, other threads:[~2022-12-04  7:12 UTC | newest]

Thread overview: 96+ messages
2022-10-31 11:11 [PATCH RFC 0/6] Add metrics for neoverse-n2 Jing Zhang
2022-10-31 11:11 ` Jing Zhang
2022-10-31 11:11 ` [PATCH RFC 1/6] perf vendor events arm64: Add topdown L1 " Jing Zhang
2022-10-31 11:11   ` Jing Zhang
2022-10-31 11:11 ` [PATCH RFC 2/6] perf vendor events arm64: Add TLB " Jing Zhang
2022-10-31 11:11   ` Jing Zhang
2022-10-31 11:11 ` [PATCH RFC 3/6] perf vendor events arm64: Add cache " Jing Zhang
2022-10-31 11:11   ` Jing Zhang
2022-10-31 11:11 ` [PATCH RFC 4/6] perf vendor events arm64: Add branch " Jing Zhang
2022-10-31 11:11   ` Jing Zhang
2022-10-31 11:11 ` [PATCH RFC 5/6] perf vendor events arm64: Add PE utilization " Jing Zhang
2022-10-31 11:11   ` Jing Zhang
2022-10-31 11:11 ` [PATCH RFC 6/6] perf vendor events arm64: Add instruction mix " Jing Zhang
2022-10-31 11:11   ` Jing Zhang
2022-11-14  7:41 ` [RFC PATCH v2 0/6] Add " Jing Zhang
2022-11-14  7:41   ` Jing Zhang
2022-11-24 17:14   ` [PATCH v3 " Jing Zhang
2022-11-24 17:14     ` Jing Zhang
2022-11-24 17:14   ` [PATCH v3 1/6] perf vendor events arm64: Add topdown L1 " Jing Zhang
2022-11-24 17:14     ` Jing Zhang
2022-11-24 17:14   ` [PATCH v3 2/6] perf vendor events arm64: Add TLB " Jing Zhang
2022-11-24 17:14     ` Jing Zhang
2022-11-24 17:14   ` [PATCH v3 3/6] perf vendor events arm64: Add cache " Jing Zhang
2022-11-24 17:14     ` Jing Zhang
2022-11-24 17:14   ` [PATCH v3 4/6] perf vendor events arm64: Add branch " Jing Zhang
2022-11-24 17:14     ` Jing Zhang
2022-11-24 17:14   ` [PATCH v3 5/6] perf vendor events arm64: Add PE utilization " Jing Zhang
2022-11-24 17:14     ` Jing Zhang
2022-11-30 18:58     ` Ian Rogers
2022-11-30 18:58       ` Ian Rogers
2022-12-01 11:08       ` Jing Zhang
2022-12-01 11:08         ` Jing Zhang
2022-12-02 20:05         ` Ian Rogers
2022-12-02 20:05           ` Ian Rogers
2022-12-04  7:10           ` Jing Zhang
2022-12-04  7:10             ` Jing Zhang
2022-11-24 17:14   ` [PATCH v3 6/6] perf vendor events arm64: Add instruction mix " Jing Zhang
2022-11-24 17:14     ` Jing Zhang
2022-11-14  7:41 ` [RFC PATCH v2 1/6] perf vendor events arm64: Add topdown L1 " Jing Zhang
2022-11-14  7:41   ` Jing Zhang
2022-11-14 12:59   ` [External] : " John Garry
2022-11-14 12:59     ` John Garry
2022-11-15  8:43     ` Jing Zhang
2022-11-15  8:43       ` Jing Zhang
2022-11-15 11:19       ` John Garry
2022-11-15 11:19         ` John Garry
2022-11-21  9:53         ` Jing Zhang
2022-11-21  9:53           ` Jing Zhang
2022-11-21 10:22           ` John Garry
2022-11-21 10:22             ` John Garry
2022-11-21 15:17             ` Jing Zhang
2022-11-21 15:17               ` Jing Zhang
2022-11-21 17:55               ` John Garry
2022-11-21 17:55                 ` John Garry
2022-11-22  9:24                 ` Jing Zhang
2022-11-22  9:24                   ` Jing Zhang
2022-11-22 14:00                 ` James Clark
2022-11-22 14:00                   ` James Clark
2022-11-22 15:41                   ` Jing Zhang
2022-11-22 15:41                     ` Jing Zhang
2022-11-23 14:26                     ` James Clark
2022-11-23 14:26                       ` James Clark
2022-11-24 16:32                       ` Jing Zhang
2022-11-24 16:32                         ` Jing Zhang
2022-11-24 16:51                         ` James Clark
2022-11-24 16:51                           ` James Clark
2022-11-14  7:41 ` [RFC PATCH v2 2/6] perf vendor events arm64: Add TLB " Jing Zhang
2022-11-14  7:41   ` Jing Zhang
2022-11-14  7:41 ` [RFC PATCH v2 3/6] perf vendor events arm64: Add cache " Jing Zhang
2022-11-14  7:41   ` Jing Zhang
2022-11-14  8:35   ` Xing Zhengjun
2022-11-14  8:35     ` Xing Zhengjun
2022-11-15  6:28     ` Jing Zhang
2022-11-15  6:28       ` Jing Zhang
2022-11-14  7:41 ` [RFC PATCH v2 4/6] perf vendor events arm64: Add branch " Jing Zhang
2022-11-14  7:41   ` Jing Zhang
2022-11-14  7:41 ` [RFC PATCH v2 5/6] perf vendor events arm64: Add PE utilization " Jing Zhang
2022-11-14  7:41   ` Jing Zhang
2022-11-14  7:42 ` [RFC PATCH v2 6/6] perf vendor events arm64: Add instruction mix " Jing Zhang
2022-11-14  7:42   ` Jing Zhang
2022-11-16 11:19 ` [PATCH RFC 0/6] Add " James Clark
2022-11-16 11:19   ` James Clark
2022-11-16 15:26   ` Jing Zhang
2022-11-16 15:26     ` Jing Zhang
2022-11-21 11:51     ` James Clark
2022-11-21 11:51       ` James Clark
2022-11-22  7:11       ` Jing Zhang
2022-11-22  7:11         ` Jing Zhang
2022-11-22 11:53         ` James Clark
2022-11-22 11:53           ` James Clark
2022-11-19  3:30   ` Jing Zhang
2022-11-19  3:30     ` Jing Zhang
     [not found]     ` <CAP-5=fW+Z_Tc3BfK1bRKUeKWfxtPfoZXL9D2BhcU1SzNOruSsg@mail.gmail.com>
2022-11-20  3:49       ` Jing Zhang
2022-11-20  3:49         ` Jing Zhang
2022-11-21 11:55       ` James Clark
2022-11-21 11:55         ` James Clark
