* [PATCH v6 0/7] Add metrics for neoverse-n2-v2
From: Jing Zhang @ 2023-01-06 15:05 UTC
  To: John Garry, Ian Rogers
  Cc: Xing Zhengjun, Will Deacon, James Clark, Mike Leach, Leo Yan,
	linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song, Jing Zhang

Changes since v5:
- Add common topdownL1 metrics in sbsa.json as suggested by John and Ian;
- Correct PKI/MPKI ScaleUnit to 1PKI/1MPKI;
- Link: https://lore.kernel.org/all/1672745976-2800146-1-git-send-email-renyu.zj@linux.alibaba.com/

Changes since v4:
- Add MPKI/PKI “ScaleUnit”;
- Add acked-by from Ian Rogers;
- Link: https://lore.kernel.org/all/1671799045-1108027-1-git-send-email-renyu.zj@linux.alibaba.com/

Changes since v3:
- Add ipc_rate metric;
- Drop the PublicDescription;
- Describe PEutilization metrics in more detail;
- Link: https://lore.kernel.org/all/1669310088-13482-1-git-send-email-renyu.zj@linux.alibaba.com/

Changes since v2:
- Correct the formula of Branch metrics;
- Add more PE utilization metrics;
- Add more TLB metrics;
- Add “ScaleUnit” for some metrics;
- Add a newline at the end of the file;
- Link: https://lore.kernel.org/all/1668411720-3581-1-git-send-email-renyu.zj@linux.alibaba.com/

Changes since v1:
- Corrected formula for topdown L1 due to wrong counts for stall_slot and
  stall_slot_frontend; 
- Link: https://lore.kernel.org/all/1667214694-89839-1-git-send-email-renyu.zj@linux.alibaba.com/


The topdown L1 metrics come from the ARM SBSA 7.0 platform design doc [0],
D37-38, and are standard. Put them in the common arm64 file sbsa.json so
that cores other than n2/v2 can reuse them.

Then add the topdown L1 metrics for neoverse-n2-v2. Because stall_slot and
stall_slot_frontend are counted incorrectly on neoverse-n2, cpu_cycles must
be subtracted from both to obtain the real values, so the "MetricExpr" is
overridden for neoverse-n2.
See the ARM neoverse-n2 errata notice [1], D117.

Since neoverse-n2/neoverse-v2 does not yet support topdown L2, the following
patches add metric groups such as Cache, TLB, Branch, InstructionsMix, and
PEutilization to allow further analysis of performance bottlenecks.
See the ARM PMU guides [2][3].

[0] https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
[1] https://documentation-service.arm.com/static/636a66a64e6cf12278ad89cb?token=
[2] https://documentation-service.arm.com/static/628f8fa3dfaf015c2b76eae8?token=
[3] https://documentation-service.arm.com/static/62cfe21e31ea212bb6627393?token=

Tested on neoverse-n2:

$./perf list
...
Metric Groups:

Branch:
  branch_miss_pred_rate
       [The rate of branches mis-predited to the overall branches]
  branch_mpki
       [The rate of branches mis-predicted per kilo instructions]
  branch_pki
       [The rate of branches retired per kilo instructions]
Cache:
  l1d_cache_miss_rate
       [The rate of L1 D-Cache misses to the overall L1 D-Cache]
  l1d_cache_mpki
       [The rate of L1 D-Cache misses per kilo instructions]
...


$sudo ./perf stat -M TLB false_sharing 2

 Performance counter stats for 'false_sharing 2':

            29,940      L2D_TLB                          #     20.0 %  l2_tlb_miss_rate         (42.36%)
             5,998      L2D_TLB_REFILL                                                          (42.36%)
             1,753      L1I_TLB_REFILL                   #      0.1 %  l1i_tlb_miss_rate        (43.17%)
         2,173,957      L1I_TLB                                                                 (43.17%)
       327,944,763      L1D_TLB                          #      0.0 %  l1d_tlb_miss_rate        (43.98%)
            22,485      L1D_TLB_REFILL                                                          (43.98%)
           497,210      L1I_TLB                          #      0.0 %  itlb_walk_rate           (44.83%)
                28      ITLB_WALK                                                               (44.83%)
       821,488,762      INST_RETIRED                     #      0.0 MPKI  itlb_mpki             (43.97%)
               122      ITLB_WALK                                                               (43.97%)
               744      DTLB_WALK                        #      0.0 %  dtlb_walk_rate           (43.01%)
       263,913,146      L1D_TLB                                                                 (43.01%)
       779,073,875      INST_RETIRED                     #      0.0 MPKI  dtlb_mpki             (42.07%)
             1,050      DTLB_WALK                                                               (42.07%)

       0.435864901 seconds time elapsed

       1.201384000 seconds user
       0.000000000 seconds sys


$sudo ./perf stat -M TopDownL1 false_sharing 2

 Performance counter stats for 'false_sharing 2':

     3,408,960,257      cpu_cycles                       #      0.0 %  bad_speculation
                                                  #      5.1 %  retiring                 (66.79%)
    19,576,079,610      stall_slot                                                              (66.79%)
       877,673,452      op_spec                                                                 (66.79%)
       876,324,270      op_retired                                                              (66.79%)
     3,406,548,064      cpu_cycles                       #     26.7 %  frontend_bound           (67.08%)
     7,961,814,801      stall_slot_frontend                                                     (67.08%)
     3,415,528,440      cpu_cycles                       #     68.8 %  backend_bound            (66.43%)
    11,746,647,747      stall_slot_backend                                                      (66.43%)

       0.455229807 seconds time elapsed

       1.243216000 seconds user
       0.000000000 seconds sys
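
The TopDownL1 percentages above can be cross-checked by hand. The following
is a minimal Python sketch, not part of this series, that applies the topdown
L1 expressions (including the neoverse-n2 subtraction of cpu_cycles) to the
raw counts printed above. It assumes that #slots is 5 on the machine under
test, which is inferred from the "#slots - 5" test in the neoverse-n2-v2
MetricExpr overrides; all function names are illustrative only.

```python
SLOTS = 5  # assumption: caps/slots value on the neoverse-n2 under test


def pct(ratio):
    """Scale a ratio to percent and round to one decimal, as perf prints it."""
    return round(100.0 * ratio, 1)


def n2_fix(count, cpu_cycles):
    # neoverse-n2 only: stall_slot and stall_slot_frontend need
    # cpu_cycles subtracted to obtain the real count.
    return count - cpu_cycles


def retiring(op_retired, op_spec, stall_slot, cpu_cycles):
    not_stalled = 1 - n2_fix(stall_slot, cpu_cycles) / (SLOTS * cpu_cycles)
    return (op_retired / op_spec) * not_stalled


def bad_speculation(op_retired, op_spec, stall_slot, cpu_cycles):
    not_stalled = 1 - n2_fix(stall_slot, cpu_cycles) / (SLOTS * cpu_cycles)
    return (1 - op_retired / op_spec) * not_stalled


def frontend_bound(stall_slot_frontend, cpu_cycles):
    return n2_fix(stall_slot_frontend, cpu_cycles) / (SLOTS * cpu_cycles)


def backend_bound(stall_slot_backend, cpu_cycles):
    # stall_slot_backend is used as counted, without the subtraction.
    return stall_slot_backend / (SLOTS * cpu_cycles)


# Raw counts copied from the 'perf stat -M TopDownL1' output above.
print(pct(retiring(876_324_270, 877_673_452, 19_576_079_610, 3_408_960_257)))         # 5.1
print(pct(bad_speculation(876_324_270, 877_673_452, 19_576_079_610, 3_408_960_257)))  # 0.0
print(pct(frontend_bound(7_961_814_801, 3_406_548_064)))                              # 26.7
print(pct(backend_bound(11_746_647_747, 3_415_528_440)))                              # 68.8
```

Each metric uses the cpu_cycles value from its own event group, because the
groups were multiplexed and counted over slightly different intervals.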

$sudo ./perf stat -M branch sleep 1

 Performance counter stats for 'sleep 1':

           901,495      INST_RETIRED                     #    223.6 PKI  branch_pki
           201,603      BR_RETIRED
           901,495      INST_RETIRED                     #     10.0 MPKI  branch_mpki
             9,004      BR_MIS_PRED_RETIRED
             9,004      BR_MIS_PRED_RETIRED              #      4.5 %  branch_miss_pred_rate
           201,603      BR_RETIRED

       1.000794467 seconds time elapsed

       0.000905000 seconds user
       0.000000000 seconds sys
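
The branch figures above can be reproduced the same way. This is a small
Python sketch under the assumption that the three metrics are the plain
ratios their "perf list" descriptions suggest; the actual MetricExpr strings
live in the branch metrics patch and are not quoted here, so the formulas
and function names below are illustrative only.

```python
def branch_pki(br_retired, inst_retired):
    """Branches retired per 1000 instructions (assumed formula)."""
    return br_retired / inst_retired * 1000


def branch_mpki(br_mis_pred_retired, inst_retired):
    """Mispredicted branches per 1000 instructions (assumed formula)."""
    return br_mis_pred_retired / inst_retired * 1000


def branch_miss_pred_rate(br_mis_pred_retired, br_retired):
    """Mispredicted branches as a percentage of all branches (assumed formula)."""
    return br_mis_pred_retired / br_retired * 100


# Raw counts copied from the 'perf stat -M branch sleep 1' output above.
print(round(branch_pki(201_603, 901_495), 1))            # 223.6
print(round(branch_mpki(9_004, 901_495), 1))             # 10.0
print(round(branch_miss_pred_rate(9_004, 201_603), 1))   # 4.5
```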


Jing Zhang (7):
  perf vendor events arm64: Add common topdown L1 metrics
  perf vendor events arm64: Add topdown L1 metrics for neoverse-n2-v2
  perf vendor events arm64: Add TLB metrics for neoverse-n2-v2
  perf vendor events arm64: Add cache metrics for neoverse-n2-v2
  perf vendor events arm64: Add branch metrics for neoverse-n2-v2
  perf vendor events arm64: Add PE utilization metrics for
    neoverse-n2-v2
  perf vendor events arm64: Add instruction mix metrics for
    neoverse-n2-v2

 tools/perf/arch/arm64/util/pmu.c                   |  22 ++
 .../arch/arm64/arm/neoverse-n2-v2/metrics.json     | 273 +++++++++++++++++++++
 tools/perf/pmu-events/arch/arm64/sbsa.json         |  30 +++
 tools/perf/pmu-events/jevents.py                   |   2 +
 tools/perf/util/expr.c                             |   5 +
 tools/perf/util/pmu.c                              |   5 +
 tools/perf/util/pmu.h                              |   1 +
 7 files changed, 338 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/sbsa.json

-- 
1.8.3.1


* [PATCH v6 1/7] perf vendor events arm64: Add common topdown L1 metrics
From: Jing Zhang @ 2023-01-06 15:05 UTC
  To: John Garry, Ian Rogers
  Cc: Xing Zhengjun, Will Deacon, James Clark, Mike Leach, Leo Yan,
	linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song, Jing Zhang

The topdown L1 metrics come from the ARM SBSA 7.0 platform design doc [0],
D37-38, and are standard. Put them in the common arm64 file sbsa.json so
that cores other than n2/v2 can reuse them.

The number of slots may differ between architectures, so add a "#slots"
literal that returns the value for the architecture in use.

The value of slots comes from the register PMMIR_EL1, which can be read
from /sys/bus/event_source/devices/armv8_pmuv3_*/caps/slots. PMMIR_EL1.SLOT
might read as zero if the STALL_SLOT event is not implemented or the PMU
version is lower than ID_AA64DFR0_EL1_PMUVer_V3P4.
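
For readers who want to inspect the value outside of perf, here is a minimal
Python sketch of the same lookup. It is not the code added by this patch
(that is perf_pmu__get_slots() in C): it simply globs for armv8_pmuv3_*
instead of scanning for the first core PMU, and the helper names and the
sysfs_root parameter are hypothetical. Parsing with base 0 accepts the "0x"
prefix that the sysfs file may carry.

```python
import glob
import os


def parse_slots(text):
    """Parse the caps/slots contents; base 0 accepts both "5" and "0x5"."""
    return int(text.strip(), 0)


def read_slots(sysfs_root="/sys"):
    """Return the slot count of the first armv8_pmuv3_* PMU, or 0 if unavailable."""
    pattern = os.path.join(sysfs_root, "bus", "event_source", "devices",
                           "armv8_pmuv3_*", "caps", "slots")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                return parse_slots(f.read())
        except (OSError, ValueError):
            # Unreadable or malformed entry: try the next PMU, if any.
            continue
    # Mirror the weak default in perf: no slots information means 0.
    return 0


print(read_slots())
```

On a system without such a PMU, or where caps/slots is absent, the sketch
returns 0, matching the weak perf_pmu__get_slots() fallback.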

[0] https://documentation-service.arm.com/static/60250c7395978b529036da86?token=

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Acked-by: Ian Rogers <irogers@google.com>
---
 tools/perf/arch/arm64/util/pmu.c           | 22 ++++++++++++++++++++++
 tools/perf/pmu-events/arch/arm64/sbsa.json | 30 ++++++++++++++++++++++++++++++
 tools/perf/pmu-events/jevents.py           |  2 ++
 tools/perf/util/expr.c                     |  5 +++++
 tools/perf/util/pmu.c                      |  5 +++++
 tools/perf/util/pmu.h                      |  1 +
 6 files changed, 65 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/sbsa.json

diff --git a/tools/perf/arch/arm64/util/pmu.c b/tools/perf/arch/arm64/util/pmu.c
index 477e513..227dadb 100644
--- a/tools/perf/arch/arm64/util/pmu.c
+++ b/tools/perf/arch/arm64/util/pmu.c
@@ -3,6 +3,7 @@
 #include <internal/cpumap.h>
 #include "../../../util/cpumap.h"
 #include "../../../util/pmu.h"
+#include <api/fs/fs.h>
 
 const struct pmu_events_table *pmu_events_table__find(void)
 {
@@ -24,3 +25,24 @@ const struct pmu_events_table *pmu_events_table__find(void)
 
 	return NULL;
 }
+
+int perf_pmu__get_slots(void)
+{
+	char path[PATH_MAX];
+	unsigned long long slots = 0;
+	struct perf_pmu *pmu = NULL;
+
+	while ((pmu = perf_pmu__scan(pmu)) != NULL) {
+		if (is_pmu_core(pmu->name))
+			break;
+	}
+	if (pmu) {
+		scnprintf(path, PATH_MAX,
+			EVENT_SOURCE_DEVICE_PATH "%s/caps/slots", pmu->name);
+		/* The value of slots is not greater than INT_MAX, but sysfs__read_int
+		 * can't read value with 0x prefix, so use sysfs__read_ull instead.
+		 */
+		sysfs__read_ull(path, &slots);
+	}
+	return (int)slots;
+}
diff --git a/tools/perf/pmu-events/arch/arm64/sbsa.json b/tools/perf/pmu-events/arch/arm64/sbsa.json
new file mode 100644
index 0000000..f678c37e
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
@@ -0,0 +1,30 @@
+[
+    {
+        "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)",
+        "BriefDescription": "Frontend bound L1 topdown metric",
+        "MetricGroup": "TopdownL1",
+        "MetricName": "frontend_bound",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "(1 - op_retired / op_spec) * (1 - stall_slot / (#slots * cpu_cycles))",
+        "BriefDescription": "Bad speculation L1 topdown metric",
+        "MetricGroup": "TopdownL1",
+        "MetricName": "bad_speculation",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "(op_retired / op_spec) * (1 - stall_slot / (#slots * cpu_cycles))",
+        "BriefDescription": "Retiring L1 topdown metric",
+        "MetricGroup": "TopdownL1",
+        "MetricName": "retiring",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "stall_slot_backend / (#slots * cpu_cycles)",
+        "BriefDescription": "Backend Bound L1 topdown metric",
+        "MetricGroup": "TopdownL1",
+        "MetricName": "backend_bound",
+        "ScaleUnit": "100%"
+    }
+]
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 4c398e0..0416b74 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -358,6 +358,8 @@ def preprocess_arch_std_files(archpath: str) -> None:
       for event in read_json_events(item.path, topic=''):
         if event.name:
           _arch_std_events[event.name.lower()] = event
+        if event.metric_name:
+          _arch_std_events[event.metric_name.lower()] = event
 
 
 def print_events_table_prefix(tblname: str) -> None:
diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
index 00dcde3..3d67707 100644
--- a/tools/perf/util/expr.c
+++ b/tools/perf/util/expr.c
@@ -19,6 +19,7 @@
 #include <linux/zalloc.h>
 #include <ctype.h>
 #include <math.h>
+#include "pmu.h"
 
 #ifdef PARSER_DEBUG
 extern int expr_debug;
@@ -448,6 +449,10 @@ double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx
 		result = topology->core_cpus_lists;
 		goto out;
 	}
+	if (!strcmp("#slots", literal)) {
+		result = perf_pmu__get_slots();
+		goto out;
+	}
 
 	pr_err("Unrecognized literal '%s'", literal);
 out:
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 2bdeb89..d4cace2 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1993,3 +1993,8 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
 	*ucpus_ptr = unmatched_cpus;
 	return 0;
 }
+
+int __weak perf_pmu__get_slots(void)
+{
+	return 0;
+}
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 69ca000..a2f7df8 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -259,4 +259,5 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
 
 char *pmu_find_real_name(const char *name);
 char *pmu_find_alias_name(const char *name);
+int perf_pmu__get_slots(void);
 #endif /* __PMU_H */
-- 
1.8.3.1


* [PATCH v6 2/7] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2-v2
From: Jing Zhang @ 2023-01-06 15:05 UTC
  To: John Garry, Ian Rogers
  Cc: Xing Zhengjun, Will Deacon, James Clark, Mike Leach, Leo Yan,
	linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song, Jing Zhang

Because stall_slot and stall_slot_frontend are counted incorrectly on
neoverse-n2, cpu_cycles must be subtracted from both to obtain the real
values, so the "MetricExpr" is overridden for neoverse-n2.
See the ARM neoverse-n2 errata notice [0], D117.
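
The overrides in this patch select the correction with the metric
expression form "a if cond else b". As a rough Python illustration, assuming
the condition counts as true whenever it is non-zero (as in Python), the
test "(#slots - 5)" picks the raw count on any core whose slot count is not
5 and the corrected count when it is exactly 5. The function name and the
numbers, including the slot count of 8, are made up for illustration.

```python
def effective_stall(count, cpu_cycles, slots):
    # Mirrors "count if (#slots - 5) else (count - cpu_cycles)":
    # a non-zero (slots - 5) keeps the raw count, zero applies the fix.
    return count if (slots - 5) else (count - cpu_cycles)


print(effective_stall(1000, 100, 5))  # 900: correction applied
print(effective_stall(1000, 100, 8))  # 1000: raw count kept
```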

Since neoverse-n2/neoverse-v2 does not yet support topdown L2, the
following patches add metric groups such as Cache, TLB, Branch,
InstructionsMix and PEutilization to allow further analysis of
performance bottlenecks. See the ARM PMU guides [1][2].

[0] https://documentation-service.arm.com/static/636a66a64e6cf12278ad89cb?token=
[1] https://documentation-service.arm.com/static/628f8fa3dfaf015c2b76eae8?token=
[2] https://documentation-service.arm.com/static/62cfe21e31ea212bb6627393?token=

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Acked-by: Ian Rogers <irogers@google.com>
---
 .../arch/arm64/arm/neoverse-n2-v2/metrics.json          | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
new file mode 100644
index 0000000..4e7417f
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
@@ -0,0 +1,17 @@
+[
+    {
+        "ArchStdEvent": "FRONTEND_BOUND",
+        "MetricExpr": "((stall_slot_frontend) if (#slots - 5) else (stall_slot_frontend - cpu_cycles)) / (#slots * cpu_cycles)"
+    },
+    {
+        "ArchStdEvent": "BAD_SPECULATION",
+        "MetricExpr": "(1 - op_retired / op_spec) * (1 - (stall_slot if (#slots - 5) else (stall_slot - cpu_cycles)) / (#slots * cpu_cycles))"
+    },
+    {
+        "ArchStdEvent": "RETIRING",
+        "MetricExpr": "(op_retired / op_spec) * (1 - (stall_slot if (#slots - 5) else (stall_slot - cpu_cycles)) / (#slots * cpu_cycles))"
+    },
+    {
+        "ArchStdEvent": "BACKEND_BOUND"
+    }
+]
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 2/7] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2-v2
@ 2023-01-06 15:05   ` Jing Zhang
  0 siblings, 0 replies; 24+ messages in thread
From: Jing Zhang @ 2023-01-06 15:05 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Xing Zhengjun, Will Deacon, James Clark, Mike Leach, Leo Yan,
	linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song, Jing Zhang

Due to the wrong count of stall_slot and stall_slot_frontend on
neoverse-n2, the real stall_slot and real stall_slot_frontend need to
subtract cpu_cycles, so overwrite the "MetricExpr" for neoverse-n2.
Reference from ARM neoverse-n2 errata notice [0], D117.

Since neoverse-n2/neoverse-v2 does not yet support topdown L2, metric
groups such as Cache, TLB, Branch, InstructionsMix and PEutilization
will be added to further analysis of performance bottlenecks in the
following patches. Reference from ARM PMU guide [1][2].

[0] https://documentation-service.arm.com/static/636a66a64e6cf12278ad89cb?token=
[1] https://documentation-service.arm.com/static/628f8fa3dfaf015c2b76eae8?token=
[2] https://documentation-service.arm.com/static/62cfe21e31ea212bb6627393?token=
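
For readers who want to check the arithmetic, the overridden expressions
in this patch can be sketched in Python as below. This is only an
illustration, not part of the patch: the function names and the counter
values are made up, and slots=8 merely stands in for "any #slots value
other than 5".

```python
def real_stall(stall, cpu_cycles, slots):
    # Same shape as "(x) if (#slots - 5) else (x - cpu_cycles)":
    # a non-zero (slots - 5) keeps the raw count, zero applies the correction.
    return stall if (slots - 5) else stall - cpu_cycles


def topdown_l1(stall_slot, stall_slot_frontend, op_retired, op_spec,
               cpu_cycles, slots):
    total_slots = slots * cpu_cycles
    frontend_bound = real_stall(stall_slot_frontend, cpu_cycles, slots) / total_slots
    not_stalled = 1 - real_stall(stall_slot, cpu_cycles, slots) / total_slots
    retired_fraction = op_retired / op_spec
    return {
        "frontend_bound": frontend_bound,
        "bad_speculation": (1 - retired_fraction) * not_stalled,
        "retiring": retired_fraction * not_stalled,
    }


# Made-up counter values, for illustration only.
counts = dict(stall_slot=2000, stall_slot_frontend=1500,
              op_retired=800, op_spec=1000, cpu_cycles=1000)
n2 = topdown_l1(slots=5, **counts)     # correction applied
other = topdown_l1(slots=8, **counts)  # raw counts used
```

With these numbers the corrected frontend stall on the 5-slot core is
1500 - 1000 = 500 slots out of 5000, while the 8-slot case uses the raw
1500 out of 8000.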

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Acked-by: Ian Rogers <irogers@google.com>
---
 .../arch/arm64/arm/neoverse-n2-v2/metrics.json          | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
new file mode 100644
index 0000000..4e7417f
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
@@ -0,0 +1,17 @@
+[
+    {
+        "ArchStdEvent": "FRONTEND_BOUND",
+        "MetricExpr": "((stall_slot_frontend) if (#slots - 5) else (stall_slot_frontend - cpu_cycles)) / (#slots * cpu_cycles)"
+    },
+    {
+        "ArchStdEvent": "BAD_SPECULATION",
+        "MetricExpr": "(1 - op_retired / op_spec) * (1 - (stall_slot if (#slots - 5) else (stall_slot - cpu_cycles)) / (#slots * cpu_cycles))"
+    },
+    {
+        "ArchStdEvent": "RETIRING",
+        "MetricExpr": "(op_retired / op_spec) * (1 - (stall_slot if (#slots - 5) else (stall_slot - cpu_cycles)) / (#slots * cpu_cycles))"
+    },
+    {
+        "ArchStdEvent": "BACKEND_BOUND"
+    }
+]
-- 
1.8.3.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 3/7] perf vendor events arm64: Add TLB metrics for neoverse-n2-v2
  2023-01-06 15:05 ` Jing Zhang
@ 2023-01-06 15:05   ` Jing Zhang
  -1 siblings, 0 replies; 24+ messages in thread
From: Jing Zhang @ 2023-01-06 15:05 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Xing Zhengjun, Will Deacon, James Clark, Mike Leach, Leo Yan,
	linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song, Jing Zhang

Add TLB related metrics.
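
As a rough illustration of how the two kinds of metric in this patch are
reported, the sketch below evaluates dtlb_mpki and dtlb_walk_rate and
applies their "ScaleUnit" strings. It assumes that "ScaleUnit" is a
numeric scale followed by a unit suffix; the helper names and counter
values are hypothetical, and the formatting is not perf's actual output.

```python
import re


def split_scale_unit(scale_unit):
    # Assumption: a leading number is the scale, the rest is the unit.
    match = re.match(r"([0-9.]+)(.*)", scale_unit)
    return float(match.group(1)), match.group(2)


def show(value, scale_unit):
    scale, unit = split_scale_unit(scale_unit)
    return f"{value * scale:.2f}{unit}"


# Made-up counter values, for illustration only.
DTLB_WALK, INST_RETIRED, L1D_TLB = 250, 1_000_000, 50_000

dtlb_mpki = DTLB_WALK / INST_RETIRED * 1000   # already per kilo instruction
dtlb_walk_rate = DTLB_WALK / L1D_TLB          # plain ratio, scaled to percent

mpki_text = show(dtlb_mpki, "1MPKI")
rate_text = show(dtlb_walk_rate, "100%")
```

The MPKI expressions carry the factor of 1000 themselves, which is why
their scale is 1, while the ratio metrics rely on the scale of 100 to
become percentages.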

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Acked-by: Ian Rogers <irogers@google.com>
---
 .../arch/arm64/arm/neoverse-n2-v2/metrics.json     | 49 ++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
index 4e7417f..60bbd8f 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
@@ -13,5 +13,54 @@
     },
     {
         "ArchStdEvent": "BACKEND_BOUND"
+    },
+    {
+        "MetricExpr": "L1D_TLB_REFILL / L1D_TLB",
+        "BriefDescription": "The rate of L1D TLB refill to the overall L1D TLB lookups",
+        "MetricGroup": "TLB",
+        "MetricName": "l1d_tlb_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L1I_TLB_REFILL / L1I_TLB",
+        "BriefDescription": "The rate of L1I TLB refill to the overall L1I TLB lookups",
+        "MetricGroup": "TLB",
+        "MetricName": "l1i_tlb_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L2D_TLB_REFILL / L2D_TLB",
+        "BriefDescription": "The rate of L2D TLB refill to the overall L2D TLB lookups",
+        "MetricGroup": "TLB",
+        "MetricName": "l2_tlb_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "DTLB_WALK / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of TLB Walks per kilo instructions for data accesses",
+        "MetricGroup": "TLB",
+        "MetricName": "dtlb_mpki",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricExpr": "DTLB_WALK / L1D_TLB",
+        "BriefDescription": "The rate of DTLB Walks to the overall L1D TLB lookups",
+        "MetricGroup": "TLB",
+        "MetricName": "dtlb_walk_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "ITLB_WALK / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of TLB Walks per kilo instructions for instruction accesses",
+        "MetricGroup": "TLB",
+        "MetricName": "itlb_mpki",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricExpr": "ITLB_WALK / L1I_TLB",
+        "BriefDescription": "The rate of ITLB Walks to the overall L1I TLB lookups",
+        "MetricGroup": "TLB",
+        "MetricName": "itlb_walk_rate",
+        "ScaleUnit": "100%"
     }
 ]
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 4/7] perf vendor events arm64: Add cache metrics for neoverse-n2-v2
  2023-01-06 15:05 ` Jing Zhang
@ 2023-01-06 15:05   ` Jing Zhang
  -1 siblings, 0 replies; 24+ messages in thread
From: Jing Zhang @ 2023-01-06 15:05 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Xing Zhengjun, Will Deacon, James Clark, Mike Leach, Leo Yan,
	linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song, Jing Zhang

Add cache related metrics.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Acked-by: Ian Rogers <irogers@google.com>
---
 .../arch/arm64/arm/neoverse-n2-v2/metrics.json     | 77 ++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
index 60bbd8f..08c6aaa 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
@@ -62,5 +62,82 @@
         "MetricGroup": "TLB",
         "MetricName": "itlb_walk_rate",
         "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L1I_CACHE_REFILL / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of L1 I-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l1i_cache_mpki",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE",
+        "BriefDescription": "The rate of L1 I-Cache misses to the overall L1 I-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l1i_cache_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L1D_CACHE_REFILL / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of L1 D-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l1d_cache_mpki",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
+        "BriefDescription": "The rate of L1 D-Cache misses to the overall L1 D-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l1d_cache_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L2D_CACHE_REFILL / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of L2 D-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l2d_cache_mpki",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE",
+        "BriefDescription": "The rate of L2 D-Cache misses to the overall L2 D-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l2d_cache_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "L3D_CACHE_REFILL / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of L3 D-Cache misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "l3d_cache_mpki",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricExpr": "L3D_CACHE_REFILL / L3D_CACHE",
+        "BriefDescription": "The rate of L3 D-Cache misses to the overall L3 D-Cache",
+        "MetricGroup": "Cache",
+        "MetricName": "l3d_cache_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "LL_CACHE_MISS_RD / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of LL Cache read misses per kilo instructions",
+        "MetricGroup": "Cache",
+        "MetricName": "ll_cache_read_mpki",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD",
+        "BriefDescription": "The rate of LL Cache read misses to the overall LL Cache read",
+        "MetricGroup": "Cache",
+        "MetricName": "ll_cache_read_miss_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD",
+        "BriefDescription": "The rate of LL Cache read hit to the overall LL Cache read",
+        "MetricGroup": "Cache",
+        "MetricName": "ll_cache_read_hit_rate",
+        "ScaleUnit": "100%"
     }
 ]
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 5/7] perf vendor events arm64: Add branch metrics for neoverse-n2-v2
  2023-01-06 15:05 ` Jing Zhang
@ 2023-01-06 15:05   ` Jing Zhang
  -1 siblings, 0 replies; 24+ messages in thread
From: Jing Zhang @ 2023-01-06 15:05 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Xing Zhengjun, Will Deacon, James Clark, Mike Leach, Leo Yan,
	linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song, Jing Zhang

Add branch related metrics.
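
The three branch metrics in this patch are related: branch_mpki equals
branch_pki multiplied by branch_miss_pred_rate, since BR_RETIRED cancels
out. A small Python check of that identity, using made-up counter values
and hypothetical variable names:

```python
# Made-up counter values, for illustration only.
BR_MIS_PRED_RETIRED = 2_000
BR_RETIRED = 100_000
INST_RETIRED = 500_000

branch_mpki = BR_MIS_PRED_RETIRED / INST_RETIRED * 1000
branch_pki = BR_RETIRED / INST_RETIRED * 1000
branch_miss_pred_rate = BR_MIS_PRED_RETIRED / BR_RETIRED

# BR_RETIRED cancels: (BR_RETIRED / INST) * (MIS_PRED / BR_RETIRED) = MIS_PRED / INST
identity_holds = abs(branch_mpki - branch_pki * branch_miss_pred_rate) < 1e-9
```

So a high branch_mpki can come either from a branch-heavy instruction
stream (high branch_pki) or from poor prediction (high
branch_miss_pred_rate), and the two ratio metrics help tell these apart.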

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Acked-by: Ian Rogers <irogers@google.com>
---
 .../arch/arm64/arm/neoverse-n2-v2/metrics.json      | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
index 08c6aaa..afcdb17 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
@@ -139,5 +139,26 @@
         "MetricGroup": "Cache",
         "MetricName": "ll_cache_read_hit_rate",
         "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_MIS_PRED_RETIRED / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of branches mis-predicted per kilo instructions",
+        "MetricGroup": "Branch",
+        "MetricName": "branch_mpki",
+        "ScaleUnit": "1MPKI"
+    },
+    {
+        "MetricExpr": "BR_RETIRED / INST_RETIRED * 1000",
+        "BriefDescription": "The rate of branches retired per kilo instructions",
+        "MetricGroup": "Branch",
+        "MetricName": "branch_pki",
+        "ScaleUnit": "1PKI"
+    },
+    {
+        "MetricExpr": "BR_MIS_PRED_RETIRED / BR_RETIRED",
+        "BriefDescription": "The rate of branches mis-predicted to the overall branches",
+        "MetricGroup": "Branch",
+        "MetricName": "branch_miss_pred_rate",
+        "ScaleUnit": "100%"
     }
 ]
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 6/7] perf vendor events arm64: Add PE utilization metrics for neoverse-n2-v2
  2023-01-06 15:05 ` Jing Zhang
@ 2023-01-06 15:05   ` Jing Zhang
  -1 siblings, 0 replies; 24+ messages in thread
From: Jing Zhang @ 2023-01-06 15:05 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Xing Zhengjun, Will Deacon, James Clark, Mike Leach, Leo Yan,
	linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song, Jing Zhang

Add PE utilization related metrics. In the cpu_utilization metric, if the
core is neoverse-n2 (detected by #slots being 5), cpu_cycles must be
subtracted from stall_slot to get the real value, according to the
neoverse-n2 errata [0].

[0] https://documentation-service.arm.com/static/636a66a64e6cf12278ad89cb?token=
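
A small Python sketch of how the PE utilization metrics in this patch
relate to each other, under made-up counter values. The function names
are hypothetical and only mirror the "MetricExpr" strings in the diff;
peak_ipc=5 is the peak stated in the ipc_rate description.

```python
def ipc_rate(instructions, cpu_cycles, peak_ipc=5):
    # "ipc / 5": the ipc metric is reused by name, then divided by the peak.
    ipc = instructions / cpu_cycles
    return ipc / peak_ipc


def retired_rate(op_retired, op_spec):
    return op_retired / op_spec


def wasted_rate(op_retired, op_spec):
    return 1 - op_retired / op_spec


def cpu_utilization(op_retired, op_spec, stall_slot, cpu_cycles, slots):
    # On a 5-slot core (slots - 5 == 0) cpu_cycles is subtracted from stall_slot.
    stall = stall_slot if (slots - 5) else stall_slot - cpu_cycles
    return op_retired / op_spec * (1 - stall / (slots * cpu_cycles))


# Made-up counter values, for illustration only.
rate = ipc_rate(instructions=2000, cpu_cycles=1000)
retired = retired_rate(op_retired=800, op_spec=1000)
wasted = wasted_rate(op_retired=800, op_spec=1000)
util = cpu_utilization(op_retired=800, op_spec=1000,
                       stall_slot=2000, cpu_cycles=1000, slots=5)
```

retired_rate and wasted_rate always sum to 1, and cpu_utilization is
retired_rate scaled by the fraction of slots that were not stalled.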

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Acked-by: Ian Rogers <irogers@google.com>
---
 .../arch/arm64/arm/neoverse-n2-v2/metrics.json     | 46 ++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
index afcdb17..3d6ac0c 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
@@ -160,5 +160,51 @@
         "MetricGroup": "Branch",
         "MetricName": "branch_miss_pred_rate",
         "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "instructions / CPU_CYCLES",
+        "BriefDescription": "The average number of instructions executed for each cycle.",
+        "MetricGroup": "PEutilization",
+        "MetricName": "ipc"
+    },
+    {
+        "MetricExpr": "ipc / 5",
+        "BriefDescription": "IPC percentage of peak. The peak of IPC is 5.",
+        "MetricGroup": "PEutilization",
+        "MetricName": "ipc_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "INST_RETIRED / CPU_CYCLES",
+        "BriefDescription": "Architecturally executed Instructions Per Cycle (IPC)",
+        "MetricGroup": "PEutilization",
+        "MetricName": "retired_ipc"
+    },
+    {
+        "MetricExpr": "INST_SPEC / CPU_CYCLES",
+        "BriefDescription": "Speculatively executed Instructions Per Cycle (IPC)",
+        "MetricGroup": "PEutilization",
+        "MetricName": "spec_ipc"
+    },
+    {
+        "MetricExpr": "OP_RETIRED / OP_SPEC",
+        "BriefDescription": "Of all the micro-operations issued, what percentage are retired(committed)",
+        "MetricGroup": "PEutilization",
+        "MetricName": "retired_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "1 - OP_RETIRED / OP_SPEC",
+        "BriefDescription": "Of all the micro-operations issued, what percentage are not retired(committed)",
+        "MetricGroup": "PEutilization",
+        "MetricName": "wasted_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "OP_RETIRED / OP_SPEC * (1 - (STALL_SLOT if (#slots - 5) else (STALL_SLOT - CPU_CYCLES)) / (#slots * CPU_CYCLES))",
+        "BriefDescription": "The truly effective ratio of micro-operations executed by the CPU, which means that misprediction and stall are not included",
+        "MetricGroup": "PEutilization",
+        "MetricName": "cpu_utilization",
+        "ScaleUnit": "100%"
     }
 ]
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v6 7/7] perf vendor events arm64: Add instruction mix metrics for neoverse-n2-v2
  2023-01-06 15:05 ` Jing Zhang
@ 2023-01-06 15:05   ` Jing Zhang
  -1 siblings, 0 replies; 24+ messages in thread
From: Jing Zhang @ 2023-01-06 15:05 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Xing Zhengjun, Will Deacon, James Clark, Mike Leach, Leo Yan,
	linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song, Jing Zhang

Add instruction mix related metrics.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Acked-by: Ian Rogers <irogers@google.com>
---
 .../arch/arm64/arm/neoverse-n2-v2/metrics.json     | 63 ++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
index 3d6ac0c..8ad15b7 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/metrics.json
@@ -206,5 +206,68 @@
         "MetricGroup": "PEutilization",
         "MetricName": "cpu_utilization",
         "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "LD_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of load instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "load_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "ST_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of store instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "store_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "DP_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of integer data-processing instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "data_process_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "ASE_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of advanced SIMD instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "advanced_simd_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "VFP_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of floating point instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "float_point_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "CRYPTO_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of crypto instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "crypto_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_IMMED_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of branch immediate instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "branch_immed_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_RETURN_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of procedure return instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "branch_return_spec_rate",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "BR_INDIRECT_SPEC / INST_SPEC",
+        "BriefDescription": "The rate of indirect branch instructions speculatively executed to overall instructions speculatively executed",
+        "MetricGroup": "InstructionMix",
+        "MetricName": "branch_indirect_spec_rate",
+        "ScaleUnit": "100%"
     }
 ]
-- 
1.8.3.1
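
As a rough illustration of how the InstructionMix expressions above evaluate, the sketch below divides each *_SPEC count by INST_SPEC and applies the "100%" scale. The metric-to-event mapping is taken from the patch; the helper name instruction_mix and the counter values are made up for the example, and perf itself evaluates MetricExpr through its own expression parser rather than code like this.

```python
# Metric name -> numerator event, as listed in the patch above.
INSTRUCTION_MIX = {
    "load_spec_rate": "LD_SPEC",
    "store_spec_rate": "ST_SPEC",
    "data_process_spec_rate": "DP_SPEC",
    "advanced_simd_spec_rate": "ASE_SPEC",
    "float_point_spec_rate": "VFP_SPEC",
    "crypto_spec_rate": "CRYPTO_SPEC",
    "branch_immed_spec_rate": "BR_IMMED_SPEC",
    "branch_return_spec_rate": "BR_RETURN_SPEC",
    "branch_indirect_spec_rate": "BR_INDIRECT_SPEC",
}


def instruction_mix(counts):
    """Hypothetical helper: return each metric as a percentage of INST_SPEC."""
    total = counts.get("INST_SPEC", 0)
    if total == 0:
        # Nothing was counted, so no ratio is meaningful.
        return {}
    return {
        metric: 100.0 * counts.get(event, 0) / total
        for metric, event in INSTRUCTION_MIX.items()
    }


# Made-up counter values, for illustration only.
sample = {"INST_SPEC": 1000, "LD_SPEC": 250, "ST_SPEC": 100, "DP_SPEC": 500}
mix = instruction_mix(sample)
for metric, percent in mix.items():
    print(f"{metric}: {percent:.1f}%")
```

Events missing from the sample are treated as zero here, which is a simplification of this sketch and not a statement about how perf handles uncounted events.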


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 1/7] perf vendor events arm64: Add common topdown L1 metrics
  2023-01-06 15:05   ` Jing Zhang
@ 2023-01-06 15:59     ` John Garry
  -1 siblings, 0 replies; 24+ messages in thread
From: John Garry @ 2023-01-06 15:59 UTC (permalink / raw)
  To: Jing Zhang, Ian Rogers
  Cc: Xing Zhengjun, Will Deacon, James Clark, Mike Leach, Leo Yan,
	linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song

On 06/01/2023 15:05, Jing Zhang wrote:
> The topdown L1 metrics come from the ARM sbsa7.0 platform design doc[0],
> D37-38, and are standard. So put them in the common arm64 file sbsa.json,
> so that cores other than n2/v2 can reuse them.
> 
> The number of slots may differ between architectures, so add a "#slots"
> literal that yields the right constant for each architecture.
> 
> The value of slots comes from the register PMMIR_EL1, which I can read
> in /sys/bus/event_source/device/armv8_pmuv3_*/caps/slots. PMMIR_EL1.SLOT
> might read as zero if the STALL_SLOT event is not implemented or the PMU
> version is lower than ID_AA64DFR0_EL1_PMUVer_V3P4.
> 
> [0] https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
> 
> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
> Acked-by: Ian Rogers <irogers@google.com>

Hmm... you have made significant changes in this version (compared to the
previous one), so I would not have picked up this tag. That's just my opinion.

As for the patchset organization, I'd move the JSON change here into patch #2,
and make this patch purely about adding "slots" literal support for arm64.

> ---
>   tools/perf/arch/arm64/util/pmu.c           | 22 ++++++++++++++++++++++
>   tools/perf/pmu-events/arch/arm64/sbsa.json | 30 ++++++++++++++++++++++++++++++
>   tools/perf/pmu-events/jevents.py           |  2 ++
>   tools/perf/util/expr.c                     |  5 +++++
>   tools/perf/util/pmu.c                      |  5 +++++
>   tools/perf/util/pmu.h                      |  1 +
>   6 files changed, 65 insertions(+)
>   create mode 100644 tools/perf/pmu-events/arch/arm64/sbsa.json
> 
> diff --git a/tools/perf/arch/arm64/util/pmu.c b/tools/perf/arch/arm64/util/pmu.c
> index 477e513..227dadb 100644
> --- a/tools/perf/arch/arm64/util/pmu.c
> +++ b/tools/perf/arch/arm64/util/pmu.c
> @@ -3,6 +3,7 @@
>   #include <internal/cpumap.h>
>   #include "../../../util/cpumap.h"
>   #include "../../../util/pmu.h"
> +#include <api/fs/fs.h>
>   
>   const struct pmu_events_table *pmu_events_table__find(void)
>   {
> @@ -24,3 +25,24 @@ const struct pmu_events_table *pmu_events_table__find(void)
>   
>   	return NULL;
>   }
> +
> +int perf_pmu__get_slots(void)
> +{
> +	char path[PATH_MAX];
> +	unsigned long long slots = 0;
> +	struct perf_pmu *pmu = NULL;
> +
> +	while ((pmu = perf_pmu__scan(pmu)) != NULL) {
> +		if (is_pmu_core(pmu->name))
> +			break;
> +	}

There is a lot in common with arm64's pmu_events_table__find() - can you 
factor it out? I also prefer how we check for homogeneous CPUs in 
pmu_events_table__find() (which you should do, also).

> +	if (pmu) {
> +		scnprintf(path, PATH_MAX,
> +			EVENT_SOURCE_DEVICE_PATH "%s/caps/slots", pmu->name);
> +		/* The value of slots is not greater than INT_MAX, but sysfs__read_int
> +		 * can't read value with 0x prefix, so use sysfs__read_ull instead.
> +		 */
> +		sysfs__read_ull(path, &slots);
> +	}
> +	return (int)slots;
> +}
> diff --git a/tools/perf/pmu-events/arch/arm64/sbsa.json b/tools/perf/pmu-events/arch/arm64/sbsa.json
> new file mode 100644
> index 0000000..f678c37e
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
> @@ -0,0 +1,30 @@
> +[
> +    {
> +        "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)",
> +        "BriefDescription": "Frontend bound L1 topdown metric",
> +        "MetricGroup": "TopdownL1",
> +        "MetricName": "frontend_bound",
> +        "ScaleUnit": "100%"
> +    },
> +    {
> +        "MetricExpr": "(1 - op_retired / op_spec) * (1 - stall_slot / (#slots * cpu_cycles))",
> +        "BriefDescription": "Bad speculation L1 topdown metric",
> +        "MetricGroup": "TopdownL1",
> +        "MetricName": "bad_speculation",
> +        "ScaleUnit": "100%"
> +    },
> +    {
> +        "MetricExpr": "(op_retired / op_spec) * (1 - stall_slot / (#slots * cpu_cycles))",
> +        "BriefDescription": "Retiring L1 topdown metric",
> +        "MetricGroup": "TopdownL1",
> +        "MetricName": "retiring",
> +        "ScaleUnit": "100%"
> +    },
> +    {
> +        "MetricExpr": "stall_slot_backend / (#slots * cpu_cycles)",
> +        "BriefDescription": "Backend Bound L1 topdown metric",
> +        "MetricGroup": "TopdownL1",
> +        "MetricName": "backend_bound",
> +        "ScaleUnit": "100%"
> +    }
> +]
> diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
> index 4c398e0..0416b74 100755
> --- a/tools/perf/pmu-events/jevents.py
> +++ b/tools/perf/pmu-events/jevents.py
> @@ -358,6 +358,8 @@ def preprocess_arch_std_files(archpath: str) -> None:
>         for event in read_json_events(item.path, topic=''):
>           if event.name:
>             _arch_std_events[event.name.lower()] = event
> +        if event.metric_name:
> +          _arch_std_events[event.metric_name.lower()] = event
>   
>   
>   def print_events_table_prefix(tblname: str) -> None:
> diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
> index 00dcde3..3d67707 100644
> --- a/tools/perf/util/expr.c
> +++ b/tools/perf/util/expr.c
> @@ -19,6 +19,7 @@
>   #include <linux/zalloc.h>
>   #include <ctype.h>
>   #include <math.h>
> +#include "pmu.h"
>   
>   #ifdef PARSER_DEBUG
>   extern int expr_debug;
> @@ -448,6 +449,10 @@ double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx
>   		result = topology->core_cpus_lists;
>   		goto out;
>   	}
> +	if (!strcmp("#slots", literal)) {
> +		result = perf_pmu__get_slots();
> +		goto out;
> +	}
>   
>   	pr_err("Unrecognized literal '%s'", literal);
>   out:
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index 2bdeb89..d4cace2 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -1993,3 +1993,8 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
>   	*ucpus_ptr = unmatched_cpus;
>   	return 0;
>   }
> +
> +int __weak perf_pmu__get_slots(void)
> +{
> +	return 0;

should this be NAN?
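
To make the question concrete, here is a small sketch of how the two fallback values behave once they reach a metric expression such as "stall_slot_frontend / (#slots * cpu_cycles)". It models the arithmetic only; the function name and the counter values are assumptions for the example, and it does not reproduce perf's expression parser.

```python
import math


def frontend_bound(stall_slot_frontend, slots, cpu_cycles):
    """Model of "stall_slot_frontend / (#slots * cpu_cycles)"."""
    return stall_slot_frontend / (slots * cpu_cycles)


# NaN fallback: NaN propagates through the expression, so the caller can
# detect that the metric is unavailable.
with_nan = frontend_bound(1200.0, math.nan, 1000.0)
print("NaN fallback is NaN:", math.isnan(with_nan))

# Zero fallback: the denominator becomes zero. Python raises an exception;
# C double arithmetic would produce inf or NaN without raising.
try:
    frontend_bound(1200.0, 0, 1000.0)
    zero_raised = False
except ZeroDivisionError:
    zero_raised = True
print("zero fallback raised:", zero_raised)
```

Either way a zero slot count cannot produce a meaningful ratio; NaN just states that explicitly at the source.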

> +}
> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
> index 69ca000..a2f7df8 100644
> --- a/tools/perf/util/pmu.h
> +++ b/tools/perf/util/pmu.h
> @@ -259,4 +259,5 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
>   
>   char *pmu_find_real_name(const char *name);
>   char *pmu_find_alias_name(const char *name);
> +int perf_pmu__get_slots(void);

I think that this name is a bit too vague. Maybe 
perf_pmu__cpu_cycles_per_slot() could be better.

>   #endif /* __PMU_H */

Thanks,
John




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 1/7] perf vendor events arm64: Add common topdown L1 metrics
  2023-01-06 15:59     ` John Garry
@ 2023-01-09  2:53       ` Jing Zhang
  -1 siblings, 0 replies; 24+ messages in thread
From: Jing Zhang @ 2023-01-09  2:53 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Xing Zhengjun, Will Deacon, James Clark, Mike Leach, Leo Yan,
	linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song



On 2023/1/6 11:59 PM, John Garry wrote:
> On 06/01/2023 15:05, Jing Zhang wrote:
>> The topdown L1 metrics come from the ARM sbsa7.0 platform design doc[0],
>> D37-38, and are standard. So put them in the common arm64 file sbsa.json,
>> so that cores other than n2/v2 can reuse them.
>>
>> The number of slots may differ between architectures, so add a "#slots"
>> literal that yields the right constant for each architecture.
>>
>> The value of slots comes from the register PMMIR_EL1, which I can read
>> in /sys/bus/event_source/device/armv8_pmuv3_*/caps/slots. PMMIR_EL1.SLOT
>> might read as zero if the STALL_SLOT event is not implemented or the PMU
>> version is lower than ID_AA64DFR0_EL1_PMUVer_V3P4.
>>
>> [0] https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
>>
>> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
>> Acked-by: Ian Rogers <irogers@google.com>
> 
> Hmm... you have made significant changes in this version (compared to the previous one), so I would not have picked up this tag. That's just my opinion.
> 

Thanks for pointing it out.

> As for the patchset organization, I'd move the JSON change here into patch #2, and make this patch purely about adding "slots" literal support for arm64.
> 

OK, I will move the sbsa.json and jevents.py changes to patch #2.

>> ---
>>   tools/perf/arch/arm64/util/pmu.c           | 22 ++++++++++++++++++++++
>>   tools/perf/pmu-events/arch/arm64/sbsa.json | 30 ++++++++++++++++++++++++++++++
>>   tools/perf/pmu-events/jevents.py           |  2 ++
>>   tools/perf/util/expr.c                     |  5 +++++
>>   tools/perf/util/pmu.c                      |  5 +++++
>>   tools/perf/util/pmu.h                      |  1 +
>>   6 files changed, 65 insertions(+)
>>   create mode 100644 tools/perf/pmu-events/arch/arm64/sbsa.json
>>
>> diff --git a/tools/perf/arch/arm64/util/pmu.c b/tools/perf/arch/arm64/util/pmu.c
>> index 477e513..227dadb 100644
>> --- a/tools/perf/arch/arm64/util/pmu.c
>> +++ b/tools/perf/arch/arm64/util/pmu.c
>> @@ -3,6 +3,7 @@
>>   #include <internal/cpumap.h>
>>   #include "../../../util/cpumap.h"
>>   #include "../../../util/pmu.h"
>> +#include <api/fs/fs.h>
>>     const struct pmu_events_table *pmu_events_table__find(void)
>>   {
>> @@ -24,3 +25,24 @@ const struct pmu_events_table *pmu_events_table__find(void)
>>         return NULL;
>>   }
>> +
>> +int perf_pmu__get_slots(void)
>> +{
>> +    char path[PATH_MAX];
>> +    unsigned long long slots = 0;
>> +    struct perf_pmu *pmu = NULL;
>> +
>> +    while ((pmu = perf_pmu__scan(pmu)) != NULL) {
>> +        if (is_pmu_core(pmu->name))
>> +            break;
>> +    }
> 
> There is a lot in common with arm64's pmu_events_table__find() - can you factor it out? I also prefer how we check for homogeneous CPUs in pmu_events_table__find() (which you should do, also).
> 

I'll factor out the pmu_core__find function in tools/perf/arch/arm64/util/pmu.c:

static const struct perf_pmu *pmu_core__find(void)
{
	struct perf_pmu *pmu = NULL;

	while ((pmu = perf_pmu__scan(pmu))) {
		if (!is_pmu_core(pmu->name))
			continue;

		/*
		 * The cpumap should cover all CPUs. Otherwise, some CPUs may
		 * not support some events or have different event IDs.
		 */
		if (pmu->cpus->nr != cpu__max_cpu().cpu)
			return NULL;
		return pmu;
	}

	return NULL;
}
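
For reference, the caps/slots read that the patch performs after locating the core PMU can be modelled as below. This is only a sketch: read_slots is a hypothetical helper, the NaN fallback follows the review suggestion in this thread, and the demo uses a temporary directory in place of the real sysfs path. It also shows why a plain decimal parse is not enough, since the file content may carry a 0x prefix.

```python
import math
import os
import tempfile


def read_slots(pmu_dir):
    """Read <pmu_dir>/caps/slots; return NaN if it is missing or unparsable."""
    path = os.path.join(pmu_dir, "caps", "slots")
    try:
        with open(path) as f:
            # Base 0 lets int() accept both "5" and "0x5".
            return float(int(f.read().strip(), 0))
    except (OSError, ValueError):
        return math.nan


# Demo against a fake PMU directory instead of the real sysfs tree.
with tempfile.TemporaryDirectory() as fake_pmu:
    os.makedirs(os.path.join(fake_pmu, "caps"))
    with open(os.path.join(fake_pmu, "caps", "slots"), "w") as f:
        f.write("0x5\n")
    demo_slots = read_slots(fake_pmu)
    missing = read_slots(os.path.join(fake_pmu, "no_such_pmu"))

print(demo_slots, missing)
```

On a real system the argument would be the core PMU's directory under the event_source devices path quoted in the commit message.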

>> +    if (pmu) {
>> +        scnprintf(path, PATH_MAX,
>> +            EVENT_SOURCE_DEVICE_PATH "%s/caps/slots", pmu->name);
>> +        /* The value of slots is not greater than INT_MAX, but sysfs__read_int
>> +         * can't read value with 0x prefix, so use sysfs__read_ull instead.
>> +         */
>> +        sysfs__read_ull(path, &slots);
>> +    }
>> +    return (int)slots;
>> +}
>> diff --git a/tools/perf/pmu-events/arch/arm64/sbsa.json b/tools/perf/pmu-events/arch/arm64/sbsa.json
>> new file mode 100644
>> index 0000000..f678c37e
>> --- /dev/null
>> +++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
>> @@ -0,0 +1,30 @@
>> +[
>> +    {
>> +        "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)",
>> +        "BriefDescription": "Frontend bound L1 topdown metric",
>> +        "MetricGroup": "TopdownL1",
>> +        "MetricName": "frontend_bound",
>> +        "ScaleUnit": "100%"
>> +    },
>> +    {
>> +        "MetricExpr": "(1 - op_retired / op_spec) * (1 - stall_slot / (#slots * cpu_cycles))",
>> +        "BriefDescription": "Bad speculation L1 topdown metric",
>> +        "MetricGroup": "TopdownL1",
>> +        "MetricName": "bad_speculation",
>> +        "ScaleUnit": "100%"
>> +    },
>> +    {
>> +        "MetricExpr": "(op_retired / op_spec) * (1 - stall_slot / (#slots * cpu_cycles))",
>> +        "BriefDescription": "Retiring L1 topdown metric",
>> +        "MetricGroup": "TopdownL1",
>> +        "MetricName": "retiring",
>> +        "ScaleUnit": "100%"
>> +    },
>> +    {
>> +        "MetricExpr": "stall_slot_backend / (#slots * cpu_cycles)",
>> +        "BriefDescription": "Backend Bound L1 topdown metric",
>> +        "MetricGroup": "TopdownL1",
>> +        "MetricName": "backend_bound",
>> +        "ScaleUnit": "100%"
>> +    }
>> +]
>> diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
>> index 4c398e0..0416b74 100755
>> --- a/tools/perf/pmu-events/jevents.py
>> +++ b/tools/perf/pmu-events/jevents.py
>> @@ -358,6 +358,8 @@ def preprocess_arch_std_files(archpath: str) -> None:
>>         for event in read_json_events(item.path, topic=''):
>>           if event.name:
>>             _arch_std_events[event.name.lower()] = event
>> +        if event.metric_name:
>> +          _arch_std_events[event.metric_name.lower()] = event
>>       def print_events_table_prefix(tblname: str) -> None:
>> diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
>> index 00dcde3..3d67707 100644
>> --- a/tools/perf/util/expr.c
>> +++ b/tools/perf/util/expr.c
>> @@ -19,6 +19,7 @@
>>   #include <linux/zalloc.h>
>>   #include <ctype.h>
>>   #include <math.h>
>> +#include "pmu.h"
>>     #ifdef PARSER_DEBUG
>>   extern int expr_debug;
>> @@ -448,6 +449,10 @@ double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx
>>           result = topology->core_cpus_lists;
>>           goto out;
>>       }
>> +    if (!strcmp("#slots", literal)) {
>> +        result = perf_pmu__get_slots();
>> +        goto out;
>> +    }
>>         pr_err("Unrecognized literal '%s'", literal);
>>   out:
>> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
>> index 2bdeb89..d4cace2 100644
>> --- a/tools/perf/util/pmu.c
>> +++ b/tools/perf/util/pmu.c
>> @@ -1993,3 +1993,8 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
>>       *ucpus_ptr = unmatched_cpus;
>>       return 0;
>>   }
>> +
>> +int __weak perf_pmu__get_slots(void)
>> +{
>> +    return 0;
> 
> should this be NAN?
> 

Will do.

>> +}
>> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
>> index 69ca000..a2f7df8 100644
>> --- a/tools/perf/util/pmu.h
>> +++ b/tools/perf/util/pmu.h
>> @@ -259,4 +259,5 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
>>     char *pmu_find_real_name(const char *name);
>>   char *pmu_find_alias_name(const char *name);
>> +int perf_pmu__get_slots(void);
> 
> I think that this name is a bit too vague. Maybe perf_pmu__cpu_cycles_per_slot() could be better.
> 

Does cpu_cycles_per_slot mean "cpu cycles per slot"? In the document, slots means operation width.
If slots is 5, the largest value by which the STALL_SLOT PMU event may increment in one cycle is 5.
So maybe perf_pmu__cpu_slots_per_cycle() would be more accurate?
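
A worked example of the "slots per cycle" reading, using the four TopdownL1 expressions from sbsa.json. With 5 slots and 1000 cycles there are 5000 slot opportunities in total, which is the denominator of every stall term. The counter values and the function name topdown_l1 are made up for illustration; this is a sketch of the arithmetic, not of perf's evaluation code.

```python
import math


def topdown_l1(slots, cpu_cycles, stall_slot, stall_slot_frontend,
               stall_slot_backend, op_retired, op_spec):
    """Evaluate the four TopdownL1 expressions from sbsa.json as fractions."""
    total_slots = slots * cpu_cycles            # "#slots * cpu_cycles"
    not_stalled = 1 - stall_slot / total_slots  # share of slots doing work
    retired_ratio = op_retired / op_spec
    return {
        "frontend_bound": stall_slot_frontend / total_slots,
        "bad_speculation": (1 - retired_ratio) * not_stalled,
        "retiring": retired_ratio * not_stalled,
        "backend_bound": stall_slot_backend / total_slots,
    }


# Made-up counts for a core that reports 5 slots per cycle.
result = topdown_l1(slots=5, cpu_cycles=1000, stall_slot=2500,
                    stall_slot_frontend=1000, stall_slot_backend=1500,
                    op_retired=900, op_spec=1000)
for name, value in result.items():
    print(f"{name}: {100 * value:.1f}%")
```

The four fractions add up to 1 here only because the sample stall_slot equals the sum of the frontend and backend stall counts; real counters need not satisfy that exactly.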

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v6 1/7] perf vendor events arm64: Add common topdown L1 metrics
@ 2023-01-09  2:53       ` Jing Zhang
  0 siblings, 0 replies; 24+ messages in thread
From: Jing Zhang @ 2023-01-09  2:53 UTC (permalink / raw)
  To: John Garry, Ian Rogers
  Cc: Xing Zhengjun, Will Deacon, James Clark, Mike Leach, Leo Yan,
	linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song



On 2023/1/6 11:59 PM, John Garry wrote:
> On 06/01/2023 15:05, Jing Zhang wrote:
>> The topdown L1 metrics come from the ARM sbsa7.0 platform design doc[0],
>> D37-38, and are standard. So put them in the common arm64 file sbsa.json,
>> so that cores other than n2/v2 can reuse them.
>>
>> The number of slots may differ between architectures, so add a "#slots"
>> literal that yields the right constant for each architecture.
>>
>> The value of slots comes from the register PMMIR_EL1, which I can read
>> in /sys/bus/event_source/device/armv8_pmuv3_*/caps/slots. PMMIR_EL1.SLOT
>> might read as zero if the STALL_SLOT event is not implemented or the PMU
>> version is lower than ID_AA64DFR0_EL1_PMUVer_V3P4.
>>
>> [0] https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
>>
>> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
>> Acked-by: Ian Rogers <irogers@google.com>
> 
> Hmm... you have made significant changes in this version (compared to the previous one), so I would not have picked up this tag. That's just my opinion.
> 

Thanks for pointing it out.

> As for the patchset organization, I'd move the JSON change here into patch #2, and make this patch purely about adding "slots" literal support for arm64.
> 

OK, I will move the sbsa.json and jevents.py changes to patch #2.

>> ---
>>   tools/perf/arch/arm64/util/pmu.c           | 22 ++++++++++++++++++++++
>>   tools/perf/pmu-events/arch/arm64/sbsa.json | 30 ++++++++++++++++++++++++++++++
>>   tools/perf/pmu-events/jevents.py           |  2 ++
>>   tools/perf/util/expr.c                     |  5 +++++
>>   tools/perf/util/pmu.c                      |  5 +++++
>>   tools/perf/util/pmu.h                      |  1 +
>>   6 files changed, 65 insertions(+)
>>   create mode 100644 tools/perf/pmu-events/arch/arm64/sbsa.json
>>
>> diff --git a/tools/perf/arch/arm64/util/pmu.c b/tools/perf/arch/arm64/util/pmu.c
>> index 477e513..227dadb 100644
>> --- a/tools/perf/arch/arm64/util/pmu.c
>> +++ b/tools/perf/arch/arm64/util/pmu.c
>> @@ -3,6 +3,7 @@
>>   #include <internal/cpumap.h>
>>   #include "../../../util/cpumap.h"
>>   #include "../../../util/pmu.h"
>> +#include <api/fs/fs.h>
>>     const struct pmu_events_table *pmu_events_table__find(void)
>>   {
>> @@ -24,3 +25,24 @@ const struct pmu_events_table *pmu_events_table__find(void)
>>         return NULL;
>>   }
>> +
>> +int perf_pmu__get_slots(void)
>> +{
>> +    char path[PATH_MAX];
>> +    unsigned long long slots = 0;
>> +    struct perf_pmu *pmu = NULL;
>> +
>> +    while ((pmu = perf_pmu__scan(pmu)) != NULL) {
>> +        if (is_pmu_core(pmu->name))
>> +            break;
>> +    }
> 
> There is a lot in common with arm64's pmu_events_table__find() - can you factor it out? I also prefer how we check for homogeneous CPUs in pmu_events_table__find() (which you should do, also).
> 

I'll factor out the pmu_core__find function in tools/perf/arch/arm64/util/pmu.c:

static const struct perf_pmu *pmu_core__find(void)
{
	struct perf_pmu *pmu = NULL;

	while ((pmu = perf_pmu__scan(pmu))) {
		if (!is_pmu_core(pmu->name))
			continue;

		/*
		 * The cpumap should cover all CPUs. Otherwise, some CPUs may
		 * not support some events or have different event IDs.
		 */
		if (pmu->cpus->nr != cpu__max_cpu().cpu)
			return NULL;
		return pmu;
	}

	return NULL;
}

>> +    if (pmu) {
>> +        scnprintf(path, PATH_MAX,
>> +            EVENT_SOURCE_DEVICE_PATH "%s/caps/slots", pmu->name);
>> +        /* The value of slots is not greater than INT_MAX, but sysfs__read_int
>> +         * can't read value with 0x prefix, so use sysfs__read_ull instead.
>> +         */
>> +        sysfs__read_ull(path, &slots);
>> +    }
>> +    return (int)slots;
>> +}
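
As an aside on the 0x prefix mentioned in the comment above: the effect can be
reproduced outside perf with a small standalone sketch. This is not the
sysfs__read_int()/sysfs__read_ull() code; parse_slots_text() is a hypothetical
helper, and the string "0x00000005" is only an assumed example of what
caps/slots might contain. Parsing with base 10 stops at the 'x', while base 0
lets strtoull() detect the hex prefix.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of perf: parse the text of a sysfs
 * attribute into an unsigned value using the given numeric base.
 * Returns 0 on success, -1 if no digits were consumed or on overflow.
 */
static int parse_slots_text(const char *text, int base,
			    unsigned long long *out)
{
	char *end;

	errno = 0;
	*out = strtoull(text, &end, base);
	if (end == text || errno == ERANGE)
		return -1;
	return 0;
}
```

With this helper, parse_slots_text("0x00000005\n", 0, &v) sets v to 5, whereas
base 10 sets v to 0 without reporting an error.
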
>> diff --git a/tools/perf/pmu-events/arch/arm64/sbsa.json b/tools/perf/pmu-events/arch/arm64/sbsa.json
>> new file mode 100644
>> index 0000000..f678c37e
>> --- /dev/null
>> +++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
>> @@ -0,0 +1,30 @@
>> +[
>> +    {
>> +        "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)",
>> +        "BriefDescription": "Frontend bound L1 topdown metric",
>> +        "MetricGroup": "TopdownL1",
>> +        "MetricName": "frontend_bound",
>> +        "ScaleUnit": "100%"
>> +    },
>> +    {
>> +        "MetricExpr": "(1 - op_retired / op_spec) * (1 - stall_slot / (#slots * cpu_cycles))",
>> +        "BriefDescription": "Bad speculation L1 topdown metric",
>> +        "MetricGroup": "TopdownL1",
>> +        "MetricName": "bad_speculation",
>> +        "ScaleUnit": "100%"
>> +    },
>> +    {
>> +        "MetricExpr": "(op_retired / op_spec) * (1 - stall_slot / (#slots * cpu_cycles))",
>> +        "BriefDescription": "Retiring L1 topdown metric",
>> +        "MetricGroup": "TopdownL1",
>> +        "MetricName": "retiring",
>> +        "ScaleUnit": "100%"
>> +    },
>> +    {
>> +        "MetricExpr": "stall_slot_backend / (#slots * cpu_cycles)",
>> +        "BriefDescription": "Backend Bound L1 topdown metric",
>> +        "MetricGroup": "TopdownL1",
>> +        "MetricName": "backend_bound",
>> +        "ScaleUnit": "100%"
>> +    }
>> +]
>> diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
>> index 4c398e0..0416b74 100755
>> --- a/tools/perf/pmu-events/jevents.py
>> +++ b/tools/perf/pmu-events/jevents.py
>> @@ -358,6 +358,8 @@ def preprocess_arch_std_files(archpath: str) -> None:
>>         for event in read_json_events(item.path, topic=''):
>>           if event.name:
>>             _arch_std_events[event.name.lower()] = event
>> +        if event.metric_name:
>> +          _arch_std_events[event.metric_name.lower()] = event
>>
>>
>>   def print_events_table_prefix(tblname: str) -> None:
>> diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
>> index 00dcde3..3d67707 100644
>> --- a/tools/perf/util/expr.c
>> +++ b/tools/perf/util/expr.c
>> @@ -19,6 +19,7 @@
>>   #include <linux/zalloc.h>
>>   #include <ctype.h>
>>   #include <math.h>
>> +#include "pmu.h"
>>
>>   #ifdef PARSER_DEBUG
>>   extern int expr_debug;
>> @@ -448,6 +449,10 @@ double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx
>>           result = topology->core_cpus_lists;
>>           goto out;
>>       }
>> +    if (!strcmp("#slots", literal)) {
>> +        result = perf_pmu__get_slots();
>> +        goto out;
>> +    }
>>
>>       pr_err("Unrecognized literal '%s'", literal);
>>   out:
>> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
>> index 2bdeb89..d4cace2 100644
>> --- a/tools/perf/util/pmu.c
>> +++ b/tools/perf/util/pmu.c
>> @@ -1993,3 +1993,8 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
>>       *ucpus_ptr = unmatched_cpus;
>>       return 0;
>>   }
>> +
>> +int __weak perf_pmu__get_slots(void)
>> +{
>> +    return 0;
> 
> should this be NAN?
> 

Will do.
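
To illustrate why NAN may be the better fallback, here is a minimal sketch in
plain C double arithmetic (not perf's expression parser; the function names are
hypothetical). With a fallback of 0 the divisor becomes zero and the quotient
is infinite, while a NAN fallback propagates through the expression and makes
the result NaN. Since NAN is a floating-point value, this presumably also means
the return type would need to become double.

```c
#include <math.h>

/* Hypothetical fallbacks for an architecture without slots information. */
static double slots_fallback_zero(void)
{
	return 0.0;
}

static double slots_fallback_nan(void)
{
	return NAN;
}

/* Plain C version of "stall_slot_frontend / (#slots * cpu_cycles)". */
static double frontend_bound(double stall_slot_frontend, double slots,
			     double cpu_cycles)
{
	return stall_slot_frontend / (slots * cpu_cycles);
}
```

Here frontend_bound(100.0, slots_fallback_zero(), 1000.0) evaluates to
infinity, and the same call with slots_fallback_nan() evaluates to NaN.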

>> +}
>> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
>> index 69ca000..a2f7df8 100644
>> --- a/tools/perf/util/pmu.h
>> +++ b/tools/perf/util/pmu.h
>> @@ -259,4 +259,5 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
>>     char *pmu_find_real_name(const char *name);
>>   char *pmu_find_alias_name(const char *name);
>> +int perf_pmu__get_slots(void);
> 
> I think that this name is a bit too vague. Maybe perf_pmu__cpu_cycles_per_slot() could be better.
> 

Does cpu_cycles_per_slot mean "cpu cycles per slot"? In the document, slots refers to the operation width.
If slots is 5, the largest value by which the STALL_SLOT PMU event may increment in one cycle is 5.
So maybe perf_pmu__cpu_slots_per_cycle() would be more accurate?
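
To make the meaning of #slots concrete, below is a rough sketch of the four
TopdownL1 expressions from sbsa.json written as plain C. The struct and
function names are hypothetical and only for illustration; slots_per_cycle
stands for the #slots literal.

```c
/* Raw counter values; field names mirror the event names in sbsa.json. */
struct topdown_counts {
	double cpu_cycles;
	double stall_slot;
	double stall_slot_frontend;
	double stall_slot_backend;
	double op_retired;
	double op_spec;
};

/* The four L1 metrics as fractions in the range 0..1 (before ScaleUnit). */
struct topdown_l1 {
	double frontend_bound;
	double bad_speculation;
	double retiring;
	double backend_bound;
};

/* Hypothetical helper: evaluate the MetricExpr strings for one sample. */
static struct topdown_l1 topdown_l1_compute(const struct topdown_counts *c,
					    double slots_per_cycle)
{
	/* Each cycle offers slots_per_cycle slots. */
	double total_slots = slots_per_cycle * c->cpu_cycles;
	double not_stalled = 1.0 - c->stall_slot / total_slots;
	double retired_ratio = c->op_retired / c->op_spec;
	struct topdown_l1 m = {
		.frontend_bound = c->stall_slot_frontend / total_slots,
		.bad_speculation = (1.0 - retired_ratio) * not_stalled,
		.retiring = retired_ratio * not_stalled,
		.backend_bound = c->stall_slot_backend / total_slots,
	};

	return m;
}
```

For example, with slots_per_cycle = 5 and cpu_cycles = 1000 there are 5000
slots in total, so stall_slot_frontend = 1000 gives frontend_bound = 0.2
(20% after the "100%" ScaleUnit).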

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* Re: [PATCH v6 1/7] perf vendor events arm64: Add common topdown L1 metrics
  2023-01-09  2:53       ` Jing Zhang
@ 2023-01-09 14:58         ` John Garry
  -1 siblings, 0 replies; 24+ messages in thread
From: John Garry @ 2023-01-09 14:58 UTC (permalink / raw)
  To: Jing Zhang, Ian Rogers
  Cc: Xing Zhengjun, Will Deacon, James Clark, Mike Leach, Leo Yan,
	linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song

On 09/01/2023 02:53, Jing Zhang wrote:
> I'll factor out the pmu_core__find function in tools/perf/arch/arm64/util/pmu.c:
> 
> static const struct perf_pmu *pmu_core__find(void)

maybe name as pmu_core__find_same() or similar to indicate that we're 
only dealing with homogeneous cores

> {
> 	struct perf_pmu *pmu = NULL;

no need to init to NULL

> 
> 	while ((pmu = perf_pmu__scan(pmu))) {

1x superfluous level of ()

> 		if (!is_pmu_core(pmu->name))
> 			continue;
> 
> 		/*
> 		 * The cpumap should cover all CPUs. Otherwise, some CPUs may
> 		 * not support some events or have different event IDs.
> 		 */
> 		if (pmu->cpus->nr != cpu__max_cpu().cpu)
> 			return NULL;
> 		return pmu;
> 	}
> 
> 	return NULL;
> }
> 

...

> 
>>> +}
>>> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
>>> index 69ca000..a2f7df8 100644
>>> --- a/tools/perf/util/pmu.h
>>> +++ b/tools/perf/util/pmu.h
>>> @@ -259,4 +259,5 @@ int perf_pmu__cpus_match(struct perf_pmu *pmu, struct perf_cpu_map *cpus,
>>>      char *pmu_find_real_name(const char *name);
>>>    char *pmu_find_alias_name(const char *name);
>>> +int perf_pmu__get_slots(void);
>> I think that this name is a bit too vague. Maybe perf_pmu__cpu_cycles_per_slot() could be better.
>>
> Does cpu_cycles_per_slot mean "cpu cycles per slot"? In the document, slots refers to the operation width.
> If slots is 5, the largest value by which the STALL_SLOT PMU event may increment in one cycle is 5.
> So maybe perf_pmu__cpu_slots_per_cycle() would be more accurate?

ok, yes, fine.

Thanks,
John


* Re: [PATCH v6 1/7] perf vendor events arm64: Add common topdown L1 metrics
  2023-01-09 14:58         ` John Garry
@ 2023-01-10 10:38           ` Jing Zhang
  -1 siblings, 0 replies; 24+ messages in thread
From: Jing Zhang @ 2023-01-10 10:38 UTC (permalink / raw)
  To: John Garry
  Cc: Xing Zhengjun, Will Deacon, James Clark, Mike Leach, Leo Yan,
	linux-arm-kernel, linux-perf-users, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Kilroy,
	Shuai Xue, Zhuo Song



On 2023/1/9 at 10:58 PM, John Garry wrote:
> On 09/01/2023 02:53, Jing Zhang wrote:
>> I'll factor out the pmu_core__find function in tools/perf/arch/arm64/util/pmu.c:
>>
>> static const struct perf_pmu *pmu_core__find(void)
> 
> maybe name as pmu_core__find_same() or similar to indicate that we're only dealing with homogeneous cores
> 
>> {
>>     struct perf_pmu *pmu = NULL;
> 
> no need to init to NULL
> 
>>
>>     while ((pmu = perf_pmu__scan(pmu))) {
> 
> 1x superfluous level of ()
> 

OK, will do. Thank you!

end of thread, other threads:[~2023-01-10 10:39 UTC | newest]

Thread overview: 24+ messages
2023-01-06 15:05 [PATCH v6 0/7] Add metrics for neoverse-n2-v2 Jing Zhang
2023-01-06 15:05 ` Jing Zhang
2023-01-06 15:05 ` [PATCH v6 1/7] perf vendor events arm64: Add common topdown L1 metrics Jing Zhang
2023-01-06 15:05   ` Jing Zhang
2023-01-06 15:59   ` John Garry
2023-01-06 15:59     ` John Garry
2023-01-09  2:53     ` Jing Zhang
2023-01-09  2:53       ` Jing Zhang
2023-01-09 14:58       ` John Garry
2023-01-09 14:58         ` John Garry
2023-01-10 10:38         ` Jing Zhang
2023-01-10 10:38           ` Jing Zhang
2023-01-06 15:05 ` [PATCH v6 2/7] perf vendor events arm64: Add topdown L1 metrics for neoverse-n2-v2 Jing Zhang
2023-01-06 15:05   ` Jing Zhang
2023-01-06 15:05 ` [PATCH v6 3/7] perf vendor events arm64: Add TLB " Jing Zhang
2023-01-06 15:05   ` Jing Zhang
2023-01-06 15:05 ` [PATCH v6 4/7] perf vendor events arm64: Add cache " Jing Zhang
2023-01-06 15:05   ` Jing Zhang
2023-01-06 15:05 ` [PATCH v6 5/7] perf vendor events arm64: Add branch " Jing Zhang
2023-01-06 15:05   ` Jing Zhang
2023-01-06 15:05 ` [PATCH v6 6/7] perf vendor events arm64: Add PE utilization " Jing Zhang
2023-01-06 15:05   ` Jing Zhang
2023-01-06 15:05 ` [PATCH v6 7/7] perf vendor events arm64: Add instruction mix " Jing Zhang
2023-01-06 15:05   ` Jing Zhang
