All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jing Zhang <renyu.zj@linux.alibaba.com>
To: John Garry <john.g.garry@oracle.com>,
	Ian Rogers <irogers@google.com>,
	Xing Zhengjun <zhengjun.xing@linux.intel.com>,
	Will Deacon <will@kernel.org>, James Clark <james.clark@arm.com>,
	Mike Leach <mike.leach@linaro.org>, Leo Yan <leo.yan@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Andrew Kilroy <andrew.kilroy@arm.com>,
	Shuai Xue <xueshuai@linux.alibaba.com>,
	Zhuo Song <zhuo.song@linux.alibaba.com>,
	Jing Zhang <renyu.zj@linux.alibaba.com>
Subject: [PATCH v5 0/6] Add metrics for neoverse-n2
Date: Tue,  3 Jan 2023 19:39:30 +0800	[thread overview]
Message-ID: <1672745976-2800146-1-git-send-email-renyu.zj@linux.alibaba.com> (raw)

Changes since v4:
- Add MPKI/PKI “ScaleUnit”;
- Add acked-by from Ian Rogers;
- Link: https://lore.kernel.org/all/1671799045-1108027-1-git-send-email-renyu.zj@linux.alibaba.com/

Changes since v3:
- Add ipc_rate metric;
- Drop the PublicDescription;
- Describe PEutilization metrics in more detail;
- Link: https://lore.kernel.org/all/1669310088-13482-1-git-send-email-renyu.zj@linux.alibaba.com/

Changes since v2:
- Correct the furmula of Branch metrics;
- Add more PE utilization metrics;
- Add more TLB metrics;
- Add “ScaleUnit” for some metrics;
- Add a newline at the end of the file;
- Link: https://lore.kernel.org/all/1668411720-3581-1-git-send-email-renyu.zj@linux.alibaba.com/

Changes since v1:
- Corrected formula for topdown L1 due to wrong counts for stall_slot and
  stall_slot_frontend; 
- Link: https://lore.kernel.org/all/1667214694-89839-1-git-send-email-renyu.zj@linux.alibaba.com/


This series add six metricgroups for neoverse-n2, among which, the formula of
topdown L1 is from ARM sbsa7.0 platform design document [0], D37-38.

However, due to the wrong count of stall_slot and stall_slot_frontend on
neoverse-n2, the real stall_slot and real stall_slot_frontend need to
subtract cpu_cycles,  so correct the expression of topdown metrics.
Reference from ARM neoverse-n2 errata notice [1], D117.

Since neoverse-n2 does not yet support topdown L2, metricgroups such as Cache,
TLB, Branch, InstructionsMix, and PEutilization are added to help further
analysis of performance bottlenecks. Reference from ARM PMU guide [2][3].

[0] https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
[1] https://documentation-service.arm.com/static/636a66a64e6cf12278ad89cb?token=
[2] https://documentation-service.arm.com/static/628f8fa3dfaf015c2b76eae8?token=
[3] https://documentation-service.arm.com/static/62cfe21e31ea212bb6627393?token=


$./perf list
...
Metric Groups:

Branch:
  branch_miss_pred_rate
       [The rate of branches mis-predited to the overall branches]
  branch_mpki
       [The rate of branches mis-predicted per kilo instructions]
  branch_pki
       [The rate of branches retired per kilo instructions]
Cache:
  l1d_cache_miss_rate
       [The rate of L1 D-Cache misses to the overall L1 D-Cache]
  l1d_cache_mpki
       [The rate of L1 D-Cache misses per kilo instructions]
...


$sudo ./perf stat -M TLB false_sharing 2

 Performance counter stats for 'false_sharing 2':

            31,254      L2D_TLB                          #     19.6 %  l2_tlb_miss_rate      (42.51%)
             6,130      L2D_TLB_REFILL                                                       (42.51%)
             1,910      L1I_TLB_REFILL                   #      0.1 %  l1i_tlb_miss_rate     (42.93%)
         2,306,689      L1I_TLB                                                              (42.93%)
       324,924,247      L1D_TLB                          #      0.0 %  l1d_tlb_miss_rate     (43.98%)
            22,688      L1D_TLB_REFILL                                                       (43.98%)
           627,992      L1I_TLB                          #      0.0 %  itlb_walk_rate        (44.26%)
                92      ITLB_WALK                                                            (44.26%)
       772,445,613      INST_RETIRED                     #      0.0 MPKI  itlb_mpki          (43.94%)
                88      ITLB_WALK                                                            (43.94%)
               907      DTLB_WALK                        #      0.0 %  dtlb_walk_rate        (43.10%)
       259,132,258      L1D_TLB                                                              (43.10%)
       804,080,968      INST_RETIRED                     #      0.0 MPKI  dtlb_mpki          (42.22%)
               937      DTLB_WALK                                                            (42.22%)

       0.479544400 seconds time elapsed

       1.264233000 seconds user
       0.000000000 seconds sys


$sudo ./perf stat -M TopDownL1 false_sharing 2

 Performance counter stats for 'false_sharing 2':

     4,310,905,590      cpu_cycles                       #      0.0 %  bad_speculation
                                                  #      4.0 %  retiring              (66.87%)
    25,009,763,735      stall_slot                                                           (66.87%)
       855,659,327      op_spec                                                              (66.87%)
       854,335,288      op_retired                                                           (66.87%)
     4,330,308,058      cpu_cycles                       #     27.1 %  frontend_bound        (66.99%)
    10,207,186,460      stall_slot_frontend                                                  (66.99%)
     4,316,583,673      cpu_cycles                       #     69.4 %  backend_bound         (66.65%)
    14,979,136,808      stall_slot_backend                                                   (66.65%)

       0.572056818 seconds time elapsed

       1.572143000 seconds user
       0.004010000 seconds sys


Jing Zhang (6):
  perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  perf vendor events arm64: Add TLB metrics for neoverse-n2
  perf vendor events arm64: Add cache metrics for neoverse-n2
  perf vendor events arm64: Add branch metrics for neoverse-n2
  perf vendor events arm64: Add PE utilization metrics for neoverse-n2
  perf vendor events arm64: Add instruction mix metrics for neoverse-n2

 .../arch/arm64/arm/neoverse-n2/metrics.json        | 286 +++++++++++++++++++++
 1 file changed, 286 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json

-- 
1.8.3.1


WARNING: multiple messages have this Message-ID (diff)
From: Jing Zhang <renyu.zj@linux.alibaba.com>
To: John Garry <john.g.garry@oracle.com>,
	Ian Rogers <irogers@google.com>,
	Xing Zhengjun <zhengjun.xing@linux.intel.com>,
	Will Deacon <will@kernel.org>, James Clark <james.clark@arm.com>,
	Mike Leach <mike.leach@linaro.org>, Leo Yan <leo.yan@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Andrew Kilroy <andrew.kilroy@arm.com>,
	Shuai Xue <xueshuai@linux.alibaba.com>,
	Zhuo Song <zhuo.song@linux.alibaba.com>,
	Jing Zhang <renyu.zj@linux.alibaba.com>
Subject: [PATCH v5 0/6] Add metrics for neoverse-n2
Date: Tue,  3 Jan 2023 19:39:30 +0800	[thread overview]
Message-ID: <1672745976-2800146-1-git-send-email-renyu.zj@linux.alibaba.com> (raw)

Changes since v4:
- Add MPKI/PKI “ScaleUnit”;
- Add acked-by from Ian Rogers;
- Link: https://lore.kernel.org/all/1671799045-1108027-1-git-send-email-renyu.zj@linux.alibaba.com/

Changes since v3:
- Add ipc_rate metric;
- Drop the PublicDescription;
- Describe PEutilization metrics in more detail;
- Link: https://lore.kernel.org/all/1669310088-13482-1-git-send-email-renyu.zj@linux.alibaba.com/

Changes since v2:
- Correct the furmula of Branch metrics;
- Add more PE utilization metrics;
- Add more TLB metrics;
- Add “ScaleUnit” for some metrics;
- Add a newline at the end of the file;
- Link: https://lore.kernel.org/all/1668411720-3581-1-git-send-email-renyu.zj@linux.alibaba.com/

Changes since v1:
- Corrected formula for topdown L1 due to wrong counts for stall_slot and
  stall_slot_frontend; 
- Link: https://lore.kernel.org/all/1667214694-89839-1-git-send-email-renyu.zj@linux.alibaba.com/


This series add six metricgroups for neoverse-n2, among which, the formula of
topdown L1 is from ARM sbsa7.0 platform design document [0], D37-38.

However, due to the wrong count of stall_slot and stall_slot_frontend on
neoverse-n2, the real stall_slot and real stall_slot_frontend need to
subtract cpu_cycles,  so correct the expression of topdown metrics.
Reference from ARM neoverse-n2 errata notice [1], D117.

Since neoverse-n2 does not yet support topdown L2, metricgroups such as Cache,
TLB, Branch, InstructionsMix, and PEutilization are added to help further
analysis of performance bottlenecks. Reference from ARM PMU guide [2][3].

[0] https://documentation-service.arm.com/static/60250c7395978b529036da86?token=
[1] https://documentation-service.arm.com/static/636a66a64e6cf12278ad89cb?token=
[2] https://documentation-service.arm.com/static/628f8fa3dfaf015c2b76eae8?token=
[3] https://documentation-service.arm.com/static/62cfe21e31ea212bb6627393?token=


$./perf list
...
Metric Groups:

Branch:
  branch_miss_pred_rate
       [The rate of branches mis-predited to the overall branches]
  branch_mpki
       [The rate of branches mis-predicted per kilo instructions]
  branch_pki
       [The rate of branches retired per kilo instructions]
Cache:
  l1d_cache_miss_rate
       [The rate of L1 D-Cache misses to the overall L1 D-Cache]
  l1d_cache_mpki
       [The rate of L1 D-Cache misses per kilo instructions]
...


$sudo ./perf stat -M TLB false_sharing 2

 Performance counter stats for 'false_sharing 2':

            31,254      L2D_TLB                          #     19.6 %  l2_tlb_miss_rate      (42.51%)
             6,130      L2D_TLB_REFILL                                                       (42.51%)
             1,910      L1I_TLB_REFILL                   #      0.1 %  l1i_tlb_miss_rate     (42.93%)
         2,306,689      L1I_TLB                                                              (42.93%)
       324,924,247      L1D_TLB                          #      0.0 %  l1d_tlb_miss_rate     (43.98%)
            22,688      L1D_TLB_REFILL                                                       (43.98%)
           627,992      L1I_TLB                          #      0.0 %  itlb_walk_rate        (44.26%)
                92      ITLB_WALK                                                            (44.26%)
       772,445,613      INST_RETIRED                     #      0.0 MPKI  itlb_mpki          (43.94%)
                88      ITLB_WALK                                                            (43.94%)
               907      DTLB_WALK                        #      0.0 %  dtlb_walk_rate        (43.10%)
       259,132,258      L1D_TLB                                                              (43.10%)
       804,080,968      INST_RETIRED                     #      0.0 MPKI  dtlb_mpki          (42.22%)
               937      DTLB_WALK                                                            (42.22%)

       0.479544400 seconds time elapsed

       1.264233000 seconds user
       0.000000000 seconds sys


$sudo ./perf stat -M TopDownL1 false_sharing 2

 Performance counter stats for 'false_sharing 2':

     4,310,905,590      cpu_cycles                       #      0.0 %  bad_speculation
                                                  #      4.0 %  retiring              (66.87%)
    25,009,763,735      stall_slot                                                           (66.87%)
       855,659,327      op_spec                                                              (66.87%)
       854,335,288      op_retired                                                           (66.87%)
     4,330,308,058      cpu_cycles                       #     27.1 %  frontend_bound        (66.99%)
    10,207,186,460      stall_slot_frontend                                                  (66.99%)
     4,316,583,673      cpu_cycles                       #     69.4 %  backend_bound         (66.65%)
    14,979,136,808      stall_slot_backend                                                   (66.65%)

       0.572056818 seconds time elapsed

       1.572143000 seconds user
       0.004010000 seconds sys


Jing Zhang (6):
  perf vendor events arm64: Add topdown L1 metrics for neoverse-n2
  perf vendor events arm64: Add TLB metrics for neoverse-n2
  perf vendor events arm64: Add cache metrics for neoverse-n2
  perf vendor events arm64: Add branch metrics for neoverse-n2
  perf vendor events arm64: Add PE utilization metrics for neoverse-n2
  perf vendor events arm64: Add instruction mix metrics for neoverse-n2

 .../arch/arm64/arm/neoverse-n2/metrics.json        | 286 +++++++++++++++++++++
 1 file changed, 286 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json

-- 
1.8.3.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

             reply	other threads:[~2023-01-03 11:40 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-03 11:39 Jing Zhang [this message]
2023-01-03 11:39 ` [PATCH v5 0/6] Add metrics for neoverse-n2 Jing Zhang
2023-01-03 11:39 ` [PATCH v5 1/6] perf vendor events arm64: Add topdown L1 " Jing Zhang
2023-01-03 11:39   ` Jing Zhang
2023-01-03 11:52   ` John Garry
2023-01-03 11:52     ` John Garry
2023-01-04  5:05     ` Jing Zhang
2023-01-04  5:05       ` Jing Zhang
2023-01-04 17:26       ` John Garry
2023-01-04 17:26         ` John Garry
2023-01-05 10:05         ` Jing Zhang
2023-01-05 10:05           ` Jing Zhang
2023-01-05 10:13           ` John Garry
2023-01-05 10:13             ` John Garry
2023-01-05 11:02             ` Jing Zhang
2023-01-05 11:02               ` Jing Zhang
2023-01-05 21:13             ` Ian Rogers
2023-01-05 21:13               ` Ian Rogers
2023-01-06 10:14               ` John Garry
2023-01-06 10:14                 ` John Garry
2023-01-06 10:34                 ` Jing Zhang
2023-01-06 10:34                   ` Jing Zhang
2023-01-09 15:34                 ` James Clark
2023-01-09 15:34                   ` James Clark
2023-01-11  6:14                   ` Ian Rogers
2023-01-11  6:14                     ` Ian Rogers
2023-01-03 11:39 ` [PATCH v5 2/6] perf vendor events arm64: Add TLB " Jing Zhang
2023-01-03 11:39   ` Jing Zhang
2023-01-03 17:14   ` Ian Rogers
2023-01-03 17:14     ` Ian Rogers
2023-01-04  5:21     ` Jing Zhang
2023-01-04  5:21       ` Jing Zhang
2023-01-04  8:40       ` Jing Zhang
2023-01-04  8:40         ` Jing Zhang
2023-01-04 16:57         ` Ian Rogers
2023-01-04 16:57           ` Ian Rogers
2023-01-03 11:39 ` [PATCH v5 3/6] perf vendor events arm64: Add cache " Jing Zhang
2023-01-03 11:39   ` Jing Zhang
2023-01-03 11:39 ` [PATCH v5 4/6] perf vendor events arm64: Add branch " Jing Zhang
2023-01-03 11:39   ` Jing Zhang
2023-01-03 11:39 ` [PATCH v5 5/6] perf vendor events arm64: Add PE utilization " Jing Zhang
2023-01-03 11:39   ` Jing Zhang
2023-01-03 11:39 ` [PATCH v5 6/6] perf vendor events arm64: Add instruction mix " Jing Zhang
2023-01-03 11:39   ` Jing Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1672745976-2800146-1-git-send-email-renyu.zj@linux.alibaba.com \
    --to=renyu.zj@linux.alibaba.com \
    --cc=acme@kernel.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=andrew.kilroy@arm.com \
    --cc=irogers@google.com \
    --cc=james.clark@arm.com \
    --cc=john.g.garry@oracle.com \
    --cc=jolsa@kernel.org \
    --cc=leo.yan@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mike.leach@linaro.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=will@kernel.org \
    --cc=xueshuai@linux.alibaba.com \
    --cc=zhengjun.xing@linux.intel.com \
    --cc=zhuo.song@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.