* [PATCH v4] perf vendor events: Add metrics for Icelake Server
@ 2021-08-06  7:54 Jin Yao
  2021-08-06  8:16 ` Ian Rogers
  0 siblings, 1 reply; 3+ messages in thread
From: Jin Yao @ 2021-08-06  7:54 UTC
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, linux-perf-users, ak, kan.liang, yao.jin, irogers, Jin Yao

Add JSON metrics for Icelake Server to perf.

Based on TMA metrics 4.21 at 01.org.
https://download.01.org/perfmon/

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
---
v4:
 - Now that the perf tool supports #SMT_on as a token in metric expressions,
   use #SMT_on in 'ILP' and 'SMT_2T_Utilization'.

v3:
 - The cstate_core and cstate_pkg PMUs have been supported for ICX since
   5.14-rc1, so add cstate metrics for Core C1/C6 and Package C2/C6.

v2:
 - Fix perf test 10 error.

   # ./perf test 10
   10: PMU events                                                      :
   10.1: PMU event table sanity                                        : Ok
   10.2: PMU event map aliases                                         : Ok
   10.3: Parsing of PMU event table metrics                            : Ok
   10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

 - Remove cstate metrics because the kernel does not yet support
   cstate_core and cstate_pkg for Icelake server.

 - Remove the topdown L1/L2 metrics.
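
The #SMT_on conditionals mentioned in the v4 note can be read as ordinary
conditional expressions. The following is a minimal Python sketch, assuming
that the "if ... else" in a metric expression binds more loosely than the
arithmetic around it. The function names and the sample counts are
hypothetical and are not part of perf.

```python
def ilp(uops_executed_thread, core_cycles_ge_1, smt_on):
    """ILP: UOPS_EXECUTED.THREAD over cycles with at least one uop executed."""
    # With SMT on, the patch halves the core-wide cycle count.
    cycles = core_cycles_ge_1 / 2 if smt_on else core_cycles_ge_1
    return uops_executed_thread / cycles


def smt_2t_utilization(one_thread_active, ref_distributed, smt_on):
    """Fraction of cycles in which both logical processors were active."""
    # Read as (1 - a / b) if smt_on else 0, which is also Python's precedence.
    return 1 - one_thread_active / ref_distributed if smt_on else 0


# Hypothetical counter values, for illustration only.
print(ilp(400, 200, smt_on=True))    # SMT on: divides by 200 / 2
print(ilp(400, 200, smt_on=False))   # SMT off: divides by 200
print(smt_2t_utilization(25, 100, smt_on=True))
print(smt_2t_utilization(25, 100, smt_on=False))
```

With SMT off, SMT_2T_Utilization is defined as 0, so the sketch never divides
by the REF_DISTRIBUTED count in that case.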

 .../arch/x86/icelakex/icx-metrics.json        | 315 ++++++++++++++++++
 1 file changed, 315 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
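
Each entry in the new file carries the same four keys, with MetricConstraint
appearing only on some entries. The following Python sketch shows one way to
check that shape. It is not the check that "perf test" runs; REQUIRED_KEYS,
check_metrics and the SAMPLE text are assumptions made for illustration.
Python's json module is also strict, so a trailing comma after the last
object is rejected here even if a more lenient parser accepts it.

```python
import json

# Keys assumed to be mandatory; MetricConstraint is treated as optional.
REQUIRED_KEYS = ("BriefDescription", "MetricExpr", "MetricGroup", "MetricName")


def check_metrics(text):
    """Return a list of problems found in a metrics JSON document."""
    problems = []
    seen = set()
    for index, entry in enumerate(json.loads(text)):
        name = entry.get("MetricName", f"<entry {index}>")
        for key in REQUIRED_KEYS:
            if key not in entry:
                problems.append(f"{name}: missing {key}")
        if name in seen:
            problems.append(f"{name}: duplicate MetricName")
        seen.add(name)
    return problems


# Two entries in the style of the patch; the second deliberately lacks MetricGroup.
SAMPLE = """
[
    {
        "BriefDescription": "Instructions Per Cycle (per Logical Processor)",
        "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD",
        "MetricGroup": "Summary",
        "MetricName": "IPC"
    },
    {
        "BriefDescription": "Uops Per Instruction",
        "MetricExpr": "UOPS_RETIRED.SLOTS / INST_RETIRED.ANY",
        "MetricName": "UPI"
    }
]
"""

print(check_metrics(SAMPLE))
```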

diff --git a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
new file mode 100644
index 000000000000..57ddbb9f9b31
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
@@ -0,0 +1,315 @@
+[
+    {
+        "BriefDescription": "Instructions Per Cycle (per Logical Processor)",
+        "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD",
+        "MetricGroup": "Summary",
+        "MetricName": "IPC"
+    },
+    {
+        "BriefDescription": "Uops Per Instruction",
+        "MetricExpr": "UOPS_RETIRED.SLOTS / INST_RETIRED.ANY",
+        "MetricGroup": "Pipeline;Retire",
+        "MetricName": "UPI"
+    },
+    {
+        "BriefDescription": "Instruction per taken branch",
+        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
+        "MetricGroup": "Branches;FetchBW;PGO",
+        "MetricName": "IpTB"
+    },
+    {
+        "BriefDescription": "Cycles Per Instruction (per Logical Processor)",
+        "MetricExpr": "1 / (INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD)",
+        "MetricGroup": "Pipeline",
+        "MetricName": "CPI"
+    },
+    {
+        "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.",
+        "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
+        "MetricGroup": "Pipeline",
+        "MetricName": "CLKS"
+    },
+    {
+        "BriefDescription": "Instructions Per Cycle (per physical core)",
+        "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.DISTRIBUTED",
+        "MetricGroup": "SMT;TmaL1",
+        "MetricName": "CoreIPC"
+    },
+    {
+        "BriefDescription": "Floating Point Operations Per Cycle",
+        "MetricExpr": "( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE ) / CPU_CLK_UNHALTED.DISTRIBUTED",
+        "MetricGroup": "Flops",
+        "MetricName": "FLOPc"
+    },
+    {
+        "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)",
+        "MetricExpr": "UOPS_EXECUTED.THREAD / (( UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 ) if #SMT_on else UOPS_EXECUTED.CORE_CYCLES_GE_1)",
+        "MetricGroup": "Pipeline;PortsUtil",
+        "MetricName": "ILP"
+    },
+    {
+        "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear)",
+        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
+        "MetricGroup": "BrMispredicts",
+        "MetricName": "IpMispredict"
+    },
+    {
+        "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core",
+        "MetricExpr": "CPU_CLK_UNHALTED.DISTRIBUTED",
+        "MetricGroup": "SMT",
+        "MetricName": "CORE_CLKS"
+    },
+    {
+        "BriefDescription": "Instructions per Load (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_LOADS",
+        "MetricGroup": "InsType",
+        "MetricName": "IpLoad"
+    },
+    {
+        "BriefDescription": "Instructions per Store (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_STORES",
+        "MetricGroup": "InsType",
+        "MetricName": "IpStore"
+    },
+    {
+        "BriefDescription": "Instructions per Branch (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES",
+        "MetricGroup": "Branches;InsType",
+        "MetricName": "IpBranch"
+    },
+    {
+        "BriefDescription": "Instructions per (near) call (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL",
+        "MetricGroup": "Branches",
+        "MetricName": "IpCall"
+    },
+    {
+        "BriefDescription": "Branch instructions per taken branch. ",
+        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN",
+        "MetricGroup": "Branches;PGO",
+        "MetricName": "BpTkBranch"
+    },
+    {
+        "BriefDescription": "Instructions per Floating Point (FP) Operation (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / ( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )",
+        "MetricGroup": "Flops;FpArith;InsType",
+        "MetricName": "IpFLOP"
+    },
+    {
+        "BriefDescription": "Total number of retired Instructions, Sample with: INST_RETIRED.PREC_DIST",
+        "MetricExpr": "INST_RETIRED.ANY",
+        "MetricGroup": "Summary;TmaL1",
+        "MetricName": "Instructions"
+    },
+    {
+        "BriefDescription": "Fraction of Uops delivered by the LSD (Loop Stream Detector; aka Loop Cache)",
+        "MetricExpr": "LSD.UOPS / (IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS)",
+        "MetricGroup": "LSD",
+        "MetricName": "LSD_Coverage"
+    },
+    {
+        "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)",
+        "MetricExpr": "IDQ.DSB_UOPS / (IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS)",
+        "MetricGroup": "DSB;FetchBW",
+        "MetricName": "DSB_Coverage"
+    },
+    {
+        "BriefDescription": "Actual Average Latency for L1 data-cache miss demand loads (in core cycles)",
+        "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )",
+        "MetricGroup": "MemoryBound;MemoryLat",
+        "MetricName": "Load_Miss_Real_Latency"
+    },
+    {
+        "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)",
+        "MetricExpr": "L1D_PEND_MISS.PENDING / L1D_PEND_MISS.PENDING_CYCLES",
+        "MetricGroup": "MemoryBound;MemoryBW",
+        "MetricName": "MLP"
+    },
+    {
+        "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses",
+        "MetricConstraint": "NO_NMI_WATCHDOG",
+        "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING ) / ( 2 * CPU_CLK_UNHALTED.DISTRIBUTED )",
+        "MetricGroup": "MemoryTLB",
+        "MetricName": "Page_Walks_Utilization"
+    },
+    {
+        "BriefDescription": "Average data fill bandwidth to the L1 data cache [GB / sec]",
+        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time",
+        "MetricGroup": "MemoryBW",
+        "MetricName": "L1D_Cache_Fill_BW"
+    },
+    {
+        "BriefDescription": "Average data fill bandwidth to the L2 cache [GB / sec]",
+        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time",
+        "MetricGroup": "MemoryBW",
+        "MetricName": "L2_Cache_Fill_BW"
+    },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time",
+        "MetricGroup": "MemoryBW",
+        "MetricName": "L3_Cache_Fill_BW"
+    },
+    {
+        "BriefDescription": "Average per-core data access bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time",
+        "MetricGroup": "MemoryBW;Offcore",
+        "MetricName": "L3_Cache_Access_BW"
+    },
+    {
+        "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads",
+        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY",
+        "MetricGroup": "CacheMisses",
+        "MetricName": "L1MPKI"
+    },
+    {
+        "BriefDescription": "L2 cache true misses per kilo instruction for retired demand loads",
+        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L2_MISS / INST_RETIRED.ANY",
+        "MetricGroup": "CacheMisses",
+        "MetricName": "L2MPKI"
+    },
+    {
+        "BriefDescription": "L2 cache misses per kilo instruction for all request types (including speculative)",
+        "MetricExpr": "1000 * ( ( OFFCORE_REQUESTS.ALL_DATA_RD - OFFCORE_REQUESTS.DEMAND_DATA_RD ) + L2_RQSTS.ALL_DEMAND_MISS + L2_RQSTS.SWPF_MISS ) / INST_RETIRED.ANY",
+        "MetricGroup": "CacheMisses;Offcore",
+        "MetricName": "L2MPKI_All"
+    },
+    {
+        "BriefDescription": "L3 cache true misses per kilo instruction for retired demand loads",
+        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L3_MISS / INST_RETIRED.ANY",
+        "MetricGroup": "CacheMisses",
+        "MetricName": "L3MPKI"
+    },
+    {
+        "BriefDescription": "Rate of silent evictions from the L2 cache per Kilo instruction where the evicted lines are dropped (no writeback to L3 or memory)",
+        "MetricExpr": "1000 * L2_LINES_OUT.SILENT / INST_RETIRED.ANY",
+        "MetricGroup": "L2Evicts;Server",
+        "MetricName": "L2_Evictions_Silent_PKI"
+    },
+    {
+        "BriefDescription": "Rate of non silent evictions from the L2 cache per Kilo instruction",
+        "MetricExpr": "1000 * L2_LINES_OUT.NON_SILENT / INST_RETIRED.ANY",
+        "MetricGroup": "L2Evicts;Server",
+        "MetricName": "L2_Evictions_NonSilent_PKI"
+    },
+    {
+        "BriefDescription": "Average CPU Utilization",
+        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
+        "MetricGroup": "HPC;Summary",
+        "MetricName": "CPU_Utilization"
+    },
+    {
+        "BriefDescription": "Measured Average Frequency for unhalted processors [GHz]",
+        "MetricExpr": "(CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC) * msr@tsc@ / 1000000000 / duration_time",
+        "MetricGroup": "Summary;Power",
+        "MetricName": "Average_Frequency"
+    },
+    {
+        "BriefDescription": "Giga Floating Point Operations Per Second",
+        "MetricExpr": "( ( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE ) / 1000000000 ) / duration_time",
+        "MetricGroup": "Flops;HPC",
+        "MetricName": "GFLOPs"
+    },
+    {
+        "BriefDescription": "Average Frequency Utilization relative nominal frequency",
+        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC",
+        "MetricGroup": "Power",
+        "MetricName": "Turbo_Utilization"
+    },
+    {
+        "BriefDescription": "Fraction of cycles where both hardware Logical Processors were active",
+        "MetricExpr": "1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_DISTRIBUTED if #SMT_on else 0",
+        "MetricGroup": "SMT",
+        "MetricName": "SMT_2T_Utilization"
+    },
+    {
+        "BriefDescription": "Fraction of cycles spent in the Operating System (OS) Kernel mode",
+        "MetricExpr": "CPU_CLK_UNHALTED.THREAD_P:k / CPU_CLK_UNHALTED.THREAD",
+        "MetricGroup": "OS",
+        "MetricName": "Kernel_Utilization"
+    },
+    {
+        "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]",
+        "MetricExpr": "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time",
+        "MetricGroup": "HPC;MemoryBW;SoC",
+        "MetricName": "DRAM_BW_Use"
+    },
+    {
+        "BriefDescription": "Average latency of data read request to external memory (in nanoseconds). Accounts for demand loads and L1/L2 prefetches",
+        "MetricExpr": "1000000000 * ( UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD / UNC_CHA_TOR_INSERTS.IA_MISS_DRD ) / ( cha_0@event\\=0x0@ / duration_time )",
+        "MetricGroup": "MemoryLat;SoC",
+        "MetricName": "MEM_Read_Latency"
+    },
+    {
+        "BriefDescription": "Average number of parallel data read requests to external memory. Accounts for demand loads and L1/L2 prefetches",
+        "MetricExpr": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD / cha@event\\=0x36\\,umask\\=0xC817FE01\\,thresh\\=1@",
+        "MetricGroup": "MemoryBW;SoC",
+        "MetricName": "MEM_Parallel_Reads"
+    },
+    {
+        "BriefDescription": "Average latency of data read request to external 3D X-Point memory [in nanoseconds]. Accounts for demand loads and L1/L2 data-read prefetches",
+        "MetricExpr": "( 1000000000 * ( UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_PMM / UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PMM ) / cha_0@event\\=0x0@ )",
+        "MetricGroup": "MemoryLat;SoC;Server",
+        "MetricName": "MEM_PMM_Read_Latency"
+    },
+    {
+        "BriefDescription": "Average 3DXP Memory Bandwidth Use for reads [GB / sec]",
+        "MetricExpr": "( ( 64 * imc@event\\=0xe3@ / 1000000000 ) / duration_time )",
+        "MetricGroup": "MemoryBW;SoC;Server",
+        "MetricName": "PMM_Read_BW"
+    },
+    {
+        "BriefDescription": "Average 3DXP Memory Bandwidth Use for Writes [GB / sec]",
+        "MetricExpr": "( ( 64 * imc@event\\=0xe7@ / 1000000000 ) / duration_time )",
+        "MetricGroup": "MemoryBW;SoC;Server",
+        "MetricName": "PMM_Write_BW"
+    },
+    {
+        "BriefDescription": "Average IO (network or disk) Bandwidth Use for Writes [GB / sec]",
+        "MetricExpr": "UNC_CHA_TOR_INSERTS.IO_PCIRDCUR * 64 / 1000000000 / duration_time",
+        "MetricGroup": "IoBW;SoC;Server",
+        "MetricName": "IO_Write_BW"
+    },
+    {
+        "BriefDescription": "Average IO (network or disk) Bandwidth Use for Reads [GB / sec]",
+        "MetricExpr": "( UNC_CHA_TOR_INSERTS.IO_HIT_ITOM + UNC_CHA_TOR_INSERTS.IO_MISS_ITOM + UNC_CHA_TOR_INSERTS.IO_HIT_ITOMCACHENEAR + UNC_CHA_TOR_INSERTS.IO_MISS_ITOMCACHENEAR ) * 64 / 1000000000 / duration_time",
+        "MetricGroup": "IoBW;SoC;Server",
+        "MetricName": "IO_Read_BW"
+    },
+    {
+        "BriefDescription": "Socket actual clocks when any core is active on that socket",
+        "MetricExpr": "cha_0@event\\=0x0@",
+        "MetricGroup": "SoC",
+        "MetricName": "Socket_CLKS"
+    },
+    {
+        "BriefDescription": "Instructions per Far Branch ( Far Branches apply upon transition from application to operating system, handling interrupts, exceptions) [lower number means higher occurrence rate]",
+        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.FAR_BRANCH:u",
+        "MetricGroup": "Branches;OS",
+        "MetricName": "IpFarBranch"
+    },
+    {
+        "BriefDescription": "C1 residency percent per core",
+        "MetricExpr": "(cstate_core@c1\\-residency@ / msr@tsc@) * 100",
+        "MetricGroup": "Power",
+        "MetricName": "C1_Core_Residency"
+    },
+    {
+        "BriefDescription": "C6 residency percent per core",
+        "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100",
+        "MetricGroup": "Power",
+        "MetricName": "C6_Core_Residency"
+    },
+    {
+        "BriefDescription": "C2 residency percent per package",
+        "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100",
+        "MetricGroup": "Power",
+        "MetricName": "C2_Pkg_Residency"
+    },
+    {
+        "BriefDescription": "C6 residency percent per package",
+        "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
+        "MetricGroup": "Power",
+        "MetricName": "C6_Pkg_Residency"
+    },
+]
-- 
2.17.1
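
The FLOPc, IpFLOP and GFLOPs expressions in the patch share one weighted sum
of the FP_ARITH_INST_RETIRED counts, where each weight equals the number of
elements a vector of that width holds (for example 512 bits / 32 bits = 16).
The following Python sketch restates that sum; the helper names and the
sample counts are hypothetical, and the event names are shortened by dropping
the FP_ARITH_INST_RETIRED prefix.

```python
# Weight per counted instruction, copied from the MetricExpr strings in the patch.
FLOP_WEIGHTS = {
    "SCALAR_SINGLE": 1,
    "SCALAR_DOUBLE": 1,
    "128B_PACKED_DOUBLE": 2,
    "128B_PACKED_SINGLE": 4,
    "256B_PACKED_DOUBLE": 4,
    "256B_PACKED_SINGLE": 8,
    "512B_PACKED_DOUBLE": 8,
    "512B_PACKED_SINGLE": 16,
}


def flop_count(counts):
    """Weighted sum of retired FP arithmetic instruction counts."""
    return sum(FLOP_WEIGHTS[name] * value for name, value in counts.items())


def gflops(counts, duration_time):
    """GFLOPs as in the patch: weighted sum / 1000000000 / duration_time."""
    return flop_count(counts) / 1000000000 / duration_time


# Hypothetical counts over a 2 second run, for illustration only.
sample = {
    "SCALAR_DOUBLE": 1_000_000_000,
    "256B_PACKED_DOUBLE": 500_000_000,
    "512B_PACKED_SINGLE": 250_000_000,
}
print(flop_count(sample))
print(gflops(sample, duration_time=2.0))
```

The bandwidth metrics in the patch follow the same pattern, with 64 bytes per
counted cache line in place of the per-instruction weights.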



* Re: [PATCH v4] perf vendor events: Add metrics for Icelake Server
  2021-08-06  7:54 [PATCH v4] perf vendor events: Add metrics for Icelake Server Jin Yao
@ 2021-08-06  8:16 ` Ian Rogers
  2021-08-06 18:20   ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 3+ messages in thread
From: Ian Rogers @ 2021-08-06  8:16 UTC
  To: Jin Yao
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, LKML, linux-perf-users, Andi Kleen,
	Kan Liang, Jin, Yao

On Fri, Aug 6, 2021, 12:55 AM Jin Yao <yao.jin@linux.intel.com> wrote:
>
> Add JSON metrics for Icelake Server to perf.
>
> Based on TMA metrics 4.21 at 01.org.
> https://download.01.org/perfmon/
>
> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> Reviewed-by: Andi Kleen <ak@linux.intel.com>

Reviewed-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
> v4:
>  - Now that the perf tool supports #SMT_on as a token in metric expressions,
>    use #SMT_on in 'ILP' and 'SMT_2T_Utilization'.
>
> v3:
>  - The cstate_core and cstate_pkg PMUs have been supported for ICX since
>    5.14-rc1, so add cstate metrics for Core C1/C6 and Package C2/C6.
>
> v2:
>  - Fix perf test 10 error.
>
>    # ./perf test 10
>    10: PMU events                                                      :
>    10.1: PMU event table sanity                                        : Ok
>    10.2: PMU event map aliases                                         : Ok
>    10.3: Parsing of PMU event table metrics                            : Ok
>    10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
>
>  - Remove cstate metrics because the kernel does not yet support
>    cstate_core and cstate_pkg for Icelake server.
>
>  - Remove the topdown L1/L2 metrics.
>
>  .../arch/x86/icelakex/icx-metrics.json        | 315 ++++++++++++++++++
>  1 file changed, 315 insertions(+)
>  create mode 100644 tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
>
> diff --git a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
> new file mode 100644
> index 000000000000..57ddbb9f9b31
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
> @@ -0,0 +1,315 @@
> +[
> +    {
> +        "BriefDescription": "Instructions Per Cycle (per Logical Processor)",
> +        "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD",
> +        "MetricGroup": "Summary",
> +        "MetricName": "IPC"
> +    },
> +    {
> +        "BriefDescription": "Uops Per Instruction",
> +        "MetricExpr": "UOPS_RETIRED.SLOTS / INST_RETIRED.ANY",
> +        "MetricGroup": "Pipeline;Retire",
> +        "MetricName": "UPI"
> +    },
> +    {
> +        "BriefDescription": "Instruction per taken branch",
> +        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
> +        "MetricGroup": "Branches;FetchBW;PGO",
> +        "MetricName": "IpTB"
> +    },
> +    {
> +        "BriefDescription": "Cycles Per Instruction (per Logical Processor)",
> +        "MetricExpr": "1 / (INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD)",
> +        "MetricGroup": "Pipeline",
> +        "MetricName": "CPI"
> +    },
> +    {
> +        "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.",
> +        "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
> +        "MetricGroup": "Pipeline",
> +        "MetricName": "CLKS"
> +    },
> +    {
> +        "BriefDescription": "Instructions Per Cycle (per physical core)",
> +        "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.DISTRIBUTED",
> +        "MetricGroup": "SMT;TmaL1",
> +        "MetricName": "CoreIPC"
> +    },
> +    {
> +        "BriefDescription": "Floating Point Operations Per Cycle",
> +        "MetricExpr": "( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE ) / CPU_CLK_UNHALTED.DISTRIBUTED",
> +        "MetricGroup": "Flops",
> +        "MetricName": "FLOPc"
> +    },
> +    {
> +        "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is at least 1 uop executed)",
> +        "MetricExpr": "UOPS_EXECUTED.THREAD / (( UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 ) if #SMT_on else UOPS_EXECUTED.CORE_CYCLES_GE_1)",
> +        "MetricGroup": "Pipeline;PortsUtil",
> +        "MetricName": "ILP"
> +    },
> +    {
> +        "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear)",
> +        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
> +        "MetricGroup": "BrMispredicts",
> +        "MetricName": "IpMispredict"
> +    },
> +    {
> +        "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core",
> +        "MetricExpr": "CPU_CLK_UNHALTED.DISTRIBUTED",
> +        "MetricGroup": "SMT",
> +        "MetricName": "CORE_CLKS"
> +    },
> +    {
> +        "BriefDescription": "Instructions per Load (lower number means higher occurrence rate)",
> +        "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_LOADS",
> +        "MetricGroup": "InsType",
> +        "MetricName": "IpLoad"
> +    },
> +    {
> +        "BriefDescription": "Instructions per Store (lower number means higher occurrence rate)",
> +        "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_STORES",
> +        "MetricGroup": "InsType",
> +        "MetricName": "IpStore"
> +    },
> +    {
> +        "BriefDescription": "Instructions per Branch (lower number means higher occurrence rate)",
> +        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES",
> +        "MetricGroup": "Branches;InsType",
> +        "MetricName": "IpBranch"
> +    },
> +    {
> +        "BriefDescription": "Instructions per (near) call (lower number means higher occurrence rate)",
> +        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL",
> +        "MetricGroup": "Branches",
> +        "MetricName": "IpCall"
> +    },
> +    {
> +        "BriefDescription": "Branch instructions per taken branch. ",
> +        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN",
> +        "MetricGroup": "Branches;PGO",
> +        "MetricName": "BpTkBranch"
> +    },
> +    {
> +        "BriefDescription": "Instructions per Floating Point (FP) Operation (lower number means higher occurrence rate)",
> +        "MetricExpr": "INST_RETIRED.ANY / ( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )",
> +        "MetricGroup": "Flops;FpArith;InsType",
> +        "MetricName": "IpFLOP"
> +    },
> +    {
> +        "BriefDescription": "Total number of retired Instructions, Sample with: INST_RETIRED.PREC_DIST",
> +        "MetricExpr": "INST_RETIRED.ANY",
> +        "MetricGroup": "Summary;TmaL1",
> +        "MetricName": "Instructions"
> +    },
> +    {
> +        "BriefDescription": "Fraction of Uops delivered by the LSD (Loop Stream Detector; aka Loop Cache)",
> +        "MetricExpr": "LSD.UOPS / (IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS)",
> +        "MetricGroup": "LSD",
> +        "MetricName": "LSD_Coverage"
> +    },
> +    {
> +        "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)",
> +        "MetricExpr": "IDQ.DSB_UOPS / (IDQ.DSB_UOPS + LSD.UOPS + IDQ.MITE_UOPS + IDQ.MS_UOPS)",
> +        "MetricGroup": "DSB;FetchBW",
> +        "MetricName": "DSB_Coverage"
> +    },
> +    {
> +        "BriefDescription": "Actual Average Latency for L1 data-cache miss demand loads (in core cycles)",
> +        "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )",
> +        "MetricGroup": "MemoryBound;MemoryLat",
> +        "MetricName": "Load_Miss_Real_Latency"
> +    },
> +    {
> +        "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)",
> +        "MetricExpr": "L1D_PEND_MISS.PENDING / L1D_PEND_MISS.PENDING_CYCLES",
> +        "MetricGroup": "MemoryBound;MemoryBW",
> +        "MetricName": "MLP"
> +    },
> +    {
> +        "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses",
> +        "MetricConstraint": "NO_NMI_WATCHDOG",
> +        "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING ) / ( 2 * CPU_CLK_UNHALTED.DISTRIBUTED )",
> +        "MetricGroup": "MemoryTLB",
> +        "MetricName": "Page_Walks_Utilization"
> +    },
> +    {
> +        "BriefDescription": "Average data fill bandwidth to the L1 data cache [GB / sec]",
> +        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time",
> +        "MetricGroup": "MemoryBW",
> +        "MetricName": "L1D_Cache_Fill_BW"
> +    },
> +    {
> +        "BriefDescription": "Average data fill bandwidth to the L2 cache [GB / sec]",
> +        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time",
> +        "MetricGroup": "MemoryBW",
> +        "MetricName": "L2_Cache_Fill_BW"
> +    },
> +    {
> +        "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
> +        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration_time",
> +        "MetricGroup": "MemoryBW",
> +        "MetricName": "L3_Cache_Fill_BW"
> +    },
> +    {
> +        "BriefDescription": "Average per-core data access bandwidth to the L3 cache [GB / sec]",
> +        "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / duration_time",
> +        "MetricGroup": "MemoryBW;Offcore",
> +        "MetricName": "L3_Cache_Access_BW"
> +    },
> +    {
> +        "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads",
> +        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY",
> +        "MetricGroup": "CacheMisses",
> +        "MetricName": "L1MPKI"
> +    },
> +    {
> +        "BriefDescription": "L2 cache true misses per kilo instruction for retired demand loads",
> +        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L2_MISS / INST_RETIRED.ANY",
> +        "MetricGroup": "CacheMisses",
> +        "MetricName": "L2MPKI"
> +    },
> +    {
> +        "BriefDescription": "L2 cache misses per kilo instruction for all request types (including speculative)",
> +        "MetricExpr": "1000 * ( ( OFFCORE_REQUESTS.ALL_DATA_RD - OFFCORE_REQUESTS.DEMAND_DATA_RD ) + L2_RQSTS.ALL_DEMAND_MISS + L2_RQSTS.SWPF_MISS ) / INST_RETIRED.ANY",
> +        "MetricGroup": "CacheMisses;Offcore",
> +        "MetricName": "L2MPKI_All"
> +    },
> +    {
> +        "BriefDescription": "L3 cache true misses per kilo instruction for retired demand loads",
> +        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L3_MISS / INST_RETIRED.ANY",
> +        "MetricGroup": "CacheMisses",
> +        "MetricName": "L3MPKI"
> +    },
> +    {
> +        "BriefDescription": "Rate of silent evictions from the L2 cache per Kilo instruction where the evicted lines are dropped (no writeback to L3 or memory)",
> +        "MetricExpr": "1000 * L2_LINES_OUT.SILENT / INST_RETIRED.ANY",
> +        "MetricGroup": "L2Evicts;Server",
> +        "MetricName": "L2_Evictions_Silent_PKI"
> +    },
> +    {
> +        "BriefDescription": "Rate of non silent evictions from the L2 cache per Kilo instruction",
> +        "MetricExpr": "1000 * L2_LINES_OUT.NON_SILENT / INST_RETIRED.ANY",
> +        "MetricGroup": "L2Evicts;Server",
> +        "MetricName": "L2_Evictions_NonSilent_PKI"
> +    },
> +    {
> +        "BriefDescription": "Average CPU Utilization",
> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
> +        "MetricGroup": "HPC;Summary",
> +        "MetricName": "CPU_Utilization"
> +    },
> +    {
> +        "BriefDescription": "Measured Average Frequency for unhalted processors [GHz]",
> +        "MetricExpr": "(CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC) * msr@tsc@ / 1000000000 / duration_time",
> +        "MetricGroup": "Summary;Power",
> +        "MetricName": "Average_Frequency"
> +    },
> +    {
> +        "BriefDescription": "Giga Floating Point Operations Per Second",
> +        "MetricExpr": "( ( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE ) / 1000000000 ) / duration_time",
> +        "MetricGroup": "Flops;HPC",
> +        "MetricName": "GFLOPs"
> +    },
> +    {
> +        "BriefDescription": "Average Frequency Utilization relative nominal frequency",
> +        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC",
> +        "MetricGroup": "Power",
> +        "MetricName": "Turbo_Utilization"
> +    },
> +    {
> +        "BriefDescription": "Fraction of cycles where both hardware Logical Processors were active",
> +        "MetricExpr": "1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_DISTRIBUTED if #SMT_on else 0",
> +        "MetricGroup": "SMT",
> +        "MetricName": "SMT_2T_Utilization"
> +    },
> +    {
> +        "BriefDescription": "Fraction of cycles spent in the Operating System (OS) Kernel mode",
> +        "MetricExpr": "CPU_CLK_UNHALTED.THREAD_P:k / CPU_CLK_UNHALTED.THREAD",
> +        "MetricGroup": "OS",
> +        "MetricName": "Kernel_Utilization"
> +    },
> +    {
> +        "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]",
> +        "MetricExpr": "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time",
> +        "MetricGroup": "HPC;MemoryBW;SoC",
> +        "MetricName": "DRAM_BW_Use"
> +    },
> +    {
> +        "BriefDescription": "Average latency of data read request to external memory (in nanoseconds). Accounts for demand loads and L1/L2 prefetches",
> +        "MetricExpr": "1000000000 * ( UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD / UNC_CHA_TOR_INSERTS.IA_MISS_DRD ) / ( cha_0@event\\=0x0@ / duration_time )",
> +        "MetricGroup": "MemoryLat;SoC",
> +        "MetricName": "MEM_Read_Latency"
> +    },
> +    {
> +        "BriefDescription": "Average number of parallel data read requests to external memory. Accounts for demand loads and L1/L2 prefetches",
> +        "MetricExpr": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD / cha@event\\=0x36\\,umask\\=0xC817FE01\\,thresh\\=1@",
> +        "MetricGroup": "MemoryBW;SoC",
> +        "MetricName": "MEM_Parallel_Reads"
> +    },
> +    {
> +        "BriefDescription": "Average latency of data read request to external 3D X-Point memory [in nanoseconds]. Accounts for demand loads and L1/L2 data-read prefetches",
> +        "MetricExpr": "( 1000000000 * ( UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_PMM / UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PMM ) / cha_0@event\\=0x0@ )",
> +        "MetricGroup": "MemoryLat;SoC;Server",
> +        "MetricName": "MEM_PMM_Read_Latency"
> +    },
> +    {
> +        "BriefDescription": "Average 3DXP Memory Bandwidth Use for reads [GB / sec]",
> +        "MetricExpr": "( ( 64 * imc@event\\=0xe3@ / 1000000000 ) / duration_time )",
> +        "MetricGroup": "MemoryBW;SoC;Server",
> +        "MetricName": "PMM_Read_BW"
> +    },
> +    {
> +        "BriefDescription": "Average 3DXP Memory Bandwidth Use for Writes [GB / sec]",
> +        "MetricExpr": "( ( 64 * imc@event\\=0xe7@ / 1000000000 ) / duration_time )",
> +        "MetricGroup": "MemoryBW;SoC;Server",
> +        "MetricName": "PMM_Write_BW"
> +    },
> +    {
> +        "BriefDescription": "Average IO (network or disk) Bandwidth Use for Writes [GB / sec]",
> +        "MetricExpr": "UNC_CHA_TOR_INSERTS.IO_PCIRDCUR * 64 / 1000000000 / duration_time",
> +        "MetricGroup": "IoBW;SoC;Server",
> +        "MetricName": "IO_Write_BW"
> +    },
> +    {
> +        "BriefDescription": "Average IO (network or disk) Bandwidth Use for Reads [GB / sec]",
> +        "MetricExpr": "( UNC_CHA_TOR_INSERTS.IO_HIT_ITOM + UNC_CHA_TOR_INSERTS.IO_MISS_ITOM + UNC_CHA_TOR_INSERTS.IO_HIT_ITOMCACHENEAR + UNC_CHA_TOR_INSERTS.IO_MISS_ITOMCACHENEAR ) * 64 / 1000000000 / duration_time",
> +        "MetricGroup": "IoBW;SoC;Server",
> +        "MetricName": "IO_Read_BW"
> +    },
> +    {
> +        "BriefDescription": "Socket actual clocks when any core is active on that socket",
> +        "MetricExpr": "cha_0@event\\=0x0@",
> +        "MetricGroup": "SoC",
> +        "MetricName": "Socket_CLKS"
> +    },
> +    {
> +        "BriefDescription": "Instructions per Far Branch ( Far Branches apply upon transition from application to operating system, handling interrupts, exceptions) [lower number means higher occurrence rate]",
> +        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.FAR_BRANCH:u",
> +        "MetricGroup": "Branches;OS",
> +        "MetricName": "IpFarBranch"
> +    },
> +    {
> +        "BriefDescription": "C1 residency percent per core",
> +        "MetricExpr": "(cstate_core@c1\\-residency@ / msr@tsc@) * 100",
> +        "MetricGroup": "Power",
> +        "MetricName": "C1_Core_Residency"
> +    },
> +    {
> +        "BriefDescription": "C6 residency percent per core",
> +        "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100",
> +        "MetricGroup": "Power",
> +        "MetricName": "C6_Core_Residency"
> +    },
> +    {
> +        "BriefDescription": "C2 residency percent per package",
> +        "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100",
> +        "MetricGroup": "Power",
> +        "MetricName": "C2_Pkg_Residency"
> +    },
> +    {
> +        "BriefDescription": "C6 residency percent per package",
> +        "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
> +        "MetricGroup": "Power",
> +        "MetricName": "C6_Pkg_Residency"
> +    },
> +]
> --
> 2.17.1
>
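
For readers following the expressions quoted above, here is a minimal
sketch of the arithmetic behind four of the metrics (GFLOPs,
DRAM_BW_Use, the Cx residency percentages and SMT_2T_Utilization). It
is not perf's expression evaluator: the function names, the dictionary
keys and any counter values passed in are assumptions made for
illustration, and only the formulas are taken from the MetricExpr
strings in the patch.

```python
# Sketch only: function names and inputs are hypothetical, the formulas
# follow the MetricExpr strings quoted in the patch.

# FLOPs credited per retired FP_ARITH_INST_RETIRED.* instruction.
FP_WEIGHTS = {
    "SCALAR_SINGLE": 1,
    "SCALAR_DOUBLE": 1,
    "128B_PACKED_DOUBLE": 2,
    "128B_PACKED_SINGLE": 4,
    "256B_PACKED_DOUBLE": 4,
    "256B_PACKED_SINGLE": 8,
    "512B_PACKED_DOUBLE": 8,
    "512B_PACKED_SINGLE": 16,
}


def gflops(fp_counts, duration_time):
    """GFLOPs: weighted FP instruction counts / 1e9 / elapsed seconds."""
    flops = sum(w * fp_counts.get(name, 0) for name, w in FP_WEIGHTS.items())
    return flops / 1_000_000_000 / duration_time


def dram_bw_use(cas_count_read, cas_count_write, duration_time):
    """DRAM_BW_Use: each CAS moves one 64-byte cache line; result in GB/s."""
    return 64 * (cas_count_read + cas_count_write) / 1_000_000_000 / duration_time


def residency_percent(residency, tsc):
    """C1/C6 core and C2/C6 package residency as a percentage of TSC ticks."""
    return residency / tsc * 100


def smt_2t_utilization(one_thread_active, ref_distributed, smt_on):
    """SMT_2T_Utilization: the whole '1 - a / b' is guarded by #SMT_on."""
    return 1 - one_thread_active / ref_distributed if smt_on else 0
```

As in the MetricExpr, the conditional binds more loosely than the
arithmetic, so the metric is 0 whenever SMT is off rather than
"1 - (... if #SMT_on else 0)".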


* Re: [PATCH v4] perf vendor events: Add metrics for Icelake Server
  2021-08-06  8:16 ` Ian Rogers
@ 2021-08-06 18:20   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 3+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-08-06 18:20 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Jin Yao, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, LKML, linux-perf-users, Andi Kleen,
	Kan Liang, Jin, Yao

On Fri, Aug 06, 2021 at 01:16:57AM -0700, Ian Rogers wrote:
> On Fri, Aug 6, 2021, 12:55 AM Jin Yao <yao.jin@linux.intel.com> wrote:
> >
> > Add JSON metrics for Icelake Server to perf.
> >
> > Based on TMA metrics 4.21 at 01.org.
> > https://download.01.org/perfmon/
> >
> > Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> > Reviewed-by: Andi Kleen <ak@linux.intel.com>
> 
> Reviewed-by: Ian Rogers <irogers@google.com>

Thanks, applied.

- Arnaldo


