From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F317C433F5 for ; Tue, 5 Apr 2022 11:46:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1379099AbiDELkS (ORCPT ); Tue, 5 Apr 2022 07:40:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36676 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354223AbiDEKMT (ORCPT ); Tue, 5 Apr 2022 06:12:19 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 706B55A179; Tue, 5 Apr 2022 02:59:14 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id D7BA7B81B18; Tue, 5 Apr 2022 09:59:11 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 87F8CC385A1; Tue, 5 Apr 2022 09:59:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1649152750; bh=ApFNnPCZHdxiMpggNlrNGwnqMbdd9CoSmHUKroi2qZ4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=FThiyJOVzrMoqqOFGaZ32vxjxClgnRWL3mOgHZz58nx1Dd3jx5omGCjQQAVAQbiAv 51hhCrowuA25xYzOkeXR1vFfdrzvQHWJCa0t1INC7p3B3TgMiX9k1NZjZYUuxdFZQ4 paNZ8pyTXCID/lk6jlbUPwdzryUuE7Q2VARUeMnU= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Kan Liang , Ian Rogers , Alexander Shishkin , Alexandre Torgue , Andi Kleen , Ingo Molnar , James Clark , Jin Yao , Jiri Olsa , John Garry , Mark Rutland , Maxime Coquelin , Namhyung Kim , Peter Zijlstra , Stephane Eranian , Zhengjun Xing , Arnaldo Carvalho de Melo Subject: [PATCH 5.15 889/913] perf vendor events: Update metrics for SkyLake Server Date: Tue, 5 Apr 2022 09:32:31 +0200 Message-Id: <20220405070406.467867483@linuxfoundation.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220405070339.801210740@linuxfoundation.org> References: <20220405070339.801210740@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Ian Rogers commit 3bad20d7d129c3b3063658a0f83974dfe6dac5c4 upstream. Based on TMA_metrics-full.csv version 4.3 at 01.org: https://download.01.org/perfmon/ Events are updated to version 1.26: https://download.01.org/perfmon/SKX Json files generated by: https://github.com/intel/event-converter-for-linux-perf Fixes were made that allow the skx-metrics.json to successfully generate, bringing back TopdownL1 metrics. Tested: $ perf test ... 6: Parse event definition strings : Ok 7: Simple expression parser : Ok ... 9: Parse perf pmu format : Ok 10: PMU events : 10.1: PMU event table sanity : Ok 10.2: PMU event map aliases : Ok 10.3: Parsing of PMU event table metrics : Ok 10.4: Parsing of PMU event table metrics with fake PMUs : Ok ... 68: Parse and process metrics : Ok ... 88: perf stat metrics (shadow stat) test : Ok 89: perf all metricgroups test : Ok 90: perf all metrics test : Sk= ip 91: perf all PMU test : Ok ... 90 skips due to a lack of floating point samples, which is understandable. Fixes: c4ad8fabd03f76ed ("perf vendor events: Update metrics for SkyLake Se= rver") Reviewed-by: Kan Liang Signed-off-by: Ian Rogers Cc: Alexander Shishkin Cc: Alexandre Torgue Cc: Andi Kleen Cc: Ingo Molnar Cc: James Clark Cc: Jin Yao Cc: Jiri Olsa Cc: John Garry Cc: Mark Rutland Cc: Maxime Coquelin Cc: Namhyung Kim Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Zhengjun Xing Link: https://lore.kernel.org/r/20220201015858.1226914-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman --- tools/perf/pmu-events/arch/x86/skylakex/cache.json | 111 +- tools/perf/pmu-events/arch/x86/skylakex/floating-point.json | 24=20 tools/perf/pmu-events/arch/x86/skylakex/frontend.json | 18=20 tools/perf/pmu-events/arch/x86/skylakex/memory.json | 96 +- tools/perf/pmu-events/arch/x86/skylakex/pipeline.json | 11=20 tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json | 461 +++++++= +++-- tools/perf/pmu-events/arch/x86/skylakex/uncore-other.json | 23=20 7 files changed, 591 insertions(+), 153 deletions(-) --- a/tools/perf/pmu-events/arch/x86/skylakex/cache.json +++ b/tools/perf/pmu-events/arch/x86/skylakex/cache.json @@ -315,6 +315,19 @@ "UMask": "0x82" }, { + "BriefDescription": "All retired memory instructions.", + "Counter": "0,1,2,3", + "CounterHTOff": "0,1,2,3", + "Data_LA": "1", + "EventCode": "0xD0", + "EventName": "MEM_INST_RETIRED.ANY", + "L1_Hit_Indication": "1", + "PEBS": "1", + "PublicDescription": "Counts all retired memory instructions - loa= ds and stores.", + "SampleAfterValue": "2000003", + "UMask": "0x83" + }, + { "BriefDescription": "Retired load instructions with locked access.= ", "Counter": "0,1,2,3", "CounterHTOff": "0,1,2,3", @@ -358,6 +371,7 @@ "EventCode": "0xD0", "EventName": "MEM_INST_RETIRED.STLB_MISS_LOADS", "PEBS": "1", + "PublicDescription": "Number of retired load instructions that (st= art a) miss in the 2nd-level TLB (STLB).", "SampleAfterValue": "100003", "UMask": "0x11" }, @@ -370,6 +384,7 @@ "EventName": "MEM_INST_RETIRED.STLB_MISS_STORES", "L1_Hit_Indication": "1", "PEBS": "1", + "PublicDescription": "Number of retired store instructions that (s= tart a) miss in the 2nd-level TLB (STLB).", "SampleAfterValue": "100003", "UMask": "0x12" }, @@ -733,7 +748,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_DATA_RD.ANY_RESPONSE", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0000010491", + "MSRValue": "0x10491", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -772,7 +787,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_DATA_RD.L3_HIT.HIT_OTHER_CORE_N= O_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x04003C0491", + "MSRValue": "0x4003C0491", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -785,7 +800,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_DATA_RD.L3_HIT.NO_SNOOP_NEEDED", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x01003C0491", + "MSRValue": "0x1003C0491", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -798,7 +813,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_DATA_RD.L3_HIT.SNOOP_HIT_WITH_F= WD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x08003C0491", + "MSRValue": "0x8003C0491", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -811,7 +826,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_PF_DATA_RD.ANY_RESPONSE", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0000010490", + "MSRValue": "0x10490", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -850,7 +865,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_PF_DATA_RD.L3_HIT.HIT_OTHER_COR= E_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x04003C0490", + "MSRValue": "0x4003C0490", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -863,7 +878,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_PF_DATA_RD.L3_HIT.NO_SNOOP_NEED= ED", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x01003C0490", + "MSRValue": "0x1003C0490", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -876,7 +891,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_PF_DATA_RD.L3_HIT.SNOOP_HIT_WIT= H_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x08003C0490", + "MSRValue": "0x8003C0490", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -889,7 +904,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_PF_RFO.ANY_RESPONSE", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0000010120", + "MSRValue": "0x10120", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -928,7 +943,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_PF_RFO.L3_HIT.HIT_OTHER_CORE_NO= _FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x04003C0120", + "MSRValue": "0x4003C0120", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -941,7 +956,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_PF_RFO.L3_HIT.NO_SNOOP_NEEDED", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x01003C0120", + "MSRValue": "0x1003C0120", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -954,7 +969,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_PF_RFO.L3_HIT.SNOOP_HIT_WITH_FW= D", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x08003C0120", + "MSRValue": "0x8003C0120", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -967,7 +982,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_RFO.ANY_RESPONSE", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0000010122", + "MSRValue": "0x10122", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1006,7 +1021,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_RFO.L3_HIT.HIT_OTHER_CORE_NO_FW= D", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x04003C0122", + "MSRValue": "0x4003C0122", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1019,7 +1034,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_RFO.L3_HIT.NO_SNOOP_NEEDED", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x01003C0122", + "MSRValue": "0x1003C0122", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1032,7 +1047,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_RFO.L3_HIT.SNOOP_HIT_WITH_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x08003C0122", + "MSRValue": "0x8003C0122", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1045,7 +1060,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_CODE_RD.ANY_RESPONSE", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0000010004", + "MSRValue": "0x10004", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1084,7 +1099,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_CODE_RD.L3_HIT.HIT_OTHER_COR= E_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x04003C0004", + "MSRValue": "0x4003C0004", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1097,7 +1112,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_CODE_RD.L3_HIT.NO_SNOOP_NEED= ED", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x01003C0004", + "MSRValue": "0x1003C0004", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1110,7 +1125,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_CODE_RD.L3_HIT.SNOOP_HIT_WIT= H_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x08003C0004", + "MSRValue": "0x8003C0004", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1123,7 +1138,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_DATA_RD.ANY_RESPONSE", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0000010001", + "MSRValue": "0x10001", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1162,7 +1177,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_DATA_RD.L3_HIT.HIT_OTHER_COR= E_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x04003C0001", + "MSRValue": "0x4003C0001", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1175,7 +1190,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_DATA_RD.L3_HIT.NO_SNOOP_NEED= ED", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x01003C0001", + "MSRValue": "0x1003C0001", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1188,7 +1203,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WIT= H_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x08003C0001", + "MSRValue": "0x8003C0001", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1201,7 +1216,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_RFO.ANY_RESPONSE", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0000010002", + "MSRValue": "0x10002", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1240,7 +1255,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_RFO.L3_HIT.HIT_OTHER_CORE_NO= _FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x04003C0002", + "MSRValue": "0x4003C0002", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1253,7 +1268,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_RFO.L3_HIT.NO_SNOOP_NEEDED", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x01003C0002", + "MSRValue": "0x1003C0002", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1266,7 +1281,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_RFO.L3_HIT.SNOOP_HIT_WITH_FW= D", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x08003C0002", + "MSRValue": "0x8003C0002", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1279,7 +1294,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L1D_AND_SW.ANY_RESPONSE", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0000010400", + "MSRValue": "0x10400", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1318,7 +1333,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L1D_AND_SW.L3_HIT.HIT_OTHER_CORE= _NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x04003C0400", + "MSRValue": "0x4003C0400", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1331,7 +1346,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L1D_AND_SW.L3_HIT.NO_SNOOP_NEEDE= D", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x01003C0400", + "MSRValue": "0x1003C0400", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1344,7 +1359,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L1D_AND_SW.L3_HIT.SNOOP_HIT_WITH= _FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x08003C0400", + "MSRValue": "0x8003C0400", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1357,7 +1372,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L2_DATA_RD.ANY_RESPONSE", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0000010010", + "MSRValue": "0x10010", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1396,7 +1411,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L2_DATA_RD.L3_HIT.HIT_OTHER_CORE= _NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x04003C0010", + "MSRValue": "0x4003C0010", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1409,7 +1424,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L2_DATA_RD.L3_HIT.NO_SNOOP_NEEDE= D", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x01003C0010", + "MSRValue": "0x1003C0010", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1422,7 +1437,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L2_DATA_RD.L3_HIT.SNOOP_HIT_WITH= _FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x08003C0010", + "MSRValue": "0x8003C0010", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1435,7 +1450,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L2_RFO.ANY_RESPONSE", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0000010020", + "MSRValue": "0x10020", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1474,7 +1489,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L2_RFO.L3_HIT.HIT_OTHER_CORE_NO_= FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x04003C0020", + "MSRValue": "0x4003C0020", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1487,7 +1502,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L2_RFO.L3_HIT.NO_SNOOP_NEEDED", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x01003C0020", + "MSRValue": "0x1003C0020", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1500,7 +1515,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L2_RFO.L3_HIT.SNOOP_HIT_WITH_FWD= ", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x08003C0020", + "MSRValue": "0x8003C0020", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1513,7 +1528,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L3_DATA_RD.ANY_RESPONSE", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0000010080", + "MSRValue": "0x10080", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1552,7 +1567,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L3_DATA_RD.L3_HIT.HIT_OTHER_CORE= _NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x04003C0080", + "MSRValue": "0x4003C0080", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1565,7 +1580,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L3_DATA_RD.L3_HIT.NO_SNOOP_NEEDE= D", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x01003C0080", + "MSRValue": "0x1003C0080", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1578,7 +1593,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L3_DATA_RD.L3_HIT.SNOOP_HIT_WITH= _FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x08003C0080", + "MSRValue": "0x8003C0080", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1591,7 +1606,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L3_RFO.ANY_RESPONSE", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0000010100", + "MSRValue": "0x10100", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1630,7 +1645,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L3_RFO.L3_HIT.HIT_OTHER_CORE_NO_= FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x04003C0100", + "MSRValue": "0x4003C0100", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1643,7 +1658,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L3_RFO.L3_HIT.NO_SNOOP_NEEDED", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x01003C0100", + "MSRValue": "0x1003C0100", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1656,7 +1671,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L3_RFO.L3_HIT.SNOOP_HIT_WITH_FWD= ", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x08003C0100", + "MSRValue": "0x8003C0100", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", --- a/tools/perf/pmu-events/arch/x86/skylakex/floating-point.json +++ b/tools/perf/pmu-events/arch/x86/skylakex/floating-point.json @@ -1,73 +1,81 @@ [ { - "BriefDescription": "Number of SSE/AVX computational 128-bit packe= d double precision floating-point instructions retired; some instructions w= ill count twice as noted below. Each count represents 2 computation operat= ions, one for each element. Applies to SSE* and AVX* packed double precisi= on floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQ= RT RSQRT14 RCP14 DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count= twice as they perform 2 calculations per element.", + "BriefDescription": "Counts once for most SIMD 128-bit packed comp= utational double precision floating-point instructions retired. Counts twic= e for DPP and FM(N)ADD/SUB instructions retired.", "Counter": "0,1,2,3", "CounterHTOff": "0,1,2,3,4,5,6,7", "EventCode": "0xC7", "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE", + "PublicDescription": "Counts once for most SIMD 128-bit packed com= putational double precision floating-point instructions retired; some instr= uctions will count twice as noted below. Each count represents 2 computati= on operations, one for each element. Applies to packed double precision fl= oating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT DP= P FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perf= orm 2 calculations per element. The DAZ and FTZ flags in the MXCSR register= need to be set when using these events.", "SampleAfterValue": "2000003", "UMask": "0x4" }, { - "BriefDescription": "Number of SSE/AVX computational 128-bit packe= d single precision floating-point instructions retired; some instructions w= ill count twice as noted below. Each count represents 4 computation operat= ions, one for each element. Applies to SSE* and AVX* packed single precisi= on floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT = DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they pe= rform 2 calculations per element.", + "BriefDescription": "Counts once for most SIMD 128-bit packed comp= utational single precision floating-point instruction retired. Counts twice= for DPP and FM(N)ADD/SUB instructions retired.", "Counter": "0,1,2,3", "CounterHTOff": "0,1,2,3,4,5,6,7", "EventCode": "0xC7", "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE", + "PublicDescription": "Counts once for most SIMD 128-bit packed com= putational single precision floating-point instructions retired; some instr= uctions will count twice as noted below. Each count represents 4 computati= on operations, one for each element. Applies to packed single precision fl= oating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RS= QRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as= they perform 2 calculations per element. The DAZ and FTZ flags in the MXCS= R register need to be set when using these events.", "SampleAfterValue": "2000003", "UMask": "0x8" }, { - "BriefDescription": "Number of SSE/AVX computational 256-bit packe= d double precision floating-point instructions retired; some instructions w= ill count twice as noted below. Each count represents 4 computation operat= ions, one for each element. Applies to SSE* and AVX* packed double precisi= on floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT = DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they pe= rform 2 calculations per element.", + "BriefDescription": "Counts once for most SIMD 256-bit packed doub= le computational precision floating-point instructions retired. Counts twic= e for DPP and FM(N)ADD/SUB instructions retired.", "Counter": "0,1,2,3", "CounterHTOff": "0,1,2,3,4,5,6,7", "EventCode": "0xC7", "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE", + "PublicDescription": "Counts once for most SIMD 256-bit packed dou= ble computational precision floating-point instructions retired; some instr= uctions will count twice as noted below. Each count represents 4 computati= on operations, one for each element. Applies to packed double precision fl= oating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT FM= (N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calcul= ations per element. The DAZ and FTZ flags in the MXCSR register need to be = set when using these events.", "SampleAfterValue": "2000003", "UMask": "0x10" }, { - "BriefDescription": "Number of SSE/AVX computational 256-bit packe= d single precision floating-point instructions retired; some instructions w= ill count twice as noted below. Each count represents 8 computation operat= ions, one for each element. Applies to SSE* and AVX* packed single precisi= on floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT = DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they pe= rform 2 calculations per element.", + "BriefDescription": "Counts once for most SIMD 256-bit packed sing= le computational precision floating-point instructions retired. Counts twic= e for DPP and FM(N)ADD/SUB instructions retired.", "Counter": "0,1,2,3", "CounterHTOff": "0,1,2,3,4,5,6,7", "EventCode": "0xC7", "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE", + "PublicDescription": "Counts once for most SIMD 256-bit packed sin= gle computational precision floating-point instructions retired; some instr= uctions will count twice as noted below. Each count represents 8 computati= on operations, one for each element. Applies to packed single precision fl= oating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RS= QRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as= they perform 2 calculations per element. The DAZ and FTZ flags in the MXCS= R register need to be set when using these events.", "SampleAfterValue": "2000003", "UMask": "0x20" }, { - "BriefDescription": "Number of SSE/AVX computational 512-bit packe= d double precision floating-point instructions retired; some instructions w= ill count twice as noted below. Each count represents 8 computation operat= ions, one for each element. Applies to SSE* and AVX* packed double precisi= on floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT = DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they pe= rform 8 calculations per element.", + "BriefDescription": "Counts number of SSE/AVX computational 512-bi= t packed double precision floating-point instructions retired; some instruc= tions will count twice as noted below. Each count represents 8 computation= operations, one for each element. Applies to SSE* and AVX* packed double = precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14= RCP14 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform = 2 calculations per element.", "Counter": "0,1,2,3", "CounterHTOff": "0,1,2,3,4,5,6,7", "EventCode": "0xC7", "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE", + "PublicDescription": "Number of SSE/AVX computational 512-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14= FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calc= ulations per element. The DAZ and FTZ flags in the MXCSR register need to b= e set when using these events.", "SampleAfterValue": "2000003", "UMask": "0x40" }, { - "BriefDescription": "Number of SSE/AVX computational 512-bit packe= d single precision floating-point instructions retired; some instructions w= ill count twice as noted below. Each count represents 16 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT= DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they p= erform 16 calculations per element.", + "BriefDescription": "Counts number of SSE/AVX computational 512-bi= t packed single precision floating-point instructions retired; some instruc= tions will count twice as noted below. Each count represents 16 computatio= n operations, one for each element. Applies to SSE* and AVX* packed single= precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT1= 4 RCP14 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform= 2 calculations per element.", "Counter": "0,1,2,3", "CounterHTOff": "0,1,2,3,4,5,6,7", "EventCode": "0xC7", "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE", + "PublicDescription": "Number of SSE/AVX computational 512-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 16 computation oper= ations, one for each element. Applies to SSE* and AVX* packed single preci= sion floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP1= 4 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 cal= culations per element. The DAZ and FTZ flags in the MXCSR register need to = be set when using these events.", "SampleAfterValue": "2000003", "UMask": "0x80" }, { - "BriefDescription": "Number of SSE/AVX computational scalar double= precision floating-point instructions retired; some instructions will coun= t twice as noted below. Each count represents 1 computation. Applies to SS= E* and AVX* scalar double precision floating-point instructions: ADD SUB MU= L DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB in= structions count twice as they perform 2 calculations per element.", + "BriefDescription": "Counts once for most SIMD scalar computationa= l double precision floating-point instructions retired. Counts twice for DP= P and FM(N)ADD/SUB instructions retired.", "Counter": "0,1,2,3", "CounterHTOff": "0,1,2,3,4,5,6,7", "EventCode": "0xC7", "EventName": "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE", + "PublicDescription": "Counts once for most SIMD scalar computation= al double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 1 computational ope= ration. Applies to SIMD scalar double precision floating-point instructions= : ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions cou= nt twice as they perform 2 calculations per element. The DAZ and FTZ flags = in the MXCSR register need to be set when using these events.", "SampleAfterValue": "2000003", "UMask": "0x1" }, { - "BriefDescription": "Number of SSE/AVX computational scalar single= precision floating-point instructions retired; some instructions will coun= t twice as noted below. Each count represents 1 computation. Applies to SS= E* and AVX* scalar single precision floating-point instructions: ADD SUB MU= L DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB in= structions count twice as they perform 2 calculations per element.", + "BriefDescription": "Counts once for most SIMD scalar computationa= l single precision floating-point instructions retired. Counts twice for DP= P and FM(N)ADD/SUB instructions retired.", "Counter": "0,1,2,3", "CounterHTOff": "0,1,2,3,4,5,6,7", "EventCode": "0xC7", "EventName": "FP_ARITH_INST_RETIRED.SCALAR_SINGLE", + "PublicDescription": "Counts once for most SIMD scalar computation= al single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 1 computational ope= ration. Applies to SIMD scalar single precision floating-point instructions= : ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instru= ctions count twice as they perform 2 calculations per element. The DAZ and = FTZ flags in the MXCSR register need to be set when using these events.", "SampleAfterValue": "2000003", "UMask": "0x2" }, --- a/tools/perf/pmu-events/arch/x86/skylakex/frontend.json +++ b/tools/perf/pmu-events/arch/x86/skylakex/frontend.json @@ -30,7 +30,21 @@ "UMask": "0x2" }, { - "BriefDescription": "Retired Instructions who experienced decode s= tream buffer (DSB - the decoded instruction-cache) miss.", + "BriefDescription": "Retired Instructions who experienced DSB miss= .", + "Counter": "0,1,2,3", + "CounterHTOff": "0,1,2,3", + "EventCode": "0xC6", + "EventName": "FRONTEND_RETIRED.ANY_DSB_MISS", + "MSRIndex": "0x3F7", + "MSRValue": "0x1", + "PEBS": "1", + "PublicDescription": "Counts retired Instructions that experienced= DSB (Decode stream buffer i.e. the decoded instruction-cache) miss.", + "SampleAfterValue": "100007", + "TakenAlone": "1", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired Instructions who experienced a critic= al DSB miss.", "Counter": "0,1,2,3", "CounterHTOff": "0,1,2,3", "EventCode": "0xC6", @@ -38,7 +52,7 @@ "MSRIndex": "0x3F7", "MSRValue": "0x11", "PEBS": "1", - "PublicDescription": "Counts retired Instructions that experienced= DSB (Decode stream buffer i.e. the decoded instruction-cache) miss.", + "PublicDescription": "Number of retired Instructions that experien= ced a critical DSB (Decode stream buffer i.e. the decoded instruction-cache= ) miss. Critical means stalls were exposed to the back-end as a result of t= he DSB miss.", "SampleAfterValue": "100007", "TakenAlone": "1", "UMask": "0x1" --- a/tools/perf/pmu-events/arch/x86/skylakex/memory.json +++ b/tools/perf/pmu-events/arch/x86/skylakex/memory.json @@ -299,7 +299,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_DATA_RD.L3_MISS.REMOTE_HIT_FORW= ARD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x083FC00491", + "MSRValue": "0x83FC00491", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -312,7 +312,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_DATA_RD.L3_MISS.SNOOP_MISS_OR_N= O_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063FC00491", + "MSRValue": "0x63FC00491", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -325,7 +325,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_DATA_RD.L3_MISS_LOCAL_DRAM.SNOO= P_MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0604000491", + "MSRValue": "0x604000491", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -338,7 +338,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_DATA_RD.L3_MISS_REMOTE_DRAM.SNO= OP_MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063B800491", + "MSRValue": "0x63B800491", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -377,7 +377,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_PF_DATA_RD.L3_MISS.REMOTE_HIT_F= ORWARD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x083FC00490", + "MSRValue": "0x83FC00490", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -390,7 +390,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_PF_DATA_RD.L3_MISS.SNOOP_MISS_O= R_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063FC00490", + "MSRValue": "0x63FC00490", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -403,7 +403,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_PF_DATA_RD.L3_MISS_LOCAL_DRAM.S= NOOP_MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0604000490", + "MSRValue": "0x604000490", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -416,7 +416,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_PF_DATA_RD.L3_MISS_REMOTE_DRAM.= SNOOP_MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063B800490", + "MSRValue": "0x63B800490", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -455,7 +455,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_PF_RFO.L3_MISS.REMOTE_HIT_FORWA= RD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x083FC00120", + "MSRValue": "0x83FC00120", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -468,7 +468,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_PF_RFO.L3_MISS.SNOOP_MISS_OR_NO= _FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063FC00120", + "MSRValue": "0x63FC00120", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -481,7 +481,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_PF_RFO.L3_MISS_LOCAL_DRAM.SNOOP= _MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0604000120", + "MSRValue": "0x604000120", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -494,7 +494,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_PF_RFO.L3_MISS_REMOTE_DRAM.SNOO= P_MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063B800120", + "MSRValue": "0x63B800120", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -533,7 +533,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_RFO.L3_MISS.REMOTE_HIT_FORWARD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x083FC00122", + "MSRValue": "0x83FC00122", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -546,7 +546,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_RFO.L3_MISS.SNOOP_MISS_OR_NO_FW= D", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063FC00122", + "MSRValue": "0x63FC00122", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -559,7 +559,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_RFO.L3_MISS_LOCAL_DRAM.SNOOP_MI= SS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0604000122", + "MSRValue": "0x604000122", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -572,7 +572,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.ALL_RFO.L3_MISS_REMOTE_DRAM.SNOOP_M= ISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063B800122", + "MSRValue": "0x63B800122", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -611,7 +611,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_CODE_RD.L3_MISS.REMOTE_HIT_F= ORWARD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x083FC00004", + "MSRValue": "0x83FC00004", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -624,7 +624,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_CODE_RD.L3_MISS.SNOOP_MISS_O= R_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063FC00004", + "MSRValue": "0x63FC00004", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -637,7 +637,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_CODE_RD.L3_MISS_LOCAL_DRAM.S= NOOP_MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0604000004", + "MSRValue": "0x604000004", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -650,7 +650,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_CODE_RD.L3_MISS_REMOTE_DRAM.= SNOOP_MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063B800004", + "MSRValue": "0x63B800004", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -689,7 +689,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_DATA_RD.L3_MISS.REMOTE_HIT_F= ORWARD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x083FC00001", + "MSRValue": "0x83FC00001", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -702,7 +702,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_DATA_RD.L3_MISS.SNOOP_MISS_O= R_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063FC00001", + "MSRValue": "0x63FC00001", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -715,7 +715,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_DATA_RD.L3_MISS_LOCAL_DRAM.S= NOOP_MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0604000001", + "MSRValue": "0x604000001", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -728,7 +728,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_DATA_RD.L3_MISS_REMOTE_DRAM.= SNOOP_MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063B800001", + "MSRValue": "0x63B800001", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -767,7 +767,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_RFO.L3_MISS.REMOTE_HIT_FORWA= RD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x083FC00002", + "MSRValue": "0x83FC00002", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -780,7 +780,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_RFO.L3_MISS.SNOOP_MISS_OR_NO= _FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063FC00002", + "MSRValue": "0x63FC00002", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -793,7 +793,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_RFO.L3_MISS_LOCAL_DRAM.SNOOP= _MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0604000002", + "MSRValue": "0x604000002", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -806,7 +806,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.DEMAND_RFO.L3_MISS_REMOTE_DRAM.SNOO= P_MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063B800002", + "MSRValue": "0x63B800002", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -845,7 +845,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L1D_AND_SW.L3_MISS.REMOTE_HIT_FO= RWARD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x083FC00400", + "MSRValue": "0x83FC00400", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -858,7 +858,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L1D_AND_SW.L3_MISS.SNOOP_MISS_OR= _NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063FC00400", + "MSRValue": "0x63FC00400", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -871,7 +871,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L1D_AND_SW.L3_MISS_LOCAL_DRAM.SN= OOP_MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0604000400", + "MSRValue": "0x604000400", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -884,7 +884,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L1D_AND_SW.L3_MISS_REMOTE_DRAM.S= NOOP_MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063B800400", + "MSRValue": "0x63B800400", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -923,7 +923,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L2_DATA_RD.L3_MISS.REMOTE_HIT_FO= RWARD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x083FC00010", + "MSRValue": "0x83FC00010", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -936,7 +936,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L2_DATA_RD.L3_MISS.SNOOP_MISS_OR= _NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063FC00010", + "MSRValue": "0x63FC00010", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -949,7 +949,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L2_DATA_RD.L3_MISS_LOCAL_DRAM.SN= OOP_MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0604000010", + "MSRValue": "0x604000010", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -962,7 +962,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L2_DATA_RD.L3_MISS_REMOTE_DRAM.S= NOOP_MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063B800010", + "MSRValue": "0x63B800010", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1001,7 +1001,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L2_RFO.L3_MISS.REMOTE_HIT_FORWAR= D", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x083FC00020", + "MSRValue": "0x83FC00020", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1014,7 +1014,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L2_RFO.L3_MISS.SNOOP_MISS_OR_NO_= FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063FC00020", + "MSRValue": "0x63FC00020", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1027,7 +1027,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L2_RFO.L3_MISS_LOCAL_DRAM.SNOOP_= MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0604000020", + "MSRValue": "0x604000020", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1040,7 +1040,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L2_RFO.L3_MISS_REMOTE_DRAM.SNOOP= _MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063B800020", + "MSRValue": "0x63B800020", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1079,7 +1079,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L3_DATA_RD.L3_MISS.REMOTE_HIT_FO= RWARD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x083FC00080", + "MSRValue": "0x83FC00080", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1092,7 +1092,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L3_DATA_RD.L3_MISS.SNOOP_MISS_OR= _NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063FC00080", + "MSRValue": "0x63FC00080", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1105,7 +1105,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L3_DATA_RD.L3_MISS_LOCAL_DRAM.SN= OOP_MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0604000080", + "MSRValue": "0x604000080", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1118,7 +1118,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L3_DATA_RD.L3_MISS_REMOTE_DRAM.S= NOOP_MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063B800080", + "MSRValue": "0x63B800080", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1157,7 +1157,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L3_RFO.L3_MISS.REMOTE_HIT_FORWAR= D", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x083FC00100", + "MSRValue": "0x83FC00100", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1170,7 +1170,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L3_RFO.L3_MISS.SNOOP_MISS_OR_NO_= FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063FC00100", + "MSRValue": "0x63FC00100", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1183,7 +1183,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L3_RFO.L3_MISS_LOCAL_DRAM.SNOOP_= MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x0604000100", + "MSRValue": "0x604000100", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", @@ -1196,7 +1196,7 @@ "EventCode": "0xB7, 0xBB", "EventName": "OFFCORE_RESPONSE.PF_L3_RFO.L3_MISS_REMOTE_DRAM.SNOOP= _MISS_OR_NO_FWD", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x063B800100", + "MSRValue": "0x63B800100", "Offcore": "1", "PublicDescription": "Offcore response can be programmed only with= a specific pair of event select and counter MSR, and with specific event c= odes and predefine mask bit value in a dedicated MSR to specify attributes = of the offcore transaction.", "SampleAfterValue": "100003", --- a/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json @@ -436,6 +436,17 @@ "SampleAfterValue": "2000003" }, { + "BriefDescription": "Number of all retired NOP instructions.", + "Counter": "0,1,2,3", + "CounterHTOff": "0,1,2,3,4,5,6,7", + "Errata": "SKL091, SKL044", + "EventCode": "0xC0", + "EventName": "INST_RETIRED.NOP", + "PEBS": "1", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { "BriefDescription": "Precise instruction retired event with HW to = reduce effect of PEBS shadow in IP distribution", "Counter": "1", "CounterHTOff": "1", --- a/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json +++ b/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json @@ -1,26 +1,167 @@ [ { + "BriefDescription": "This category represents fraction of slots wh= ere the processor's Frontend undersupplies its Backend", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED= .THREAD)", + "MetricGroup": "TopdownL1", + "MetricName": "Frontend_Bound", + "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Machine_Width uops every c= ycle to the Backend. Frontend Bound denotes unutilized issue-slots when the= re is no Backend stall; i.e. bubbles where Frontend delivered no uops while= Backend could have accepted them. For example; stalls due to instruction-c= ache misses would be categorized under Frontend Bound." + }, + { + "BriefDescription": "This category represents fraction of slots wh= ere the processor's Frontend undersupplies its Backend. SMT version; use wh= en SMT is enabled and measuring per logical CPU.", + "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK_UNHA= LTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHA= LTED.REF_XCLK ) ))", + "MetricGroup": "TopdownL1_SMT", + "MetricName": "Frontend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Machine_Width uops every c= ycle to the Backend. Frontend Bound denotes unutilized issue-slots when the= re is no Backend stall; i.e. bubbles where Frontend delivered no uops while= Backend could have accepted them. For example; stalls due to instruction-c= ache misses would be categorized under Frontend Bound. SMT version; use whe= n SMT is enabled and measuring per logical CPU." + }, + { + "BriefDescription": "This category represents fraction of slots wa= sted due to incorrect speculations", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 *= INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD)", + "MetricGroup": "TopdownL1", + "MetricName": "Bad_Speculation", + "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example." + }, + { + "BriefDescription": "This category represents fraction of slots wa= sted due to incorrect speculations. SMT version; use when SMT is enabled an= d measuring per logical CPU.", + "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 *= ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU_CLK_UNHALTED.THREAD = / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCL= K ) ))", + "MetricGroup": "TopdownL1_SMT", + "MetricName": "Bad_Speculation_SMT", + "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example. SMT version; use when SMT is enabled and measuring p= er logical CPU." + }, + { + "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", + "MetricConstraint": "NO_NMI_WATCHDOG", + "MetricExpr": "1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * CPU_CLK_UNH= ALTED.THREAD)) - ( UOPS_ISSUED.ANY + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * = CPU_CLK_UNHALTED.THREAD)", + "MetricGroup": "TopdownL1", + "MetricName": "Backend_Bound", + "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound." + }, + { + "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend. SMT version; use when SMT is enabled and me= asuring per logical CPU.", + "MetricExpr": "1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK= _UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK= _UNHALTED.REF_XCLK ) ))) - ( UOPS_ISSUED.ANY + 4 * ( INT_MISC.RECOVERY_CYCL= ES_ANY / 2 ) ) / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNH= ALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))", + "MetricGroup": "TopdownL1_SMT", + "MetricName": "Backend_Bound_SMT", + "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound. SMT version; use when SMT is enabled and measuring per lo= gical CPU." + }, + { + "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTED.T= HREAD)", + "MetricGroup": "TopdownL1", + "MetricName": "Retiring", + "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. " + }, + { + "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired. SMT ver= sion; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * ( ( CPU_CLK_UNHALT= ED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALT= ED.REF_XCLK ) ))", + "MetricGroup": "TopdownL1_SMT", + "MetricName": "Retiring_SMT", + "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. SMT version; use when SMT= is enabled and measuring per logical CPU." + }, + { + "BriefDescription": "Total pipeline cost of Branch Misprediction r= elated bottlenecks", + "MetricExpr": "100 * ( ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_= RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( UOPS_ISSUED.ANY - UOPS_= RETIRED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALT= ED.THREAD))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * = CPU_CLK_UNHALTED.THREAD)) * ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETI= RED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * INT_MISC.CLEAR_RESTEER_CYCLES = / CPU_CLK_UNHALTED.THREAD) / #(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DEL= IV.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) )", + "MetricGroup": "Bad;BadSpec;BrMispredicts", + "MetricName": "Mispredictions" + }, + { + "BriefDescription": "Total pipeline cost of Branch Misprediction r= elated bottlenecks", + "MetricExpr": "100 * ( ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_= RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( UOPS_ISSUED.ANY - UOPS_= RETIRED.RETIRE_SLOTS + 4 * ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( = ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE = / CPU_CLK_UNHALTED.REF_XCLK ) )))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_U= OPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNH= ALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * ((BR_MISP_RETIR= ED.ALL_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) = * INT_MISC.CLEAR_RESTEER_CYCLES / CPU_CLK_UNHALTED.THREAD) / #(4 * IDQ_UOPS= _NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD = / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCL= K ) ))) )", + "MetricGroup": "Bad;BadSpec;BrMispredicts_SMT", + "MetricName": "Mispredictions_SMT" + }, + { + "BriefDescription": "Total pipeline cost of (external) Memory Band= width related bottlenecks", + "MetricExpr": "100 * ((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIV= ITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORT= S_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTED.THREAD)) * EXE_= ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NO= T_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - ( UOPS_ISSUED.ANY + 4 *= INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD))) * ( ( (CYCLE_= ACTIVITY.STALLS_L3_MISS / CPU_CLK_UNHALTED.THREAD + (( CYCLE_ACTIVITY.STALL= S_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS ) / CPU_CLK_UNHALTED.THREAD) - (= ( (MEM_LOAD_RETIRED.L2_HIT * ( 1 + (MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETI= RED.L1_MISS) )) / ( (MEM_LOAD_RETIRED.L2_HIT * ( 1 + (MEM_LOAD_RETIRED.FB_H= IT / MEM_LOAD_RETIRED.L1_MISS) )) + cpu@L1D_PEND_MISS.FB_FULL\\,cmask\\=3D1= @ ) ) * (( CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS )= / CPU_CLK_UNHALTED.THREAD))) / #((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_AC= TIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_P= ORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTED.THREAD)) * E= XE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS= _NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - ( UOPS_ISSUED.ANY + = 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD))) ) * ( (min= ( CPU_CLK_UNHALTED.THREAD , cpu@OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD\\,= cmask\\=3D4@ ) / CPU_CLK_UNHALTED.THREAD) / #(CYCLE_ACTIVITY.STALLS_L3_MISS= / CPU_CLK_UNHALTED.THREAD + (( CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTI= VITY.STALLS_L2_MISS ) / CPU_CLK_UNHALTED.THREAD) - (( (MEM_LOAD_RETIRED.L2_= HIT * ( 1 + (MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) )) / ( (ME= M_LOAD_RETIRED.L2_HIT * ( 1 + (MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L= 1_MISS) )) + cpu@L1D_PEND_MISS.FB_FULL\\,cmask\\=3D1@ ) ) * (( CYCLE_ACTIVI= TY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS ) / CPU_CLK_UNHALTED.THR= EAD))) ) + ( (( CYCLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STALLS_L3_MI= SS ) / CPU_CLK_UNHALTED.THREAD) / #((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_= ACTIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1= _PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTED.THREAD)) *= EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UO= PS_NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - ( UOPS_ISSUED.ANY = + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD))) ) * ( (O= FFCORE_REQUESTS_BUFFER.SQ_FULL / CPU_CLK_UNHALTED.THREAD) / #(( CYCLE_ACTIV= ITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STALLS_L3_MISS ) / CPU_CLK_UNHALTED.THR= EAD) ) ) + ( (max( ( CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY.STALLS_= L1D_MISS ) / CPU_CLK_UNHALTED.THREAD , 0 )) / #((( CYCLE_ACTIVITY.STALLS_ME= M_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EX= E_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTE= D.THREAD)) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * = (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - ( UOPS= _ISSUED.ANY + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD= ))) ) * ( ((L1D_PEND_MISS.PENDING / ( MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_R= ETIRED.FB_HIT )) * cpu@L1D_PEND_MISS.FB_FULL\\,cmask\\=3D1@ / CPU_CLK_UNHAL= TED.THREAD) / #(max( ( CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY.STALL= S_L1D_MISS ) / CPU_CLK_UNHALTED.THREAD , 0 )) ) ", + "MetricGroup": "Mem;MemoryBW;Offcore", + "MetricName": "Memory_Bandwidth" + }, + { + "BriefDescription": "Total pipeline cost of (external) Memory Band= width related bottlenecks", + "MetricExpr": "100 * ((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIV= ITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORT= S_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 = ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) = ))) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (I= DQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 += CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - ( U= OPS_ISSUED.ANY + 4 * ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU_= CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_= CLK_UNHALTED.REF_XCLK ) )))) * ( ( (CYCLE_ACTIVITY.STALLS_L3_MISS / CPU_CLK= _UNHALTED.THREAD + (( CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALL= S_L2_MISS ) / CPU_CLK_UNHALTED.THREAD) - (( (MEM_LOAD_RETIRED.L2_HIT * ( 1 = + (MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) )) / ( (MEM_LOAD_RET= IRED.L2_HIT * ( 1 + (MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) ))= + cpu@L1D_PEND_MISS.FB_FULL\\,cmask\\=3D1@ ) ) * (( CYCLE_ACTIVITY.STALLS_= L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS ) / CPU_CLK_UNHALTED.THREAD))) / #= ((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE= _ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_= SLOTS / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE= _THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * EXE_ACTIVITY.2_PORTS_UTI= L) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (= 4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_A= CTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - ( UOPS_ISSUED.ANY + 4 * ( INT_MIS= C.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( = 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )= * ( (min( CPU_CLK_UNHALTED.THREAD , cpu@OFFCORE_REQUESTS_OUTSTANDING.ALL_D= ATA_RD\\,cmask\\=3D4@ ) / CPU_CLK_UNHALTED.THREAD) / #(CYCLE_ACTIVITY.STALL= S_L3_MISS / CPU_CLK_UNHALTED.THREAD + (( CYCLE_ACTIVITY.STALLS_L1D_MISS - C= YCLE_ACTIVITY.STALLS_L2_MISS ) / CPU_CLK_UNHALTED.THREAD) - (( (MEM_LOAD_RE= TIRED.L2_HIT * ( 1 + (MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) )= ) / ( (MEM_LOAD_RETIRED.L2_HIT * ( 1 + (MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_= RETIRED.L1_MISS) )) + cpu@L1D_PEND_MISS.FB_FULL\\,cmask\\=3D1@ ) ) * (( CYC= LE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS ) / CPU_CLK_UNH= ALTED.THREAD))) ) + ( (( CYCLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STA= LLS_L3_MISS ) / CPU_CLK_UNHALTED.THREAD) / #((( CYCLE_ACTIVITY.STALLS_MEM_A= NY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_A= CTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * ( ( CPU_CLK_UNHALT= ED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALT= ED.REF_XCLK ) ))) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STOR= ES)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD= / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XC= LK ) ))) - ( UOPS_ISSUED.ANY + 4 * ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) /= (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD= _ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) ) * ( (( OFFCORE_REQUESTS_BUFFER= .SQ_FULL / 2 ) / ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED= .ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) / #(( CYCLE_ACTIVITY.ST= ALLS_L2_MISS - CYCLE_ACTIVITY.STALLS_L3_MISS ) / CPU_CLK_UNHALTED.THREAD) )= ) + ( (max( ( CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY.STALLS_L1D_MI= SS ) / CPU_CLK_UNHALTED.THREAD , 0 )) / #((( CYCLE_ACTIVITY.STALLS_MEM_ANY = + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTI= VITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * ( ( CPU_CLK_UNHALTED.= THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.= REF_XCLK ) ))) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)= ) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / = 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK = ) ))) - ( UOPS_ISSUED.ANY + 4 * ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4= * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_AC= TIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) ) * ( ((L1D_PEND_MISS.PENDING / ( M= EM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )) * cpu@L1D_PEND_MISS.FB= _FULL\\,cmask\\=3D1@ / CPU_CLK_UNHALTED.THREAD) / #(max( ( CYCLE_ACTIVITY.S= TALLS_MEM_ANY - CYCLE_ACTIVITY.STALLS_L1D_MISS ) / CPU_CLK_UNHALTED.THREAD = , 0 )) ) ", + "MetricGroup": "Mem;MemoryBW;Offcore_SMT", + "MetricName": "Memory_Bandwidth_SMT" + }, + { + "BriefDescription": "Total pipeline cost of Memory Latency related= bottlenecks (external memory and off-core caches)", + "MetricExpr": "100 * ((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIV= ITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORT= S_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTED.THREAD)) * EXE_= ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NO= T_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - ( UOPS_ISSUED.ANY + 4 *= INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD))) * ( ( (CYCLE_= ACTIVITY.STALLS_L3_MISS / CPU_CLK_UNHALTED.THREAD + (( CYCLE_ACTIVITY.STALL= S_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS ) / CPU_CLK_UNHALTED.THREAD) - (= ( (MEM_LOAD_RETIRED.L2_HIT * ( 1 + (MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETI= RED.L1_MISS) )) / ( (MEM_LOAD_RETIRED.L2_HIT * ( 1 + (MEM_LOAD_RETIRED.FB_H= IT / MEM_LOAD_RETIRED.L1_MISS) )) + cpu@L1D_PEND_MISS.FB_FULL\\,cmask\\=3D1= @ ) ) * (( CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS )= / CPU_CLK_UNHALTED.THREAD))) / #((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_AC= TIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_P= ORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTED.THREAD)) * E= XE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS= _NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - ( UOPS_ISSUED.ANY + = 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD))) ) * ( (min= ( CPU_CLK_UNHALTED.THREAD , OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_R= D ) / CPU_CLK_UNHALTED.THREAD - (min( CPU_CLK_UNHALTED.THREAD , cpu@OFFCORE= _REQUESTS_OUTSTANDING.ALL_DATA_RD\\,cmask\\=3D4@ ) / CPU_CLK_UNHALTED.THREA= D)) / #(CYCLE_ACTIVITY.STALLS_L3_MISS / CPU_CLK_UNHALTED.THREAD + (( CYCLE_= ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS ) / CPU_CLK_UNHALT= ED.THREAD) - (( (MEM_LOAD_RETIRED.L2_HIT * ( 1 + (MEM_LOAD_RETIRED.FB_HIT /= MEM_LOAD_RETIRED.L1_MISS) )) / ( (MEM_LOAD_RETIRED.L2_HIT * ( 1 + (MEM_LOA= D_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) )) + cpu@L1D_PEND_MISS.FB_FULL= \\,cmask\\=3D1@ ) ) * (( CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.ST= ALLS_L2_MISS ) / CPU_CLK_UNHALTED.THREAD))) ) + ( (( CYCLE_ACTIVITY.STALLS_= L2_MISS - CYCLE_ACTIVITY.STALLS_L3_MISS ) / CPU_CLK_UNHALTED.THREAD) / #(((= CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE_AC= TIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLO= TS / (4 * CPU_CLK_UNHALTED.THREAD)) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTI= VITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * CPU_CLK_U= NHALTED.THREAD)) - ( UOPS_ISSUED.ANY + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 = * CPU_CLK_UNHALTED.THREAD))) ) * ( (( (20.5 * ((CPU_CLK_UNHALTED.THREAD / C= PU_CLK_UNHALTED.REF_TSC) * msr@tsc@ / 1000000000 / duration_time)) - (3.5 *= ((CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC) * msr@tsc@ / 1000000= 000 / duration_time)) ) * MEM_LOAD_RETIRED.L3_HIT * (1 + (MEM_LOAD_RETIRED.= FB_HIT / MEM_LOAD_RETIRED.L1_MISS) / 2) / CPU_CLK_UNHALTED.THREAD) / #(( CY= CLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STALLS_L3_MISS ) / CPU_CLK_UNH= ALTED.THREAD) ) + ( (( (MEM_LOAD_RETIRED.L2_HIT * ( 1 + (MEM_LOAD_RETIRED.F= B_HIT / MEM_LOAD_RETIRED.L1_MISS) )) / ( (MEM_LOAD_RETIRED.L2_HIT * ( 1 + (= MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) )) + cpu@L1D_PEND_MISS.= FB_FULL\\,cmask\\=3D1@ ) ) * (( CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTI= VITY.STALLS_L2_MISS ) / CPU_CLK_UNHALTED.THREAD)) / #((( CYCLE_ACTIVITY.STA= LLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL= + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_U= NHALTED.THREAD)) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORE= S)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - = ( UOPS_ISSUED.ANY + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.= THREAD))) ) )", + "MetricGroup": "Mem;MemoryLat;Offcore", + "MetricName": "Memory_Latency" + }, + { + "BriefDescription": "Total pipeline cost of Memory Latency related= bottlenecks (external memory and off-core caches)", + "MetricExpr": "100 * ((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIV= ITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORT= S_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 = ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) = ))) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (I= DQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 += CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - ( U= OPS_ISSUED.ANY + 4 * ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU_= CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_= CLK_UNHALTED.REF_XCLK ) )))) * ( ( (CYCLE_ACTIVITY.STALLS_L3_MISS / CPU_CLK= _UNHALTED.THREAD + (( CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALL= S_L2_MISS ) / CPU_CLK_UNHALTED.THREAD) - (( (MEM_LOAD_RETIRED.L2_HIT * ( 1 = + (MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) )) / ( (MEM_LOAD_RET= IRED.L2_HIT * ( 1 + (MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) ))= + cpu@L1D_PEND_MISS.FB_FULL\\,cmask\\=3D1@ ) ) * (( CYCLE_ACTIVITY.STALLS_= L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS ) / CPU_CLK_UNHALTED.THREAD))) / #= ((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE= _ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_= SLOTS / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE= _THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * EXE_ACTIVITY.2_PORTS_UTI= L) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (= 4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_A= CTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - ( UOPS_ISSUED.ANY + 4 * ( INT_MIS= C.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( = 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) )= * ( (min( CPU_CLK_UNHALTED.THREAD , OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WI= TH_DATA_RD ) / CPU_CLK_UNHALTED.THREAD - (min( CPU_CLK_UNHALTED.THREAD , cp= u@OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD\\,cmask\\=3D4@ ) / CPU_CLK_UNHAL= TED.THREAD)) / #(CYCLE_ACTIVITY.STALLS_L3_MISS / CPU_CLK_UNHALTED.THREAD + = (( CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS ) / CPU_C= LK_UNHALTED.THREAD) - (( (MEM_LOAD_RETIRED.L2_HIT * ( 1 + (MEM_LOAD_RETIRED= .FB_HIT / MEM_LOAD_RETIRED.L1_MISS) )) / ( (MEM_LOAD_RETIRED.L2_HIT * ( 1 += (MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) )) + cpu@L1D_PEND_MIS= S.FB_FULL\\,cmask\\=3D1@ ) ) * (( CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_AC= TIVITY.STALLS_L2_MISS ) / CPU_CLK_UNHALTED.THREAD))) ) + ( (( CYCLE_ACTIVIT= Y.STALLS_L2_MISS - CYCLE_ACTIVITY.STALLS_L3_MISS ) / CPU_CLK_UNHALTED.THREA= D) / #((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / = (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.R= ETIRE_SLOTS / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALT= ED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * EXE_ACTIVITY.2_POR= TS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CO= RE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_TH= READ_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - ( UOPS_ISSUED.ANY + 4 * ( I= NT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 = ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) = )))) ) * ( (( (20.5 * ((CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC)= * msr@tsc@ / 1000000000 / duration_time)) - (3.5 * ((CPU_CLK_UNHALTED.THRE= AD / CPU_CLK_UNHALTED.REF_TSC) * msr@tsc@ / 1000000000 / duration_time)) ) = * MEM_LOAD_RETIRED.L3_HIT * (1 + (MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRE= D.L1_MISS) / 2) / CPU_CLK_UNHALTED.THREAD) / #(( CYCLE_ACTIVITY.STALLS_L2_M= ISS - CYCLE_ACTIVITY.STALLS_L3_MISS ) / CPU_CLK_UNHALTED.THREAD) ) + ( (( (= MEM_LOAD_RETIRED.L2_HIT * ( 1 + (MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED= .L1_MISS) )) / ( (MEM_LOAD_RETIRED.L2_HIT * ( 1 + (MEM_LOAD_RETIRED.FB_HIT = / MEM_LOAD_RETIRED.L1_MISS) )) + cpu@L1D_PEND_MISS.FB_FULL\\,cmask\\=3D1@ )= ) * (( CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS ) / = CPU_CLK_UNHALTED.THREAD)) / #((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVI= TY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS= _UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 )= * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )= )) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (ID= Q_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + = CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - ( UO= PS_ISSUED.ANY + 4 * ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU_C= LK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_C= LK_UNHALTED.REF_XCLK ) )))) ) )", + "MetricGroup": "Mem;MemoryLat;Offcore_SMT", + "MetricName": "Memory_Latency_SMT" + }, + { + "BriefDescription": "Total pipeline cost of Memory Address Transla= tion related bottlenecks (data-side TLBs)", + "MetricExpr": "100 * ((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIV= ITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORT= S_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTED.THREAD)) * EXE_= ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NO= T_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - ( UOPS_ISSUED.ANY + 4 *= INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD))) * ( ( (max( (= CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY.STALLS_L1D_MISS ) / CPU_CLK= _UNHALTED.THREAD , 0 )) / ((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.= BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UT= IL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_UNHALTED.THREAD)) * EXE_ACTI= VITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NOT_DE= LIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - ( UOPS_ISSUED.ANY + 4 * INT= _MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.THREAD))) ) * ( (min( 9 * c= pu@DTLB_LOAD_MISSES.STLB_HIT\\,cmask\\=3D1@ + DTLB_LOAD_MISSES.WALK_ACTIVE = , max( CYCLE_ACTIVITY.CYCLES_MEM_ANY - CYCLE_ACTIVITY.CYCLES_L1D_MISS , 0 )= ) / CPU_CLK_UNHALTED.THREAD) / (max( ( CYCLE_ACTIVITY.STALLS_MEM_ANY - CYC= LE_ACTIVITY.STALLS_L1D_MISS ) / CPU_CLK_UNHALTED.THREAD , 0 )) ) + ( (EXE_A= CTIVITY.BOUND_ON_STORES / CPU_CLK_UNHALTED.THREAD) / #((( CYCLE_ACTIVITY.ST= ALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTA= L + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * CPU_CLK_= UNHALTED.THREAD)) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STOR= ES)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) -= ( UOPS_ISSUED.ANY + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED= .THREAD))) ) * ( (( 9 * cpu@DTLB_STORE_MISSES.STLB_HIT\\,cmask\\=3D1@ + DTL= B_STORE_MISSES.WALK_ACTIVE ) / CPU_CLK_UNHALTED.THREAD) / #(EXE_ACTIVITY.BO= UND_ON_STORES / CPU_CLK_UNHALTED.THREAD) ) ) ", + "MetricGroup": "Mem;MemoryTLB", + "MetricName": "Memory_Data_TLBs" + }, + { + "BriefDescription": "Total pipeline cost of Memory Address Transla= tion related bottlenecks (data-side TLBs)", + "MetricExpr": "100 * ((( CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIV= ITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORT= S_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 = ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) = ))) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES)) * (1 - (I= DQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 += CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - ( U= OPS_ISSUED.ANY + 4 * ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU_= CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_= CLK_UNHALTED.REF_XCLK ) )))) * ( ( (max( ( CYCLE_ACTIVITY.STALLS_MEM_ANY - = CYCLE_ACTIVITY.STALLS_L1D_MISS ) / CPU_CLK_UNHALTED.THREAD , 0 )) / ((( CYC= LE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVI= TY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS /= (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD= _ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * EXE_ACTIVITY.2_PORTS_UTIL) + EX= E_ACTIVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * ( (= CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE /= CPU_CLK_UNHALTED.REF_XCLK ) ))) - ( UOPS_ISSUED.ANY + 4 * ( INT_MISC.RECOV= ERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU= _CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) ) * ( (m= in( 9 * cpu@DTLB_LOAD_MISSES.STLB_HIT\\,cmask\\=3D1@ + DTLB_LOAD_MISSES.WAL= K_ACTIVE , max( CYCLE_ACTIVITY.CYCLES_MEM_ANY - CYCLE_ACTIVITY.CYCLES_L1D_M= ISS , 0 ) ) / CPU_CLK_UNHALTED.THREAD) / (max( ( CYCLE_ACTIVITY.STALLS_MEM_= ANY - CYCLE_ACTIVITY.STALLS_L1D_MISS ) / CPU_CLK_UNHALTED.THREAD , 0 )) ) += ( (EXE_ACTIVITY.BOUND_ON_STORES / CPU_CLK_UNHALTED.THREAD) / #((( CYCLE_AC= TIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES ) / (CYCLE_ACTIVITY.ST= ALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + (UOPS_RETIRED.RETIRE_SLOTS / (4 *= ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTI= VE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACT= IVITY.BOUND_ON_STORES)) * (1 - (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU_= CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_= CLK_UNHALTED.REF_XCLK ) ))) - ( UOPS_ISSUED.ANY + 4 * ( INT_MISC.RECOVERY_C= YCLES_ANY / 2 ) ) / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_= UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) ) * ( (( 9 * = cpu@DTLB_STORE_MISSES.STLB_HIT\\,cmask\\=3D1@ + DTLB_STORE_MISSES.WALK_ACTI= VE ) / ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREA= D_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) / #(EXE_ACTIVITY.BOUND_ON_STORES = / CPU_CLK_UNHALTED.THREAD) ) ) ", + "MetricGroup": "Mem;MemoryTLB;_SMT", + "MetricName": "Memory_Data_TLBs_SMT" + }, + { + "BriefDescription": "Total pipeline cost of branch related instruc= tions (used for program control-flow including function calls)", + "MetricExpr": "100 * (( BR_INST_RETIRED.CONDITIONAL + 3 * BR_INST_= RETIRED.NEAR_CALL + (BR_INST_RETIRED.NEAR_TAKEN - ( BR_INST_RETIRED.CONDITI= ONAL - BR_INST_RETIRED.NOT_TAKEN ) - 2 * BR_INST_RETIRED.NEAR_CALL) ) / (4 = * CPU_CLK_UNHALTED.THREAD))", + "MetricGroup": "Ret", + "MetricName": "Branching_Overhead" + }, + { + "BriefDescription": "Total pipeline cost of branch related instruc= tions (used for program control-flow including function calls)", + "MetricExpr": "100 * (( BR_INST_RETIRED.CONDITIONAL + 3 * BR_INST_= RETIRED.NEAR_CALL + (BR_INST_RETIRED.NEAR_TAKEN - ( BR_INST_RETIRED.CONDITI= ONAL - BR_INST_RETIRED.NOT_TAKEN ) - 2 * BR_INST_RETIRED.NEAR_CALL) ) / (4 = * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACT= IVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", + "MetricGroup": "Ret_SMT", + "MetricName": "Branching_Overhead_SMT" + }, + { + "BriefDescription": "Total pipeline cost of instruction fetch rela= ted bottlenecks by large code footprint programs (i-side cache; TLB and BTB= misses)", + "MetricExpr": "100 * (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DEL= IV.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) * ( (ICACHE_64B.IFTAG_STALL / CPU_= CLK_UNHALTED.THREAD) + (( ICACHE_16B.IFDATA_STALL + 2 * cpu@ICACHE_16B.IFDA= TA_STALL\\,cmask\\=3D1\\,edge@ ) / CPU_CLK_UNHALTED.THREAD) + (9 * BACLEARS= .ANY / CPU_CLK_UNHALTED.THREAD) ) / #(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_U= OPS_DELIV.CORE / (4 * CPU_CLK_UNHALTED.THREAD))", + "MetricGroup": "BigFoot;Fed;Frontend;IcMiss;MemoryTLB", + "MetricName": "Big_Code" + }, + { + "BriefDescription": "Total pipeline cost of instruction fetch rela= ted bottlenecks by large code footprint programs (i-side cache; TLB and BTB= misses)", + "MetricExpr": "100 * (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DEL= IV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.O= NE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * ( (ICACHE_64B.IFTAG_ST= ALL / CPU_CLK_UNHALTED.THREAD) + (( ICACHE_16B.IFDATA_STALL + 2 * cpu@ICACH= E_16B.IFDATA_STALL\\,cmask\\=3D1\\,edge@ ) / CPU_CLK_UNHALTED.THREAD) + (9 = * BACLEARS.ANY / CPU_CLK_UNHALTED.THREAD) ) / #(4 * IDQ_UOPS_NOT_DELIVERED.= CYCLES_0_UOPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + C= PU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))", + "MetricGroup": "BigFoot;Fed;Frontend;IcMiss;MemoryTLB_SMT", + "MetricName": "Big_Code_SMT" + }, + { + "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks", + "MetricExpr": "100 * ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * CPU_CLK= _UNHALTED.THREAD)) - (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE /= (4 * CPU_CLK_UNHALTED.THREAD)) * ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MIS= P_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * INT_MISC.CLEAR_RESTEER_C= YCLES / CPU_CLK_UNHALTED.THREAD) / #(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UO= PS_DELIV.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) ) - (100 * (4 * IDQ_UOPS_NOT= _DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) * ( (I= CACHE_64B.IFTAG_STALL / CPU_CLK_UNHALTED.THREAD) + (( ICACHE_16B.IFDATA_STA= LL + 2 * cpu@ICACHE_16B.IFDATA_STALL\\,cmask\\=3D1\\,edge@ ) / CPU_CLK_UNHA= LTED.THREAD) + (9 * BACLEARS.ANY / CPU_CLK_UNHALTED.THREAD) ) / #(4 * IDQ_U= OPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * CPU_CLK_UNHALTED.THREAD))= )", + "MetricGroup": "Fed;FetchBW;Frontend", + "MetricName": "Instruction_Fetch_BW" + }, + { + "BriefDescription": "Total pipeline cost of instruction fetch band= width related bottlenecks", + "MetricExpr": "100 * ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * ( ( CPU= _CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU= _CLK_UNHALTED.REF_XCLK ) ))) - (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DE= LIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.= ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * ((BR_MISP_RETIRED.ALL= _BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * INT_= MISC.CLEAR_RESTEER_CYCLES / CPU_CLK_UNHALTED.THREAD) / #(4 * IDQ_UOPS_NOT_D= ELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) = * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))= ) ) - (100 * (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( = ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE = / CPU_CLK_UNHALTED.REF_XCLK ) ))) * ( (ICACHE_64B.IFTAG_STALL / CPU_CLK_UNH= ALTED.THREAD) + (( ICACHE_16B.IFDATA_STALL + 2 * cpu@ICACHE_16B.IFDATA_STAL= L\\,cmask\\=3D1\\,edge@ ) / CPU_CLK_UNHALTED.THREAD) + (9 * BACLEARS.ANY / = CPU_CLK_UNHALTED.THREAD) ) / #(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DEL= IV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.O= NE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))))", + "MetricGroup": "Fed;FetchBW;Frontend_SMT", + "MetricName": "Instruction_Fetch_BW_SMT" + }, + { "BriefDescription": "Instructions Per Cycle (per Logical Processor= )", "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD", - "MetricGroup": "Summary", + "MetricGroup": "Ret;Summary", "MetricName": "IPC" }, { "BriefDescription": "Uops Per Instruction", "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY", - "MetricGroup": "Pipeline;Retire", + "MetricGroup": "Pipeline;Ret;Retire", "MetricName": "UPI" }, { "BriefDescription": "Instruction per taken branch", - "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", - "MetricGroup": "Branches;FetchBW;PGO", - "MetricName": "IpTB" + "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / BR_INST_RETIRED.NEAR_TA= KEN", + "MetricGroup": "Branches;Fed;FetchBW", + "MetricName": "UpTB" }, { "BriefDescription": "Cycles Per Instruction (per Logical Processor= )", "MetricExpr": "1 / (INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD)", - "MetricGroup": "Pipeline", + "MetricGroup": "Pipeline;Mem", "MetricName": "CPI" }, { @@ -30,39 +171,84 @@ "MetricName": "CLKS" }, { - "BriefDescription": "Instructions Per Cycle (per physical core)", + "BriefDescription": "Total issue-pipeline slots (per-Physical Core= till ICL; per-Logical Processor ICL onward)", + "MetricExpr": "4 * CPU_CLK_UNHALTED.THREAD", + "MetricGroup": "TmaL1", + "MetricName": "SLOTS" + }, + { + "BriefDescription": "Total issue-pipeline slots (per-Physical Core= till ICL; per-Logical Processor ICL onward)", + "MetricExpr": "4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_C= LK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )", + "MetricGroup": "TmaL1_SMT", + "MetricName": "SLOTS_SMT" + }, + { + "BriefDescription": "The ratio of Executed- by Issued-Uops", + "MetricExpr": "UOPS_EXECUTED.THREAD / UOPS_ISSUED.ANY", + "MetricGroup": "Cor;Pipeline", + "MetricName": "Execute_per_Issue", + "PublicDescription": "The ratio of Executed- by Issued-Uops. Ratio= > 1 suggests high rate of uop micro-fusions. Ratio < 1 suggest high rate o= f \"execute\" at rename stage." + }, + { + "BriefDescription": "Instructions Per Cycle across hyper-threads (= per physical core)", "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD", - "MetricGroup": "SMT;TmaL1", + "MetricGroup": "Ret;SMT;TmaL1", "MetricName": "CoreIPC" }, { - "BriefDescription": "Instructions Per Cycle (per physical core)", + "BriefDescription": "Instructions Per Cycle across hyper-threads (= per physical core)", "MetricExpr": "INST_RETIRED.ANY / ( ( CPU_CLK_UNHALTED.THREAD / 2 = ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) = )", - "MetricGroup": "SMT;TmaL1", + "MetricGroup": "Ret;SMT;TmaL1_SMT", "MetricName": "CoreIPC_SMT" }, { "BriefDescription": "Floating Point Operations Per Cycle", "MetricExpr": "( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_AR= ITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DO= UBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIR= ED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + = FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512= B_PACKED_SINGLE ) / CPU_CLK_UNHALTED.THREAD", - "MetricGroup": "Flops", + "MetricGroup": "Ret;Flops", "MetricName": "FLOPc" }, { "BriefDescription": "Floating Point Operations Per Cycle", "MetricExpr": "( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_AR= ITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DO= UBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIR= ED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + = FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.512= B_PACKED_SINGLE ) / ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHAL= TED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )", - "MetricGroup": "Flops_SMT", + "MetricGroup": "Ret;Flops_SMT", "MetricName": "FLOPc_SMT" }, { + "BriefDescription": "Actual per-core usage of the Floating Point e= xecution units (regardless of the vector width)", + "MetricExpr": "( (FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_I= NST_RETIRED.SCALAR_DOUBLE) + (FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + FP= _ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_= DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.5= 12B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE) ) / ( 2 * CPU= _CLK_UNHALTED.THREAD )", + "MetricGroup": "Cor;Flops;HPC", + "MetricName": "FP_Arith_Utilization", + "PublicDescription": "Actual per-core usage of the Floating Point = execution units (regardless of the vector width). Values > 1 are possible d= ue to Fused-Multiply Add (FMA) counting." + }, + { + "BriefDescription": "Actual per-core usage of the Floating Point e= xecution units (regardless of the vector width). SMT version; use when SMT = is enabled and measuring per logical CPU.", + "MetricExpr": "( (FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_ARITH_I= NST_RETIRED.SCALAR_DOUBLE) + (FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + FP= _ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.256B_PACKED_= DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.5= 12B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE) ) / ( 2 * ( (= CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE /= CPU_CLK_UNHALTED.REF_XCLK ) ) )", + "MetricGroup": "Cor;Flops;HPC_SMT", + "MetricName": "FP_Arith_Utilization_SMT", + "PublicDescription": "Actual per-core usage of the Floating Point = execution units (regardless of the vector width). Values > 1 are possible d= ue to Fused-Multiply Add (FMA) counting. SMT version; use when SMT is enabl= ed and measuring per logical CPU." + }, + { "BriefDescription": "Instruction-Level-Parallelism (average number= of uops executed when there is at least 1 uop executed)", "MetricExpr": "UOPS_EXECUTED.THREAD / (( UOPS_EXECUTED.CORE_CYCLES= _GE_1 / 2 ) if #SMT_on else UOPS_EXECUTED.CORE_CYCLES_GE_1)", - "MetricGroup": "Pipeline;PortsUtil", + "MetricGroup": "Backend;Cor;Pipeline;PortsUtil", "MetricName": "ILP" }, { + "BriefDescription": "Branch Misprediction Cost: Fraction of TMA sl= ots wasted per non-speculative branch misprediction (retired JEClear)", + "MetricExpr": " ( ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIR= ED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( UOPS_ISSUED.ANY - UOPS_RETIR= ED.RETIRE_SLOTS + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * CPU_CLK_UNHALTED.TH= READ))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * CPU_C= LK_UNHALTED.THREAD)) * ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIRED.A= LL_BRANCHES + MACHINE_CLEARS.COUNT )) * INT_MISC.CLEAR_RESTEER_CYCLES / CPU= _CLK_UNHALTED.THREAD) / #(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CO= RE / (4 * CPU_CLK_UNHALTED.THREAD)) ) * (4 * CPU_CLK_UNHALTED.THREAD) / BR_= MISP_RETIRED.ALL_BRANCHES", + "MetricGroup": "Bad;BrMispredicts", + "MetricName": "Branch_Misprediction_Cost" + }, + { + "BriefDescription": "Branch Misprediction Cost: Fraction of TMA sl= ots wasted per non-speculative branch misprediction (retired JEClear)", + "MetricExpr": " ( ((BR_MISP_RETIRED.ALL_BRANCHES / ( BR_MISP_RETIR= ED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( UOPS_ISSUED.ANY - UOPS_RETIR= ED.RETIRE_SLOTS + 4 * ( INT_MISC.RECOVERY_CYCLES_ANY / 2 ) ) / (4 * ( ( CPU= _CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU= _CLK_UNHALTED.REF_XCLK ) )))) + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_D= ELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED= .ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * ((BR_MISP_RETIRED.AL= L_BRANCHES / ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * INT= _MISC.CLEAR_RESTEER_CYCLES / CPU_CLK_UNHALTED.THREAD) / #(4 * IDQ_UOPS_NOT_= DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 )= * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )= )) ) * (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_= THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) / BR_MISP_RETIRED.ALL_BRANCH= ES", + "MetricGroup": "Bad;BrMispredicts_SMT", + "MetricName": "Branch_Misprediction_Cost_SMT" + }, + { "BriefDescription": "Number of Instructions per non-speculative Br= anch Misprediction (JEClear)", "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", - "MetricGroup": "BrMispredicts", + "MetricGroup": "Bad;BadSpec;BrMispredicts", "MetricName": "IpMispredict" }, { @@ -86,122 +272,249 @@ { "BriefDescription": "Instructions per Branch (lower number means h= igher occurrence rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES", - "MetricGroup": "Branches;InsType", + "MetricGroup": "Branches;Fed;InsType", "MetricName": "IpBranch" }, { "BriefDescription": "Instructions per (near) call (lower number me= ans higher occurrence rate)", "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL", - "MetricGroup": "Branches", + "MetricGroup": "Branches;Fed;PGO", "MetricName": "IpCall" }, { + "BriefDescription": "Instruction per taken branch", + "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", + "MetricGroup": "Branches;Fed;FetchBW;Frontend;PGO", + "MetricName": "IpTB" + }, + { "BriefDescription": "Branch instructions per taken branch. ", "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR= _TAKEN", - "MetricGroup": "Branches;PGO", + "MetricGroup": "Branches;Fed;PGO", "MetricName": "BpTkBranch" }, { "BriefDescription": "Instructions per Floating Point (FP) Operatio= n (lower number means higher occurrence rate)", "MetricExpr": "INST_RETIRED.ANY / ( 1 * ( FP_ARITH_INST_RETIRED.SC= ALAR_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RET= IRED.128B_PACKED_DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + = FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.25= 6B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARI= TH_INST_RETIRED.512B_PACKED_SINGLE )", - "MetricGroup": "Flops;FpArith;InsType", + "MetricGroup": "Flops;InsType", "MetricName": "IpFLOP" }, { + "BriefDescription": "Instructions per FP Arithmetic instruction (l= ower number means higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / ( (FP_ARITH_INST_RETIRED.SCALAR_= SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE) + (FP_ARITH_INST_RETIRED.128B= _PACKED_DOUBLE + FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_R= ETIRED.256B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_A= RITH_INST_RETIRED.512B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.512B_PACKED_SI= NGLE) )", + "MetricGroup": "Flops;InsType", + "MetricName": "IpArith", + "PublicDescription": "Instructions per FP Arithmetic instruction (= lower number means higher occurrence rate). May undercount due to FMA doubl= e counting. Approximated prior to BDW." + }, + { + "BriefDescription": "Instructions per FP Arithmetic Scalar Single-= Precision instruction (lower number means higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / FP_ARITH_INST_RETIRED.SCALAR_SIN= GLE", + "MetricGroup": "Flops;FpScalar;InsType", + "MetricName": "IpArith_Scalar_SP", + "PublicDescription": "Instructions per FP Arithmetic Scalar Single= -Precision instruction (lower number means higher occurrence rate). May und= ercount due to FMA double counting." + }, + { + "BriefDescription": "Instructions per FP Arithmetic Scalar Double-= Precision instruction (lower number means higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / FP_ARITH_INST_RETIRED.SCALAR_DOU= BLE", + "MetricGroup": "Flops;FpScalar;InsType", + "MetricName": "IpArith_Scalar_DP", + "PublicDescription": "Instructions per FP Arithmetic Scalar Double= -Precision instruction (lower number means higher occurrence rate). May und= ercount due to FMA double counting." + }, + { + "BriefDescription": "Instructions per FP Arithmetic AVX/SSE 128-bi= t instruction (lower number means higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / ( FP_ARITH_INST_RETIRED.128B_PAC= KED_DOUBLE + FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE )", + "MetricGroup": "Flops;FpVector;InsType", + "MetricName": "IpArith_AVX128", + "PublicDescription": "Instructions per FP Arithmetic AVX/SSE 128-b= it instruction (lower number means higher occurrence rate). May undercount = due to FMA double counting." + }, + { + "BriefDescription": "Instructions per FP Arithmetic AVX* 256-bit i= nstruction (lower number means higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / ( FP_ARITH_INST_RETIRED.256B_PAC= KED_DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )", + "MetricGroup": "Flops;FpVector;InsType", + "MetricName": "IpArith_AVX256", + "PublicDescription": "Instructions per FP Arithmetic AVX* 256-bit = instruction (lower number means higher occurrence rate). May undercount due= to FMA double counting." + }, + { + "BriefDescription": "Instructions per FP Arithmetic AVX 512-bit in= struction (lower number means higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / ( FP_ARITH_INST_RETIRED.512B_PAC= KED_DOUBLE + FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )", + "MetricGroup": "Flops;FpVector;InsType", + "MetricName": "IpArith_AVX512", + "PublicDescription": "Instructions per FP Arithmetic AVX 512-bit i= nstruction (lower number means higher occurrence rate). May undercount due = to FMA double counting." + }, + { "BriefDescription": "Total number of retired Instructions, Sample = with: INST_RETIRED.PREC_DIST", "MetricExpr": "INST_RETIRED.ANY", "MetricGroup": "Summary;TmaL1", "MetricName": "Instructions" }, { + "BriefDescription": "Average number of Uops issued by front-end wh= en it issued something", + "MetricExpr": "UOPS_ISSUED.ANY / cpu@UOPS_ISSUED.ANY\\,cmask\\=3D1= @", + "MetricGroup": "Fed;FetchBW", + "MetricName": "Fetch_UpC" + }, + { "BriefDescription": "Fraction of Uops delivered by the DSB (aka De= coded ICache; or Uop Cache)", "MetricExpr": "IDQ.DSB_UOPS / (IDQ.DSB_UOPS + IDQ.MITE_UOPS + IDQ.= MS_UOPS)", - "MetricGroup": "DSB;FetchBW", + "MetricGroup": "DSB;Fed;FetchBW", "MetricName": "DSB_Coverage" }, { - "BriefDescription": "Actual Average Latency for L1 data-cache miss= demand loads (in core cycles)", + "BriefDescription": "Total penalty related to DSB (uop cache) miss= es - subset/see of/the Instruction_Fetch_BW Bottleneck.", + "MetricExpr": "(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.COR= E / (4 * CPU_CLK_UNHALTED.THREAD)) * (DSB2MITE_SWITCHES.PENALTY_CYCLES / CP= U_CLK_UNHALTED.THREAD) / #(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.C= ORE / (4 * CPU_CLK_UNHALTED.THREAD)) + ((IDQ_UOPS_NOT_DELIVERED.CORE / (4 *= CPU_CLK_UNHALTED.THREAD)) - (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELI= V.CORE / (4 * CPU_CLK_UNHALTED.THREAD))) * (( IDQ.ALL_MITE_CYCLES_ANY_UOPS = - IDQ.ALL_MITE_CYCLES_4_UOPS ) / CPU_CLK_UNHALTED.THREAD / 2) / #((IDQ_UOPS= _NOT_DELIVERED.CORE / (4 * CPU_CLK_UNHALTED.THREAD)) - (4 * IDQ_UOPS_NOT_DE= LIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * CPU_CLK_UNHALTED.THREAD)))", + "MetricGroup": "DSBmiss;Fed", + "MetricName": "DSB_Misses_Cost" + }, + { + "BriefDescription": "Total penalty related to DSB (uop cache) miss= es - subset/see of/the Instruction_Fetch_BW Bottleneck.", + "MetricExpr": "(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.COR= E / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THR= EAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) * (DSB2MITE_SWITCHES.PENALTY_C= YCLES / CPU_CLK_UNHALTED.THREAD) / #(4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UO= PS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHA= LTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) + ((IDQ_UOPS_NOT_D= ELIVERED.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHA= LTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - (4 * IDQ_UOPS_NO= T_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2= ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK )= )))) * (( IDQ.ALL_MITE_CYCLES_ANY_UOPS - IDQ.ALL_MITE_CYCLES_4_UOPS ) / ( = ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE = / CPU_CLK_UNHALTED.REF_XCLK ) ) / 2) / #((IDQ_UOPS_NOT_DELIVERED.CORE / (4 = * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACT= IVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) - (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_= 0_UOPS_DELIV.CORE / (4 * ( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_= UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))))", + "MetricGroup": "DSBmiss;Fed_SMT", + "MetricName": "DSB_Misses_Cost_SMT" + }, + { + "BriefDescription": "Number of Instructions per non-speculative DS= B miss", + "MetricExpr": "INST_RETIRED.ANY / FRONTEND_RETIRED.ANY_DSB_MISS", + "MetricGroup": "DSBmiss;Fed", + "MetricName": "IpDSB_Miss_Ret" + }, + { + "BriefDescription": "Fraction of branches that are non-taken condi= tionals", + "MetricExpr": "BR_INST_RETIRED.NOT_TAKEN / BR_INST_RETIRED.ALL_BRA= NCHES", + "MetricGroup": "Bad;Branches;CodeGen;PGO", + "MetricName": "Cond_NT" + }, + { + "BriefDescription": "Fraction of branches that are taken condition= als", + "MetricExpr": "( BR_INST_RETIRED.CONDITIONAL - BR_INST_RETIRED.NOT= _TAKEN ) / BR_INST_RETIRED.ALL_BRANCHES", + "MetricGroup": "Bad;Branches;CodeGen;PGO", + "MetricName": "Cond_TK" + }, + { + "BriefDescription": "Fraction of branches that are CALL or RET", + "MetricExpr": "( BR_INST_RETIRED.NEAR_CALL + BR_INST_RETIRED.NEAR_= RETURN ) / BR_INST_RETIRED.ALL_BRANCHES", + "MetricGroup": "Bad;Branches", + "MetricName": "CallRet" + }, + { + "BriefDescription": "Fraction of branches that are unconditional (= direct or indirect) jumps", + "MetricExpr": "(BR_INST_RETIRED.NEAR_TAKEN - ( BR_INST_RETIRED.CON= DITIONAL - BR_INST_RETIRED.NOT_TAKEN ) - 2 * BR_INST_RETIRED.NEAR_CALL) / B= R_INST_RETIRED.ALL_BRANCHES", + "MetricGroup": "Bad;Branches", + "MetricName": "Jump" + }, + { + "BriefDescription": "Actual Average Latency for L1 data-cache miss= demand load instructions (in core cycles)", "MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_RETIRED.L1_MISS = + MEM_LOAD_RETIRED.FB_HIT )", - "MetricGroup": "MemoryBound;MemoryLat", - "MetricName": "Load_Miss_Real_Latency" + "MetricGroup": "Mem;MemoryBound;MemoryLat", + "MetricName": "Load_Miss_Real_Latency", + "PublicDescription": "Actual Average Latency for L1 data-cache mis= s demand load instructions (in core cycles). Latency may be overestimated f= or multi-load instructions - e.g. repeat strings." }, { "BriefDescription": "Memory-Level-Parallelism (average number of L= 1 miss demand load when there is at least one such miss. Per-Logical Proces= sor)", "MetricExpr": "L1D_PEND_MISS.PENDING / L1D_PEND_MISS.PENDING_CYCLE= S", - "MetricGroup": "MemoryBound;MemoryBW", + "MetricGroup": "Mem;MemoryBound;MemoryBW", "MetricName": "MLP" }, { - "BriefDescription": "Utilization of the core's Page Walker(s) serv= ing STLB misses triggered by instruction/Load/Store accesses", - "MetricConstraint": "NO_NMI_WATCHDOG", - "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_= PENDING + DTLB_STORE_MISSES.WALK_PENDING + EPT.WALK_PENDING ) / ( 2 * CORE_= CLKS )", - "MetricGroup": "MemoryTLB", - "MetricName": "Page_Walks_Utilization" - }, - { "BriefDescription": "Average data fill bandwidth to the L1 data ca= che [GB / sec]", "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / duration_time", - "MetricGroup": "MemoryBW", + "MetricGroup": "Mem;MemoryBW", "MetricName": "L1D_Cache_Fill_BW" }, { "BriefDescription": "Average data fill bandwidth to the L2 cache [= GB / sec]", "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / duration_time", - "MetricGroup": "MemoryBW", + "MetricGroup": "Mem;MemoryBW", "MetricName": "L2_Cache_Fill_BW" }, { "BriefDescription": "Average per-core data fill bandwidth to the L= 3 cache [GB / sec]", "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / duration= _time", - "MetricGroup": "MemoryBW", + "MetricGroup": "Mem;MemoryBW", "MetricName": "L3_Cache_Fill_BW" }, { "BriefDescription": "Average per-core data access bandwidth to the= L3 cache [GB / sec]", "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1000000000 / d= uration_time", - "MetricGroup": "MemoryBW;Offcore", + "MetricGroup": "Mem;MemoryBW;Offcore", "MetricName": "L3_Cache_Access_BW" }, { "BriefDescription": "L1 cache true misses per kilo instruction for= retired demand loads", "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY", - "MetricGroup": "CacheMisses", + "MetricGroup": "Mem;CacheMisses", "MetricName": "L1MPKI" }, { + "BriefDescription": "L1 cache true misses per kilo instruction for= all demand loads (including speculative)", + "MetricExpr": "1000 * L2_RQSTS.ALL_DEMAND_DATA_RD / INST_RETIRED.A= NY", + "MetricGroup": "Mem;CacheMisses", + "MetricName": "L1MPKI_Load" + }, + { "BriefDescription": "L2 cache true misses per kilo instruction for= retired demand loads", "MetricExpr": "1000 * MEM_LOAD_RETIRED.L2_MISS / INST_RETIRED.ANY", - "MetricGroup": "CacheMisses", + "MetricGroup": "Mem;Backend;CacheMisses", "MetricName": "L2MPKI" }, { "BriefDescription": "L2 cache misses per kilo instruction for all = request types (including speculative)", "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY", - "MetricGroup": "CacheMisses;Offcore", + "MetricGroup": "Mem;CacheMisses;Offcore", "MetricName": "L2MPKI_All" }, { + "BriefDescription": "L2 cache misses per kilo instruction for all = demand loads (including speculative)", + "MetricExpr": "1000 * L2_RQSTS.DEMAND_DATA_RD_MISS / INST_RETIRED.= ANY", + "MetricGroup": "Mem;CacheMisses", + "MetricName": "L2MPKI_Load" + }, + { "BriefDescription": "L2 cache hits per kilo instruction for all re= quest types (including speculative)", "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) / IN= ST_RETIRED.ANY", - "MetricGroup": "CacheMisses", + "MetricGroup": "Mem;CacheMisses", "MetricName": "L2HPKI_All" }, { + "BriefDescription": "L2 cache hits per kilo instruction for all de= mand loads (including speculative)", + "MetricExpr": "1000 * L2_RQSTS.DEMAND_DATA_RD_HIT / INST_RETIRED.A= NY", + "MetricGroup": "Mem;CacheMisses", + "MetricName": "L2HPKI_Load" + }, + { "BriefDescription": "L3 cache true misses per kilo instruction for= retired demand loads", "MetricExpr": "1000 * MEM_LOAD_RETIRED.L3_MISS / INST_RETIRED.ANY", - "MetricGroup": "CacheMisses", + "MetricGroup": "Mem;CacheMisses", "MetricName": "L3MPKI" }, { + "BriefDescription": "Fill Buffer (FB) true hits per kilo instructi= ons for retired demand loads", + "MetricExpr": "1000 * MEM_LOAD_RETIRED.FB_HIT / INST_RETIRED.ANY", + "MetricGroup": "Mem;CacheMisses", + "MetricName": "FB_HPKI" + }, + { + "BriefDescription": "Utilization of the core's Page Walker(s) serv= ing STLB misses triggered by instruction/Load/Store accesses", + "MetricConstraint": "NO_NMI_WATCHDOG", + "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_= PENDING + DTLB_STORE_MISSES.WALK_PENDING + EPT.WALK_PENDING ) / ( 2 * CPU_C= LK_UNHALTED.THREAD )", + "MetricGroup": "Mem;MemoryTLB", + "MetricName": "Page_Walks_Utilization" + }, + { + "BriefDescription": "Utilization of the core's Page Walker(s) serv= ing STLB misses triggered by instruction/Load/Store accesses", + "MetricExpr": "( ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_= PENDING + DTLB_STORE_MISSES.WALK_PENDING + EPT.WALK_PENDING ) / ( 2 * ( ( C= PU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / C= PU_CLK_UNHALTED.REF_XCLK ) ) )", + "MetricGroup": "Mem;MemoryTLB_SMT", + "MetricName": "Page_Walks_Utilization_SMT" + }, + { "BriefDescription": "Rate of silent evictions from the L2 cache pe= r Kilo instruction where the evicted lines are dropped (no writeback to L3 = or memory)", "MetricExpr": "1000 * L2_LINES_OUT.SILENT / INST_RETIRED.ANY", - "MetricGroup": "L2Evicts;Server", + "MetricGroup": "L2Evicts;Mem;Server", "MetricName": "L2_Evictions_Silent_PKI" }, { "BriefDescription": "Rate of non silent evictions from the L2 cach= e per Kilo instruction", "MetricExpr": "1000 * L2_LINES_OUT.NON_SILENT / INST_RETIRED.ANY", - "MetricGroup": "L2Evicts;Server", + "MetricGroup": "L2Evicts;Mem;Server", "MetricName": "L2_Evictions_NonSilent_PKI" }, { @@ -219,7 +532,7 @@ { "BriefDescription": "Giga Floating Point Operations Per Second", "MetricExpr": "( ( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + FP_= ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_= DOUBLE + 4 * ( FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + FP_ARITH_INST_RET= IRED.256B_PACKED_DOUBLE ) + 8 * ( FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE = + FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * FP_ARITH_INST_RETIRED.5= 12B_PACKED_SINGLE ) / 1000000000 ) / duration_time", - "MetricGroup": "Flops;HPC", + "MetricGroup": "Cor;Flops;HPC", "MetricName": "GFLOPs" }, { @@ -229,6 +542,48 @@ "MetricName": "Turbo_Utilization" }, { + "BriefDescription": "Fraction of Core cycles where the core was ru= nning with power-delivery for baseline license level 0", + "MetricExpr": "CORE_POWER.LVL0_TURBO_LICENSE / CPU_CLK_UNHALTED.TH= READ", + "MetricGroup": "Power", + "MetricName": "Power_License0_Utilization", + "PublicDescription": "Fraction of Core cycles where the core was r= unning with power-delivery for baseline license level 0. This includes non= -AVX codes, SSE, AVX 128-bit, and low-current AVX 256-bit codes." + }, + { + "BriefDescription": "Fraction of Core cycles where the core was ru= nning with power-delivery for baseline license level 0. SMT version; use wh= en SMT is enabled and measuring per logical CPU.", + "MetricExpr": "CORE_POWER.LVL0_TURBO_LICENSE / 2 / ( ( CPU_CLK_UNH= ALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNH= ALTED.REF_XCLK ) )", + "MetricGroup": "Power_SMT", + "MetricName": "Power_License0_Utilization_SMT", + "PublicDescription": "Fraction of Core cycles where the core was r= unning with power-delivery for baseline license level 0. This includes non= -AVX codes, SSE, AVX 128-bit, and low-current AVX 256-bit codes. SMT versio= n; use when SMT is enabled and measuring per logical CPU." + }, + { + "BriefDescription": "Fraction of Core cycles where the core was ru= nning with power-delivery for license level 1", + "MetricExpr": "CORE_POWER.LVL1_TURBO_LICENSE / CPU_CLK_UNHALTED.TH= READ", + "MetricGroup": "Power", + "MetricName": "Power_License1_Utilization", + "PublicDescription": "Fraction of Core cycles where the core was r= unning with power-delivery for license level 1. This includes high current= AVX 256-bit instructions as well as low current AVX 512-bit instructions." + }, + { + "BriefDescription": "Fraction of Core cycles where the core was ru= nning with power-delivery for license level 1. SMT version; use when SMT is= enabled and measuring per logical CPU.", + "MetricExpr": "CORE_POWER.LVL1_TURBO_LICENSE / 2 / ( ( CPU_CLK_UNH= ALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNH= ALTED.REF_XCLK ) )", + "MetricGroup": "Power_SMT", + "MetricName": "Power_License1_Utilization_SMT", + "PublicDescription": "Fraction of Core cycles where the core was r= unning with power-delivery for license level 1. This includes high current= AVX 256-bit instructions as well as low current AVX 512-bit instructions. = SMT version; use when SMT is enabled and measuring per logical CPU." + }, + { + "BriefDescription": "Fraction of Core cycles where the core was ru= nning with power-delivery for license level 2 (introduced in SKX)", + "MetricExpr": "CORE_POWER.LVL2_TURBO_LICENSE / CPU_CLK_UNHALTED.TH= READ", + "MetricGroup": "Power", + "MetricName": "Power_License2_Utilization", + "PublicDescription": "Fraction of Core cycles where the core was r= unning with power-delivery for license level 2 (introduced in SKX). This i= ncludes high current AVX 512-bit instructions." + }, + { + "BriefDescription": "Fraction of Core cycles where the core was ru= nning with power-delivery for license level 2 (introduced in SKX). SMT vers= ion; use when SMT is enabled and measuring per logical CPU.", + "MetricExpr": "CORE_POWER.LVL2_TURBO_LICENSE / 2 / ( ( CPU_CLK_UNH= ALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNH= ALTED.REF_XCLK ) )", + "MetricGroup": "Power_SMT", + "MetricName": "Power_License2_Utilization_SMT", + "PublicDescription": "Fraction of Core cycles where the core was r= unning with power-delivery for license level 2 (introduced in SKX). This i= ncludes high current AVX 512-bit instructions. SMT version; use when SMT is= enabled and measuring per logical CPU." + }, + { "BriefDescription": "Fraction of cycles where both hardware Logica= l Processors were active", "MetricExpr": "1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / ( CPU_CLK_= UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0", "MetricGroup": "SMT", @@ -241,33 +596,45 @@ "MetricName": "Kernel_Utilization" }, { + "BriefDescription": "Cycles Per Instruction for the Operating Syst= em (OS) Kernel mode", + "MetricExpr": "CPU_CLK_UNHALTED.THREAD_P:k / INST_RETIRED.ANY_P:k", + "MetricGroup": "OS", + "MetricName": "Kernel_CPI" + }, + { "BriefDescription": "Average external Memory Bandwidth Use for rea= ds and writes [GB / sec]", "MetricExpr": "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@ca= s_count_write@ ) / 1000000000 ) / duration_time", - "MetricGroup": "HPC;MemoryBW;SoC", + "MetricGroup": "HPC;Mem;MemoryBW;SoC", "MetricName": "DRAM_BW_Use" }, { "BriefDescription": "Average latency of data read request to exter= nal memory (in nanoseconds). Accounts for demand loads and L1/L2 prefetches= ", "MetricExpr": "1000000000 * ( cha@event\\=3D0x36\\,umask\\=3D0x21\= \,config\\=3D0x40433@ / cha@event\\=3D0x35\\,umask\\=3D0x21\\,config\\=3D0x= 40433@ ) / ( cha_0@event\\=3D0x0@ / duration_time )", - "MetricGroup": "MemoryLat;SoC", + "MetricGroup": "Mem;MemoryLat;SoC", "MetricName": "MEM_Read_Latency" }, { "BriefDescription": "Average number of parallel data read requests= to external memory. Accounts for demand loads and L1/L2 prefetches", "MetricExpr": "cha@event\\=3D0x36\\,umask\\=3D0x21\\,config\\=3D0x= 40433@ / cha@event\\=3D0x36\\,umask\\=3D0x21\\,config\\=3D0x40433\\,thresh\= \=3D1@", - "MetricGroup": "MemoryBW;SoC", + "MetricGroup": "Mem;MemoryBW;SoC", "MetricName": "MEM_Parallel_Reads" }, { + "BriefDescription": "Average latency of data read request to exter= nal DRAM memory [in nanoseconds]. Accounts for demand loads and L1/L2 data-= read prefetches", + "MetricExpr": "1000000000 * ( UNC_M_RPQ_OCCUPANCY / UNC_M_RPQ_INSE= RTS ) / imc_0@event\\=3D0x0@", + "MetricGroup": "Mem;MemoryLat;SoC;Server", + "MetricName": "MEM_DRAM_Read_Latency" + }, + { "BriefDescription": "Average IO (network or disk) Bandwidth Use fo= r Writes [GB / sec]", "MetricExpr": "( UNC_IIO_DATA_REQ_OF_CPU.MEM_READ.PART0 + UNC_IIO_= DATA_REQ_OF_CPU.MEM_READ.PART1 + UNC_IIO_DATA_REQ_OF_CPU.MEM_READ.PART2 + U= NC_IIO_DATA_REQ_OF_CPU.MEM_READ.PART3 ) * 4 / 1000000000 / duration_time", - "MetricGroup": "IoBW;SoC;Server", + "MetricGroup": "IoBW;Mem;SoC;Server", "MetricName": "IO_Write_BW" }, { "BriefDescription": "Average IO (network or disk) Bandwidth Use fo= r Reads [GB / sec]", "MetricExpr": "( UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART0 + UNC_IIO= _DATA_REQ_OF_CPU.MEM_WRITE.PART1 + UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART2 = + UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART3 ) * 4 / 1000000000 / duration_tim= e", - "MetricGroup": "IoBW;SoC;Server", + "MetricGroup": "IoBW;Mem;SoC;Server", "MetricName": "IO_Read_BW" }, { --- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-other.json +++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-other.json @@ -538,6 +538,18 @@ "Unit": "IIO" }, { + "BriefDescription": "PCIe Completion Buffer Inserts of completions= with data: Part 0-3", + "Counter": "0,1,2,3", + "EventCode": "0xC2", + "EventName": "UNC_IIO_COMP_BUF_INSERTS.CMPD.ALL_PARTS", + "FCMask": "0x4", + "PerPkg": "1", + "PortMask": "0x0f", + "PublicDescription": "PCIe Completion Buffer Inserts of completion= s with data: Part 0-3", + "UMask": "0x03", + "Unit": "IIO" + }, + { "BriefDescription": "PCIe Completion Buffer Inserts of completions= with data: Part 0", "Counter": "0,1,2,3", "EventCode": "0xC2", @@ -586,6 +598,17 @@ "Unit": "IIO" }, { + "BriefDescription": "PCIe Completion Buffer occupancy of completio= ns with data: Part 0-3", + "Counter": "2,3", + "EventCode": "0xD5", + "EventName": "UNC_IIO_COMP_BUF_OCCUPANCY.CMPD.ALL_PARTS", + "FCMask": "0x04", + "PerPkg": "1", + "PublicDescription": "PCIe Completion Buffer occupancy of completi= ons with data: Part 0-3", + "UMask": "0x0f", + "Unit": "IIO" + }, + { "BriefDescription": "PCIe Completion Buffer occupancy of completio= ns with data: Part 0", "Counter": "2,3", "EventCode": "0xD5",