From: Andi Kleen
To: acme@kernel.org
Cc: jolsa@kernel.org, linux-kernel@vger.kernel.org, eranian@google.com,
	namhyung@kernel.org, peterz@infradead.org, mingo@kernel.org,
	Andi Kleen
Subject: [PATCH 6/9] x86, perf: Add Top Down events to Intel Core
Date: Fri, 7 Aug 2015 18:06:22 -0700
Message-Id: <1438995985-13631-7-git-send-email-andi@firstfloor.org>
X-Mailer: git-send-email 2.4.3
In-Reply-To: <1438995985-13631-1-git-send-email-andi@firstfloor.org>
References: <1438995985-13631-1-git-send-email-andi@firstfloor.org>

From: Andi Kleen

Add declarations for the events needed for TopDown to the Intel big
core CPUs, starting with Sandy Bridge. We need to report different
values depending on whether HyperThreading is on or off. The only
thing this patch does is export some events in sysfs.

TopDown level 1 uses a set of abstracted metrics which are generic
to out-of-order CPU cores (although some CPUs may not implement all
of them):

topdown-total-slots       Available slots in the pipeline
topdown-slots-issued      Slots issued into the pipeline
topdown-slots-retired     Slots successfully retired
topdown-fetch-bubbles     Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery from
                          misspeculation

These events then allow computing four useful higher-level metrics:
FrontendBound, BackendBound, Retiring and BadSpeculation. The
formulas to compute them are generic; they only change based on the
availability of the abstracted input values. (A sketch of the level 1
formulas follows after the change log.) The kernel declares the
events supported by the current CPU and perf stat then computes the
formulas based on the available events.

Some events need a divisor. To handle this I redefined ".scale"
slightly to let a negative value mean "divide by".

For HyperThreading the "any" bit is needed to get accurate values
when both threads are executing. This implies that the events can
only be collected as root or with perf_event_paranoid=-1 for now.

HyperThreading also requires averaging the counts from both threads
of a core (the CPU cannot measure them independently). In perf stat
this is done by using per-core mode and then forcing a divisor of
two to get the average. The new .agg-per-core attribute is added to
the events, which then forces perf stat to enable --per-core. When
HyperThreading is disabled the attribute has the value 0.

The basic scheme is based on the following paper:
  Yasin, "A Top Down Method for Performance Analysis and Counters
  Architecture", ISPASS14 (PDF available via Google)
with some extensions to handle HyperThreading.

Signed-off-by: Andi Kleen
---
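Note for reviewers: the generic level 1 formulas that perf stat
derives from these events look roughly like the sketch below. This
is illustrative C following Yasin's paper, not the perf stat
implementation; the struct and helper names are made up, and the
sketch assumes the five counts were already read, scaled, and (with
HT on) averaged per core.

/*
 * Sketch of the generic TopDown level 1 formulas. Illustration
 * only; these names are not part of this patch.
 */
struct td_counts {
	double total_slots;		/* topdown-total-slots */
	double slots_issued;		/* topdown-slots-issued */
	double slots_retired;		/* topdown-slots-retired */
	double fetch_bubbles;		/* topdown-fetch-bubbles */
	double recovery_bubbles;	/* topdown-recovery-bubbles */
};

static double td_frontend_bound(const struct td_counts *c)
{
	/* Slots lost to frontend starvation. */
	return c->fetch_bubbles / c->total_slots;
}

static double td_bad_speculation(const struct td_counts *c)
{
	/* Issued but not retired slots, plus recovery gaps. */
	return (c->slots_issued - c->slots_retired +
		c->recovery_bubbles) / c->total_slots;
}

static double td_retiring(const struct td_counts *c)
{
	/* Slots doing useful retired work. */
	return c->slots_retired / c->total_slots;
}

static double td_backend_bound(const struct td_counts *c)
{
	/* Whatever fraction of slots remains is backend bound. */
	return 1.0 - td_frontend_bound(c) - td_bad_speculation(c) -
		td_retiring(c);
}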
 arch/x86/kernel/cpu/perf_event_intel.c | 82 ++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index a478e3c..65b58cb 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -217,9 +217,70 @@ struct attribute *nhm_events_attrs[] = {
 	NULL,
 };
 
+/*
+ * TopDown events for Core.
+ *
+ * With HyperThreading on, TopDown metrics are averaged between the
+ * threads of a core: (count_thread0 + count_thread1) / 2. The 2 is
+ * expressed as a scale parameter. We also tell perf to aggregate per
+ * core by setting the .agg-per-core attribute for the alias to 1.
+ *
+ * Some events need to be multiplied by the pipeline width (4), which
+ * is expressed as a negative scale. With HT the factor of 4 combines
+ * with the divisor of 2 for the core average, so we use -2.
+ */
+
+EVENT_ATTR_STR_HT(topdown-total-slots, td_total_slots,
+	"event=0x3c,umask=0x0", /* cpu_clk_unhalted.thread */
+	"event=0x3c,umask=0x0,any=1"); /* cpu_clk_unhalted.thread_any */
+EVENT_ATTR_STR_HT(topdown-total-slots.scale, td_total_slots_scale,
+	"-4", "-2");
+EVENT_ATTR_STR_HT(topdown-total-slots.agg-per-core, td_total_slots_pc,
+	"0", "1");
+EVENT_ATTR_STR(topdown-slots-issued, td_slots_issued,
+	"event=0xe,umask=0x1"); /* uops_issued.any */
+EVENT_ATTR_STR_HT(topdown-slots-issued.agg-per-core, td_slots_issued_pc,
+	"0", "1");
+EVENT_ATTR_STR_HT(topdown-slots-issued.scale, td_slots_issued_scale,
+	"0", "2");
+EVENT_ATTR_STR(topdown-slots-retired, td_slots_retired,
+	"event=0xc2,umask=0x2"); /* uops_retired.retire_slots */
+EVENT_ATTR_STR_HT(topdown-slots-retired.agg-per-core, td_slots_retired_pc,
+	"0", "1");
+EVENT_ATTR_STR_HT(topdown-slots-retired.scale, td_slots_retired_scale,
+	"0", "2");
+EVENT_ATTR_STR(topdown-fetch-bubbles, td_fetch_bubbles,
+	"event=0x9c,umask=0x1"); /* idq_uops_not_delivered.core */
+EVENT_ATTR_STR_HT(topdown-fetch-bubbles.agg-per-core, td_fetch_bubbles_pc,
+	"0", "1");
+EVENT_ATTR_STR_HT(topdown-fetch-bubbles.scale, td_fetch_bubbles_scale,
+	"0", "2");
+EVENT_ATTR_STR_HT(topdown-recovery-bubbles, td_recovery_bubbles,
+	"event=0xd,umask=0x3,cmask=1", /* int_misc.recovery_cycles */
+	"event=0xd,umask=0x3,cmask=1,any=1"); /* int_misc.recovery_cycles_any */
+EVENT_ATTR_STR_HT(topdown-recovery-bubbles.scale, td_recovery_bubbles_scale,
+	"-4", "-2");
+EVENT_ATTR_STR_HT(topdown-recovery-bubbles.agg-per-core, td_recovery_bubbles_pc,
+	"0", "1");
+
 struct attribute *snb_events_attrs[] = {
 	EVENT_PTR(mem_ld_snb),
 	EVENT_PTR(mem_st_snb),
+	EVENT_PTR(td_slots_issued),
+	EVENT_PTR(td_slots_issued_scale),
+	EVENT_PTR(td_slots_issued_pc),
+	EVENT_PTR(td_slots_retired),
+	EVENT_PTR(td_slots_retired_scale),
+	EVENT_PTR(td_slots_retired_pc),
+	EVENT_PTR(td_fetch_bubbles),
+	EVENT_PTR(td_fetch_bubbles_scale),
+	EVENT_PTR(td_fetch_bubbles_pc),
+	EVENT_PTR(td_total_slots),
+	EVENT_PTR(td_total_slots_scale),
+	EVENT_PTR(td_total_slots_pc),
+	EVENT_PTR(td_recovery_bubbles),
+	EVENT_PTR(td_recovery_bubbles_scale),
+	EVENT_PTR(td_recovery_bubbles_pc),
 	NULL,
 };
 
@@ -3177,6 +3238,21 @@ static struct attribute *hsw_events_attrs[] = {
 	EVENT_PTR(cycles_ct),
 	EVENT_PTR(mem_ld_hsw),
 	EVENT_PTR(mem_st_hsw),
+	EVENT_PTR(td_slots_issued),
+	EVENT_PTR(td_slots_issued_scale),
+	EVENT_PTR(td_slots_issued_pc),
+	EVENT_PTR(td_slots_retired),
+	EVENT_PTR(td_slots_retired_scale),
+	EVENT_PTR(td_slots_retired_pc),
+	EVENT_PTR(td_fetch_bubbles),
+	EVENT_PTR(td_fetch_bubbles_scale),
+	EVENT_PTR(td_fetch_bubbles_pc),
+	EVENT_PTR(td_total_slots),
+	EVENT_PTR(td_total_slots_scale),
+	EVENT_PTR(td_total_slots_pc),
+	EVENT_PTR(td_recovery_bubbles),
+	EVENT_PTR(td_recovery_bubbles_scale),
+	EVENT_PTR(td_recovery_bubbles_pc),
 	NULL
 };
 
@@ -3494,6 +3570,12 @@ __init int intel_pmu_init(void)
 		memcpy(hw_cache_extra_regs, skl_hw_cache_extra_regs,
 		       sizeof(hw_cache_extra_regs));
 		intel_pmu_lbr_init_skl();
+		/* INT_MISC.RECOVERY_CYCLES has umask 1 in Skylake */
+		event_attr_td_recovery_bubbles.event_str_noht =
+			"event=0xd,umask=0x1,cmask=1";
+		event_attr_td_recovery_bubbles.event_str_ht =
+			"event=0xd,umask=0x1,cmask=1,any=1";
+
 		x86_pmu.event_constraints = intel_skl_event_constraints;
 		x86_pmu.pebs_constraints = intel_skl_pebs_event_constraints;
 		x86_pmu.extra_regs = intel_skl_extra_regs;
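For quick testing, the aliases this patch exports can be read back
directly from the cpu PMU's events directory in sysfs. The snippet
below is an illustrative user-space test program, not part of the
patch; it assumes only the standard event_source sysfs layout.

#include <stdio.h>

/* Dump the TopDown aliases exported by this patch, if present. */
int main(void)
{
	static const char * const names[] = {
		"topdown-total-slots",
		"topdown-total-slots.scale",
		"topdown-total-slots.agg-per-core",
		"topdown-slots-issued",
		"topdown-slots-retired",
		"topdown-fetch-bubbles",
		"topdown-recovery-bubbles",
	};
	char path[256], buf[128];
	unsigned int i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/bus/event_source/devices/cpu/events/%s",
			 names[i]);
		f = fopen(path, "r");
		if (!f)
			continue; /* alias not exported on this CPU */
		if (fgets(buf, sizeof(buf), f))
			printf("%-35s %s", names[i], buf);
		fclose(f);
	}
	return 0;
}

With HT enabled one would expect, for example, topdown-total-slots
to read back as event=0x3c,umask=0x0,any=1 with a scale of -2 and
an agg-per-core value of 1.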
-- 
2.4.3