linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andi Kleen <andi@firstfloor.org>
To: acme@kernel.org
Cc: jolsa@kernel.org, linux-kernel@vger.kernel.org,
	eranian@google.com, namhyung@kernel.org, peterz@infradead.org,
	mingo@kernel.org, Andi Kleen <ak@linux.intel.com>
Subject: [PATCH 6/9] x86, perf: Add Top Down events to Intel Core
Date: Fri,  7 Aug 2015 18:06:22 -0700	[thread overview]
Message-ID: <1438995985-13631-7-git-send-email-andi@firstfloor.org> (raw)
In-Reply-To: <1438995985-13631-1-git-send-email-andi@firstfloor.org>

From: Andi Kleen <ak@linux.intel.com>

Add declarations for the events needed for TopDown to the
Intel big core CPUs starting with Sandy Bridge. We need
to report different values if HyperThreading is on or off.

The only thing this patch does is to export some events
in sysfs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots	  Available slots in the pipeline
topdown-slots-issued	  Slots issued into the pipeline
topdown-slots-retired	  Slots successfully retired
topdown-fetch-bubbles	  Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
			  from misspeculation

These metrics then allow to compute four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic, they
only change based on the availability on the abstracted
input values.

The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.

Some events need a divisor. To handle this I redefined
".scale" slightly to let a negative value mean divide by.

For HyperThreading the any bit is needed to get accurate
values when both threads are executing. This implies that
the events can only be collected as root or with
perf_event_paranoid=-1 for now.

Hyper Threading also requires averaging events from both
threads together (the CPU cannot measure them independently).
In perf stat this is done by using per core mode, and then
forcing a divisor of two to get the average. The
new .agg-per-core attribute is added to the events, which
then forces perf stat to enable --per-core.
When hyperthreading is disabled the attribute has the value 0.

The basic scheme is based on the following paper:
Yasin,
A Top Down Method for Performance analysis and Counter architecture
ISPASS14
(pdf available via google)

with some extensions to handle HyperThreading.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c | 82 ++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index a478e3c..65b58cb 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -217,9 +217,70 @@ struct attribute *nhm_events_attrs[] = {
 	NULL,
 };
 
+/*
+ * TopDown events for Core.
+ *
+ * With Hyper Threading on, TopDown metrics are averaged between the
+ * threads of a core: (count_core0 + count_core1) / 2. The 2 is expressed
+ * as a scale parameter. We also tell perf to aggregate per core
+ * by setting the .agg-per-core attribute for the alias to 1.
+ *
+ * Some events need to be multiplied by the pipeline width (4), which
+ * is expressed as a negative scale. In HT we cancel the factor 4
+ * with the 2 dividend for the core average, so we use -2.
+ */
+
+EVENT_ATTR_STR_HT(topdown-total-slots, td_total_slots,
+	"event=0x3c,umask=0x0",			/* cpu_clk_unhalted.thread */
+	"event=0x3c,umask=0x0,any=1");		/* cpu_clk_unhalted.thread_any */
+EVENT_ATTR_STR_HT(topdown-total-slots.scale, td_total_slots_scale,
+	"-4", "-2");
+EVENT_ATTR_STR_HT(topdown-total-slots.agg-per-core, td_total_slots_pc,
+	"0", "1");
+EVENT_ATTR_STR(topdown-slots-issued, td_slots_issued,
+	"event=0xe,umask=0x1");			/* uops_issued.any */
+EVENT_ATTR_STR_HT(topdown-slots-issued.agg-per-core, td_slots_issued_pc,
+	"0", "1");
+EVENT_ATTR_STR_HT(topdown-slots-issued.scale, td_slots_issued_scale,
+	"0", "2");
+EVENT_ATTR_STR(topdown-slots-retired, td_slots_retired,
+	"event=0xc2,umask=0x2");		/* uops_retired.retire_slots */
+EVENT_ATTR_STR_HT(topdown-slots-retired.agg-per-core, td_slots_retired_pc,
+	"0", "1");
+EVENT_ATTR_STR_HT(topdown-slots-retired.scale, td_slots_retired_scale,
+	"0", "2");
+EVENT_ATTR_STR(topdown-fetch-bubbles, td_fetch_bubbles,
+	"event=0x9c,umask=0x1");		/* idq_uops_not_delivered_core */
+EVENT_ATTR_STR_HT(topdown-fetch-bubbles.agg-per-core, td_fetch_bubbles_pc,
+	"0", "1");
+EVENT_ATTR_STR_HT(topdown-fetch-bubbles.scale, td_fetch_bubbles_scale,
+	"0", "2");
+EVENT_ATTR_STR_HT(topdown-recovery-bubbles, td_recovery_bubbles,
+	"event=0xd,umask=0x3,cmask=1",		/* int_misc.recovery_cycles */
+	"event=0xd,umask=0x3,cmask=1,any=1");	/* int_misc.recovery_cycles_any */
+EVENT_ATTR_STR_HT(topdown-recovery-bubbles.scale, td_recovery_bubbles_scale,
+	"-4", "-2");
+EVENT_ATTR_STR_HT(topdown-recovery-bubbles.agg-per-core, td_recovery_bubbles_pc,
+	"0", "1");
+
 struct attribute *snb_events_attrs[] = {
 	EVENT_PTR(mem_ld_snb),
 	EVENT_PTR(mem_st_snb),
+	EVENT_PTR(td_slots_issued),
+	EVENT_PTR(td_slots_issued_scale),
+	EVENT_PTR(td_slots_issued_pc),
+	EVENT_PTR(td_slots_retired),
+	EVENT_PTR(td_slots_retired_scale),
+	EVENT_PTR(td_slots_retired_pc),
+	EVENT_PTR(td_fetch_bubbles),
+	EVENT_PTR(td_fetch_bubbles_scale),
+	EVENT_PTR(td_fetch_bubbles_pc),
+	EVENT_PTR(td_total_slots),
+	EVENT_PTR(td_total_slots_scale),
+	EVENT_PTR(td_total_slots_pc),
+	EVENT_PTR(td_recovery_bubbles),
+	EVENT_PTR(td_recovery_bubbles_scale),
+	EVENT_PTR(td_recovery_bubbles_pc),
 	NULL,
 };
 
@@ -3177,6 +3238,21 @@ static struct attribute *hsw_events_attrs[] = {
 	EVENT_PTR(cycles_ct),
 	EVENT_PTR(mem_ld_hsw),
 	EVENT_PTR(mem_st_hsw),
+	EVENT_PTR(td_slots_issued),
+	EVENT_PTR(td_slots_issued_scale),
+	EVENT_PTR(td_slots_issued_pc),
+	EVENT_PTR(td_slots_retired),
+	EVENT_PTR(td_slots_retired_scale),
+	EVENT_PTR(td_slots_retired_pc),
+	EVENT_PTR(td_fetch_bubbles),
+	EVENT_PTR(td_fetch_bubbles_scale),
+	EVENT_PTR(td_fetch_bubbles_pc),
+	EVENT_PTR(td_total_slots),
+	EVENT_PTR(td_total_slots_scale),
+	EVENT_PTR(td_total_slots_pc),
+	EVENT_PTR(td_recovery_bubbles),
+	EVENT_PTR(td_recovery_bubbles_scale),
+	EVENT_PTR(td_recovery_bubbles_pc),
 	NULL
 };
 
@@ -3494,6 +3570,12 @@ __init int intel_pmu_init(void)
 		memcpy(hw_cache_extra_regs, skl_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));
 		intel_pmu_lbr_init_skl();
 
+		/* INT_MISC.RECOVERY_CYCLES has umask 1 in Skylake */
+		event_attr_td_recovery_bubbles.event_str_noht =
+			"event=0xd,umask=0x1,cmask=1";
+		event_attr_td_recovery_bubbles.event_str_ht =
+			"event=0xd,umask=0x1,cmask=1,any=1";
+
 		x86_pmu.event_constraints = intel_skl_event_constraints;
 		x86_pmu.pebs_constraints = intel_skl_pebs_event_constraints;
 		x86_pmu.extra_regs = intel_skl_extra_regs;
-- 
2.4.3


  parent reply	other threads:[~2015-08-08  1:08 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-08  1:06 Add top down metrics to perf stat Andi Kleen
2015-08-08  1:06 ` [PATCH 1/9] perf, tools: Dont stop PMU parsing on alias parse error Andi Kleen
2015-08-11 13:07   ` Jiri Olsa
2015-08-11 13:14     ` Andi Kleen
2015-08-11 13:24       ` Jiri Olsa
2015-08-11 13:40         ` Andi Kleen
2015-08-11 14:39           ` Jiri Olsa
2015-08-11 16:59             ` Andi Kleen
2015-08-08  1:06 ` [PATCH 2/9] perf, tools, stat: Support up-scaling of events Andi Kleen
2015-08-11 13:25   ` Jiri Olsa
2015-08-11 13:38     ` Andi Kleen
2015-08-11 13:54       ` Jiri Olsa
2015-08-11 17:00         ` Andi Kleen
2015-08-11 17:13           ` Jiri Olsa
2015-08-11 17:17             ` Andi Kleen
2015-08-08  1:06 ` [PATCH 3/9] perf, tools, stat: Basic support for TopDown in perf stat Andi Kleen
2015-08-08  1:06 ` [PATCH 4/9] perf, tools, stat: Add computation of TopDown formulas Andi Kleen
2015-08-08  1:06 ` [PATCH 5/9] x86, perf: Support sysfs files depending on SMT status Andi Kleen
2015-08-08  1:06 ` Andi Kleen [this message]
2015-08-08  1:06 ` [PATCH 7/9] x86, perf: Add Top Down events to Intel Atom Andi Kleen
2015-08-08  1:06 ` [PATCH 8/9] perf, tools, stat: Add extra output of counter values with -v Andi Kleen
2015-08-08  1:06 ` [PATCH 9/9] perf, tools, stat: Force --per-core mode for .agg-per-core aliases Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1438995985-13631-7-git-send-email-andi@firstfloor.org \
    --to=andi@firstfloor.org \
    --cc=acme@kernel.org \
    --cc=ak@linux.intel.com \
    --cc=eranian@google.com \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --subject='Re: [PATCH 6/9] x86, perf: Add Top Down events to Intel Core' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).