All of lore.kernel.org
 help / color / mirror / Atom feed
* Updated and combined Sandy Bridge/Ivy Bridge perf patchkits
@ 2012-07-02 18:43 Andi Kleen
  2012-07-02 18:43 ` [PATCH 1/5] perf, x86: Improve basic Ivy Bridge support v3 Andi Kleen
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Andi Kleen @ 2012-07-02 18:43 UTC (permalink / raw)
  To: x86; +Cc: a.p.zijlstra, linux-kernel

I combined the SandyBridge and Ivy Bridge patchkit into one series,
because they always conflicted.

All review comments addressed I believe.

- Microcode load patches now rely on the patches to update microcode
on all CPUs.
- Updated version of the SandyBridge PEBS version check, to reenable
PEBS for systems with appropiate microcode
- Remove unsupported events on IvyBridge
- Add Precise Instruction Distribution for IvyBridge, to allow profiling
with less surprises and avoid some systematic errors
- Minor fixes

Please apply.

-Andi


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/5] perf, x86: Improve basic Ivy Bridge support v3
  2012-07-02 18:43 Updated and combined Sandy Bridge/Ivy Bridge perf patchkits Andi Kleen
@ 2012-07-02 18:43 ` Andi Kleen
  2012-07-02 19:26   ` Peter Zijlstra
  2012-07-02 18:43 ` [PATCH 2/5] perf, x86: Enable PDIR precise instruction profiling on IvyBridge Andi Kleen
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2012-07-02 18:43 UTC (permalink / raw)
  To: x86; +Cc: a.p.zijlstra, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Fork the IvyBridge support from the Sandy Bridge support to enable
some ivy specific features.

Very similar to Sandy Bridge, but:
- there is no PEBS problem.
- As Stephane pointed out .code=0xb1, .umask=0x01 is gone from the event list,
so don't do a generic backend stall event on IvyBridge.
- more changes in the next patch

v2: Remove stall event
v3: rebase to new code. completely separate switch case statement as shared
code was deemed obfuscated.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c |   23 ++++++++++++++++++++++-
 1 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index e23e71f..a741505 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1913,7 +1913,6 @@ __init int intel_pmu_init(void)
 	case 42: /* SandyBridge */
 	case 45: /* SandyBridge, "Romely-EP" */
 		x86_add_quirk(intel_sandybridge_quirk);
-	case 58: /* IvyBridge */
 		memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
 		       sizeof(hw_cache_event_ids));
 
@@ -1937,6 +1936,28 @@ __init int intel_pmu_init(void)
 		pr_cont("SandyBridge events, ");
 		break;
 
+	case 58: /* IvyBridge */
+		memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
+		       sizeof(hw_cache_event_ids));
+
+		intel_pmu_lbr_init_snb();
+
+		x86_pmu.event_constraints = intel_snb_event_constraints;
+		x86_pmu.pebs_constraints = intel_snb_pebs_event_constraints;
+		x86_pmu.pebs_aliases = intel_pebs_aliases_snb;
+		x86_pmu.extra_regs = intel_snb_extra_regs;
+		/* all extra regs are per-cpu when HT is on */
+		x86_pmu.er_flags |= ERF_HAS_RSP_1;
+		x86_pmu.er_flags |= ERF_NO_HT_SHARING;
+
+		/* UOPS_ISSUED.ANY,c=1,i=1 to count stall cycles */
+		intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
+			X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
+		/* no backend event */
+	      
+		pr_cont("IvyBridge events, ");
+		break;
+
 	default:
 		switch (x86_pmu.version) {
 		case 1:
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/5] perf, x86: Enable PDIR precise instruction profiling on IvyBridge
  2012-07-02 18:43 Updated and combined Sandy Bridge/Ivy Bridge perf patchkits Andi Kleen
  2012-07-02 18:43 ` [PATCH 1/5] perf, x86: Improve basic Ivy Bridge support v3 Andi Kleen
@ 2012-07-02 18:43 ` Andi Kleen
  2012-07-02 19:18   ` Peter Zijlstra
  2012-07-02 18:43 ` [PATCH 3/5] x86: Do microcode updates at CPU_STARTING, not CPU_ONLINE v2 Andi Kleen
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2012-07-02 18:43 UTC (permalink / raw)
  To: x86; +Cc: a.p.zijlstra, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Even with precise profiling Intel CPUs have a "skid". The sample
triggers a few cycles later than the instruction, so in some
cases there can be systematic errors where expensive instructions never
show up in the profile log.

Sandy Bridge added a new PDIR instruction retired event that randomizes
the sampling slightly. This corrects for systematic errors, so that
you should in most cases see the correct instruction getting profile hits.

Unfortunately the SandyBridge version could only work with a otherwise
quiescent CPU and was difficult to use. But now on IvyBridge this
restriction is gone and can be more widely used.

This only works for retired instructions.

I enabled it -- somewhat arbitarily -- for two 'p's or more.

To use it

perf record -e instructions:pp ...

This provides a more precise alternative to the usual cycles:pp,
however it will not account for expensive instructions.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c |   25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index a741505..e09a4ad 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1425,6 +1425,29 @@ static int intel_pmu_hw_config(struct perf_event *event)
 	return 0;
 }
 
+static int pdir_hw_config(struct perf_event *event)
+{
+	int err = intel_pmu_hw_config(event);
+
+	if (err)
+		return err;
+
+	/*
+	 * Use the PDIR instruction retired counter for two 'p's.
+	 * This will randomize samples slightly and avoid some systematic
+	 * measurement errors.
+	 * Only works for retired instructions.
+	 */
+	if (event->attr.precise_ip >= 2 &&
+	    (event->hw.config & X86_RAW_EVENT_MASK) == 0xc0) {
+		u64 pdir_event = X86_CONFIG(.event=0xc0, .umask=1);
+		event->hw.config = pdir_event |
+				(event->hw.config & ~X86_RAW_EVENT_MASK);
+	}
+
+	return 0;
+}
+
 struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
 {
 	if (x86_pmu.guest_get_msrs)
@@ -1955,6 +1978,8 @@ __init int intel_pmu_init(void)
 			X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
 		/* no backend event */
 	      
+		x86_pmu.hw_config = pdir_hw_config;
+
 		pr_cont("IvyBridge events, ");
 		break;
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/5] x86: Do microcode updates at CPU_STARTING, not CPU_ONLINE v2
  2012-07-02 18:43 Updated and combined Sandy Bridge/Ivy Bridge perf patchkits Andi Kleen
  2012-07-02 18:43 ` [PATCH 1/5] perf, x86: Improve basic Ivy Bridge support v3 Andi Kleen
  2012-07-02 18:43 ` [PATCH 2/5] perf, x86: Enable PDIR precise instruction profiling on IvyBridge Andi Kleen
@ 2012-07-02 18:43 ` Andi Kleen
  2012-07-02 18:43 ` [PATCH 4/5] perf, x86: check ucode before disabling PEBS on SandyBridge v4 Andi Kleen
  2012-07-02 18:43 ` [PATCH 5/5] perf, x86: Spell Romley correctly Andi Kleen
  4 siblings, 0 replies; 20+ messages in thread
From: Andi Kleen @ 2012-07-02 18:43 UTC (permalink / raw)
  To: x86; +Cc: a.p.zijlstra, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Do microcode updates of resuming or newling plugged CPUs earlier
in CPU_STARTING instead of later when ONLINE. This prevents races
with parallel users who may need a microcode update to avoid some
problem.

Since we cannot request the microcode from udev at this stage,
try to grab the microcode from another CPU. This is also more efficient
because it avoids redundant loads. In addition to that
it avoids the need for separate paths for resume and CPU bootup.

This requires invalidating the microcodes on other CPUs on free.
Each CPU does this in parallel, so it's not a big problem. Each
CPU touches at most NR_CPUs memory locations.

When there is no good microcode available the update is delayed
until the update can be requested. In the normal cases it should
be available.

v2: Review updates
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/microcode_core.c  |   65 +++++++++++++++++++++++++------------
 arch/x86/kernel/microcode_intel.c |   13 +++++++-
 2 files changed, 56 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c
index fbdfc69..d65cf31 100644
--- a/arch/x86/kernel/microcode_core.c
+++ b/arch/x86/kernel/microcode_core.c
@@ -358,20 +358,7 @@ static void microcode_fini_cpu(int cpu)
 	uci->valid = 0;
 }
 
-static enum ucode_state microcode_resume_cpu(int cpu)
-{
-	struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
-
-	if (!uci->mc)
-		return UCODE_NFOUND;
-
-	pr_debug("CPU%d updated upon resume\n", cpu);
-	apply_microcode_on_target(cpu);
-
-	return UCODE_OK;
-}
-
-static enum ucode_state microcode_init_cpu(int cpu)
+static enum ucode_state microcode_init_cpu_late(int cpu)
 {
 	enum ucode_state ustate;
 
@@ -392,15 +379,44 @@ static enum ucode_state microcode_init_cpu(int cpu)
 	return ustate;
 }
 
-static enum ucode_state microcode_update_cpu(int cpu)
+/* Grab ucode from another CPU */
+
+static void clone_ucode_data(void)
+{
+	int cpu = smp_processor_id();
+	int i;
+
+	for_each_online_cpu (i) {
+		if (ucode_cpu_info[i].mc &&
+			ucode_cpu_info[i].valid &&
+			cpu_data(i).x86 == cpu_data(cpu).x86 &&
+			cpu_data(i).x86_model == cpu_data(cpu).x86_model) {
+			ucode_cpu_info[cpu].mc = ucode_cpu_info[i].mc;
+			break;
+		}
+	}
+}
+
+static void microcode_init_cpu_early(int cpu)
+{
+	clone_ucode_data();
+	/* We can request later when the CPU is online */
+	if (ucode_cpu_info[cpu].mc == NULL)
+		return;
+	if (microcode_ops->collect_cpu_info(cpu, &ucode_cpu_info[cpu].cpu_sig))
+		return;
+	if (microcode_ops->apply_microcode(smp_processor_id()))
+		pr_warn("CPU%d microcode update failed\n", cpu);
+}
+
+static enum ucode_state microcode_update_cpu_late(int cpu)
 {
 	struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
 	enum ucode_state ustate;
 
-	if (uci->valid)
-		ustate = microcode_resume_cpu(cpu);
-	else
-		ustate = microcode_init_cpu(cpu);
+	/* Resume already done early */
+	if (!uci->valid)
+		ustate = microcode_init_cpu_late(cpu);
 
 	return ustate;
 }
@@ -418,7 +434,7 @@ static int mc_device_add(struct device *dev, struct subsys_interface *sif)
 	if (err)
 		return err;
 
-	if (microcode_init_cpu(cpu) == UCODE_ERROR)
+	if (microcode_init_cpu_late(cpu) == UCODE_ERROR)
 		return -EINVAL;
 
 	return err;
@@ -468,9 +484,16 @@ mc_cpu_callback(struct notifier_block *nb, unsigned long action, void *hcpu)
 
 	dev = get_cpu_device(cpu);
 	switch (action) {
+	case CPU_STARTING:
+	case CPU_STARTING_FROZEN:
+		microcode_init_cpu_early(cpu);
+		break;
+
 	case CPU_ONLINE:
 	case CPU_ONLINE_FROZEN:
-		microcode_update_cpu(cpu);
+		/* Retry again in case we couldn't request early */
+		if (cpu_data(cpu).microcode != ucode_cpu_info[cpu].cpu_sig.rev)
+			microcode_update_cpu_late(cpu);
 	case CPU_DOWN_FAILED:
 	case CPU_DOWN_FAILED_FROZEN:
 		pr_debug("CPU%d added\n", cpu);
diff --git a/arch/x86/kernel/microcode_intel.c b/arch/x86/kernel/microcode_intel.c
index 0327e2b..899057b 100644
--- a/arch/x86/kernel/microcode_intel.c
+++ b/arch/x86/kernel/microcode_intel.c
@@ -329,6 +329,16 @@ static int apply_microcode(int cpu)
 	return 0;
 }
 
+static void invalidate_microcode(void *data)
+{
+	int i;
+
+	for_each_possible_cpu (i) {
+		if (ucode_cpu_info[i].mc == data)
+			ucode_cpu_info[i].mc = NULL;
+	}
+}
+
 static enum ucode_state generic_load_microcode(int cpu, void *data, size_t size,
 				int (*get_ucode_data)(void *, const void *, size_t))
 {
@@ -391,6 +401,7 @@ static enum ucode_state generic_load_microcode(int cpu, void *data, size_t size,
 	}
 
 	vfree(uci->mc);
+	invalidate_microcode(uci->mc);
 	uci->mc = (struct microcode_intel *)new_mc;
 
 	pr_debug("CPU%d found a matching microcode update with version 0x%x (current=0x%x)\n",
@@ -444,7 +455,7 @@ static void microcode_fini_cpu(int cpu)
 	struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
 
 	vfree(uci->mc);
-	uci->mc = NULL;
+	invalidate_microcode(uci->mc);
 }
 
 static struct microcode_ops microcode_intel_ops = {
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/5] perf, x86: check ucode before disabling PEBS on SandyBridge v4
  2012-07-02 18:43 Updated and combined Sandy Bridge/Ivy Bridge perf patchkits Andi Kleen
                   ` (2 preceding siblings ...)
  2012-07-02 18:43 ` [PATCH 3/5] x86: Do microcode updates at CPU_STARTING, not CPU_ONLINE v2 Andi Kleen
@ 2012-07-02 18:43 ` Andi Kleen
  2012-07-02 18:43 ` [PATCH 5/5] perf, x86: Spell Romley correctly Andi Kleen
  4 siblings, 0 replies; 20+ messages in thread
From: Andi Kleen @ 2012-07-02 18:43 UTC (permalink / raw)
  To: x86; +Cc: a.p.zijlstra, linux-kernel, Stephane Eranian, Andi Kleen

From: Stephane Eranian <eranian@google.com>

[AK: Updated version of Stephane's patch with various changes.
This version now relies on the global microcode update that
has been implemented in another patch. It also assumes that
the BIOS puts the same microcode version on every CPU.]

This patch checks the microcode version before disabling
PEBS on SandyBridge model 42 (desktop, mobile), and 45 (SNB-EP).
PEBS was disabled for both models due to an erratum.

A workaround is implemented by micro-code specific to different models.
This patch checks the microcode version and disables PEBS support
only if the needed fixes are not available.

The check is done each time a PEBS event is created and NOT at boot
time because the micro-code update may only be done after the kernel
has booted.

Go to downloadcenter.intel.com to download microcode updates.
Need microcode update 6/6/2012 or later.

v2: Was Stephane's old revision
v3: Use boot_cpu_data.microcode (H. Peter Anvin)
v4: Various updates. Now improved model check that handles both
C1 and C2 steppings.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c |   41 ++++++++++++++++++++++++-------
 1 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index e09a4ad..b255da3 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -13,6 +13,7 @@
 
 #include <asm/hardirq.h>
 #include <asm/apic.h>
+#include <asm/processor.h>
 
 #include "perf_event.h"
 
@@ -1392,6 +1393,29 @@ static void intel_pebs_aliases_snb(struct perf_event *event)
 	}
 }
 
+static int check_pebs_quirks(void)
+{
+	/* With different CPUs boot_cpu_data ucode will be 0 */
+	int model = boot_cpu_data.x86_model;
+	int ucode = boot_cpu_data.microcode;
+	int stepping = boot_cpu_data.x86_mask;
+
+	/* do not have PEBS to begin with */
+	if (!x86_pmu.pebs)
+		return 0;
+
+	/*
+	 * check ucode version for SNB, SNB-EP
+	 */
+	if ((model == 42 && ucode < 0x28) ||
+		(model == 45 && ucode < 0x70c) ||
+		(model == 45 && stepping == 6 && ucode < 0x618)) {
+		pr_warn_once("SandyBridge PEBS unavailable due to CPU erratum, update microcode\n");
+		return -ENOTSUPP;
+	}
+	return 0;
+}
+
 static int intel_pmu_hw_config(struct perf_event *event)
 {
 	int ret = x86_pmu_hw_config(event);
@@ -1399,8 +1423,13 @@ static int intel_pmu_hw_config(struct perf_event *event)
 	if (ret)
 		return ret;
 
-	if (event->attr.precise_ip && x86_pmu.pebs_aliases)
-		x86_pmu.pebs_aliases(event);
+	if (event->attr.precise_ip) {
+		if (check_pebs_quirks())
+			return -ENOTSUPP;
+
+		if (x86_pmu.pebs_aliases)
+			x86_pmu.pebs_aliases(event);
+	}
 
 	if (intel_pmu_needs_lbr_smpl(event)) {
 		ret = intel_pmu_setup_lbr_filter(event);
@@ -1735,13 +1764,6 @@ static __init void intel_clovertown_quirk(void)
 	x86_pmu.pebs_constraints = NULL;
 }
 
-static __init void intel_sandybridge_quirk(void)
-{
-	printk(KERN_WARNING "PEBS disabled due to CPU errata.\n");
-	x86_pmu.pebs = 0;
-	x86_pmu.pebs_constraints = NULL;
-}
-
 static const struct { int id; char *name; } intel_arch_events_map[] __initconst = {
 	{ PERF_COUNT_HW_CPU_CYCLES, "cpu cycles" },
 	{ PERF_COUNT_HW_INSTRUCTIONS, "instructions" },
@@ -1935,7 +1957,6 @@ __init int intel_pmu_init(void)
 
 	case 42: /* SandyBridge */
 	case 45: /* SandyBridge, "Romely-EP" */
-		x86_add_quirk(intel_sandybridge_quirk);
 		memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
 		       sizeof(hw_cache_event_ids));
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/5] perf, x86: Spell Romley correctly.
  2012-07-02 18:43 Updated and combined Sandy Bridge/Ivy Bridge perf patchkits Andi Kleen
                   ` (3 preceding siblings ...)
  2012-07-02 18:43 ` [PATCH 4/5] perf, x86: check ucode before disabling PEBS on SandyBridge v4 Andi Kleen
@ 2012-07-02 18:43 ` Andi Kleen
  4 siblings, 0 replies; 20+ messages in thread
From: Andi Kleen @ 2012-07-02 18:43 UTC (permalink / raw)
  To: x86; +Cc: a.p.zijlstra, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Romley is spelled Romley, not Romely.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index b255da3..81a4915 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1956,7 +1956,7 @@ __init int intel_pmu_init(void)
 		break;
 
 	case 42: /* SandyBridge */
-	case 45: /* SandyBridge, "Romely-EP" */
+	case 45: /* SandyBridge, "Romley-EP" */
 		memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
 		       sizeof(hw_cache_event_ids));
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] perf, x86: Enable PDIR precise instruction profiling on IvyBridge
  2012-07-02 18:43 ` [PATCH 2/5] perf, x86: Enable PDIR precise instruction profiling on IvyBridge Andi Kleen
@ 2012-07-02 19:18   ` Peter Zijlstra
  2012-07-02 19:57     ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2012-07-02 19:18 UTC (permalink / raw)
  To: Andi Kleen; +Cc: x86, linux-kernel, Andi Kleen

On Mon, 2012-07-02 at 11:43 -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Even with precise profiling Intel CPUs have a "skid". The sample
> triggers a few cycles later than the instruction, so in some
> cases there can be systematic errors where expensive instructions never
> show up in the profile log.
> 
> Sandy Bridge added a new PDIR instruction retired event that randomizes
> the sampling slightly. This corrects for systematic errors, so that
> you should in most cases see the correct instruction getting profile hits.
> 
> Unfortunately the SandyBridge version could only work with a otherwise
> quiescent CPU and was difficult to use. But now on IvyBridge this
> restriction is gone and can be more widely used.
> 
> This only works for retired instructions.
> 
> I enabled it -- somewhat arbitarily -- for two 'p's or more.
> 
> To use it
> 
> perf record -e instructions:pp ...
> 
> This provides a more precise alternative to the usual cycles:pp,
> however it will not account for expensive instructions.

Already told you no.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/5] perf, x86: Improve basic Ivy Bridge support v3
  2012-07-02 18:43 ` [PATCH 1/5] perf, x86: Improve basic Ivy Bridge support v3 Andi Kleen
@ 2012-07-02 19:26   ` Peter Zijlstra
  2012-07-02 19:58     ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2012-07-02 19:26 UTC (permalink / raw)
  To: Andi Kleen; +Cc: x86, linux-kernel, Andi Kleen

On Mon, 2012-07-02 at 11:43 -0700, Andi Kleen wrote:
> - As Stephane pointed out .code=0xb1, .umask=0x01 is gone from the
> event list,
> so don't do a generic backend stall event on IvyBridge. 

325462-043US, May 2012:

Page 3135, Table 19-2. Non-Architectural Performance Events In the
Processor Core of Third Generation Intel Core i7, i5, i3 Processors

(aka. IvyBridge)

B1H 01H UOPS_EXECUTED.THREAD Counts total number of uops to be
                             executed per-thread each cycle. Set
                             Cmask = 1, INV =1 to count stall
                             cycles.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] perf, x86: Enable PDIR precise instruction profiling on IvyBridge
  2012-07-02 19:18   ` Peter Zijlstra
@ 2012-07-02 19:57     ` Andi Kleen
  2012-07-02 20:22       ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2012-07-02 19:57 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Andi Kleen, x86, linux-kernel, Andi Kleen

> Already told you no.

No on what? You don't want PDIR mode? Or you don't like something
with my implementation?

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/5] perf, x86: Improve basic Ivy Bridge support v3
  2012-07-02 19:26   ` Peter Zijlstra
@ 2012-07-02 19:58     ` Andi Kleen
  2012-07-02 20:18       ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2012-07-02 19:58 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Andi Kleen, x86, linux-kernel, Andi Kleen

On Mon, Jul 02, 2012 at 09:26:34PM +0200, Peter Zijlstra wrote:
> On Mon, 2012-07-02 at 11:43 -0700, Andi Kleen wrote:
> > - As Stephane pointed out .code=0xb1, .umask=0x01 is gone from the
> > event list,
> > so don't do a generic backend stall event on IvyBridge. 
> 
> 325462-043US, May 2012:
> 
> Page 3135, Table 19-2. Non-Architectural Performance Events In the
> Processor Core of Third Generation Intel Core i7, i5, i3 Processors

The table is outdated.

But if you insist can readd it. It's just unlikely to work.

-Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/5] perf, x86: Improve basic Ivy Bridge support v3
  2012-07-02 19:58     ` Andi Kleen
@ 2012-07-02 20:18       ` Peter Zijlstra
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2012-07-02 20:18 UTC (permalink / raw)
  To: Andi Kleen; +Cc: x86, linux-kernel, Andi Kleen, Stephane Eranian

On Mon, 2012-07-02 at 21:58 +0200, Andi Kleen wrote:
> On Mon, Jul 02, 2012 at 09:26:34PM +0200, Peter Zijlstra wrote:
> > On Mon, 2012-07-02 at 11:43 -0700, Andi Kleen wrote:
> > > - As Stephane pointed out .code=0xb1, .umask=0x01 is gone from the
> > > event list,
> > > so don't do a generic backend stall event on IvyBridge. 
> > 
> > 325462-043US, May 2012:
> > 
> > Page 3135, Table 19-2. Non-Architectural Performance Events In the
> > Processor Core of Third Generation Intel Core i7, i5, i3 Processors
> 
> The table is outdated.
>
> But if you insist can readd it. It's just unlikely to work.

But you didn't state anything of the like. If anybody deviates from the
SDM they had better well bloody mention it. And you being from Intel, I
would very much like to have an Official (TM) statement right along with
it. Preferably in the form of a new SDM.

Also, you didn't CC Stephane, nor was it mentioned where this
observation was made from.

In short, the typical shoddy quality one expects from you.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] perf, x86: Enable PDIR precise instruction profiling on IvyBridge
  2012-07-02 19:57     ` Andi Kleen
@ 2012-07-02 20:22       ` Peter Zijlstra
  2012-07-02 21:00         ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2012-07-02 20:22 UTC (permalink / raw)
  To: Andi Kleen; +Cc: x86, linux-kernel, Andi Kleen

On Mon, 2012-07-02 at 21:57 +0200, Andi Kleen wrote:
> > Already told you no.
> 
> No on what? You don't want PDIR mode? Or you don't like something
> with my implementation?

Both are crap.

If anything its an PEBS alias, we have infrastructure for that, yet you
invent new things.

You do random things like only :pp for no apparent reason.

PDIR is only supported on a single counter, regular PEBS 'instructions'
events allow for 4+ counters. Therefore a direct substitution is not
valid.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] perf, x86: Enable PDIR precise instruction profiling on IvyBridge
  2012-07-02 20:22       ` Peter Zijlstra
@ 2012-07-02 21:00         ` Andi Kleen
  2012-07-02 21:36           ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2012-07-02 21:00 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Andi Kleen, x86, linux-kernel

> If anything its an PEBS alias, we have infrastructure for that, yet you
> invent new things.

Ok.
> 
> You do random things like only :pp for no apparent reason.

At which p level do you want it? Or on by default?

> 
> PDIR is only supported on a single counter, regular PEBS 'instructions'
> events allow for 4+ counters. Therefore a direct substitution is not
> valid.

Ok so you want a new flag? Or a new event?

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] perf, x86: Enable PDIR precise instruction profiling on IvyBridge
  2012-07-02 21:00         ` Andi Kleen
@ 2012-07-02 21:36           ` Peter Zijlstra
  2012-07-02 21:57             ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2012-07-02 21:36 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andi Kleen, x86, linux-kernel, Jiri Olsa, Stephane Eranian

On Mon, 2012-07-02 at 14:00 -0700, Andi Kleen wrote:
> Ok so you want a new flag? Or a new event?

Once we grow the sysfs event stuff for cpu PMUs, we could add a new
named event, but I don't think its all that important.

instructions isn't all that interesting a measure to sample. And the
people that do care can already use it by filling out the event
manually.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] perf, x86: Enable PDIR precise instruction profiling on IvyBridge
  2012-07-02 21:36           ` Peter Zijlstra
@ 2012-07-02 21:57             ` Andi Kleen
  2012-07-02 23:13               ` Stephane Eranian
  0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2012-07-02 21:57 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Andi Kleen, x86, linux-kernel, Jiri Olsa, Stephane Eranian

On Mon, Jul 02, 2012 at 11:36:54PM +0200, Peter Zijlstra wrote:
> On Mon, 2012-07-02 at 14:00 -0700, Andi Kleen wrote:
> > Ok so you want a new flag? Or a new event?
> 
> Once we grow the sysfs event stuff for cpu PMUs, we could add a new
> named event, but I don't think its all that important.
> 
> instructions isn't all that interesting a measure to sample. And the
> people that do care can already use it by filling out the event
> manually.

The idea was to make it easier to use, similar to the recent
automatic IBS setup on AMD.

It's the nearest you can get on Intel systems.

I agree it would be nicer on cycles, but instructions are in many
cases a reasonable approximation.

BTW the problem with specifying it manually is that there
is no way to enforce the SandyBridge quiescence restrictions. 
It would probably need forking the tables too and removing it on 
SandyBridge, or something special to enforce exclusive mode,
all complicated and ugly.

An explicit way to specify it side steps that all nicely.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] perf, x86: Enable PDIR precise instruction profiling on IvyBridge
  2012-07-02 21:57             ` Andi Kleen
@ 2012-07-02 23:13               ` Stephane Eranian
  2012-07-03  4:04                 ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Stephane Eranian @ 2012-07-02 23:13 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Peter Zijlstra, Andi Kleen, x86, linux-kernel, Jiri Olsa

On Mon, Jul 2, 2012 at 11:57 PM, Andi Kleen <ak@linux.intel.com> wrote:
> On Mon, Jul 02, 2012 at 11:36:54PM +0200, Peter Zijlstra wrote:
>> On Mon, 2012-07-02 at 14:00 -0700, Andi Kleen wrote:
>> > Ok so you want a new flag? Or a new event?
>>
>> Once we grow the sysfs event stuff for cpu PMUs, we could add a new
>> named event, but I don't think its all that important.
>>
>> instructions isn't all that interesting a measure to sample. And the
>> people that do care can already use it by filling out the event
>> manually.
>
> The idea was to make it easier to use, similar to the recent
> automatic IBS setup on AMD.
>
> It's the nearest you can get on Intel systems.
>
> I agree it would be nicer on cycles, but instructions are in many
> cases a reasonable approximation.
>
> BTW the problem with specifying it manually is that there
> is no way to enforce the SandyBridge quiescence restrictions.

Well, the only way I can think of would be to request that the
event be programmed with exclusive mode ALWAYS. That
should not be to hard to add.

> It would probably need forking the tables too and removing it on
> SandyBridge, or something special to enforce exclusive mode,
> all complicated and ugly.
>
> An explicit way to specify it side steps that all nicely.
>
> -Andi
>
> --
> ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] perf, x86: Enable PDIR precise instruction profiling on IvyBridge
  2012-07-02 23:13               ` Stephane Eranian
@ 2012-07-03  4:04                 ` Andi Kleen
  2012-07-05 15:45                   ` Stephane Eranian
  0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2012-07-03  4:04 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Andi Kleen, Peter Zijlstra, Andi Kleen, x86, linux-kernel, Jiri Olsa

> Well, the only way I can think of would be to request that the
> event be programmed with exclusive mode ALWAYS. That
> should not be to hard to add.

I suspect it would need some special callbacks from the events core,
that's why I only tried on IvyBridge.

Anyways, I'll drop it as the people in charge didn't approve of 
instruction sampling.

-Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] perf, x86: Enable PDIR precise instruction profiling on IvyBridge
  2012-07-03  4:04                 ` Andi Kleen
@ 2012-07-05 15:45                   ` Stephane Eranian
  2012-07-05 17:26                     ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Stephane Eranian @ 2012-07-05 15:45 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andi Kleen, Peter Zijlstra, x86, linux-kernel, Jiri Olsa

On Tue, Jul 3, 2012 at 6:04 AM, Andi Kleen <andi@firstfloor.org> wrote:
>> Well, the only way I can think of would be to request that the
>> event be programmed with exclusive mode ALWAYS. That
>> should not be to hard to add.
>
> I suspect it would need some special callbacks from the events core,
> that's why I only tried on IvyBridge.
>x
You could simply add a check in the x86 init code like what is done
with the pebs events and precise mode.

> Anyways, I'll drop it as the people in charge didn't approve of
> instruction sampling.
>
> -Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] perf, x86: Enable PDIR precise instruction profiling on IvyBridge
  2012-07-05 15:45                   ` Stephane Eranian
@ 2012-07-05 17:26                     ` Andi Kleen
  2012-07-06  1:04                       ` Stephane Eranian
  0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2012-07-05 17:26 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: Andi Kleen, Peter Zijlstra, x86, linux-kernel, Jiri Olsa

On Thu, Jul 05, 2012 at 05:45:16PM +0200, Stephane Eranian wrote:
> On Tue, Jul 3, 2012 at 6:04 AM, Andi Kleen <andi@firstfloor.org> wrote:
> >> Well, the only way I can think of would be to request that the
> >> event be programmed with exclusive mode ALWAYS. That
> >> should not be to hard to add.
> >
> > I suspect it would need some special callbacks from the events core,
> > that's why I only tried on IvyBridge.
> >x
> You could simply add a check in the x86 init code like what is done
> with the pebs events and precise mode.

Ah requiring the user to set exclusive? Maybe. Right now perf does not even
have an option to set it. Would seem user unfriendly to me though.

-Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] perf, x86: Enable PDIR precise instruction profiling on IvyBridge
  2012-07-05 17:26                     ` Andi Kleen
@ 2012-07-06  1:04                       ` Stephane Eranian
  0 siblings, 0 replies; 20+ messages in thread
From: Stephane Eranian @ 2012-07-06  1:04 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andi Kleen, Peter Zijlstra, x86, linux-kernel, Jiri Olsa

On Thu, Jul 5, 2012 at 7:26 PM, Andi Kleen <ak@linux.intel.com> wrote:
> On Thu, Jul 05, 2012 at 05:45:16PM +0200, Stephane Eranian wrote:
>> On Tue, Jul 3, 2012 at 6:04 AM, Andi Kleen <andi@firstfloor.org> wrote:
>> >> Well, the only way I can think of would be to request that the
>> >> event be programmed with exclusive mode ALWAYS. That
>> >> should not be to hard to add.
>> >
>> > I suspect it would need some special callbacks from the events core,
>> > that's why I only tried on IvyBridge.
>> >x
>> You could simply add a check in the x86 init code like what is done
>> with the pebs events and precise mode.
>
> Ah requiring the user to set exclusive? Maybe. Right now perf does not even
> have an option to set it. Would seem user unfriendly to me though.

Yes, that's what I meant since the beginning. Require exclusive mode
on the event is all we need to enforce the restriction. The scheduling
would still only work on counter 1.

I know that perf record/stat do not support this but given the new event
parser it should not too hard to add an excl term to the syntax. I have
that in libpfm4 now as an option on events. That's clearly a perf_event
option like period or freq and not a HW option like inv, edge.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-07-06  1:04 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-02 18:43 Updated and combined Sandy Bridge/Ivy Bridge perf patchkits Andi Kleen
2012-07-02 18:43 ` [PATCH 1/5] perf, x86: Improve basic Ivy Bridge support v3 Andi Kleen
2012-07-02 19:26   ` Peter Zijlstra
2012-07-02 19:58     ` Andi Kleen
2012-07-02 20:18       ` Peter Zijlstra
2012-07-02 18:43 ` [PATCH 2/5] perf, x86: Enable PDIR precise instruction profiling on IvyBridge Andi Kleen
2012-07-02 19:18   ` Peter Zijlstra
2012-07-02 19:57     ` Andi Kleen
2012-07-02 20:22       ` Peter Zijlstra
2012-07-02 21:00         ` Andi Kleen
2012-07-02 21:36           ` Peter Zijlstra
2012-07-02 21:57             ` Andi Kleen
2012-07-02 23:13               ` Stephane Eranian
2012-07-03  4:04                 ` Andi Kleen
2012-07-05 15:45                   ` Stephane Eranian
2012-07-05 17:26                     ` Andi Kleen
2012-07-06  1:04                       ` Stephane Eranian
2012-07-02 18:43 ` [PATCH 3/5] x86: Do microcode updates at CPU_STARTING, not CPU_ONLINE v2 Andi Kleen
2012-07-02 18:43 ` [PATCH 4/5] perf, x86: check ucode before disabling PEBS on SandyBridge v4 Andi Kleen
2012-07-02 18:43 ` [PATCH 5/5] perf, x86: Spell Romley correctly Andi Kleen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.