linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/9] perf/x86: implement HT counter corruption workaround
@ 2014-06-04 21:34 Stephane Eranian
  2014-06-04 21:34 ` [PATCH 1/9] perf,x86: rename er_flags to flags Stephane Eranian
                   ` (9 more replies)
  0 siblings, 10 replies; 59+ messages in thread
From: Stephane Eranian @ 2014-06-04 21:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, jolsa, zheng.z.yan, maria.n.dimakopoulou

From: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>

This patch series addresses a serious known erratum in the PMU
of Intel SandyBridge, IvyBridge, and Haswell processors with
hyper-threading enabled.

The erratum is documented in the Intel specification update documents
for each processor under the errata listed below:
 - SandyBridge: BJ122
 - IvyBridge: BV98
 - Haswell: HSD29

The bug causes silent counter corruption across hyperthreads only
when measuring certain memory events (0xd0, 0xd1, 0xd2, 0xd3).
Counters measuring those events may leak counts to the sibling
counter. For instance, counter 0, thread 0 measuring event 0x81d0,
may leak to counter 0, thread 1, regardless of the event measured
there. The size of the leak is not predictable. It all depends on
the workload and the state of each sibling hyper-thread. The
corrupting events do undercount as a consequence of the leak. The
leak is compensated automatically only when the sibling counter measures
the exact same corrupting event AND the workload on the two threads
is the same. Given that there is no way to guarantee this, a workaround
is necessary. Furthermore, there is a serious problem if the leaked counts
are added to a low-occurrence event. In that case the corruption on
the low occurrence event can be very large, e.g., orders of magnitude.

There is no HW or FW workaround for this problem.

The bug is very easy to reproduce on a loaded system.
Here is an example on a Haswell client, where CPU0 and CPU4
are siblings. We load the CPUs with a simple triad app
streaming a large floating-point vector. We use the corrupting
event 0x81d0 (MEM_UOPS_RETIRED:ALL_LOADS) and
0x20cc (ROB_MISC_EVENTS:LBR_INSERTS). Since we are not
using the LBR, the 0x20cc event should count zero.

 $ taskset -c 0 triad &
 $ taskset -c 4 triad &
 $ perf stat -a -C 0 -e r81d0 sleep 100 &
 $ perf stat -a -C 4 -e r20cc sleep 10
 Performance counter stats for 'system wide':
       139 277 291      r20cc
      10,000969126 seconds time elapsed

In this example, 0x81d0 and r20cc are using sibling counters
on CPU0 and CPU4. 0x81d0 leaks into 0x20cc and corrupts it
from 0 to 139 million occurrences.

This patch provides a software workaround to this problem by modifying the
way events are scheduled onto counters by the kernel. The patch forces
cross-thread mutual exclusion between sibling counters in case a
corrupting event is measured by one of the hyper-threads. If thread 0,
counter 0 is measuring event 0x81d0, then nothing can be measured on
counter 0, thread 1. If no corrupting event is measured on any hyper-thread,
event scheduling proceeds as before.

The same example run with the workaround enabled, yields the correct answer:
 $ taskset -c 0 triad &
 $ taskset -c 4 triad &
 $ perf stat -a -C 0 -e r81d0 sleep 100 &
 $ perf stat -a -C 4 -e r20cc sleep 10
 Performance counter stats for 'system wide':
       0 r20cc
      10,000969126 seconds time elapsed

The patch does provide correctness for all non-corrupting events. It does not
"repatriate" the leaked counts back to the leaking counter. This is planned
for a second patch series. This patch series however makes this repatriation
easier by guaranteeing that the sibling counter is not measuring any useful event.

The patch introduces dynamic constraints for events. That means that events which
did not have constraints, i.e., could be measured on any counters, may now be
constrained to a subset of the counters, depending on what is being measured on the sibling
thread. The algorithm is similar to a cache coherency protocol. We call it XSU
in reference to Exclusive, Shared, Unused, the 3 possible states of a PMU 
counter.  
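
To make the XSU logic concrete, here is a simplified sketch of the per-counter
filtering implemented later in the series (patch 5). sibling_state[i] is a
stand-in name for the XSU state recorded for the sibling thread's counter i,
and is_excl is set when our own event is one of the corrupting 0xd0-0xd3
events:

	for_each_set_bit(i, cx->idxmsk, X86_PMC_IDX_MAX) {
		/* sibling runs a corrupting event on counter i: unusable */
		if (sibling_state[i] == INTEL_EXCL_EXCLUSIVE)
			__clear_bit(i, cx->idxmsk);
		/* we are corrupting and sibling counter i is in use: unusable */
		if (is_excl && sibling_state[i] == INTEL_EXCL_SHARED)
			__clear_bit(i, cx->idxmsk);
	}
	/* recompute the weight used by the scheduling algorithm */
	cx->weight = hweight64(cx->idxmsk64);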

As a consequence of the workaround, users may see an increased amount of event
multiplexing, even in situations where fewer events are measured on a CPU than
there are counters.

The patches have been tested on all three impacted processors. Note that when
HT is off, there is no corruption. However, the workaround is still enabled,
though it does not cost much. Adding dynamic detection of whether HT is on
turned out to be too complex and to require too much code to be justified.
This patch series also covers events used with PEBS.

Maria Dimakopoulou (6):
  perf/x86: add 3 new scheduling callbacks
  perf/x86: add cross-HT counter exclusion infrastructure
  perf/x86: implement cross-HT corruption bug workaround
  perf/x86: enforce HT bug workaround for SNB/IVB/HSW
  perf/x86: enforce HT bug workaround with PEBS for SNB/IVB/HSW
  perf/x86: add sysfs entry to disable HT bug workaround

Stephane Eranian (3):
  perf,x86: rename er_flags to flags
  perf/x86: vectorize cpuc->kfree_on_online
  perf/x86: fix intel_get_event_constraints() for dynamic constraints

 arch/x86/kernel/cpu/perf_event.c          |   81 ++++-
 arch/x86/kernel/cpu/perf_event.h          |   76 ++++-
 arch/x86/kernel/cpu/perf_event_amd.c      |    3 +-
 arch/x86/kernel/cpu/perf_event_intel.c    |  456 +++++++++++++++++++++++++++--
 arch/x86/kernel/cpu/perf_event_intel_ds.c |   40 +--
 5 files changed, 586 insertions(+), 70 deletions(-)

-- 
1.7.9.5



* [PATCH 1/9] perf,x86: rename er_flags to flags
  2014-06-04 21:34 [PATCH 0/9] perf/x86: implement HT counter corruption workaround Stephane Eranian
@ 2014-06-04 21:34 ` Stephane Eranian
  2014-06-04 21:34 ` [PATCH 2/9] perf/x86: vectorize cpuc->kfree_on_online Stephane Eranian
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 59+ messages in thread
From: Stephane Eranian @ 2014-06-04 21:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, jolsa, zheng.z.yan, maria.n.dimakopoulou

Rename er_flags to flags because the field will be used for more
than just tracking the presence of extra registers.

Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/kernel/cpu/perf_event.h       |    9 ++++++---
 arch/x86/kernel/cpu/perf_event_intel.c |   20 ++++++++++----------
 2 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 3b2f9bd..0b50cd2 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -463,7 +463,7 @@ struct x86_pmu {
 	 * Extra registers for events
 	 */
 	struct extra_reg *extra_regs;
-	unsigned int er_flags;
+	unsigned int flags;
 
 	/*
 	 * Intel host/guest support (KVM)
@@ -480,8 +480,11 @@ do {									\
 	x86_pmu.quirks = &__quirk;					\
 } while (0)
 
-#define ERF_NO_HT_SHARING	1
-#define ERF_HAS_RSP_1		2
+/*
+ * x86_pmu flags
+ */
+#define PMU_FL_NO_HT_SHARING	0x1 /* no hyper-threading resource sharing */
+#define PMU_FL_HAS_RSP_1	0x2 /* has 2 equivalent offcore_rsp regs   */
 
 #define EVENT_VAR(_id)  event_attr_##_id
 #define EVENT_PTR(_id) &event_attr_##_id.attr.attr
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index adb02aa..f6f8018 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1455,7 +1455,7 @@ intel_bts_constraints(struct perf_event *event)
 
 static int intel_alt_er(int idx)
 {
-	if (!(x86_pmu.er_flags & ERF_HAS_RSP_1))
+	if (!(x86_pmu.flags & PMU_FL_HAS_RSP_1))
 		return idx;
 
 	if (idx == EXTRA_REG_RSP_0)
@@ -2001,7 +2001,7 @@ static void intel_pmu_cpu_starting(int cpu)
 	if (!cpuc->shared_regs)
 		return;
 
-	if (!(x86_pmu.er_flags & ERF_NO_HT_SHARING)) {
+	if (!(x86_pmu.flags & PMU_FL_NO_HT_SHARING)) {
 		for_each_cpu(i, topology_thread_cpumask(cpu)) {
 			struct intel_shared_regs *pc;
 
@@ -2397,7 +2397,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.event_constraints = intel_slm_event_constraints;
 		x86_pmu.pebs_constraints = intel_slm_pebs_event_constraints;
 		x86_pmu.extra_regs = intel_slm_extra_regs;
-		x86_pmu.er_flags |= ERF_HAS_RSP_1;
+		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
 		pr_cont("Silvermont events, ");
 		break;
 
@@ -2415,7 +2415,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.enable_all = intel_pmu_nhm_enable_all;
 		x86_pmu.pebs_constraints = intel_westmere_pebs_event_constraints;
 		x86_pmu.extra_regs = intel_westmere_extra_regs;
-		x86_pmu.er_flags |= ERF_HAS_RSP_1;
+		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
 
 		x86_pmu.cpu_events = nhm_events_attrs;
 
@@ -2447,8 +2447,8 @@ __init int intel_pmu_init(void)
 		else
 			x86_pmu.extra_regs = intel_snb_extra_regs;
 		/* all extra regs are per-cpu when HT is on */
-		x86_pmu.er_flags |= ERF_HAS_RSP_1;
-		x86_pmu.er_flags |= ERF_NO_HT_SHARING;
+		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
+		x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
 
 		x86_pmu.cpu_events = snb_events_attrs;
 
@@ -2478,8 +2478,8 @@ __init int intel_pmu_init(void)
 		else
 			x86_pmu.extra_regs = intel_snb_extra_regs;
 		/* all extra regs are per-cpu when HT is on */
-		x86_pmu.er_flags |= ERF_HAS_RSP_1;
-		x86_pmu.er_flags |= ERF_NO_HT_SHARING;
+		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
+		x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
 
 		x86_pmu.cpu_events = snb_events_attrs;
 
@@ -2507,8 +2507,8 @@ __init int intel_pmu_init(void)
 		x86_pmu.extra_regs = intel_snb_extra_regs;
 		x86_pmu.pebs_aliases = intel_pebs_aliases_snb;
 		/* all extra regs are per-cpu when HT is on */
-		x86_pmu.er_flags |= ERF_HAS_RSP_1;
-		x86_pmu.er_flags |= ERF_NO_HT_SHARING;
+		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
+		x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
 
 		x86_pmu.hw_config = hsw_hw_config;
 		x86_pmu.get_event_constraints = hsw_get_event_constraints;
-- 
1.7.9.5



* [PATCH 2/9] perf/x86: vectorize cpuc->kfree_on_online
  2014-06-04 21:34 [PATCH 0/9] perf/x86: implement HT counter corruption workaround Stephane Eranian
  2014-06-04 21:34 ` [PATCH 1/9] perf,x86: rename er_flags to flags Stephane Eranian
@ 2014-06-04 21:34 ` Stephane Eranian
  2014-06-04 21:34 ` [PATCH 3/9] perf/x86: add 3 new scheduling callbacks Stephane Eranian
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 59+ messages in thread
From: Stephane Eranian @ 2014-06-04 21:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, jolsa, zheng.z.yan, maria.n.dimakopoulou

Make cpuc->kfree_on_online a vector to accommodate
more than one entry, and add a second entry to be
used by a later patch.
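
For reference, a stale per-core structure is now parked in one of the slots
and kfree()'d later from the CPU_ONLINE notifier, as in the AMD path below:

	void **onln = &cpuc->kfree_on_online[X86_PERF_KFREE_SHARED];
	...
	*onln = cpuc->amd_nb;	/* freed once the CPU is fully online */
	cpuc->amd_nb = nb;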

Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/kernel/cpu/perf_event.c       |   10 +++++++---
 arch/x86/kernel/cpu/perf_event.h       |    8 +++++++-
 arch/x86/kernel/cpu/perf_event_amd.c   |    3 ++-
 arch/x86/kernel/cpu/perf_event_intel.c |    4 +++-
 4 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 32029e3..36fb4fc 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1321,11 +1321,12 @@ x86_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
 {
 	unsigned int cpu = (long)hcpu;
 	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
-	int ret = NOTIFY_OK;
+	int i, ret = NOTIFY_OK;
 
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_UP_PREPARE:
-		cpuc->kfree_on_online = NULL;
+		for (i = 0 ; i < X86_PERF_KFREE_MAX; i++)
+			cpuc->kfree_on_online[i] = NULL;
 		if (x86_pmu.cpu_prepare)
 			ret = x86_pmu.cpu_prepare(cpu);
 		break;
@@ -1338,7 +1339,10 @@ x86_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
 		break;
 
 	case CPU_ONLINE:
-		kfree(cpuc->kfree_on_online);
+		for (i = 0 ; i < X86_PERF_KFREE_MAX; i++) {
+			kfree(cpuc->kfree_on_online[i]);
+			cpuc->kfree_on_online[i] = NULL;
+		}
 		break;
 
 	case CPU_DYING:
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 0b50cd2..1cab1c2 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -121,6 +121,12 @@ struct intel_shared_regs {
 
 #define MAX_LBR_ENTRIES		16
 
+enum {
+	X86_PERF_KFREE_SHARED = 0,
+	X86_PERF_KFREE_EXCL   = 1,
+	X86_PERF_KFREE_MAX
+};
+
 struct cpu_hw_events {
 	/*
 	 * Generic x86 PMC bits
@@ -183,7 +189,7 @@ struct cpu_hw_events {
 	/* Inverted mask of bits to clear in the perf_ctr ctrl registers */
 	u64				perf_ctr_virt_mask;
 
-	void				*kfree_on_online;
+	void				*kfree_on_online[X86_PERF_KFREE_MAX];
 };
 
 #define __EVENT_CONSTRAINT(c, n, m, w, o, f) {\
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index beeb7cc..a8d1a43 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -382,6 +382,7 @@ static int amd_pmu_cpu_prepare(int cpu)
 static void amd_pmu_cpu_starting(int cpu)
 {
 	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
+	void **onln = &cpuc->kfree_on_online[X86_PERF_KFREE_SHARED];
 	struct amd_nb *nb;
 	int i, nb_id;
 
@@ -399,7 +400,7 @@ static void amd_pmu_cpu_starting(int cpu)
 			continue;
 
 		if (nb->nb_id == nb_id) {
-			cpuc->kfree_on_online = cpuc->amd_nb;
+			*onln = cpuc->amd_nb;
 			cpuc->amd_nb = nb;
 			break;
 		}
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index f6f8018..e913e46 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2002,12 +2002,14 @@ static void intel_pmu_cpu_starting(int cpu)
 		return;
 
 	if (!(x86_pmu.flags & PMU_FL_NO_HT_SHARING)) {
+		void **onln = &cpuc->kfree_on_online[X86_PERF_KFREE_SHARED];
+
 		for_each_cpu(i, topology_thread_cpumask(cpu)) {
 			struct intel_shared_regs *pc;
 
 			pc = per_cpu(cpu_hw_events, i).shared_regs;
 			if (pc && pc->core_id == core_id) {
-				cpuc->kfree_on_online = cpuc->shared_regs;
+				*onln = cpuc->shared_regs;
 				cpuc->shared_regs = pc;
 				break;
 			}
-- 
1.7.9.5



* [PATCH 3/9] perf/x86: add 3 new scheduling callbacks
  2014-06-04 21:34 [PATCH 0/9] perf/x86: implement HT counter corruption workaround Stephane Eranian
  2014-06-04 21:34 ` [PATCH 1/9] perf,x86: rename er_flags to flags Stephane Eranian
  2014-06-04 21:34 ` [PATCH 2/9] perf/x86: vectorize cpuc->kfree_on_online Stephane Eranian
@ 2014-06-04 21:34 ` Stephane Eranian
  2014-06-04 21:34 ` [PATCH 4/9] perf/x86: add cross-HT counter exclusion infrastructure Stephane Eranian
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 59+ messages in thread
From: Stephane Eranian @ 2014-06-04 21:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, jolsa, zheng.z.yan, maria.n.dimakopoulou

From: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>

This patch adds 3 new PMU model-specific callbacks
invoked during the event scheduling done by x86_schedule_events():

- start_scheduling: invoked when entering the scheduling routine.
- stop_scheduling: invoked at the end of the scheduling routine.
- commit_scheduling: invoked for each committed event.

They are to be used optionally by model-specific code.
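
A NULL callback keeps the current behavior, since x86_schedule_events()
checks each pointer before invoking it. The Intel HT workaround later in
this series hooks them up like this:

	x86_pmu.start_scheduling  = intel_start_scheduling;
	x86_pmu.commit_scheduling = intel_commit_scheduling;
	x86_pmu.stop_scheduling   = intel_stop_scheduling;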

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
---
 arch/x86/kernel/cpu/perf_event.c |    9 +++++++++
 arch/x86/kernel/cpu/perf_event.h |    8 ++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 36fb4fc..858a72a 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -733,6 +733,9 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 
 	bitmap_zero(used_mask, X86_PMC_IDX_MAX);
 
+	if (x86_pmu.start_scheduling)
+		x86_pmu.start_scheduling(cpuc);
+
 	for (i = 0, wmin = X86_PMC_IDX_MAX, wmax = 0; i < n; i++) {
 		hwc = &cpuc->event_list[i]->hw;
 		c = x86_pmu.get_event_constraints(cpuc, cpuc->event_list[i]);
@@ -779,6 +782,8 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 		for (i = 0; i < n; i++) {
 			e = cpuc->event_list[i];
 			e->hw.flags |= PERF_X86_EVENT_COMMITTED;
+			if (x86_pmu.commit_scheduling)
+				x86_pmu.commit_scheduling(cpuc, e, assign[i]);
 		}
 	}
 	/*
@@ -799,6 +804,10 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 				x86_pmu.put_event_constraints(cpuc, e);
 		}
 	}
+
+	if (x86_pmu.stop_scheduling)
+		x86_pmu.stop_scheduling(cpuc);
+
 	return num ? -EINVAL : 0;
 }
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 1cab1c2..413799f 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -409,6 +409,14 @@ struct x86_pmu {
 
 	void		(*put_event_constraints)(struct cpu_hw_events *cpuc,
 						 struct perf_event *event);
+
+	void		(*commit_scheduling)(struct cpu_hw_events *cpuc,
+					     struct perf_event *event, int cntr);
+
+	void		(*start_scheduling)(struct cpu_hw_events *cpuc);
+
+	void		(*stop_scheduling)(struct cpu_hw_events *cpuc);
+
 	struct event_constraint *event_constraints;
 	struct x86_pmu_quirk *quirks;
 	int		perfctr_second_write;
-- 
1.7.9.5



* [PATCH 4/9] perf/x86: add cross-HT counter exclusion infrastructure
  2014-06-04 21:34 [PATCH 0/9] perf/x86: implement HT counter corruption workaround Stephane Eranian
                   ` (2 preceding siblings ...)
  2014-06-04 21:34 ` [PATCH 3/9] perf/x86: add 3 new scheduling callbacks Stephane Eranian
@ 2014-06-04 21:34 ` Stephane Eranian
  2014-06-05  7:47   ` Peter Zijlstra
                     ` (2 more replies)
  2014-06-04 21:34 ` [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround Stephane Eranian
                   ` (5 subsequent siblings)
  9 siblings, 3 replies; 59+ messages in thread
From: Stephane Eranian @ 2014-06-04 21:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, jolsa, zheng.z.yan, maria.n.dimakopoulou

From: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>

This patch adds a new shared_regs-style structure to the
per-cpu x86 state (cpuc). It is used to coordinate access
between counters which must be used with mutual exclusion across
hyper-threads on Intel processors. The new struct is not
needed on every PMU, thus it is allocated on demand.
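
Like intel_shared_regs, the structure ends up shared by the two hyper-threads
of a core: the second thread to come online finds its sibling's copy via
topology_thread_cpumask(), parks its own allocation on the kfree_on_online
list, reuses the sibling's copy and bumps the shared refcnt, roughly:

	c = per_cpu(cpu_hw_events, i).excl_cntrs;
	if (c && c->core_id == core_id) {
		cpuc->kfree_on_online[1] = cpuc->excl_cntrs;
		cpuc->excl_cntrs = c;
		cpuc->excl_thread_id = 1;
	}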

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
---
 arch/x86/kernel/cpu/perf_event.h       |   40 ++++++++++++++++++--
 arch/x86/kernel/cpu/perf_event_intel.c |   63 +++++++++++++++++++++++++++++---
 2 files changed, 94 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 413799f..5da0a2b 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -65,10 +65,11 @@ struct event_constraint {
 /*
  * struct hw_perf_event.flags flags
  */
-#define PERF_X86_EVENT_PEBS_LDLAT	0x1 /* ld+ldlat data address sampling */
-#define PERF_X86_EVENT_PEBS_ST		0x2 /* st data address sampling */
-#define PERF_X86_EVENT_PEBS_ST_HSW	0x4 /* haswell style st data sampling */
-#define PERF_X86_EVENT_COMMITTED	0x8 /* event passed commit_txn */
+#define PERF_X86_EVENT_PEBS_LDLAT	0x01 /* ld+ldlat data address sampling */
+#define PERF_X86_EVENT_PEBS_ST		0x02 /* st data address sampling */
+#define PERF_X86_EVENT_PEBS_ST_HSW	0x04 /* haswell style st data sampling */
+#define PERF_X86_EVENT_COMMITTED	0x08 /* event passed commit_txn */
+#define PERF_X86_EVENT_EXCL		0x10 /* HT exclusivity on counter */
 
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
@@ -119,6 +120,27 @@ struct intel_shared_regs {
 	unsigned                core_id;	/* per-core: core id */
 };
 
+enum intel_excl_state_type {
+	INTEL_EXCL_UNUSED    = 0, /* counter is unused */
+	INTEL_EXCL_SHARED    = 1, /* counter can be used by both threads */
+	INTEL_EXCL_EXCLUSIVE = 2, /* counter can be used by one thread only */
+};
+
+struct intel_excl_states {
+	enum intel_excl_state_type init_state[X86_PMC_IDX_MAX];
+	enum intel_excl_state_type state[X86_PMC_IDX_MAX];
+};
+
+struct intel_excl_cntrs {
+	spinlock_t	lock;
+	unsigned long	lock_flags;
+
+	struct intel_excl_states states[2];
+
+	int		refcnt;		/* per-core: #HT threads */
+	unsigned	core_id;	/* per-core: core id */
+};
+
 #define MAX_LBR_ENTRIES		16
 
 enum {
@@ -181,6 +203,11 @@ struct cpu_hw_events {
 	 * used on Intel NHM/WSM/SNB
 	 */
 	struct intel_shared_regs	*shared_regs;
+	/*
+	 * manage exclusive counter access between hyperthread
+	 */
+	struct intel_excl_cntrs		*excl_cntrs;
+	int excl_thread_id; /* 0 or 1 */
 
 	/*
 	 * AMD specific bits
@@ -204,6 +231,10 @@ struct cpu_hw_events {
 #define EVENT_CONSTRAINT(c, n, m)	\
 	__EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 0, 0)
 
+#define INTEL_EXCLEVT_CONSTRAINT(c, n)	\
+	__EVENT_CONSTRAINT(c, n, ARCH_PERFMON_EVENTSEL_EVENT, HWEIGHT(n),\
+			   0, PERF_X86_EVENT_EXCL)
+
 /*
  * The overlap flag marks event constraints with overlapping counter
  * masks. This is the case if the counter mask of such an event is not
@@ -499,6 +530,7 @@ do {									\
  */
 #define PMU_FL_NO_HT_SHARING	0x1 /* no hyper-threading resource sharing */
 #define PMU_FL_HAS_RSP_1	0x2 /* has 2 equivalent offcore_rsp regs   */
+#define PMU_FL_EXCL_CNTRS	0x4 /* has exclusive counter requirements  */
 
 #define EVENT_VAR(_id)  event_attr_##_id
 #define EVENT_PTR(_id) &event_attr_##_id.attr.attr
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index e913e46..380fce2 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1970,16 +1970,46 @@ struct intel_shared_regs *allocate_shared_regs(int cpu)
 	return regs;
 }
 
+struct intel_excl_cntrs *allocate_excl_cntrs(int cpu)
+{
+	struct intel_excl_cntrs *c;
+	int i;
+
+	c = kzalloc_node(sizeof(struct intel_excl_cntrs),
+			 GFP_KERNEL, cpu_to_node(cpu));
+	if (c) {
+		spin_lock_init(&c->lock);
+		for (i = 0; i < X86_PMC_IDX_MAX; i++) {
+			c->states[0].state[i] = INTEL_EXCL_UNUSED;
+			c->states[0].init_state[i] = INTEL_EXCL_UNUSED;
+
+			c->states[1].state[i] = INTEL_EXCL_UNUSED;
+			c->states[1].init_state[i] = INTEL_EXCL_UNUSED;
+		}
+		c->core_id = -1;
+	}
+	return c;
+}
+
 static int intel_pmu_cpu_prepare(int cpu)
 {
 	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
 
-	if (!(x86_pmu.extra_regs || x86_pmu.lbr_sel_map))
-		return NOTIFY_OK;
+	if (x86_pmu.extra_regs || x86_pmu.lbr_sel_map) {
+		cpuc->shared_regs = allocate_shared_regs(cpu);
+		if (!cpuc->shared_regs)
+			return NOTIFY_BAD;
+	}
 
-	cpuc->shared_regs = allocate_shared_regs(cpu);
-	if (!cpuc->shared_regs)
-		return NOTIFY_BAD;
+	if (x86_pmu.flags & PMU_FL_EXCL_CNTRS) {
+		cpuc->excl_cntrs = allocate_excl_cntrs(cpu);
+		if (!cpuc->excl_cntrs) {
+			if (cpuc->shared_regs)
+				kfree(cpuc->shared_regs);
+			return NOTIFY_BAD;
+		}
+		cpuc->excl_thread_id = 0;
+	}
 
 	return NOTIFY_OK;
 }
@@ -2020,12 +2050,29 @@ static void intel_pmu_cpu_starting(int cpu)
 
 	if (x86_pmu.lbr_sel_map)
 		cpuc->lbr_sel = &cpuc->shared_regs->regs[EXTRA_REG_LBR];
+
+	if (x86_pmu.flags & PMU_FL_EXCL_CNTRS) {
+		for_each_cpu(i, topology_thread_cpumask(cpu)) {
+			struct intel_excl_cntrs *c;
+
+			c = per_cpu(cpu_hw_events, i).excl_cntrs;
+			if (c && c->core_id == core_id) {
+				cpuc->kfree_on_online[1] = cpuc->excl_cntrs;
+				cpuc->excl_cntrs = c;
+				cpuc->excl_thread_id = 1;
+				break;
+			}
+		}
+		cpuc->excl_cntrs->core_id = core_id;
+		cpuc->excl_cntrs->refcnt++;
+	}
 }
 
 static void intel_pmu_cpu_dying(int cpu)
 {
 	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
 	struct intel_shared_regs *pc;
+	struct intel_excl_cntrs *c;
 
 	pc = cpuc->shared_regs;
 	if (pc) {
@@ -2033,6 +2080,12 @@ static void intel_pmu_cpu_dying(int cpu)
 			kfree(pc);
 		cpuc->shared_regs = NULL;
 	}
+	c = cpuc->excl_cntrs;
+	if (c) {
+		if (c->core_id == -1 || --c->refcnt == 0)
+			kfree(c);
+		cpuc->excl_cntrs = NULL;
+	}
 
 	fini_debug_store_on_cpu(cpu);
 }
-- 
1.7.9.5



* [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround
  2014-06-04 21:34 [PATCH 0/9] perf/x86: implement HT counter corruption workaround Stephane Eranian
                   ` (3 preceding siblings ...)
  2014-06-04 21:34 ` [PATCH 4/9] perf/x86: add cross-HT counter exclusion infrastructure Stephane Eranian
@ 2014-06-04 21:34 ` Stephane Eranian
  2014-06-05 13:38   ` Peter Zijlstra
                     ` (3 more replies)
  2014-06-04 21:34 ` [PATCH 6/9] perf/x86: enforce HT bug workaround for SNB/IVB/HSW Stephane Eranian
                   ` (4 subsequent siblings)
  9 siblings, 4 replies; 59+ messages in thread
From: Stephane Eranian @ 2014-06-04 21:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, jolsa, zheng.z.yan, maria.n.dimakopoulou

From: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>

This patch implements a software workaround for a HW erratum
on Intel SandyBridge, IvyBridge and Haswell processors
with Hyperthreading enabled. The errata are documented for
each processor in their respective specification update
documents:
 - SandyBridge: BJ122
 - IvyBridge: BV98
 - Haswell: HSD29

The bug causes silent counter corruption across hyperthreads only
when measuring certain memory events (0xd0, 0xd1, 0xd2, 0xd3).
Counters measuring those events may leak counts to the sibling
counter. For instance, counter 0, thread 0 measuring event 0xd0,
may leak to counter 0, thread 1, regardless of the event measured
there. The size of the leak is not predictable. It all depends on
the workload and the state of each sibling hyper-thread. The
corrupting events do undercount as a consequence of the leak. The
leak is compensated automatically only when the sibling counter measures
the exact same corrupting event AND the workload on the two threads
is the same. Given that there is no way to guarantee this, a workaround
is necessary. Furthermore, there is a serious problem if the leaked count
is added to a low-occurrence event. In that case the corruption on
the low occurrence event can be very large, e.g., orders of magnitude.

There is no HW or FW workaround for this problem.

The bug is very easy to reproduce on a loaded system.
Here is an example on a Haswell client, where CPU0 and CPU4
are siblings. We load the CPUs with a simple triad app
streaming a large floating-point vector. We use the corrupting
event 0x81d0 (MEM_UOPS_RETIRED:ALL_LOADS) and
0x20cc (ROB_MISC_EVENTS:LBR_INSERTS). Since we are not
using the LBR, the 0x20cc event should count zero.

 $ taskset -c 0 triad &
 $ taskset -c 4 triad &
 $ perf stat -a -C 0 -e r81d0 sleep 100 &
 $ perf stat -a -C 4 -e r20cc sleep 10
 Performance counter stats for 'system wide':
       139 277 291      r20cc
      10,000969126 seconds time elapsed

In this example, 0x81d0 and r20cc are using sibling counters
on CPU0 and CPU4. 0x81d0 leaks into 0x20cc and corrupts it
from 0 to 139 million occurrences.

This patch provides a software workaround to this problem by modifying the
way events are scheduled onto counters by the kernel. The patch forces
cross-thread mutual exclusion between counters in case a corrupting event
is measured by one of the hyper-threads. If thread 0, counter 0 is measuring
event 0xd0, then nothing can be measured on counter 0, thread 1. If no corrupting
event is measured on any hyper-thread, event scheduling proceeds as before.

The same example run with the workaround enabled yields the correct answer:
 $ taskset -c 0 triad &
 $ taskset -c 4 triad &
 $ perf stat -a -C 0 -e r81d0 sleep 100 &
 $ perf stat -a -C 4 -e r20cc sleep 10
 Performance counter stats for 'system wide':
       0 r20cc
      10,000969126 seconds time elapsed

The patch does provide correctness for all non-corrupting events. It does not
"repatriate" the leaked counts back to the leaking counter. This is planned
for a second patch series. This patch series, however, makes this repatriation
easier by guaranteeing that the sibling counter is not measuring any useful event.

The patch introduces dynamic constraints for events. That means that events which
did not have constraints, i.e., could be measured on any counters, may now be
constrained to a subset of the counters, depending on what is being measured on the sibling
thread. The algorithm is similar to a cache coherency protocol. We call it XSU
in reference to Exclusive, Shared, Unused, the 3 possible states of a PMU
counter.
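
In terms of the XSU states, the sibling's view of a counter is updated at two
points in this patch (simplified; xlo denotes the sibling thread's state):

	/* commit_scheduling: an event got assigned to counter cntr */
	xlo->init_state[cntr] = is_excl ? INTEL_EXCL_EXCLUSIVE
					: INTEL_EXCL_SHARED;

	/* put_event_constraints: the event is released from counter idx */
	xlo->state[idx] = INTEL_EXCL_UNUSED;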

As a consequence of the workaround, users may see an increased amount of event
multiplexing, even in situations where fewer events are measured on a CPU than
there are counters.

The patch has been tested on all three impacted processors. Note that when
HT is off, there is no corruption. However, the workaround is still enabled,
though it does not cost much. Adding dynamic detection of whether HT is on
turned out to be too complex and to require too much code to be justified.

This patch addresses the issue when PEBS is not used. A subsequent patch
fixes the problem when PEBS is used.

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
---
 arch/x86/kernel/cpu/perf_event.c       |   31 ++--
 arch/x86/kernel/cpu/perf_event.h       |    6 +
 arch/x86/kernel/cpu/perf_event_intel.c |  311 +++++++++++++++++++++++++++++++-
 3 files changed, 335 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 858a72a..314458a 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -728,7 +728,7 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 	struct event_constraint *c;
 	unsigned long used_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 	struct perf_event *e;
-	int i, wmin, wmax, num = 0;
+	int i, wmin, wmax, unsched = 0;
 	struct hw_perf_event *hwc;
 
 	bitmap_zero(used_mask, X86_PMC_IDX_MAX);
@@ -771,14 +771,20 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 
 	/* slow path */
 	if (i != n)
-		num = perf_assign_events(cpuc->event_list, n, wmin,
-					 wmax, assign);
+		unsched = perf_assign_events(cpuc->event_list, n, wmin,
+					     wmax, assign);
 
 	/*
-	 * Mark the event as committed, so we do not put_constraint()
-	 * in case new events are added and fail scheduling.
+	 * In case of success (unsched = 0), mark events as committed,
+	 * so we do not put_constraint() in case new events are added
+	 * and fail to be scheduled
+	 *
+	 * We invoke the lower level commit callback to lock the resource
+	 *
+	 * We do not need to do all of this in case we are called to
+	 * validate an event group (assign == NULL)
 	 */
-	if (!num && assign) {
+	if (!unsched && assign) {
 		for (i = 0; i < n; i++) {
 			e = cpuc->event_list[i];
 			e->hw.flags |= PERF_X86_EVENT_COMMITTED;
@@ -786,11 +792,9 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 				x86_pmu.commit_scheduling(cpuc, e, assign[i]);
 		}
 	}
-	/*
-	 * scheduling failed or is just a simulation,
-	 * free resources if necessary
-	 */
-	if (!assign || num) {
+
+	if (!assign || unsched) {
+
 		for (i = 0; i < n; i++) {
 			e = cpuc->event_list[i];
 			/*
@@ -800,6 +804,9 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 			if ((e->hw.flags & PERF_X86_EVENT_COMMITTED))
 				continue;
 
+			/*
+			 * release events that failed scheduling
+			 */
 			if (x86_pmu.put_event_constraints)
 				x86_pmu.put_event_constraints(cpuc, e);
 		}
@@ -808,7 +815,7 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 	if (x86_pmu.stop_scheduling)
 		x86_pmu.stop_scheduling(cpuc);
 
-	return num ? -EINVAL : 0;
+	return unsched ? -EINVAL : 0;
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 5da0a2b..c61ca4a 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -70,6 +70,7 @@ struct event_constraint {
 #define PERF_X86_EVENT_PEBS_ST_HSW	0x04 /* haswell style st data sampling */
 #define PERF_X86_EVENT_COMMITTED	0x08 /* event passed commit_txn */
 #define PERF_X86_EVENT_EXCL		0x10 /* HT exclusivity on counter */
+#define PERF_X86_EVENT_DYNAMIC		0x20 /* dynamic alloc'd constraint */
 
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
@@ -129,6 +130,7 @@ enum intel_excl_state_type {
 struct intel_excl_states {
 	enum intel_excl_state_type init_state[X86_PMC_IDX_MAX];
 	enum intel_excl_state_type state[X86_PMC_IDX_MAX];
+	bool sched_started; /* true if scheduling has started */
 };
 
 struct intel_excl_cntrs {
@@ -288,6 +290,10 @@ struct cpu_hw_events {
 #define INTEL_UEVENT_CONSTRAINT(c, n)	\
 	EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK)
 
+#define INTEL_EXCLUEVT_CONSTRAINT(c, n)	\
+	__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
+			   HWEIGHT(n), 0, PERF_X86_EVENT_EXCL)
+
 #define INTEL_PLD_CONSTRAINT(c, n)	\
 	__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
 			   HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LDLAT)
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 380fce2..3af9745 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1632,7 +1632,7 @@ x86_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 }
 
 static struct event_constraint *
-intel_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
+__intel_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 {
 	struct event_constraint *c;
 
@@ -1652,6 +1652,256 @@ intel_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event
 }
 
 static void
+intel_start_scheduling(struct cpu_hw_events *cpuc)
+{
+	struct intel_excl_cntrs *excl_cntrs = cpuc->excl_cntrs;
+	struct intel_excl_states *xl, *xlo;
+	int tid = cpuc->excl_thread_id;
+	int o_tid = 1 - tid; /* sibling thread */
+
+	/*
+	 * nothing needed if in group validation mode
+	 */
+	if (cpuc->is_fake)
+		return;
+	/*
+	 * no exclusion needed
+	 */
+	if (!excl_cntrs)
+		return;
+
+	xlo = &excl_cntrs->states[o_tid];
+	xl = &excl_cntrs->states[tid];
+
+	xl->sched_started = true;
+
+	/*
+	 * lock shared state until we are done scheduling
+	 * in stop_event_scheduling()
+	 * makes scheduling appear as a transaction
+	 */
+	spin_lock_irqsave(&excl_cntrs->lock, excl_cntrs->lock_flags);
+
+	/*
+	 * save initial state of sibling thread
+	 */
+	memcpy(xlo->init_state, xlo->state, sizeof(xlo->init_state));
+}
+
+static void
+intel_stop_scheduling(struct cpu_hw_events *cpuc)
+{
+	struct intel_excl_cntrs *excl_cntrs = cpuc->excl_cntrs;
+	struct intel_excl_states *xl, *xlo;
+	int tid = cpuc->excl_thread_id;
+	int o_tid = 1 - tid; /* sibling thread */
+	int i;
+
+	/*
+	 * nothing needed if in group validation mode
+	 */
+	if (cpuc->is_fake)
+		return;
+	/*
+	 * no exclusion needed
+	 */
+	if (!excl_cntrs)
+		return;
+
+	xlo = &excl_cntrs->states[o_tid];
+	xl = &excl_cntrs->states[tid];
+
+	/*
+	 * make new sibling thread state visible
+	 */
+	memcpy(xlo->state, xlo->init_state, sizeof(xlo->state));
+
+	xl->sched_started = false;
+	/*
+	 * release shared state lock (lock acquire in intel_start_scheduling())
+	 */
+	spin_unlock_irqrestore(&excl_cntrs->lock, excl_cntrs->lock_flags);
+}
+
+static struct event_constraint *
+intel_get_excl_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
+			   struct event_constraint *c)
+{
+	struct event_constraint *cx;
+	struct intel_excl_cntrs *excl_cntrs = cpuc->excl_cntrs;
+	struct intel_excl_states *xl, *xlo;
+	int is_excl, i;
+	int tid = cpuc->excl_thread_id;
+	int o_tid = 1 - tid; /* alternate */
+
+	/*
+	 * validating a group does not require
+	 * enforcing cross-thread  exclusion
+	 */
+	if (cpuc->is_fake)
+		return c;
+
+	/*
+	 * event requires exclusive counter access
+	 * across HT threads
+	 */
+	is_excl = c->flags & PERF_X86_EVENT_EXCL;
+
+	/*
+	 * xl = state of current HT
+	 * xlo = state of sibling HT
+	 */
+	xl = &excl_cntrs->states[tid];
+	xlo = &excl_cntrs->states[o_tid];
+
+	cx = c;
+
+	/*
+	 * because we modify the constraint, we need
+	 * to make a copy. Static constraints come
+	 * from static const tables.
+	 *
+	 * only needed when constraint has not yet
+	 * been cloned (marked dynamic)
+	 */
+	if (!(c->flags & PERF_X86_EVENT_DYNAMIC)) {
+
+		/*
+		 * in case we fail, we assume no counter
+		 * is supported to be on the safe side
+		 */
+		cx = kmalloc(sizeof(*cx), GFP_KERNEL);
+		if (!cx)
+			return &emptyconstraint;
+
+		/*
+		 * initialize dynamic constraint
+		 * with static constraint
+		 */
+		memcpy(cx, c, sizeof(*cx));
+
+		/*
+		 * mark constraint as dynamic, so we
+		 * can free it later on
+		 */
+		cx->flags |= PERF_X86_EVENT_DYNAMIC;
+	}
+
+	/*
+	 * From here on, the constraint is dynamic.
+	 * Either it was just allocated above, or it
+	 * was allocated during a earlier invocation
+	 * of this function
+	 */
+
+	/*
+	 * Modify static constraint with current dynamic
+	 * state of thread
+	 *
+	 * EXCLUSIVE: sibling counter measuring exclusive event
+	 * SHARED   : sibling counter measuring non-exclusive event
+	 * UNUSED   : sibling counter unused
+	 */
+	for_each_set_bit(i, cx->idxmsk, X86_PMC_IDX_MAX) {
+		/*
+		 * exclusive event in sibling counter
+		 * our corresponding counter cannot be used
+		 * regardless of our event
+		 */
+		if (xl->state[i] == INTEL_EXCL_EXCLUSIVE)
+			__clear_bit(i, cx->idxmsk);
+		/*
+		 * if measuring an exclusive event, sibling
+		 * measuring non-exclusive, then counter cannot
+		 * be used
+		 */
+		if (is_excl && xl->state[i] == INTEL_EXCL_SHARED)
+			__clear_bit(i, cx->idxmsk);
+	}
+
+	/*
+	 * recompute actual bit weight for scheduling algorithm
+	 */
+	cx->weight = hweight64(cx->idxmsk64);
+
+	/*
+	 * if we return an empty mask, then switch
+	 * back to static empty constraint to avoid
+	 * the cost of freeing later on
+	 */
+	if (cx->weight == 0) {
+		kfree(cx);
+		cx = &emptyconstraint;
+	}
+
+	return cx;
+}
+
+static struct event_constraint *
+intel_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+	struct event_constraint *c = event->hw.constraint;
+
+	/*
+	 * first time only
+	 * - static constraint: no change across incremental scheduling calls
+	 * - dynamic constraint: handled by intel_get_excl_constraints()
+	 */
+	if (!c)
+		c = __intel_get_event_constraints(cpuc, event);
+
+	if (cpuc->excl_cntrs)
+		return intel_get_excl_constraints(cpuc, event, c);
+
+	return c;
+}
+
+static void intel_put_excl_constraints(struct cpu_hw_events *cpuc,
+		struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	struct intel_excl_cntrs *excl_cntrs = cpuc->excl_cntrs;
+	struct intel_excl_states *xlo, *xl;
+	unsigned long flags = 0; /* keep compiler happy */
+	int tid = cpuc->excl_thread_id;
+	int o_tid = 1 - tid;
+	int i;
+
+	/*
+	 * nothing needed if in group validation mode
+	 */
+	if (cpuc->is_fake)
+		return;
+
+	WARN_ON_ONCE(!excl_cntrs);
+
+	if (!excl_cntrs)
+		return;
+
+	xl = &excl_cntrs->states[tid];
+	xlo = &excl_cntrs->states[o_tid];
+
+	/*
+	 * put_constraint may be called from x86_schedule_events()
+	 * which already has the lock held so here make locking
+	 * conditional
+	 */
+	if (!xl->sched_started)
+		spin_lock_irqsave(&excl_cntrs->lock, flags);
+
+	/*
+	 * if event was actually assigned, then mark the
+	 * counter state as unused now
+	 */
+	if (hwc->idx >= 0)
+		xlo->state[hwc->idx] = INTEL_EXCL_UNUSED;
+
+	if (!xl->sched_started)
+		spin_unlock_irqrestore(&excl_cntrs->lock, flags);
+
+}
+
+static void
 intel_put_shared_regs_event_constraints(struct cpu_hw_events *cpuc,
 					struct perf_event *event)
 {
@@ -1669,7 +1919,59 @@ intel_put_shared_regs_event_constraints(struct cpu_hw_events *cpuc,
 static void intel_put_event_constraints(struct cpu_hw_events *cpuc,
 					struct perf_event *event)
 {
+	struct event_constraint *c = event->hw.constraint;
+
 	intel_put_shared_regs_event_constraints(cpuc, event);
+
+	/*
+	 * if the PMU has exclusive counter restrictions, then
+	 * all events are subject to and must call the
+	 * put_excl_constraints() routine
+	 */
+	if (c && cpuc->excl_cntrs)
+		intel_put_excl_constraints(cpuc, event);
+
+	/* free dynamic constraint */
+	if (c && (c->flags & PERF_X86_EVENT_DYNAMIC)) {
+		kfree(event->hw.constraint);
+		event->hw.constraint = NULL;
+	}
+}
+
+static void intel_commit_scheduling(struct cpu_hw_events *cpuc,
+				    struct perf_event *event, int cntr)
+{
+	struct intel_excl_cntrs *excl_cntrs = cpuc->excl_cntrs;
+	struct event_constraint *c = event->hw.constraint;
+	struct intel_excl_states *xlo, *xl;
+	int tid = cpuc->excl_thread_id;
+	int o_tid = 1 - tid;
+	int is_excl;
+
+	if (cpuc->is_fake || !c)
+		return;
+
+	is_excl = c->flags & PERF_X86_EVENT_EXCL;
+
+	if (!(c->flags & PERF_X86_EVENT_DYNAMIC))
+		return;
+
+	WARN_ON_ONCE(!excl_cntrs);
+
+	if (!excl_cntrs)
+		return;
+
+	xl = &excl_cntrs->states[tid];
+	xlo = &excl_cntrs->states[o_tid];
+
+	WARN_ON_ONCE(!spin_is_locked(&excl_cntrs->lock));
+
+	if (cntr >= 0) {
+		if (is_excl)
+			xlo->init_state[cntr] = INTEL_EXCL_EXCLUSIVE;
+		else
+			xlo->init_state[cntr] = INTEL_EXCL_SHARED;
+	}
 }
 
 static void intel_pebs_aliases_core2(struct perf_event *event)
@@ -2087,6 +2389,13 @@ static void intel_pmu_cpu_dying(int cpu)
 		cpuc->excl_cntrs = NULL;
 	}
 
+	c = cpuc->excl_cntrs;
+	if (c) {
+		if (c->core_id == -1 || --c->refcnt == 0)
+			kfree(c);
+		cpuc->excl_cntrs = NULL;
+	}
+
 	fini_debug_store_on_cpu(cpu);
 }
 
-- 
1.7.9.5



* [PATCH 6/9] perf/x86: enforce HT bug workaround for SNB/IVB/HSW
  2014-06-04 21:34 [PATCH 0/9] perf/x86: implement HT counter corruption workaround Stephane Eranian
                   ` (4 preceding siblings ...)
  2014-06-04 21:34 ` [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround Stephane Eranian
@ 2014-06-04 21:34 ` Stephane Eranian
  2014-06-04 21:34 ` [PATCH 7/9] perf/x86: enforce HT bug workaround with PEBS " Stephane Eranian
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 59+ messages in thread
From: Stephane Eranian @ 2014-06-04 21:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, jolsa, zheng.z.yan, maria.n.dimakopoulou

From: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>

This patch activates the HT bug workaround for the
SNB/IVB/HSW processors. This covers the non-PEBS mode.
Activation is done through the constraint tables.

Both client and server processors need this workaround.
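
Concretely, the memory events that were previously unconstrained (or, on IVB,
fully disabled) now get entries such as:

	INTEL_EXCLEVT_CONSTRAINT(0xd0, 0xf), /* MEM_UOPS_RETIRED.* */

which keep the full 0xf counter mask but tag the events with
PERF_X86_EVENT_EXCL, so the cross-HT scheduling code from the previous
patches kicks in.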

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c |   53 ++++++++++++++++++++++++++------
 1 file changed, 44 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 3af9745..b1a2684 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -113,6 +113,12 @@ static struct event_constraint intel_snb_event_constraints[] __read_mostly =
 	INTEL_EVENT_CONSTRAINT(0xcd, 0x8), /* MEM_TRANS_RETIRED.LOAD_LATENCY */
 	INTEL_UEVENT_CONSTRAINT(0x04a3, 0xf), /* CYCLE_ACTIVITY.CYCLES_NO_DISPATCH */
 	INTEL_UEVENT_CONSTRAINT(0x02a3, 0x4), /* CYCLE_ACTIVITY.CYCLES_L1D_PENDING */
+
+	INTEL_EXCLEVT_CONSTRAINT(0xd0, 0xf), /* MEM_UOPS_RETIRED.* */
+	INTEL_EXCLEVT_CONSTRAINT(0xd1, 0xf), /* MEM_LOAD_UOPS_RETIRED.* */
+	INTEL_EXCLEVT_CONSTRAINT(0xd2, 0xf), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
+	INTEL_EXCLEVT_CONSTRAINT(0xd3, 0xf), /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
+
 	EVENT_CONSTRAINT_END
 };
 
@@ -131,15 +137,12 @@ static struct event_constraint intel_ivb_event_constraints[] __read_mostly =
 	INTEL_UEVENT_CONSTRAINT(0x08a3, 0x4), /* CYCLE_ACTIVITY.CYCLES_L1D_PENDING */
 	INTEL_UEVENT_CONSTRAINT(0x0ca3, 0x4), /* CYCLE_ACTIVITY.STALLS_L1D_PENDING */
 	INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PREC_DIST */
-	/*
-	 * Errata BV98 -- MEM_*_RETIRED events can leak between counters of SMT
-	 * siblings; disable these events because they can corrupt unrelated
-	 * counters.
-	 */
-	INTEL_EVENT_CONSTRAINT(0xd0, 0x0), /* MEM_UOPS_RETIRED.* */
-	INTEL_EVENT_CONSTRAINT(0xd1, 0x0), /* MEM_LOAD_UOPS_RETIRED.* */
-	INTEL_EVENT_CONSTRAINT(0xd2, 0x0), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
-	INTEL_EVENT_CONSTRAINT(0xd3, 0x0), /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
+
+	INTEL_EXCLEVT_CONSTRAINT(0xd0, 0xf), /* MEM_UOPS_RETIRED.* */
+	INTEL_EXCLEVT_CONSTRAINT(0xd1, 0xf), /* MEM_LOAD_UOPS_RETIRED.* */
+	INTEL_EXCLEVT_CONSTRAINT(0xd2, 0xf), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
+	INTEL_EXCLEVT_CONSTRAINT(0xd3, 0xf), /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
+
 	EVENT_CONSTRAINT_END
 };
 
@@ -217,6 +220,12 @@ static struct event_constraint intel_hsw_event_constraints[] = {
 	INTEL_EVENT_CONSTRAINT(0x0ca3, 0x4),
 	/* CYCLE_ACTIVITY.CYCLES_NO_EXECUTE */
 	INTEL_EVENT_CONSTRAINT(0x04a3, 0xf),
+
+	INTEL_EXCLEVT_CONSTRAINT(0xd0, 0xf), /* MEM_UOPS_RETIRED.* */
+	INTEL_EXCLEVT_CONSTRAINT(0xd1, 0xf), /* MEM_LOAD_UOPS_RETIRED.* */
+	INTEL_EXCLEVT_CONSTRAINT(0xd2, 0xf), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
+	INTEL_EXCLEVT_CONSTRAINT(0xd3, 0xf), /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
+
 	EVENT_CONSTRAINT_END
 };
 
@@ -2584,6 +2593,27 @@ static __init void intel_nehalem_quirk(void)
 	}
 }
 
+/*
+ * enable software workaround for errata:
+ * SNB: BJ122
+ * IVB: BV98
+ * HSW: HSD29
+ *
+ * Only needed when HT is enabled. However detecting
+ * this is too difficult and model specific so we enable
+ * it even with HT off for now.
+ */
+static __init void intel_ht_bug(void)
+{
+	x86_pmu.flags |= PMU_FL_EXCL_CNTRS;
+
+	x86_pmu.commit_scheduling = intel_commit_scheduling;
+	x86_pmu.start_scheduling = intel_start_scheduling;
+	x86_pmu.stop_scheduling = intel_stop_scheduling;
+
+	pr_info("CPU erratum BJ122, BV98, HSD29 worked around\n");
+}
+
 EVENT_ATTR_STR(mem-loads,	mem_ld_hsw,	"event=0xcd,umask=0x1,ldlat=3");
 EVENT_ATTR_STR(mem-stores,	mem_st_hsw,	"event=0xd0,umask=0x82")
 
@@ -2796,6 +2826,7 @@ __init int intel_pmu_init(void)
 	case 42: /* SandyBridge */
 	case 45: /* SandyBridge, "Romely-EP" */
 		x86_add_quirk(intel_sandybridge_quirk);
+		x86_add_quirk(intel_ht_bug);
 		memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
 		       sizeof(hw_cache_event_ids));
 		memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs,
@@ -2810,6 +2841,8 @@ __init int intel_pmu_init(void)
 			x86_pmu.extra_regs = intel_snbep_extra_regs;
 		else
 			x86_pmu.extra_regs = intel_snb_extra_regs;
+
+
 		/* all extra regs are per-cpu when HT is on */
 		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
 		x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
@@ -2827,6 +2860,7 @@ __init int intel_pmu_init(void)
 		break;
 	case 58: /* IvyBridge */
 	case 62: /* IvyBridge EP */
+		x86_add_quirk(intel_ht_bug);
 		memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
 		       sizeof(hw_cache_event_ids));
 		memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs,
@@ -2860,6 +2894,7 @@ __init int intel_pmu_init(void)
 	case 71:
 	case 63:
 	case 69:
+		x86_add_quirk(intel_ht_bug);
 		x86_pmu.late_ack = true;
 		memcpy(hw_cache_event_ids, snb_hw_cache_event_ids, sizeof(hw_cache_event_ids));
 		memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));
-- 
1.7.9.5



* [PATCH 7/9] perf/x86: enforce HT bug workaround with PEBS for SNB/IVB/HSW
  2014-06-04 21:34 [PATCH 0/9] perf/x86: implement HT counter corruption workaround Stephane Eranian
                   ` (5 preceding siblings ...)
  2014-06-04 21:34 ` [PATCH 6/9] perf/x86: enforce HT bug workaround for SNB/IVB/HSW Stephane Eranian
@ 2014-06-04 21:34 ` Stephane Eranian
  2014-06-04 21:34 ` [PATCH 8/9] perf/x86: fix intel_get_event_constraints() for dynamic constraints Stephane Eranian
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 59+ messages in thread
From: Stephane Eranian @ 2014-06-04 21:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, jolsa, zheng.z.yan, maria.n.dimakopoulou

From: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>

This patch modifies the PEBS constraint tables for SNB/IVB/HSW
such that corrupting events supporting PEBS activate the HT
workaround.

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
---
 arch/x86/kernel/cpu/perf_event_intel_ds.c |   40 ++++++++++++++---------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 980970c..b4c6ca5 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -630,10 +630,10 @@ struct event_constraint intel_snb_pebs_event_constraints[] = {
 	INTEL_EVENT_CONSTRAINT(0xc5, 0xf),    /* BR_MISP_RETIRED.* */
 	INTEL_PLD_CONSTRAINT(0x01cd, 0x8),    /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
 	INTEL_PST_CONSTRAINT(0x02cd, 0x8),    /* MEM_TRANS_RETIRED.PRECISE_STORES */
-	INTEL_EVENT_CONSTRAINT(0xd0, 0xf),    /* MEM_UOP_RETIRED.* */
-	INTEL_EVENT_CONSTRAINT(0xd1, 0xf),    /* MEM_LOAD_UOPS_RETIRED.* */
-	INTEL_EVENT_CONSTRAINT(0xd2, 0xf),    /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
-	INTEL_EVENT_CONSTRAINT(0xd3, 0xf),    /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
+	INTEL_EXCLEVT_CONSTRAINT(0xd0, 0xf),    /* MEM_UOP_RETIRED.* */
+	INTEL_EXCLEVT_CONSTRAINT(0xd1, 0xf),    /* MEM_LOAD_UOPS_RETIRED.* */
+	INTEL_EXCLEVT_CONSTRAINT(0xd2, 0xf),    /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
+	INTEL_EXCLEVT_CONSTRAINT(0xd3, 0xf),    /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
 	INTEL_UEVENT_CONSTRAINT(0x02d4, 0xf), /* MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS */
 	EVENT_CONSTRAINT_END
 };
@@ -646,10 +646,10 @@ struct event_constraint intel_ivb_pebs_event_constraints[] = {
         INTEL_EVENT_CONSTRAINT(0xc5, 0xf),    /* BR_MISP_RETIRED.* */
         INTEL_PLD_CONSTRAINT(0x01cd, 0x8),    /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
 	INTEL_PST_CONSTRAINT(0x02cd, 0x8),    /* MEM_TRANS_RETIRED.PRECISE_STORES */
-        INTEL_EVENT_CONSTRAINT(0xd0, 0xf),    /* MEM_UOP_RETIRED.* */
-        INTEL_EVENT_CONSTRAINT(0xd1, 0xf),    /* MEM_LOAD_UOPS_RETIRED.* */
-        INTEL_EVENT_CONSTRAINT(0xd2, 0xf),    /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
-        INTEL_EVENT_CONSTRAINT(0xd3, 0xf),    /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
+        INTEL_EXCLEVT_CONSTRAINT(0xd0, 0xf),    /* MEM_UOP_RETIRED.* */
+        INTEL_EXCLEVT_CONSTRAINT(0xd1, 0xf),    /* MEM_LOAD_UOPS_RETIRED.* */
+        INTEL_EXCLEVT_CONSTRAINT(0xd2, 0xf),    /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
+        INTEL_EXCLEVT_CONSTRAINT(0xd3, 0xf),    /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
         EVENT_CONSTRAINT_END
 };
 
@@ -665,24 +665,24 @@ struct event_constraint intel_hsw_pebs_event_constraints[] = {
 	/* MEM_UOPS_RETIRED.STLB_MISS_LOADS */
 	INTEL_UEVENT_CONSTRAINT(0x11d0, 0xf),
 	/* MEM_UOPS_RETIRED.STLB_MISS_STORES */
-	INTEL_UEVENT_CONSTRAINT(0x12d0, 0xf),
-	INTEL_UEVENT_CONSTRAINT(0x21d0, 0xf), /* MEM_UOPS_RETIRED.LOCK_LOADS */
-	INTEL_UEVENT_CONSTRAINT(0x41d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_LOADS */
+	INTEL_EXCLUEVT_CONSTRAINT(0x12d0, 0xf),
+	INTEL_EXCLUEVT_CONSTRAINT(0x21d0, 0xf), /* MEM_UOPS_RETIRED.LOCK_LOADS */
+	INTEL_EXCLUEVT_CONSTRAINT(0x41d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_LOADS */
 	/* MEM_UOPS_RETIRED.SPLIT_STORES */
-	INTEL_UEVENT_CONSTRAINT(0x42d0, 0xf),
-	INTEL_UEVENT_CONSTRAINT(0x81d0, 0xf), /* MEM_UOPS_RETIRED.ALL_LOADS */
+	INTEL_EXCLUEVT_CONSTRAINT(0x42d0, 0xf),
+	INTEL_EXCLUEVT_CONSTRAINT(0x81d0, 0xf), /* MEM_UOPS_RETIRED.ALL_LOADS */
 	INTEL_PST_HSW_CONSTRAINT(0x82d0, 0xf), /* MEM_UOPS_RETIRED.ALL_STORES */
-	INTEL_UEVENT_CONSTRAINT(0x01d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L1_HIT */
-	INTEL_UEVENT_CONSTRAINT(0x02d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L2_HIT */
-	INTEL_UEVENT_CONSTRAINT(0x04d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L3_HIT */
+	INTEL_EXCLUEVT_CONSTRAINT(0x01d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L1_HIT */
+	INTEL_EXCLUEVT_CONSTRAINT(0x02d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L2_HIT */
+	INTEL_EXCLUEVT_CONSTRAINT(0x04d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L3_HIT */
 	/* MEM_LOAD_UOPS_RETIRED.HIT_LFB */
-	INTEL_UEVENT_CONSTRAINT(0x40d1, 0xf),
+	INTEL_EXCLUEVT_CONSTRAINT(0x40d1, 0xf),
 	/* MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_MISS */
-	INTEL_UEVENT_CONSTRAINT(0x01d2, 0xf),
+	INTEL_EXCLUEVT_CONSTRAINT(0x01d2, 0xf),
 	/* MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT */
-	INTEL_UEVENT_CONSTRAINT(0x02d2, 0xf),
+	INTEL_EXCLUEVT_CONSTRAINT(0x02d2, 0xf),
 	/* MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM */
-	INTEL_UEVENT_CONSTRAINT(0x01d3, 0xf),
+	INTEL_EXCLUEVT_CONSTRAINT(0x01d3, 0xf),
 	INTEL_UEVENT_CONSTRAINT(0x04c8, 0xf), /* HLE_RETIRED.Abort */
 	INTEL_UEVENT_CONSTRAINT(0x04c9, 0xf), /* RTM_RETIRED.Abort */
 
-- 
1.7.9.5



* [PATCH 8/9] perf/x86: fix intel_get_event_constraints() for dynamic constraints
  2014-06-04 21:34 [PATCH 0/9] perf/x86: implement HT counter corruption workaround Stephane Eranian
                   ` (6 preceding siblings ...)
  2014-06-04 21:34 ` [PATCH 7/9] perf/x86: enforce HT bug workaround with PEBS " Stephane Eranian
@ 2014-06-04 21:34 ` Stephane Eranian
  2014-06-04 21:34 ` [PATCH 9/9] perf/x86: add sysfs entry to disable HT bug workaround Stephane Eranian
  2014-06-04 22:28 ` [PATCH 0/9] perf/x86: implement HT counter corruption workaround Andi Kleen
  9 siblings, 0 replies; 59+ messages in thread
From: Stephane Eranian @ 2014-06-04 21:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, jolsa, zheng.z.yan, maria.n.dimakopoulou

With dynamic constraints, we need to restart from the static
constraints each time intel_get_event_constraints() is called.

Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index b1a2684..4fb4fe6 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1849,20 +1849,25 @@ intel_get_excl_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
 static struct event_constraint *
 intel_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 {
-	struct event_constraint *c = event->hw.constraint;
+	struct event_constraint *c1 = event->hw.constraint;
+	struct event_constraint *c2;
 
 	/*
 	 * first time only
 	 * - static constraint: no change across incremental scheduling calls
 	 * - dynamic constraint: handled by intel_get_excl_constraints()
 	 */
-	if (!c)
-		c = __intel_get_event_constraints(cpuc, event);
+	c2 = __intel_get_event_constraints(cpuc, event);
+	if (c1 && (c1->flags & PERF_X86_EVENT_DYNAMIC)) {
+		bitmap_copy(c1->idxmsk, c2->idxmsk, X86_PMC_IDX_MAX);
+		c1->weight = c2->weight;
+		c2 = c1;
+	}
 
 	if (cpuc->excl_cntrs)
-		return intel_get_excl_constraints(cpuc, event, c);
+		return intel_get_excl_constraints(cpuc, event, c2);
 
-	return c;
+	return c2;
 }
 
 static void intel_put_excl_constraints(struct cpu_hw_events *cpuc,
-- 
1.7.9.5



* [PATCH 9/9] perf/x86: add sysfs entry to disable HT bug workaround
  2014-06-04 21:34 [PATCH 0/9] perf/x86: implement HT counter corruption workaround Stephane Eranian
                   ` (7 preceding siblings ...)
  2014-06-04 21:34 ` [PATCH 8/9] perf/x86: fix intel_get_event_constraints() for dynamic constraints Stephane Eranian
@ 2014-06-04 21:34 ` Stephane Eranian
  2014-06-05  8:32   ` Matt Fleming
  2014-06-04 22:28 ` [PATCH 0/9] perf/x86: implement HT counter corruption workaround Andi Kleen
  9 siblings, 1 reply; 59+ messages in thread
From: Stephane Eranian @ 2014-06-04 21:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, jolsa, zheng.z.yan, maria.n.dimakopoulou

From: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>

This patch adds a sysfs entry:

	/sys/devices/cpu/ht_bug_workaround

to activate/deactivate the PMU HT bug workaround.

To activate (activated by default):
 # echo 1 > /sys/devices/cpu/ht_bug_workaround

To deactivate:
 # echo 0 > /sys/devices/cpu/ht_bug_workaround

The change takes effect only once there are no more
active events.

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
---
 arch/x86/kernel/cpu/perf_event.c       |   31 +++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/perf_event.h       |    5 +++++
 arch/x86/kernel/cpu/perf_event_intel.c |    4 ++--
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 314458a..fdea88e 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1876,10 +1876,41 @@ static ssize_t set_attr_rdpmc(struct device *cdev,
 	return count;
 }
 
+static ssize_t get_attr_xsu(struct device *cdev,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	int ff = is_ht_workaround_enabled();
+	return snprintf(buf, 40, "%d\n", ff);
+}
+
+static ssize_t set_attr_xsu(struct device *cdev,
+			      struct device_attribute *attr,
+			      const char *buf, size_t count)
+{
+	unsigned long val;
+	int ff = is_ht_workaround_enabled();
+	ssize_t ret;
+
+	ret = kstrtoul(buf, 0, &val);
+	if (ret)
+		return ret;
+
+	if (!!val != ff) {
+		if (!val)
+			x86_pmu.flags &= ~PMU_FL_EXCL_ENABLED;
+		else
+			x86_pmu.flags |= PMU_FL_EXCL_ENABLED;
+	}
+	return count;
+}
+
 static DEVICE_ATTR(rdpmc, S_IRUSR | S_IWUSR, get_attr_rdpmc, set_attr_rdpmc);
+static DEVICE_ATTR(ht_bug_workaround, S_IRUSR | S_IWUSR, get_attr_xsu, set_attr_xsu);
 
 static struct attribute *x86_pmu_attrs[] = {
 	&dev_attr_rdpmc.attr,
+	&dev_attr_ht_bug_workaround.attr,
 	NULL,
 };
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index c61ca4a..2e7c6a7 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -537,6 +537,7 @@ do {									\
 #define PMU_FL_NO_HT_SHARING	0x1 /* no hyper-threading resource sharing */
 #define PMU_FL_HAS_RSP_1	0x2 /* has 2 equivalent offcore_rsp regs   */
 #define PMU_FL_EXCL_CNTRS	0x4 /* has exclusive counter requirements  */
+#define PMU_FL_EXCL_ENABLED	0x8 /* exclusive counter active */
 
 #define EVENT_VAR(_id)  event_attr_##_id
 #define EVENT_PTR(_id) &event_attr_##_id.attr.attr
@@ -769,6 +770,10 @@ int knc_pmu_init(void);
 ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
 			  char *page);
 
+static inline int is_ht_workaround_enabled(void)
+{
+	return !!(x86_pmu.flags & PMU_FL_EXCL_ENABLED);
+}
 #else /* CONFIG_CPU_SUP_INTEL */
 
 static inline void reserve_ds_buffers(void)
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 4fb4fe6..7040c41 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1747,7 +1747,7 @@ intel_get_excl_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
 	 * validating a group does not require
 	 * enforcing cross-thread  exclusion
 	 */
-	if (cpuc->is_fake)
+	if (cpuc->is_fake || !is_ht_workaround_enabled())
 		return c;
 
 	/*
@@ -2610,7 +2610,7 @@ static __init void intel_nehalem_quirk(void)
  */
 static __init void intel_ht_bug(void)
 {
-	x86_pmu.flags |= PMU_FL_EXCL_CNTRS;
+	x86_pmu.flags |= PMU_FL_EXCL_CNTRS | PMU_FL_EXCL_ENABLED;
 
 	x86_pmu.commit_scheduling = intel_commit_scheduling;
 	x86_pmu.start_scheduling = intel_start_scheduling;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 0/9] perf/x86: implement HT counter corruption workaround
  2014-06-04 21:34 [PATCH 0/9] perf/x86: implement HT counter corruption workaround Stephane Eranian
                   ` (8 preceding siblings ...)
  2014-06-04 21:34 ` [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround Stephane Eranian
@ 2014-06-04 22:28 ` Andi Kleen
  2014-06-05 12:45   ` Stephane Eranian
  9 siblings, 1 reply; 59+ messages in thread
From: Andi Kleen @ 2014-06-04 22:28 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, peterz, mingo, jolsa, zheng.z.yan, maria.n.dimakopoulou

> There is no HW or FW workaround for this problem.

Actually there is for global measurements: 

measure with the ANY bit set and divide by two.

Please add a check that the ANY bit is set, and disable
the workaround for that case.
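
For illustration, a minimal sketch of such a check (the helper name is
made up, and testing attr.config for ARCH_PERFMON_EVENTSEL_ANY is an
assumption about where the test would live, not an actual patch):

	static bool event_counts_any_thread(struct perf_event *event)
	{
		/* ANY-thread bit is bit 21 of the event select */
		return !!(event->attr.config & ARCH_PERFMON_EVENTSEL_ANY);
	}

An event for which this returns true could then be treated as
non-corrupting by the constraint code, so the cross-thread exclusion
would be skipped for it.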

-Andi


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/9] perf/x86: add cross-HT counter exclusion infrastructure
  2014-06-04 21:34 ` [PATCH 4/9] perf/x86: add cross-HT counter exclusion infrastructure Stephane Eranian
@ 2014-06-05  7:47   ` Peter Zijlstra
  2014-06-05 10:51     ` Stephane Eranian
  2014-06-05  8:04   ` Peter Zijlstra
  2014-06-05  8:29   ` Peter Zijlstra
  2 siblings, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2014-06-05  7:47 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, mingo, ak, jolsa, zheng.z.yan, maria.n.dimakopoulou

[-- Attachment #1: Type: text/plain, Size: 2886 bytes --]

On Wed, Jun 04, 2014 at 11:34:13PM +0200, Stephane Eranian wrote:
> From: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
> 
> This patch adds a new shared_regs style structure to the
> per-cpu x86 state (cpuc). It is used to coordinate access
> between counters which must be used with exclusion across
> HyperThreads on Intel processors. This new struct is not
> needed on each PMU, thus it is allocated on demand.
> 
> Reviewed-by: Stephane Eranian <eranian@google.com>
> Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
> ---
>  arch/x86/kernel/cpu/perf_event.h       |   40 ++++++++++++++++++--
>  arch/x86/kernel/cpu/perf_event_intel.c |   63 +++++++++++++++++++++++++++++---
>  2 files changed, 94 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
> index 413799f..5da0a2b 100644
> --- a/arch/x86/kernel/cpu/perf_event.h
> +++ b/arch/x86/kernel/cpu/perf_event.h
> @@ -65,10 +65,11 @@ struct event_constraint {
>  /*
>   * struct hw_perf_event.flags flags
>   */
> -#define PERF_X86_EVENT_PEBS_LDLAT	0x1 /* ld+ldlat data address sampling */
> -#define PERF_X86_EVENT_PEBS_ST		0x2 /* st data address sampling */
> -#define PERF_X86_EVENT_PEBS_ST_HSW	0x4 /* haswell style st data sampling */
> -#define PERF_X86_EVENT_COMMITTED	0x8 /* event passed commit_txn */
> +#define PERF_X86_EVENT_PEBS_LDLAT	0x01 /* ld+ldlat data address sampling */
> +#define PERF_X86_EVENT_PEBS_ST		0x02 /* st data address sampling */
> +#define PERF_X86_EVENT_PEBS_ST_HSW	0x04 /* haswell style st data sampling */
> +#define PERF_X86_EVENT_COMMITTED	0x08 /* event passed commit_txn */
> +#define PERF_X86_EVENT_EXCL		0x10 /* HT exclusivity on counter */
>  
>  struct amd_nb {
>  	int nb_id;  /* NorthBridge id */
> @@ -119,6 +120,27 @@ struct intel_shared_regs {
>  	unsigned                core_id;	/* per-core: core id */
>  };
>  
> +enum intel_excl_state_type {
> +	INTEL_EXCL_UNUSED    = 0, /* counter is unused */
> +	INTEL_EXCL_SHARED    = 1, /* counter can be used by both threads */
> +	INTEL_EXCL_EXCLUSIVE = 2, /* counter can be used by one thread only */
> +};
> +
> +struct intel_excl_states {
> +	enum intel_excl_state_type init_state[X86_PMC_IDX_MAX];
> +	enum intel_excl_state_type state[X86_PMC_IDX_MAX];
> +};
> +
> +struct intel_excl_cntrs {
> +	spinlock_t	lock;
> +	unsigned long	lock_flags;
> +
> +	struct intel_excl_states states[2];
> +
> +	int		refcnt;		/* per-core: #HT threads */
> +	unsigned	core_id;	/* per-core: core id */
> +};

This must be a raw_spinlock_t; it's taken from pmu::add(), which is
called under perf_event_context::lock, which is a raw_spinlock_t, as it's
taken under rq::lock, which is also a raw_spinlock_t.

I should really get around to fixing these errors and include the
lockdep infrastructure for this.
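
For reference, a minimal sketch of the suggested type change (the other
fields are copied from the quoted patch; this is not the actual
follow-up):

	struct intel_excl_cntrs {
		raw_spinlock_t	lock;	/* taken under perf_event_context::lock */
		unsigned long	lock_flags;

		struct intel_excl_states states[2];

		int		refcnt;		/* per-core: #HT threads */
		unsigned	core_id;	/* per-core: core id */
	};

with the lock/unlock sites converted to raw_spin_lock_irqsave() and
raw_spin_unlock_irqrestore() accordingly.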

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/9] perf/x86: add cross-HT counter exclusion infrastructure
  2014-06-04 21:34 ` [PATCH 4/9] perf/x86: add cross-HT counter exclusion infrastructure Stephane Eranian
  2014-06-05  7:47   ` Peter Zijlstra
@ 2014-06-05  8:04   ` Peter Zijlstra
  2014-06-05 13:36     ` Maria Dimakopoulou
  2014-06-05  8:29   ` Peter Zijlstra
  2 siblings, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2014-06-05  8:04 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, mingo, ak, jolsa, zheng.z.yan, maria.n.dimakopoulou

[-- Attachment #1: Type: text/plain, Size: 407 bytes --]

On Wed, Jun 04, 2014 at 11:34:13PM +0200, Stephane Eranian wrote:
> @@ -499,6 +530,7 @@ do {									\
>   */
>  #define PMU_FL_NO_HT_SHARING	0x1 /* no hyper-threading resource sharing */
>  #define PMU_FL_HAS_RSP_1	0x2 /* has 2 equivalent offcore_rsp regs   */
> +#define PMU_FL_EXCL_CNTRS	0x4 /* has exclusive counter requirements  */

the EXCL thing is HT-specific too, right? How about HT_EXCL or so?



[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/9] perf/x86: add cross-HT counter exclusion infrastructure
  2014-06-04 21:34 ` [PATCH 4/9] perf/x86: add cross-HT counter exclusion infrastructure Stephane Eranian
  2014-06-05  7:47   ` Peter Zijlstra
  2014-06-05  8:04   ` Peter Zijlstra
@ 2014-06-05  8:29   ` Peter Zijlstra
  2014-06-05 21:33     ` Andi Kleen
  2014-06-10 11:53     ` Stephane Eranian
  2 siblings, 2 replies; 59+ messages in thread
From: Peter Zijlstra @ 2014-06-05  8:29 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, mingo, ak, jolsa, zheng.z.yan, maria.n.dimakopoulou

[-- Attachment #1: Type: text/plain, Size: 922 bytes --]

On Wed, Jun 04, 2014 at 11:34:13PM +0200, Stephane Eranian wrote:
> @@ -2020,12 +2050,29 @@ static void intel_pmu_cpu_starting(int cpu)
>  
>  	if (x86_pmu.lbr_sel_map)
>  		cpuc->lbr_sel = &cpuc->shared_regs->regs[EXTRA_REG_LBR];
> +
> +	if (x86_pmu.flags & PMU_FL_EXCL_CNTRS) {
> +		for_each_cpu(i, topology_thread_cpumask(cpu)) {
> +			struct intel_excl_cntrs *c;
> +
> +			c = per_cpu(cpu_hw_events, i).excl_cntrs;
> +			if (c && c->core_id == core_id) {
> +				cpuc->kfree_on_online[1] = cpuc->excl_cntrs;
> +				cpuc->excl_cntrs = c;
> +				cpuc->excl_thread_id = 1;
> +				break;
> +			}
> +		}
> +		cpuc->excl_cntrs->core_id = core_id;
> +		cpuc->excl_cntrs->refcnt++;
> +	}
>  }

This hard-assumes there's only ever 2 threads, which is true, and I
suppose more in arch/x86 will come apart the moment Intel makes a chip
with more; still, do we have topology_thread_id() or so to cure this?
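
A sketch of one way to avoid baking in the two-thread assumption, using
only the existing topology mask (illustrative only, not a proposed
patch):

	static int ht_thread_index(int cpu)
	{
		int i, idx = 0;

		/* position of @cpu within its core's thread mask */
		for_each_cpu(i, topology_thread_cpumask(cpu)) {
			if (i == cpu)
				break;
			idx++;
		}
		return idx;
	}

so the hard-coded cpuc->excl_thread_id = 1 in the quoted hunk could be
derived from the mask instead.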

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-04 21:34 ` [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround Stephane Eranian
@ 2014-06-05  8:32   ` Matt Fleming
  2014-06-05  9:29     ` Stephane Eranian
  0 siblings, 1 reply; 59+ messages in thread
From: Matt Fleming @ 2014-06-05  8:32 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, peterz, mingo, ak, jolsa, zheng.z.yan,
	maria.n.dimakopoulou

On 4 June 2014 22:34, Stephane Eranian <eranian@google.com> wrote:
> From: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
>
> This patch adds a sysfs entry:
>
>         /sys/devices/cpu/ht_bug_workaround
>
> to activate/deactivate the PMU HT bug workaround.
>
> To activate (activated by default):
>  # echo 1 > /sys/devices/cpu/ht_bug_workaround
>
> To deactivate:
>  # echo 0 > /sys/devices/cpu/ht_bug_workaround

If your hardware contains this erratum, why would you ever want to
disable the workaround? Providing the user with the option of turning
this off seems like a bad idea.

I suspect that users will rarely know whether they can legitimately
disable this.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05  8:32   ` Matt Fleming
@ 2014-06-05  9:29     ` Stephane Eranian
  2014-06-05 10:01       ` Matt Fleming
  2014-06-05 13:19       ` Peter Zijlstra
  0 siblings, 2 replies; 59+ messages in thread
From: Stephane Eranian @ 2014-06-05  9:29 UTC (permalink / raw)
  To: Matt Fleming
  Cc: LKML, Peter Zijlstra, mingo, ak, Jiri Olsa, Yan, Zheng,
	Maria Dimakopoulou

On Thu, Jun 5, 2014 at 10:32 AM, Matt Fleming <matt@console-pimps.org> wrote:
> On 4 June 2014 22:34, Stephane Eranian <eranian@google.com> wrote:
>> From: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
>>
>> This patch adds a sysfs entry:
>>
>>         /sys/devices/cpu/ht_bug_workaround
>>
>> to activate/deactivate the PMU HT bug workaround.
>>
>> To activate (activated by default):
>>  # echo 1 > /sys/devices/cpu/ht_bug_workaround
>>
>> To deactivate:
>>  # echo 0 > /sys/devices/cpu/ht_bug_workaround
>
> If your hardware contains this erratum, why would you ever want to
> disable the workaround? Providing the user with the option of turning
> this off seems like a bad idea.
>
> I suspect that users will rarely know whether they can legitimately
> disable this.

If you know what you are doing (power user), then there are measurements
which work fine with the HT erratum.  This is why we have the option.

For example, if you only measure events 4x4 in system-wide mode
and you know which counters these events are going to use, you don't
need the workaround. For instance:

# perf stat -a -e r81d0,r01d1,r08d0,r20d1 sleep 5

This works well if you have a uniform workload across all CPUs.
All those events leak, but the leaks balance themselves out and you
get the correct counts in the end. The advantage is that you don't
have to multiplex. With the workaround enabled, this would multiplex
a lot.

But as I said, this is for experts only.

Another reason is for systems with HT disabled. It turned out to be
very difficult to determine at kernel BOOT TIME if HT was enabled
or not. Note what I said: ENABLED and not SUPPORTED. The latter is
easy to detect. The former needs some model specific code which is
quite complicated. I wish the kernel had this capability abstracted
somehow. Consequently, the workaround is always enabled. When
HT is disabled, there won't be multiplexing because there will never
be a conflict, but you pay a small price for accessing the extra data
state. An init script could well detect HT is off and thus disable the
workaround altogether.

Those are the two main reasons for this control in sysfs.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05  9:29     ` Stephane Eranian
@ 2014-06-05 10:01       ` Matt Fleming
  2014-06-05 10:19         ` Stephane Eranian
  2014-06-05 13:19       ` Peter Zijlstra
  1 sibling, 1 reply; 59+ messages in thread
From: Matt Fleming @ 2014-06-05 10:01 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, Peter Zijlstra, mingo, ak, Jiri Olsa, Yan, Zheng,
	Maria Dimakopoulou

On 5 June 2014 10:29, Stephane Eranian <eranian@google.com> wrote:
>
> If you know what you are doing (poweruser), then there are measurements
> which works fine with the HT erratum.  This is why we have the option.
>
> For instance if you only measure events 4x4 in system-wide mode
> and you know which counters these event are going to use, you don't
> need the workaround. For instance:
>
> # perf stat -a -e r81d0,r01d1,r08d0,r20d1 sleep 5
>
> Works well if you have a uniform workload across all CPUs.
> All those events leak, but the leaks balance themselves and you
> get the correct counts in the end. The advantage is that you don't
> have to multiplex. With the workaround enable, this would multiplex
> a lot.
>
> But as I said, this is for experts only.

Is it not possible to detect this in the kernel and only enable the
workaround for the case where the leaks don't balance? It may not be
possible (or practical) but I do think it's worth having the
discussion.

> Another reason is for systems with HT disabled. It turned out to be
> very difficult to determine at kernel BOOT TIME if HT was enabled
> or not. Note what I said: ENABLED and not SUPPORTED. The latter is
> easy to detect. The former needs some model specific code which is
> quite complicated. I wish the kernel had this capability abstracted
> somehow. Consequently, the workaround is always enabled. When
> HT is disabled, there won't be multiplexing because there will never
> be conflict, but you pay a little price for accessing the extra data
> state.

Does cpu_sibling_map not give you some indication of whether HT is
enabled? I think the topology_thread_cpumask() is the topology API for
that. But I could most definitely be wrong. Hopefully someone on the
Cc list will know.
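
For reference, a minimal sketch of that kind of check (the helper name
is illustrative, and it assumes the sibling masks are already populated
by the time it runs):

	static bool any_cpu_has_ht_sibling(void)
	{
		int cpu;

		for_each_online_cpu(cpu) {
			if (cpumask_weight(topology_thread_cpumask(cpu)) > 1)
				return true;
		}
		return false;
	}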

>An init script could well detect HT is off and thus disable the workaround altogether.

This is exactly the kind of thing I think we should try to avoid. The
ideal is that things just work out of the box and don't require these
magic knobs to be tweaked.

> Those are the two main reasons for this control in sysfs.

Thanks for the info!

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 10:01       ` Matt Fleming
@ 2014-06-05 10:19         ` Stephane Eranian
  2014-06-05 11:16           ` Matt Fleming
  0 siblings, 1 reply; 59+ messages in thread
From: Stephane Eranian @ 2014-06-05 10:19 UTC (permalink / raw)
  To: Matt Fleming
  Cc: LKML, Peter Zijlstra, mingo, ak, Jiri Olsa, Yan, Zheng,
	Maria Dimakopoulou

On Thu, Jun 5, 2014 at 12:01 PM, Matt Fleming <matt@console-pimps.org> wrote:
> On 5 June 2014 10:29, Stephane Eranian <eranian@google.com> wrote:
>>
>> If you know what you are doing (poweruser), then there are measurements
>> which works fine with the HT erratum.  This is why we have the option.
>>
>> For instance if you only measure events 4x4 in system-wide mode
>> and you know which counters these event are going to use, you don't
>> need the workaround. For instance:
>>
>> # perf stat -a -e r81d0,r01d1,r08d0,r20d1 sleep 5
>>
>> Works well if you have a uniform workload across all CPUs.
>> All those events leak, but the leaks balance themselves and you
>> get the correct counts in the end. The advantage is that you don't
>> have to multiplex. With the workaround enable, this would multiplex
>> a lot.
>>
>> But as I said, this is for experts only.
>
> Is it not possible to detect this in the kernel and only enable the
> workaround for the case where the leaks don't balance? It may not be
> possible (or practical) but I do think it's worth having the
> discussion.
>
How would you know that you have a uniform workload from inside
the kernel?

>> Another reason is for systems with HT disabled. It turned out to be
>> very difficult to determine at kernel BOOT TIME if HT was enabled
>> or not. Note what I said: ENABLED and not SUPPORTED. The latter is
>> easy to detect. The former needs some model specific code which is
>> quite complicated. I wish the kernel had this capability abstracted
>> somehow. Consequently, the workaround is always enabled. When
>> HT is disabled, there won't be multiplexing because there will never
>> be conflict, but you pay a little price for accessing the extra data
>> state.
>
> Does cpu_sibling_map not give you some indication of whether HT is
> enabled? I think the topology_thread_cpumask() is the topology API for
> that. But I could most definitely be wrong. Hopefully someone on the
> Cc list will know.
>
I remember trying some of that, but when perf_event is initialized, those
masks are not yet set up properly.

>>An init script could well detect HT is off and thus disable the workaround altogether.
>
> This is exactly the kind of thing I think we should try to avoid. The
> ideal is that things just work out of the box and don't require these
> magic knobs to be tweaked.
>
>> Those are the two main reasons for this control in sysfs.
>
> Thanks for the info!

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/9] perf/x86: add cross-HT counter exclusion infrastructure
  2014-06-05  7:47   ` Peter Zijlstra
@ 2014-06-05 10:51     ` Stephane Eranian
  0 siblings, 0 replies; 59+ messages in thread
From: Stephane Eranian @ 2014-06-05 10:51 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, mingo, ak, Jiri Olsa, Yan, Zheng, Maria Dimakopoulou

On Thu, Jun 5, 2014 at 9:47 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Jun 04, 2014 at 11:34:13PM +0200, Stephane Eranian wrote:
>> From: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
>>
>> This patch adds a new shared_regs style structure to the
>> per-cpu x86 state (cpuc). It is used to coordinate access
>> between counters which must be used with exclusion across
>> HyperThreads on Intel processors. This new struct is not
>> needed on each PMU, thus it is allocated on demand.
>>
>> Reviewed-by: Stephane Eranian <eranian@google.com>
>> Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
>> ---
>>  arch/x86/kernel/cpu/perf_event.h       |   40 ++++++++++++++++++--
>>  arch/x86/kernel/cpu/perf_event_intel.c |   63 +++++++++++++++++++++++++++++---
>>  2 files changed, 94 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
>> index 413799f..5da0a2b 100644
>> --- a/arch/x86/kernel/cpu/perf_event.h
>> +++ b/arch/x86/kernel/cpu/perf_event.h
>> @@ -65,10 +65,11 @@ struct event_constraint {
>>  /*
>>   * struct hw_perf_event.flags flags
>>   */
>> -#define PERF_X86_EVENT_PEBS_LDLAT    0x1 /* ld+ldlat data address sampling */
>> -#define PERF_X86_EVENT_PEBS_ST               0x2 /* st data address sampling */
>> -#define PERF_X86_EVENT_PEBS_ST_HSW   0x4 /* haswell style st data sampling */
>> -#define PERF_X86_EVENT_COMMITTED     0x8 /* event passed commit_txn */
>> +#define PERF_X86_EVENT_PEBS_LDLAT    0x01 /* ld+ldlat data address sampling */
>> +#define PERF_X86_EVENT_PEBS_ST               0x02 /* st data address sampling */
>> +#define PERF_X86_EVENT_PEBS_ST_HSW   0x04 /* haswell style st data sampling */
>> +#define PERF_X86_EVENT_COMMITTED     0x08 /* event passed commit_txn */
>> +#define PERF_X86_EVENT_EXCL          0x10 /* HT exclusivity on counter */
>>
>>  struct amd_nb {
>>       int nb_id;  /* NorthBridge id */
>> @@ -119,6 +120,27 @@ struct intel_shared_regs {
>>       unsigned                core_id;        /* per-core: core id */
>>  };
>>
>> +enum intel_excl_state_type {
>> +     INTEL_EXCL_UNUSED    = 0, /* counter is unused */
>> +     INTEL_EXCL_SHARED    = 1, /* counter can be used by both threads */
>> +     INTEL_EXCL_EXCLUSIVE = 2, /* counter can be used by one thread only */
>> +};
>> +
>> +struct intel_excl_states {
>> +     enum intel_excl_state_type init_state[X86_PMC_IDX_MAX];
>> +     enum intel_excl_state_type state[X86_PMC_IDX_MAX];
>> +};
>> +
>> +struct intel_excl_cntrs {
>> +     spinlock_t      lock;
>> +     unsigned long   lock_flags;
>> +
>> +     struct intel_excl_states states[2];
>> +
>> +     int             refcnt;         /* per-core: #HT threads */
>> +     unsigned        core_id;        /* per-core: core id */
>> +};
>
> This must be a raw_spin_lock_t, its taken from pmu::add() which is
> called under perf_event_context::lock, which is raw_spinlock_t, as its
> taken under rq::lock, which too is raw_spinlock_t.
>
Will fix this in V2.

> I should really get around to fixing these errors and include the
> lockdep infrastructure for this.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 10:19         ` Stephane Eranian
@ 2014-06-05 11:16           ` Matt Fleming
  2014-06-05 12:02             ` Stephane Eranian
  2014-06-05 12:50             ` Peter Zijlstra
  0 siblings, 2 replies; 59+ messages in thread
From: Matt Fleming @ 2014-06-05 11:16 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, Peter Zijlstra, mingo, ak, Jiri Olsa, Yan, Zheng,
	Maria Dimakopoulou

On 5 June 2014 11:19, Stephane Eranian <eranian@google.com> wrote:
> How would you know that you have a uniform workload from inside
> the kernel?

That's what I'm asking you ;-)

>> Does cpu_sibling_map not give you some indication of whether HT is
>> enabled? I think the topology_thread_cpumask() is the topology API for
>> that. But I could most definitely be wrong. Hopefully someone on the
>> Cc list will know.
>>
> Remember trying some of that, but when perf_event is initialized, those
> masks are not yet setup properly.

Oh, bummer.

If there's no way to detect whether we should enable this workaround
at runtime (and it sounds like there isn't a good way), then that's
fair enough.

We should think twice about allowing it to be disabled via sysfs,
however. Because what is guaranteed to happen is that some user will
report getting bogus results from perf for these events and we'll
spend days trying to figure out why, only to discover they disabled
the workaround and didn't tell us or didn't realise that they'd
disabled it.

If the workaround is low overhead, can't we just leave it enabled?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 11:16           ` Matt Fleming
@ 2014-06-05 12:02             ` Stephane Eranian
  2014-06-05 13:27               ` Borislav Petkov
  2014-06-05 12:50             ` Peter Zijlstra
  1 sibling, 1 reply; 59+ messages in thread
From: Stephane Eranian @ 2014-06-05 12:02 UTC (permalink / raw)
  To: Matt Fleming
  Cc: LKML, Peter Zijlstra, mingo, ak, Jiri Olsa, Yan, Zheng,
	Maria Dimakopoulou

On Thu, Jun 5, 2014 at 1:16 PM, Matt Fleming <matt@console-pimps.org> wrote:
> On 5 June 2014 11:19, Stephane Eranian <eranian@google.com> wrote:
>> How would you know that you have a uniform workload from inside
>> the kernel?
>
> That's what I'm asking you ;-)
>
No way to know this otherwise we could play some tricks.

>>> Does cpu_sibling_map not give you some indication of whether HT is
>>> enabled? I think the topology_thread_cpumask() is the topology API for
>>> that. But I could most definitely be wrong. Hopefully someone on the
>>> Cc list will know.
>>>
>> Remember trying some of that, but when perf_event is initialized, those
>> masks are not yet setup properly.
>
> Oh, bummer.
>
I think those should be initialized earlier during boot.

> If there's no way to detect whether we should enable this workaround
> at runtime (and it sounds like there isn't a good way), then that's
> fair enough.
>
> We should think twice about allowing it to be disabled via sysfs,
> however. Because what is guaranteed to happen is that some user will
> report getting bogus results from perf for these events and we'll
> spend days trying to figure out why, only to discover they disabled
> the workaround and didn't tell us or didn't realise that they'd
> disabled it.
>
> If the workaround is low overhead, can't we just leave it enabled?

It is enabled by default. Nothing is done to try and disable it later
even once the kernel is fully booted. So this is mostly for testing
and power-users.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 0/9] perf/x86: implement HT counter corruption workaround
  2014-06-04 22:28 ` [PATCH 0/9] perf/x86: implement HT counter corruption workaround Andi Kleen
@ 2014-06-05 12:45   ` Stephane Eranian
  0 siblings, 0 replies; 59+ messages in thread
From: Stephane Eranian @ 2014-06-05 12:45 UTC (permalink / raw)
  To: Andi Kleen
  Cc: LKML, Peter Zijlstra, mingo, Jiri Olsa, Yan, Zheng, Maria Dimakopoulou

On Thu, Jun 5, 2014 at 12:28 AM, Andi Kleen <ak@linux.intel.com> wrote:
>> There is no HW or FW workaround for this problem.
>
> Actually there is for global measurements:
>
> measure with Any bit set and divide by two.
>
> Please add a check that the any bit is set, and disable
> the workaround for that case.
>
I think this assumes the same event is measured on sibling counters at all times.
Otherwise, how does that eliminate the corruption?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 11:16           ` Matt Fleming
  2014-06-05 12:02             ` Stephane Eranian
@ 2014-06-05 12:50             ` Peter Zijlstra
  2014-06-05 12:55               ` Stephane Eranian
  1 sibling, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2014-06-05 12:50 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Stephane Eranian, LKML, mingo, ak, Jiri Olsa, Yan, Zheng,
	Maria Dimakopoulou

[-- Attachment #1: Type: text/plain, Size: 821 bytes --]

On Thu, Jun 05, 2014 at 12:16:01PM +0100, Matt Fleming wrote:
> On 5 June 2014 11:19, Stephane Eranian <eranian@google.com> wrote:
> > How would you know that you have a uniform workload from inside
> > the kernel?
> 
> That's what I'm asking you ;-)
> 
> >> Does cpu_sibling_map not give you some indication of whether HT is
> >> enabled? I think the topology_thread_cpumask() is the topology API for
> >> that. But I could most definitely be wrong. Hopefully someone on the
> >> Cc list will know.
> >>
> > Remember trying some of that, but when perf_event is initialized, those
> > masks are not yet setup properly.
> 
> Oh, bummer.

So we init perf very early to get the NMI watchdog up and running, but
there's no reason you cannot register a second initcall later and flip
the switch from there.
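
Something along these lines, as a sketch only (the function name and
the exact flag handling are assumptions, not an actual follow-up
patch):

	static __init int fixup_ht_workaround(void)
	{
		int cpu;

		if (!(x86_pmu.flags & PMU_FL_EXCL_ENABLED))
			return 0;

		/* by now SMP is up and the sibling masks are valid */
		get_online_cpus();
		for_each_online_cpu(cpu) {
			if (cpumask_weight(topology_thread_cpumask(cpu)) > 1) {
				put_online_cpus();
				return 0;	/* HT is on, keep it */
			}
		}
		put_online_cpus();

		/* HT is off: the exclusion scheme is pure overhead */
		x86_pmu.flags &= ~PMU_FL_EXCL_ENABLED;
		return 0;
	}
	subsys_initcall(fixup_ht_workaround);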

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 12:50             ` Peter Zijlstra
@ 2014-06-05 12:55               ` Stephane Eranian
  2014-06-05 12:59                 ` Peter Zijlstra
  0 siblings, 1 reply; 59+ messages in thread
From: Stephane Eranian @ 2014-06-05 12:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Matt Fleming, LKML, mingo, ak, Jiri Olsa, Yan, Zheng, Maria Dimakopoulou

On Thu, Jun 5, 2014 at 2:50 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jun 05, 2014 at 12:16:01PM +0100, Matt Fleming wrote:
>> On 5 June 2014 11:19, Stephane Eranian <eranian@google.com> wrote:
>> > How would you know that you have a uniform workload from inside
>> > the kernel?
>>
>> That's what I'm asking you ;-)
>>
>> >> Does cpu_sibling_map not give you some indication of whether HT is
>> >> enabled? I think the topology_thread_cpumask() is the topology API for
>> >> that. But I could most definitely be wrong. Hopefully someone on the
>> >> Cc list will know.
>> >>
>> > Remember trying some of that, but when perf_event is initialized, those
>> > masks are not yet setup properly.
>>
>> Oh, bummer.
>
> So we init perf very early to get nmi-watchdog up and running, but
> there's no reason you cannot register a second initcall later and flip
> the switch from it there.

and what initcall would that be?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 12:55               ` Stephane Eranian
@ 2014-06-05 12:59                 ` Peter Zijlstra
  2014-06-05 13:16                   ` Stephane Eranian
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2014-06-05 12:59 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Matt Fleming, LKML, mingo, ak, Jiri Olsa, Yan, Zheng, Maria Dimakopoulou

[-- Attachment #1: Type: text/plain, Size: 1142 bytes --]

On Thu, Jun 05, 2014 at 02:55:05PM +0200, Stephane Eranian wrote:
> On Thu, Jun 5, 2014 at 2:50 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, Jun 05, 2014 at 12:16:01PM +0100, Matt Fleming wrote:
> >> On 5 June 2014 11:19, Stephane Eranian <eranian@google.com> wrote:
> >> > How would you know that you have a uniform workload from inside
> >> > the kernel?
> >>
> >> That's what I'm asking you ;-)
> >>
> >> >> Does cpu_sibling_map not give you some indication of whether HT is
> >> >> enabled? I think the topology_thread_cpumask() is the topology API for
> >> >> that. But I could most definitely be wrong. Hopefully someone on the
> >> >> Cc list will know.
> >> >>
> >> > Remember trying some of that, but when perf_event is initialized, those
> >> > masks are not yet setup properly.
> >>
> >> Oh, bummer.
> >
> > So we init perf very early to get nmi-watchdog up and running, but
> > there's no reason you cannot register a second initcall later and flip
> > the switch from it there.
> 
> and what initcall would that be?

Pretty much anything !early_initcall() is run after SMP bringup, IIRC.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 12:59                 ` Peter Zijlstra
@ 2014-06-05 13:16                   ` Stephane Eranian
  2014-06-05 13:26                     ` Peter Zijlstra
  0 siblings, 1 reply; 59+ messages in thread
From: Stephane Eranian @ 2014-06-05 13:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Matt Fleming, LKML, mingo, ak, Jiri Olsa, Yan, Zheng, Maria Dimakopoulou

On Thu, Jun 5, 2014 at 2:59 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jun 05, 2014 at 02:55:05PM +0200, Stephane Eranian wrote:
>> On Thu, Jun 5, 2014 at 2:50 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Thu, Jun 05, 2014 at 12:16:01PM +0100, Matt Fleming wrote:
>> >> On 5 June 2014 11:19, Stephane Eranian <eranian@google.com> wrote:
>> >> > How would you know that you have a uniform workload from inside
>> >> > the kernel?
>> >>
>> >> That's what I'm asking you ;-)
>> >>
>> >> >> Does cpu_sibling_map not give you some indication of whether HT is
>> >> >> enabled? I think the topology_thread_cpumask() is the topology API for
>> >> >> that. But I could most definitely be wrong. Hopefully someone on the
>> >> >> Cc list will know.
>> >> >>
>> >> > Remember trying some of that, but when perf_event is initialized, those
>> >> > masks are not yet setup properly.
>> >>
>> >> Oh, bummer.
>> >
>> > So we init perf very early to get nmi-watchdog up and running, but
>> > there's no reason you cannot register a second initcall later and flip
>> > the switch from it there.
>>
>> and what initcall would that be?
>
> Pretty much anything !early_initcall() is ran after SMP bringup iirc.

Ok, we can try this. Need to check the impact on NMI watchdog if
already active.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05  9:29     ` Stephane Eranian
  2014-06-05 10:01       ` Matt Fleming
@ 2014-06-05 13:19       ` Peter Zijlstra
  2014-06-05 13:26         ` Stephane Eranian
  1 sibling, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2014-06-05 13:19 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Matt Fleming, LKML, mingo, ak, Jiri Olsa, Yan, Zheng, Maria Dimakopoulou

[-- Attachment #1: Type: text/plain, Size: 1415 bytes --]

On Thu, Jun 05, 2014 at 11:29:33AM +0200, Stephane Eranian wrote:

> If you know what you are doing (poweruser), then there are measurements
> which works fine with the HT erratum.  This is why we have the option.
> 
> For instance if you only measure events 4x4 in system-wide mode
> and you know which counters these event are going to use, you don't
> need the workaround. For instance:
> 
> # perf stat -a -e r81d0,r01d1,r08d0,r20d1 sleep 5
> 
> Works well if you have a uniform workload across all CPUs.
> All those events leak, but the leaks balance themselves and you
> get the correct counts in the end. The advantage is that you don't
> have to multiplex. With the workaround enable, this would multiplex
> a lot.
> 
> But as I said, this is for experts only.

Still seems tricky; you really want those pinned to make that guarantee,
and even then it's a stretch. I don't think the perf tool exposes the pinned
attribute though, or I'm just not looking right.

I say stretch, because while I think it'll work out and we'll end up
programming the counters the same way on each cpu, we really do not make
that guarantee either, pinned or not.

I think I agree with Matt in that exposing this to userspace is really
asking for trouble.

Now, I've not yet read through the entire patch series, but how
impossible is it to allow programming the exact same event on both HT
siblings?

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 13:19       ` Peter Zijlstra
@ 2014-06-05 13:26         ` Stephane Eranian
  0 siblings, 0 replies; 59+ messages in thread
From: Stephane Eranian @ 2014-06-05 13:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Matt Fleming, LKML, mingo, ak, Jiri Olsa, Yan, Zheng, Maria Dimakopoulou

On Thu, Jun 5, 2014 at 3:19 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jun 05, 2014 at 11:29:33AM +0200, Stephane Eranian wrote:
>
>> If you know what you are doing (poweruser), then there are measurements
>> which works fine with the HT erratum.  This is why we have the option.
>>
>> For instance if you only measure events 4x4 in system-wide mode
>> and you know which counters these event are going to use, you don't
>> need the workaround. For instance:
>>
>> # perf stat -a -e r81d0,r01d1,r08d0,r20d1 sleep 5
>>
>> Works well if you have a uniform workload across all CPUs.
>> All those events leak, but the leaks balance themselves and you
>> get the correct counts in the end. The advantage is that you don't
>> have to multiplex. With the workaround enable, this would multiplex
>> a lot.
>>
>> But as I said, this is for experts only.
>
> Still seems tricky, you really want those pinned to make that guarantee,
> and even then its a stretch. I don't think perf tool exposes the pinned
> attribute though, or I'm just not looking right.
>
I think it does. But regardless, if you are on a single-user machine,
with the NMI watchdog disabled,
and you know where events can run and the workload is uniform, then it does work.
Of course, this is a stretch for average users.

> I say stretch, because while I think it'll work out and we'll end up
> programming the counters the same way on each cpu, we really do not make
> that guarantee either, pinned or not.
>
There is no guarantee. However, this is what is currently going on.

> I think I agree with Matt in that exposing this to userspace is really
> asking for trouble.
>
This is a separate patch for a good reason: it is optional. If you think
it is too risky, then drop it.

> Now, I've not yet read through the entire patch series, but how
> impossible is it to allow programming the exact same event on both HT
> siblings?

That would require a global view of scheduling and multiplexing, kept in sync
between the HT siblings to ensure corrupting events always face each other. But
again, this also assumes only one tool instance is running.

I think it is better to use the workaround and repatriate the counts
for the corrupting events. That would allow correct counting. Sampling
is still out for those events.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 13:16                   ` Stephane Eranian
@ 2014-06-05 13:26                     ` Peter Zijlstra
  0 siblings, 0 replies; 59+ messages in thread
From: Peter Zijlstra @ 2014-06-05 13:26 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Matt Fleming, LKML, mingo, ak, Jiri Olsa, Yan, Zheng, Maria Dimakopoulou

[-- Attachment #1: Type: text/plain, Size: 376 bytes --]

On Thu, Jun 05, 2014 at 03:16:47PM +0200, Stephane Eranian wrote:
> > Pretty much anything !early_initcall() is ran after SMP bringup iirc.
> 
> Ok, we can try this. Need to check the impact on NMI watchdog if
> already active.

Right, we could provide an interface to stop and restart the watchdog so
that the PMU is quiescent when we flip this switch, if required. 

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 12:02             ` Stephane Eranian
@ 2014-06-05 13:27               ` Borislav Petkov
  2014-06-05 13:42                 ` Stephane Eranian
  0 siblings, 1 reply; 59+ messages in thread
From: Borislav Petkov @ 2014-06-05 13:27 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Matt Fleming, LKML, Peter Zijlstra, mingo, ak, Jiri Olsa, Yan,
	Zheng, Maria Dimakopoulou

On Thu, Jun 05, 2014 at 02:02:51PM +0200, Stephane Eranian wrote:
> It is enabled by default. Nothing is done to try and disable it later
> even once the kernel is fully booted. So this is mostly for testing
> and power-users.

You keep saying "power-users". What is the disadvantage for power users
running with the workaround disabled? I.e., why would anyone want to
disable it at all, what is the use case for that?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/9] perf/x86: add cross-HT counter exclusion infrastructure
  2014-06-05  8:04   ` Peter Zijlstra
@ 2014-06-05 13:36     ` Maria Dimakopoulou
  0 siblings, 0 replies; 59+ messages in thread
From: Maria Dimakopoulou @ 2014-06-05 13:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephane Eranian, linux-kernel, mingo, ak, jolsa, zheng.z.yan

This is to indicate that the PMU needs to set up the
shared state struct, which is called excl_cntrs.
That struct is allocated for all CPUs at first, and
then half of them are destroyed and their CPUs are
made to point to their sibling's struct.

So yes. EXCL is really related to HT.
It could be called HT_EXCL.

Will fix in V2.

On Thu, Jun 5, 2014 at 11:04 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Jun 04, 2014 at 11:34:13PM +0200, Stephane Eranian wrote:
>> @@ -499,6 +530,7 @@ do {                                                                      \
>>   */
>>  #define PMU_FL_NO_HT_SHARING 0x1 /* no hyper-threading resource sharing */
>>  #define PMU_FL_HAS_RSP_1     0x2 /* has 2 equivalent offcore_rsp regs   */
>> +#define PMU_FL_EXCL_CNTRS    0x4 /* has exclusive counter requirements  */
>
> the EXLC thing is HT specific too, right? How about HT_EXCL or so?
>
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround
  2014-06-04 21:34 ` [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround Stephane Eranian
@ 2014-06-05 13:38   ` Peter Zijlstra
  2014-06-05 13:42   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 59+ messages in thread
From: Peter Zijlstra @ 2014-06-05 13:38 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, mingo, ak, jolsa, zheng.z.yan, maria.n.dimakopoulou

[-- Attachment #1: Type: text/plain, Size: 1219 bytes --]

On Wed, Jun 04, 2014 at 11:34:14PM +0200, Stephane Eranian wrote:
>  static void
> +intel_start_scheduling(struct cpu_hw_events *cpuc)
> +{

> +	/*
> +	 * lock shared state until we are done scheduling
> +	 * in stop_event_scheduling()
> +	 * makes scheduling appear as a transaction
> +	 */
> +	spin_lock_irqsave(&excl_cntrs->lock, excl_cntrs->lock_flags);
> +
> +	/*
> +	 * save initial state of sibling thread
> +	 */
> +	memcpy(xlo->init_state, xlo->state, sizeof(xlo->init_state));
> +}
> +
> +static void
> +intel_stop_scheduling(struct cpu_hw_events *cpuc)
> +{

> +	/*
> +	 * make new sibling thread state visible
> +	 */
> +	memcpy(xlo->state, xlo->init_state, sizeof(xlo->state));
> +
> +	xl->sched_started = false;
> +	/*
> +	 * release shared state lock (lock acquire in intel_start_scheduling())
> +	 */
> +	spin_unlock_irqrestore(&excl_cntrs->lock, excl_cntrs->lock_flags);
> +}

Do you really need the irqsave/irqrestore? From what I can tell this is
always called under perf_event_context::lock and that is already an
IRQ-safe lock, so interrupts should always be disabled here.
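
As a sketch, and assuming the lock has already become a raw_spinlock_t
as suggested earlier in the thread, the start side could then simply do
(field names follow the quoted patch, the body is abbreviated):

	static void
	intel_start_scheduling(struct cpu_hw_events *cpuc)
	{
		struct intel_excl_cntrs *excl_cntrs = cpuc->excl_cntrs;
		struct intel_excl_states *xlo;
		int o_tid = 1 - cpuc->excl_thread_id;

		xlo = &excl_cntrs->states[o_tid];

		/* interrupts already off under perf_event_context::lock */
		raw_spin_lock(&excl_cntrs->lock);

		/* save initial state of the sibling thread */
		memcpy(xlo->init_state, xlo->state, sizeof(xlo->init_state));
	}

with a matching raw_spin_unlock() in intel_stop_scheduling(), and the
saved lock_flags field dropped.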

Also, it looks like xl->state is the effective state, and ->init_state
is the work state? Is init the right name for this?

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 13:27               ` Borislav Petkov
@ 2014-06-05 13:42                 ` Stephane Eranian
  2014-06-05 14:03                   ` Borislav Petkov
  0 siblings, 1 reply; 59+ messages in thread
From: Stephane Eranian @ 2014-06-05 13:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Matt Fleming, LKML, Peter Zijlstra, mingo, ak, Jiri Olsa, Yan,
	Zheng, Maria Dimakopoulou

On Thu, Jun 5, 2014 at 3:27 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Jun 05, 2014 at 02:02:51PM +0200, Stephane Eranian wrote:
>> It is enabled by default. Nothing is done to try and disable it later
>> even once the kernel is fully booted. So this is mostly for testing
>> and power-users.
>
> You keep saying "power-users". What is the disadvantage for power users
> running with the workaround disabled? I.e., why would anyone want to
> disable it at all, what is the use case for that?
>
I gave a test case earlier:

# echo 0 >/proc/sys/kernel/nmi_watchdog
# run_my_uniform_workload_on_all_cpus &
# perf stat -a -e r81d0,r01d1,r08d0,r20d1 sleep 5

That run gives the correct answer.

If I just look at CPU0 CPU4 siblings:

CPU0, counter0 leaks N counts to CPU4, counter 0

but at the same time:

CPU4, counter0 leaks N counts to CPU0, counter 0

This is because we have the same event in the same
counter AND the workload is uniform, meaning the
event (here loads retired) occurs at the same rate
on both siblings.

You can test this by measuring only on one HT.
# perf stat -a -C0 -e r81d0,r01d1,r08d0,r20d1 sleep 5

Note that some events leak more than they count.

Again, this is really for experts. The average user
should not have to deal with this. So we can drop
the sysfs entry.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround
  2014-06-04 21:34 ` [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround Stephane Eranian
  2014-06-05 13:38   ` Peter Zijlstra
@ 2014-06-05 13:42   ` Peter Zijlstra
  2014-06-05 13:48   ` Peter Zijlstra
  2014-06-05 14:04   ` Peter Zijlstra
  3 siblings, 0 replies; 59+ messages in thread
From: Peter Zijlstra @ 2014-06-05 13:42 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, mingo, ak, jolsa, zheng.z.yan, maria.n.dimakopoulou

[-- Attachment #1: Type: text/plain, Size: 1118 bytes --]

On Wed, Jun 04, 2014 at 11:34:14PM +0200, Stephane Eranian wrote:
> +static void intel_commit_scheduling(struct cpu_hw_events *cpuc,
> +				    struct perf_event *event, int cntr)
> +{
> +	struct intel_excl_cntrs *excl_cntrs = cpuc->excl_cntrs;
> +	struct event_constraint *c = event->hw.constraint;
> +	struct intel_excl_states *xlo, *xl;
> +	int tid = cpuc->excl_thread_id;
> +	int o_tid = 1 - tid;
> +	int is_excl;
> +
> +	if (cpuc->is_fake || !c)
> +		return;
> +
> +	is_excl = c->flags & PERF_X86_EVENT_EXCL;
> +
> +	if (!(c->flags & PERF_X86_EVENT_DYNAMIC))
> +		return;
> +
> +	WARN_ON_ONCE(!excl_cntrs);
> +
> +	if (!excl_cntrs)
> +		return;
> +
> +	xl = &excl_cntrs->states[tid];
> +	xlo = &excl_cntrs->states[o_tid];
> +
> +	WARN_ON_ONCE(!spin_is_locked(&excl_cntrs->lock));

Use:
	lockdep_assert_held(&excl_cntrs->lock);

It also checks to see the current context is actually the lock holder,
and it doesn't generate any code when !lockdep.

> +
> +	if (cntr >= 0) {
> +		if (is_excl)
> +			xlo->init_state[cntr] = INTEL_EXCL_EXCLUSIVE;
> +		else
> +			xlo->init_state[cntr] = INTEL_EXCL_SHARED;
> +	}
>  }

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround
  2014-06-04 21:34 ` [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround Stephane Eranian
  2014-06-05 13:38   ` Peter Zijlstra
  2014-06-05 13:42   ` Peter Zijlstra
@ 2014-06-05 13:48   ` Peter Zijlstra
  2014-06-05 14:01     ` Maria Dimakopoulou
  2014-06-05 14:04   ` Peter Zijlstra
  3 siblings, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2014-06-05 13:48 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, mingo, ak, jolsa, zheng.z.yan, maria.n.dimakopoulou

[-- Attachment #1: Type: text/plain, Size: 729 bytes --]

On Wed, Jun 04, 2014 at 11:34:14PM +0200, Stephane Eranian wrote:

> +static struct event_constraint *
> +intel_get_excl_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
> +			   struct event_constraint *c)
> +{

> +	if (!(c->flags & PERF_X86_EVENT_DYNAMIC)) {
> +
> +		/*
> +		 * in case we fail, we assume no counter
> +		 * is supported to be on the safe side
> +		 */
> +		cx = kmalloc(sizeof(*cx), GFP_KERNEL);
> +		if (!cx)
> +			return &emptyconstraint;
> +

Ok, so forgive me if I'm wrong, but the way we get here is through:

x86_schedule_event()
  ->start_scheduling()
    spin_lock()
  ->get_event_constraints()
    intel_get_excl_constraints()
      kmalloc(.gfp=GFP_KERNEL)

How can that ever work?

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround
  2014-06-05 13:48   ` Peter Zijlstra
@ 2014-06-05 14:01     ` Maria Dimakopoulou
  2014-06-05 14:04       ` Borislav Petkov
  2014-06-05 14:11       ` Peter Zijlstra
  0 siblings, 2 replies; 59+ messages in thread
From: Maria Dimakopoulou @ 2014-06-05 14:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephane Eranian, linux-kernel, mingo, ak, jolsa, zheng.z.yan

Are you saying it is illegal to call kmalloc() from
this context?

kmalloc() is needed because we need to allocate
a new constraint struct, since the static constraint
cannot be modified.

Worst case, we can statically allocate a second
constraint struct in the event struct.

On Thu, Jun 5, 2014 at 4:48 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Jun 04, 2014 at 11:34:14PM +0200, Stephane Eranian wrote:
>
>> +static struct event_constraint *
>> +intel_get_excl_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
>> +                        struct event_constraint *c)
>> +{
>
>> +     if (!(c->flags & PERF_X86_EVENT_DYNAMIC)) {
>> +
>> +             /*
>> +              * in case we fail, we assume no counter
>> +              * is supported to be on the safe side
>> +              */
>> +             cx = kmalloc(sizeof(*cx), GFP_KERNEL);
>> +             if (!cx)
>> +                     return &emptyconstraint;
>> +
>
> Ok, so forgive me if I'm wrong, but the way we get here is through:
>
> x86_schedule_event()
>   ->start_scheduling()
>     spin_lock()
>   ->get_event_constraints()
>     intel_get_excl_constraints()
>       kmalloc(.gfp=GFP_KERNEL)
>
> How can that ever work?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 13:42                 ` Stephane Eranian
@ 2014-06-05 14:03                   ` Borislav Petkov
  2014-06-05 14:45                     ` Maria Dimakopoulou
  0 siblings, 1 reply; 59+ messages in thread
From: Borislav Petkov @ 2014-06-05 14:03 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Matt Fleming, LKML, Peter Zijlstra, mingo, ak, Jiri Olsa, Yan,
	Zheng, Maria Dimakopoulou

On Thu, Jun 05, 2014 at 03:42:14PM +0200, Stephane Eranian wrote:
> I gave a test case earlier:
> 
> # echo 0 >/proc/sys/kernel/nmi_watchdog
> # run_my_uniform_workload_on_all_cpus &
> # perf stat -a -e r81d0,r01d1,r08d0,r20d1 sleep 5
> 
> That run gives the correct answer.
> 
> If I just look at CPU0 CPU4 siblings:
> 
> CPU0, counter0 leaks N counts to CPU4, counter 0
> 
> but at the same time:
> 
> CPU4, counter0 leaks N counts to CPU0, counter 0
> 
> This is because we have the same event in the same
> counter AND the workload is uniform, meaning the
> event (here loads retired) occurs at the same rate
> on both siblings.
> 
> You can test this by measuring only on one HT.
> # perf stat -a -C0 -e r81d0,r01d1,r08d0,r20d1 sleep 5
> 
> Note that some events, leak more than they count.

Ok, so AFAIU, this particular workload counts correctly just because
counters leak the same amount. If so, what happens if you run this exact
same workload with the workaround enabled? I read something about a bit
more counter multiplexing... or is there a more serious issue?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround
  2014-06-04 21:34 ` [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround Stephane Eranian
                     ` (2 preceding siblings ...)
  2014-06-05 13:48   ` Peter Zijlstra
@ 2014-06-05 14:04   ` Peter Zijlstra
  2014-06-05 14:15     ` Stephane Eranian
  3 siblings, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2014-06-05 14:04 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, mingo, ak, jolsa, zheng.z.yan, maria.n.dimakopoulou

[-- Attachment #1: Type: text/plain, Size: 1225 bytes --]

On Wed, Jun 04, 2014 at 11:34:14PM +0200, Stephane Eranian wrote:
> +
> +	/*
> +	 * Modify static constraint with current dynamic
> +	 * state of thread
> +	 *
> +	 * EXCLUSIVE: sibling counter measuring exclusive event
> +	 * SHARED   : sibling counter measuring non-exclusive event
> +	 * UNUSED   : sibling counter unused
> +	 */
> +	for_each_set_bit(i, cx->idxmsk, X86_PMC_IDX_MAX) {
> +		/*
> +		 * exclusive event in sibling counter
> +		 * our corresponding counter cannot be used
> +		 * regardless of our event
> +		 */
> +		if (xl->state[i] == INTEL_EXCL_EXCLUSIVE)
> +			__clear_bit(i, cx->idxmsk);
> +		/*
> +		 * if measuring an exclusive event, sibling
> +		 * measuring non-exclusive, then counter cannot
> +		 * be used
> +		 */
> +		if (is_excl && xl->state[i] == INTEL_EXCL_SHARED)
> +			__clear_bit(i, cx->idxmsk);
> +	}
> +
> +	/*
> +	 * recompute actual bit weight for scheduling algorithm
> +	 */
> +	cx->weight = hweight64(cx->idxmsk64);

So I think we talked about this a bit; what happens if CPU0 (taking your
4 core HSW-client) is first to program its counters and takes all 4 in
exclusive mode?

Then there's none left for CPU4.

Did I miss where we avoid that problem, or is that an actual issue?

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround
  2014-06-05 14:01     ` Maria Dimakopoulou
@ 2014-06-05 14:04       ` Borislav Petkov
  2014-06-05 14:11       ` Peter Zijlstra
  1 sibling, 0 replies; 59+ messages in thread
From: Borislav Petkov @ 2014-06-05 14:04 UTC (permalink / raw)
  To: Maria Dimakopoulou
  Cc: Peter Zijlstra, Stephane Eranian, linux-kernel, mingo, ak, jolsa,
	zheng.z.yan

On Thu, Jun 05, 2014 at 05:01:25PM +0300, Maria Dimakopoulou wrote:
> Are you saying it is illegal to call kmalloc() from
> this context?
> 
> kmalloc is needed because we need to allocate
> a new constraint struct since the static constraint
> cannot be modified.
> 
> Worst case we can statically allocate a second
> constraint struct in the event struct.

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

Please do not top-post.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround
  2014-06-05 14:01     ` Maria Dimakopoulou
  2014-06-05 14:04       ` Borislav Petkov
@ 2014-06-05 14:11       ` Peter Zijlstra
  2014-06-05 14:14         ` Peter Zijlstra
  1 sibling, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2014-06-05 14:11 UTC (permalink / raw)
  To: Maria Dimakopoulou
  Cc: Stephane Eranian, linux-kernel, mingo, ak, jolsa, zheng.z.yan

[-- Attachment #1: Type: text/plain, Size: 1681 bytes --]

On Thu, Jun 05, 2014 at 05:01:25PM +0300, Maria Dimakopoulou wrote:
> On Thu, Jun 5, 2014 at 4:48 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Wed, Jun 04, 2014 at 11:34:14PM +0200, Stephane Eranian wrote:
> >
> >> +static struct event_constraint *
> >> +intel_get_excl_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
> >> +                        struct event_constraint *c)
> >> +{
> >
> >> +     if (!(c->flags & PERF_X86_EVENT_DYNAMIC)) {
> >> +
> >> +             /*
> >> +              * in case we fail, we assume no counter
> >> +              * is supported to be on the safe side
> >> +              */
> >> +             cx = kmalloc(sizeof(*cx), GFP_KERNEL);
> >> +             if (!cx)
> >> +                     return &emptyconstraint;
> >> +
> >
> > Ok, so forgive me if I'm wrong, but the way we get here is through:
> >
> > x86_schedule_event()
> >   ->start_scheduling()
> >     spin_lock()
> >   ->get_event_constraints()
> >     intel_get_excl_constraints()
> >       kmalloc(.gfp=GFP_KERNEL)
> >
> > How can that ever work?

> Are you saying it is illegal to call kmalloc() from
> this context?

Nobody will come and arrest you for it, so no. Broken though. GFP_KERNEL
will attempt to sleep to wait for reclaim, and you're holding a
spinlock.

> kmalloc is needed because we need to allocate
> a new constraint struct since the static constraint
> cannot be modified.
> 
> Worst case we can statically allocate a second
> constraint struct in the event struct.

Nah, since you will need at most one constraint per counter, you could
preallocate num_counter constraints for each cpu.
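
A minimal sketch of what that per-CPU preallocation could look like; the
constraint_list field and the intel_cpuc_prepare()/dyn_constraint() names are
illustrative assumptions, not the code actually posted:

        #include <linux/slab.h>

        /*
         * Sketch only: reserve one dynamic constraint per generic counter
         * at CPU prepare time, so get_event_constraints() never allocates.
         */
        static int intel_cpuc_prepare(struct cpu_hw_events *cpuc, int cpu)
        {
                /* hypothetical field: cpuc->constraint_list */
                cpuc->constraint_list =
                        kzalloc_node(x86_pmu.num_counters *
                                     sizeof(struct event_constraint),
                                     GFP_KERNEL, cpu_to_node(cpu));
                return cpuc->constraint_list ? 0 : -ENOMEM;
        }

        static struct event_constraint *
        dyn_constraint(struct cpu_hw_events *cpuc,
                       struct event_constraint *c, int idx)
        {
                struct event_constraint *cx = &cpuc->constraint_list[idx];

                /* copy the static constraint, then modify the private copy */
                *cx = *c;
                cx->flags |= PERF_X86_EVENT_DYNAMIC;
                return cx;
        }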

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround
  2014-06-05 14:11       ` Peter Zijlstra
@ 2014-06-05 14:14         ` Peter Zijlstra
  2014-06-05 14:24           ` Maria Dimakopoulou
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2014-06-05 14:14 UTC (permalink / raw)
  To: Maria Dimakopoulou
  Cc: Stephane Eranian, linux-kernel, mingo, ak, jolsa, zheng.z.yan

[-- Attachment #1: Type: text/plain, Size: 720 bytes --]

On Thu, Jun 05, 2014 at 04:11:50PM +0200, Peter Zijlstra wrote:
> > > x86_schedule_event()
> > >   ->start_scheduling()
> > >     spin_lock()
> > >   ->get_event_constraints()
> > >     intel_get_excl_constraints()
> > >       kmalloc(.gfp=GFP_KERNEL)
> > >
> > > How can that ever work?
> 
> > Are you saying it is illegal to call kmalloc() from
> > this context?
> 
> Nobody will come and arrest you for it, so no. Broken though. GFP_KERNEL
> will attempt to sleep to wait for reclaim, and you're holding a
> spinlock.

Furthermore, even GFP_ATOMIC shouldn't be used because these are/should
be raw_spinlocks and the zone->lock is a regular spinlock.

So please look into the preallocation thing.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround
  2014-06-05 14:04   ` Peter Zijlstra
@ 2014-06-05 14:15     ` Stephane Eranian
  2014-06-05 14:21       ` Peter Zijlstra
  0 siblings, 1 reply; 59+ messages in thread
From: Stephane Eranian @ 2014-06-05 14:15 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, mingo, ak, Jiri Olsa, Yan, Zheng, Maria Dimakopoulou

On Thu, Jun 5, 2014 at 4:04 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Jun 04, 2014 at 11:34:14PM +0200, Stephane Eranian wrote:
>> +
>> +     /*
>> +      * Modify static constraint with current dynamic
>> +      * state of thread
>> +      *
>> +      * EXCLUSIVE: sibling counter measuring exclusive event
>> +      * SHARED   : sibling counter measuring non-exclusive event
>> +      * UNUSED   : sibling counter unused
>> +      */
>> +     for_each_set_bit(i, cx->idxmsk, X86_PMC_IDX_MAX) {
>> +             /*
>> +              * exclusive event in sibling counter
>> +              * our corresponding counter cannot be used
>> +              * regardless of our event
>> +              */
>> +             if (xl->state[i] == INTEL_EXCL_EXCLUSIVE)
>> +                     __clear_bit(i, cx->idxmsk);
>> +             /*
>> +              * if measuring an exclusive event, sibling
>> +              * measuring non-exclusive, then counter cannot
>> +              * be used
>> +              */
>> +             if (is_excl && xl->state[i] == INTEL_EXCL_SHARED)
>> +                     __clear_bit(i, cx->idxmsk);
>> +     }
>> +
>> +     /*
>> +      * recompute actual bit weight for scheduling algorithm
>> +      */
>> +     cx->weight = hweight64(cx->idxmsk64);
>
> So I think we talked about this a bit; what happens if CPU0 (taking your
> 4 core HSW-client) is first to program its counters and takes all 4 in
> exclusive mode?
>
> Then there's none left for CPU4.
>
> Did I miss where we avoid that problem, or is that an actual issue?

Yes, this patch series does not address this problem yet. It will be
in a second series.
Don't have a good solution yet.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround
  2014-06-05 14:15     ` Stephane Eranian
@ 2014-06-05 14:21       ` Peter Zijlstra
  2014-06-05 14:26         ` Stephane Eranian
  2014-06-05 15:31         ` Stephane Eranian
  0 siblings, 2 replies; 59+ messages in thread
From: Peter Zijlstra @ 2014-06-05 14:21 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, mingo, ak, Jiri Olsa, Yan, Zheng, Maria Dimakopoulou

[-- Attachment #1: Type: text/plain, Size: 2066 bytes --]

On Thu, Jun 05, 2014 at 04:15:17PM +0200, Stephane Eranian wrote:
> On Thu, Jun 5, 2014 at 4:04 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Wed, Jun 04, 2014 at 11:34:14PM +0200, Stephane Eranian wrote:
> >> +
> >> +     /*
> >> +      * Modify static constraint with current dynamic
> >> +      * state of thread
> >> +      *
> >> +      * EXCLUSIVE: sibling counter measuring exclusive event
> >> +      * SHARED   : sibling counter measuring non-exclusive event
> >> +      * UNUSED   : sibling counter unused
> >> +      */
> >> +     for_each_set_bit(i, cx->idxmsk, X86_PMC_IDX_MAX) {
> >> +             /*
> >> +              * exclusive event in sibling counter
> >> +              * our corresponding counter cannot be used
> >> +              * regardless of our event
> >> +              */
> >> +             if (xl->state[i] == INTEL_EXCL_EXCLUSIVE)
> >> +                     __clear_bit(i, cx->idxmsk);
> >> +             /*
> >> +              * if measuring an exclusive event, sibling
> >> +              * measuring non-exclusive, then counter cannot
> >> +              * be used
> >> +              */
> >> +             if (is_excl && xl->state[i] == INTEL_EXCL_SHARED)
> >> +                     __clear_bit(i, cx->idxmsk);
> >> +     }
> >> +
> >> +     /*
> >> +      * recompute actual bit weight for scheduling algorithm
> >> +      */
> >> +     cx->weight = hweight64(cx->idxmsk64);
> >
> > So I think we talked about this a bit; what happens if CPU0 (taking your
> > 4 core HSW-client) is first to program its counters and takes all 4 in
> > exclusive mode?
> >
> > Then there's none left for CPU4.
> >
> > Did I miss where we avoid that problem, or is that an actual issue?
> 
> Yes, this patch series does not address this problem yet. It will be
> in a second series.
> Don't have a good solution yet.

We could limit each cpu to num_counters/2 exclusive slots. That'll still
be painful with some constrained events I imagine, but in general that
should 'work' I suppose.
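
A rough sketch of such a cap, expressed against the dynamic constraint code
quoted above; the xl->num_exclusive bookkeeping is an assumed name, not
something in the posted series:

        /*
         * Sketch: refuse to hand out more than half of the generic counters
         * in exclusive mode on one HT thread, so the sibling can never be
         * starved completely.  xl->num_exclusive would track how many
         * exclusive slots this thread currently holds.
         */
        if (is_excl && xl->num_exclusive >= x86_pmu.num_counters / 2)
                return &emptyconstraint;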

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround
  2014-06-05 14:14         ` Peter Zijlstra
@ 2014-06-05 14:24           ` Maria Dimakopoulou
  0 siblings, 0 replies; 59+ messages in thread
From: Maria Dimakopoulou @ 2014-06-05 14:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephane Eranian, linux-kernel, mingo, ak, jolsa, zheng.z.yan

On Thu, Jun 5, 2014 at 5:14 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jun 05, 2014 at 04:11:50PM +0200, Peter Zijlstra wrote:
>> > > x86_schedule_event()
>> > >   ->start_scheduling()
>> > >     spin_lock()
>> > >   ->get_event_constraints()
>> > >     intel_get_excl_constraints()
>> > >       kmalloc(.gfp=GFP_KERNEL)
>> > >
>> > > How can that ever work?
>>
>> > Are you saying it is illegal to call kmalloc() from
>> > this context?
>>
>> Nobody will come and arrest you for it, so no. Broken though. GFP_KERNEL
>> will attempt to sleep to wait for reclaim, and you're holding a
>> spinlock.

Ok, got it.

>
> Furthermore, even GFP_ATOMIC shouldn't be used because these are/should
> be raw_spinlocks and the zone->lock is a regular spinlock.
>
> So please look into the preallocation thing.

We will preallocate.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround
  2014-06-05 14:21       ` Peter Zijlstra
@ 2014-06-05 14:26         ` Stephane Eranian
  2014-06-05 14:31           ` Peter Zijlstra
  2014-06-05 15:31         ` Stephane Eranian
  1 sibling, 1 reply; 59+ messages in thread
From: Stephane Eranian @ 2014-06-05 14:26 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, mingo, ak, Jiri Olsa, Yan, Zheng, Maria Dimakopoulou

On Thu, Jun 5, 2014 at 4:21 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jun 05, 2014 at 04:15:17PM +0200, Stephane Eranian wrote:
>> On Thu, Jun 5, 2014 at 4:04 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Wed, Jun 04, 2014 at 11:34:14PM +0200, Stephane Eranian wrote:
>> >> +
>> >> +     /*
>> >> +      * Modify static constraint with current dynamic
>> >> +      * state of thread
>> >> +      *
>> >> +      * EXCLUSIVE: sibling counter measuring exclusive event
>> >> +      * SHARED   : sibling counter measuring non-exclusive event
>> >> +      * UNUSED   : sibling counter unused
>> >> +      */
>> >> +     for_each_set_bit(i, cx->idxmsk, X86_PMC_IDX_MAX) {
>> >> +             /*
>> >> +              * exclusive event in sibling counter
>> >> +              * our corresponding counter cannot be used
>> >> +              * regardless of our event
>> >> +              */
>> >> +             if (xl->state[i] == INTEL_EXCL_EXCLUSIVE)
>> >> +                     __clear_bit(i, cx->idxmsk);
>> >> +             /*
>> >> +              * if measuring an exclusive event, sibling
>> >> +              * measuring non-exclusive, then counter cannot
>> >> +              * be used
>> >> +              */
>> >> +             if (is_excl && xl->state[i] == INTEL_EXCL_SHARED)
>> >> +                     __clear_bit(i, cx->idxmsk);
>> >> +     }
>> >> +
>> >> +     /*
>> >> +      * recompute actual bit weight for scheduling algorithm
>> >> +      */
>> >> +     cx->weight = hweight64(cx->idxmsk64);
>> >
>> > So I think we talked about this a bit; what happens if CPU0 (taking your
>> > 4 core HSW-client) is first to program its counters and takes all 4 in
>> > exclusive mode?
>> >
>> > Then there's none left for CPU4.
>> >
>> > Did I miss where we avoid that problem, or is that an actual issue?
>>
>> Yes, this patch series does not address this problem yet. It will be
>> in a second series.
>> Don't have a good solution yet.
>
> We could limit each cpu to num_counters/2 exclusive slots. That'll still
> be painful with some constrained events I imagine, but in general that
> should 'work' I suppose.

That is probably the easiest solution, just modify the dynamic constraint
mask some more. Have not yet tried it.

The repatriation of the leaked count is not so easy either. Need to IPI
the other HT. There may be some restrictions as to when we can
safely do this.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround
  2014-06-05 14:26         ` Stephane Eranian
@ 2014-06-05 14:31           ` Peter Zijlstra
  0 siblings, 0 replies; 59+ messages in thread
From: Peter Zijlstra @ 2014-06-05 14:31 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, mingo, ak, Jiri Olsa, Yan, Zheng, Maria Dimakopoulou

[-- Attachment #1: Type: text/plain, Size: 1035 bytes --]

On Thu, Jun 05, 2014 at 04:26:38PM +0200, Stephane Eranian wrote:
> On Thu, Jun 5, 2014 at 4:21 PM, Peter Zijlstra <peterz@infradead.org> wrote:

> > We could limit each cpu to num_counters/2 exclusive slots. That'll still
> > be painful with some constrained events I imagine, but in general that
> > should 'work' I suppose.
> 
> That is probably the easiest solution, just modify the dynamic constraint
> mask some more. Have not yet tried it.

Right, see what happens :-)

> The repatriation of the leaked count is not so easy either. Need to IPI
> the other HT. There may be some restrictions as to when we can
> safely do this.

Yes, that'll be 'interesting'. You typically cannot do things like
smp_call_function() from IRQ/NMI context. Which leaves you with
asynchronous IPIs.

I suppose the easiest way is to simply push the count out to the right
event when you reprogram the sibling. And maybe kick it every so often
to force this, just in case the PMU doesn't get reprogrammed a lot.

Still tricky.
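
For illustration, a sketch of the synchronous variant of that count
"repatriation"; struct leak_ctx and the function names are assumptions, not
posted code:

        /*
         * Read the (unused) sibling counter on the other HT thread and
         * return its raw value so it can be folded back into the
         * corrupting event.  Illustrative only.
         */
        struct leak_ctx {
                int idx;        /* counter index, same on both HT threads */
                u64 leaked;
        };

        static void read_sibling_counter(void *info)
        {
                struct leak_ctx *ctx = info;

                /* runs on the sibling CPU */
                rdmsrl(x86_pmu_event_addr(ctx->idx), ctx->leaked);
        }

        static u64 pull_leaked_count(int sibling_cpu, int idx)
        {
                struct leak_ctx ctx = { .idx = idx };

                /*
                 * Fine from process context, but not from the NMI/IRQ path,
                 * which is why asynchronous IPIs (or folding the count in
                 * when the sibling gets reprogrammed) come into play.
                 */
                smp_call_function_single(sibling_cpu, read_sibling_counter,
                                         &ctx, 1);
                return ctx.leaked;
        }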

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 14:03                   ` Borislav Petkov
@ 2014-06-05 14:45                     ` Maria Dimakopoulou
  2014-06-05 15:17                       ` Borislav Petkov
  0 siblings, 1 reply; 59+ messages in thread
From: Maria Dimakopoulou @ 2014-06-05 14:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Stephane Eranian, Matt Fleming, LKML, Peter Zijlstra, mingo, ak,
	Jiri Olsa, Yan, Zheng

On Thu, Jun 5, 2014 at 5:03 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Jun 05, 2014 at 03:42:14PM +0200, Stephane Eranian wrote:
>> I gave a test case earlier:
>>
>> # echo 0 >/proc/sys/kernel/nmi_watchdog
>> # run_my_uniform_workload_on_all_cpus &
>> # perf stat -a -e r81d0,r01d1,r08d0,r20d1 sleep 5
>>
>> That run gives the correct answer.
>>
>> If I just look at CPU0 CPU4 siblings:
>>
>> CPU0, counter0 leaks N counts to CPU4, counter 0
>>
>> but at the same time:
>>
>> CPU4, counter0 leaks N counts to CPU0, counter 0
>>
>> This is because we have the same event in the same
>> counter AND the workload is uniform, meaning the
>> event (here loads retired) occurs at the same rate
>> on both siblings.
>>
>> You can test this by measuring only on one HT.
>> # perf stat -a -C0 -e r81d0,r01d1,r08d0,r20d1 sleep 5
>>
>> Note that some events leak more than they count.
>
> Ok, so AFAIU, this particular workload counts correctly just because
> counters leak the same amount. If so, what happens if you run this exact
> same workload with the workaround enabled? I read something about a bit
> more counter multiplexing... or is there a more serious issue?

The issue is that the outgoing leaked counts are not compensated
by the incoming leaked counts of the sibling thread. With the workaround,
corrupting events are always scheduled with an empty sibling counter.
This means that their leaked counts are lost. So it is expected to see
lower counts with the workaround. Note that this is not a side-effect of
the workaround; leaked counts are expected to be lost with nothing
measured on the sibling counter in general.

In a second series we intend to re-integrate the counts for counting mode
events. The workaround makes this easier because it guarantees
that the sibling counter is unused, thus its counts are purely leaked
counts and they can be safely re-integrated.
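
A minimal sketch of that planned counting-mode re-integration, relying on the
guarantee above that the sibling counter is unused (names are hypothetical):

        /*
         * With the workaround, everything accumulated in the paired sibling
         * counter is leak from the corrupting event, so it can simply be
         * added back.  Without the guarantee, real and leaked counts would
         * be mixed and could not be separated.
         */
        static u64 reintegrate(u64 own_count, u64 sibling_raw)
        {
                return own_count + sibling_raw;
        }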


>
> Thanks.
>
> --
> Regards/Gruss,
>     Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 14:45                     ` Maria Dimakopoulou
@ 2014-06-05 15:17                       ` Borislav Petkov
  2014-06-05 16:39                         ` Maria Dimakopoulou
  0 siblings, 1 reply; 59+ messages in thread
From: Borislav Petkov @ 2014-06-05 15:17 UTC (permalink / raw)
  To: Maria Dimakopoulou
  Cc: Stephane Eranian, Matt Fleming, LKML, Peter Zijlstra, mingo, ak,
	Jiri Olsa, Yan, Zheng

On Thu, Jun 05, 2014 at 05:45:24PM +0300, Maria Dimakopoulou wrote:
> The issue is that the outgoing leaked counts are not compensated
> by the incoming leaked counts of the sibling thread. With the workaround,
> corrupting events are always scheduled with an empty sibling counter.
> This means that their leaked counts are lost. So it is expected to see
> lower counts with the workaround. Note that this is not a side-effect of
> the workaround; leaked counts are expected to be lost with nothing
> measured on the sibling counter in general.
> 
> In a second series we intend to re-integrate the counts for counting mode
> events. The workaround makes this easier because it guarantees
> that the sibling counter is unused, thus its counts are purely leaked
> counts and they can be safely re-integrated.

IIUC, sounds to me like reintegrating the leaked counts from the unused
counter should be part of the workaround too, not a second series...

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround
  2014-06-05 14:21       ` Peter Zijlstra
  2014-06-05 14:26         ` Stephane Eranian
@ 2014-06-05 15:31         ` Stephane Eranian
  1 sibling, 0 replies; 59+ messages in thread
From: Stephane Eranian @ 2014-06-05 15:31 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, mingo, ak, Jiri Olsa, Yan, Zheng, Maria Dimakopoulou

On Thu, Jun 5, 2014 at 4:21 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jun 05, 2014 at 04:15:17PM +0200, Stephane Eranian wrote:
>> On Thu, Jun 5, 2014 at 4:04 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Wed, Jun 04, 2014 at 11:34:14PM +0200, Stephane Eranian wrote:
>> >> +
>> >> +     /*
>> >> +      * Modify static constraint with current dynamic
>> >> +      * state of thread
>> >> +      *
>> >> +      * EXCLUSIVE: sibling counter measuring exclusive event
>> >> +      * SHARED   : sibling counter measuring non-exclusive event
>> >> +      * UNUSED   : sibling counter unused
>> >> +      */
>> >> +     for_each_set_bit(i, cx->idxmsk, X86_PMC_IDX_MAX) {
>> >> +             /*
>> >> +              * exclusive event in sibling counter
>> >> +              * our corresponding counter cannot be used
>> >> +              * regardless of our event
>> >> +              */
>> >> +             if (xl->state[i] == INTEL_EXCL_EXCLUSIVE)
>> >> +                     __clear_bit(i, cx->idxmsk);
>> >> +             /*
>> >> +              * if measuring an exclusive event, sibling
>> >> +              * measuring non-exclusive, then counter cannot
>> >> +              * be used
>> >> +              */
>> >> +             if (is_excl && xl->state[i] == INTEL_EXCL_SHARED)
>> >> +                     __clear_bit(i, cx->idxmsk);
>> >> +     }
>> >> +
>> >> +     /*
>> >> +      * recompute actual bit weight for scheduling algorithm
>> >> +      */
>> >> +     cx->weight = hweight64(cx->idxmsk64);
>> >
>> > So I think we talked about this a bit; what happens if CPU0 (taking your
>> > 4 core HSW-client) is first to program its counters and takes all 4 in
>> > exclusive mode?
>> >
>> > Then there's none left for CPU4.
>> >
>> > Did I miss where we avoid that problem, or is that an actual issue?
>>
>> Yes, this patch series does not address this problem yet. It will be
>> in a second series.
>> Don't have a good solution yet.
>
> We could limit each cpu to num_counters/2 exclusive slots. That'll still
> be painful with some constrained events I imagine, but in general that
> should 'work' I suppose.

Ok, tried this. It avoids the problems and just requires a 3 line patch.
So we could go with that for now. It would avoid total starvation.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 15:17                       ` Borislav Petkov
@ 2014-06-05 16:39                         ` Maria Dimakopoulou
  2014-06-05 16:47                           ` Stephane Eranian
  2014-06-05 16:52                           ` Borislav Petkov
  0 siblings, 2 replies; 59+ messages in thread
From: Maria Dimakopoulou @ 2014-06-05 16:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Stephane Eranian, Matt Fleming, LKML, Peter Zijlstra, mingo, ak,
	Jiri Olsa, Yan, Zheng

On Thu, Jun 5, 2014 at 6:17 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Jun 05, 2014 at 05:45:24PM +0300, Maria Dimakopoulou wrote:
>> The issue is that the outgoing leaked counts are not compensated
>> by the incoming leaked counts of the sibling thread. With the workaround,
>> corrupting events are always scheduled with an empty sibling counter.
>> This means that their leaked counts are lost. So it is expected to see
>> lower counts with the workaround. Note that this is not a side-effect of
>> the workaround; leaked counts are expected to be lost with nothing
>> measured on the sibling counter in general.
>>
>> In a second series we intend to re-integrate the counts for counting mode
>> events. The workaround makes this easier because it guarantees
>> that the sibling counter is unused, thus its counts are purely leaked
>> counts and they can be safely re-integrated.
>
> IIUC, sounds to me like reintegrating the leaked counts from the unused
> counter should be part of the workaround too, not a second series...
>

This series aims to avoid corruption of non-corrupting events.
Re-integration of the counts is not related to this. This is why
we chose to fix this other problem in a second series to keep
things clean and concepts separated.

> --
> Regards/Gruss,
>     Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 16:39                         ` Maria Dimakopoulou
@ 2014-06-05 16:47                           ` Stephane Eranian
  2014-06-05 16:52                           ` Borislav Petkov
  1 sibling, 0 replies; 59+ messages in thread
From: Stephane Eranian @ 2014-06-05 16:47 UTC (permalink / raw)
  To: Maria Dimakopoulou
  Cc: Borislav Petkov, Matt Fleming, LKML, Peter Zijlstra, mingo, ak,
	Jiri Olsa, Yan, Zheng

On Thu, Jun 5, 2014 at 6:39 PM, Maria Dimakopoulou
<maria.n.dimakopoulou@gmail.com> wrote:
> On Thu, Jun 5, 2014 at 6:17 PM, Borislav Petkov <bp@alien8.de> wrote:
>> On Thu, Jun 05, 2014 at 05:45:24PM +0300, Maria Dimakopoulou wrote:
>>> The issue is that the outgoing leaked counts are not compensated
>>> by the incoming leaked counts of the sibling thread. With the workaround,
>>> corrupting events are always scheduled with an empty sibling counter.
>>> This means that their leaked counts are lost. So it is expected to see
>>> lower counts with the workaround. Note that this is not a side-effect of
>>> the workaround; leaked counts are expected to be lost with nothing
>>> measured on the sibling counter in general.
>>>
>>> In a second series we intend to re-integrate the counts for counting mode
>>> events. The workaround makes this easier because it guarantees
>>> that the sibling counter is unused, thus its counts are purely leaked
>>> counts and they can be safely re-integrated.
>>
>> IIUC, sounds to me like reintegrating the leaked counts from the unused
>> counter should be part of the workaround too, not a second series...
>>
>
> This series aims to avoid corruption of non-corrupting events.
> Re-integration of the counts is not related to this. This is why
> we chose to fix this other problem in a second series to keep
> things clean and concepts separated.
>
I agree with Maria here. The reintegration is a different problem.
The series here helps and is a required first step. Reintegration
will be added asap.

>> --
>> Regards/Gruss,
>>     Boris.
>>
>> Sent from a fat crate under my desk. Formatting is fine.
>> --

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 16:39                         ` Maria Dimakopoulou
  2014-06-05 16:47                           ` Stephane Eranian
@ 2014-06-05 16:52                           ` Borislav Petkov
  2014-06-05 18:00                             ` Maria Dimakopoulou
  1 sibling, 1 reply; 59+ messages in thread
From: Borislav Petkov @ 2014-06-05 16:52 UTC (permalink / raw)
  To: Maria Dimakopoulou
  Cc: Stephane Eranian, Matt Fleming, LKML, Peter Zijlstra, mingo, ak,
	Jiri Olsa, Yan, Zheng

On Thu, Jun 05, 2014 at 07:39:55PM +0300, Maria Dimakopoulou wrote:
> This series aims to avoid corruption of non-corrupting events.
> Re-integration of the counts is not related to this.

What do you mean "not related"? Then you have a partial workaround. This
whole thing is trying to fix a hw bug so of course it is related.

Once you do the "full" workaround, along with the part which
reintegrates the counts and thus completely fixes the issue (I believe),
then you don't need to disable it at all because disabling it doesn't
make any sense then. See what I'm saying?

This is the whole point Matt and I are trying to make: if the *full*
workaround doesn't have any noticeable disadvantages, then you don't
need to add a disable-mechanism due to the can of worms opening if you
do.

How you get this upstream to ease the review is a whole other story
and I applaud your desire to keep things clean.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 16:52                           ` Borislav Petkov
@ 2014-06-05 18:00                             ` Maria Dimakopoulou
  2014-06-05 23:29                               ` Andi Kleen
  0 siblings, 1 reply; 59+ messages in thread
From: Maria Dimakopoulou @ 2014-06-05 18:00 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Stephane Eranian, Matt Fleming, LKML, Peter Zijlstra, mingo, ak,
	Jiri Olsa, Yan, Zheng

On Thu, Jun 5, 2014 at 7:52 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Jun 05, 2014 at 07:39:55PM +0300, Maria Dimakopoulou wrote:
>> This series aims to avoid corruption of non-corrupting events.
>> Re-integration of the counts is not related to this.
>
> What do you mean "not related"? Then you have a partial workaround. This
> whole thing is trying to fix a hw bug so of course it is related.

I said not related to the problem of corruption of non-corrupting events,
which is what we solve here.
Re-integration of the counts of corrupting events is a different problem.
Both problems are related to the HT bug.

>
> Once you do the "full" workaround, along with the part which
> reintegrates the counts and thus completely fixes the issue (I believe),
> then you don't need to disable it at all because disabling it doesn't
> make any sense then. See what I'm saying?
>
> This is the whole point Matt and I are trying to make: if the *full*
> workaround doesn't have any noticeable disadvantages, then you don't
> need to add a disable-mechanism due to the can of worms opening if you
> do.
>

As Stephane pointed out, the sysfs entry is optional and the workaround
can be disabled only as root.

It is not absolutely necessary and it's not important.
We will drop it in V2.

> How you get this upstream to ease the review is a whole another story
> and I applaud your desire to keep things clean.
>

Then, I guess we agree to keep the re-integration for another series, for
the reason you just pointed out.


> --
> Regards/Gruss,
>     Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/9] perf/x86: add cross-HT counter exclusion infrastructure
  2014-06-05  8:29   ` Peter Zijlstra
@ 2014-06-05 21:33     ` Andi Kleen
  2014-06-05 21:38       ` Stephane Eranian
  2014-06-10 11:53     ` Stephane Eranian
  1 sibling, 1 reply; 59+ messages in thread
From: Andi Kleen @ 2014-06-05 21:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephane Eranian, linux-kernel, mingo, jolsa, zheng.z.yan,
	maria.n.dimakopoulou

> This hard assumes there's only ever 2 threads, which is true and I
> suppose more in arch/x86 will come apart the moment Intel makes a chip
> with more, still, do we have topology_thread_id() or so to cure this?


Xeon Phi already has 4 threads today.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/9] perf/x86: add cross-HT counter exclusion infrastructure
  2014-06-05 21:33     ` Andi Kleen
@ 2014-06-05 21:38       ` Stephane Eranian
  0 siblings, 0 replies; 59+ messages in thread
From: Stephane Eranian @ 2014-06-05 21:38 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, LKML, mingo, Jiri Olsa, Yan, Zheng, Maria Dimakopoulou

On Thu, Jun 5, 2014 at 11:33 PM, Andi Kleen <ak@linux.intel.com> wrote:
>> This hard assumes there's only ever 2 threads, which is true and I
>> suppose more in arch/x86 will come apart the moment Intel makes a chip
>> with more, still, do we have topology_thread_id() or so to cure this?
>
>
> Xeon Phi already has 4 threads today.
>
Yes, but it does not have that bug (hopefully). This code is just for
the impacted processors.

> -Andi
>
> --
> ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 18:00                             ` Maria Dimakopoulou
@ 2014-06-05 23:29                               ` Andi Kleen
  2014-06-06  8:28                                 ` Matt Fleming
  0 siblings, 1 reply; 59+ messages in thread
From: Andi Kleen @ 2014-06-05 23:29 UTC (permalink / raw)
  To: Maria Dimakopoulou
  Cc: Borislav Petkov, Stephane Eranian, Matt Fleming, LKML,
	Peter Zijlstra, mingo, Jiri Olsa, Yan, Zheng

> As Stephane pointed out, the sysfs entry is optional and the workaround
> can be disabled only as root.
> 
> It is not absolutely necessary and it's not important.
> We will drop it in V2.

I would prefer to keep it. It's fairly complex and it's always good
to have a way to disable complex things in case something goes wrong.

-Andi

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround
  2014-06-05 23:29                               ` Andi Kleen
@ 2014-06-06  8:28                                 ` Matt Fleming
  0 siblings, 0 replies; 59+ messages in thread
From: Matt Fleming @ 2014-06-06  8:28 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Maria Dimakopoulou, Borislav Petkov, Stephane Eranian, LKML,
	Peter Zijlstra, mingo, Jiri Olsa, Yan, Zheng

On 6 June 2014 00:29, Andi Kleen <ak@linux.intel.com> wrote:
>> As Stephane pointed out, the sysfs entry is optional and the workaround
>> can be disabled only as root.
>>
>> It is not absolutely necessary and it's not important.
>> We will drop it in V2.
>
> I would prefer to keep it. It's fairly complex and it's always good
> to have a way to disable complex things in case something goes wrong.

You want to be able to disable the workaround in case it doesn't work?
I'm having a hard time buying that as a valid reason for this knob.

If someone runs into issues with the workaround we want them to report
those issues so that we can tweak the code. If they've got the ability
to simply disable the code they're less likely to report the problem
and we'll all be worse for it.

Having the knob increases the number of configurations and increases
complexity; it doesn't reduce it.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/9] perf/x86: add cross-HT counter exclusion infrastructure
  2014-06-05  8:29   ` Peter Zijlstra
  2014-06-05 21:33     ` Andi Kleen
@ 2014-06-10 11:53     ` Stephane Eranian
  2014-06-10 12:26       ` Peter Zijlstra
  1 sibling, 1 reply; 59+ messages in thread
From: Stephane Eranian @ 2014-06-10 11:53 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, mingo, ak, Jiri Olsa, Yan, Zheng, Maria Dimakopoulou

On Thu, Jun 5, 2014 at 10:29 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Jun 04, 2014 at 11:34:13PM +0200, Stephane Eranian wrote:
>> @@ -2020,12 +2050,29 @@ static void intel_pmu_cpu_starting(int cpu)
>>
>>       if (x86_pmu.lbr_sel_map)
>>               cpuc->lbr_sel = &cpuc->shared_regs->regs[EXTRA_REG_LBR];
>> +
>> +     if (x86_pmu.flags & PMU_FL_EXCL_CNTRS) {
>> +             for_each_cpu(i, topology_thread_cpumask(cpu)) {
>> +                     struct intel_excl_cntrs *c;
>> +
>> +                     c = per_cpu(cpu_hw_events, i).excl_cntrs;
>> +                     if (c && c->core_id == core_id) {
>> +                             cpuc->kfree_on_online[1] = cpuc->excl_cntrs;
>> +                             cpuc->excl_cntrs = c;
>> +                             cpuc->excl_thread_id = 1;
>> +                             break;
>> +                     }
>> +             }
>> +             cpuc->excl_cntrs->core_id = core_id;
>> +             cpuc->excl_cntrs->refcnt++;
>> +     }
>>  }
>
> > This hard assumes there's only ever 2 threads, which is true and I
> suppose more in arch/x86 will come apart the moment Intel makes a chip
> with more, still, do we have topology_thread_id() or so to cure this?

I assume your comment refers to kfree_on_online[].
This code is specific to the HT bug, so yes, it assumes 2 threads and that
only one entry of the two excl_cntrs structs needs to be freed.
Doing otherwise would require a list which would never be used to its full
potential.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 4/9] perf/x86: add cross-HT counter exclusion infrastructure
  2014-06-10 11:53     ` Stephane Eranian
@ 2014-06-10 12:26       ` Peter Zijlstra
  0 siblings, 0 replies; 59+ messages in thread
From: Peter Zijlstra @ 2014-06-10 12:26 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, mingo, ak, Jiri Olsa, Yan, Zheng, Maria Dimakopoulou

[-- Attachment #1: Type: text/plain, Size: 1821 bytes --]

On Tue, Jun 10, 2014 at 01:53:45PM +0200, Stephane Eranian wrote:
> On Thu, Jun 5, 2014 at 10:29 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Wed, Jun 04, 2014 at 11:34:13PM +0200, Stephane Eranian wrote:
> >> @@ -2020,12 +2050,29 @@ static void intel_pmu_cpu_starting(int cpu)
> >>
> >>       if (x86_pmu.lbr_sel_map)
> >>               cpuc->lbr_sel = &cpuc->shared_regs->regs[EXTRA_REG_LBR];
> >> +
> >> +     if (x86_pmu.flags & PMU_FL_EXCL_CNTRS) {
> >> +             for_each_cpu(i, topology_thread_cpumask(cpu)) {
> >> +                     struct intel_excl_cntrs *c;
> >> +
> >> +                     c = per_cpu(cpu_hw_events, i).excl_cntrs;
> >> +                     if (c && c->core_id == core_id) {
> >> +                             cpuc->kfree_on_online[1] = cpuc->excl_cntrs;
> >> +                             cpuc->excl_cntrs = c;
> >> +                             cpuc->excl_thread_id = 1;
> >> +                             break;
> >> +                     }
> >> +             }
> >> +             cpuc->excl_cntrs->core_id = core_id;
> >> +             cpuc->excl_cntrs->refcnt++;
> >> +     }
> >>  }
> >
> > > This hard assumes there's only ever 2 threads, which is true and I
> > suppose more in arch/x86 will come apart the moment Intel makes a chip
> > with more, still, do we have topology_thread_id() or so to cure this?
> 
> I assume your comment is relative to kfree_on_online[].
> This code is specific to the HT bug, so yes, it assumes 2 threads and that
> only one entry of the two excl_cntrs structs needs to be freed.
> Doing otherwise, would require a list and will never be used to its full
> potential.

That, and ->excl_thread_id = 1; I was thinking that if we somehow had
4 threads, some of them would need to have 2 or 3 in there.
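
A sketch of what a generic per-thread index could look like if parts with
more than 2 SMT threads ever needed this; illustrative only, the posted
series deliberately assumes 2 threads:

        /*
         * Derive a 0-based SMT thread index from the thread sibling mask.
         * Not part of the posted patches.
         */
        static int smt_thread_index(int cpu)
        {
                int i, idx = 0;

                for_each_cpu(i, topology_thread_cpumask(cpu)) {
                        if (i == cpu)
                                return idx;
                        idx++;
                }
                return 0;
        }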



[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2014-06-10 12:26 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-04 21:34 [PATCH 0/9] perf/x86: implement HT counter corruption workaround Stephane Eranian
2014-06-04 21:34 ` [PATCH 1/9] perf,x86: rename er_flags to flags Stephane Eranian
2014-06-04 21:34 ` [PATCH 2/9] pref/x86: vectorize cpuc->kfree_on_online Stephane Eranian
2014-06-04 21:34 ` [PATCH 3/9] perf/x86: add 3 new scheduling callbacks Stephane Eranian
2014-06-04 21:34 ` [PATCH 4/9] perf/x86: add cross-HT counter exclusion infrastructure Stephane Eranian
2014-06-05  7:47   ` Peter Zijlstra
2014-06-05 10:51     ` Stephane Eranian
2014-06-05  8:04   ` Peter Zijlstra
2014-06-05 13:36     ` Maria Dimakopoulou
2014-06-05  8:29   ` Peter Zijlstra
2014-06-05 21:33     ` Andi Kleen
2014-06-05 21:38       ` Stephane Eranian
2014-06-10 11:53     ` Stephane Eranian
2014-06-10 12:26       ` Peter Zijlstra
2014-06-04 21:34 ` [PATCH 5/9] perf/x86: implement cross-HT corruption bug workaround Stephane Eranian
2014-06-05 13:38   ` Peter Zijlstra
2014-06-05 13:42   ` Peter Zijlstra
2014-06-05 13:48   ` Peter Zijlstra
2014-06-05 14:01     ` Maria Dimakopoulou
2014-06-05 14:04       ` Borislav Petkov
2014-06-05 14:11       ` Peter Zijlstra
2014-06-05 14:14         ` Peter Zijlstra
2014-06-05 14:24           ` Maria Dimakopoulou
2014-06-05 14:04   ` Peter Zijlstra
2014-06-05 14:15     ` Stephane Eranian
2014-06-05 14:21       ` Peter Zijlstra
2014-06-05 14:26         ` Stephane Eranian
2014-06-05 14:31           ` Peter Zijlstra
2014-06-05 15:31         ` Stephane Eranian
2014-06-04 21:34 ` [PATCH 6/9] perf/x86: enforce HT bug workaround for SNB/IVB/HSW Stephane Eranian
2014-06-04 21:34 ` [PATCH 7/9] perf/x86: enforce HT bug workaround with PEBS " Stephane Eranian
2014-06-04 21:34 ` [PATCH 8/9] perf/x86: fix intel_get_event_constraints() for dynamic constraints Stephane Eranian
2014-06-04 21:34 ` [PATCH 9/9] perf/x86: add syfs entry to disable HT bug workaround Stephane Eranian
2014-06-05  8:32   ` Matt Fleming
2014-06-05  9:29     ` Stephane Eranian
2014-06-05 10:01       ` Matt Fleming
2014-06-05 10:19         ` Stephane Eranian
2014-06-05 11:16           ` Matt Fleming
2014-06-05 12:02             ` Stephane Eranian
2014-06-05 13:27               ` Borislav Petkov
2014-06-05 13:42                 ` Stephane Eranian
2014-06-05 14:03                   ` Borislav Petkov
2014-06-05 14:45                     ` Maria Dimakopoulou
2014-06-05 15:17                       ` Borislav Petkov
2014-06-05 16:39                         ` Maria Dimakopoulou
2014-06-05 16:47                           ` Stephane Eranian
2014-06-05 16:52                           ` Borislav Petkov
2014-06-05 18:00                             ` Maria Dimakopoulou
2014-06-05 23:29                               ` Andi Kleen
2014-06-06  8:28                                 ` Matt Fleming
2014-06-05 12:50             ` Peter Zijlstra
2014-06-05 12:55               ` Stephane Eranian
2014-06-05 12:59                 ` Peter Zijlstra
2014-06-05 13:16                   ` Stephane Eranian
2014-06-05 13:26                     ` Peter Zijlstra
2014-06-05 13:19       ` Peter Zijlstra
2014-06-05 13:26         ` Stephane Eranian
2014-06-04 22:28 ` [PATCH 0/9] perf/x86: implement HT counter corruption workaround Andi Kleen
2014-06-05 12:45   ` Stephane Eranian
