* Re: [V2] x86: mce: Bugfixes, cleanups and a new CMCI poll version
  2012-07-18 19:59 [V2] x86: mce: Bugfixes, cleanups and a new CMCI poll version Chen Gong
@ 2012-07-18  8:07 ` Borislav Petkov
  2012-07-18 19:59 ` [PATCH 1/5] x86: mce: Disable preemption when calling raise_local() Chen Gong
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2012-07-18  8:07 UTC (permalink / raw)
  To: Chen Gong; +Cc: tglx, tony.luck, x86, linux-kernel

On Wed, Jul 18, 2012 at 03:59:29PM -0400, Chen Gong wrote:
> [PATCH 1/5] x86: mce: Disable preemption when calling raise_local()
> [PATCH 2/5] x86: mce: Serialize mce injection
> [PATCH 3/5] x86: mce: Split timer init
> [PATCH 4/5] x86: mce: Remove the frozen cases in the hotplug code
> [PATCH 5/5] x86: mce: Add cmci poll mode

Some (if not all) patches were authored by tglx but the single patch
emails are missing a "From:" line at the beginning of the mail
containing him as the author.

Please sort out who's the author of each patch and since it seems you
have a local git branch you're sending, git-commit has an --author
option with which you can fixup the authorship.

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551


* [V2] x86: mce: Bugfixes, cleanups and a new CMCI poll version
@ 2012-07-18 19:59 Chen Gong
  2012-07-18  8:07 ` Borislav Petkov
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Chen Gong @ 2012-07-18 19:59 UTC (permalink / raw)
  To: tglx; +Cc: tony.luck, bp, x86, linux-kernel

[PATCH 1/5] x86: mce: Disable preemption when calling raise_local()
[PATCH 2/5] x86: mce: Serialize mce injection
[PATCH 3/5] x86: mce: Split timer init
[PATCH 4/5] x86: mce: Remove the frozen cases in the hotplug code
[PATCH 5/5] x86: mce: Add cmci poll mode

The following series fixes a few interesting bugs (found by review in
the context of the CMCI poll effort), cleans up the timer/hotplug code,
and finishes with a consolidated version of the CMCI poll
implementation. This series is based on Linus' tree.

  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git



* [PATCH 1/5] x86: mce: Disable preemption when calling raise_local()
  2012-07-18 19:59 [V2] x86: mce: Bugfixes, cleanups and a new CMCI poll version Chen Gong
  2012-07-18  8:07 ` Borislav Petkov
@ 2012-07-18 19:59 ` Chen Gong
  2012-07-18 19:59 ` [PATCH 2/5] x86: mce: Serialize mce injection Chen Gong
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Chen Gong @ 2012-07-18 19:59 UTC (permalink / raw)
  To: tglx; +Cc: tony.luck, bp, x86, linux-kernel, Chen Gong

raise_mce() has a code path which does not disable preemption when
raise_local() is called. The per-CPU variable access in raise_local()
only works with preemption disabled. So that code path was either
never tested or never tested with CONFIG_DEBUG_PREEMPT enabled.

Add the missing preempt_disable/enable() pair around the call.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/mcheck/mce-inject.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-inject.c b/arch/x86/kernel/cpu/mcheck/mce-inject.c
index fc4beb3..753746f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-inject.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-inject.c
@@ -194,7 +194,11 @@ static void raise_mce(struct mce *m)
 		put_online_cpus();
 	} else
 #endif
+	{
+		preempt_disable();
 		raise_local();
+		preempt_enable();
+	}
 }
 
 /* Error injection interface */
-- 
1.7.10.4



* [PATCH 2/5] x86: mce: Serialize mce injection
  2012-07-18 19:59 [V2] x86: mce: Bugfixes, cleanups and a new CMCI poll version Chen Gong
  2012-07-18  8:07 ` Borislav Petkov
  2012-07-18 19:59 ` [PATCH 1/5] x86: mce: Disable preemption when calling raise_local() Chen Gong
@ 2012-07-18 19:59 ` Chen Gong
  2012-07-18 19:59 ` [PATCH 3/5] x86: mce: Split timer init Chen Gong
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Chen Gong @ 2012-07-18 19:59 UTC (permalink / raw)
  To: tglx; +Cc: tony.luck, bp, x86, linux-kernel, Chen Gong

raise_mce() fiddles with global state, but lacks any kind of
serialization.

Add a mutex around the raise_mce() call, so concurrent writers do not
stomp on each other's toes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
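
Editorial note, not part of the patch: the serialization described above
can be tried out in userspace. The following is a minimal sketch using
POSIX threads instead of the kernel mutex API. All names in it
(injected_events, inject_mutex, raise_event, writer, run_writers) are
hypothetical stand-ins, and the unprotected read-modify-write in
raise_event() only mimics "fiddling with global state"; it is not what
raise_mce() actually does.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the global state that injection touches. */
static long injected_events;
static pthread_mutex_t inject_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for raise_mce(): a non-atomic read-modify-write. */
static void raise_event(void)
{
	long v = injected_events;

	injected_events = v + 1;
}

/* Stand-in for mce_write(): hold the mutex only around the injection. */
static void *writer(void *arg)
{
	long n = *(long *)arg;

	for (long i = 0; i < n; i++) {
		pthread_mutex_lock(&inject_mutex);
		raise_event();
		pthread_mutex_unlock(&inject_mutex);
	}
	return NULL;
}

/* Run up to 16 concurrent writers and return the final event count. */
static long run_writers(int nthreads, long per_thread)
{
	pthread_t tid[16];

	if (nthreads > 16)
		nthreads = 16;
	injected_events = 0;
	for (int i = 0; i < nthreads; i++)
		pthread_create(&tid[i], NULL, writer, &per_thread);
	for (int i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);
	return injected_events;
}
```

With the lock in place, run_writers(4, 10000) accounts for every
increment. Dropping the lock/unlock pair makes lost updates possible,
which is the kind of stomping the patch guards against.
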
 arch/x86/kernel/cpu/mcheck/mce-inject.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-inject.c b/arch/x86/kernel/cpu/mcheck/mce-inject.c
index 753746f..ddc72f8 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-inject.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-inject.c
@@ -78,6 +78,7 @@ static void raise_exception(struct mce *m, struct pt_regs *pregs)
 }
 
 static cpumask_var_t mce_inject_cpumask;
+static DEFINE_MUTEX(mce_inject_mutex);
 
 static int mce_raise_notify(unsigned int cmd, struct pt_regs *regs)
 {
@@ -229,7 +230,10 @@ static ssize_t mce_write(struct file *filp, const char __user *ubuf,
 	 * so do it a jiffie or two later everywhere.
 	 */
 	schedule_timeout(2);
+
+	mutex_lock(&mce_inject_mutex);
 	raise_mce(&m);
+	mutex_unlock(&mce_inject_mutex);
 	return usize;
 }
 
-- 
1.7.10.4



* [PATCH 3/5] x86: mce: Split timer init
  2012-07-18 19:59 [V2] x86: mce: Bugfixes, cleanups and a new CMCI poll version Chen Gong
                   ` (2 preceding siblings ...)
  2012-07-18 19:59 ` [PATCH 2/5] x86: mce: Serialize mce injection Chen Gong
@ 2012-07-18 19:59 ` Chen Gong
  2012-07-18 19:59 ` [PATCH 4/5] x86: mce: Remove the frozen cases in the hotplug code Chen Gong
  2012-07-18 19:59 ` [PATCH 5/5] x86: mce: Add cmci poll mode Chen Gong
  5 siblings, 0 replies; 9+ messages in thread
From: Chen Gong @ 2012-07-18 19:59 UTC (permalink / raw)
  To: tglx; +Cc: tony.luck, bp, x86, linux-kernel, Chen Gong

Split the timer init function into an init part and a start part, so
the start part can replace the open coded version in CPU_DOWN_FAILED.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |   25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index da27c5d..9bc425f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1554,23 +1554,28 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
 	}
 }
 
-static void __mcheck_cpu_init_timer(void)
+static void mce_start_timer(unsigned int cpu, struct timer_list *t)
 {
-	struct timer_list *t = &__get_cpu_var(mce_timer);
 	unsigned long iv = check_interval * HZ;
 
-	setup_timer(t, mce_timer_fn, smp_processor_id());
+	__this_cpu_write(mce_next_interval, iv);
 
-	if (mce_ignore_ce)
+	if (mce_ignore_ce || !iv)
 		return;
 
-	__this_cpu_write(mce_next_interval, iv);
-	if (!iv)
-		return;
 	t->expires = round_jiffies(jiffies + iv);
 	add_timer_on(t, smp_processor_id());
 }
 
+static void __mcheck_cpu_init_timer(void)
+{
+	struct timer_list *t = &__get_cpu_var(mce_timer);
+	unsigned int cpu = smp_processor_id();
+
+	setup_timer(t, mce_timer_fn, cpu);
+	mce_start_timer(cpu, t);
+}
+
 /* Handle unconfigured int18 (should never happen) */
 static void unexpected_machine_check(struct pt_regs *regs, long error_code)
 {
@@ -2275,12 +2280,8 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		break;
 	case CPU_DOWN_FAILED:
 	case CPU_DOWN_FAILED_FROZEN:
-		if (!mce_ignore_ce && check_interval) {
-			t->expires = round_jiffies(jiffies +
-					per_cpu(mce_next_interval, cpu));
-			add_timer_on(t, cpu);
-		}
 		smp_call_function_single(cpu, mce_reenable_cpu, &action, 1);
+		mce_start_timer(cpu, t);
 		break;
 	case CPU_POST_DEAD:
 		/* intentionally ignoring frozen here */
-- 
1.7.10.4



* [PATCH 4/5] x86: mce: Remove the frozen cases in the hotplug code
  2012-07-18 19:59 [V2] x86: mce: Bugfixes, cleanups and a new CMCI poll version Chen Gong
                   ` (3 preceding siblings ...)
  2012-07-18 19:59 ` [PATCH 3/5] x86: mce: Split timer init Chen Gong
@ 2012-07-18 19:59 ` Chen Gong
  2012-07-18 19:59 ` [PATCH 5/5] x86: mce: Add cmci poll mode Chen Gong
  5 siblings, 0 replies; 9+ messages in thread
From: Chen Gong @ 2012-07-18 19:59 UTC (permalink / raw)
  To: tglx; +Cc: tony.luck, bp, x86, linux-kernel, Chen Gong

No point in having double cases if we can simply mask the FROZEN bit
out.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
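
Editorial note, not part of the patch: a small standalone sketch of the
masking idea. The constant values are assumed to mirror the kernel's
hotplug notifier definitions and are here for illustration only;
enum branch, pick_branch() and wants_rediscover() are hypothetical names
that merely report which part of the callback would run.

```c
/* Illustrative constants; assumed to match the hotplug notifier values. */
#define CPU_ONLINE		0x0002
#define CPU_DOWN_PREPARE	0x0005
#define CPU_DOWN_FAILED		0x0006
#define CPU_DEAD		0x0007
#define CPU_POST_DEAD		0x0009
#define CPU_TASKS_FROZEN	0x0010

/* Hypothetical labels for the branch of the callback that would run. */
enum branch { BR_NONE, BR_ONLINE, BR_DEAD, BR_DOWN_PREPARE, BR_DOWN_FAILED };

/* One case per event: masking folds each _FROZEN variant onto the plain one. */
static enum branch pick_branch(unsigned long action)
{
	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_ONLINE:
		return BR_ONLINE;
	case CPU_DEAD:
		return BR_DEAD;
	case CPU_DOWN_PREPARE:
		return BR_DOWN_PREPARE;
	case CPU_DOWN_FAILED:
		return BR_DOWN_FAILED;
	}
	return BR_NONE;
}

/* CPU_POST_DEAD is compared unmasked, so its frozen variant is ignored. */
static int wants_rediscover(unsigned long action)
{
	return action == CPU_POST_DEAD;
}
```

pick_branch(CPU_ONLINE | CPU_TASKS_FROZEN) selects the same branch as
pick_branch(CPU_ONLINE), while wants_rediscover() stays false for
CPU_POST_DEAD | CPU_TASKS_FROZEN, matching the "intentionally ignoring
frozen here" comment in the diff.
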
 arch/x86/kernel/cpu/mcheck/mce.c |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 9bc425f..eff73e7 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2260,34 +2260,32 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
 	unsigned int cpu = (unsigned long)hcpu;
 	struct timer_list *t = &per_cpu(mce_timer, cpu);
 
-	switch (action) {
+	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_ONLINE:
-	case CPU_ONLINE_FROZEN:
 		mce_device_create(cpu);
 		if (threshold_cpu_callback)
 			threshold_cpu_callback(action, cpu);
 		break;
 	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
 		if (threshold_cpu_callback)
 			threshold_cpu_callback(action, cpu);
 		mce_device_remove(cpu);
 		break;
 	case CPU_DOWN_PREPARE:
-	case CPU_DOWN_PREPARE_FROZEN:
 		del_timer_sync(t);
 		smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
 		break;
 	case CPU_DOWN_FAILED:
-	case CPU_DOWN_FAILED_FROZEN:
 		smp_call_function_single(cpu, mce_reenable_cpu, &action, 1);
 		mce_start_timer(cpu, t);
 		break;
-	case CPU_POST_DEAD:
+	}
+
+	if (action == CPU_POST_DEAD) {
 		/* intentionally ignoring frozen here */
 		cmci_rediscover(cpu);
-		break;
 	}
+
 	return NOTIFY_OK;
 }
 
-- 
1.7.10.4



* [PATCH 5/5] x86: mce: Add cmci poll mode
  2012-07-18 19:59 [V2] x86: mce: Bugfixes, cleanups and a new CMCI poll version Chen Gong
                   ` (4 preceding siblings ...)
  2012-07-18 19:59 ` [PATCH 4/5] x86: mce: Remove the frozen cases in the hotplug code Chen Gong
@ 2012-07-18 19:59 ` Chen Gong
  5 siblings, 0 replies; 9+ messages in thread
From: Chen Gong @ 2012-07-18 19:59 UTC (permalink / raw)
  To: tglx; +Cc: tony.luck, bp, x86, linux-kernel, Chen Gong

When CMCIs arrive faster than they can be handled, CMCI should be
disabled to avoid hanging the whole system. In the meantime, the CMCI
poll timer can be used to poll for corrected errors periodically. Once
no more CMCIs occur, the CMCI handler can be switched from poll mode
back to interrupt mode.

Currently every CPU core owns one poll timer, but one poll timer per
package (or socket) would probably be enough, because CMCI is
broadcast to all threads on the same socket. So if one CPU has a
problem, all the CPUs on the same socket have a problem.

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Chen Gong <gong.chen@linux.intel.com>
---
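
Editorial note, not part of the patch: the storm detection described
above boils down to counting interrupts inside a time window. Below is a
simplified userspace sketch of that idea. It uses a caller-supplied tick
value instead of jiffies and a plain struct instead of per-CPU
variables; HZ, struct storm_state and storm_detect() are assumptions
made for the illustration, and the switch back to interrupt mode is left
out.

```c
#include <stdbool.h>

#define HZ		1000		/* assumed tick rate for the sketch */
#define STORM_INTERVAL	(1 * HZ)	/* window in which events are counted */
#define STORM_THRESHOLD	5		/* events tolerated per window */

/* Hypothetical per-CPU bookkeeping, collapsed into one struct. */
struct storm_state {
	unsigned long time_stamp;	/* start of the current window */
	unsigned int count;		/* events seen in the current window */
	bool active;			/* true once a storm was declared */
};

/*
 * Called for every (simulated) interrupt. Returns true when the caller
 * should stop handling interrupts directly and rely on polling.
 */
static bool storm_detect(struct storm_state *s, unsigned long now)
{
	if (s->active)
		return true;

	/* Wrap-safe "now <= time_stamp + STORM_INTERVAL" comparison. */
	if ((long)(s->time_stamp + STORM_INTERVAL - now) >= 0) {
		s->count++;
	} else {
		/* Window expired: start a new one with this event. */
		s->count = 1;
		s->time_stamp = now;
	}

	if (s->count <= STORM_THRESHOLD)
		return false;

	s->active = true;
	return true;
}
```

Events spaced further apart than STORM_INTERVAL keep resetting the
window and never trip the detector; a sixth event inside one window
does. In the patch itself, the same point is where the interrupt gets
disabled and the poll timer is kicked.
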
 arch/x86/kernel/cpu/mcheck/mce-internal.h |   12 ++++
 arch/x86/kernel/cpu/mcheck/mce.c          |   47 ++++++++++++--
 arch/x86/kernel/cpu/mcheck/mce_intel.c    |   99 ++++++++++++++++++++++++++++-
 3 files changed, 151 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index ed44c8a..6a05c1d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -28,6 +28,18 @@ extern int mce_ser;
 
 extern struct mce_bank *mce_banks;
 
+#ifdef CONFIG_X86_MCE_INTEL
+unsigned long mce_intel_adjust_timer(unsigned long interval);
+void mce_intel_cmci_poll(void);
+void mce_intel_hcpu_update(unsigned long cpu);
+#else
+# define mce_intel_adjust_timer mce_adjust_timer_default
+static inline void mce_intel_cmci_poll(void) { }
+static inline void mce_intel_hcpu_update(unsigned long cpu) { }
+#endif
+
+void mce_timer_kick(unsigned long interval);
+
 #ifdef CONFIG_ACPI_APEI
 int apei_write_mce(struct mce *m);
 ssize_t apei_read_mce(struct mce *m, u64 *record_id);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index eff73e7..95738db0 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1256,6 +1256,14 @@ static unsigned long check_interval = 5 * 60; /* 5 minutes */
 static DEFINE_PER_CPU(unsigned long, mce_next_interval); /* in jiffies */
 static DEFINE_PER_CPU(struct timer_list, mce_timer);
 
+static unsigned long mce_adjust_timer_default(unsigned long interval)
+{
+	return interval;
+}
+
+static unsigned long (*mce_adjust_timer)(unsigned long interval) =
+	mce_adjust_timer_default;
+
 static void mce_timer_fn(unsigned long data)
 {
 	struct timer_list *t = &__get_cpu_var(mce_timer);
@@ -1266,6 +1274,7 @@ static void mce_timer_fn(unsigned long data)
 	if (mce_available(__this_cpu_ptr(&cpu_info))) {
 		machine_check_poll(MCP_TIMESTAMP,
 				&__get_cpu_var(mce_poll_banks));
+		mce_intel_cmci_poll();
 	}
 
 	/*
@@ -1273,14 +1282,38 @@ static void mce_timer_fn(unsigned long data)
 	 * polling interval, otherwise increase the polling interval.
 	 */
 	iv = __this_cpu_read(mce_next_interval);
-	if (mce_notify_irq())
+	if (mce_notify_irq()) {
 		iv = max(iv / 2, (unsigned long) HZ/100);
-	else
+	} else {
 		iv = min(iv * 2, round_jiffies_relative(check_interval * HZ));
+		iv = mce_adjust_timer(iv);
+	}
 	__this_cpu_write(mce_next_interval, iv);
+	/* Might have become 0 after CMCI storm subsided */
+	if (iv) {
+		t->expires = jiffies + iv;
+		add_timer_on(t, smp_processor_id());
+	}
+}
 
-	t->expires = jiffies + iv;
-	add_timer_on(t, smp_processor_id());
+/*
+ * Ensure that the timer is firing in @interval from now.
+ */
+void mce_timer_kick(unsigned long interval)
+{
+	struct timer_list *t = &__get_cpu_var(mce_timer);
+	unsigned long when = jiffies + interval;
+	unsigned long iv = __this_cpu_read(mce_next_interval);
+
+	if (timer_pending(t)) {
+		if (time_before(when, t->expires))
+			mod_timer_pinned(t, when);
+	} else {
+		t->expires = round_jiffies(when);
+		add_timer_on(t, smp_processor_id());
+	}
+	if (interval < iv)
+		__this_cpu_write(mce_next_interval, interval);
 }
 
 /* Must not be called in IRQ context where del_timer_sync() can deadlock */
@@ -1545,6 +1578,7 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
 	switch (c->x86_vendor) {
 	case X86_VENDOR_INTEL:
 		mce_intel_feature_init(c);
+		mce_adjust_timer = mce_intel_adjust_timer;
 		break;
 	case X86_VENDOR_AMD:
 		mce_amd_feature_init(c);
@@ -1556,7 +1590,7 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
 
 static void mce_start_timer(unsigned int cpu, struct timer_list *t)
 {
-	unsigned long iv = check_interval * HZ;
+	unsigned long iv = mce_adjust_timer(check_interval * HZ);
 
 	__this_cpu_write(mce_next_interval, iv);
 
@@ -2270,10 +2304,11 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		if (threshold_cpu_callback)
 			threshold_cpu_callback(action, cpu);
 		mce_device_remove(cpu);
+		mce_intel_hcpu_update(cpu);
 		break;
 	case CPU_DOWN_PREPARE:
-		del_timer_sync(t);
 		smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
+		del_timer_sync(t);
 		break;
 	case CPU_DOWN_FAILED:
 		smp_call_function_single(cpu, mce_reenable_cpu, &action, 1);
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 38e49bc..693bc7d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -15,6 +15,8 @@
 #include <asm/msr.h>
 #include <asm/mce.h>
 
+#include "mce-internal.h"
+
 /*
  * Support for Intel Correct Machine Check Interrupts. This allows
  * the CPU to raise an interrupt when a corrected machine check happened.
@@ -30,7 +32,22 @@ static DEFINE_PER_CPU(mce_banks_t, mce_banks_owned);
  */
 static DEFINE_RAW_SPINLOCK(cmci_discover_lock);
 
-#define CMCI_THRESHOLD 1
+#define CMCI_THRESHOLD		1
+#define CMCI_POLL_INTERVAL	(30 * HZ)
+#define CMCI_STORM_INTERVAL	(1 * HZ)
+#define CMCI_STORM_TRESHOLD	5
+
+static DEFINE_PER_CPU(unsigned long, cmci_time_stamp);
+static DEFINE_PER_CPU(unsigned int, cmci_storm_cnt);
+static DEFINE_PER_CPU(unsigned int, cmci_storm_state);
+
+enum {
+	CMCI_STORM_NONE,
+	CMCI_STORM_ACTIVE,
+	CMCI_STORM_SUBSIDED,
+};
+
+static atomic_t cmci_storm_on_cpus;
 
 static int cmci_supported(int *banks)
 {
@@ -53,6 +70,84 @@ static int cmci_supported(int *banks)
 	return !!(cap & MCG_CMCI_P);
 }
 
+void mce_intel_cmci_poll(void)
+{
+	if (__this_cpu_read(cmci_storm_state) == CMCI_STORM_NONE)
+		return;
+	machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_banks_owned));
+}
+
+void mce_intel_hcpu_update(unsigned long cpu)
+{
+	if (per_cpu(cmci_storm_state, cpu) == CMCI_STORM_ACTIVE)
+		atomic_dec(&cmci_storm_on_cpus);
+
+	per_cpu(cmci_storm_state, cpu) = CMCI_STORM_NONE;
+}
+
+unsigned long mce_intel_adjust_timer(unsigned long interval)
+{
+	if (interval < CMCI_POLL_INTERVAL)
+		return interval;
+
+	switch (__this_cpu_read(cmci_storm_state)) {
+	case CMCI_STORM_ACTIVE:
+		/*
+		 * We switch back to interrupt mode once the poll timer has
+		 * silenced itself. That means no events recorded and the
+		 * timer interval is back to our poll interval.
+		 */
+		__this_cpu_write(cmci_storm_state, CMCI_STORM_SUBSIDED);
+		atomic_dec(&cmci_storm_on_cpus);
+
+	case CMCI_STORM_SUBSIDED:
+		/*
+		 * We wait for all cpus to go back to SUBSIDED
+		 * state. When that happens we switch back to
+		 * interrupt mode.
+		 */
+		if (!atomic_read(&cmci_storm_on_cpus)) {
+			__this_cpu_write(cmci_storm_state, CMCI_STORM_NONE);
+			cmci_reenable();
+			cmci_recheck();
+		}
+		return CMCI_POLL_INTERVAL;
+	default:
+		/*
+		 * We have shiny weather, let the poll do whatever it
+		 * thinks.
+		 */
+		return interval;
+	}
+}
+
+static bool cmci_storm_detect(void)
+{
+	unsigned int cnt = __this_cpu_read(cmci_storm_cnt);
+	unsigned long ts = __this_cpu_read(cmci_time_stamp);
+	unsigned long now = jiffies;
+
+	if (__this_cpu_read(cmci_storm_state) != CMCI_STORM_NONE)
+		return true;
+
+	if (time_before_eq(now, ts + CMCI_STORM_INTERVAL)) {
+		cnt++;
+	} else {
+		cnt = 1;
+		__this_cpu_write(cmci_time_stamp, now);
+	}
+	__this_cpu_write(cmci_storm_cnt, cnt);
+
+	if (cnt <= CMCI_STORM_TRESHOLD)
+		return false;
+
+	cmci_clear();
+	__this_cpu_write(cmci_storm_state, CMCI_STORM_ACTIVE);
+	atomic_inc(&cmci_storm_on_cpus);
+	mce_timer_kick(CMCI_POLL_INTERVAL);
+	return true;
+}
+
 /*
  * The interrupt handler. This is called on every event.
  * Just call the poller directly to log any events.
@@ -61,6 +156,8 @@ static int cmci_supported(int *banks)
  */
 static void intel_threshold_interrupt(void)
 {
+	if (cmci_storm_detect())
+		return;
 	machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_banks_owned));
 	mce_notify_irq();
 }
-- 
1.7.10.4



* [PATCH 2/5] x86: mce: Serialize mce injection
  2012-07-19 17:59 [RESEND PATCH 0/5 V2] x86: mce: Bugfixes, cleanups and a new CMCI poll version Chen Gong
@ 2012-07-19 17:59 ` Chen Gong
  0 siblings, 0 replies; 9+ messages in thread
From: Chen Gong @ 2012-07-19 17:59 UTC (permalink / raw)
  To: tglx; +Cc: tony.luck, bp, x86, linux-kernel

From: Thomas Gleixner <tglx@linutronix.de>

raise_mce() fiddles with global state, but lacks any kind of
serialization.

Add a mutex around the raise_mce() call, so concurrent writers do not
stomp on each other's toes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/mcheck/mce-inject.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-inject.c b/arch/x86/kernel/cpu/mcheck/mce-inject.c
index 753746f..ddc72f8 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-inject.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-inject.c
@@ -78,6 +78,7 @@ static void raise_exception(struct mce *m, struct pt_regs *pregs)
 }
 
 static cpumask_var_t mce_inject_cpumask;
+static DEFINE_MUTEX(mce_inject_mutex);
 
 static int mce_raise_notify(unsigned int cmd, struct pt_regs *regs)
 {
@@ -229,7 +230,10 @@ static ssize_t mce_write(struct file *filp, const char __user *ubuf,
 	 * so do it a jiffie or two later everywhere.
 	 */
 	schedule_timeout(2);
+
+	mutex_lock(&mce_inject_mutex);
 	raise_mce(&m);
+	mutex_unlock(&mce_inject_mutex);
 	return usize;
 }
 
-- 
1.7.10.4



* [patch 2/5] x86: mce: Serialize mce injection
  2012-06-06 21:53 [patch 0/5] x86: mce: Bugfixes, cleanups and a new CMCI poll version Thomas Gleixner
@ 2012-06-06 21:53 ` Thomas Gleixner
  0 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2012-06-06 21:53 UTC (permalink / raw)
  To: LKML; +Cc: Tony Luck, Borislav Petkov, Chen Gong, x86, Peter Zijlstra

[-- Attachment #1: x86-mce-serialize-mce-injection.patch --]
[-- Type: text/plain, Size: 1006 bytes --]

raise_mce() fiddles with global state, but lacks any kind of
serialization.

Add a mutex around the raise_mce() call, so concurrent writers do not
stomp on each other's toes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/mcheck/mce-inject.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: tip/arch/x86/kernel/cpu/mcheck/mce-inject.c
===================================================================
--- tip.orig/arch/x86/kernel/cpu/mcheck/mce-inject.c
+++ tip/arch/x86/kernel/cpu/mcheck/mce-inject.c
@@ -78,6 +78,7 @@ static void raise_exception(struct mce *
 }
 
 static cpumask_var_t mce_inject_cpumask;
+static DEFINE_MUTEX(mce_inject_mutex);
 
 static int mce_raise_notify(unsigned int cmd, struct pt_regs *regs)
 {
@@ -229,7 +230,10 @@ static ssize_t mce_write(struct file *fi
 	 * so do it a jiffie or two later everywhere.
 	 */
 	schedule_timeout(2);
+
+	mutex_lock(&mce_inject_mutex);
 	raise_mce(&m);
+	mutex_unlock(&mce_inject_mutex);
 	return usize;
 }
 
