All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] x86, mce: misc fix/cleanups, cont
@ 2011-06-17  8:37 Hidetoshi Seto
  2011-06-17  8:40 ` [PATCH 1/8] x86, mce: stop calling del_timer_sync() from interrupt Hidetoshi Seto
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Hidetoshi Seto @ 2011-06-17  8:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Luck, Tony,
	Borislav Petkov

This patch set contains small patches, newly invented through the
recent discussion about error recovery etc.
Not drastic, but small improvements.

(This set is based on today's tip/ras/core branch.)

Thanks,
H.Seto

 arch/x86/include/asm/mce.h             |    2 -
 arch/x86/kernel/cpu/mcheck/mce.c       |  340 ++++++++++++++++----------------
 arch/x86/kernel/cpu/mcheck/mce_intel.c |    2 +-
 3 files changed, 173 insertions(+), 171 deletions(-)

Hidetoshi Seto (8):
      x86, mce: stop calling del_timer_sync() from interrupt
      x86, mce: remove redundant mce_available() checks
      x86, mce: introduce mce_timer_add()
      x86, mce: rename bootparam parser
      x86, mce: introduce mce_sysdev_init()
      x86, mce: introduce mce_memory_failure_process()
      x86, mce: rework use of TIF_MCE_NOTIFY
      x86, mce, edac: call edac_mce_parse() once per a record


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/8] x86, mce: stop calling del_timer_sync() from interrupt
  2011-06-17  8:37 [PATCH 0/8] x86, mce: misc fix/cleanups, cont Hidetoshi Seto
@ 2011-06-17  8:40 ` Hidetoshi Seto
  2011-06-17 13:56   ` Borislav Petkov
  2011-08-26 10:50   ` Borislav Petkov
  2011-06-17  8:42 ` [PATCH 2/8] x86, mce: remove redundant mce_available() checks Hidetoshi Seto
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 25+ messages in thread
From: Hidetoshi Seto @ 2011-06-17  8:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Luck, Tony,
	Borislav Petkov

Function del_timer_sync() has WARN_ON(in_irq()) in it because
calling it from interrupt context can cause deadlock if it
interrupts the target timer running.

In MCE code, del_timer_sync() is used with on_each_cpu() in
some parts for sysfs files:
  bank*, check_interval, cmci_disabled and ignore_ce.

However use of on_each_cpu() results in calling the function
passed as the argument in the interrupt context.  It means you
can see a flood of warnings from del_timer_sync() by a simple
file access, for example:

  echo 300 > /sys/devices/system/machinecheck/machinecheck0/check_interval

Fortunately these MCE specific files are rare-used and AFAIK
only few MCE geeks experience this warning on write.

To remove the warning (for my happy hacking), move timer deletion
outside of the interrupt context ;-)

v2: update patch description

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |   22 ++++++++++++++++------
 1 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 08363b0..42fc8d2 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1140,6 +1140,17 @@ static void mce_start_timer(unsigned long data)
 	add_timer_on(t, smp_processor_id());
 }
 
+/* Must not be called from interrupt where del_timer_sync() can deadlock */
+static void mce_timer_delete_all(void)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		if (mce_available(&per_cpu(cpu_info, cpu)))
+			del_timer_sync(&per_cpu(mce_timer, cpu));
+	}
+}
+
 static void mce_do_trigger(struct work_struct *work)
 {
 	call_usermodehelper(mce_helper, mce_helper_argv, NULL, UMH_NO_WAIT);
@@ -1750,7 +1761,6 @@ static struct syscore_ops mce_syscore_ops = {
 
 static void mce_cpu_restart(void *data)
 {
-	del_timer_sync(&__get_cpu_var(mce_timer));
 	if (!mce_available(__this_cpu_ptr(&cpu_info)))
 		return;
 	__mcheck_cpu_init_generic();
@@ -1760,16 +1770,15 @@ static void mce_cpu_restart(void *data)
 /* Reinit MCEs after user configuration changes */
 static void mce_restart(void)
 {
+	mce_timer_delete_all();
 	on_each_cpu(mce_cpu_restart, NULL, 1);
 }
 
 /* Toggle features for corrected errors */
-static void mce_disable_ce(void *all)
+static void mce_disable_cmci(void *data)
 {
 	if (!mce_available(__this_cpu_ptr(&cpu_info)))
 		return;
-	if (all)
-		del_timer_sync(&__get_cpu_var(mce_timer));
 	cmci_clear();
 }
 
@@ -1852,7 +1861,8 @@ static ssize_t set_ignore_ce(struct sys_device *s,
 	if (mce_ignore_ce ^ !!new) {
 		if (new) {
 			/* disable ce features */
-			on_each_cpu(mce_disable_ce, (void *)1, 1);
+			mce_timer_delete_all();
+			on_each_cpu(mce_disable_cmci, NULL, 1);
 			mce_ignore_ce = 1;
 		} else {
 			/* enable ce features */
@@ -1875,7 +1885,7 @@ static ssize_t set_cmci_disabled(struct sys_device *s,
 	if (mce_cmci_disabled ^ !!new) {
 		if (new) {
 			/* disable cmci */
-			on_each_cpu(mce_disable_ce, NULL, 1);
+			on_each_cpu(mce_disable_cmci, NULL, 1);
 			mce_cmci_disabled = 1;
 		} else {
 			/* enable cmci */
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/8] x86, mce: remove redundant mce_available() checks
  2011-06-17  8:37 [PATCH 0/8] x86, mce: misc fix/cleanups, cont Hidetoshi Seto
  2011-06-17  8:40 ` [PATCH 1/8] x86, mce: stop calling del_timer_sync() from interrupt Hidetoshi Seto
@ 2011-06-17  8:42 ` Hidetoshi Seto
  2011-06-17 14:39   ` Borislav Petkov
  2011-06-17  8:43 ` [PATCH 3/8] x86, mce: introduce mce_timer_add() Hidetoshi Seto
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Hidetoshi Seto @ 2011-06-17  8:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Luck, Tony,
	Borislav Petkov

I noticed that there is no need to check mce_available() on runtime.
In fact we don't support asymmetric configuration such as mixture of
varied processors which some are MCE capable while others are not.

Finally only 2 checks in init code path are enough for now.

When !mce_available():
  - in mcheck_cpu_init():
    - mce timer is not initialized
    - cmci is not enabled
  - and later in mce_init_device():
    - sysfs files are not created
    - hotplug notifier is not registered
So we don't need checks at all in these functions which is never
called when !mce_available().

For safety I also add a warning which is given when bad unsupported
asymmetric configuration is detected.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 arch/x86/include/asm/mce.h             |    2 -
 arch/x86/kernel/cpu/mcheck/mce.c       |   34 ++++++++++---------------------
 arch/x86/kernel/cpu/mcheck/mce_intel.c |    2 +-
 3 files changed, 12 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 716b48a..77825dd 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -180,8 +180,6 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c);
 static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
 #endif
 
-int mce_available(struct cpuinfo_x86 *c);
-
 DECLARE_PER_CPU(unsigned, mce_exception_count);
 DECLARE_PER_CPU(unsigned, mce_poll_count);
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 42fc8d2..205b334 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -449,7 +449,7 @@ static int mce_ring_add(unsigned long pfn)
 	return 0;
 }
 
-int mce_available(struct cpuinfo_x86 *c)
+static int mce_available(struct cpuinfo_x86 *c)
 {
 	if (mce_disabled)
 		return 0;
@@ -1121,10 +1121,7 @@ static void mce_start_timer(unsigned long data)
 
 	WARN_ON(smp_processor_id() != data);
 
-	if (mce_available(__this_cpu_ptr(&cpu_info))) {
-		machine_check_poll(MCP_TIMESTAMP,
-				&__get_cpu_var(mce_poll_banks));
-	}
+	machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_poll_banks));
 
 	/*
 	 * Alert userspace if needed.  If we logged an MCE, reduce the
@@ -1146,8 +1143,7 @@ static void mce_timer_delete_all(void)
 	int cpu;
 
 	for_each_online_cpu(cpu) {
-		if (mce_available(&per_cpu(cpu_info, cpu)))
-			del_timer_sync(&per_cpu(mce_timer, cpu));
+		del_timer_sync(&per_cpu(mce_timer, cpu));
 	}
 }
 
@@ -1418,8 +1414,15 @@ void __cpuinit mcheck_cpu_init(struct cpuinfo_x86 *c)
 	if (__mcheck_cpu_ancient_init(c))
 		return;
 
-	if (!mce_available(c))
+	if (!mce_available(c)) {
+		/*
+		 * Asymmetric configurations are not supported today.
+		 * If mce_banks is allocated there must be a cpu passed here.
+		 */
+		WARN_ON(!mce_disabled && mce_banks);
+		mce_disabled = 1;
 		return;
+	}
 
 	if (__mcheck_cpu_cap_init() < 0 || __mcheck_cpu_apply_quirks(c) < 0) {
 		mce_disabled = 1;
@@ -1761,8 +1764,6 @@ static struct syscore_ops mce_syscore_ops = {
 
 static void mce_cpu_restart(void *data)
 {
-	if (!mce_available(__this_cpu_ptr(&cpu_info)))
-		return;
 	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_timer();
 }
@@ -1777,15 +1778,11 @@ static void mce_restart(void)
 /* Toggle features for corrected errors */
 static void mce_disable_cmci(void *data)
 {
-	if (!mce_available(__this_cpu_ptr(&cpu_info)))
-		return;
 	cmci_clear();
 }
 
 static void mce_enable_ce(void *all)
 {
-	if (!mce_available(__this_cpu_ptr(&cpu_info)))
-		return;
 	cmci_reenable();
 	cmci_recheck();
 	if (all)
@@ -1946,9 +1943,6 @@ static __cpuinit int mce_sysdev_create(unsigned int cpu)
 	int err;
 	int i, j;
 
-	if (!mce_available(&boot_cpu_data))
-		return -EIO;
-
 	memset(&sysdev->kobj, 0, sizeof(struct kobject));
 	sysdev->id  = cpu;
 	sysdev->cls = &mce_sysdev_class;
@@ -2006,9 +2000,6 @@ static void __cpuinit mce_disable_cpu(void *h)
 	unsigned long action = *(unsigned long *)h;
 	int i;
 
-	if (!mce_available(__this_cpu_ptr(&cpu_info)))
-		return;
-
 	if (!(action & CPU_TASKS_FROZEN))
 		cmci_clear();
 	for (i = 0; i < banks; i++) {
@@ -2024,9 +2015,6 @@ static void __cpuinit mce_reenable_cpu(void *h)
 	unsigned long action = *(unsigned long *)h;
 	int i;
 
-	if (!mce_available(__this_cpu_ptr(&cpu_info)))
-		return;
-
 	if (!(action & CPU_TASKS_FROZEN))
 		cmci_reenable();
 	for (i = 0; i < banks; i++) {
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 8694ef5..393516c 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -130,7 +130,7 @@ void cmci_recheck(void)
 	unsigned long flags;
 	int banks;
 
-	if (!mce_available(__this_cpu_ptr(&cpu_info)) || !cmci_supported(&banks))
+	if (!cmci_supported(&banks))
 		return;
 	local_irq_save(flags);
 	machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_banks_owned));
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/8] x86, mce: introduce mce_timer_add()
  2011-06-17  8:37 [PATCH 0/8] x86, mce: misc fix/cleanups, cont Hidetoshi Seto
  2011-06-17  8:40 ` [PATCH 1/8] x86, mce: stop calling del_timer_sync() from interrupt Hidetoshi Seto
  2011-06-17  8:42 ` [PATCH 2/8] x86, mce: remove redundant mce_available() checks Hidetoshi Seto
@ 2011-06-17  8:43 ` Hidetoshi Seto
  2011-06-17 15:11   ` Borislav Petkov
  2011-06-17  8:44 ` [PATCH 4/8] x86, mce: rename bootparam parser Hidetoshi Seto
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Hidetoshi Seto @ 2011-06-17  8:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Luck, Tony,
	Borislav Petkov

It is too redundant to call setup_timer() every time when the timer is
going to be added.

This patch breaks __mcheck_cpu_init_timer() down, put setup part to init
code path and construct mce_timer_add() from the rests. Since there is no
strong reason to keep interval only when it back from hotplug event, this
patch also helps to kill duplicated code in hotplug notifier.

As the sideline this patch includes rename of mce_start_timer() to
mce_timer_run(), to group related functions with mce_timer_ prefix.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |   54 ++++++++++++++++++-------------------
 1 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 205b334..c3dad64 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1114,7 +1114,7 @@ static int check_interval = 5 * 60; /* 5 minutes */
 static DEFINE_PER_CPU(int, mce_next_interval); /* in jiffies */
 static DEFINE_PER_CPU(struct timer_list, mce_timer);
 
-static void mce_start_timer(unsigned long data)
+static void mce_timer_run(unsigned long data)
 {
 	struct timer_list *t = &per_cpu(mce_timer, data);
 	int *n;
@@ -1147,6 +1147,21 @@ static void mce_timer_delete_all(void)
 	}
 }
 
+static void mce_timer_add(unsigned long cpu)
+{
+	struct timer_list *t = &per_cpu(mce_timer, cpu);
+	int *n = &per_cpu(mce_next_interval, cpu);
+
+	if (mce_ignore_ce || !check_interval)
+		return;
+
+	/* reset next interval */
+	*n = check_interval * HZ;
+
+	t->expires = round_jiffies(jiffies + *n);
+	add_timer_on(t, cpu);
+}
+
 static void mce_do_trigger(struct work_struct *work)
 {
 	call_usermodehelper(mce_helper, mce_helper_argv, NULL, UMH_NO_WAIT);
@@ -1374,23 +1389,6 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
 	}
 }
 
-static void __mcheck_cpu_init_timer(void)
-{
-	struct timer_list *t = &__get_cpu_var(mce_timer);
-	int *n = &__get_cpu_var(mce_next_interval);
-
-	setup_timer(t, mce_start_timer, smp_processor_id());
-
-	if (mce_ignore_ce)
-		return;
-
-	*n = check_interval * HZ;
-	if (!*n)
-		return;
-	t->expires = round_jiffies(jiffies + *n);
-	add_timer_on(t, smp_processor_id());
-}
-
 /* Handle unconfigured int18 (should never happen) */
 static void unexpected_machine_check(struct pt_regs *regs, long error_code)
 {
@@ -1408,6 +1406,8 @@ void (*machine_check_vector)(struct pt_regs *, long error_code) =
  */
 void __cpuinit mcheck_cpu_init(struct cpuinfo_x86 *c)
 {
+	int cpu = smp_processor_id();
+
 	if (mce_disabled)
 		return;
 
@@ -1433,9 +1433,12 @@ void __cpuinit mcheck_cpu_init(struct cpuinfo_x86 *c)
 
 	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_vendor(c);
-	__mcheck_cpu_init_timer();
+
+	setup_timer(&__get_cpu_var(mce_timer), mce_timer_run, cpu);
 	INIT_WORK(&__get_cpu_var(mce_work), mce_process_work);
 	init_irq_work(&__get_cpu_var(mce_irq_work), &mce_irq_work_cb);
+
+	mce_timer_add(cpu);
 }
 
 /*
@@ -1765,7 +1768,7 @@ static struct syscore_ops mce_syscore_ops = {
 static void mce_cpu_restart(void *data)
 {
 	__mcheck_cpu_init_generic();
-	__mcheck_cpu_init_timer();
+	mce_timer_add(smp_processor_id());
 }
 
 /* Reinit MCEs after user configuration changes */
@@ -1786,7 +1789,7 @@ static void mce_enable_ce(void *all)
 	cmci_reenable();
 	cmci_recheck();
 	if (all)
-		__mcheck_cpu_init_timer();
+		mce_timer_add(smp_processor_id());
 }
 
 static struct sysdev_class mce_sysdev_class = {
@@ -2030,7 +2033,6 @@ static int __cpuinit
 mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
 	unsigned int cpu = (unsigned long)hcpu;
-	struct timer_list *t = &per_cpu(mce_timer, cpu);
 
 	switch (action) {
 	case CPU_ONLINE:
@@ -2047,16 +2049,12 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		break;
 	case CPU_DOWN_PREPARE:
 	case CPU_DOWN_PREPARE_FROZEN:
-		del_timer_sync(t);
+		del_timer_sync(&per_cpu(mce_timer, cpu));
 		smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
 		break;
 	case CPU_DOWN_FAILED:
 	case CPU_DOWN_FAILED_FROZEN:
-		if (!mce_ignore_ce && check_interval) {
-			t->expires = round_jiffies(jiffies +
-					   __get_cpu_var(mce_next_interval));
-			add_timer_on(t, cpu);
-		}
+		mce_timer_add(cpu);
 		smp_call_function_single(cpu, mce_reenable_cpu, &action, 1);
 		break;
 	case CPU_POST_DEAD:
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/8] x86, mce: rename bootparam parser
  2011-06-17  8:37 [PATCH 0/8] x86, mce: misc fix/cleanups, cont Hidetoshi Seto
                   ` (2 preceding siblings ...)
  2011-06-17  8:43 ` [PATCH 3/8] x86, mce: introduce mce_timer_add() Hidetoshi Seto
@ 2011-06-17  8:44 ` Hidetoshi Seto
  2011-06-17 15:41   ` Borislav Petkov
  2011-06-17  8:45 ` [PATCH 5/8] x86, mce: introduce mce_sysdev_init() Hidetoshi Seto
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Hidetoshi Seto @ 2011-06-17  8:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Luck, Tony,
	Borislav Petkov

Rename them with comprehensible prefix mcheck_setup.
(at least it looks better than current misleading name)
And relocate to put together setup codes.

	Before:			After:
	 mcheck_enable		 mcheck_setup
	 mcheck_disable		 mcheck_setup_old

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |   91 +++++++++++++++++++-------------------
 1 files changed, 45 insertions(+), 46 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index c3dad64..ad0e9fb 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1662,50 +1662,6 @@ static struct miscdevice mce_chrdev_device = {
 	&mce_chrdev_ops,
 };
 
-/*
- * mce=off Disables machine check
- * mce=no_cmci Disables CMCI
- * mce=dont_log_ce Clears corrected events silently, no log created for CEs.
- * mce=ignore_ce Disables polling and CMCI, corrected events are not cleared.
- * mce=TOLERANCELEVEL[,monarchtimeout] (number, see above)
- *	monarchtimeout is how long to wait for other CPUs on machine
- *	check, or 0 to not wait
- * mce=bootlog Log MCEs from before booting. Disabled by default on AMD.
- * mce=nobootlog Don't log MCEs from before booting.
- */
-static int __init mcheck_enable(char *str)
-{
-	if (*str == 0) {
-		enable_p5_mce();
-		return 1;
-	}
-	if (*str == '=')
-		str++;
-	if (!strcmp(str, "off"))
-		mce_disabled = 1;
-	else if (!strcmp(str, "no_cmci"))
-		mce_cmci_disabled = 1;
-	else if (!strcmp(str, "dont_log_ce"))
-		mce_dont_log_ce = 1;
-	else if (!strcmp(str, "ignore_ce"))
-		mce_ignore_ce = 1;
-	else if (!strcmp(str, "bootlog") || !strcmp(str, "nobootlog"))
-		mce_bootlog = (str[0] == 'b');
-	else if (isdigit(str[0])) {
-		get_option(&str, &tolerant);
-		if (*str == ',') {
-			++str;
-			get_option(&str, &monarch_timeout);
-		}
-	} else {
-		printk(KERN_INFO "mce argument %s ignored. Please use /sys\n",
-		       str);
-		return 0;
-	}
-	return 1;
-}
-__setup("mce", mcheck_enable);
-
 int __init mcheck_init(void)
 {
 	mcheck_intel_therm_init();
@@ -2120,14 +2076,57 @@ static __init int mcheck_init_device(void)
 device_initcall(mcheck_init_device);
 
 /*
+ * mce=off Disables machine check
+ * mce=no_cmci Disables CMCI
+ * mce=dont_log_ce Clears corrected events silently, no log created for CEs.
+ * mce=ignore_ce Disables polling and CMCI, corrected events are not cleared.
+ * mce=TOLERANCELEVEL[,monarchtimeout] (number, see above)
+ *	monarchtimeout is how long to wait for other CPUs on machine
+ *	check, or 0 to not wait
+ * mce=bootlog Log MCEs from before booting. Disabled by default on AMD.
+ * mce=nobootlog Don't log MCEs from before booting.
+ */
+static int __init mcheck_setup(char *str)
+{
+	if (*str == 0) {
+		enable_p5_mce();
+		return 1;
+	}
+	if (*str == '=')
+		str++;
+	if (!strcmp(str, "off"))
+		mce_disabled = 1;
+	else if (!strcmp(str, "no_cmci"))
+		mce_cmci_disabled = 1;
+	else if (!strcmp(str, "dont_log_ce"))
+		mce_dont_log_ce = 1;
+	else if (!strcmp(str, "ignore_ce"))
+		mce_ignore_ce = 1;
+	else if (!strcmp(str, "bootlog") || !strcmp(str, "nobootlog"))
+		mce_bootlog = (str[0] == 'b');
+	else if (isdigit(str[0])) {
+		get_option(&str, &tolerant);
+		if (*str == ',') {
+			++str;
+			get_option(&str, &monarch_timeout);
+		}
+	} else {
+		pr_info("mce argument %s ignored. Please use /sys\n", str);
+		return 0;
+	}
+	return 1;
+}
+__setup("mce", mcheck_setup);
+
+/*
  * Old style boot options parsing. Only for compatibility.
  */
-static int __init mcheck_disable(char *str)
+static int __init mcheck_setup_old(char *str)
 {
 	mce_disabled = 1;
 	return 1;
 }
-__setup("nomce", mcheck_disable);
+__setup("nomce", mcheck_setup_old);
 
 #ifdef CONFIG_DEBUG_FS
 struct dentry *mce_get_debugfs_dir(void)
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/8] x86, mce: introduce mce_sysdev_init()
  2011-06-17  8:37 [PATCH 0/8] x86, mce: misc fix/cleanups, cont Hidetoshi Seto
                   ` (3 preceding siblings ...)
  2011-06-17  8:44 ` [PATCH 4/8] x86, mce: rename bootparam parser Hidetoshi Seto
@ 2011-06-17  8:45 ` Hidetoshi Seto
  2011-06-17 16:32   ` Borislav Petkov
  2011-06-17  8:46 ` [PATCH 6/8] x86, mce: introduce mce_memory_failure_process() Hidetoshi Seto
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Hidetoshi Seto @ 2011-06-17  8:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Luck, Tony,
	Borislav Petkov

As a one of mce_sysdev_ family, coordinate a single function that
initializes sysfs hierarchy here.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |   65 ++++++++++++++++++++-----------------
 1 files changed, 35 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index ad0e9fb..0424299 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1953,6 +1953,39 @@ static __cpuinit void mce_sysdev_remove(unsigned int cpu)
 	cpumask_clear_cpu(cpu, mce_sysdev_initialized);
 }
 
+static __init int mce_sysdev_init(void)
+{
+	int err, i;
+
+	zalloc_cpumask_var(&mce_sysdev_initialized, GFP_KERNEL);
+
+	/* init attributes for each mce_banks */
+	for (i = 0; i < banks; i++) {
+		struct mce_bank *b = &mce_banks[i];
+		struct sysdev_attribute *a = &b->attr;
+
+		sysfs_attr_init(&a->attr);
+		a->attr.name	= b->attrname;
+		snprintf(b->attrname, ATTR_LEN, "bank%d", i);
+
+		a->attr.mode	= 0644;
+		a->show		= show_bank;
+		a->store	= set_bank;
+	}
+
+	err = sysdev_class_register(&mce_sysdev_class);
+	if (err)
+		return err;
+
+	for_each_online_cpu(i) {
+		err = mce_sysdev_create(i);
+		if (err)
+			return err;
+	}
+
+	return err;
+}
+
 /* Make sure there are no machine checks on offlined CPUs. */
 static void __cpuinit mce_disable_cpu(void *h)
 {
@@ -2025,46 +2058,18 @@ static struct notifier_block mce_cpu_notifier __cpuinitdata = {
 	.notifier_call = mce_cpu_callback,
 };
 
-static __init void mce_init_banks(void)
-{
-	int i;
-
-	for (i = 0; i < banks; i++) {
-		struct mce_bank *b = &mce_banks[i];
-		struct sysdev_attribute *a = &b->attr;
-
-		sysfs_attr_init(&a->attr);
-		a->attr.name	= b->attrname;
-		snprintf(b->attrname, ATTR_LEN, "bank%d", i);
-
-		a->attr.mode	= 0644;
-		a->show		= show_bank;
-		a->store	= set_bank;
-	}
-}
-
 static __init int mcheck_init_device(void)
 {
 	int err;
-	int i = 0;
 
 	if (!mce_available(&boot_cpu_data))
 		return -EIO;
 
-	zalloc_cpumask_var(&mce_sysdev_initialized, GFP_KERNEL);
-
-	mce_init_banks();
-
-	err = sysdev_class_register(&mce_sysdev_class);
+	/* register sysfs interface */
+	err = mce_sysdev_init();
 	if (err)
 		return err;
 
-	for_each_online_cpu(i) {
-		err = mce_sysdev_create(i);
-		if (err)
-			return err;
-	}
-
 	register_syscore_ops(&mce_syscore_ops);
 	register_hotcpu_notifier(&mce_cpu_notifier);
 
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 6/8] x86, mce: introduce mce_memory_failure_process()
  2011-06-17  8:37 [PATCH 0/8] x86, mce: misc fix/cleanups, cont Hidetoshi Seto
                   ` (4 preceding siblings ...)
  2011-06-17  8:45 ` [PATCH 5/8] x86, mce: introduce mce_sysdev_init() Hidetoshi Seto
@ 2011-06-17  8:46 ` Hidetoshi Seto
  2011-06-17 16:59   ` Borislav Petkov
  2011-06-17  8:49 ` [PATCH 7/8] x86, mce: rework use of TIF_MCE_NOTIFY Hidetoshi Seto
  2011-06-17  8:50 ` [PATCH 8/8] x86, mce, edac: call edac_mce_parse() once per a record Hidetoshi Seto
  7 siblings, 1 reply; 25+ messages in thread
From: Hidetoshi Seto @ 2011-06-17  8:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Luck, Tony,
	Borislav Petkov

And relocate related functions to near by mce_ring* routines.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |   26 +++++++++++++++++---------
 1 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 0424299..a118496 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -395,6 +395,8 @@ static inline void mce_gather_info(struct mce *m, struct pt_regs *regs)
 }
 
 /*
+ * mce_ring, mce_memory_failure: Support for Memory errors
+ *
  * Simple lockless ring to communicate PFNs from the exception handler with the
  * process context work function. This is vastly simplified because there's
  * only a single reader and a single writer.
@@ -449,6 +451,20 @@ static int mce_ring_add(unsigned long pfn)
 	return 0;
 }
 
+/* dummy to break dependency. actual code is in mm/memory-failure.c */
+void __attribute__((weak)) memory_failure(unsigned long pfn, int vector)
+{
+	pr_err("Action optional memory failure at %lx ignored\n", pfn);
+}
+
+static inline void mce_memory_failure_process(void)
+{
+	unsigned long pfn;
+
+	while (mce_ring_get(&pfn))
+		memory_failure(pfn, MCE_VECTOR);
+}
+
 static int mce_available(struct cpuinfo_x86 *c)
 {
 	if (mce_disabled)
@@ -1049,12 +1065,6 @@ out:
 }
 EXPORT_SYMBOL_GPL(do_machine_check);
 
-/* dummy to break dependency. actual code is in mm/memory-failure.c */
-void __attribute__((weak)) memory_failure(unsigned long pfn, int vector)
-{
-	printk(KERN_ERR "Action optional memory failure at %lx ignored\n", pfn);
-}
-
 /*
  * Called after mce notification in process context. This code
  * is allowed to sleep. Call the high level VM handler to process
@@ -1068,10 +1078,8 @@ void __attribute__((weak)) memory_failure(unsigned long pfn, int vector)
  */
 void mce_notify_process(void)
 {
-	unsigned long pfn;
 	mce_notify_irq();
-	while (mce_ring_get(&pfn))
-		memory_failure(pfn, MCE_VECTOR);
+	mce_memory_failure_process();
 }
 
 static void mce_process_work(struct work_struct *dummy)
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 7/8] x86, mce: rework use of TIF_MCE_NOTIFY
  2011-06-17  8:37 [PATCH 0/8] x86, mce: misc fix/cleanups, cont Hidetoshi Seto
                   ` (5 preceding siblings ...)
  2011-06-17  8:46 ` [PATCH 6/8] x86, mce: introduce mce_memory_failure_process() Hidetoshi Seto
@ 2011-06-17  8:49 ` Hidetoshi Seto
  2011-06-17  8:50 ` [PATCH 8/8] x86, mce, edac: call edac_mce_parse() once per a record Hidetoshi Seto
  7 siblings, 0 replies; 25+ messages in thread
From: Hidetoshi Seto @ 2011-06-17  8:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Luck, Tony,
	Borislav Petkov

The basic flow of MCE handler is summarized as follows:
 1) from NMI context:
      check hardware error registers, determine error severity,
      and then panic or request non-NMI context by irq_work() to
      continue the system.
 2) from (irq) context:
      call non-NMI safe functions,
      wake up loggers and schedule work if required
 3) from worker thread:
      process some time-consuming works like memory poisoning.

TIF_MCE_NOTIFY flag is relatively legacy and have used to do tasks of
2) and 3) on the thread context that interrupted by MCE.  However now
use of irq_work() and work-queue is enough for these tasks, so this
patch removes duplicated tasks in mce_notify_process().

As the result there is no task to be done in the interrupted context,
but soon if SRAR is supported there would be some thread-specific thing
for action required.  So (even if it will be removed soon) keep the flag
for such possible future use, until better mechanism is introduced.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |   28 ++++++++++++----------------
 1 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index a118496..bc8a02c 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1053,8 +1053,9 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 	if (kill_it && tolerant < 3)
 		force_sig(SIGBUS, current);
 
-	/* notify userspace ASAP */
-	set_thread_flag(TIF_MCE_NOTIFY);
+	/* Trap this thread before returning to user, for action required */
+	if (worst == MCE_AR_SEVERITY)
+		set_thread_flag(TIF_MCE_NOTIFY);
 
 	if (worst > 0)
 		mce_report_event(regs);
@@ -1066,25 +1067,22 @@ out:
 EXPORT_SYMBOL_GPL(do_machine_check);
 
 /*
- * Called after mce notification in process context. This code
- * is allowed to sleep. Call the high level VM handler to process
- * any corrupted pages.
- * Assume that the work queue code only calls this one at a time
- * per CPU.
- * Note we don't disable preemption, so this code might run on the wrong
- * CPU. In this case the event is picked up by the scheduled work queue.
- * This is merely a fast path to expedite processing in some common
- * cases.
+ * Called in process context that interrupted by MCE and marked with
+ * TIF_MCE_NOTFY, just before returning to erroneous userland.
+ * This code is allowed to sleep.
+ * Attempt possible recovery such as calling the high level VM handler to
+ * process any corrupted pages, and kill/signal current process if required.
  */
 void mce_notify_process(void)
 {
-	mce_notify_irq();
-	mce_memory_failure_process();
+	clear_thread_flag(TIF_MCE_NOTIFY);
+
+	/* TBD: do recovery for action required event */
 }
 
 static void mce_process_work(struct work_struct *dummy)
 {
-	mce_notify_process();
+	mce_memory_failure_process();
 }
 
 #ifdef CONFIG_X86_MCE_INTEL
@@ -1187,8 +1185,6 @@ int mce_notify_irq(void)
 	/* Not more than two messages every minute */
 	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
 
-	clear_thread_flag(TIF_MCE_NOTIFY);
-
 	if (test_and_clear_bit(0, &mce_need_notify)) {
 		/* wake processes polling /dev/mcelog */
 		wake_up_interruptible(&mce_chrdev_wait);
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 8/8] x86, mce, edac: call edac_mce_parse() once per a record
  2011-06-17  8:37 [PATCH 0/8] x86, mce: misc fix/cleanups, cont Hidetoshi Seto
                   ` (6 preceding siblings ...)
  2011-06-17  8:49 ` [PATCH 7/8] x86, mce: rework use of TIF_MCE_NOTIFY Hidetoshi Seto
@ 2011-06-17  8:50 ` Hidetoshi Seto
  2011-06-17 17:10   ` Borislav Petkov
  7 siblings, 1 reply; 25+ messages in thread
From: Hidetoshi Seto @ 2011-06-17  8:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Luck, Tony,
	Borislav Petkov

There is no reason to keep it in the cmpxchg loop.
(The loop continues until mcelog acquires buffer to save the record)

And use do-while for the loop here.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |   26 +++++++++++++-------------
 1 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index bc8a02c..f1d8ce1 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -148,21 +148,21 @@ void mce_log(struct mce *mce)
 	/* Emit the trace record: */
 	trace_mce_record(mce);
 
+	/*
+	 * If edac_mce is enabled, it will check the error type and will
+	 * process it, if it is a known error.
+	 * Otherwise, the error will be sent through mcelog interface.
+	 */
+	if (edac_mce_parse(mce))
+		return;
+
 	mce->finished = 0;
 	wmb();
-	for (;;) {
+
+	do {
 		entry = rcu_dereference_check_mce(mcelog.next);
 		for (;;) {
 			/*
-			 * If edac_mce is enabled, it will check the error type
-			 * and will process it, if it is a known error.
-			 * Otherwise, the error will be sent through mcelog
-			 * interface
-			 */
-			if (edac_mce_parse(mce))
-				return;
-
-			/*
 			 * When the buffer fills up discard new entries.
 			 * Assume that the earlier errors are the more
 			 * interesting ones:
@@ -181,10 +181,10 @@ void mce_log(struct mce *mce)
 		}
 		smp_rmb();
 		next = entry + 1;
-		if (cmpxchg(&mcelog.next, entry, next) == entry)
-			break;
-	}
+	} while (cmpxchg(&mcelog.next, entry, next) != entry);
+
 	memcpy(mcelog.entry + entry, mce, sizeof(struct mce));
+
 	wmb();
 	mcelog.entry[entry].finished = 1;
 	wmb();
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/8] x86, mce: stop calling del_timer_sync() from interrupt
  2011-06-17  8:40 ` [PATCH 1/8] x86, mce: stop calling del_timer_sync() from interrupt Hidetoshi Seto
@ 2011-06-17 13:56   ` Borislav Petkov
  2011-06-20  4:46     ` Hidetoshi Seto
  2011-08-26 10:50   ` Borislav Petkov
  1 sibling, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2011-06-17 13:56 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Luck, Tony

On Fri, Jun 17, 2011 at 04:40:36AM -0400, Hidetoshi Seto wrote:
> Function del_timer_sync() has WARN_ON(in_irq()) in it because
> calling it from interrupt context can cause deadlock if it
> interrupts the target timer running.

No need to explain the del_timer_sync() code here - just say that it's
not allowed to call it from an IRQ context.

> In MCE code, del_timer_sync() is used with on_each_cpu() in
> some parts for sysfs files:

	 ... for the following sysfs files:

>   bank*, check_interval, cmci_disabled and ignore_ce.
> 
> However use of on_each_cpu() results in calling the function
> passed as the argument in the interrupt context.  It means you
> can see a flood of warnings from del_timer_sync() by a simple
> file access, for example:
> 
>   echo 300 > /sys/devices/system/machinecheck/machinecheck0/check_interval

Good.

> 
> Fortunately these MCE specific files are rare-used and AFAIK

					  rarely used

> only few MCE geeks experience this warning on write.

MCE geeks ??? I wonder who those are :-)

> To remove the warning (for my happy hacking), move timer deletion
> outside of the interrupt context ;-)
> 
> v2: update patch description
> 
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |   22 ++++++++++++++++------
>  1 files changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 08363b0..42fc8d2 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1140,6 +1140,17 @@ static void mce_start_timer(unsigned long data)
>  	add_timer_on(t, smp_processor_id());
>  }
>  
> +/* Must not be called from interrupt where del_timer_sync() can deadlock */
> +static void mce_timer_delete_all(void)
> +{
> +	int cpu;
> +
> +	for_each_online_cpu(cpu) {
> +		if (mce_available(&per_cpu(cpu_info, cpu)))
> +			del_timer_sync(&per_cpu(mce_timer, cpu));
> +	}
> +}

You're adding the mce_available(..) check just to remove it in the next
patch. Since all those sysfs nodes are behind such a check, there's no
need for it here too.

>  static void mce_do_trigger(struct work_struct *work)
>  {
>  	call_usermodehelper(mce_helper, mce_helper_argv, NULL, UMH_NO_WAIT);
> @@ -1750,7 +1761,6 @@ static struct syscore_ops mce_syscore_ops = {
>  
>  static void mce_cpu_restart(void *data)
>  {
> -	del_timer_sync(&__get_cpu_var(mce_timer));
>  	if (!mce_available(__this_cpu_ptr(&cpu_info)))
>  		return;
>  	__mcheck_cpu_init_generic();
> @@ -1760,16 +1770,15 @@ static void mce_cpu_restart(void *data)
>  /* Reinit MCEs after user configuration changes */
>  static void mce_restart(void)
>  {
> +	mce_timer_delete_all();
>  	on_each_cpu(mce_cpu_restart, NULL, 1);
>  }
>  
>  /* Toggle features for corrected errors */
> -static void mce_disable_ce(void *all)
> +static void mce_disable_cmci(void *data)
>  {
>  	if (!mce_available(__this_cpu_ptr(&cpu_info)))
>  		return;
> -	if (all)
> -		del_timer_sync(&__get_cpu_var(mce_timer));
>  	cmci_clear();
>  }
>  
> @@ -1852,7 +1861,8 @@ static ssize_t set_ignore_ce(struct sys_device *s,
>  	if (mce_ignore_ce ^ !!new) {
>  		if (new) {
>  			/* disable ce features */
> -			on_each_cpu(mce_disable_ce, (void *)1, 1);
> +			mce_timer_delete_all();
> +			on_each_cpu(mce_disable_cmci, NULL, 1);
>  			mce_ignore_ce = 1;
>  		} else {
>  			/* enable ce features */
> @@ -1875,7 +1885,7 @@ static ssize_t set_cmci_disabled(struct sys_device *s,
>  	if (mce_cmci_disabled ^ !!new) {
>  		if (new) {
>  			/* disable cmci */
> -			on_each_cpu(mce_disable_ce, NULL, 1);
> +			on_each_cpu(mce_disable_cmci, NULL, 1);
>  			mce_cmci_disabled = 1;
>  		} else {
>  			/* enable cmci */
> -- 
> 1.7.1

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/8] x86, mce: remove redundant mce_available() checks
  2011-06-17  8:42 ` [PATCH 2/8] x86, mce: remove redundant mce_available() checks Hidetoshi Seto
@ 2011-06-17 14:39   ` Borislav Petkov
  2011-06-20  4:47     ` Hidetoshi Seto
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2011-06-17 14:39 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Luck, Tony

On Fri, Jun 17, 2011 at 04:42:01AM -0400, Hidetoshi Seto wrote:
> I noticed that there is no need to check mce_available() on runtime.
> In fact we don't support asymmetric configuration such as mixture of
> varied processors which some are MCE capable while others are not.

simplify the above sentence please.

> 
> Finally only 2 checks in init code path are enough for now.
> 
> When !mce_available():
>   - in mcheck_cpu_init():
>     - mce timer is not initialized
				   ... in __mcheck_cpu_init_timer()
>     - cmci is not enabled
>   - and later in mce_init_device():
>     - sysfs files are not created
>     - hotplug notifier is not registered
> So we don't need checks at all in these functions which is never

s/which is/because they are/

> called when !mce_available().
> 
> For safety I also add a warning which is given when bad unsupported
> asymmetric configuration is detected.
> 
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
> ---
>  arch/x86/include/asm/mce.h             |    2 -
>  arch/x86/kernel/cpu/mcheck/mce.c       |   34 ++++++++++---------------------
>  arch/x86/kernel/cpu/mcheck/mce_intel.c |    2 +-
>  3 files changed, 12 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 716b48a..77825dd 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -180,8 +180,6 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c);
>  static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
>  #endif
>  
> -int mce_available(struct cpuinfo_x86 *c);
> -
>  DECLARE_PER_CPU(unsigned, mce_exception_count);
>  DECLARE_PER_CPU(unsigned, mce_poll_count);
>  
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 42fc8d2..205b334 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -449,7 +449,7 @@ static int mce_ring_add(unsigned long pfn)
>  	return 0;
>  }
>  
> -int mce_available(struct cpuinfo_x86 *c)
> +static int mce_available(struct cpuinfo_x86 *c)
>  {
>  	if (mce_disabled)
>  		return 0;
> @@ -1121,10 +1121,7 @@ static void mce_start_timer(unsigned long data)
>  
>  	WARN_ON(smp_processor_id() != data);
>  
> -	if (mce_available(__this_cpu_ptr(&cpu_info))) {
> -		machine_check_poll(MCP_TIMESTAMP,
> -				&__get_cpu_var(mce_poll_banks));
> -	}
> +	machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_poll_banks));
>  
>  	/*
>  	 * Alert userspace if needed.  If we logged an MCE, reduce the
> @@ -1146,8 +1143,7 @@ static void mce_timer_delete_all(void)
>  	int cpu;
>  
>  	for_each_online_cpu(cpu) {
> -		if (mce_available(&per_cpu(cpu_info, cpu)))
> -			del_timer_sync(&per_cpu(mce_timer, cpu));
> +		del_timer_sync(&per_cpu(mce_timer, cpu));
>  	}

no need for the parentheses but this should be part of the first patch
anyway.

>  }
>  
> @@ -1418,8 +1414,15 @@ void __cpuinit mcheck_cpu_init(struct cpuinfo_x86 *c)
>  	if (__mcheck_cpu_ancient_init(c))
>  		return;
>  
> -	if (!mce_available(c))
> +	if (!mce_available(c)) {
> +		/*
> +		 * Asymmetric configurations are not supported today.
> +		 * If mce_banks is allocated there must be a cpu passed here.
> +		 */
> +		WARN_ON(!mce_disabled && mce_banks);
> +		mce_disabled = 1;
>  		return;
> +	}

I don't think this will ever happen so why complicate the code
needlessly? This can only be executed if at least one of the cores on
the system has CPUID(1)_EDX, bits 7 and 14 cleared and that's just
silly.

Besides, mcheck_init_device() already confirms we don't support
MCE-asymmetric configs:

        if (!mce_available(&boot_cpu_data))
                return -EIO;

>  	if (__mcheck_cpu_cap_init() < 0 || __mcheck_cpu_apply_quirks(c) < 0) {
>  		mce_disabled = 1;
> @@ -1761,8 +1764,6 @@ static struct syscore_ops mce_syscore_ops = {
>  
>  static void mce_cpu_restart(void *data)
>  {
> -	if (!mce_available(__this_cpu_ptr(&cpu_info)))
> -		return;
>  	__mcheck_cpu_init_generic();
>  	__mcheck_cpu_init_timer();
>  }
> @@ -1777,15 +1778,11 @@ static void mce_restart(void)
>  /* Toggle features for corrected errors */
>  static void mce_disable_cmci(void *data)
>  {
> -	if (!mce_available(__this_cpu_ptr(&cpu_info)))
> -		return;
>  	cmci_clear();
>  }
>  
>  static void mce_enable_ce(void *all)
>  {
> -	if (!mce_available(__this_cpu_ptr(&cpu_info)))
> -		return;
>  	cmci_reenable();
>  	cmci_recheck();
>  	if (all)
> @@ -1946,9 +1943,6 @@ static __cpuinit int mce_sysdev_create(unsigned int cpu)
>  	int err;
>  	int i, j;
>  
> -	if (!mce_available(&boot_cpu_data))
> -		return -EIO;
> -
>  	memset(&sysdev->kobj, 0, sizeof(struct kobject));
>  	sysdev->id  = cpu;
>  	sysdev->cls = &mce_sysdev_class;
> @@ -2006,9 +2000,6 @@ static void __cpuinit mce_disable_cpu(void *h)
>  	unsigned long action = *(unsigned long *)h;
>  	int i;
>  
> -	if (!mce_available(__this_cpu_ptr(&cpu_info)))
> -		return;
> -
>  	if (!(action & CPU_TASKS_FROZEN))
>  		cmci_clear();
>  	for (i = 0; i < banks; i++) {
> @@ -2024,9 +2015,6 @@ static void __cpuinit mce_reenable_cpu(void *h)
>  	unsigned long action = *(unsigned long *)h;
>  	int i;
>  
> -	if (!mce_available(__this_cpu_ptr(&cpu_info)))
> -		return;
> -
>  	if (!(action & CPU_TASKS_FROZEN))
>  		cmci_reenable();
>  	for (i = 0; i < banks; i++) {
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> index 8694ef5..393516c 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> @@ -130,7 +130,7 @@ void cmci_recheck(void)
>  	unsigned long flags;
>  	int banks;
>  
> -	if (!mce_available(__this_cpu_ptr(&cpu_info)) || !cmci_supported(&banks))
> +	if (!cmci_supported(&banks))
>  		return;
>  	local_irq_save(flags);
>  	machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_banks_owned));
> -- 
> 1.7.1
> 
> 
> 

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/8] x86, mce: introduce mce_timer_add()
  2011-06-17  8:43 ` [PATCH 3/8] x86, mce: introduce mce_timer_add() Hidetoshi Seto
@ 2011-06-17 15:11   ` Borislav Petkov
  0 siblings, 0 replies; 25+ messages in thread
From: Borislav Petkov @ 2011-06-17 15:11 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Luck, Tony

On Fri, Jun 17, 2011 at 04:43:09AM -0400, Hidetoshi Seto wrote:
> It is too redundant to call setup_timer() every time when the timer is
> going to be added.
> 
> This patch breaks __mcheck_cpu_init_timer() down, put setup part to init
> code path and construct mce_timer_add() from the rests. Since there is no
> strong reason to keep interval only when it back from hotplug event, this
> patch also helps to kill duplicated code in hotplug notifier.
> 
> As the sideline this patch includes rename of mce_start_timer() to
> mce_timer_run(), to group related functions with mce_timer_ prefix.
> 
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>

> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |   54 ++++++++++++++++++-------------------
>  1 files changed, 26 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 205b334..c3dad64 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1114,7 +1114,7 @@ static int check_interval = 5 * 60; /* 5 minutes */
>  static DEFINE_PER_CPU(int, mce_next_interval); /* in jiffies */
>  static DEFINE_PER_CPU(struct timer_list, mce_timer);
>  
> -static void mce_start_timer(unsigned long data)
> +static void mce_timer_run(unsigned long data)
>  {
>  	struct timer_list *t = &per_cpu(mce_timer, data);
>  	int *n;
> @@ -1147,6 +1147,21 @@ static void mce_timer_delete_all(void)
>  	}
>  }
>  
> +static void mce_timer_add(unsigned long cpu)
> +{
> +	struct timer_list *t = &per_cpu(mce_timer, cpu);
> +	int *n = &per_cpu(mce_next_interval, cpu);
> +
> +	if (mce_ignore_ce || !check_interval)
> +		return;
> +
> +	/* reset next interval */
> +	*n = check_interval * HZ;
> +
> +	t->expires = round_jiffies(jiffies + *n);
> +	add_timer_on(t, cpu);
> +}
> +
>  static void mce_do_trigger(struct work_struct *work)
>  {
>  	call_usermodehelper(mce_helper, mce_helper_argv, NULL, UMH_NO_WAIT);
> @@ -1374,23 +1389,6 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
>  	}
>  }
>  
> -static void __mcheck_cpu_init_timer(void)
> -{
> -	struct timer_list *t = &__get_cpu_var(mce_timer);
> -	int *n = &__get_cpu_var(mce_next_interval);
> -
> -	setup_timer(t, mce_start_timer, smp_processor_id());
> -
> -	if (mce_ignore_ce)
> -		return;
> -
> -	*n = check_interval * HZ;
> -	if (!*n)
> -		return;
> -	t->expires = round_jiffies(jiffies + *n);
> -	add_timer_on(t, smp_processor_id());
> -}
> -
>  /* Handle unconfigured int18 (should never happen) */
>  static void unexpected_machine_check(struct pt_regs *regs, long error_code)
>  {
> @@ -1408,6 +1406,8 @@ void (*machine_check_vector)(struct pt_regs *, long error_code) =
>   */
>  void __cpuinit mcheck_cpu_init(struct cpuinfo_x86 *c)
>  {
> +	int cpu = smp_processor_id();
> +
>  	if (mce_disabled)
>  		return;
>  
> @@ -1433,9 +1433,12 @@ void __cpuinit mcheck_cpu_init(struct cpuinfo_x86 *c)
>  
>  	__mcheck_cpu_init_generic();
>  	__mcheck_cpu_init_vendor(c);
> -	__mcheck_cpu_init_timer();
> +
> +	setup_timer(&__get_cpu_var(mce_timer), mce_timer_run, cpu);
>  	INIT_WORK(&__get_cpu_var(mce_work), mce_process_work);
>  	init_irq_work(&__get_cpu_var(mce_irq_work), &mce_irq_work_cb);
> +
> +	mce_timer_add(cpu);
>  }
>  
>  /*
> @@ -1765,7 +1768,7 @@ static struct syscore_ops mce_syscore_ops = {
>  static void mce_cpu_restart(void *data)
>  {
>  	__mcheck_cpu_init_generic();
> -	__mcheck_cpu_init_timer();
> +	mce_timer_add(smp_processor_id());
>  }
>  
>  /* Reinit MCEs after user configuration changes */
> @@ -1786,7 +1789,7 @@ static void mce_enable_ce(void *all)
>  	cmci_reenable();
>  	cmci_recheck();
>  	if (all)
> -		__mcheck_cpu_init_timer();
> +		mce_timer_add(smp_processor_id());
>  }
>  
>  static struct sysdev_class mce_sysdev_class = {
> @@ -2030,7 +2033,6 @@ static int __cpuinit
>  mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
>  {
>  	unsigned int cpu = (unsigned long)hcpu;
> -	struct timer_list *t = &per_cpu(mce_timer, cpu);
>  
>  	switch (action) {
>  	case CPU_ONLINE:
> @@ -2047,16 +2049,12 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
>  		break;
>  	case CPU_DOWN_PREPARE:
>  	case CPU_DOWN_PREPARE_FROZEN:
> -		del_timer_sync(t);
> +		del_timer_sync(&per_cpu(mce_timer, cpu));
>  		smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
>  		break;
>  	case CPU_DOWN_FAILED:
>  	case CPU_DOWN_FAILED_FROZEN:
> -		if (!mce_ignore_ce && check_interval) {
> -			t->expires = round_jiffies(jiffies +
> -					   __get_cpu_var(mce_next_interval));
> -			add_timer_on(t, cpu);
> -		}
> +		mce_timer_add(cpu);
>  		smp_call_function_single(cpu, mce_reenable_cpu, &action, 1);
>  		break;
>  	case CPU_POST_DEAD:
> -- 
> 1.7.1
> 
> 
> 

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/8] x86, mce: rename bootparam parser
  2011-06-17  8:44 ` [PATCH 4/8] x86, mce: rename bootparam parser Hidetoshi Seto
@ 2011-06-17 15:41   ` Borislav Petkov
  2011-06-17 22:25     ` Luck, Tony
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2011-06-17 15:41 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Luck, Tony

On Fri, Jun 17, 2011 at 04:44:18AM -0400, Hidetoshi Seto wrote:
> Rename them with comprehensible prefix mcheck_setup.

The relocation is causing unneeded churn for no apparent reason.

> (at least it looks better than current misleading name)
> And relocate to put together setup codes.
> 
> 	Before:			After:
> 	 mcheck_enable		 mcheck_setup

Nah, let's call it mcheck_parse_boot_param...

> 	 mcheck_disable		 mcheck_setup_old

and leave this like this. "nomce" is the same as "mce=off" and frankly,
I'd like to remove this redundancy, thus no need to do the code
relocation. In addition, I don't think there are lots of systems running
with "nomce" so I really think we should drop it.

So Ingo, hpa, what is the proper way to remove early setup params? Maybe
through Documentation/feature-removal-schedule.txt?

> 
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |   91 +++++++++++++++++++-------------------
>  1 files changed, 45 insertions(+), 46 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index c3dad64..ad0e9fb 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1662,50 +1662,6 @@ static struct miscdevice mce_chrdev_device = {
>  	&mce_chrdev_ops,
>  };
>  
> -/*
> - * mce=off Disables machine check
> - * mce=no_cmci Disables CMCI
> - * mce=dont_log_ce Clears corrected events silently, no log created for CEs.
> - * mce=ignore_ce Disables polling and CMCI, corrected events are not cleared.
> - * mce=TOLERANCELEVEL[,monarchtimeout] (number, see above)
> - *	monarchtimeout is how long to wait for other CPUs on machine
> - *	check, or 0 to not wait
> - * mce=bootlog Log MCEs from before booting. Disabled by default on AMD.

Please remove "Disabled by default on AMD." while you're at it, since
this is not true anymore.

> - * mce=nobootlog Don't log MCEs from before booting.
> - */
> -static int __init mcheck_enable(char *str)
> -{
> -	if (*str == 0) {
> -		enable_p5_mce();
> -		return 1;
> -	}
> -	if (*str == '=')
> -		str++;
> -	if (!strcmp(str, "off"))
> -		mce_disabled = 1;
> -	else if (!strcmp(str, "no_cmci"))
> -		mce_cmci_disabled = 1;
> -	else if (!strcmp(str, "dont_log_ce"))
> -		mce_dont_log_ce = 1;
> -	else if (!strcmp(str, "ignore_ce"))
> -		mce_ignore_ce = 1;
> -	else if (!strcmp(str, "bootlog") || !strcmp(str, "nobootlog"))
> -		mce_bootlog = (str[0] == 'b');
> -	else if (isdigit(str[0])) {
> -		get_option(&str, &tolerant);
> -		if (*str == ',') {
> -			++str;
> -			get_option(&str, &monarch_timeout);
> -		}
> -	} else {
> -		printk(KERN_INFO "mce argument %s ignored. Please use /sys\n",
> -		       str);
> -		return 0;
> -	}
> -	return 1;
> -}
> -__setup("mce", mcheck_enable);
> -
>  int __init mcheck_init(void)
>  {
>  	mcheck_intel_therm_init();
> @@ -2120,14 +2076,57 @@ static __init int mcheck_init_device(void)
>  device_initcall(mcheck_init_device);
>  
>  /*
> + * mce=off Disables machine check
> + * mce=no_cmci Disables CMCI
> + * mce=dont_log_ce Clears corrected events silently, no log created for CEs.
> + * mce=ignore_ce Disables polling and CMCI, corrected events are not cleared.
> + * mce=TOLERANCELEVEL[,monarchtimeout] (number, see above)
> + *	monarchtimeout is how long to wait for other CPUs on machine
> + *	check, or 0 to not wait
> + * mce=bootlog Log MCEs from before booting. Disabled by default on AMD.
> + * mce=nobootlog Don't log MCEs from before booting.
> + */
> +static int __init mcheck_setup(char *str)
> +{
> +	if (*str == 0) {
> +		enable_p5_mce();
> +		return 1;
> +	}
> +	if (*str == '=')
> +		str++;
> +	if (!strcmp(str, "off"))
> +		mce_disabled = 1;
> +	else if (!strcmp(str, "no_cmci"))
> +		mce_cmci_disabled = 1;
> +	else if (!strcmp(str, "dont_log_ce"))
> +		mce_dont_log_ce = 1;
> +	else if (!strcmp(str, "ignore_ce"))
> +		mce_ignore_ce = 1;
> +	else if (!strcmp(str, "bootlog") || !strcmp(str, "nobootlog"))
> +		mce_bootlog = (str[0] == 'b');
> +	else if (isdigit(str[0])) {
> +		get_option(&str, &tolerant);
> +		if (*str == ',') {
> +			++str;
> +			get_option(&str, &monarch_timeout);
> +		}
> +	} else {
> +		pr_info("mce argument %s ignored. Please use /sys\n", str);
> +		return 0;
> +	}
> +	return 1;
> +}
> +__setup("mce", mcheck_setup);
> +
> +/*
>   * Old style boot options parsing. Only for compatibility.
>   */
> -static int __init mcheck_disable(char *str)
> +static int __init mcheck_setup_old(char *str)
>  {
>  	mce_disabled = 1;
>  	return 1;
>  }
> -__setup("nomce", mcheck_disable);
> +__setup("nomce", mcheck_setup_old);
>  
>  #ifdef CONFIG_DEBUG_FS
>  struct dentry *mce_get_debugfs_dir(void)
> -- 
> 1.7.1
> 
> 
> 

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/8] x86, mce: introduce mce_sysdev_init()
  2011-06-17  8:45 ` [PATCH 5/8] x86, mce: introduce mce_sysdev_init() Hidetoshi Seto
@ 2011-06-17 16:32   ` Borislav Petkov
  2011-06-20  4:48     ` Hidetoshi Seto
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2011-06-17 16:32 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Luck, Tony, Borislav Petkov

On Fri, Jun 17, 2011 at 04:45:44AM -0400, Hidetoshi Seto wrote:
> As a one of mce_sysdev_ family, coordinate a single function that
> initializes sysfs hierarchy here.
> 
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |   65 ++++++++++++++++++++-----------------
>  1 files changed, 35 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index ad0e9fb..0424299 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1953,6 +1953,39 @@ static __cpuinit void mce_sysdev_remove(unsigned int cpu)
>  	cpumask_clear_cpu(cpu, mce_sysdev_initialized);
>  }
>  
> +static __init int mce_sysdev_init(void)
> +{
> +	int err, i;
> +
> +	zalloc_cpumask_var(&mce_sysdev_initialized, GFP_KERNEL);
> +
> +	/* init attributes for each mce_banks */
> +	for (i = 0; i < banks; i++) {
> +		struct mce_bank *b = &mce_banks[i];
> +		struct sysdev_attribute *a = &b->attr;
> +
> +		sysfs_attr_init(&a->attr);
> +		a->attr.name	= b->attrname;
> +		snprintf(b->attrname, ATTR_LEN, "bank%d", i);
> +
> +		a->attr.mode	= 0644;
> +		a->show		= show_bank;
> +		a->store	= set_bank;
> +	}
> +
> +	err = sysdev_class_register(&mce_sysdev_class);
> +	if (err)
> +		return err;
> +
> +	for_each_online_cpu(i) {
> +		err = mce_sysdev_create(i);
> +		if (err)
> +			return err;
> +	}

Yeah, let's handle the error case here and unwind to a sane state:

	if (err)
		goto sysdev_unwind;
...

sysdev_unwind:
	while (--i > 0)
		mce_sysdev_remove(i);


> +
> +	return err;
> +}
> +
>  /* Make sure there are no machine checks on offlined CPUs. */
>  static void __cpuinit mce_disable_cpu(void *h)
>  {
> @@ -2025,46 +2058,18 @@ static struct notifier_block mce_cpu_notifier __cpuinitdata = {
>  	.notifier_call = mce_cpu_callback,
>  };
>  
> -static __init void mce_init_banks(void)
> -{
> -	int i;
> -
> -	for (i = 0; i < banks; i++) {
> -		struct mce_bank *b = &mce_banks[i];
> -		struct sysdev_attribute *a = &b->attr;
> -
> -		sysfs_attr_init(&a->attr);
> -		a->attr.name	= b->attrname;
> -		snprintf(b->attrname, ATTR_LEN, "bank%d", i);
> -
> -		a->attr.mode	= 0644;
> -		a->show		= show_bank;
> -		a->store	= set_bank;
> -	}
> -}
> -
>  static __init int mcheck_init_device(void)
>  {
>  	int err;
> -	int i = 0;
>  
>  	if (!mce_available(&boot_cpu_data))
>  		return -EIO;
>  
> -	zalloc_cpumask_var(&mce_sysdev_initialized, GFP_KERNEL);
> -
> -	mce_init_banks();
> -
> -	err = sysdev_class_register(&mce_sysdev_class);
> +	/* register sysfs interface */
> +	err = mce_sysdev_init();
>  	if (err)
>  		return err;
>  
> -	for_each_online_cpu(i) {
> -		err = mce_sysdev_create(i);
> -		if (err)
> -			return err;
> -	}
> -
>  	register_syscore_ops(&mce_syscore_ops);
>  	register_hotcpu_notifier(&mce_cpu_notifier);
>  
> -- 
> 1.7.1

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/8] x86, mce: introduce mce_memory_failure_process()
  2011-06-17  8:46 ` [PATCH 6/8] x86, mce: introduce mce_memory_failure_process() Hidetoshi Seto
@ 2011-06-17 16:59   ` Borislav Petkov
  0 siblings, 0 replies; 25+ messages in thread
From: Borislav Petkov @ 2011-06-17 16:59 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Luck, Tony, Borislav Petkov

On Fri, Jun 17, 2011 at 04:46:46AM -0400, Hidetoshi Seto wrote:
> And relocate related functions to near by mce_ring* routines.
> 
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |   26 +++++++++++++++++---------
>  1 files changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 0424299..a118496 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -395,6 +395,8 @@ static inline void mce_gather_info(struct mce *m, struct pt_regs *regs)
>  }
>  
>  /*
> + * mce_ring, mce_memory_failure: Support for Memory errors
> + *
>   * Simple lockless ring to communicate PFNs from the exception handler with the
>   * process context work function. This is vastly simplified because there's
>   * only a single reader and a single writer.
> @@ -449,6 +451,20 @@ static int mce_ring_add(unsigned long pfn)
>  	return 0;
>  }
>  
> +/* dummy to break dependency. actual code is in mm/memory-failure.c */
> +void __attribute__((weak)) memory_failure(unsigned long pfn, int vector)
> +{
> +	pr_err("Action optional memory failure at %lx ignored\n", pfn);
> +}
> +
> +static inline void mce_memory_failure_process(void)
> +{
> +	unsigned long pfn;
> +
> +	while (mce_ring_get(&pfn))
> +		memory_failure(pfn, MCE_VECTOR);
> +}
> +

Frankly, I don't see the need to carve out two lines of code into a
function to avoid duplication and causing needless churn for this.

Just do the

	while (..)
		memory_failure(..)

in the next patch and that's it. mce.c is already cluttered with too
much stuff anyway.

>  static int mce_available(struct cpuinfo_x86 *c)
>  {
>  	if (mce_disabled)
> @@ -1049,12 +1065,6 @@ out:
>  }
>  EXPORT_SYMBOL_GPL(do_machine_check);
>  
> -/* dummy to break dependency. actual code is in mm/memory-failure.c */
> -void __attribute__((weak)) memory_failure(unsigned long pfn, int vector)
> -{
> -	printk(KERN_ERR "Action optional memory failure at %lx ignored\n", pfn);
> -}
> -
>  /*
>   * Called after mce notification in process context. This code
>   * is allowed to sleep. Call the high level VM handler to process
> @@ -1068,10 +1078,8 @@ void __attribute__((weak)) memory_failure(unsigned long pfn, int vector)
>   */
>  void mce_notify_process(void)
>  {
> -	unsigned long pfn;
>  	mce_notify_irq();
> -	while (mce_ring_get(&pfn))
> -		memory_failure(pfn, MCE_VECTOR);
> +	mce_memory_failure_process();
>  }
>  
>  static void mce_process_work(struct work_struct *dummy)
> -- 
> 1.7.1
> 
> 
> 

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 8/8] x86, mce, edac: call edac_mce_parse() once per a record
  2011-06-17  8:50 ` [PATCH 8/8] x86, mce, edac: call edac_mce_parse() once per a record Hidetoshi Seto
@ 2011-06-17 17:10   ` Borislav Petkov
  0 siblings, 0 replies; 25+ messages in thread
From: Borislav Petkov @ 2011-06-17 17:10 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Luck, Tony

On Fri, Jun 17, 2011 at 04:50:48AM -0400, Hidetoshi Seto wrote:
> There is no reason to keep it in the cmpxchg loop.
> (The loop continues until mcelog acquires buffer to save the record)
> 
> And use do-while for the loop here.
> 
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

Good catch.

Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>

> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |   26 +++++++++++++-------------
>  1 files changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index bc8a02c..f1d8ce1 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -148,21 +148,21 @@ void mce_log(struct mce *mce)
>  	/* Emit the trace record: */
>  	trace_mce_record(mce);
>  
> +	/*
> +	 * If edac_mce is enabled, it will check the error type and will
> +	 * process it, if it is a known error.
> +	 * Otherwise, the error will be sent through mcelog interface.
> +	 */
> +	if (edac_mce_parse(mce))
> +		return;
> +
>  	mce->finished = 0;
>  	wmb();
> -	for (;;) {
> +
> +	do {
>  		entry = rcu_dereference_check_mce(mcelog.next);
>  		for (;;) {
>  			/*
> -			 * If edac_mce is enabled, it will check the error type
> -			 * and will process it, if it is a known error.
> -			 * Otherwise, the error will be sent through mcelog
> -			 * interface
> -			 */
> -			if (edac_mce_parse(mce))
> -				return;
> -
> -			/*
>  			 * When the buffer fills up discard new entries.
>  			 * Assume that the earlier errors are the more
>  			 * interesting ones:
> @@ -181,10 +181,10 @@ void mce_log(struct mce *mce)
>  		}
>  		smp_rmb();
>  		next = entry + 1;
> -		if (cmpxchg(&mcelog.next, entry, next) == entry)
> -			break;
> -	}
> +	} while (cmpxchg(&mcelog.next, entry, next) != entry);
> +
>  	memcpy(mcelog.entry + entry, mce, sizeof(struct mce));
> +
>  	wmb();
>  	mcelog.entry[entry].finished = 1;
>  	wmb();
> -- 
> 1.7.1

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 4/8] x86, mce: rename bootparam parser
  2011-06-17 15:41   ` Borislav Petkov
@ 2011-06-17 22:25     ` Luck, Tony
  2011-06-18  8:38       ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Luck, Tony @ 2011-06-17 22:25 UTC (permalink / raw)
  To: Borislav Petkov, Hidetoshi Seto
  Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin

> and leave this like this. "nomce" is the same as "mce=off" and frankly,
> I'd like to remove this redundancy, thus no need to do the code
> relocation. In addition, I don't think there are lots of systems running
> with "nomce" so I really think we should drop it.
>
> So Ingo, hpa, what is the proper way to remove early setup params? Maybe
> through Documentation/feature-removal-schedule.txt?

Though it seems odd to me that anyone would want to turn mce off,
the fact that we have two ways to do so would indicate that people
do (or at least did) want to do this.

It seems like we'd need a long "deprecated" period (till all the
major OSVs turn out a new release) to remove this without surprises.
If the "nomce" option just disappears, then people using it will
not realize that their boot time argument was ignored (until they
see a reported error).

-Tony


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/8] x86, mce: rename bootparam parser
  2011-06-17 22:25     ` Luck, Tony
@ 2011-06-18  8:38       ` Borislav Petkov
  2011-06-20  4:48         ` Hidetoshi Seto
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2011-06-18  8:38 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, Hidetoshi Seto, linux-kernel, x86, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin

On Fri, Jun 17, 2011 at 03:25:51PM -0700, Luck, Tony wrote:
> > and leave this like this. "nomce" is the same as "mce=off" and frankly,
> > I'd like to remove this redundancy, thus no need to do the code
> > relocation. In addition, I don't think there are lots of systems running
> > with "nomce" so I really think we should drop it.
> >
> > So Ingo, hpa, what is the proper way to remove early setup params? Maybe
> > through Documentation/feature-removal-schedule.txt?
> 
> Though it seems odd to me that anyone would want to turn mce off,
> the fact that we have two ways to do so would indicate that people
> do (or at least did) want to do this.

Yeah, the only usecase I could think of for this is people doing their
own userspace DRAM ECC evaluation like google. But they'd still need to
have an #MC handler... or they could parse syslog for the output from
the default unexpected_machine_check().. hm.

> It seems like we'd need a long "deprecated" period (till all the
> major OSVs turn out a new release) to remove this without surprises.
> If the "nomce" option just disappears, then people using it will
> not realize that their boot time argument was ignored (until they
> see a reported error).

Yeah, we could do a grace period with a warning:

/*
 * Old style boot options parsing. Only for compatibility.
 */
static int __init mcheck_disable(char *str)
{
	pr_err("\"nomce\" boot param is deprecated. Please use \"mce=off\"\n");

	mce_disabled = 1;
	return 1;
}
__setup("nomce", mcheck_disable);

and then kill it.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/8] x86, mce: stop calling del_timer_sync() from interrupt
  2011-06-17 13:56   ` Borislav Petkov
@ 2011-06-20  4:46     ` Hidetoshi Seto
  2011-06-20  7:36       ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Hidetoshi Seto @ 2011-06-20  4:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Luck, Tony

(2011/06/17 22:56), Borislav Petkov wrote:
> On Fri, Jun 17, 2011 at 04:40:36AM -0400, Hidetoshi Seto wrote:
>> Function del_timer_sync() has WARN_ON(in_irq()) in it because
>> calling it from interrupt context can cause deadlock if it
>> interrupts the target timer running.
> 
> No need to explain the del_timer_sync() code here - just say that it's
> not allowed to call it from an IRQ context.
> 

I'd like to add "why not allowed" in brief too...
Anyway I'll update the patch description and make it short.

>> only few MCE geeks experience this warning on write.
> 
> MCE geeks ??? I wonder who those are :-)

I'm just joking ;-)

>> +/* Must not be called from interrupt where del_timer_sync() can deadlock */
>> +static void mce_timer_delete_all(void)
>> +{
>> +	int cpu;
>> +
>> +	for_each_online_cpu(cpu) {
>> +		if (mce_available(&per_cpu(cpu_info, cpu)))
>> +			del_timer_sync(&per_cpu(mce_timer, cpu));
>> +	}
>> +}
> 
> You're adding the mce_available(..) check just to remove it in the next
> patch. Since all those sysfs nodes are behind such a check, there's no
> need for it here too.

Maybe it would be better to change the order of patches to make
this fix after the removal of unnecessary mce_available().


Thanks,
H.Seto



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/8] x86, mce: remove redundant mce_available() checks
  2011-06-17 14:39   ` Borislav Petkov
@ 2011-06-20  4:47     ` Hidetoshi Seto
  2011-06-20  7:44       ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Hidetoshi Seto @ 2011-06-20  4:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Luck, Tony

(2011/06/17 23:39), Borislav Petkov wrote:
>> @@ -1418,8 +1414,15 @@ void __cpuinit mcheck_cpu_init(struct cpuinfo_x86 *c)
>>  	if (__mcheck_cpu_ancient_init(c))
>>  		return;
>>  
>> -	if (!mce_available(c))
>> +	if (!mce_available(c)) {
>> +		/*
>> +		 * Asymmetric configurations are not supported today.
>> +		 * If mce_banks is allocated there must be a cpu passed here.
>> +		 */
>> +		WARN_ON(!mce_disabled && mce_banks);
>> +		mce_disabled = 1;
>>  		return;
>> +	}
> 
> I don't think this will ever happen so why complicate the code
> needlessly? This can only be executed if at least one of the cores on
> the system has CPUID(1)_EDX, bits 7 and 14 cleared and that's just
> silly.

This is a guard against such silly situation, isn't it?
I don't think it looks so complicated and needless.

I think this code will not run on systems with single socket as
long as the processor vendors are in the right mind.
But in cases when it is numa with strange nodes or when it is
migrated to/from misconfigured virtual machine I can't say that
with certainty.

> 
> Besides, mcheck_init_device() already confirms we don't support
> MCE-asymmetric configs:
> 
>         if (!mce_available(&boot_cpu_data))
>                 return -EIO;

Without above check, mcheck_init_device() will not notice the
issue when boot cpu is mce capable but others are not.

Maybe I should have another patch to do well with silly
configurations, and let this patch to a simple cleanup.


Thanks,
H.Seto


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/8] x86, mce: introduce mce_sysdev_init()
  2011-06-17 16:32   ` Borislav Petkov
@ 2011-06-20  4:48     ` Hidetoshi Seto
  0 siblings, 0 replies; 25+ messages in thread
From: Hidetoshi Seto @ 2011-06-20  4:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Luck, Tony

(2011/06/18 1:32), Borislav Petkov wrote:
>> @@ -1953,6 +1953,39 @@ static __cpuinit void mce_sysdev_remove(unsigned int cpu)
>>  	cpumask_clear_cpu(cpu, mce_sysdev_initialized);
>>  }
>>  
>> +static __init int mce_sysdev_init(void)
>> +{
>> +	int err, i;
>> +
>> +	zalloc_cpumask_var(&mce_sysdev_initialized, GFP_KERNEL);
>> +
>> +	/* init attributes for each mce_banks */
>> +	for (i = 0; i < banks; i++) {
>> +		struct mce_bank *b = &mce_banks[i];
>> +		struct sysdev_attribute *a = &b->attr;
>> +
>> +		sysfs_attr_init(&a->attr);
>> +		a->attr.name	= b->attrname;
>> +		snprintf(b->attrname, ATTR_LEN, "bank%d", i);
>> +
>> +		a->attr.mode	= 0644;
>> +		a->show		= show_bank;
>> +		a->store	= set_bank;
>> +	}
>> +
>> +	err = sysdev_class_register(&mce_sysdev_class);
>> +	if (err)
>> +		return err;
>> +
>> +	for_each_online_cpu(i) {
>> +		err = mce_sysdev_create(i);
>> +		if (err)
>> +			return err;
>> +	}
> 
> Yeah, let's handle the error case here and unwind to a sane state:
> 
> 	if (err)
> 		goto sysdev_unwind;
> ...
> 
> sysdev_unwind:
> 	while (--i > 0)
> 		mce_sysdev_remove(i);
> 

Nice!
I'll do it.


Thanks,
H.Seto


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/8] x86, mce: rename bootparam parser
  2011-06-18  8:38       ` Borislav Petkov
@ 2011-06-20  4:48         ` Hidetoshi Seto
  0 siblings, 0 replies; 25+ messages in thread
From: Hidetoshi Seto @ 2011-06-20  4:48 UTC (permalink / raw)
  To: Borislav Petkov, Luck, Tony, Borislav Petkov, linux-kernel, x86,
	Ingo Molnar, Thomas Gleixner, H. Peter Anvin

(2011/06/18 17:38), Borislav Petkov wrote:
> On Fri, Jun 17, 2011 at 03:25:51PM -0700, Luck, Tony wrote:
>>> and leave this like this. "nomce" is the same as "mce=off" and frankly,
>>> I'd like to remove this redundancy, thus no need to do the code
>>> relocation. In addition, I don't think there are lots of systems running
>>> with "nomce" so I really think we should drop it.
>>>
>>> So Ingo, hpa, what is the proper way to remove early setup params? Maybe
>>> through Documentation/feature-removal-schedule.txt?
>>
>> Though it seems odd to me that anyone would want to turn mce off,
>> the fact that we have two ways to do so would indicate that people
>> do (or at least did) want to do this.
> 
> Yeah, the only usecase I could think of for this is people doing their
> own userspace DRAM ECC evaluation like google. But they'd still need to
> have an #MC handler... or they could parse syslog for the output from
> the default unexpected_machine_check().. hm.
> 
>> It seems like we'd need a long "deprecated" period (till all the
>> major OSVs turn out a new release) to remove this without surprises.
>> If the "nomce" option just disappears, then people using it will
>> not realize that their boot time argument was ignored (until they
>> see a reported error).
> 
> Yeah, we could do a grace period with a warning:
> 
> /*
>  * Old style boot options parsing. Only for compatibility.
>  */
> static int __init mcheck_disable(char *str)
> {
> 	pr_err("\"nomce\" boot param is deprecated. Please use \"mce=off\"\n");
> 
> 	mce_disabled = 1;
> 	return 1;
> }
> __setup("nomce", mcheck_disable);
> 
> and then kill it.

Thank you for giving a new direction!
I'll write a patch to "deprecate nomce" instead of this rename/cleanup.


Thanks,
H.Seto


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/8] x86, mce: stop calling del_timer_sync() from interrupt
  2011-06-20  4:46     ` Hidetoshi Seto
@ 2011-06-20  7:36       ` Borislav Petkov
  0 siblings, 0 replies; 25+ messages in thread
From: Borislav Petkov @ 2011-06-20  7:36 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Luck, Tony

On Mon, Jun 20, 2011 at 12:46:28AM -0400, Hidetoshi Seto wrote:
> > You're adding the mce_available(..) check just to remove it in the next
> > patch. Since all those sysfs nodes are behind such a check, there's no
> > need for it here too.
> 
> Maybe it would be better to change the order of patches to make
> this fix after the removal of unnecessary mce_available().

Yes, and then don't add it to mce_timer_delete_all because this functionality is
guarded my mce_available() checks in the sysfs nodes init path.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/8] x86, mce: remove redundant mce_available() checks
  2011-06-20  4:47     ` Hidetoshi Seto
@ 2011-06-20  7:44       ` Borislav Petkov
  0 siblings, 0 replies; 25+ messages in thread
From: Borislav Petkov @ 2011-06-20  7:44 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Luck, Tony

On Mon, Jun 20, 2011 at 12:47:57AM -0400, Hidetoshi Seto wrote:
> > I don't think this will ever happen so why complicate the code
> > needlessly? This can only be executed if at least one of the cores on
> > the system has CPUID(1)_EDX, bits 7 and 14 cleared and that's just
> > silly.
> 
> This is a guard against such silly situation, isn't it?
> I don't think it looks so complicated and needless.
> 
> I think this code will not run on systems with single socket as
> long as the processor vendors are in the right mind.
> But in cases when it is numa with strange nodes or when it is
> migrated to/from misconfigured virtual machine I can't say that
> with certainty.

Let me put it this way: are you talking about a concrete example where
there's machine with the BSP's MCA features enabled and the rest of the
cores are not or is this some theoretical thing you're conjecturing?

If it is the first I'd really like to know about it, if it's the second,
then my initial point stands - no need to handle purely theoretical
cases.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/8] x86, mce: stop calling del_timer_sync() from interrupt
  2011-06-17  8:40 ` [PATCH 1/8] x86, mce: stop calling del_timer_sync() from interrupt Hidetoshi Seto
  2011-06-17 13:56   ` Borislav Petkov
@ 2011-08-26 10:50   ` Borislav Petkov
  1 sibling, 0 replies; 25+ messages in thread
From: Borislav Petkov @ 2011-08-26 10:50 UTC (permalink / raw)
  To: Hidetoshi Seto, Luck, Tony
  Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, H. Peter Anvin

On Fri, Jun 17, 2011 at 04:40:36AM -0400, Hidetoshi Seto wrote:
> Function del_timer_sync() has WARN_ON(in_irq()) in it because
> calling it from interrupt context can cause deadlock if it
> interrupts the target timer running.
> 
> In MCE code, del_timer_sync() is used with on_each_cpu() in
> some parts for sysfs files:
>   bank*, check_interval, cmci_disabled and ignore_ce.
> 
> However use of on_each_cpu() results in calling the function
> passed as the argument in the interrupt context.  It means you
> can see a flood of warnings from del_timer_sync() by a simple
> file access, for example:
> 
>   echo 300 > /sys/devices/system/machinecheck/machinecheck0/check_interval
> 
> Fortunately these MCE specific files are rare-used and AFAIK
> only few MCE geeks experience this warning on write.
> 
> To remove the warning (for my happy hacking), move timer deletion
> outside of the interrupt context ;-)

I'm hitting this warning too in my testing here so I'd like to expedite
this patch going in. I'll send the version below upstream next week if
there are no objections:

--
>From 0e4fe50cd4ed6f5990173f4587ad07dd4782fa30 Mon Sep 17 00:00:00 2001
From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Date: Fri, 17 Jun 2011 04:40:36 -0400
Subject: [PATCH] x86, mce: Do not call del_timer_sync() in IRQ context

del_timer_sync() can cause a deadlock when called in interrupt context.
It is used with on_each_cpu() in some parts for sysfs files like bank*,
check_interval, cmci_disabled and ignore_ce.

However, use of on_each_cpu() results in calling the function passed
as the argument in interrupt context. This causes a flood of nested
warnings from del_timer_sync() (it runs on each CPU) caused even by a
simple file access like:

$ echo 300 > /sys/devices/system/machinecheck/machinecheck0/check_interval

Fortunately, these MCE-specific files are rarely used and AFAIK only few
MCE geeks experience this warning.

To remove the warning, move timer deletion outside of the interrupt
context.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |   20 ++++++++++++++------
 1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 08363b0..5b5ccee 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1140,6 +1140,15 @@ static void mce_start_timer(unsigned long data)
 	add_timer_on(t, smp_processor_id());
 }
 
+/* Must not be called in IRQ context where del_timer_sync() can deadlock */
+static void mce_timer_delete_all(void)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu)
+		del_timer_sync(&per_cpu(mce_timer, cpu));
+}
+
 static void mce_do_trigger(struct work_struct *work)
 {
 	call_usermodehelper(mce_helper, mce_helper_argv, NULL, UMH_NO_WAIT);
@@ -1750,7 +1759,6 @@ static struct syscore_ops mce_syscore_ops = {
 
 static void mce_cpu_restart(void *data)
 {
-	del_timer_sync(&__get_cpu_var(mce_timer));
 	if (!mce_available(__this_cpu_ptr(&cpu_info)))
 		return;
 	__mcheck_cpu_init_generic();
@@ -1760,16 +1768,15 @@ static void mce_cpu_restart(void *data)
 /* Reinit MCEs after user configuration changes */
 static void mce_restart(void)
 {
+	mce_timer_delete_all();
 	on_each_cpu(mce_cpu_restart, NULL, 1);
 }
 
 /* Toggle features for corrected errors */
-static void mce_disable_ce(void *all)
+static void mce_disable_cmci(void *data)
 {
 	if (!mce_available(__this_cpu_ptr(&cpu_info)))
 		return;
-	if (all)
-		del_timer_sync(&__get_cpu_var(mce_timer));
 	cmci_clear();
 }
 
@@ -1852,7 +1859,8 @@ static ssize_t set_ignore_ce(struct sys_device *s,
 	if (mce_ignore_ce ^ !!new) {
 		if (new) {
 			/* disable ce features */
-			on_each_cpu(mce_disable_ce, (void *)1, 1);
+			mce_timer_delete_all();
+			on_each_cpu(mce_disable_cmci, NULL, 1);
 			mce_ignore_ce = 1;
 		} else {
 			/* enable ce features */
@@ -1875,7 +1883,7 @@ static ssize_t set_cmci_disabled(struct sys_device *s,
 	if (mce_cmci_disabled ^ !!new) {
 		if (new) {
 			/* disable cmci */
-			on_each_cpu(mce_disable_ce, NULL, 1);
+			on_each_cpu(mce_disable_cmci, NULL, 1);
 			mce_cmci_disabled = 1;
 		} else {
 			/* enable cmci */
-- 
1.7.4.rc2


-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2011-08-26 10:50 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-17  8:37 [PATCH 0/8] x86, mce: misc fix/cleanups, cont Hidetoshi Seto
2011-06-17  8:40 ` [PATCH 1/8] x86, mce: stop calling del_timer_sync() from interrupt Hidetoshi Seto
2011-06-17 13:56   ` Borislav Petkov
2011-06-20  4:46     ` Hidetoshi Seto
2011-06-20  7:36       ` Borislav Petkov
2011-08-26 10:50   ` Borislav Petkov
2011-06-17  8:42 ` [PATCH 2/8] x86, mce: remove redundant mce_available() checks Hidetoshi Seto
2011-06-17 14:39   ` Borislav Petkov
2011-06-20  4:47     ` Hidetoshi Seto
2011-06-20  7:44       ` Borislav Petkov
2011-06-17  8:43 ` [PATCH 3/8] x86, mce: introduce mce_timer_add() Hidetoshi Seto
2011-06-17 15:11   ` Borislav Petkov
2011-06-17  8:44 ` [PATCH 4/8] x86, mce: rename bootparam parser Hidetoshi Seto
2011-06-17 15:41   ` Borislav Petkov
2011-06-17 22:25     ` Luck, Tony
2011-06-18  8:38       ` Borislav Petkov
2011-06-20  4:48         ` Hidetoshi Seto
2011-06-17  8:45 ` [PATCH 5/8] x86, mce: introduce mce_sysdev_init() Hidetoshi Seto
2011-06-17 16:32   ` Borislav Petkov
2011-06-20  4:48     ` Hidetoshi Seto
2011-06-17  8:46 ` [PATCH 6/8] x86, mce: introduce mce_memory_failure_process() Hidetoshi Seto
2011-06-17 16:59   ` Borislav Petkov
2011-06-17  8:49 ` [PATCH 7/8] x86, mce: rework use of TIF_MCE_NOTIFY Hidetoshi Seto
2011-06-17  8:50 ` [PATCH 8/8] x86, mce, edac: call edac_mce_parse() once per a record Hidetoshi Seto
2011-06-17 17:10   ` Borislav Petkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.