linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch] reproducible athlon mce fix
@ 2003-11-02  5:57 Geoffrey Lee
  2003-11-02  7:25 ` Willy Tarreau
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Geoffrey Lee @ 2003-11-02  5:57 UTC (permalink / raw)
  To: linux-kernel, davej

[-- Attachment #1: Type: text/plain, Size: 660 bytes --]

Hi all,


After switching from 2.4.22 to 2.6.0-test9, I have received reproducible
MCE non-fatal error check messages in my kernel log.  (For example, one
shows up right after my first scsi card init).

>From Dave Jones' patch here:

http://lists.insecure.org/lists/linux-kernel/2003/Sep/7362.html

and another message here:

http://lkml.org/lkml/2003/10/7/214

would seem to imply that Athlons don't like having their Bank 0 poked at,
though that's what non-fatal.c does.  Would it be correct to make sure
that that non-fatal.c starts at bank 1, if it is an Athlon?

Dave, is the following patch correct?  Booted and tested, no ill effects
so far ...


	- g.

[-- Attachment #2: mce-fix.patch --]
[-- Type: text/plain, Size: 428 bytes --]

--- linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c.orig	2003-11-02 13:31:43.000000000 +0800
+++ linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c	2003-11-02 13:34:37.000000000 +0800
@@ -30,7 +30,11 @@
 	int i;
 
 	preempt_disable(); 
+#if CONFIG_MK7
+	for (i=1; i<nr_mce_banks; i++) {
+#else
 	for (i=0; i<nr_mce_banks; i++) {
+#endif
 		rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);
 
 		if (high & (1<<31)) {

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] reproducible athlon mce fix
  2003-11-02  5:57 [patch] reproducible athlon mce fix Geoffrey Lee
@ 2003-11-02  7:25 ` Willy Tarreau
  2003-11-02  7:28   ` Geoffrey Lee
                     ` (2 more replies)
  2003-11-02 12:52 ` Dave Jones
  2003-11-02 18:25 ` Kronos
  2 siblings, 3 replies; 11+ messages in thread
From: Willy Tarreau @ 2003-11-02  7:25 UTC (permalink / raw)
  To: Geoffrey Lee; +Cc: linux-kernel, davej

Hi,

I don't know if the patch is correct, but :

On Sun, Nov 02, 2003 at 01:57:48PM +0800, Geoffrey Lee wrote:
>  	preempt_disable(); 
> +#if CONFIG_MK7
> +	for (i=1; i<nr_mce_banks; i++) {
> +#else
>  	for (i=0; i<nr_mce_banks; i++) {
> +#endif

Including opening braces within #if often fools editors such as emacs
which count them and don't know about #if. Then, editing the rest of
the file can become annoying because it simply thinks that there are
two embedded for loops.

Other solutions to this problem often include one of these constructs
which are unfortunately not as beautiful :

1) a still readable one
#if CONFIG_MK7
	i=1;
#else
 	i=0;
#endif
 	for (; i<nr_mce_banks; i++) {

2) when there's absolutely no choice (eg changing part of an
   expression...) something unreadable like this.

	for (
#if CONFIG_MK7
	i=1;
#else
 	i=0;
#endif
 	i<nr_mce_banks; i++) {

I'm sure other people have other solution that don't come to mind.

Cheers,
Willy


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] reproducible athlon mce fix
  2003-11-02  7:25 ` Willy Tarreau
@ 2003-11-02  7:28   ` Geoffrey Lee
  2003-11-02  8:59   ` Zwane Mwaikambo
  2003-11-10 23:55   ` bill davidsen
  2 siblings, 0 replies; 11+ messages in thread
From: Geoffrey Lee @ 2003-11-02  7:28 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, davej

On Sun, Nov 02, 2003 at 08:25:19AM +0100, Willy Tarreau wrote:
> Hi,
> 
> 
> Other solutions to this problem often include one of these constructs
> which are unfortunately not as beautiful :
> 


Hi,

Yep, I know it's ugly, but what I am really after is whether the patch
is correct or not, in terms of what it does.

If it's correct we can work out how to fix this properly, since non-fatal.c
is generic code and I am not sure if doing a CONFIG_MK7 like that is very
pretty there anyway ...

	- g.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] reproducible athlon mce fix
  2003-11-02  7:25 ` Willy Tarreau
  2003-11-02  7:28   ` Geoffrey Lee
@ 2003-11-02  8:59   ` Zwane Mwaikambo
  2003-11-10 23:55   ` bill davidsen
  2 siblings, 0 replies; 11+ messages in thread
From: Zwane Mwaikambo @ 2003-11-02  8:59 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Geoffrey Lee, linux-kernel, davej

On Sun, 2 Nov 2003, Willy Tarreau wrote:

> 1) a still readable one
> #if CONFIG_MK7
> 	i=1;
> #else
>  	i=0;
> #endif
>  	for (; i<nr_mce_banks; i++) {
> 
> 2) when there's absolutely no choice (eg changing part of an
>    expression...) something unreadable like this.
> 
> 	for (
> #if CONFIG_MK7
> 	i=1;
> #else
>  	i=0;
> #endif
>  	i<nr_mce_banks; i++) {
> 
> I'm sure other people have other solution that don't come to mind.

Perhaps..

#ifdef CONFIG_MK7
#define MCE_BANK_START	1
#else
#define MCE_BANK_START	0
#endif

for (i = MCE_BANK_START; nr_mce_banks; i++)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] reproducible athlon mce fix
  2003-11-02  5:57 [patch] reproducible athlon mce fix Geoffrey Lee
  2003-11-02  7:25 ` Willy Tarreau
@ 2003-11-02 12:52 ` Dave Jones
  2003-11-02 13:52   ` Geoffrey Lee
  2003-11-03  9:20   ` Geoffrey Lee
  2003-11-02 18:25 ` Kronos
  2 siblings, 2 replies; 11+ messages in thread
From: Dave Jones @ 2003-11-02 12:52 UTC (permalink / raw)
  To: Geoffrey Lee; +Cc: linux-kernel

On Sun, Nov 02, 2003 at 01:57:48PM +0800, Geoffrey Lee wrote:

 >  	preempt_disable(); 
 > +#if CONFIG_MK7
 > +	for (i=1; i<nr_mce_banks; i++) {
 > +#else
 >  	for (i=0; i<nr_mce_banks; i++) {
 > +#endif
 >  		rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);

This needs to be a runtime check. In 2.6, a K7 can boot
a P4 kernel, and vice versa.

		Dave

-- 
 Dave Jones     http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] reproducible athlon mce fix
  2003-11-02 12:52 ` Dave Jones
@ 2003-11-02 13:52   ` Geoffrey Lee
  2003-11-03  9:20   ` Geoffrey Lee
  1 sibling, 0 replies; 11+ messages in thread
From: Geoffrey Lee @ 2003-11-02 13:52 UTC (permalink / raw)
  To: Dave Jones, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 629 bytes --]

On Sun, Nov 02, 2003 at 12:52:03PM +0000, Dave Jones wrote:
> On Sun, Nov 02, 2003 at 01:57:48PM +0800, Geoffrey Lee wrote:
> 
>  >  	preempt_disable(); 
>  > +#if CONFIG_MK7
>  > +	for (i=1; i<nr_mce_banks; i++) {
>  > +#else
>  >  	for (i=0; i<nr_mce_banks; i++) {
>  > +#endif
>  >  		rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);
> 
> This needs to be a runtime check. In 2.6, a K7 can boot
> a P4 kernel, and vice versa.
> 

Yes, of course.

Would it be sufficient to check for the boot_cpu_data.x86 == 6 and
boot_cpu_data.x86_vendor == X86_VENDOR_AMD?  This would seem to do
the right thing.  Updated patch attached.


	- g.

[-- Attachment #2: mce-fix.patch --]
[-- Type: text/plain, Size: 1061 bytes --]

--- linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c.orig	2003-11-02 13:31:43.000000000 +0800
+++ linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c	2003-11-02 21:50:36.000000000 +0800
@@ -21,6 +21,7 @@
 
 static struct timer_list mce_timer;
 static int timerset;
+static int startbank;
 
 #define MCE_RATE	15*HZ	/* timer rate is 15s */
 
@@ -30,7 +31,7 @@
 	int i;
 
 	preempt_disable(); 
-	for (i=0; i<nr_mce_banks; i++) {
+	for (i=startbank; i<nr_mce_banks; i++) {
 		rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);
 
 		if (high & (1<<31)) {
@@ -68,6 +69,19 @@
 
 static int __init init_nonfatal_mce_checker(void)
 {
+	/*
+	   Certain Athlons would cause spurious MCE non-fatal
+	   exception check to be reported, if we poke at bank 0.
+	   Avoid this if the running machine is an Athlon.
+
+	   -- geoff.
+	*/
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD && 
+	    boot_cpu_data.x86 == 6)
+		startbank = 1;
+	else
+		startbank = 0;
+
 	if (timerset == 0) {
 		/* Set the timer to check for non-fatal
 		   errors every MCE_RATE seconds */

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] reproducible athlon mce fix
  2003-11-02  5:57 [patch] reproducible athlon mce fix Geoffrey Lee
  2003-11-02  7:25 ` Willy Tarreau
  2003-11-02 12:52 ` Dave Jones
@ 2003-11-02 18:25 ` Kronos
  2003-11-02 23:44   ` Geoffrey Lee
  2 siblings, 1 reply; 11+ messages in thread
From: Kronos @ 2003-11-02 18:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Geoffrey Lee, Dave Jones

Geoffrey Lee ha scritto:
> After switching from 2.4.22 to 2.6.0-test9, I have received reproducible
> MCE non-fatal error check messages in my kernel log.  (For example, one
> shows up right after my first scsi card init).
[cut]
> would seem to imply that Athlons don't like having their Bank 0 poked at,
> though that's what non-fatal.c does.  Would it be correct to make sure
> that that non-fatal.c starts at bank 1, if it is an Athlon?
>
> --- linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c.orig	2003-11-02 13:31:43.000000000 +0800
> +++ linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c	2003-11-02 13:34:37.000000000 +0800
> @@ -30,7 +30,11 @@
>  	int i;
> 
> 	preempt_disable(); 
> +#if CONFIG_MK7
> +	for (i=1; i<nr_mce_banks; i++) {
> +#else
>  	for (i=0; i<nr_mce_banks; i++) {
> +#endif
>  		rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);
> 
>  		if (high & (1<<31)) {

In  this way  you don't  read  from bank  0. The strange  thing is  that
amd_mcheck_init doesn't enable reporting on  this bank... it should stay
clean. What's going on here?

Luca
-- 
Reply-To: kronos@kronoz.cjb.net
Home: http://kronoz.cjb.net
Quando un uomo porta dei fiori a sua moglie senza motivo, 
un motivo c'e`.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] reproducible athlon mce fix
  2003-11-02 18:25 ` Kronos
@ 2003-11-02 23:44   ` Geoffrey Lee
  0 siblings, 0 replies; 11+ messages in thread
From: Geoffrey Lee @ 2003-11-02 23:44 UTC (permalink / raw)
  To: Kronos; +Cc: linux-kernel, Dave Jones

On Sun, Nov 02, 2003 at 07:25:56PM +0100, Kronos wrote:

> In  this way  you don't  read  from bank  0. The strange  thing is  that



In any case:

(1) only does it for k7 (which seems to do the right thing)
(2) k7.c mcheck doesn't read from bank 0 as well

> amd_mcheck_init doesn't enable reporting on  this bank... it should stay
> clean. What's going on here?
> 


Notice how k7.c doesn't read from bank 0 either, and this was the fix 
submitted by Dave earlier on for k7.c but was not done for non-fatal.c.


	 - g.
-- 
\x42\x20\x70\x65\x6f\x70\x6c\x65\x20\x61\x72\x65\x20\x77\x61\x6e\x6b\x65\x72\x73

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] reproducible athlon mce fix
  2003-11-02 12:52 ` Dave Jones
  2003-11-02 13:52   ` Geoffrey Lee
@ 2003-11-03  9:20   ` Geoffrey Lee
  2003-11-03 13:24     ` Dave Jones
  1 sibling, 1 reply; 11+ messages in thread
From: Geoffrey Lee @ 2003-11-03  9:20 UTC (permalink / raw)
  To: Dave Jones, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 702 bytes --]

On Sun, Nov 02, 2003 at 12:52:03PM +0000, Dave Jones wrote:
> On Sun, Nov 02, 2003 at 01:57:48PM +0800, Geoffrey Lee wrote:
> 
>  >  	preempt_disable(); 
>  > +#if CONFIG_MK7
>  > +	for (i=1; i<nr_mce_banks; i++) {
>  > +#else
>  >  	for (i=0; i<nr_mce_banks; i++) {
>  > +#endif
>  >  		rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);
> 
> This needs to be a runtime check. In 2.6, a K7 can boot
> a P4 kernel, and vice versa.
> 


(Resending as it seems to have eaten my mail due to dns problems ... apologies
if you get this twice.)


Would checking boot_cpu_data.x86_vendor == X86_VENDOR_AMD and 
boot_cpu_data.x86 == 6 be sufficient?  It seems to do the right thing ..
Updated patch attached.

	- g.


[-- Attachment #2: mce-fix.patch --]
[-- Type: text/plain, Size: 1061 bytes --]

--- linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c.orig	2003-11-02 13:31:43.000000000 +0800
+++ linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c	2003-11-02 21:50:36.000000000 +0800
@@ -21,6 +21,7 @@
 
 static struct timer_list mce_timer;
 static int timerset;
+static int startbank;
 
 #define MCE_RATE	15*HZ	/* timer rate is 15s */
 
@@ -30,7 +31,7 @@
 	int i;
 
 	preempt_disable(); 
-	for (i=0; i<nr_mce_banks; i++) {
+	for (i=startbank; i<nr_mce_banks; i++) {
 		rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);
 
 		if (high & (1<<31)) {
@@ -68,6 +69,19 @@
 
 static int __init init_nonfatal_mce_checker(void)
 {
+	/*
+	   Certain Athlons would cause spurious MCE non-fatal
+	   exception check to be reported, if we poke at bank 0.
+	   Avoid this if the running machine is an Athlon.
+
+	   -- geoff.
+	*/
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD && 
+	    boot_cpu_data.x86 == 6)
+		startbank = 1;
+	else
+		startbank = 0;
+
 	if (timerset == 0) {
 		/* Set the timer to check for non-fatal
 		   errors every MCE_RATE seconds */

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] reproducible athlon mce fix
  2003-11-03  9:20   ` Geoffrey Lee
@ 2003-11-03 13:24     ` Dave Jones
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Jones @ 2003-11-03 13:24 UTC (permalink / raw)
  To: Geoffrey Lee; +Cc: linux-kernel

On Mon, Nov 03, 2003 at 05:20:48PM +0800, Geoffrey Lee wrote:

 > Would checking boot_cpu_data.x86_vendor == X86_VENDOR_AMD and 
 > boot_cpu_data.x86 == 6 be sufficient?  It seems to do the right thing ..
 > Updated patch attached.

Yup, looks to be pretty much the same change I made in my local tree.

		Dave


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] reproducible athlon mce fix
  2003-11-02  7:25 ` Willy Tarreau
  2003-11-02  7:28   ` Geoffrey Lee
  2003-11-02  8:59   ` Zwane Mwaikambo
@ 2003-11-10 23:55   ` bill davidsen
  2 siblings, 0 replies; 11+ messages in thread
From: bill davidsen @ 2003-11-10 23:55 UTC (permalink / raw)
  To: linux-kernel

In article <20031102072519.GD530@alpha.home.local>,
Willy Tarreau  <willy@w.ods.org> wrote:

| I don't know if the patch is correct, but :
| 
| On Sun, Nov 02, 2003 at 01:57:48PM +0800, Geoffrey Lee wrote:
| >  	preempt_disable(); 
| > +#if CONFIG_MK7
| > +	for (i=1; i<nr_mce_banks; i++) {
| > +#else
| >  	for (i=0; i<nr_mce_banks; i++) {
| > +#endif
| 
| Including opening braces within #if often fools editors such as emacs
| which count them and don't know about #if. Then, editing the rest of
| the file can become annoying because it simply thinks that there are
| two embedded for loops.

Wouldn't it be easier to just move the brace out of the ifdef and put it
on a line by itself? Readable, doesn't confuse, etc?

  	preempt_disable(); 
 +#if CONFIG_MK7
 +	for (i=1; i<nr_mce_banks; i++) {
 +#else
 - 	for (i=0; i<nr_mce_banks; i++) {
 + 	for (i=0; i<nr_mce_banks; i++)
 +#endif
        {

or similar. Otherwise I guess the solution defining a starting value
would be "less unreadable."


-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2003-11-11  0:05 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-11-02  5:57 [patch] reproducible athlon mce fix Geoffrey Lee
2003-11-02  7:25 ` Willy Tarreau
2003-11-02  7:28   ` Geoffrey Lee
2003-11-02  8:59   ` Zwane Mwaikambo
2003-11-10 23:55   ` bill davidsen
2003-11-02 12:52 ` Dave Jones
2003-11-02 13:52   ` Geoffrey Lee
2003-11-03  9:20   ` Geoffrey Lee
2003-11-03 13:24     ` Dave Jones
2003-11-02 18:25 ` Kronos
2003-11-02 23:44   ` Geoffrey Lee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).