* [patch] reproducible athlon mce fix
@ 2003-11-02 5:57 Geoffrey Lee
2003-11-02 7:25 ` Willy Tarreau
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Geoffrey Lee @ 2003-11-02 5:57 UTC (permalink / raw)
To: linux-kernel, davej
[-- Attachment #1: Type: text/plain, Size: 660 bytes --]
Hi all,
After switching from 2.4.22 to 2.6.0-test9, I have received reproducible
MCE non-fatal error check messages in my kernel log. (For example, one
shows up right after my first scsi card init).
>From Dave Jones' patch here:
http://lists.insecure.org/lists/linux-kernel/2003/Sep/7362.html
and another message here:
http://lkml.org/lkml/2003/10/7/214
would seem to imply that Athlons don't like having their Bank 0 poked at,
though that's what non-fatal.c does. Would it be correct to make sure
that that non-fatal.c starts at bank 1, if it is an Athlon?
Dave, is the following patch correct? Booted and tested, no ill effects
so far ...
- g.
[-- Attachment #2: mce-fix.patch --]
[-- Type: text/plain, Size: 428 bytes --]
--- linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c.orig 2003-11-02 13:31:43.000000000 +0800
+++ linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c 2003-11-02 13:34:37.000000000 +0800
@@ -30,7 +30,11 @@
int i;
preempt_disable();
+#if CONFIG_MK7
+ for (i=1; i<nr_mce_banks; i++) {
+#else
for (i=0; i<nr_mce_banks; i++) {
+#endif
rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);
if (high & (1<<31)) {
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] reproducible athlon mce fix
2003-11-02 5:57 [patch] reproducible athlon mce fix Geoffrey Lee
@ 2003-11-02 7:25 ` Willy Tarreau
2003-11-02 7:28 ` Geoffrey Lee
` (2 more replies)
2003-11-02 12:52 ` Dave Jones
2003-11-02 18:25 ` Kronos
2 siblings, 3 replies; 11+ messages in thread
From: Willy Tarreau @ 2003-11-02 7:25 UTC (permalink / raw)
To: Geoffrey Lee; +Cc: linux-kernel, davej
Hi,
I don't know if the patch is correct, but :
On Sun, Nov 02, 2003 at 01:57:48PM +0800, Geoffrey Lee wrote:
> preempt_disable();
> +#if CONFIG_MK7
> + for (i=1; i<nr_mce_banks; i++) {
> +#else
> for (i=0; i<nr_mce_banks; i++) {
> +#endif
Including opening braces within #if often fools editors such as emacs
which count them and don't know about #if. Then, editing the rest of
the file can become annoying because it simply thinks that there are
two embedded for loops.
Other solutions to this problem often include one of these constructs
which are unfortunately not as beautiful :
1) a still readable one
#if CONFIG_MK7
i=1;
#else
i=0;
#endif
for (; i<nr_mce_banks; i++) {
2) when there's absolutely no choice (eg changing part of an
expression...) something unreadable like this.
for (
#if CONFIG_MK7
i=1;
#else
i=0;
#endif
i<nr_mce_banks; i++) {
I'm sure other people have other solution that don't come to mind.
Cheers,
Willy
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] reproducible athlon mce fix
2003-11-02 7:25 ` Willy Tarreau
@ 2003-11-02 7:28 ` Geoffrey Lee
2003-11-02 8:59 ` Zwane Mwaikambo
2003-11-10 23:55 ` bill davidsen
2 siblings, 0 replies; 11+ messages in thread
From: Geoffrey Lee @ 2003-11-02 7:28 UTC (permalink / raw)
To: Willy Tarreau; +Cc: linux-kernel, davej
On Sun, Nov 02, 2003 at 08:25:19AM +0100, Willy Tarreau wrote:
> Hi,
>
>
> Other solutions to this problem often include one of these constructs
> which are unfortunately not as beautiful :
>
Hi,
Yep, I know it's ugly, but what I am really after is whether the patch
is correct or not, in terms of what it does.
If it's correct we can work out how to fix this properly, since non-fatal.c
is generic code and I am not sure if doing a CONFIG_MK7 like that is very
pretty there anyway ...
- g.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] reproducible athlon mce fix
2003-11-02 7:25 ` Willy Tarreau
2003-11-02 7:28 ` Geoffrey Lee
@ 2003-11-02 8:59 ` Zwane Mwaikambo
2003-11-10 23:55 ` bill davidsen
2 siblings, 0 replies; 11+ messages in thread
From: Zwane Mwaikambo @ 2003-11-02 8:59 UTC (permalink / raw)
To: Willy Tarreau; +Cc: Geoffrey Lee, linux-kernel, davej
On Sun, 2 Nov 2003, Willy Tarreau wrote:
> 1) a still readable one
> #if CONFIG_MK7
> i=1;
> #else
> i=0;
> #endif
> for (; i<nr_mce_banks; i++) {
>
> 2) when there's absolutely no choice (eg changing part of an
> expression...) something unreadable like this.
>
> for (
> #if CONFIG_MK7
> i=1;
> #else
> i=0;
> #endif
> i<nr_mce_banks; i++) {
>
> I'm sure other people have other solution that don't come to mind.
Perhaps..
#ifdef CONFIG_MK7
#define MCE_BANK_START 1
#else
#define MCE_BANK_START 0
#endif
for (i = MCE_BANK_START; nr_mce_banks; i++)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] reproducible athlon mce fix
2003-11-02 5:57 [patch] reproducible athlon mce fix Geoffrey Lee
2003-11-02 7:25 ` Willy Tarreau
@ 2003-11-02 12:52 ` Dave Jones
2003-11-02 13:52 ` Geoffrey Lee
2003-11-03 9:20 ` Geoffrey Lee
2003-11-02 18:25 ` Kronos
2 siblings, 2 replies; 11+ messages in thread
From: Dave Jones @ 2003-11-02 12:52 UTC (permalink / raw)
To: Geoffrey Lee; +Cc: linux-kernel
On Sun, Nov 02, 2003 at 01:57:48PM +0800, Geoffrey Lee wrote:
> preempt_disable();
> +#if CONFIG_MK7
> + for (i=1; i<nr_mce_banks; i++) {
> +#else
> for (i=0; i<nr_mce_banks; i++) {
> +#endif
> rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);
This needs to be a runtime check. In 2.6, a K7 can boot
a P4 kernel, and vice versa.
Dave
--
Dave Jones http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] reproducible athlon mce fix
2003-11-02 12:52 ` Dave Jones
@ 2003-11-02 13:52 ` Geoffrey Lee
2003-11-03 9:20 ` Geoffrey Lee
1 sibling, 0 replies; 11+ messages in thread
From: Geoffrey Lee @ 2003-11-02 13:52 UTC (permalink / raw)
To: Dave Jones, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 629 bytes --]
On Sun, Nov 02, 2003 at 12:52:03PM +0000, Dave Jones wrote:
> On Sun, Nov 02, 2003 at 01:57:48PM +0800, Geoffrey Lee wrote:
>
> > preempt_disable();
> > +#if CONFIG_MK7
> > + for (i=1; i<nr_mce_banks; i++) {
> > +#else
> > for (i=0; i<nr_mce_banks; i++) {
> > +#endif
> > rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);
>
> This needs to be a runtime check. In 2.6, a K7 can boot
> a P4 kernel, and vice versa.
>
Yes, of course.
Would it be sufficient to check for the boot_cpu_data.x86 == 6 and
boot_cpu_data.x86_vendor == X86_VENDOR_AMD? This would seem to do
the right thing. Updated patch attached.
- g.
[-- Attachment #2: mce-fix.patch --]
[-- Type: text/plain, Size: 1061 bytes --]
--- linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c.orig 2003-11-02 13:31:43.000000000 +0800
+++ linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c 2003-11-02 21:50:36.000000000 +0800
@@ -21,6 +21,7 @@
static struct timer_list mce_timer;
static int timerset;
+static int startbank;
#define MCE_RATE 15*HZ /* timer rate is 15s */
@@ -30,7 +31,7 @@
int i;
preempt_disable();
- for (i=0; i<nr_mce_banks; i++) {
+ for (i=startbank; i<nr_mce_banks; i++) {
rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);
if (high & (1<<31)) {
@@ -68,6 +69,19 @@
static int __init init_nonfatal_mce_checker(void)
{
+ /*
+ Certain Athlons would cause spurious MCE non-fatal
+ exception check to be reported, if we poke at bank 0.
+ Avoid this if the running machine is an Athlon.
+
+ -- geoff.
+ */
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+ boot_cpu_data.x86 == 6)
+ startbank = 1;
+ else
+ startbank = 0;
+
if (timerset == 0) {
/* Set the timer to check for non-fatal
errors every MCE_RATE seconds */
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] reproducible athlon mce fix
2003-11-02 5:57 [patch] reproducible athlon mce fix Geoffrey Lee
2003-11-02 7:25 ` Willy Tarreau
2003-11-02 12:52 ` Dave Jones
@ 2003-11-02 18:25 ` Kronos
2003-11-02 23:44 ` Geoffrey Lee
2 siblings, 1 reply; 11+ messages in thread
From: Kronos @ 2003-11-02 18:25 UTC (permalink / raw)
To: linux-kernel; +Cc: Geoffrey Lee, Dave Jones
Geoffrey Lee ha scritto:
> After switching from 2.4.22 to 2.6.0-test9, I have received reproducible
> MCE non-fatal error check messages in my kernel log. (For example, one
> shows up right after my first scsi card init).
[cut]
> would seem to imply that Athlons don't like having their Bank 0 poked at,
> though that's what non-fatal.c does. Would it be correct to make sure
> that that non-fatal.c starts at bank 1, if it is an Athlon?
>
> --- linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c.orig 2003-11-02 13:31:43.000000000 +0800
> +++ linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c 2003-11-02 13:34:37.000000000 +0800
> @@ -30,7 +30,11 @@
> int i;
>
> preempt_disable();
> +#if CONFIG_MK7
> + for (i=1; i<nr_mce_banks; i++) {
> +#else
> for (i=0; i<nr_mce_banks; i++) {
> +#endif
> rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);
>
> if (high & (1<<31)) {
In this way you don't read from bank 0. The strange thing is that
amd_mcheck_init doesn't enable reporting on this bank... it should stay
clean. What's going on here?
Luca
--
Reply-To: kronos@kronoz.cjb.net
Home: http://kronoz.cjb.net
Quando un uomo porta dei fiori a sua moglie senza motivo,
un motivo c'e`.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] reproducible athlon mce fix
2003-11-02 18:25 ` Kronos
@ 2003-11-02 23:44 ` Geoffrey Lee
0 siblings, 0 replies; 11+ messages in thread
From: Geoffrey Lee @ 2003-11-02 23:44 UTC (permalink / raw)
To: Kronos; +Cc: linux-kernel, Dave Jones
On Sun, Nov 02, 2003 at 07:25:56PM +0100, Kronos wrote:
> In this way you don't read from bank 0. The strange thing is that
In any case:
(1) only does it for k7 (which seems to do the right thing)
(2) k7.c mcheck doesn't read from bank 0 as well
> amd_mcheck_init doesn't enable reporting on this bank... it should stay
> clean. What's going on here?
>
Notice how k7.c doesn't read from bank 0 either, and this was the fix
submitted by Dave earlier on for k7.c but was not done for non-fatal.c.
- g.
--
\x42\x20\x70\x65\x6f\x70\x6c\x65\x20\x61\x72\x65\x20\x77\x61\x6e\x6b\x65\x72\x73
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] reproducible athlon mce fix
2003-11-02 12:52 ` Dave Jones
2003-11-02 13:52 ` Geoffrey Lee
@ 2003-11-03 9:20 ` Geoffrey Lee
2003-11-03 13:24 ` Dave Jones
1 sibling, 1 reply; 11+ messages in thread
From: Geoffrey Lee @ 2003-11-03 9:20 UTC (permalink / raw)
To: Dave Jones, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 702 bytes --]
On Sun, Nov 02, 2003 at 12:52:03PM +0000, Dave Jones wrote:
> On Sun, Nov 02, 2003 at 01:57:48PM +0800, Geoffrey Lee wrote:
>
> > preempt_disable();
> > +#if CONFIG_MK7
> > + for (i=1; i<nr_mce_banks; i++) {
> > +#else
> > for (i=0; i<nr_mce_banks; i++) {
> > +#endif
> > rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);
>
> This needs to be a runtime check. In 2.6, a K7 can boot
> a P4 kernel, and vice versa.
>
(Resending as it seems to have eaten my mail due to dns problems ... apologies
if you get this twice.)
Would checking boot_cpu_data.x86_vendor == X86_VENDOR_AMD and
boot_cpu_data.x86 == 6 be sufficient? It seems to do the right thing ..
Updated patch attached.
- g.
[-- Attachment #2: mce-fix.patch --]
[-- Type: text/plain, Size: 1061 bytes --]
--- linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c.orig 2003-11-02 13:31:43.000000000 +0800
+++ linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c 2003-11-02 21:50:36.000000000 +0800
@@ -21,6 +21,7 @@
static struct timer_list mce_timer;
static int timerset;
+static int startbank;
#define MCE_RATE 15*HZ /* timer rate is 15s */
@@ -30,7 +31,7 @@
int i;
preempt_disable();
- for (i=0; i<nr_mce_banks; i++) {
+ for (i=startbank; i<nr_mce_banks; i++) {
rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);
if (high & (1<<31)) {
@@ -68,6 +69,19 @@
static int __init init_nonfatal_mce_checker(void)
{
+ /*
+ Certain Athlons would cause spurious MCE non-fatal
+ exception check to be reported, if we poke at bank 0.
+ Avoid this if the running machine is an Athlon.
+
+ -- geoff.
+ */
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+ boot_cpu_data.x86 == 6)
+ startbank = 1;
+ else
+ startbank = 0;
+
if (timerset == 0) {
/* Set the timer to check for non-fatal
errors every MCE_RATE seconds */
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] reproducible athlon mce fix
2003-11-03 9:20 ` Geoffrey Lee
@ 2003-11-03 13:24 ` Dave Jones
0 siblings, 0 replies; 11+ messages in thread
From: Dave Jones @ 2003-11-03 13:24 UTC (permalink / raw)
To: Geoffrey Lee; +Cc: linux-kernel
On Mon, Nov 03, 2003 at 05:20:48PM +0800, Geoffrey Lee wrote:
> Would checking boot_cpu_data.x86_vendor == X86_VENDOR_AMD and
> boot_cpu_data.x86 == 6 be sufficient? It seems to do the right thing ..
> Updated patch attached.
Yup, looks to be pretty much the same change I made in my local tree.
Dave
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] reproducible athlon mce fix
2003-11-02 7:25 ` Willy Tarreau
2003-11-02 7:28 ` Geoffrey Lee
2003-11-02 8:59 ` Zwane Mwaikambo
@ 2003-11-10 23:55 ` bill davidsen
2 siblings, 0 replies; 11+ messages in thread
From: bill davidsen @ 2003-11-10 23:55 UTC (permalink / raw)
To: linux-kernel
In article <20031102072519.GD530@alpha.home.local>,
Willy Tarreau <willy@w.ods.org> wrote:
| I don't know if the patch is correct, but :
|
| On Sun, Nov 02, 2003 at 01:57:48PM +0800, Geoffrey Lee wrote:
| > preempt_disable();
| > +#if CONFIG_MK7
| > + for (i=1; i<nr_mce_banks; i++) {
| > +#else
| > for (i=0; i<nr_mce_banks; i++) {
| > +#endif
|
| Including opening braces within #if often fools editors such as emacs
| which count them and don't know about #if. Then, editing the rest of
| the file can become annoying because it simply thinks that there are
| two embedded for loops.
Wouldn't it be easier to just move the brace out of the ifdef and put it
on a line by itself? Readable, doesn't confuse, etc?
preempt_disable();
+#if CONFIG_MK7
+ for (i=1; i<nr_mce_banks; i++) {
+#else
- for (i=0; i<nr_mce_banks; i++) {
+ for (i=0; i<nr_mce_banks; i++)
+#endif
{
or similar. Otherwise I guess the solution defining a starting value
would be "less unreadable."
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2003-11-11 0:05 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-11-02 5:57 [patch] reproducible athlon mce fix Geoffrey Lee
2003-11-02 7:25 ` Willy Tarreau
2003-11-02 7:28 ` Geoffrey Lee
2003-11-02 8:59 ` Zwane Mwaikambo
2003-11-10 23:55 ` bill davidsen
2003-11-02 12:52 ` Dave Jones
2003-11-02 13:52 ` Geoffrey Lee
2003-11-03 9:20 ` Geoffrey Lee
2003-11-03 13:24 ` Dave Jones
2003-11-02 18:25 ` Kronos
2003-11-02 23:44 ` Geoffrey Lee
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).