* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
       [not found] <200312132040.00875.ross@datscreative.com.au>
@ 2003-12-13 12:00 ` Bob
  2003-12-15 13:11   ` Maciej W. Rozycki
  0 siblings, 1 reply; 7+ messages in thread
From: Bob @ 2003-12-13 12:00 UTC (permalink / raw)
  To: linux-kernel

udma133 with Award bios update and nforce2

APIC error on CPU0: 02(02)
what?? no crash though.

Ross Dickson wrote:

>Hi Bob
>
>Jesse has award bios, see attached
>Ross.
>
Months ago I thought using a 3ware card might
help with nforce2 crashes, so I gave up on promise
and sii hd cards after a lot of experiments (hdparm,
no lapic, no acpi, apic off in bios) and put in a 3ware
card. But I flashed the bios at the same time, so I didn't
know whether the 3ware card helped with the nforce2
crashing or not, since the bios flash did the job.

With 3ware I couldn't use hdparm to see what udma
settings the drives were set to. Now I can report.

Just now I took the 3ware card out and went back
to promise cards (using 4 hd's either method, 2 cd's
on mboard amd74xx, onboard sata disabled).

bob@where cat /proc/interrupts
           CPU0      
  0:    3350153    IO-APIC-edge  timer
  1:       5775    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:       5385    IO-APIC-edge  i8042
 14:         10    IO-APIC-edge  ide0
 15:         10    IO-APIC-edge  ide1
 16:    1717957   IO-APIC-level  ide2, ide3, eth0
 19:     472929   IO-APIC-level  ide4, ide5
 21:          0   IO-APIC-level  NVidia nForce2
NMI:        822
LOC:    3350073
ERR:         35
MIS:      15818

cd's on amd74xx onboard, amd74xx onboard is always solid,
4 ide hd's on two promise cards. not many nmi ticks without
the better patch there.

bonnie++ smooth, then hdparm up the settings, udma6,
bonnie++ again, saw a few "APIC error on CPU0: 02(02)"
but no lockup. not sure if data lost since it was a test. APIC
error might be fixed by changing hdparm settings. This
second test was with unmasked irq and udma6.

I have to patch to get ioapic edge timer on.

This 11/7/2003 updated award bios does not have a cpu
disconnect option but it does eliminate the crashes with
no patch and it is no longer impossible to use promise
ide udma133 controller cards.

MSI K7N2 Delta MCP2-T mboard

I don't have the promise patch in yet, either, so the APIC
error might be from that, or hdparm unmasked irq.

-Bob

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
  2003-12-13 12:00 ` Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered Bob
@ 2003-12-15 13:11   ` Maciej W. Rozycki
  2003-12-16  7:18     ` Bob
  0 siblings, 1 reply; 7+ messages in thread
From: Maciej W. Rozycki @ 2003-12-15 13:11 UTC (permalink / raw)
  To: Bob; +Cc: linux-kernel

On Sat, 13 Dec 2003, Bob wrote:

> APIC error on CPU0: 02(02)
> what?? no crash though.
[...]
> bob@where cat /proc/interrupts
>            CPU0      
>   0:    3350153    IO-APIC-edge  timer
>   1:       5775    IO-APIC-edge  i8042
>   2:          0          XT-PIC  cascade
>   8:          1    IO-APIC-edge  rtc
>   9:          0   IO-APIC-level  acpi
>  12:       5385    IO-APIC-edge  i8042
>  14:         10    IO-APIC-edge  ide0
>  15:         10    IO-APIC-edge  ide1
>  16:    1717957   IO-APIC-level  ide2, ide3, eth0
>  19:     472929   IO-APIC-level  ide4, ide5
>  21:          0   IO-APIC-level  NVidia nForce2
> NMI:        822
> LOC:    3350073
> ERR:         35
> MIS:      15818

 It looks like the infamous APIC delivery bug -- the "MIS" counter shows
how many level-triggered interrupts have been erroneously delivered as
edge-triggered ones.  No wonder the system shows instability -- you have
noise problems on the APIC bus.
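
As a rough illustration of reading those counters, here is a minimal C sketch that pulls a summary value such as MIS or ERR out of /proc/interrupts-style text. The function name summary_counter is made up for this example and is not kernel code; it only looks at the first CPU column.

```c
#include <stdio.h>
#include <string.h>

/* Return the first-CPU count on the line that starts with `label`
 * (for example "MIS:"), or -1 if no such line can be parsed.
 * Hypothetical helper for illustration only. */
long long summary_counter(const char *text, const char *label)
{
    size_t label_len = strlen(label);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        const char *p = line;

        while (*p == ' ' || *p == '\t')
            p++;                        /* skip leading indentation */

        if (strncmp(p, label, label_len) == 0) {
            long long value;

            if (sscanf(p + label_len, "%lld", &value) == 1)
                return value;
            return -1;                  /* label found, no number after it */
        }

        line = strchr(line, '\n');      /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}
```

Run over Bob's listing, summary_counter(text, "MIS:") would give 15818 and summary_counter(text, "ERR:") would give 35.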

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
  2003-12-15 13:11   ` Maciej W. Rozycki
@ 2003-12-16  7:18     ` Bob
  0 siblings, 0 replies; 7+ messages in thread
From: Bob @ 2003-12-16  7:18 UTC (permalink / raw)
  To: linux-kernel

apic.c patch needs reload:%lu instead of %u  ---------->
printk("..APIC TIMER ack delay, reload:%lu, safe:%u\n",



amd xp3000+, 1:1 333mhz fsb to ram, 166mhz cpu
bus clock x dual channel 2-512mb pc3200 tested cas2
sticks, 1:1 fsb to ram for 333mhz, Award bios with
update that works for non-crashing but not for edge
timer without patch.  MSI K7N2 Delta MCP2-T mboard
linux-2.6.0-test11

This was with 3ware controller and unpatched 2.6.0-test11
Note low MIS score but PIC timer and no nmi--

          CPU0
 0:  244393560          XT-PIC  timer
 1:      31963    IO-APIC-edge  i8042
 2:          0          XT-PIC  cascade
 8:          1    IO-APIC-edge  rtc
 9:          0   IO-APIC-level  acpi
12:     251884    IO-APIC-edge  i8042
14:         22    IO-APIC-edge  ide0
15:         24    IO-APIC-edge  ide1
16:    4290216   IO-APIC-level  3ware Storage Controller, yenta, yenta
17:    5929405   IO-APIC-level  eth0
21:          0   IO-APIC-level  NVidia nForce2
NMI:          0
LOC:  244378698
ERR:          0
MIS:          6

Next is with the first edge timer patch, nmi_watchdog=2
works but =1 does not, MIS really high("noisy bus"),
replacing 3ware with promise cards and hdparm udma133
causes apic error logged to console during bonnie++ test--

>>APIC error on CPU0: 02(02)
>>what?? no crash though.
>>    
>>
>>bob@where cat /proc/interrupts
>>           CPU0      
>>  0:    3350153    IO-APIC-edge  timer
>>  1:       5775    IO-APIC-edge  i8042
>>  2:          0          XT-PIC  cascade
>>  8:          1    IO-APIC-edge  rtc
>>  9:          0   IO-APIC-level  acpi
>> 12:       5385    IO-APIC-edge  i8042
>> 14:         10    IO-APIC-edge  ide0
>> 15:         10    IO-APIC-edge  ide1
>> 16:    1717957   IO-APIC-level  ide2, ide3, eth0
>> 19:     472929   IO-APIC-level  ide4, ide5
>> 21:          0   IO-APIC-level  NVidia nForce2
>>NMI:        822
>>LOC:    3350073
>>ERR:         35
>>MIS:      15818
>>    
>>

now with promise controllers again, new edge timer patch
permits nmi_watchdog=1 not =2, lots of nmi ticks, MIS count
is only half of what it was with the first timer patch, NMI ticks = LOC?


bob@where cat /proc/interrupts
           CPU0      
  0:   46188571    IO-APIC-edge  timer
  1:      12396    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:     147429    IO-APIC-edge  i8042
 14:         10    IO-APIC-edge  ide0
 15:         10    IO-APIC-edge  ide1
 16:    1413705   IO-APIC-level  ide2, ide3, eth0
 17:          0   IO-APIC-level  yenta, yenta
 19:     258804   IO-APIC-level  ide4, ide5
 21:          0   IO-APIC-level  NVidia nForce2
NMI:   46188592
LOC:   46188482
ERR:         36
MIS:       6877
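
The raw MIS totals are hard to compare because the runs saw very different numbers of timer interrupts. One way to put them on the same footing, sketched here in C, is MIS per million IRQ 0 ticks. The helper name mis_per_million and the choice of IRQ 0 as the yardstick are assumptions made for this example.

```c
/* MIS events per million timer (IRQ 0) interrupts, rounded down.
 * Returns 0 when no timer interrupts have been counted.
 * Hypothetical helper for illustration only. */
unsigned long long mis_per_million(unsigned long long mis,
                                   unsigned long long timer_ticks)
{
    if (timer_ticks == 0)
        return 0;
    return mis * 1000000ULL / timer_ticks;
}
```

With the listings in this message, the first edge timer patch (15818 MIS over 3350153 ticks) works out to roughly 4700 per million, the new patch (6877 over 46188571) to roughly 150 per million, and the unpatched 3ware run (6 over 244393560) to well under one. The controllers differ between runs, so this is only a rough comparison.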

Now I'll try 800UL/100ndelay to see if it helps with
MIS count(pseudo-sci masochism), be back in a while.

Oh, by the way, I set debug 1 in apic.h but I don't
see anything, and I thought I saw a compile error
flash by, so now I'll compile > logfile 2>&1 and
might see why I don't see--

"..APIC TIMER ack delay, predelay count: 20769"

I don't see any of that debug stuff. Maybe the compile
errors I found were it, see my previous message about
"unsigned in format", maybe printk needs %lu(I don't
know hardly nuffing yet). I'm going to boot 800UL/100ndelay
now.


it needs reload:%lu instead of %u  ---------->
printk("..APIC TIMER ack delay, reload:%lu, safe:%u\n",
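
The mismatch can be reproduced outside the kernel. A minimal userspace sketch, assuming the reload value is an unsigned long and the safe value an unsigned int (the thread does not show the declarations, and format_ack_delay is a made-up helper standing in for the printk call):

```c
#include <stdio.h>

/* Format the debug line with conversion specifiers that match the
 * argument types: %lu for unsigned long, %u for unsigned int.
 * The parameter types are an assumption; the patch itself is not
 * quoted in this thread. */
int format_ack_delay(char *buf, size_t len,
                     unsigned long reload, unsigned int safe)
{
    return snprintf(buf, len,
                    "..APIC TIMER ack delay, reload:%lu, safe:%u\n",
                    reload, safe);
}
```

Compiling with format checking enabled (for example gcc -Wall) flags a %u that is handed an unsigned long argument.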

Ross: "Can you also advise if your bios setting of the
"C1 disconnect" is set"

I can only guess by my 41C low load 48C high load
temps exactly equal to range for "2.1Ghz 333mhz"
of Ian Kumlien(his?) which is same speed as mine,
that probably cpu disconnect is not on. I have
no visible choice in setup for cpu disconnect.
I'll try athcool to see how disconnect is set.

Ross:"I have heard lockups are not supposed to happen
at all if the fsb (host bus clock speed) matches the
ddr speed. One of my systems went about 4 hours (xp2500
333fsb, DDR333) without the apic delay patch on a phoenix 
bios before lockup"

A couple of months ago I was overly optimistic a couple
of times before the bios update, and it seemed to work
to use 1:1 and only amd74xx onboard hd controller, no
hd cards, and pre-emptive, anticipatory sched not
deadline, apic off in setup but on in linux, lapic
off, acpi on. It was almost stable if using only one
drive, but I really can't go without hd cards for
software raid, and with an hd card the first fsck on
boot would crash. I could finesse stability by using
options but never quite reach reliability without a
bios update, and certain functions need patching, and
I still have "MIS count, noisy bus" and agp8 crash(I can
use the X nv driver and agpgart no problem, but not nvidia
drivers for X and agp8).




* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
  2003-12-15 15:02 ` Craig Bradney
  2003-12-15 15:56   ` Maciej W. Rozycki
@ 2003-12-15 16:54   ` Ross Dickson
  1 sibling, 0 replies; 7+ messages in thread
From: Ross Dickson @ 2003-12-15 16:54 UTC (permalink / raw)
  To: Craig Bradney; +Cc: recbo, linux-kernel, Ian Kumlien

On Tuesday 16 December 2003 01:02, you wrote:
> Just to give the status here ...
> I'm still running the original 2.6 test 11 patches for apic and ioapic.
> Uptime is now 2d 20h with lots of idle time and hard work too.. 
> 
> /proc/interrupts as follows:
> 
>            CPU0
>   0:  245382420    IO-APIC-edge  timer
>   1:     139577    IO-APIC-edge  i8042
>   2:          0          XT-PIC  cascade
>   8:          3    IO-APIC-edge  rtc
>   9:          0   IO-APIC-level  acpi
>  12:    1478615    IO-APIC-edge  i8042
>  14:    1055548    IO-APIC-edge  ide0
>  15:     737664    IO-APIC-edge  ide1
>  19:   18405692   IO-APIC-level  radeon@PCI:3:0:0
>  21:    5257090   IO-APIC-level  ehci_hcd, NVidia nForce2, eth0
>  22:          3   IO-APIC-level  ohci1394
> NMI:      14944
> LOC:  245087891
> ERR:          0
> MIS:          6

Uptime sounds good so far.
 
I am not convinced my v2 apic patch is a great overall improvement; I am
thinking v1 apic is safer for now.

Having said that
Ian Kumlien currently has an uptime of
1 day, 15 hours +
on v2 patches but with the apic delay timeout increased from 600UL to 800UL.
He has a Barton core - see below.

> 
> Craig
> A7N8X Deluxe V2 BIOS 1007
> 
> 
<snip>

> > I am currently trying the simpler v1 (always add a delay) patch but on all apic
> > acks as per this posting
> > 
> > http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/3291.html
> > 
> > which is a reply to an earlier posting of the same name but I accidentally
> > omitted the Re in the subject.
> > 

I don't think it is necessary to put the delay in all apic acks - I just tried it 
to see if it worked and have not yet put my code back the way it was. 
My hard lockups went away with the original v1 apic 
timer delay patch anyway.

Please note that in the posting above I wrote that I stuffed up the #ifdefs
in my v1 and v2 patches; please adjust your code accordingly. The patches worked
but were only testing the first config item after #ifdef.

apic code should have had
#if defined(CONFIG_MK7) && defined(CONFIG_BLK_DEV_AMD74XX)

ioapic code should have had
#if defined(CONFIG_ACPI_BOOT) && defined(CONFIG_X86_UP_IOAPIC)
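
To see why the form of the test matters: #ifdef takes a single macro name, so anything written after it is not part of the condition, whereas #if defined(A) && defined(B) checks both. A small standalone C sketch follows. The two helper functions and the *_ENABLED macros are hypothetical, and the CONFIG_ symbols are defined by hand here only so the file compiles outside a kernel tree.

```c
/* In a kernel build these come from the configuration; they are set
 * by hand here purely for illustration. */
#define CONFIG_MK7 1
#define CONFIG_BLK_DEV_AMD74XX 1
#define CONFIG_ACPI_BOOT 1
/* CONFIG_X86_UP_IOAPIC is deliberately left undefined. */

#if defined(CONFIG_MK7) && defined(CONFIG_BLK_DEV_AMD74XX)
#define APIC_DELAY_ENABLED 1
#else
#define APIC_DELAY_ENABLED 0
#endif

#if defined(CONFIG_ACPI_BOOT) && defined(CONFIG_X86_UP_IOAPIC)
#define IOAPIC_FIX_ENABLED 1
#else
#define IOAPIC_FIX_ENABLED 0
#endif

/* Hypothetical accessors so the result can be checked at run time. */
int apic_delay_enabled(void) { return APIC_DELAY_ENABLED; }
int ioapic_fix_enabled(void) { return IOAPIC_FIX_ENABLED; }
```

Here apic_delay_enabled() returns 1 and ioapic_fix_enabled() returns 0, because the second condition needs both symbols and only one is defined.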

Brief summary at this point

1) 2? reports are in that latest award bios with "C1 disconnect" set to "auto?"
 may remove need for apic ack delay patch and still keep cpu thermo managed

2) apic ack delay v1 patch seems safe for all cpu cores but introduces a small
 delay of about half the time of an XTPIC access on each apic timer interrupt
	
3) apic ack delay v2 patch seems safe only on barton cores and gives more debugging
 info and wastes less time than apic v1 patch

4) io-apic v2 patch gives more debugging info but functions same as io-apic v1 patch

Regards
Ross



* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
  2003-12-15 15:02 ` Craig Bradney
@ 2003-12-15 15:56   ` Maciej W. Rozycki
  2003-12-15 16:54   ` Ross Dickson
  1 sibling, 0 replies; 7+ messages in thread
From: Maciej W. Rozycki @ 2003-12-15 15:56 UTC (permalink / raw)
  To: Craig Bradney; +Cc: ross, recbo, linux-kernel

On Mon, 15 Dec 2003, Craig Bradney wrote:

>            CPU0
>   0:  245382420    IO-APIC-edge  timer
>   1:     139577    IO-APIC-edge  i8042
>   2:          0          XT-PIC  cascade
>   8:          3    IO-APIC-edge  rtc
>   9:          0   IO-APIC-level  acpi
>  12:    1478615    IO-APIC-edge  i8042
>  14:    1055548    IO-APIC-edge  ide0
>  15:     737664    IO-APIC-edge  ide1
>  19:   18405692   IO-APIC-level  radeon@PCI:3:0:0
>  21:    5257090   IO-APIC-level  ehci_hcd, NVidia nForce2, eth0
>  22:          3   IO-APIC-level  ohci1394
> NMI:      14944
> LOC:  245087891
> ERR:          0
> MIS:          6
> 
> As for NMI.. I actually forget which I booted from... I think =1, but NMI is a small number now.. would it have wrapped?

 That's "=2" -- otherwise the NMI count would be roughly the same as the
sum of counts for IRQ 0 for all processors.  And you can actually get your
kernel's command line from /proc/cmdline.
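
That rule of thumb can be written down as a small C sketch. The function name, the return codes and the 1% tolerance are assumptions for illustration; a zero NMI count is treated as watchdog off, although it could also mean the watchdog failed to start.

```c
/* Guess the nmi_watchdog mode from /proc/interrupts totals.
 * Returns 0 (no NMIs seen), 1 (NMI count tracks the IRQ 0 total,
 * as expected for nmi_watchdog=1) or 2 (anything else).
 * The 1% tolerance is an arbitrary choice for this sketch. */
int guess_nmi_watchdog(unsigned long long nmi,
                       unsigned long long irq0_total)
{
    unsigned long long diff;

    if (nmi == 0)
        return 0;

    diff = nmi > irq0_total ? nmi - irq0_total : irq0_total - nmi;
    if (diff <= irq0_total / 100)
        return 1;
    return 2;
}
```

For Craig's numbers (NMI 14944 against 245382420 on IRQ 0) this gives 2; /proc/cmdline remains the authoritative answer.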

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
  2003-12-15 14:30 Fwd: " Ross Dickson
@ 2003-12-15 15:02 ` Craig Bradney
  2003-12-15 15:56   ` Maciej W. Rozycki
  2003-12-15 16:54   ` Ross Dickson
  0 siblings, 2 replies; 7+ messages in thread
From: Craig Bradney @ 2003-12-15 15:02 UTC (permalink / raw)
  To: ross; +Cc: Maciej W. Rozycki, recbo, linux-kernel

Just to give the status here ...
I'm still running the original 2.6 test 11 patches for apic and ioapic.
Uptime is now 2d 20h with lots of idle time and hard work too.. 

/proc/interrupts as follows:

           CPU0
  0:  245382420    IO-APIC-edge  timer
  1:     139577    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          3    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:    1478615    IO-APIC-edge  i8042
 14:    1055548    IO-APIC-edge  ide0
 15:     737664    IO-APIC-edge  ide1
 19:   18405692   IO-APIC-level  radeon@PCI:3:0:0
 21:    5257090   IO-APIC-level  ehci_hcd, NVidia nForce2, eth0
 22:          3   IO-APIC-level  ohci1394
NMI:      14944
LOC:  245087891
ERR:          0
MIS:          6

As for NMI.. I actually forget which I booted from... I think =1, but NMI is a small number now.. would it have wrapped?

Craig
A7N8X Deluxe V2 BIOS 1007



On Mon, 2003-12-15 at 15:30, Ross Dickson wrote:
> > > APIC error on CPU0: 02(02)
> > > what?? no crash though.
> > [...]
> > > bob@where cat /proc/interrupts
> > >            CPU0
> > >   0:    3350153    IO-APIC-edge  timer
> > >   1:       5775    IO-APIC-edge  i8042
> > >   2:          0          XT-PIC  cascade
> > >   8:          1    IO-APIC-edge  rtc
> > >   9:          0   IO-APIC-level  acpi
> > >  12:       5385    IO-APIC-edge  i8042
> > >  14:         10    IO-APIC-edge  ide0
> > >  15:         10    IO-APIC-edge  ide1
> > >  16:    1717957   IO-APIC-level  ide2, ide3, eth0
> > >  19:     472929   IO-APIC-level  ide4, ide5
> > >  21:          0   IO-APIC-level  NVidia nForce2
> > > NMI:        822
> > > LOC:    3350073
> > > ERR:         35
> > > MIS:      15818
> 
> >It looks like the infamous APIC delivery bug -- the "MIS" counter shows
> >how many level-triggered interrupts have been erroneously delivered as
> >edge-triggered ones. No wonder the system shows instability -- you have
> >noise problems on the APIC bus.
>  
> Thanks Maciej
> I was wondering about those, I had seen the work around code and would not
> have thought it need apply to recent athlon chipsets?
> 
> 
> For comparison here is my proc/interrupts 
> CPU0
>   0:   50462204    IO-APIC-edge  timer
>   1:      49153    IO-APIC-edge  keyboard
>   2:          0          XT-PIC  cascade
>   9:          0   IO-APIC-level  acpi
>  12:     395912    IO-APIC-edge  PS/2 Mouse
>  14:     995872    IO-APIC-edge  ide0
>  15:        283    IO-APIC-edge  ide1
>  16:    3921102   IO-APIC-level  nvidia
>  18:          2   IO-APIC-level  bttv
>  20:     136325   IO-APIC-level  eth0, usb-ohci
>  21:     146903   IO-APIC-level  ehci_hcd, NVIDIA nForce Audio
>  22:          0   IO-APIC-level  usb-ohci
> NMI:          0
> LOC:   50457798
> ERR:          0
> MIS:          0
> 
> Albatron KM18G-Pro, nforce2, phoenix bios, 2200XP, 255fsb, ddr400,
> ide0 is hard drive, ide1 is cdrom, nmi watchdog off
> 
> Report seems OK but this machine locks up hard without the apic delay patch.
> 
> I am currently trying the simpler v1 (always add a delay) patch but on all apic
> acks as per this posting
> 
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/3291.html
> 
> which is a reply to an earlier posting of the same name but I accidentally
> omitted the Re in the subject.
> 
> Regards,
> Ross.
> 



* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
@ 2003-12-15 14:30 Ross Dickson
  2003-12-15 15:02 ` Craig Bradney
  0 siblings, 1 reply; 7+ messages in thread
From: Ross Dickson @ 2003-12-15 14:30 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: recbo, linux-kernel

> > APIC error on CPU0: 02(02)
> > what?? no crash though.
> [...]
> > bob@where cat /proc/interrupts
> >            CPU0
> >   0:    3350153    IO-APIC-edge  timer
> >   1:       5775    IO-APIC-edge  i8042
> >   2:          0          XT-PIC  cascade
> >   8:          1    IO-APIC-edge  rtc
> >   9:          0   IO-APIC-level  acpi
> >  12:       5385    IO-APIC-edge  i8042
> >  14:         10    IO-APIC-edge  ide0
> >  15:         10    IO-APIC-edge  ide1
> >  16:    1717957   IO-APIC-level  ide2, ide3, eth0
> >  19:     472929   IO-APIC-level  ide4, ide5
> >  21:          0   IO-APIC-level  NVidia nForce2
> > NMI:        822
> > LOC:    3350073
> > ERR:         35
> > MIS:      15818

>It looks like the infamous APIC delivery bug -- the "MIS" counter shows
>how many level-triggered interrupts have been erroneously delivered as
>edge-triggered ones. No wonder the system shows instability -- you have
>noise problems on the APIC bus.
 
Thanks Maciej
I was wondering about those, I had seen the work around code and would not
have thought it need apply to recent athlon chipsets?


For comparison here is my proc/interrupts 
CPU0
  0:   50462204    IO-APIC-edge  timer
  1:      49153    IO-APIC-edge  keyboard
  2:          0          XT-PIC  cascade
  9:          0   IO-APIC-level  acpi
 12:     395912    IO-APIC-edge  PS/2 Mouse
 14:     995872    IO-APIC-edge  ide0
 15:        283    IO-APIC-edge  ide1
 16:    3921102   IO-APIC-level  nvidia
 18:          2   IO-APIC-level  bttv
 20:     136325   IO-APIC-level  eth0, usb-ohci
 21:     146903   IO-APIC-level  ehci_hcd, NVIDIA nForce Audio
 22:          0   IO-APIC-level  usb-ohci
NMI:          0
LOC:   50457798
ERR:          0
MIS:          0

Albatron KM18G-Pro, nforce2, phoenix bios, 2200XP, 255fsb, ddr400,
ide0 is hard drive, ide1 is cdrom, nmi watchdog off

Report seems OK but this machine locks up hard without the apic delay patch.

I am currently trying the simpler v1 (always add a delay) patch but on all apic
acks as per this posting

http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/3291.html

which is a reply to an earlier posting of the same name but I accidentally
omitted the Re in the subject.

Regards,
Ross.



Thread overview: 7+ messages
     [not found] <200312132040.00875.ross@datscreative.com.au>
2003-12-13 12:00 ` Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered Bob
2003-12-15 13:11   ` Maciej W. Rozycki
2003-12-16  7:18     ` Bob
2003-12-15 14:30 Fwd: " Ross Dickson
2003-12-15 15:02 ` Craig Bradney
2003-12-15 15:56   ` Maciej W. Rozycki
2003-12-15 16:54   ` Ross Dickson
