linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* AMD 768 erratum 10 (solved: AMD 760MPX DMA lockup)
@ 2002-09-25 13:24 Jan Kasprzak
  2002-09-26 15:08 ` Alan Cox
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kasprzak @ 2002-09-25 13:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mark Hahn, kernel, Petr Konecny, Bruno A. Crespo, Denis Vlasenko,
	Alan Cox

	Hello, all!

two weeks ago I've posted to the LKML the following message:

[...]
: my dual athlon box is unstable in some situations. I can consistently
: lock it up by running the following code:
: 
: fd = open("/dev/hda3", O_RDWR);
: for (i=0; i<1024*1024; i++) {
:         read(fd, buffer, 8192);
:         lseek(fd, -8192, SEEK_CUR);
:         write(fd, buffer, 8192);
: }
[...]

	I think I have been hit by AMD 768 southbridge erratum number 10.
After plugging in the PS/2 mouse, the server is able to run 10 iterations
of bonnie++ without any problem (w/o PS/2 mouse it locks up in first
or second iterations).

	I want to ask everyone who replied to me that the above code
works for him on the 760MPX-based system to re-run the above code
(or run bonnie++ benchmark several times in a loop), but _without_
the PS/2 mouse connected?

	Since this is an official AMD errata, we should have a work-around
for this, or at least the big fat warning during boot, when the 768
southbridge is detected - something like the following:

WARNING: Using the system with AMD 768 southbridge without the PS/2
WARNING: mouse plugged in can cause instabilities. See the AMD 768 erratum #10

	The AMD 768 Revision Guide is at the following URL:

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24472.pdf

the erratum #10 is described on page 7 (pstotext output, manually edited):

: 10	Multiprocessor System May Hang While in FULL APIC Mode
: 	and IOAPIC Interrupt is Masked
: 
: Products Affected. B1, B2
: 
: Normal Specified Operation. The AMD-768 peripheral bus controller is
: designed to support FULL APIC mode in multiprocessor systems for system
: management events. If an interrupt is masked in the APIC controller of
: the AMD-768, then the corresponding interrupt message should not be
: sent to the processor via the 3-wire APIC bus.
: 
: Non-conformance. The AMD-768 peripheral bus controller will send an
: interrupt message via the 3-wire APIC bus regardless if the interrupt
: is masked or not.
: 
: Potential Effect on System. Since the processor had previously masked
: the APIC interrupt, it is not expecting to receive future APIC messages
: for the masked interrupt. The APIC controller will continuously send
: the interrupt message via the 3-wire bus until a processor accepts the
: message, causing the system to hang.
: 
: A system hang has been observed when executing a server shutdown
: command in Novell Netware versions 5.0 or 5.1 while using a serial
: mouse. During the server shutdown sequence, software writes an invalid
: CPU ID to the IOAPIC redirection table, and the system does not
: complete the shutdown.
: 
: Note: No failure has been observed when using a PS/2 mouse.
: 
: Suggested Workaround. None.
: 
: Resolution Status: No fix planned.


-- 
| Jan "Yenya" Kasprzak  <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839      Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/   Czech Linux Homepage: http://www.linux.cz/ |
|----------- If you want the holes in your knowledge showing up -----------|
|----------- try teaching someone.                  -- Alan Cox -----------|

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: AMD 768 erratum 10 (solved: AMD 760MPX DMA lockup)
@ 2002-09-26 16:08 Manfred Spraul
  2002-09-27  6:46 ` Jan Kasprzak
  0 siblings, 1 reply; 8+ messages in thread
From: Manfred Spraul @ 2002-09-26 16:08 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-kernel, Jan Kasprzak, Alan Cox

The errata is not PS/2 mouse specific:
it says that the io apic doesn't implement masking interrupts correctly.

Linux uses masking aggressively - disable_irq() is implemented by 
masking the interrupt in the io apic. I'm surprised that this doesn't 
cause frequent problems. Perhaps the problem only occurs if an invalid 
cpu id is written into the target register, as done by Netware?

Is someone around with a ne2k-pci card and a AMD-760MPX based system?

Regarding Jan's problem: I'm not sure if his problems are related to 
this errata. It says that using a PS/2 mouse instead of a serial mouse 
with Novell Netward avoids the hang during shutdown, probably because 
then netware doesn't mask the irq.

--
	Manfred


^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: AMD 768 erratum 10 (solved: AMD 760MPX DMA lockup)
@ 2002-09-27 14:24 Bruno A. Crespo
  0 siblings, 0 replies; 8+ messages in thread
From: Bruno A. Crespo @ 2002-09-27 14:24 UTC (permalink / raw)
  To: linux-kernel, kas

Jan Kasprzak wrote:

> Manfred Spraul wrote:
> : The errata is not PS/2 mouse specific:
> : it says that the io apic doesn't implement masking interrupts correctly.
> 
> Yes, but it says that problem has not been observed when running
> with PS/2 mouse. Which is exactly what I observe on my system.

I observed the same problem with a MSI K7D Master mainboard, and
plugging a PS/2 mouse fix the problem.

BTW: I can also fix the problem unplugging the AGP card.


						Bruno

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2002-09-30 21:42 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-09-25 13:24 AMD 768 erratum 10 (solved: AMD 760MPX DMA lockup) Jan Kasprzak
2002-09-26 15:08 ` Alan Cox
2002-09-26 15:34   ` Dave Jones
2002-09-26 16:51     ` Alan Cox
2002-09-26 16:08 Manfred Spraul
2002-09-27  6:46 ` Jan Kasprzak
2002-09-30 21:47   ` Maxwell Spangler
2002-09-27 14:24 Bruno A. Crespo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).