linux-kernel.vger.kernel.org archive mirror
* AMD 760MPX DMA lockup
@ 2002-09-12 14:12 Jan Kasprzak
  2002-09-12 15:07 ` kernel
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Jan Kasprzak @ 2002-09-12 14:12 UTC (permalink / raw)
  To: linux-kernel

	Hello, kernel hackers,

my dual athlon box is unstable in some situations. I can consistently
lock it up by running the following code:

fd = open("/dev/hda3", O_RDWR);
for (i=0; i<1024*1024; i++) {
	read(fd, buffer, 8192);
	lseek(fd, -8192, SEEK_CUR);
	write(fd, buffer, 8192);
}

It locks up in a minute or so (solid lock up, it does not react even
to a NumLock key or console switching). It could well be a HW problem
(this is a new box), but how can I tell whether this is the case?
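
For anyone who wants to try this on their own hardware, the fragment above
can be fleshed out roughly as follows. This is only a sketch under
assumptions: the function name rw_stress, the CHUNK constant and the error
handling are not part of the original snippet. Note that running it against
a real partition rewrites that partition in place, so point it at scratch
data.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64   /* allow offsets past 2 GiB on 32-bit builds */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 8192             /* same block size as the original loop */

/*
 * Read each chunk, seek back over it, and write it out again in place.
 * rw_stress is a hypothetical name; the original post only shows the loop.
 * Returns the number of chunks rewritten, or -1 on error.
 */
long rw_stress(const char *path, long iterations)
{
    static char buffer[CHUNK];
    long done = 0;
    int fd = open(path, O_RDWR);

    if (fd < 0) {
        perror("open");
        return -1;
    }
    for (long i = 0; i < iterations; i++) {
        ssize_t n = read(fd, buffer, CHUNK);
        if (n < 0) {
            perror("read");
            done = -1;
            break;
        }
        if (n == 0)
            break;             /* end of device or file reached */
        if (lseek(fd, -(off_t)n, SEEK_CUR) == (off_t)-1) {
            perror("lseek");
            done = -1;
            break;
        }
        if (write(fd, buffer, (size_t)n) != n) {
            perror("write");
            done = -1;
            break;
        }
        done++;
    }
    close(fd);
    return done;
}
```

Calling rw_stress("/dev/hda3", 1024L * 1024) corresponds to the original
loop, which walks 1024*1024 chunks of 8192 bytes, i.e. 8 GiB in total.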

The mainboard is MSI K7D Master, AMD 760MPX chipset, 460W power supply,
1GB RAM.

The box survived a whole night of memtest86 and a whole night of three kernel
compiles running in parallel in an infinite loop.

This problem is on many recent kernels (tried 2.4.18-11 from RedHat "null",
2.4.20-pre5-ac1, 2.4.20-pre5-ac5, 2.4.20-pre6). It does not matter whether
I compile the kernel SMP or UP, with or without CONFIG_HIGHMEM.

I tried several disks (WD1200JB, WD1200BB, IBM 120GXP).
I tried to remove all other PCI cards and 512MB of RAM. No change.
I created an ext3 filesystem on /dev/hda3, mounted it
as /mnt, created a big file /mnt/bigfile and ran the above code
on /mnt/bigfile. The system still locks up.

I tried to put the tested disk on a separate IDE controller
(Promise PDC20269 PCI card) - then I do not get a complete lockup,
just the drive starts to complain about the DMA timeout, and the kernel
resets the controller. However, DMA timeouts start to occur even on
the primary controller.

When I switch off the DMA (hdparm -d0 /dev/hda), the problem goes away
(however, the disk is very slow, as expected).

Is anybody able to run the above code on an AMD 760MPX-based system?
Is it a kernel problem or a hardware problem?

	Thanks in advance,

-Yenya

-- 
| Jan "Yenya" Kasprzak  <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839      Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/   Czech Linux Homepage: http://www.linux.cz/ |
|----------- If you want the holes in your knowledge showing up -----------|
|----------- try teaching someone.                  -- Alan Cox -----------|

* Re: AMD 760MPX DMA lockup
  2002-09-12 14:12 AMD 760MPX DMA lockup Jan Kasprzak
@ 2002-09-12 15:07 ` kernel
  2002-09-12 16:22 ` Petr Konecny
  2002-09-12 23:10 ` Denis Vlasenko
  2 siblings, 0 replies; 11+ messages in thread
From: kernel @ 2002-09-12 15:07 UTC (permalink / raw)
  To: Jan Kasprzak; +Cc: linux-kernel

> 
> 	Hello, kernel hackers,
> 
> my dual athlon box is unstable in some situations. I can consistently
> lock it up by running the following code:
> 
> fd = open("/dev/hda3", O_RDWR);
> for (i=0; i<1024*1024; i++) {
> 	read(fd, buffer, 8192);
> 	lseek(fd, -8192, SEEK_CUR);
> 	write(fd, buffer, 8192);
> }
> 
> It locks up in a minute or so (solid lock up, it does not react even
> to a NumLock key or console switching). It could well be a HW problem
> (this is a new box), but how can I tell whether this is the case?
> 
> The mainboard is MSI K7D Master, AMD 760MPX chipset, 460W power supply,
> 1GB RAM.
> 
> The box survived a whole night of memtest86 and a whole night of three kernel
> compiles running in parallel in an infinite loop.
> 
> This problem is on many recent kernels (tried 2.4.18-11 from RedHat "null",
> 2.4.20-pre5-ac1, 2.4.20-pre5-ac5, 2.4.20-pre6). It does not matter whether
> I compile the kernel SMP or UP, with or without CONFIG_HIGHMEM.

Well I have run this several times on my MPX, and it is fine.

This is 2.4.20-pre1, dual AMD 2000MP, only difference is it is the Tyan
version of the MPX, not the MSI. 

Justin

* Re: AMD 760MPX DMA lockup
  2002-09-12 14:12 AMD 760MPX DMA lockup Jan Kasprzak
  2002-09-12 15:07 ` kernel
@ 2002-09-12 16:22 ` Petr Konecny
  2002-09-12 23:10 ` Denis Vlasenko
  2 siblings, 0 replies; 11+ messages in thread
From: Petr Konecny @ 2002-09-12 16:22 UTC (permalink / raw)
  Cc: kas

>>>>> Jan Kasprzak (Yenya) wrote:

 Yenya> Is anybody able to run the above code on an AMD 760MPX-based system?
 Yenya> Is it a kernel problem or hardware problem?
Runs fine on an ASUS A7M266-D MoBo with a WD800JB disk.

                                                        Petr


* Re: AMD 760MPX DMA lockup (partly solved)
  2002-09-12 23:10 ` Denis Vlasenko
@ 2002-09-12 19:14   ` Jan Kasprzak
  2002-09-12 20:43     ` Alan Cox
                       ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Jan Kasprzak @ 2002-09-12 19:14 UTC (permalink / raw)
  To: Denis Vlasenko, kernel; +Cc: linux-kernel

kernel@street-vision.com wrote:
: Well I have run this several times on my MPX, and it is fine.
:
: This is 2.4.20-pre1, dual AMD 2000MP, only difference is it is the Tyan
: version of the MPX, not the MSI.
:
: Justin

        Justin, thanks for this! I've tried 2.4.20-pre1 with your
.config (and then with my .config), and it works!

        Further investigation showed that the problem first appeared
somewhere between 2.4.20-pre2 (works for me) and 2.4.20-pre5 (has the
lock-up problem I've described). I was not able to test -pre3 and -pre4,
because these kernels died on me during boot after the
"Initializing RT netlink socket" message.

	I think the bug got merged from the -ac kernels, because it is
present both in the kernel 2.4.19-11 from the RedHat "null" beta
and in 2.4.20-pre2-ac1 (although the latter crashes instead of locking up).

Denis Vlasenko wrote:
: 
: 8 GB... Can you make it loop over a much smaller size?
: 
	with 2GB it still fails. I didn't try less, because with 1GB of RAM
it would not have any effect.

: I assume removing read+lseek eliminates lockup?

	Partly. I've tried

dd if=/dev/hda3 of=/dev/null bs=1024k

and it still causes filesystem corruption (although no lockup).

: Is it IDE related or not? 
: If you can test it over SCSI/NFS/ramdisk/???...

	I think it is IDE or DMA related.

: > When I switch off the DMA (hdparm -d0 /dev/hda), the problem goes away
: > (however, the disk is very slow, as expected).
: 
: At which DMA/UDMA mode does it start to fail?

	-d1 -X33 fails.

-Y.

-- 
| Jan "Yenya" Kasprzak  <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839      Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/   Czech Linux Homepage: http://www.linux.cz/ |
|----------- If you want the holes in your knowledge showing up -----------|
|----------- try teaching someone.                  -- Alan Cox -----------|

* Re: AMD 760MPX DMA lockup (partly solved)
  2002-09-12 19:14   ` AMD 760MPX DMA lockup (partly solved) Jan Kasprzak
@ 2002-09-12 20:43     ` Alan Cox
  2002-09-13  9:41       ` Jan Kasprzak
  2002-09-12 21:34     ` Vojtech Pavlik
  2002-09-13 11:58     ` Denis Vlasenko
  2 siblings, 1 reply; 11+ messages in thread
From: Alan Cox @ 2002-09-12 20:43 UTC (permalink / raw)
  To: Jan Kasprzak; +Cc: Denis Vlasenko, kernel, linux-kernel

On Thu, 2002-09-12 at 20:14, Jan Kasprzak wrote:
> 	I think the bug got merged from the -ac kernels, because it is
> present both in the kernel 2.4.19-11 from the RedHat "null" beta
> and in 2.4.20-pre2-ac1 (although the latter crashes instead of locking up).

That would be strange, actually. The Red Hat beta kernel has 2.4.18-like IDE,
not -ac-like IDE.



* Re: AMD 760MPX DMA lockup (partly solved)
  2002-09-12 19:14   ` AMD 760MPX DMA lockup (partly solved) Jan Kasprzak
  2002-09-12 20:43     ` Alan Cox
@ 2002-09-12 21:34     ` Vojtech Pavlik
  2002-09-13 11:58     ` Denis Vlasenko
  2 siblings, 0 replies; 11+ messages in thread
From: Vojtech Pavlik @ 2002-09-12 21:34 UTC (permalink / raw)
  To: Jan Kasprzak; +Cc: Denis Vlasenko, kernel, linux-kernel

On Thu, Sep 12, 2002 at 09:14:52PM +0200, Jan Kasprzak wrote:

> : > When I switch off the DMA (hdparm -d0 /dev/hda), the problem goes away
> : > (however, the disk is very slow, as expected).
> : 
> : At which DMA/UDMA mode does it start to fail?
> 
> 	-d1 -X33 fails.

X33? X33 doesn't make sense.

-- 
Vojtech Pavlik
SuSE Labs

* Re: AMD 760MPX DMA lockup
  2002-09-12 14:12 AMD 760MPX DMA lockup Jan Kasprzak
  2002-09-12 15:07 ` kernel
  2002-09-12 16:22 ` Petr Konecny
@ 2002-09-12 23:10 ` Denis Vlasenko
  2002-09-12 19:14   ` AMD 760MPX DMA lockup (partly solved) Jan Kasprzak
  2 siblings, 1 reply; 11+ messages in thread
From: Denis Vlasenko @ 2002-09-12 23:10 UTC (permalink / raw)
  To: Jan Kasprzak, linux-kernel

On 12 September 2002 12:12, Jan Kasprzak wrote:

> my dual athlon box is unstable in some situations. I can consistently
> lock it up by running the following code:
>
> fd = open("/dev/hda3", O_RDWR);
> for (i=0; i<1024*1024; i++) {
> 	read(fd, buffer, 8192);
> 	lseek(fd, -8192, SEEK_CUR);
> 	write(fd, buffer, 8192);
> }

8 GB... Can you make it loop over a much smaller size?

for (j=0; j<1024; j++) {
  fd = open("/dev/hda3", O_RDWR);
  for (i=0; i<1024; i++) {
  	read(fd, buffer, 8192);
  	lseek(fd, -8192, SEEK_CUR);
  	write(fd, buffer, 8192);
  }
  close(fd);
  printf(<some stats>);
}

I assume removing read+lseek eliminates lockup?
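
A compilable take on the shorter loop is sketched below, with the
<some stats> placeholder filled in by a per-pass wall-clock time. The
assumptions here: run_passes and its parameters are hypothetical names, and
elapsed time is only one possible choice of statistic. Because the device is
reopened on every pass, each pass starts at offset 0 again, so 1024
iterations of 8192 bytes touch only the first 8 MiB.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define CHUNK 8192

/*
 * Run `passes` passes; each pass reopens `path` and rewrites up to `iters`
 * chunks in place, then prints how long the pass took.
 * Returns the number of passes that could open the target.
 */
int run_passes(const char *path, int passes, int iters)
{
    static char buffer[CHUNK];

    for (int j = 0; j < passes; j++) {
        struct timespec t0, t1;
        int fd = open(path, O_RDWR);

        if (fd < 0) {
            perror("open");
            return j;
        }
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iters; i++) {
            ssize_t n = read(fd, buffer, CHUNK);
            if (n <= 0)
                break;                      /* error or end of target */
            if (lseek(fd, -(off_t)n, SEEK_CUR) == (off_t)-1)
                break;
            if (write(fd, buffer, (size_t)n) != n)
                break;
        }
        close(fd);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (double)(t1.tv_sec - t0.tv_sec)
                    + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("pass %d: %.3f s\n", j, secs);
        fflush(stdout);                     /* keep output visible if the box hangs */
    }
    return passes;
}
```

run_passes("/dev/hda3", 1024, 1024) matches the loop bounds above. A region
this small may well be served from the page cache after the first pass, so
it may not exercise the disk the way the 8 GB loop does.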

> I tried to put the tested disk on a separate IDE controller
> (Promise PDC20269 PCI card) - then I do not get a complete lockup,
> just the drive starts to complain about the DMA timeout, and the kernel
> resets the controller. However, DMA timeouts start to occur even on
> the primary controller.

Is it IDE related or not? 
If you can test it over SCSI/NFS/ramdisk/???...

> When I switch off the DMA (hdparm -d0 /dev/hda), the problem goes away
> (however, the disk is very slow, as expected).

At which DMA/UDMA mode does it start to fail?
--
vda

* Re: AMD 760MPX DMA lockup (partly solved)
  2002-09-12 20:43     ` Alan Cox
@ 2002-09-13  9:41       ` Jan Kasprzak
  2002-09-13  9:46         ` Vojtech Pavlik
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kasprzak @ 2002-09-13  9:41 UTC (permalink / raw)
  To: Alan Cox; +Cc: Denis Vlasenko, vojtech, kernel, linux-kernel

Alan Cox wrote:
: On Thu, 2002-09-12 at 20:14, Jan Kasprzak wrote:
: > 	I think the bug got merged from the -ac kernels, because it is
: > present both in the kernel 2.4.19-11 from the RedHat "null" beta
: > and in 2.4.20-pre2-ac1 (although the latter crashes instead of locking up).
: 
: That would strange actually. The Red Hat beta kernel has 2.4.18 like IDE
: not -ac like IDE
: 
	Well, it is probably not IDE-related at all, but rather it has
something to do with PCI or maybe scheduling changes that -pre2-ac2 makes.
Currently I positively know that 2.4.20-pre2 works, but 2.4.20-pre2-ac2 and
2.4.20-pre5 do not. I have tested 2.4.20-pre2-ac1 and 2.4.20-pre[34],
but all of these fail for me, probably for some other reason.

	So I have taken patch-2.4.20-pre2-ac2, and deleted all
changes except the ones that are IDE-related, and 2.4.20-pre2 plus the
following parts of -pre2-ac2 works:

 drivers/ide/Config.in      |   15 
 drivers/ide/Makefile       |    9 
 drivers/ide/amd74xx.c      |  292 +++---
 drivers/ide/hd.c           |   42 
 drivers/ide/ide-disk.c     |  879 +++++++++++-------
 drivers/ide/ide-dma.c      |  347 +++----
 drivers/ide/ide-features.c |  385 --------
 drivers/ide/ide-pci.c      |  539 +++++------
 drivers/ide/ide-probe.c    |  126 +-
 drivers/ide/ide-proc.c     |   58 -
 drivers/ide/ide-taskfile.c | 2159 +++++++++++++++++++++++++++++++++------------
 drivers/ide/ide.c          |  651 +++----------
 drivers/ide/pdc202xx.c     | 1098 +++++++++++++---------
 drivers/ide/pdc4030.c      |  272 ++++-
 drivers/ide/pdcadma.c      |  106 ++
 include/asm-i386/ide.h     |   26 
 include/asm-i386/system.h  |   14 
 include/linux/hdreg.h      |   93 +
 include/linux/ide.h        |  409 +++++---
 include/linux/pci_ids.h    |   16 
 20 files changed, 4524 insertions, 3012 deletions

	I had to delete the first chunk of the patch to
include/asm-i386/ide.h, and I deleted the call to
pci_enable_device_bars() in drivers/ide/ide-pci.c to be able to compile
this, but other than that it is exactly the same as the 2.4.20-pre2-ac2 IDE code.
I will send you this as a patch against 2.4.20-pre2 if you want.

	This still works, so it means the problem has to be in some other
part of 2.4.20-pre2-ac2.

Vojtech Pavlik wrote:
: 
: X33? X33 doesn't make sense.
: 
	X34, sorry. DMA 33.

-Y.

-- 
| Jan "Yenya" Kasprzak  <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839      Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/   Czech Linux Homepage: http://www.linux.cz/ |
|----------- If you want the holes in your knowledge showing up -----------|
|----------- try teaching someone.                  -- Alan Cox -----------|

* Re: AMD 760MPX DMA lockup (partly solved)
  2002-09-13  9:41       ` Jan Kasprzak
@ 2002-09-13  9:46         ` Vojtech Pavlik
  0 siblings, 0 replies; 11+ messages in thread
From: Vojtech Pavlik @ 2002-09-13  9:46 UTC (permalink / raw)
  To: Jan Kasprzak; +Cc: Alan Cox, Denis Vlasenko, vojtech, kernel, linux-kernel

On Fri, Sep 13, 2002 at 11:41:49AM +0200, Jan Kasprzak wrote:

> Vojtech Pavlik wrote:
> : 
> : X33? X33 doesn't make sense.
> : 
> 	X34, sorry. DMA 33.

Still not right. -X34 is MWDMA mode 2 (16 MB/s); for UDMA33 you need -X66.
I know it's confusing, but these are mode numbers from the ATA spec.
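
To make the numbering concrete: the value given to hdparm -X is the ATA
transfer-mode byte, where the upper bits select the mode family and the low
three bits select the mode number within it. A small sketch of that decoding
follows. decode_xfer_mode is a hypothetical helper, not part of hdparm, and
it does not handle the special values 0 and 1 (the PIO defaults).

```c
#include <stdio.h>

/*
 * Decode an hdparm -X style transfer-mode value into a short name such as
 * "mdma2" or "udma2". Returns 0 on success, -1 for values this sketch does
 * not recognize (including the PIO default values 0 and 1).
 */
int decode_xfer_mode(int x, char *out, size_t outlen)
{
    if (x >= 64 && x <= 71)
        snprintf(out, outlen, "udma%d", x - 64);   /* UltraDMA: 64 + n */
    else if (x >= 32 && x <= 39)
        snprintf(out, outlen, "mdma%d", x - 32);   /* multiword DMA: 32 + n */
    else if (x >= 16 && x <= 23)
        snprintf(out, outlen, "sdma%d", x - 16);   /* single-word DMA: 16 + n */
    else if (x >= 8 && x <= 15)
        snprintf(out, outlen, "pio%d", x - 8);     /* PIO: 8 + n */
    else
        return -1;
    return 0;
}
```

With this, 34 decodes to mdma2 (multiword DMA mode 2, about 16.7 MB/s) and
66 decodes to udma2 (UltraDMA mode 2, about 33.3 MB/s), which is the mode
usually meant by "UDMA33".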

-- 
Vojtech Pavlik
SuSE Labs

* Re: AMD 760MPX DMA lockup (partly solved)
  2002-09-13 11:58     ` Denis Vlasenko
@ 2002-09-13 11:44       ` kernel
  0 siblings, 0 replies; 11+ messages in thread
From: kernel @ 2002-09-13 11:44 UTC (permalink / raw)
  To: vda; +Cc: kas, linux-kernel

> 
> On 12 September 2002 17:14, Jan Kasprzak wrote:
> > : This is 2.4.20-pre1, dual AMD 2000MP, only difference is it is the Tyan
> > : version of the MPX, not the MSI.
> > :
> > : Justin
> >
> >         Justin, thanks for this! I've tried 2.4.20-pre1 with your
> > .config (and then with my .config), and it works!
> >
> >         Further investigation showed that the problem first appeared
> > somewhere between 2.4.20-pre2 (works for me) and 2.4.20-pre5 (has the
> > lock-up problem I've described). I was not able to test -pre3 and -pre4,
> > because these kernels died on me during boot after the
> > "Initializing RT netlink socket" message.
> 
> It would be interesting to test 2.4.20-pre5 on Justin's box
> (if he can risk fs damage)

ok, tried it on 2.4.20-pre5, and it is fine. 

I would send your board back...

Justin

* Re: AMD 760MPX DMA lockup (partly solved)
  2002-09-12 19:14   ` AMD 760MPX DMA lockup (partly solved) Jan Kasprzak
  2002-09-12 20:43     ` Alan Cox
  2002-09-12 21:34     ` Vojtech Pavlik
@ 2002-09-13 11:58     ` Denis Vlasenko
  2002-09-13 11:44       ` kernel
  2 siblings, 1 reply; 11+ messages in thread
From: Denis Vlasenko @ 2002-09-13 11:58 UTC (permalink / raw)
  To: Jan Kasprzak, kernel; +Cc: linux-kernel

On 12 September 2002 17:14, Jan Kasprzak wrote:
> : This is 2.4.20-pre1, dual AMD 2000MP, only difference is it is the Tyan
> : version of the MPX, not the MSI.
> :
> : Justin
>
>         Justin, thanks for this! I've tried 2.4.20-pre1 with your
> .config (and then with my .config), and it works!
>
>         Further investigation showed that the problem first appeared
> somewhere between 2.4.20-pre2 (works for me) and 2.4.20-pre5 (has the
> lock-up problem I've described). I was not able to test -pre3 and -pre4,
> because these kernels died on me during boot after the
> "Initializing RT netlink socket" message.

It would be interesting to test 2.4.20-pre5 on Justin's box
(if he can risk fs damage)
--
vda
