linux-kernel.vger.kernel.org archive mirror
* Dma problems with Promise IDE controller
@ 2004-10-18 19:43 Johan Groth
  2004-10-18 20:22 ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 8+ messages in thread
From: Johan Groth @ 2004-10-18 19:43 UTC
  To: linux-kernel

Hi,
I'm using a Promise controller with four IDE hard disks, set up as a
software RAID0 array. Lately I've been getting interrupt problems that
look like this:

Oct 18 18:03:06 lion kernel: hdg: dma_timer_expiry: dma status == 0x61
Oct 18 18:03:16 lion kernel: hdg: dma timeout retry: status=0x51 { 
DriveReady SeekComplete Error }
Oct 18 18:03:16 lion kernel: hdg: dma timeout retry: error=0x40 { 
UncorrectableError }, LBAsect=53500655, sector=53500520
Oct 18 18:03:16 lion kernel: end_request: I/O error, dev 22:01 (hdg), 
sector 53500520
Oct 18 18:03:16 lion kernel: blk: queue c030c85c, I/O limit 4095Mb (mask 
0xffffffff)
Oct 18 18:03:21 lion kernel: hdg: read_intr: status=0x59 { DriveReady 
SeekComplete DataRequest Error }
Oct 18 18:03:21 lion kernel: hdg: read_intr: error=0x40 { 
UncorrectableError }, LBAsect=53500655, sector=53500592
Oct 18 18:03:21 lion kernel: end_request: I/O error, dev 22:01 (hdg), 
sector 53500592

System info:
lion:~# uname -a
Linux lion 2.4.25 #4 Mon Aug 9 15:30:49 CEST 2004 i586 GNU/Linux

lion:~# gcc --version
gcc (GCC) 3.3.2 (Debian)
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

lion:~# lspci -v
00:00.0 Host bridge: VIA Technologies, Inc. VT82C598 [Apollo MVP3] (rev 04)
         Flags: bus master, medium devsel, latency 16
         Memory at e0000000 (32-bit, prefetchable) [size=64M]
         Capabilities: [a0] AGP version 1.0

00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo 
MVP3/Pro133x AGP] (prog-if 00 [Normal decode])
         Flags: bus master, 66Mhz, medium devsel, latency 0
         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
         I/O behind bridge: 00009000-00009fff

00:07.0 ISA bridge: VIA Technologies, Inc. VT82C596 ISA [Mobile South] 
(rev 12)
         Subsystem: VIA Technologies, Inc. VT82C596/A/B PCI to ISA Bridge
         Flags: bus master, stepping, medium devsel, latency 0

00:07.1 IDE interface: VIA Technologies, Inc. 
VT82C586A/B/VT82C686/A/B/VT8233/A/C/VT8235 PIPC Bus Master IDE (rev 06) 
(prog-if 8a [Master SecP PriP])
         Flags: bus master, medium devsel, latency 64
         I/O ports at a000 [size=16]

00:07.2 USB Controller: VIA Technologies, Inc. USB (rev 08) (prog-if 00 
[UHCI])
         Subsystem: VIA Technologies, Inc. (Wrong ID) USB Controller
         Flags: bus master, medium devsel, latency 64, IRQ 11
         I/O ports at a400 [size=32]

00:07.3 Host bridge: VIA Technologies, Inc. VT82C596 Power Management 
(rev 20)
         Flags: medium devsel

00:09.0 Unknown mass storage controller: Promise Technology, Inc. 20268 
(rev 02) (prog-if 85)
         Subsystem: Promise Technology, Inc. Ultra100TX2
         Flags: bus master, 66Mhz, slow devsel, latency 64, IRQ 12
         I/O ports at a800 [size=8]
         I/O ports at ac00 [size=4]
         I/O ports at b000 [size=8]
         I/O ports at b400 [size=4]
         I/O ports at b800 [size=16]
         Memory at eb100000 (32-bit, non-prefetchable) [size=16K]
         Expansion ROM at e8000000 [disabled] [size=16K]
         Capabilities: [60] Power Management version 1

lion:~# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 5
model           : 9
model name      : AMD-K6(tm) 3D+ Processor
stepping        : 1
cpu MHz         : 451.036
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow 
k6_mtrr
bogomips        : 897.84

hde: WDC WD800BB-32BSA0, ATA DISK drive
hdf: WDC WD800BB-32BSA0, ATA DISK drive
blk: queue c030c408, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c030c544, I/O limit 4095Mb (mask 0xffffffff)
hdg: WDC WD800BB-32BSA0, ATA DISK drive
hdh: WDC WD800BB-32CCB0, ATA DISK drive
blk: queue c030c85c, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c030c998, I/O limit 4095Mb (mask 0xffffffff)

Well, that is all the info I can think of.
As you can see the system is a:
AMD K6-3 450 MHz with VIA Apollo MVP3 chipset.
Promise Ultra100 TX2 controller
4 x Western Digital 80 GB ATA100

I thought the controller was dying so I bought a new one but with the 
same result. Can it be that hdg is dying?

Please, CC me as I'm not subscribed to this list.

Regards,
Johan Groth


* Re: Dma problems with Promise IDE controller
  2004-10-18 19:43 Dma problems with Promise IDE controller Johan Groth
@ 2004-10-18 20:22 ` Bartlomiej Zolnierkiewicz
  2004-10-18 21:20   ` Ross Biro
  0 siblings, 1 reply; 8+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2004-10-18 20:22 UTC
  To: Johan Groth; +Cc: linux-kernel

On Mon, 18 Oct 2004 20:43:23 +0100, Johan Groth <jgroth@dsl.pipex.com> wrote:
> Hi,
> I'm using a Promise controller with four IDE hard disks, set up as a
> software RAID0 array. Lately I've been getting interrupt problems that
> look like this:
> 
> Oct 18 18:03:06 lion kernel: hdg: dma_timer_expiry: dma status == 0x61
> Oct 18 18:03:16 lion kernel: hdg: dma timeout retry: status=0x51 {
> DriveReady SeekComplete Error }
> Oct 18 18:03:16 lion kernel: hdg: dma timeout retry: error=0x40 {
> UncorrectableError }, LBAsect=53500655, sector=53500520
> Oct 18 18:03:16 lion kernel: end_request: I/O error, dev 22:01 (hdg),
> sector 53500520
> Oct 18 18:03:16 lion kernel: blk: queue c030c85c, I/O limit 4095Mb (mask
> 0xffffffff)
> Oct 18 18:03:21 lion kernel: hdg: read_intr: status=0x59 { DriveReady
> SeekComplete DataRequest Error }
> Oct 18 18:03:21 lion kernel: hdg: read_intr: error=0x40 {
> UncorrectableError }, LBAsect=53500655, sector=53500592
> Oct 18 18:03:21 lion kernel: end_request: I/O error, dev 22:01 (hdg),
> sector 53500592

...

> I thought the controller was dying so I bought a new one but with the
> same result. Can it be that hdg is dying?

Yes, you can use smartmontools (http://smartmontools.sf.net) to check the drive.
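
As a rough sketch of such a check, assuming smartmontools is installed
and the suspect disk is /dev/hdg as in the log above: the helper name
smart_check_cmds is hypothetical, and it only prints the smartctl
commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Print (do not run) a typical smartctl sequence for one disk.
# smart_check_cmds is a hypothetical helper; /dev/hdg is taken from the log.
smart_check_cmds() {
    dev="$1"
    echo "smartctl -i $dev"           # identify the drive, confirm SMART support
    echo "smartctl -H $dev"           # overall health self-assessment
    echo "smartctl -A $dev"           # vendor attributes (e.g. reallocated sectors)
    echo "smartctl -l error $dev"     # the drive's own error log
    echo "smartctl -t long $dev"      # start an extended self-test
    echo "smartctl -l selftest $dev"  # read the self-test results afterwards
}

smart_check_cmds /dev/hdg
```

The extended self-test runs inside the drive, so its result has to be
read back later with the last command.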


* Re: Dma problems with Promise IDE controller
  2004-10-18 20:22 ` Bartlomiej Zolnierkiewicz
@ 2004-10-18 21:20   ` Ross Biro
  2004-10-19 16:17     ` Johan Groth
  0 siblings, 1 reply; 8+ messages in thread
From: Ross Biro @ 2004-10-18 21:20 UTC
  To: Bartlomiej Zolnierkiewicz; +Cc: Johan Groth, linux-kernel

On Mon, 18 Oct 2004 22:22:38 +0200, Bartlomiej Zolnierkiewicz
<bzolnier@gmail.com> wrote:
> On Mon, 18 Oct 2004 20:43:23 +0100, Johan Groth <jgroth@dsl.pipex.com> wrote:
> > Oct 18 18:03:16 lion kernel: hdg: dma timeout retry: error=0x40 {
> > UncorrectableError }, LBAsect=53500655, sector=53500520

The Uncorrectable Error is a dead giveaway.  You have a bad sector on
your drive.
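
To see how the hex values in the log map to the flag names, here is a
minimal sketch that decodes the ATA status and error registers. The
function names are hypothetical, and the flag names are modeled on the
driver output quoted above; the exact set and order the kernel prints
may differ.

```shell
#!/bin/sh
# Decode ATA status/error register values into flag names.
# All function names here are hypothetical illustrations.
decode_bits() {
    # $1 = register value, remaining args = "mask:name" pairs
    val=$(( $1 )); shift
    out=""
    for pair in "$@"; do
        mask=${pair%%:*}; name=${pair#*:}
        if [ $(( $val & $mask )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}

decode_status() {
    if [ $(( $1 & 0x80 )) -ne 0 ]; then
        echo "Busy"   # the other bits are not valid while the drive is busy
        return
    fi
    decode_bits "$1" 0x40:DriveReady 0x20:DeviceFault 0x10:SeekComplete \
        0x08:DataRequest 0x04:CorrectedError 0x02:Index 0x01:Error
}

decode_error() {
    decode_bits "$1" 0x80:BadSector 0x40:UncorrectableError 0x20:MediaChanged \
        0x10:SectorIdNotFound 0x08:MediaChangeRequested 0x04:DriveStatusError \
        0x02:TrackZeroNotFound 0x01:AddrMarkNotFound
}

decode_status 0x51   # prints: DriveReady SeekComplete Error
decode_error  0x40   # prints: UncorrectableError
```

Bit 0x40 of the error register is the one that matters here: the drive
itself reports that it could not recover the data in that sector.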


* Re: Dma problems with Promise IDE controller
  2004-10-18 21:20   ` Ross Biro
@ 2004-10-19 16:17     ` Johan Groth
  2004-10-19 17:13       ` Ross Biro
  0 siblings, 1 reply; 8+ messages in thread
From: Johan Groth @ 2004-10-19 16:17 UTC
  To: Ross Biro; +Cc: Bartlomiej Zolnierkiewicz, linux-kernel

Ross Biro wrote:
> On Mon, 18 Oct 2004 22:22:38 +0200, Bartlomiej Zolnierkiewicz
> <bzolnier@gmail.com> wrote:
> 
>>On Mon, 18 Oct 2004 20:43:23 +0100, Johan Groth <jgroth@dsl.pipex.com> wrote:
>>
>>>Oct 18 18:03:16 lion kernel: hdg: dma timeout retry: error=0x40 {
>>>UncorrectableError }, LBAsect=53500655, sector=53500520
> 
> 
> The Uncorrectable Error is a dead giveaway.  You have a bad sector on
> your drive.
> 
How am I supposed to fix those blocks? I've tried with e2fsck -c -c -y 
/dev/md0  but that yields the following printout in the log.

Oct 19 18:12:13 lion kernel: hdg: dma_timer_expiry: dma status == 0x61
Oct 19 18:12:23 lion kernel: hdg: dma timeout retry: status=0x51 { 
DriveReady SeekComplete Error }
Oct 19 18:12:23 lion kernel: hdg: dma timeout retry: error=0x40 { 
UncorrectableError }, LBAsect=156145, sector=156064
Oct 19 18:12:23 lion kernel: end_request: I/O error, dev 22:01 (hdg), 
sector 156064
Oct 19 18:12:24 lion kernel: blk: queue c8828afc, I/O limit 4095Mb (mask 
0xffffffff)
Oct 19 18:12:29 lion kernel: hdg: read_intr: status=0x59 { DriveReady 
SeekComplete DataRequest Error }
Oct 19 18:12:29 lion kernel: hdg: read_intr: error=0x40 { 
UncorrectableError }, LBAsect=156145, sector=156082
Oct 19 18:12:29 lion kernel: end_request: I/O error, dev 22:01 (hdg), 
sector 156082
Oct 19 18:12:49 lion kernel: hdg: dma_timer_expiry: dma status == 0x61
Oct 19 18:12:59 lion kernel: hdg: dma timeout retry: status=0x51 { 
DriveReady SeekComplete Error }
Oct 19 18:12:59 lion kernel: hdg: dma timeout retry: error=0x40 { 
UncorrectableError }, LBAsect=156145, sector=156072
Oct 19 18:12:59 lion kernel: end_request: I/O error, dev 22:01 (hdg), 
sector 156072

This goes on for a while and after that the following appears in the log.
Oct 19 18:14:29 lion kernel:
Oct 19 18:14:29 lion kernel: hdg: status timeout: status=0xd0 { Busy }
Oct 19 18:14:29 lion kernel:
Oct 19 18:14:29 lion kernel: hdh: DMA disabled
Oct 19 18:14:29 lion kernel: PDC202XX: Secondary channel reset.
Oct 19 18:14:29 lion kernel: ide3: reset: success
Oct 19 18:14:34 lion kernel: hdh: status error: status=0x58 { DriveReady 
SeekComplete DataRequest }
Oct 19 18:14:34 lion kernel:
Oct 19 18:14:34 lion kernel: hdh: status timeout: status=0xd0 { Busy }
Oct 19 18:14:34 lion kernel:
Oct 19 18:14:34 lion kernel: PDC202XX: Secondary channel reset.
Oct 19 18:14:34 lion kernel: ide3: reset: success
Oct 19 18:14:40 lion kernel: hdh: status error: status=0x58 { DriveReady 
SeekComplete DataRequest }
Oct 19 18:14:40 lion kernel:
Oct 19 18:14:40 lion kernel: hdh: status timeout: status=0xd0 { Busy }

And that goes on as long as e2fsck runs. So it takes forever just to 
check a couple of blocks and as it has to check > 78E6 blocks it will 
take weeks.

Is there something wrong with the drivers or the controller or the hd:s?

Please CC me as I'm not on the list.

/Johan


* Re: Dma problems with Promise IDE controller
  2004-10-19 16:17     ` Johan Groth
@ 2004-10-19 17:13       ` Ross Biro
  2004-10-19 17:23         ` Johan Groth
  0 siblings, 1 reply; 8+ messages in thread
From: Ross Biro @ 2004-10-19 17:13 UTC
  To: Johan Groth; +Cc: Bartlomiej Zolnierkiewicz, linux-kernel

On Tue, 19 Oct 2004 17:17:33 +0100, Johan Groth <jgroth@dsl.pipex.com> wrote:
> Ross Biro wrote:
> 
> 
> > On Mon, 18 Oct 2004 22:22:38 +0200, Bartlomiej Zolnierkiewicz
> > <bzolnier@gmail.com> wrote:
> >
> >>On Mon, 18 Oct 2004 20:43:23 +0100, Johan Groth <jgroth@dsl.pipex.com> wrote:
> >>
> >>>Oct 18 18:03:16 lion kernel: hdg: dma timeout retry: error=0x40 {
> >>>UncorrectableError }, LBAsect=53500655, sector=53500520
> >
> >
> > The Uncorrectable Error is a dead giveaway.  You have a bad sector on
> > your drive.
> > 
> How am I supposed to fix those blocks? I've tried with e2fsck -c -c -y
> /dev/md0  but that yields the following printout in the log.
> 
The drive still has a bad sector.  You are having trouble because the
error recovery in the Linux IDE code is not the same as in Windows, and
most drive vendors care about Windows, not the ATA spec.  On top of
that, Linux switches out of DMA mode once it hits a bad sector, so the
drive will be very slow from then on.

The only way you are going to fix the problem is if your drive has
some spare sectors still available and you do a write, without a read,
to the bad sector.

Ross


* Re: Dma problems with Promise IDE controller
  2004-10-19 17:13       ` Ross Biro
@ 2004-10-19 17:23         ` Johan Groth
  2004-10-19 17:43           ` Richard B. Johnson
  2004-10-19 17:44           ` Ross Biro
  0 siblings, 2 replies; 8+ messages in thread
From: Johan Groth @ 2004-10-19 17:23 UTC
  To: Ross Biro; +Cc: Bartlomiej Zolnierkiewicz, linux-kernel

Ross Biro wrote:
[snip]

> 
> The drive still has a bad sector.  You are having trouble because the
> error recovery in the Linux IDE code is not the same as in Windows, and
> most drive vendors care about Windows, not the ATA spec.  On top of
> that, Linux switches out of DMA mode once it hits a bad sector, so the
> drive will be very slow from then on.
> 
> The only way you are going to fix the problem is if your drive has
> some spare sectors still available and you do a write, without a read,
> to the bad sector.

OK, I'm pretty sure it has spare sectors. How do I write to that sector
without a read, and how do I find which sector is bad?

Sorry for all these questions, but this is the first time I've ever had
this kind of problem. SCSI disks fix bad blocks by themselves, so you
don't have to do anything.

Regards,
Johan


* Re: Dma problems with Promise IDE controller
  2004-10-19 17:23         ` Johan Groth
@ 2004-10-19 17:43           ` Richard B. Johnson
  2004-10-19 17:44           ` Ross Biro
  1 sibling, 0 replies; 8+ messages in thread
From: Richard B. Johnson @ 2004-10-19 17:43 UTC
  To: Johan Groth; +Cc: Ross Biro, Bartlomiej Zolnierkiewicz, linux-kernel

On Tue, 19 Oct 2004, Johan Groth wrote:

> Ross Biro wrote:
> [snip]
>
>> 
>> The drive still has a bad sector.  You are having trouble because the
>> error recovery in the Linux IDE code is not the same as in Windows, and
>> most drive vendors care about Windows, not the ATA spec.  On top of
>> that, Linux switches out of DMA mode once it hits a bad sector, so the
>> drive will be very slow from then on.
>> 
>> The only way you are going to fix the problem is if your drive has
>> some spare sectors still available and you do a write, without a read,
>> to the bad sector.
>
> OK, I'm pretty sure it has spare sectors. How do I write to that sector
> without a read, and how do I find which sector is bad?
>
> Sorry for all these questions, but this is the first time I've ever had
> this kind of problem. SCSI disks fix bad blocks by themselves, so you
> don't have to do anything.
>
> Regards,
> Johan

man `badblocks`
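
A minimal sketch of a read-only scan with badblocks, assuming the
suspect disk is /dev/hdg. The helper name badblocks_scan_cmd and the
output path are hypothetical; the helper only prints the command so it
can be checked before it is run as root.

```shell
#!/bin/sh
# Print (do not run) a read-only badblocks scan for one disk.
# badblocks_scan_cmd and the list file path are hypothetical.
badblocks_scan_cmd() {
    dev="$1"; list="$2"
    # -s shows progress, -v is verbose, -b 512 counts in 512-byte sectors,
    # -o writes the unreadable block numbers to a file.
    # Neither -n nor -w is given, so the scan only reads from the disk.
    echo "badblocks -s -v -b 512 -o $list $dev"
}

badblocks_scan_cmd /dev/hdg /root/hdg.bad
```

With timeouts like those in the log, every unreadable sector will still
take a long time to report, so the scan can be slow.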

Also, if you have a BIOS screen when the machine is booting that
has tools for SCSI (Adaptec has this), then you can use the
SCSI disk utility to replace any bad blocks. Generally, it
reads everything and relocates anything it can't read. You
may end up with corrupt files, but the disk ends up clean.


Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 GrumpyMips).
                  98.36% of all statistics are fiction.


* Re: Dma problems with Promise IDE controller
  2004-10-19 17:23         ` Johan Groth
  2004-10-19 17:43           ` Richard B. Johnson
@ 2004-10-19 17:44           ` Ross Biro
  1 sibling, 0 replies; 8+ messages in thread
From: Ross Biro @ 2004-10-19 17:44 UTC
  To: Johan Groth; +Cc: Bartlomiej Zolnierkiewicz, linux-kernel

On Tue, 19 Oct 2004 18:23:07 +0100, Johan Groth <jgroth@dsl.pipex.com> wrote:
> Ross Biro wrote:
> [snip]
> 
> >
> > The drive still has a bad sector.  You are having trouble because the
> > error recovery in the Linux IDE code is not the same as in Windows, and
> > most drive vendors care about Windows, not the ATA spec.  On top of
> > that, Linux switches out of DMA mode once it hits a bad sector, so the
> > drive will be very slow from then on.
> >
> > The only way you are going to fix the problem is if your drive has
> > some spare sectors still available and you do a write, without a read,
> > to the bad sector.
> 
> OK, I'm pretty sure it has spare sectors. How do I write to that sector
> without a read, and how do I find which sector is bad?

That part is easy.  It's in your error message. 156064 is the bad
sector.  I would use dd if=/dev/zero of=/dev/hd???? bs=512 seek=?????
count=1 to write the sector, but before I did that, I would be very
sure of my sector number.  The best way I can think of to do that is
to turn off read-ahead for that device and attempt to read one sector
at a time until you find the bad one.  Then reboot, double check,
reboot again, and finally write that sector out.  Then you'll need to
do an fsck to fix the file system.  You will have lost some data, but
it may not be clear which file(s) have been damaged.
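
The verify-then-write procedure can be sketched as below. Everything
here only prints commands; nothing touches a disk. The function names
are hypothetical. The sketch also assumes that LBAsect= is the absolute
address reported by the drive and that sector= is the request position
inside the partition (dev 22:01, i.e. hdg1). That reading is an
assumption: it fits the constant difference of 63 in the read_intr
lines, but the thread does not confirm it. If it holds, the failing
sector may be LBA 156145 rather than the request start 156064, which is
one more reason to confirm with single-sector reads before writing.

```shell
#!/bin/sh
# Pull the two sector numbers out of a kernel log line (hypothetical helpers).
lba_from_log() {
    printf '%s\n' "$1" | sed -n 's/.*LBAsect=\([0-9][0-9]*\).*/\1/p'
}
sector_from_log() {
    printf '%s\n' "$1" | sed -n 's/.*, sector=\([0-9][0-9]*\).*/\1/p'
}

# Read exactly one 512-byte sector and discard it (non-destructive).
probe_cmd() {
    echo "dd if=$1 of=/dev/null bs=512 skip=$2 count=1"
}

# Overwrite exactly one 512-byte sector with zeros (destroys its contents).
rewrite_cmd() {
    echo "dd if=/dev/zero of=$1 bs=512 seek=$2 count=1"
}

line='hdg: read_intr: error=0x40 { UncorrectableError }, LBAsect=156145, sector=156082'
lba=$(lba_from_log "$line")
sec=$(sector_from_log "$line")
echo "offset between LBAsect and sector: $(( lba - sec ))"   # prints 63

echo "hdparm -a 0 /dev/hdg"     # turn off filesystem read-ahead first
probe_cmd /dev/hdg "$lba"       # should fail with an I/O error if this is the bad one
rewrite_cmd /dev/hdg "$lba"     # only after the probe has confirmed the number
```

Note that skip= applies to the input device and seek= to the output
device, so the probe and the write address the same sector.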

If you are very confident in your backups, you could just dd
if=/dev/zero of=/dev/hd???? bs=something big and wipe the whole drive.
That will remap all of the bad sectors; then just mke2fs the device
and start over.

Be careful doing any of the above; if you do it wrong, you lose data.
Even if you do it right, you lose some data, just not as much.

    Ross

