* need help with ata error
@ 2007-02-09  9:39 Eyal Lebedinsky
From: Eyal Lebedinsky @ 2007-02-09  9:39 UTC (permalink / raw)
  To: list linux-ide

I recently added a 6th disk to a RAID5. All disks are WD 320GB SATA, of different
Caviar models (SE, RE) and this new one is RE16.

It worked well for about 5 days (completed a 20 hour grow OK). I now see the following
messages logged (see at end). Can someone explain what it means? The raid5 is still
up and it did not react to this. Being a mythtv repository it gets used regularly.

Is this a disk issue? A controller issue (the new disk is now the fourth on a
Promise SATA-II-150-TX4)? A kernel problem (2.6.20 vanilla)?

ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata6.00: cmd 25/00:b8:3f:c4:b6/00:00:20:00:00/e0 tag 0 cdb 0x0 data 94208 in
         res 50/00:00:f6:c4:b6/00:00:00:00:00/e0 Emask 0x1 (device error)
ata6.00: configured for UDMA/133
ata6: EH complete
SCSI device sdf: 625142448 512-byte hdwr sectors (320073 MB)
sdf: Write Protect is off
sdf: Mode Sense: 00 3a 00 00
SCSI device sdf: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

The disk was detected at bootup as:

sata_promise 0000:03:01.0: version 1.05
ACPI: PCI Interrupt 0000:03:01.0[A] -> GSI 21 (level, low) -> IRQ 19
ata3: SATA max UDMA/133 cmd 0xF8CB2200 ctl 0xF8CB2238 bmdma 0x0 irq 19
ata4: SATA max UDMA/133 cmd 0xF8CB2280 ctl 0xF8CB22B8 bmdma 0x0 irq 19
ata5: SATA max UDMA/133 cmd 0xF8CB2300 ctl 0xF8CB2338 bmdma 0x0 irq 19
ata6: SATA max UDMA/133 cmd 0xF8CB2380 ctl 0xF8CB23B8 bmdma 0x0 irq 19
.....
scsi5 : sata_promise
.....
ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata6.00: ATA-7, max UDMA/133, 625142448 sectors: LBA48 NCQ (depth 0/1)
ata6.00: ata6: dev 0 multi count 0
ata6.00: configured for UDMA/133
.....
scsi 5:0:0:0: Direct-Access     ATA      WDC WD3200YS-01P 21.0 PQ: 0 ANSI: 5
SCSI device sdf: 625142448 512-byte hdwr sectors (320073 MB)
sdf: Write Protect is off
sdf: Mode Sense: 00 3a 00 00
SCSI device sdf: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
SCSI device sdf: 625142448 512-byte hdwr sectors (320073 MB)
sdf: Write Protect is off
sdf: Mode Sense: 00 3a 00 00
SCSI device sdf: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdf: sdf1
sd 5:0:0:0: Attached scsi disk sdf

TIA

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.org/eyal/>
	attach .zip as .dat

* Re: need help with ata error
@ 2007-02-09 10:37 ` Tejun Heo
From: Tejun Heo @ 2007-02-09 10:37 UTC (permalink / raw)
  To: Eyal Lebedinsky; +Cc: list linux-ide, mikpe

[cc'ing Mikael Pettersson, hi!]

Eyal Lebedinsky wrote:
> I recently added a 6th disk to a RAID5. All disks are WD 320GB SATA, of different
> Caviar models (SE, RE) and this new one is RE16.
> 
> It worked well for about 5 days (completed a 20 hour grow OK). I now see the following
> messages logged (see at end). Can someone explain what it means? The raid5 is still
> up and it did not react to this. Being a mythtv repository it gets used regularly.
> 
> Is this a disk issue? A controller issue (the new disk is now the fourth on a
> Promise SATA-II-150-TX4)? A kernel problem (2.6.20 vanilla)?
> 
> ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> ata6.00: cmd 25/00:b8:3f:c4:b6/00:00:20:00:00/e0 tag 0 cdb 0x0 data 94208 in
>          res 50/00:00:f6:c4:b6/00:00:00:00:00/e0 Emask 0x1 (device error)

Device error w/o ATA_ERR set?  Mikael, this seems to be coming from the
PDC_ERR_MASK test in pdc_host_intr().  AC_ERR_DEV means 'the attached
ATA/ATAPI device indicated error condition', so it isn't really
appropriate there, nor is pdc_reset_port() in the IRQ handler.  I guess
this is from the old EH days.
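
For anyone reading along, the cmd/res dump above can be unpacked by hand. Below is a minimal Python sketch that does so, assuming the field order is command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device for the cmd line, with status/error taking the first two slots on the res line. The helper names (parse_taskfile, decode_status) are made up for this illustration and are not libata functions.

```python
import re

# ATA status register bit names (standard ATA definitions).
STATUS_BITS = [
    (0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR"),
]

# Twelve two-digit hex fields separated by '/' or ':'.
TF_RE = re.compile(r"\b(cmd|res) ([0-9a-f]{2}(?:[/:][0-9a-f]{2}){11})\b")


def parse_taskfile(line):
    """Parse one 'cmd ...' or 'res ...' log line (assumed field order)."""
    m = TF_RE.search(line)
    if m is None:
        raise ValueError("no taskfile dump found in line")
    kind, body = m.groups()
    (first, second, nsect, lbal, lbam, lbah,
     _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah,
     device) = (int(x, 16) for x in re.split(r"[/:]", body))
    # The first two fields mean different things on the two lines.
    names = ("command", "feature") if kind == "cmd" else ("status", "error")
    return {
        "kind": kind,
        names[0]: first,
        names[1]: second,
        # 48-bit LBA and 16-bit sector count, high-order bytes from the HOB fields.
        "lba": (lbal | lbam << 8 | lbah << 16
                | hob_lbal << 24 | hob_lbam << 32 | hob_lbah << 40),
        "count": nsect | hob_nsect << 8,
        "device": device,
    }


def decode_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for bit, name in STATUS_BITS if status & bit]


if __name__ == "__main__":
    cmd = parse_taskfile("ata6.00: cmd 25/00:b8:3f:c4:b6/00:00:20:00:00/e0 tag 0")
    res = parse_taskfile("res 50/00:00:f6:c4:b6/00:00:00:00:00/e0 Emask 0x1")
    print(hex(cmd["command"]), hex(cmd["lba"]), cmd["count"] * 512)
    print(decode_status(res["status"]))
```

On the log above, the command byte 0x25 is READ DMA EXT, the sector count 0xb8 times 512 gives the logged 94208 bytes, and the result status 0x50 decodes to DRDY and DSC with the ERR bit clear, which is the inconsistency noted above.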

Unknown errors can use AC_ERR_OTHER, which will be automatically cleared
if error diagnosis results in any real error mask.  I think what should
be done here is to record the IRQ mask using ata_ehi_push_desc() and set
a specific AC_ERR_* according to the IRQ mask, as ahci and sata_sil24 do.
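
To make the AC_ERR_OTHER rule concrete, here is a small Python model of it. This is only a sketch: the bit values are the AC_ERR_* values as I recall them from include/linux/libata.h (check them against your tree), and emask_names() and settle_err_mask() are hypothetical helpers, not kernel functions. The two Emask values in this thread, 0x1 (device error) and 0x10 (ATA bus error), are consistent with the table.

```python
# AC_ERR_* bits, assumed to match include/linux/libata.h; verify before relying on them.
AC_ERR = {
    "AC_ERR_DEV": 1 << 0,       # device reported error
    "AC_ERR_HSM": 1 << 1,       # host state machine violation
    "AC_ERR_TIMEOUT": 1 << 2,   # timeout
    "AC_ERR_MEDIA": 1 << 3,     # media error
    "AC_ERR_ATA_BUS": 1 << 4,   # ATA bus error
    "AC_ERR_HOST_BUS": 1 << 5,  # host bus error
    "AC_ERR_SYSTEM": 1 << 6,    # system error
    "AC_ERR_INVALID": 1 << 7,   # invalid argument
    "AC_ERR_OTHER": 1 << 8,     # unknown
}


def emask_names(emask):
    """Return the AC_ERR_* names whose bits are set in an Emask value."""
    return [name for name, bit in AC_ERR.items() if emask & bit]


def settle_err_mask(err_mask):
    """Drop AC_ERR_OTHER once any more specific error bit is present."""
    other = AC_ERR["AC_ERR_OTHER"]
    if err_mask & ~other:
        err_mask &= ~other
    return err_mask


if __name__ == "__main__":
    print(emask_names(0x1))
    print(emask_names(0x10))
    both = AC_ERR["AC_ERR_OTHER"] | AC_ERR["AC_ERR_ATA_BUS"]
    print(emask_names(settle_err_mask(both)))
```

The point of the model is that a driver can flag an unexplained interrupt condition as AC_ERR_OTHER without it masking a more specific diagnosis made later.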

Eyal, if the error doesn't repeat, you can ignore it.  It probably is a 
transient transmission problem, power fluctuation or whatever.

Thanks.

-- 
tejun

* Re: need help with ata error
@ 2007-02-16 22:43   ` Eyal Lebedinsky
From: Eyal Lebedinsky @ 2007-02-16 22:43 UTC (permalink / raw)
  To: Tejun Heo; +Cc: list linux-ide, mikpe

Today I got a similar error (I think) once during the overnight RAID "check".
This time it was sdc (was sdf in my original report). Both are on the Promise.
The check completed on time with zero mismatches.

Still 2.6.20 vanilla:
	Linux e7 2.6.20 #1 Mon Feb 5 22:08:32 EST 2007 i686 GNU/Linux

Also, the disks normally claim to be set to UDMA/133, but this time it says UDMA/100.

dmesg has a complete report, but /var/log/messages is missing some of the lines:

[927080.617744] md: data-check of RAID array md0
[927080.630783] md: minimum _guaranteed_  speed: 24000 KB/sec/disk.
[927080.648734] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
[927080.678103] md: using 128k window, over a total of 312568576 blocks.

[937567.332751] ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x4190002 action 0x2
[937567.354094] ata3.00: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
[937567.354096]          res 51/04:83:45:00:00/00:00:00:00:00/a0 Emask 0x10 (ATA bus error)
[937568.120783] ata3: soft resetting port
[937568.282450] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[937568.306693] ata3.00: configured for UDMA/100
[937568.319733] ata3: EH complete
[937568.361223] SCSI device sdc: 625142448 512-byte hdwr sectors (320073 MB)
[937568.397207] sdc: Write Protect is off
[937568.408620] sdc: Mode Sense: 00 3a 00 00
[937568.453522] SCSI device sdc: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

[941696.843935] md: md0: data-check done.
[941697.246454] RAID5 conf printout:
[941697.256366]  --- rd:6 wd:6
[941697.264718]  disk 0, o:1, dev:sda1
[941697.275146]  disk 1, o:1, dev:sdb1
[941697.285575]  disk 2, o:1, dev:sdc1
[941697.296003]  disk 3, o:1, dev:sdd1
[941697.306432]  disk 4, o:1, dev:sde1
[941697.316862]  disk 5, o:1, dev:sdf1
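
In case it helps with reading the SErr value in the exception line above, here is a rough Python sketch that splits it into individual bits. The bit layout is my understanding of the SATA SError register (error bits in the low half, diagnostic bits in the high half), and decode_serr() is just a helper written for this note, not a kernel interface.

```python
# SError bit positions, assumed from the Serial ATA SError register layout.
SERR_BITS = {
    0: "recovered data integrity error (I)",
    1: "recovered communications error (M)",
    8: "transient data integrity error (T)",
    9: "persistent communication or data integrity error (C)",
    10: "protocol error (P)",
    11: "internal error (E)",
    16: "PhyRdy change (N)",
    17: "PHY internal error (I)",
    18: "Comm Wake (W)",
    19: "10b to 8b decode error (B)",
    20: "disparity error (D)",
    21: "CRC error (C)",
    22: "handshake error (H)",
    23: "link sequence error (S)",
    24: "transport state transition error (T)",
    25: "unrecognized FIS type (F)",
    26: "exchanged (X)",
}


def decode_serr(serr):
    """Return (bit, description) pairs for every bit set in a 32-bit SErr value."""
    return [
        (bit, SERR_BITS.get(bit, "reserved"))
        for bit in range(32)
        if serr & (1 << bit)
    ]


if __name__ == "__main__":
    for bit, description in decode_serr(0x4190002):
        print(f"bit {bit:2d}: {description}")
```

With this layout, 0x4190002 has bits 1, 16, 19, 20 and 26 set, which all describe link-level events rather than anything reported by the drive's media.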

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.org/eyal/>
	attach .zip as .dat
