All of lore.kernel.org
 help / color / mirror / Atom feed
* sata exception frozen timeout?
@ 2009-09-01  9:16 Thomas Fjellstrom
  2009-09-01 10:28 ` Thomas Fjellstrom
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Thomas Fjellstrom @ 2009-09-01  9:16 UTC (permalink / raw)
  To: linux-kernel

I've recently started seeing the following messages from a new WD Green drive, 
not long after I had to RMA a Seagate drive for a strange (but different) 
error. Highlights from my dmesg include:

[59634.792101] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
frozen
[59634.792121] ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[59634.792125]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 
(timeout)
[59634.792133] ata6.00: status: { DRDY }
[59634.792145] ata6: hard resetting link
[59635.277695] ata6: softreset failed (device not ready)
[59635.277705] ata6: failed due to HW bug, retry pmp=0
[59635.441570] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[59635.454030] ata6.00: configured for UDMA/133
[59635.454068] ata6: EH complete
...
[59713.792224] ata6.00: NCQ disabled due to excessive errors
[59713.792236] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
frozen
[59713.792254] ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[59713.792257]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 
(timeout)
[59713.792264] ata6.00: status: { DRDY }
...
[63484.781764] ata6: hard resetting link
[63485.264204] ata6: softreset failed (device not ready)
[63485.264214] ata6: failed due to HW bug, retry pmp=0
[63485.428205] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[63485.440499] ata6.00: configured for UDMA/100
[63485.440536] ata6: EH complete
[63493.780182] ata6.00: limiting speed to UDMA/33:PIO4
[63493.780194] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
frozen
[63493.780211] ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[63493.780214]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 
(timeout)
[63493.780221] ata6.00: status: { DRDY }
...
[99318.070669] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[99318.070679] ata6.00: irq_stat 0x40000001
[99318.070698] ata6.00: cmd 25/00:00:30:bd:01/00:01:1b:00:00/e0 tag 0 dma 
131072 in
[99318.070701]          res 41/04:ef:30:bd:01/00:00:1b:00:00/e0 Emask 0x1 
(device error)
[99318.070709] ata6.00: status: { DRDY ERR }
[99318.070714] ata6.00: error: { ABRT }
[99318.082588] ata6.00: configured for UDMA/33
[99318.082612] ata6: EH complete
...
[99339.780358] ata6: hard resetting link
[99349.785096] ata6: softreset failed (device not ready)
[99349.785108] ata6: hard resetting link
[99359.789033] ata6: softreset failed (device not ready)
[99359.789045] ata6: hard resetting link
[99370.352916] ata6: link is slow to respond, please be patient (ready=0)
..
[99400.016925] ata6: softreset failed (device not ready)
[99400.016936] ata6: reset failed, giving up
[99400.016942] ata6.00: disabled
[99400.016986] ata6: EH complete
[99400.017021] sd 5:0:0:0: [sdf] Unhandled error code
[99400.017026] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET 
driverbyte=DRIVER_OK
[99400.017021] sd 5:0:0:0: [sdf] Unhandled error code
[99400.017026] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET 
driverbyte=DRIVER_OK
[99400.017035] end_request: I/O error, dev sdf, sector 453099568
[99400.017045] Buffer I/O error on device md1, logical block 113274390
[99400.017058] Buffer I/O error on device md1, logical block 113274391
[99400.017064] Buffer I/O error on device md1, logical block 113274392
[99400.017070] Buffer I/O error on device md1, logical block 113274393
[99400.017076] Buffer I/O error on device md1, logical block 113274394
[99400.017082] Buffer I/O error on device md1, logical block 113274395
[99400.017088] Buffer I/O error on device md1, logical block 113274396
[99400.017094] Buffer I/O error on device md1, logical block 113274397
[99400.017100] Buffer I/O error on device md1, logical block 113274398
[99400.017106] Buffer I/O error on device md1, logical block 113274399
[99400.017115] sd 5:0:0:0: [sdf] Unhandled error code
[99400.017123] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET 
driverbyte=DRIVER_OK
[99400.017133] end_request: I/O error, dev sdf, sector 453099824
[99400.017310] sd 5:0:0:0: [sdf] Unhandled error code
[99400.017315] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET 
driverbyte=DRIVER_OK
[99400.017322] end_request: I/O error, dev sdf, sector 453099568
[99400.017383] program smartctl is using a deprecated SCSI ioctl, please 
convert it to SG_IO
[99400.017456] program smartctl is using a deprecated SCSI ioctl, please 
convert it to SG_IO
[99400.018822] sd 5:0:0:0: [sdf] Unhandled error code
[99400.018829] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET 
driverbyte=DRIVER_OK
[99400.018838] end_request: I/O error, dev sdf, sector 453099568

and a bunch more end_request errors. This happens fairly reliably after a day 
or so of being online. I try and start to play some music or watch a video off 
the array (2xWD Green on md raid0) and it fails. The file system is mostly 
read only right now (that is I don't write to it) so I haven't manged to loose 
any data that I know of, and the fs seems to be fine. A reboot fixes the 
errors and the disk and fs work fine for a day or two before these errors show 
up again.

I'm currently running 2.6.31-rc5 from kernel.org. What would cause these 
errors?

-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sata exception frozen timeout?
  2009-09-01  9:16 sata exception frozen timeout? Thomas Fjellstrom
@ 2009-09-01 10:28 ` Thomas Fjellstrom
  2009-09-01 20:34   ` Thomas Fjellstrom
  2009-09-02  4:32 ` Robert Hancock
  2009-09-03 19:38 ` BIOS update == more errors (was Re: sata exception frozen timeout?) Thomas Fjellstrom
  2 siblings, 1 reply; 9+ messages in thread
From: Thomas Fjellstrom @ 2009-09-01 10:28 UTC (permalink / raw)
  To: linux-kernel

On Tue September 1 2009, Thomas Fjellstrom wrote:
> I've recently started seeing the following messages from a new WD Green
> drive, not long after I had to RMA a Seagate drive for a strange (but
> different) error. Highlights from my dmesg include:
>
> [59634.792101] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [59634.792121] ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
> [59634.792125]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4
> (timeout)
> [59634.792133] ata6.00: status: { DRDY }
> [59634.792145] ata6: hard resetting link
> [59635.277695] ata6: softreset failed (device not ready)
> [59635.277705] ata6: failed due to HW bug, retry pmp=0
> [59635.441570] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [59635.454030] ata6.00: configured for UDMA/133
> [59635.454068] ata6: EH complete
> ...
> [59713.792224] ata6.00: NCQ disabled due to excessive errors
> [59713.792236] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [59713.792254] ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
> [59713.792257]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4
> (timeout)
> [59713.792264] ata6.00: status: { DRDY }
> ...
> [63484.781764] ata6: hard resetting link
> [63485.264204] ata6: softreset failed (device not ready)
> [63485.264214] ata6: failed due to HW bug, retry pmp=0
> [63485.428205] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [63485.440499] ata6.00: configured for UDMA/100
> [63485.440536] ata6: EH complete
> [63493.780182] ata6.00: limiting speed to UDMA/33:PIO4
> [63493.780194] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [63493.780211] ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
> [63493.780214]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4
> (timeout)
> [63493.780221] ata6.00: status: { DRDY }
> ...
> [99318.070669] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [99318.070679] ata6.00: irq_stat 0x40000001
> [99318.070698] ata6.00: cmd 25/00:00:30:bd:01/00:01:1b:00:00/e0 tag 0 dma
> 131072 in
> [99318.070701]          res 41/04:ef:30:bd:01/00:00:1b:00:00/e0 Emask 0x1
> (device error)
> [99318.070709] ata6.00: status: { DRDY ERR }
> [99318.070714] ata6.00: error: { ABRT }
> [99318.082588] ata6.00: configured for UDMA/33
> [99318.082612] ata6: EH complete
> ...
> [99339.780358] ata6: hard resetting link
> [99349.785096] ata6: softreset failed (device not ready)
> [99349.785108] ata6: hard resetting link
> [99359.789033] ata6: softreset failed (device not ready)
> [99359.789045] ata6: hard resetting link
> [99370.352916] ata6: link is slow to respond, please be patient (ready=0)
> ..
> [99400.016925] ata6: softreset failed (device not ready)
> [99400.016936] ata6: reset failed, giving up
> [99400.016942] ata6.00: disabled
> [99400.016986] ata6: EH complete
> [99400.017021] sd 5:0:0:0: [sdf] Unhandled error code
> [99400.017026] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> driverbyte=DRIVER_OK
> [99400.017021] sd 5:0:0:0: [sdf] Unhandled error code
> [99400.017026] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> driverbyte=DRIVER_OK
> [99400.017035] end_request: I/O error, dev sdf, sector 453099568
> [99400.017045] Buffer I/O error on device md1, logical block 113274390
> [99400.017058] Buffer I/O error on device md1, logical block 113274391
> [99400.017064] Buffer I/O error on device md1, logical block 113274392
> [99400.017070] Buffer I/O error on device md1, logical block 113274393
> [99400.017076] Buffer I/O error on device md1, logical block 113274394
> [99400.017082] Buffer I/O error on device md1, logical block 113274395
> [99400.017088] Buffer I/O error on device md1, logical block 113274396
> [99400.017094] Buffer I/O error on device md1, logical block 113274397
> [99400.017100] Buffer I/O error on device md1, logical block 113274398
> [99400.017106] Buffer I/O error on device md1, logical block 113274399
> [99400.017115] sd 5:0:0:0: [sdf] Unhandled error code
> [99400.017123] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> driverbyte=DRIVER_OK
> [99400.017133] end_request: I/O error, dev sdf, sector 453099824
> [99400.017310] sd 5:0:0:0: [sdf] Unhandled error code
> [99400.017315] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> driverbyte=DRIVER_OK
> [99400.017322] end_request: I/O error, dev sdf, sector 453099568
> [99400.017383] program smartctl is using a deprecated SCSI ioctl, please
> convert it to SG_IO
> [99400.017456] program smartctl is using a deprecated SCSI ioctl, please
> convert it to SG_IO
> [99400.018822] sd 5:0:0:0: [sdf] Unhandled error code
> [99400.018829] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> driverbyte=DRIVER_OK
> [99400.018838] end_request: I/O error, dev sdf, sector 453099568
>
> and a bunch more end_request errors. This happens fairly reliably after a
> day or so of being online. I try and start to play some music or watch a
> video off the array (2xWD Green on md raid0) and it fails. The file system
> is mostly read only right now (that is I don't write to it) so I haven't
> manged to loose any data that I know of, and the fs seems to be fine. A
> reboot fixes the errors and the disk and fs work fine for a day or two
> before these errors show up again.
>
> I'm currently running 2.6.31-rc5 from kernel.org. What would cause these
> errors?

It's now happened again after only 30 minutes or so. I'm trying with an older 
kernel (debian's 2.6.30-6 from sid) now. So far so good.

-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sata exception frozen timeout?
  2009-09-01 10:28 ` Thomas Fjellstrom
@ 2009-09-01 20:34   ` Thomas Fjellstrom
  0 siblings, 0 replies; 9+ messages in thread
From: Thomas Fjellstrom @ 2009-09-01 20:34 UTC (permalink / raw)
  To: linux-kernel

On Tue September 1 2009, Thomas Fjellstrom wrote:
> On Tue September 1 2009, Thomas Fjellstrom wrote:
> > I've recently started seeing the following messages from a new WD Green
> > drive, not long after I had to RMA a Seagate drive for a strange (but
> > different) error. Highlights from my dmesg include:
> >
> > [59634.792101] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> > frozen
> > [59634.792121] ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
> > [59634.792125]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4
> > (timeout)
> > [59634.792133] ata6.00: status: { DRDY }
> > [59634.792145] ata6: hard resetting link
> > [59635.277695] ata6: softreset failed (device not ready)
> > [59635.277705] ata6: failed due to HW bug, retry pmp=0
> > [59635.441570] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > [59635.454030] ata6.00: configured for UDMA/133
> > [59635.454068] ata6: EH complete
> > ...
> > [59713.792224] ata6.00: NCQ disabled due to excessive errors
> > [59713.792236] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> > frozen
> > [59713.792254] ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
> > [59713.792257]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4
> > (timeout)
> > [59713.792264] ata6.00: status: { DRDY }
> > ...
> > [63484.781764] ata6: hard resetting link
> > [63485.264204] ata6: softreset failed (device not ready)
> > [63485.264214] ata6: failed due to HW bug, retry pmp=0
> > [63485.428205] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> > [63485.440499] ata6.00: configured for UDMA/100
> > [63485.440536] ata6: EH complete
> > [63493.780182] ata6.00: limiting speed to UDMA/33:PIO4
> > [63493.780194] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> > frozen
> > [63493.780211] ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
> > [63493.780214]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4
> > (timeout)
> > [63493.780221] ata6.00: status: { DRDY }
> > ...
> > [99318.070669] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> > [99318.070679] ata6.00: irq_stat 0x40000001
> > [99318.070698] ata6.00: cmd 25/00:00:30:bd:01/00:01:1b:00:00/e0 tag 0 dma
> > 131072 in
> > [99318.070701]          res 41/04:ef:30:bd:01/00:00:1b:00:00/e0 Emask 0x1
> > (device error)
> > [99318.070709] ata6.00: status: { DRDY ERR }
> > [99318.070714] ata6.00: error: { ABRT }
> > [99318.082588] ata6.00: configured for UDMA/33
> > [99318.082612] ata6: EH complete
> > ...
> > [99339.780358] ata6: hard resetting link
> > [99349.785096] ata6: softreset failed (device not ready)
> > [99349.785108] ata6: hard resetting link
> > [99359.789033] ata6: softreset failed (device not ready)
> > [99359.789045] ata6: hard resetting link
> > [99370.352916] ata6: link is slow to respond, please be patient (ready=0)
> > ..
> > [99400.016925] ata6: softreset failed (device not ready)
> > [99400.016936] ata6: reset failed, giving up
> > [99400.016942] ata6.00: disabled
> > [99400.016986] ata6: EH complete
> > [99400.017021] sd 5:0:0:0: [sdf] Unhandled error code
> > [99400.017026] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> > driverbyte=DRIVER_OK
> > [99400.017021] sd 5:0:0:0: [sdf] Unhandled error code
> > [99400.017026] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> > driverbyte=DRIVER_OK
> > [99400.017035] end_request: I/O error, dev sdf, sector 453099568
> > [99400.017045] Buffer I/O error on device md1, logical block 113274390
> > [99400.017058] Buffer I/O error on device md1, logical block 113274391
> > [99400.017064] Buffer I/O error on device md1, logical block 113274392
> > [99400.017070] Buffer I/O error on device md1, logical block 113274393
> > [99400.017076] Buffer I/O error on device md1, logical block 113274394
> > [99400.017082] Buffer I/O error on device md1, logical block 113274395
> > [99400.017088] Buffer I/O error on device md1, logical block 113274396
> > [99400.017094] Buffer I/O error on device md1, logical block 113274397
> > [99400.017100] Buffer I/O error on device md1, logical block 113274398
> > [99400.017106] Buffer I/O error on device md1, logical block 113274399
> > [99400.017115] sd 5:0:0:0: [sdf] Unhandled error code
> > [99400.017123] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> > driverbyte=DRIVER_OK
> > [99400.017133] end_request: I/O error, dev sdf, sector 453099824
> > [99400.017310] sd 5:0:0:0: [sdf] Unhandled error code
> > [99400.017315] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> > driverbyte=DRIVER_OK
> > [99400.017322] end_request: I/O error, dev sdf, sector 453099568
> > [99400.017383] program smartctl is using a deprecated SCSI ioctl, please
> > convert it to SG_IO
> > [99400.017456] program smartctl is using a deprecated SCSI ioctl, please
> > convert it to SG_IO
> > [99400.018822] sd 5:0:0:0: [sdf] Unhandled error code
> > [99400.018829] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> > driverbyte=DRIVER_OK
> > [99400.018838] end_request: I/O error, dev sdf, sector 453099568
> >
> > and a bunch more end_request errors. This happens fairly reliably after a
> > day or so of being online. I try and start to play some music or watch a
> > video off the array (2xWD Green on md raid0) and it fails. The file
> > system is mostly read only right now (that is I don't write to it) so I
> > haven't manged to loose any data that I know of, and the fs seems to be
> > fine. A reboot fixes the errors and the disk and fs work fine for a day
> > or two before these errors show up again.
> >
> > I'm currently running 2.6.31-rc5 from kernel.org. What would cause these
> > errors?
>
> It's now happened again after only 30 minutes or so. I'm trying with an
> older kernel (debian's 2.6.30-6 from sid) now. So far so good.

Lasted a whole half a day this time:

[29461.780188] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
frozen                                                                           
[29461.780208] ata8.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0                                                                                     
[29461.780211]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 
(timeout)                                                                       
[29461.780219] ata8.00: status: { DRDY }                                                                                                                  
[29461.780230] ata8: hard resetting link                                                                                                                  
[29462.264190] ata8: softreset failed (device not ready)                                                                                                  
[29462.264198] ata8: failed due to HW bug, retry pmp=0                                                                                                    
[29462.428205] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)                                                                                     
[29462.440501] ata8.00: configured for UDMA/133                                                                                                           
[29462.440541] ata8: EH complete                                                                                                                          
[29469.780184] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
frozen                                                                           
[29469.780204] ata8.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0                                                                                     
[29469.780207]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 
(timeout)                                                                       
[29469.780215] ata8.00: status: { DRDY }                                                                                                                  
[29469.780227] ata8: hard resetting link                                                                                                                  
[29470.264245] ata8: softreset failed (device not ready)                                                                                                  
[29470.264254] ata8: failed due to HW bug, retry pmp=0                                                                                                    
[29470.428710] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)                                                                                     
[29470.442007] ata8.00: configured for UDMA/133                                                                                                           
[29470.442040] ata8: EH complete                                                                                                                          
[30861.793185] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
frozen                                                                           
[30861.793205] ata8.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0                                                                                     
[30861.793208]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 
(timeout)                                                                       
[30861.793216] ata8.00: status: { DRDY }                                                                                                                  
[30861.793228] ata8: hard resetting link                                                                                                                  
[30862.389604] ata8: softreset failed (device not ready)                                                                                                  
[30862.389614] ata8: failed due to HW bug, retry pmp=0                                                                                                    
[30862.553541] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)                                                                                     
[30862.565716] ata8.00: configured for UDMA/133                                                                                                           
[30862.565758] ata8: EH complete                                                                                                                          
[30869.793199] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
frozen                                                                           
[30869.793219] ata8.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0                                                                                     
[30869.793223]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 
(timeout)                                                                       
[30869.793230] ata8.00: status: { DRDY }                                                                                                                  
[30869.793242] ata8: hard resetting link                                                                                                                  
[30870.333570] ata8: softreset failed (device not ready)                                                                                                  
[30870.333579] ata8: failed due to HW bug, retry pmp=0                                                                                                    
[30870.497680] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)                                                                                     
[30870.509783] ata8.00: configured for UDMA/133                                                                                                           
[30870.509813] ata8: EH complete                                                                                                                          
[36301.639209] ata8.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0                                                                                  
[36301.639218] ata8.00: irq_stat 0x40000008                                                                                                               
[36301.639237] ata8.00: cmd 60/18:00:80:e9:09/00:00:23:00:00/40 tag 0 ncq 
12288 in                                                                        
[36301.639240]          res 41/04:07:80:e9:09/00:00:23:00:00/40 Emask 0x401 
(device error) <F>                                                            
[36301.639248] ata8.00: status: { DRDY ERR }                                                                                                              
[36301.639253] ata8.00: error: { ABRT }                                                                                                                   
[36301.652443] ata8.00: configured for UDMA/133                                                                                                           
[36301.652466] ata8: EH complete                                                                                                                          
[36329.593455] ata8.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0                                                                                  
[36329.593472] ata8.00: irq_stat 0x40000008                                                                                                               
[36329.593491] ata8.00: cmd 60/18:00:80:e9:09/00:00:23:00:00/40 tag 0 ncq 
12288 in                                                                        
[36329.593495]          res 41/04:07:80:e9:09/00:00:23:00:00/40 Emask 0x401 
(device error) <F>                                                            
[36329.593524] ata8.00: status: { DRDY ERR }                                                                                                              
[36329.593530] ata8.00: error: { ABRT }                                                                                                                   
[36329.605664] ata8.00: configured for UDMA/133                                                                                                           
[36329.605687] ata8: EH complete                                                                                                                          
[36352.125981] ata8.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0                                                                                  
[36352.125991] ata8.00: irq_stat 0x40000008                                                                                                               
[36352.126009] ata8.00: cmd 60/18:00:80:e9:09/00:00:23:00:00/40 tag 0 ncq 
12288 in                                                                        
[36352.126013]          res 41/04:07:80:e9:09/00:00:23:00:00/40 Emask 0x401 
(device error) <F>                                                            
[36352.126021] ata8.00: status: { DRDY ERR }                                                                                                              
[36352.126026] ata8.00: error: { ABRT }                                                                                                                   
[36352.138677] ata8.00: configured for UDMA/133                                                                                                           
[36352.138700] ata8: EH complete                                                                                                                          
[36370.337334] ata8.00: NCQ disabled due to excessive errors                                                                                              
[36370.337339] ata8.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0                                                                                  
[36370.337341] ata8.00: irq_stat 0x40000008                                                                                                               
[36370.337347] ata8.00: cmd 60/18:00:80:e9:09/00:00:23:00:00/40 tag 0 ncq 
12288 in                                                                        
[36370.337348]          res 41/04:07:80:e9:09/00:00:23:00:00/40 Emask 0x401 
(device error) <F>                                                            
[36370.337350] ata8.00: status: { DRDY ERR }                                                                                                              
[36370.337352] ata8.00: error: { ABRT }                                                                                                                   
[36370.349333] ata8.00: configured for UDMA/133                                                                                                           
[36370.349360] ata8: EH complete                                                                                                                          
[36379.641876] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0                                                                                  
[36379.641885] ata8.00: irq_stat 0x40000001                                                                                                               
[36379.641903] ata8.00: cmd 25/00:18:80:e9:09/00:00:23:00:00/e0 tag 0 dma 
12288 in                                                                        
[36379.641907]          res 41/04:07:80:e9:09/00:00:23:00:00/e0 Emask 0x1 
(device error)                                                                  
[36379.641915] ata8.00: status: { DRDY ERR }                                                                                                              
[36379.641920] ata8.00: error: { ABRT }                                                                                                                   
[36379.653754] ata8.00: configured for UDMA/133                                                                                                           
[36379.653779] ata8: EH complete
[36388.973359] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[36388.973368] ata8.00: irq_stat 0x40000001
[36388.973386] ata8.00: cmd 25/00:18:80:e9:09/00:00:23:00:00/e0 tag 0 dma 
12288 in
[36388.973390]          res 41/04:07:80:e9:09/00:00:23:00:00/e0 Emask 0x1 
(device error)
[36388.973398] ata8.00: status: { DRDY ERR }
[36388.973403] ata8.00: error: { ABRT }
[36388.985379] ata8.00: configured for UDMA/133
[36388.985404] sd 7:0:0:0: [sdf] Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
[36388.985412] sd 7:0:0:0: [sdf] Sense Key : Aborted Command [current] 
[descriptor]
[36388.985422] Descriptor sense data with sense descriptors (in hex):
[36388.985426]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[36388.985445]         23 09 e9 80
[36388.985453] sd 7:0:0:0: [sdf] Add. Sense: No additional sense information
[36388.985462] end_request: I/O error, dev sdf, sector 587852160
[36388.985470] Buffer I/O error on device md1, logical block 146962544
[36388.985482] Buffer I/O error on device md1, logical block 146962545
[36388.985488] Buffer I/O error on device md1, logical block 146962546
[36388.985524] ata8: EH complete
[36398.303765] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[36398.303774] ata8.00: irq_stat 0x40000001
[36398.303792] ata8.00: cmd 25/00:08:80:e9:09/00:00:23:00:00/e0 tag 0 dma 4096 
in
[36398.303796]          res 41/04:00:80:e9:09/00:00:23:00:00/e0 Emask 0x1 
(device error)
[36398.303803] ata8.00: status: { DRDY ERR }
[36398.303808] ata8.00: error: { ABRT }
[36398.315894] ata8.00: configured for UDMA/133
[36398.315916] ata8: EH complete
[36407.346978] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[36407.346982] ata8.00: irq_stat 0x40000001
[36407.346988] ata8.00: cmd 25/00:08:80:e9:09/00:00:23:00:00/e0 tag 0 dma 4096 
in
[36407.346989]          res 41/04:00:80:e9:09/00:00:23:00:00/e0 Emask 0x1 
(device error)
[36407.346991] ata8.00: status: { DRDY ERR }
[36407.346993] ata8.00: error: { ABRT }
[36407.358876] ata8.00: configured for UDMA/133
[36407.358888] ata8: EH complete
[36416.680069] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[36416.680079] ata8.00: irq_stat 0x40000001
[36416.680097] ata8.00: cmd 25/00:08:80:e9:09/00:00:23:00:00/e0 tag 0 dma 4096 
in
[36416.680101]          res 41/04:00:80:e9:09/00:00:23:00:00/e0 Emask 0x1 
(device error)
[36416.680109] ata8.00: status: { DRDY ERR }
[36416.680114] ata8.00: error: { ABRT }
[36416.692769] ata8.00: configured for UDMA/133
[36416.692792] ata8: EH complete
[36426.011216] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[36426.011225] ata8.00: irq_stat 0x40000001
[36426.011243] ata8.00: cmd 25/00:08:80:e9:09/00:00:23:00:00/e0 tag 0 dma 4096 
in
[36426.011246]          res 41/04:00:80:e9:09/00:00:23:00:00/e0 Emask 0x1 
(device error)
[36426.011254] ata8.00: status: { DRDY ERR }
[36426.011259] ata8.00: error: { ABRT }
[36426.023178] ata8.00: configured for UDMA/133
[36426.023202] ata8: EH complete

Is this a hardware error?

-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sata exception frozen timeout?
  2009-09-01  9:16 sata exception frozen timeout? Thomas Fjellstrom
  2009-09-01 10:28 ` Thomas Fjellstrom
@ 2009-09-02  4:32 ` Robert Hancock
  2009-09-02  6:42   ` Thomas Fjellstrom
  2009-09-03 19:38 ` BIOS update == more errors (was Re: sata exception frozen timeout?) Thomas Fjellstrom
  2 siblings, 1 reply; 9+ messages in thread
From: Robert Hancock @ 2009-09-02  4:32 UTC (permalink / raw)
  To: tfjellstrom; +Cc: linux-kernel

On 09/01/2009 03:16 AM, Thomas Fjellstrom wrote:
> I've recently started seeing the following messages from a new WD Green drive,
> not long after I had to RMA a Seagate drive for a strange (but different)
> error. Highlights from my dmesg include:
>
> [59634.792101] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [59634.792121] ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
> [59634.792125]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4
> (timeout)
> [59634.792133] ata6.00: status: { DRDY }
> [59634.792145] ata6: hard resetting link
> [59635.277695] ata6: softreset failed (device not ready)
> [59635.277705] ata6: failed due to HW bug, retry pmp=0
> [59635.441570] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [59635.454030] ata6.00: configured for UDMA/133
> [59635.454068] ata6: EH complete
> ...
> [59713.792224] ata6.00: NCQ disabled due to excessive errors
> [59713.792236] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [59713.792254] ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
> [59713.792257]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4
> (timeout)
> [59713.792264] ata6.00: status: { DRDY }
> ...
> [63484.781764] ata6: hard resetting link
> [63485.264204] ata6: softreset failed (device not ready)
> [63485.264214] ata6: failed due to HW bug, retry pmp=0
> [63485.428205] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [63485.440499] ata6.00: configured for UDMA/100
> [63485.440536] ata6: EH complete
> [63493.780182] ata6.00: limiting speed to UDMA/33:PIO4
> [63493.780194] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [63493.780211] ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
> [63493.780214]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4
> (timeout)
> [63493.780221] ata6.00: status: { DRDY }
> ...
> [99318.070669] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [99318.070679] ata6.00: irq_stat 0x40000001
> [99318.070698] ata6.00: cmd 25/00:00:30:bd:01/00:01:1b:00:00/e0 tag 0 dma
> 131072 in
> [99318.070701]          res 41/04:ef:30:bd:01/00:00:1b:00:00/e0 Emask 0x1
> (device error)
> [99318.070709] ata6.00: status: { DRDY ERR }
> [99318.070714] ata6.00: error: { ABRT }
> [99318.082588] ata6.00: configured for UDMA/33
> [99318.082612] ata6: EH complete
> ...
> [99339.780358] ata6: hard resetting link
> [99349.785096] ata6: softreset failed (device not ready)
> [99349.785108] ata6: hard resetting link
> [99359.789033] ata6: softreset failed (device not ready)
> [99359.789045] ata6: hard resetting link
> [99370.352916] ata6: link is slow to respond, please be patient (ready=0)
> ..
> [99400.016925] ata6: softreset failed (device not ready)
> [99400.016936] ata6: reset failed, giving up
> [99400.016942] ata6.00: disabled
> [99400.016986] ata6: EH complete
> [99400.017021] sd 5:0:0:0: [sdf] Unhandled error code
> [99400.017026] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> driverbyte=DRIVER_OK
> [99400.017021] sd 5:0:0:0: [sdf] Unhandled error code
> [99400.017026] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> driverbyte=DRIVER_OK
> [99400.017035] end_request: I/O error, dev sdf, sector 453099568
> [99400.017045] Buffer I/O error on device md1, logical block 113274390
> [99400.017058] Buffer I/O error on device md1, logical block 113274391
> [99400.017064] Buffer I/O error on device md1, logical block 113274392
> [99400.017070] Buffer I/O error on device md1, logical block 113274393
> [99400.017076] Buffer I/O error on device md1, logical block 113274394
> [99400.017082] Buffer I/O error on device md1, logical block 113274395
> [99400.017088] Buffer I/O error on device md1, logical block 113274396
> [99400.017094] Buffer I/O error on device md1, logical block 113274397
> [99400.017100] Buffer I/O error on device md1, logical block 113274398
> [99400.017106] Buffer I/O error on device md1, logical block 113274399
> [99400.017115] sd 5:0:0:0: [sdf] Unhandled error code
> [99400.017123] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> driverbyte=DRIVER_OK
> [99400.017133] end_request: I/O error, dev sdf, sector 453099824
> [99400.017310] sd 5:0:0:0: [sdf] Unhandled error code
> [99400.017315] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> driverbyte=DRIVER_OK
> [99400.017322] end_request: I/O error, dev sdf, sector 453099568
> [99400.017383] program smartctl is using a deprecated SCSI ioctl, please
> convert it to SG_IO
> [99400.017456] program smartctl is using a deprecated SCSI ioctl, please
> convert it to SG_IO
> [99400.018822] sd 5:0:0:0: [sdf] Unhandled error code
> [99400.018829] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> driverbyte=DRIVER_OK
> [99400.018838] end_request: I/O error, dev sdf, sector 453099568
>
> and a bunch more end_request errors. This happens fairly reliably after a day
> or so of being online. I try and start to play some music or watch a video off
> the array (2xWD Green on md raid0) and it fails. The file system is mostly
> read only right now (that is I don't write to it) so I haven't manged to loose
> any data that I know of, and the fs seems to be fine. A reboot fixes the
> errors and the disk and fs work fine for a day or two before these errors show
> up again.
>
> I'm currently running 2.6.31-rc5 from kernel.org. What would cause these
> errors?

This sort of thing is quite often a hardware problem. You can see that 
commands start timing out and then the controller seems to mostly lose 
communication with the drive (the resets failing). In particular, it 
seems an overloaded power supply can cause these sort of symptoms. (It 
seems like a lot of the time this happens with people with multiple hard 
drives installed, either the power supply just isn't powerful enough to 
handle them all, or there's too many drives connected to one cable and 
the voltage drop is too high under heavy load.)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sata exception frozen timeout?
  2009-09-02  4:32 ` Robert Hancock
@ 2009-09-02  6:42   ` Thomas Fjellstrom
  0 siblings, 0 replies; 9+ messages in thread
From: Thomas Fjellstrom @ 2009-09-02  6:42 UTC (permalink / raw)
  To: linux-kernel

On Tue September 1 2009, Robert Hancock wrote:
> On 09/01/2009 03:16 AM, Thomas Fjellstrom wrote:
> > I've recently started seeing the following messages from a new WD Green
> > drive, not long after I had to RMA a Seagate drive for a strange (but
> > different) error. Highlights from my dmesg include:
> >
> > [59634.792101] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> > frozen
> > [59634.792121] ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
> > [59634.792125]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4
> > (timeout)
> > [59634.792133] ata6.00: status: { DRDY }
> > [59634.792145] ata6: hard resetting link
> > [59635.277695] ata6: softreset failed (device not ready)
> > [59635.277705] ata6: failed due to HW bug, retry pmp=0
> > [59635.441570] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > [59635.454030] ata6.00: configured for UDMA/133
> > [59635.454068] ata6: EH complete
> > ...
> > [59713.792224] ata6.00: NCQ disabled due to excessive errors
> > [59713.792236] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> > frozen
> > [59713.792254] ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
> > [59713.792257]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4
> > (timeout)
> > [59713.792264] ata6.00: status: { DRDY }
> > ...
> > [63484.781764] ata6: hard resetting link
> > [63485.264204] ata6: softreset failed (device not ready)
> > [63485.264214] ata6: failed due to HW bug, retry pmp=0
> > [63485.428205] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> > [63485.440499] ata6.00: configured for UDMA/100
> > [63485.440536] ata6: EH complete
> > [63493.780182] ata6.00: limiting speed to UDMA/33:PIO4
> > [63493.780194] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> > frozen
> > [63493.780211] ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
> > [63493.780214]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4
> > (timeout)
> > [63493.780221] ata6.00: status: { DRDY }
> > ...
> > [99318.070669] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> > [99318.070679] ata6.00: irq_stat 0x40000001
> > [99318.070698] ata6.00: cmd 25/00:00:30:bd:01/00:01:1b:00:00/e0 tag 0 dma
> > 131072 in
> > [99318.070701]          res 41/04:ef:30:bd:01/00:00:1b:00:00/e0 Emask 0x1
> > (device error)
> > [99318.070709] ata6.00: status: { DRDY ERR }
> > [99318.070714] ata6.00: error: { ABRT }
> > [99318.082588] ata6.00: configured for UDMA/33
> > [99318.082612] ata6: EH complete
> > ...
> > [99339.780358] ata6: hard resetting link
> > [99349.785096] ata6: softreset failed (device not ready)
> > [99349.785108] ata6: hard resetting link
> > [99359.789033] ata6: softreset failed (device not ready)
> > [99359.789045] ata6: hard resetting link
> > [99370.352916] ata6: link is slow to respond, please be patient (ready=0)
> > ..
> > [99400.016925] ata6: softreset failed (device not ready)
> > [99400.016936] ata6: reset failed, giving up
> > [99400.016942] ata6.00: disabled
> > [99400.016986] ata6: EH complete
> > [99400.017021] sd 5:0:0:0: [sdf] Unhandled error code
> > [99400.017026] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> > driverbyte=DRIVER_OK
> > [99400.017021] sd 5:0:0:0: [sdf] Unhandled error code
> > [99400.017026] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> > driverbyte=DRIVER_OK
> > [99400.017035] end_request: I/O error, dev sdf, sector 453099568
> > [99400.017045] Buffer I/O error on device md1, logical block 113274390
> > [99400.017058] Buffer I/O error on device md1, logical block 113274391
> > [99400.017064] Buffer I/O error on device md1, logical block 113274392
> > [99400.017070] Buffer I/O error on device md1, logical block 113274393
> > [99400.017076] Buffer I/O error on device md1, logical block 113274394
> > [99400.017082] Buffer I/O error on device md1, logical block 113274395
> > [99400.017088] Buffer I/O error on device md1, logical block 113274396
> > [99400.017094] Buffer I/O error on device md1, logical block 113274397
> > [99400.017100] Buffer I/O error on device md1, logical block 113274398
> > [99400.017106] Buffer I/O error on device md1, logical block 113274399
> > [99400.017115] sd 5:0:0:0: [sdf] Unhandled error code
> > [99400.017123] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> > driverbyte=DRIVER_OK
> > [99400.017133] end_request: I/O error, dev sdf, sector 453099824
> > [99400.017310] sd 5:0:0:0: [sdf] Unhandled error code
> > [99400.017315] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> > driverbyte=DRIVER_OK
> > [99400.017322] end_request: I/O error, dev sdf, sector 453099568
> > [99400.017383] program smartctl is using a deprecated SCSI ioctl, please
> > convert it to SG_IO
> > [99400.017456] program smartctl is using a deprecated SCSI ioctl, please
> > convert it to SG_IO
> > [99400.018822] sd 5:0:0:0: [sdf] Unhandled error code
> > [99400.018829] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET
> > driverbyte=DRIVER_OK
> > [99400.018838] end_request: I/O error, dev sdf, sector 453099568
> >
> > and a bunch more end_request errors. This happens fairly reliably after a
> > day or so of being online. I try and start to play some music or watch a
> > video off the array (2xWD Green on md raid0) and it fails. The file
> > system is mostly read only right now (that is I don't write to it) so I
> > haven't manged to loose any data that I know of, and the fs seems to be
> > fine. A reboot fixes the errors and the disk and fs work fine for a day
> > or two before these errors show up again.
> >
> > I'm currently running 2.6.31-rc5 from kernel.org. What would cause these
> > errors?
>
> This sort of thing is quite often a hardware problem. You can see that
> commands start timing out and then the controller seems to mostly lose
> communication with the drive (the resets failing). In particular, it
> seems an overloaded power supply can cause these sort of symptoms. (It
> seems like a lot of the time this happens with people with multiple hard
> drives installed, either the power supply just isn't powerful enough to
> handle them all, or there's too many drives connected to one cable and
> the voltage drop is too high under heavy load.)

I doubt that's the problem, this is an almost brand new 650W and there's only 
2 drives per cable, and each of the 3 12v rails has 19A which should be more 
than enough for all 6 drives, even if all of them were on the same rail 
(shouldn't be, 5v is 24A, so should be enough for just about anything).

Were you talking about the softreset's? That seems to be a common AMD SB600+ 
error. At least it happens with both of my AMD boards, a SB700 and a SB750. 
Linux eventually figures out a way to initialize the sata ports, but it sure 
takes a while if you have 10+ drives connected.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca

^ permalink raw reply	[flat|nested] 9+ messages in thread

* BIOS update == more errors (was Re: sata exception frozen timeout?)
  2009-09-01  9:16 sata exception frozen timeout? Thomas Fjellstrom
  2009-09-01 10:28 ` Thomas Fjellstrom
  2009-09-02  4:32 ` Robert Hancock
@ 2009-09-03 19:38 ` Thomas Fjellstrom
  2009-09-03 20:11   ` Thomas Fjellstrom
  2009-09-04  3:43   ` Tejun Heo
  2 siblings, 2 replies; 9+ messages in thread
From: Thomas Fjellstrom @ 2009-09-03 19:38 UTC (permalink / raw)
  To: linux-kernel

I've updated my bios to try and see if it would help at all (it did seem to 
fix other issues).

But now I'm getting the following warnings and errors from dmesg on boot: 
(debian sid 2.6.30, with "noapic" to see if the original problem was an 
interupt issue, as everyone seems to have hinted at).

[    1.024337] ------------[ cut here ]------------
[    1.024408] WARNING: at /build/buildd-linux-2.6_2.6.30-6-amd64-
s9DPiZ/linux-2.6-2.6.30/debian/build/source_amd64_none/drivers/ata/libata-
core.c:6174 ata_host_activate+0x47/0xe0 [libata]()
[    1.024492] Hardware name: GA-MA790FXT-UD5P
[    1.024546] Modules linked in: crc_itu_t atiixp(+) ide_core pata_jmicron 
ahci(+) ehci_hcd(+) libata scsi_mod r8169 mii thermal fan thermal_sys
[    1.025196] Pid: 796, comm: work_for_cpu Not tainted 2.6.30-1-amd64 #1
[    1.025252] Call Trace:
[    1.025318]  [<ffffffffa0056101>] ? ata_host_activate+0x47/0xe0 [libata]
[    1.025385]  [<ffffffffa0056101>] ? ata_host_activate+0x47/0xe0 [libata]
[    1.025445]  [<ffffffff8024237f>] ? warn_slowpath_common+0x77/0xa3
[    1.025505]  [<ffffffffa008e598>] ? ahci_interrupt+0x0/0x454 [ahci]
[    1.025572]  [<ffffffffa0056101>] ? ata_host_activate+0x47/0xe0 [libata]
[    1.025632]  [<ffffffffa008e578>] ? ahci_init_one+0xbb4/0xbd4 [ahci]
[    1.025691]  [<ffffffff8025136e>] ? do_work_for_cpu+0x0/0x1b
[    1.025749]  [<ffffffff8035fa16>] ? local_pci_probe+0x12/0x16
[    1.025806]  [<ffffffff80251379>] ? do_work_for_cpu+0xb/0x1b
[    1.025862]  [<ffffffff80254382>] ? kthread+0x54/0x80
[    1.025918]  [<ffffffff80210aca>] ? child_rip+0xa/0x20
[    1.025975]  [<ffffffff8025432e>] ? kthread+0x0/0x80
[    1.026030]  [<ffffffff80210ac0>] ? child_rip+0x0/0x20
[    1.026085] ---[ end trace 54d3fd405814ad85 ]---
...
[   14.872233] ata10.15: qc timeout (cmd 0xe4)
[   14.872925] ata10.15: failed to read PMP GSCR[0] (Emask=0x4)
[   14.873256] ata9.15: qc timeout (cmd 0xe4)
[   14.873344] ata10: limiting SATA link speed to 1.5 Gbps
[   14.874008] ata9.15: failed to read PMP GSCR[0] (Emask=0x4)
[   14.874271] ata9: limiting SATA link speed to 1.5 Gbps
[   15.548002] irq 7: nobody cared (try booting with the "irqpoll" option)
[   15.548121] Pid: 0, comm: swapper Tainted: G        W  2.6.30-1-amd64 #1
[   15.548240] Call Trace:
[   15.548346]  <IRQ>  [<ffffffff8027fe0f>] ? __report_bad_irq+0x30/0x7d
[   15.548569]  [<ffffffff8027ff61>] ? note_interrupt+0x105/0x170
[   15.548689]  [<ffffffff802805f1>] ? handle_level_irq+0x7c/0xaf
[   15.548809]  [<ffffffff80212655>] ? handle_irq+0x17/0x1d
[   15.548929]  [<ffffffff80211e7c>] ? do_IRQ+0x57/0xbf
[   15.549044]  [<ffffffff80210453>] ? ret_from_intr+0x0/0x11
[   15.549161]  <EOI>  [<ffffffff8064e140>] ? early_idt_handler+0x0/0x71
[   15.549379]  [<ffffffff80227520>] ? native_safe_halt+0x2/0x3
[   15.549497]  [<ffffffff80216995>] ? default_idle+0x40/0x68
[   15.549612]  [<ffffffff8025d784>] ? clockevents_notify+0x2b/0x75
[   15.549730]  [<ffffffff80216d48>] ? c1e_idle+0xe5/0x10d
[   15.549848]  [<ffffffff8020edda>] ? cpu_idle+0x50/0x91
[   15.549963]  [<ffffffff8064ec62>] ? start_kernel+0x37a/0x386
[   15.550081]  [<ffffffff8064e3b7>] ? x86_64_start_kernel+0xf9/0x106
[   15.550199] handlers:
[   15.550311] [<ffffffff803de867>] (usb_hcd_irq+0x0/0x7e)
[   15.550630] Disabling IRQ #7
[   15.636304] irq 11: nobody cared (try booting with the "irqpoll" option)
[   15.636377] Pid: 0, comm: swapper Tainted: G        W  2.6.30-1-amd64 #1
[   15.636434] Call Trace:
[   15.636486]  <IRQ>  [<ffffffff8027fe0f>] ? __report_bad_irq+0x30/0x7d
[   15.636586]  [<ffffffff8027ff61>] ? note_interrupt+0x105/0x170
[   15.636643]  [<ffffffff802805f1>] ? handle_level_irq+0x7c/0xaf
[   15.636699]  [<ffffffff80212655>] ? handle_irq+0x17/0x1d
[   15.636755]  [<ffffffff80211e7c>] ? do_IRQ+0x57/0xbf
[   15.636811]  [<ffffffff80210453>] ? ret_from_intr+0x0/0x11
[   15.636866]  <EOI>  [<ffffffff8064e140>] ? early_idt_handler+0x0/0x71
[   15.636966]  [<ffffffff80227520>] ? native_safe_halt+0x2/0x3
[   15.637022]  [<ffffffff80216995>] ? default_idle+0x40/0x68
[   15.637078]  [<ffffffff8025d784>] ? clockevents_notify+0x2b/0x75
[   15.637135]  [<ffffffff80216d48>] ? c1e_idle+0xe5/0x10d
[   15.637191]  [<ffffffff8020edda>] ? cpu_idle+0x50/0x91
[   15.637247]  [<ffffffff8064ec62>] ? start_kernel+0x37a/0x386
[   15.637304]  [<ffffffff8064e3b7>] ? x86_64_start_kernel+0xf9/0x106
[   15.637359] handlers:
[   15.637411] [<ffffffff803de867>] (usb_hcd_irq+0x0/0x7e)
[   15.637549] [<ffffffff803de867>] (usb_hcd_irq+0x0/0x7e)
[   15.637687] [<ffffffff803de867>] (usb_hcd_irq+0x0/0x7e)
[   15.637825] Disabling IRQ #11
[   17.536129] usb-storage: device scan complete
...
[   21.388126] ata9.15: qc timeout (cmd 0xe4)
[   21.388194] ata9.15: failed to read PMP GSCR[0] (Emask=0x4)
[   21.588135] ata10.15: qc timeout (cmd 0xe4)
[   21.588204] ata10.15: failed to read PMP GSCR[0] (Emask=0x4)
[   24.832135] ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[   25.632183] ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
...
[  198.322072] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
frozen
[  198.322080] ata8.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0
[  198.322081]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 
(timeout)
[  198.322083] ata8.00: status: { DRDY }
[  198.322088] ata8: hard resetting link
[  198.916035] ata8: softreset failed (device not ready)
[  198.916039] ata8: failed due to HW bug, retry pmp=0
[  199.080024] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  199.092059] ata8.00: configured for UDMA/133
[  199.092072] ata8: EH complete
[  227.900583] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
frozen
[  227.900590] ata8.00: cmd b0/d8:00:01:4f:c2/00:00:00:00:00/00 tag 0
[  227.900591]          res 40/00:00:af:88:e0/00:00:e8:00:00/e0 Emask 0x4 
(timeout)
[  227.900594] ata8.00: status: { DRDY }
[  227.900598] ata8: hard resetting link
[  228.384016] ata8: softreset failed (device not ready)
[  228.384020] ata8: failed due to HW bug, retry pmp=0
[  228.548024] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  228.560372] ata8.00: configured for UDMA/133
[  228.560385] ata8: EH complete
[  238.805198] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
frozen
[  238.805218] ata8.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[  238.805221]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 
(timeout)
[  238.805229] ata8.00: status: { DRDY }
[  238.805241] ata8: hard resetting link
[  239.404154] ata8: softreset failed (device not ready)
[  239.404163] ata8: failed due to HW bug, retry pmp=0
[  239.568186] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  239.580343] ata8.00: configured for UDMA/133
[  239.580374] ata8: EH complete
[  246.808086] ata8.00: NCQ disabled due to excessive errors
[  246.808099] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
frozen
[  246.808115] ata8.00: cmd b0/d8:00:01:4f:c2/00:00:00:00:00/00 tag 0
[  246.808119]          res 40/00:00:af:88:e0/00:00:e8:00:00/e0 Emask 0x4 
(timeout)
[  246.808126] ata8.00: status: { DRDY }
[  246.808138] ata8: hard resetting link
[  247.292158] ata8: softreset failed (device not ready)
[  247.292167] ata8: failed due to HW bug, retry pmp=0
[  247.456174] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  247.468955] ata8.00: configured for UDMA/133
[  247.468984] ata8: EH complete
[  272.804207] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
frozen
[  272.804227] ata8.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0
[  272.804231]          res 40/00:00:af:88:e0/00:00:e8:00:00/e0 Emask 0x4 
(timeout)
[  272.804238] ata8.00: status: { DRDY }
[  272.804250] ata8: hard resetting link
[  273.292161] ata8: softreset failed (device not ready)
[  273.292169] ata8: failed due to HW bug, retry pmp=0
[  273.456173] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  273.468892] ata8.00: configured for UDMA/133
[  273.468916] ata8: EH complete

On boot, it seemed to hang the disk up for a good few minutes, even though 
nothing is using it at the moment (I have to manually bring up the mdraid0 
array, so it can't possibly be mounted), and smartctl was erroring out for a 
while, but now its fine, and smart shows no issues.

I'm going to try without noapic on 2.6.30, and 2.6.31-rc5 and see what 
happens.


-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: BIOS update == more errors (was Re: sata exception frozen timeout?)
  2009-09-03 19:38 ` BIOS update == more errors (was Re: sata exception frozen timeout?) Thomas Fjellstrom
@ 2009-09-03 20:11   ` Thomas Fjellstrom
  2009-09-03 22:54     ` Thomas Fjellstrom
  2009-09-04  3:43   ` Tejun Heo
  1 sibling, 1 reply; 9+ messages in thread
From: Thomas Fjellstrom @ 2009-09-03 20:11 UTC (permalink / raw)
  To: linux-kernel

On Thu September 3 2009, Thomas Fjellstrom wrote:
> I've updated my bios to try and see if it would help at all (it did seem to
> fix other issues).
>
> But now I'm getting the following warnings and errors from dmesg on boot:
> (debian sid 2.6.30, with "noapic" to see if the original problem was an
> interupt issue, as everyone seems to have hinted at).
>
> [    1.024337] ------------[ cut here ]------------
> [    1.024408] WARNING: at /build/buildd-linux-2.6_2.6.30-6-amd64-
> s9DPiZ/linux-2.6-2.6.30/debian/build/source_amd64_none/drivers/ata/libata-
> core.c:6174 ata_host_activate+0x47/0xe0 [libata]()
> [    1.024492] Hardware name: GA-MA790FXT-UD5P
> [    1.024546] Modules linked in: crc_itu_t atiixp(+) ide_core pata_jmicron
> ahci(+) ehci_hcd(+) libata scsi_mod r8169 mii thermal fan thermal_sys
> [    1.025196] Pid: 796, comm: work_for_cpu Not tainted 2.6.30-1-amd64 #1
> [    1.025252] Call Trace:
> [    1.025318]  [<ffffffffa0056101>] ? ata_host_activate+0x47/0xe0 [libata]
> [    1.025385]  [<ffffffffa0056101>] ? ata_host_activate+0x47/0xe0 [libata]
> [    1.025445]  [<ffffffff8024237f>] ? warn_slowpath_common+0x77/0xa3
> [    1.025505]  [<ffffffffa008e598>] ? ahci_interrupt+0x0/0x454 [ahci]
> [    1.025572]  [<ffffffffa0056101>] ? ata_host_activate+0x47/0xe0 [libata]
> [    1.025632]  [<ffffffffa008e578>] ? ahci_init_one+0xbb4/0xbd4 [ahci]
> [    1.025691]  [<ffffffff8025136e>] ? do_work_for_cpu+0x0/0x1b
> [    1.025749]  [<ffffffff8035fa16>] ? local_pci_probe+0x12/0x16
> [    1.025806]  [<ffffffff80251379>] ? do_work_for_cpu+0xb/0x1b
> [    1.025862]  [<ffffffff80254382>] ? kthread+0x54/0x80
> [    1.025918]  [<ffffffff80210aca>] ? child_rip+0xa/0x20
> [    1.025975]  [<ffffffff8025432e>] ? kthread+0x0/0x80
> [    1.026030]  [<ffffffff80210ac0>] ? child_rip+0x0/0x20
> [    1.026085] ---[ end trace 54d3fd405814ad85 ]---
> ...
> [   14.872233] ata10.15: qc timeout (cmd 0xe4)
> [   14.872925] ata10.15: failed to read PMP GSCR[0] (Emask=0x4)
> [   14.873256] ata9.15: qc timeout (cmd 0xe4)
> [   14.873344] ata10: limiting SATA link speed to 1.5 Gbps
> [   14.874008] ata9.15: failed to read PMP GSCR[0] (Emask=0x4)
> [   14.874271] ata9: limiting SATA link speed to 1.5 Gbps
> [   15.548002] irq 7: nobody cared (try booting with the "irqpoll" option)
> [   15.548121] Pid: 0, comm: swapper Tainted: G        W  2.6.30-1-amd64 #1
> [   15.548240] Call Trace:
> [   15.548346]  <IRQ>  [<ffffffff8027fe0f>] ? __report_bad_irq+0x30/0x7d
> [   15.548569]  [<ffffffff8027ff61>] ? note_interrupt+0x105/0x170
> [   15.548689]  [<ffffffff802805f1>] ? handle_level_irq+0x7c/0xaf
> [   15.548809]  [<ffffffff80212655>] ? handle_irq+0x17/0x1d
> [   15.548929]  [<ffffffff80211e7c>] ? do_IRQ+0x57/0xbf
> [   15.549044]  [<ffffffff80210453>] ? ret_from_intr+0x0/0x11
> [   15.549161]  <EOI>  [<ffffffff8064e140>] ? early_idt_handler+0x0/0x71
> [   15.549379]  [<ffffffff80227520>] ? native_safe_halt+0x2/0x3
> [   15.549497]  [<ffffffff80216995>] ? default_idle+0x40/0x68
> [   15.549612]  [<ffffffff8025d784>] ? clockevents_notify+0x2b/0x75
> [   15.549730]  [<ffffffff80216d48>] ? c1e_idle+0xe5/0x10d
> [   15.549848]  [<ffffffff8020edda>] ? cpu_idle+0x50/0x91
> [   15.549963]  [<ffffffff8064ec62>] ? start_kernel+0x37a/0x386
> [   15.550081]  [<ffffffff8064e3b7>] ? x86_64_start_kernel+0xf9/0x106
> [   15.550199] handlers:
> [   15.550311] [<ffffffff803de867>] (usb_hcd_irq+0x0/0x7e)
> [   15.550630] Disabling IRQ #7
> [   15.636304] irq 11: nobody cared (try booting with the "irqpoll" option)
> [   15.636377] Pid: 0, comm: swapper Tainted: G        W  2.6.30-1-amd64 #1
> [   15.636434] Call Trace:
> [   15.636486]  <IRQ>  [<ffffffff8027fe0f>] ? __report_bad_irq+0x30/0x7d
> [   15.636586]  [<ffffffff8027ff61>] ? note_interrupt+0x105/0x170
> [   15.636643]  [<ffffffff802805f1>] ? handle_level_irq+0x7c/0xaf
> [   15.636699]  [<ffffffff80212655>] ? handle_irq+0x17/0x1d
> [   15.636755]  [<ffffffff80211e7c>] ? do_IRQ+0x57/0xbf
> [   15.636811]  [<ffffffff80210453>] ? ret_from_intr+0x0/0x11
> [   15.636866]  <EOI>  [<ffffffff8064e140>] ? early_idt_handler+0x0/0x71
> [   15.636966]  [<ffffffff80227520>] ? native_safe_halt+0x2/0x3
> [   15.637022]  [<ffffffff80216995>] ? default_idle+0x40/0x68
> [   15.637078]  [<ffffffff8025d784>] ? clockevents_notify+0x2b/0x75
> [   15.637135]  [<ffffffff80216d48>] ? c1e_idle+0xe5/0x10d
> [   15.637191]  [<ffffffff8020edda>] ? cpu_idle+0x50/0x91
> [   15.637247]  [<ffffffff8064ec62>] ? start_kernel+0x37a/0x386
> [   15.637304]  [<ffffffff8064e3b7>] ? x86_64_start_kernel+0xf9/0x106
> [   15.637359] handlers:
> [   15.637411] [<ffffffff803de867>] (usb_hcd_irq+0x0/0x7e)
> [   15.637549] [<ffffffff803de867>] (usb_hcd_irq+0x0/0x7e)
> [   15.637687] [<ffffffff803de867>] (usb_hcd_irq+0x0/0x7e)
> [   15.637825] Disabling IRQ #11
> [   17.536129] usb-storage: device scan complete
> ...
> [   21.388126] ata9.15: qc timeout (cmd 0xe4)
> [   21.388194] ata9.15: failed to read PMP GSCR[0] (Emask=0x4)
> [   21.588135] ata10.15: qc timeout (cmd 0xe4)
> [   21.588204] ata10.15: failed to read PMP GSCR[0] (Emask=0x4)
> [   24.832135] ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [   25.632183] ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ...
> [  198.322072] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [  198.322080] ata8.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0
> [  198.322081]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4
> (timeout)
> [  198.322083] ata8.00: status: { DRDY }
> [  198.322088] ata8: hard resetting link
> [  198.916035] ata8: softreset failed (device not ready)
> [  198.916039] ata8: failed due to HW bug, retry pmp=0
> [  199.080024] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [  199.092059] ata8.00: configured for UDMA/133
> [  199.092072] ata8: EH complete
> [  227.900583] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [  227.900590] ata8.00: cmd b0/d8:00:01:4f:c2/00:00:00:00:00/00 tag 0
> [  227.900591]          res 40/00:00:af:88:e0/00:00:e8:00:00/e0 Emask 0x4
> (timeout)
> [  227.900594] ata8.00: status: { DRDY }
> [  227.900598] ata8: hard resetting link
> [  228.384016] ata8: softreset failed (device not ready)
> [  228.384020] ata8: failed due to HW bug, retry pmp=0
> [  228.548024] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [  228.560372] ata8.00: configured for UDMA/133
> [  228.560385] ata8: EH complete
> [  238.805198] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [  238.805218] ata8.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
> [  238.805221]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4
> (timeout)
> [  238.805229] ata8.00: status: { DRDY }
> [  238.805241] ata8: hard resetting link
> [  239.404154] ata8: softreset failed (device not ready)
> [  239.404163] ata8: failed due to HW bug, retry pmp=0
> [  239.568186] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [  239.580343] ata8.00: configured for UDMA/133
> [  239.580374] ata8: EH complete
> [  246.808086] ata8.00: NCQ disabled due to excessive errors
> [  246.808099] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [  246.808115] ata8.00: cmd b0/d8:00:01:4f:c2/00:00:00:00:00/00 tag 0
> [  246.808119]          res 40/00:00:af:88:e0/00:00:e8:00:00/e0 Emask 0x4
> (timeout)
> [  246.808126] ata8.00: status: { DRDY }
> [  246.808138] ata8: hard resetting link
> [  247.292158] ata8: softreset failed (device not ready)
> [  247.292167] ata8: failed due to HW bug, retry pmp=0
> [  247.456174] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [  247.468955] ata8.00: configured for UDMA/133
> [  247.468984] ata8: EH complete
> [  272.804207] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [  272.804227] ata8.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0
> [  272.804231]          res 40/00:00:af:88:e0/00:00:e8:00:00/e0 Emask 0x4
> (timeout)
> [  272.804238] ata8.00: status: { DRDY }
> [  272.804250] ata8: hard resetting link
> [  273.292161] ata8: softreset failed (device not ready)
> [  273.292169] ata8: failed due to HW bug, retry pmp=0
> [  273.456173] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [  273.468892] ata8.00: configured for UDMA/133
> [  273.468916] ata8: EH complete
>
> On boot, it seemed to hang the disk up for a good few minutes, even though
> nothing is using it at the moment (I have to manually bring up the mdraid0
> array, so it can't possibly be mounted), and smartctl was erroring out for
> a while, but now its fine, and smart shows no issues.
>
> I'm going to try without noapic on 2.6.30, and 2.6.31-rc5 and see what
> happens.

back with 2.6.30 apic enabled, all the traces are gone, but I still get the 
SATA errors, and a new message: ata8: SError: { HostInt }

[  415.781659] ata8.00: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x6 
frozen
[  415.781672] ata8: SError: { HostInt }
[  415.781687] ata8.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[  415.781690]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x44 
(timeout)
[  415.781697] ata8.00: status: { DRDY }
[  415.781708] ata8: hard resetting link
[  416.264190] ata8: softreset failed (device not ready)
[  416.264199] ata8: failed due to HW bug, retry pmp=0
[  416.428196] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  416.440517] ata8.00: configured for UDMA/133
[  416.440544] ata8: EH complete
[  424.781778] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
frozen
[  424.781798] ata8.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[  424.781801]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 
(timeout)
[  424.781809] ata8.00: status: { DRDY }
[  424.781820] ata8: hard resetting link
[  425.265677] ata8: softreset failed (device not ready)
[  425.265686] ata8: failed due to HW bug, retry pmp=0
[  425.429546] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  425.442153] ata8.00: configured for UDMA/133
[  425.442179] ata8: EH complete
[  458.482002] CE: hpet increasing min_delta_ns to 15000 nsec
[  499.780213] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
frozen
[  499.780233] ata8.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[  499.780237]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 
(timeout)
[  499.780244] ata8.00: status: { DRDY }
[  499.780256] ata8: hard resetting link
[  500.320191] ata8: softreset failed (device not ready)
[  500.320200] ata8: failed due to HW bug, retry pmp=0
[  500.485084] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  500.497235] ata8.00: configured for UDMA/133
[  500.497256] ata8: EH complete

And now with 2.6.31-rc5, instant ata exceptions, same as before (just no 
SError line this time).

-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: BIOS update == more errors (was Re: sata exception frozen timeout?)
  2009-09-03 20:11   ` Thomas Fjellstrom
@ 2009-09-03 22:54     ` Thomas Fjellstrom
  0 siblings, 0 replies; 9+ messages in thread
From: Thomas Fjellstrom @ 2009-09-03 22:54 UTC (permalink / raw)
  To: linux-kernel

On Thu September 3 2009, Thomas Fjellstrom wrote:
> On Thu September 3 2009, Thomas Fjellstrom wrote:
> > I've updated my bios to try and see if it would help at all (it did seem
> > to fix other issues).
> >
> > But now I'm getting the following warnings and errors from dmesg on boot:
> > (debian sid 2.6.30, with "noapic" to see if the original problem was an
> > interupt issue, as everyone seems to have hinted at).
> >
> > [    1.024337] ------------[ cut here ]------------
> > [    1.024408] WARNING: at /build/buildd-linux-2.6_2.6.30-6-amd64-
> > s9DPiZ/linux-2.6-2.6.30/debian/build/source_amd64_none/drivers/ata/libata
> >- core.c:6174 ata_host_activate+0x47/0xe0 [libata]()
> > [    1.024492] Hardware name: GA-MA790FXT-UD5P
> > [    1.024546] Modules linked in: crc_itu_t atiixp(+) ide_core
> > pata_jmicron ahci(+) ehci_hcd(+) libata scsi_mod r8169 mii thermal fan
> > thermal_sys [    1.025196] Pid: 796, comm: work_for_cpu Not tainted
> > 2.6.30-1-amd64 #1 [    1.025252] Call Trace:
> > [    1.025318]  [<ffffffffa0056101>] ? ata_host_activate+0x47/0xe0
> > [libata] [    1.025385]  [<ffffffffa0056101>] ?
> > ata_host_activate+0x47/0xe0 [libata] [    1.025445]  [<ffffffff8024237f>]
> > ? warn_slowpath_common+0x77/0xa3 [    1.025505]  [<ffffffffa008e598>] ?
> > ahci_interrupt+0x0/0x454 [ahci] [    1.025572]  [<ffffffffa0056101>] ?
> > ata_host_activate+0x47/0xe0 [libata] [    1.025632]  [<ffffffffa008e578>]
> > ? ahci_init_one+0xbb4/0xbd4 [ahci] [    1.025691]  [<ffffffff8025136e>] ?
> > do_work_for_cpu+0x0/0x1b
> > [    1.025749]  [<ffffffff8035fa16>] ? local_pci_probe+0x12/0x16
> > [    1.025806]  [<ffffffff80251379>] ? do_work_for_cpu+0xb/0x1b
> > [    1.025862]  [<ffffffff80254382>] ? kthread+0x54/0x80
> > [    1.025918]  [<ffffffff80210aca>] ? child_rip+0xa/0x20
> > [    1.025975]  [<ffffffff8025432e>] ? kthread+0x0/0x80
> > [    1.026030]  [<ffffffff80210ac0>] ? child_rip+0x0/0x20
> > [    1.026085] ---[ end trace 54d3fd405814ad85 ]---
> > ...
> > [   14.872233] ata10.15: qc timeout (cmd 0xe4)
> > [   14.872925] ata10.15: failed to read PMP GSCR[0] (Emask=0x4)
> > [   14.873256] ata9.15: qc timeout (cmd 0xe4)
> > [   14.873344] ata10: limiting SATA link speed to 1.5 Gbps
> > [   14.874008] ata9.15: failed to read PMP GSCR[0] (Emask=0x4)
> > [   14.874271] ata9: limiting SATA link speed to 1.5 Gbps
> > [   15.548002] irq 7: nobody cared (try booting with the "irqpoll"
> > option) [   15.548121] Pid: 0, comm: swapper Tainted: G        W 
> > 2.6.30-1-amd64 #1 [   15.548240] Call Trace:
> > [   15.548346]  <IRQ>  [<ffffffff8027fe0f>] ? __report_bad_irq+0x30/0x7d
> > [   15.548569]  [<ffffffff8027ff61>] ? note_interrupt+0x105/0x170
> > [   15.548689]  [<ffffffff802805f1>] ? handle_level_irq+0x7c/0xaf
> > [   15.548809]  [<ffffffff80212655>] ? handle_irq+0x17/0x1d
> > [   15.548929]  [<ffffffff80211e7c>] ? do_IRQ+0x57/0xbf
> > [   15.549044]  [<ffffffff80210453>] ? ret_from_intr+0x0/0x11
> > [   15.549161]  <EOI>  [<ffffffff8064e140>] ? early_idt_handler+0x0/0x71
> > [   15.549379]  [<ffffffff80227520>] ? native_safe_halt+0x2/0x3
> > [   15.549497]  [<ffffffff80216995>] ? default_idle+0x40/0x68
> > [   15.549612]  [<ffffffff8025d784>] ? clockevents_notify+0x2b/0x75
> > [   15.549730]  [<ffffffff80216d48>] ? c1e_idle+0xe5/0x10d
> > [   15.549848]  [<ffffffff8020edda>] ? cpu_idle+0x50/0x91
> > [   15.549963]  [<ffffffff8064ec62>] ? start_kernel+0x37a/0x386
> > [   15.550081]  [<ffffffff8064e3b7>] ? x86_64_start_kernel+0xf9/0x106
> > [   15.550199] handlers:
> > [   15.550311] [<ffffffff803de867>] (usb_hcd_irq+0x0/0x7e)
> > [   15.550630] Disabling IRQ #7
> > [   15.636304] irq 11: nobody cared (try booting with the "irqpoll"
> > option) [   15.636377] Pid: 0, comm: swapper Tainted: G        W 
> > 2.6.30-1-amd64 #1 [   15.636434] Call Trace:
> > [   15.636486]  <IRQ>  [<ffffffff8027fe0f>] ? __report_bad_irq+0x30/0x7d
> > [   15.636586]  [<ffffffff8027ff61>] ? note_interrupt+0x105/0x170
> > [   15.636643]  [<ffffffff802805f1>] ? handle_level_irq+0x7c/0xaf
> > [   15.636699]  [<ffffffff80212655>] ? handle_irq+0x17/0x1d
> > [   15.636755]  [<ffffffff80211e7c>] ? do_IRQ+0x57/0xbf
> > [   15.636811]  [<ffffffff80210453>] ? ret_from_intr+0x0/0x11
> > [   15.636866]  <EOI>  [<ffffffff8064e140>] ? early_idt_handler+0x0/0x71
> > [   15.636966]  [<ffffffff80227520>] ? native_safe_halt+0x2/0x3
> > [   15.637022]  [<ffffffff80216995>] ? default_idle+0x40/0x68
> > [   15.637078]  [<ffffffff8025d784>] ? clockevents_notify+0x2b/0x75
> > [   15.637135]  [<ffffffff80216d48>] ? c1e_idle+0xe5/0x10d
> > [   15.637191]  [<ffffffff8020edda>] ? cpu_idle+0x50/0x91
> > [   15.637247]  [<ffffffff8064ec62>] ? start_kernel+0x37a/0x386
> > [   15.637304]  [<ffffffff8064e3b7>] ? x86_64_start_kernel+0xf9/0x106
> > [   15.637359] handlers:
> > [   15.637411] [<ffffffff803de867>] (usb_hcd_irq+0x0/0x7e)
> > [   15.637549] [<ffffffff803de867>] (usb_hcd_irq+0x0/0x7e)
> > [   15.637687] [<ffffffff803de867>] (usb_hcd_irq+0x0/0x7e)
> > [   15.637825] Disabling IRQ #11
> > [   17.536129] usb-storage: device scan complete
> > ...
> > [   21.388126] ata9.15: qc timeout (cmd 0xe4)
> > [   21.388194] ata9.15: failed to read PMP GSCR[0] (Emask=0x4)
> > [   21.588135] ata10.15: qc timeout (cmd 0xe4)
> > [   21.588204] ata10.15: failed to read PMP GSCR[0] (Emask=0x4)
> > [   24.832135] ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> > [   25.632183] ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> > ...
> > [  198.322072] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> > frozen
> > [  198.322080] ata8.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0
> > [  198.322081]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4
> > (timeout)
> > [  198.322083] ata8.00: status: { DRDY }
> > [  198.322088] ata8: hard resetting link
> > [  198.916035] ata8: softreset failed (device not ready)
> > [  198.916039] ata8: failed due to HW bug, retry pmp=0
> > [  199.080024] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > [  199.092059] ata8.00: configured for UDMA/133
> > [  199.092072] ata8: EH complete
> > [  227.900583] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> > frozen
> > [  227.900590] ata8.00: cmd b0/d8:00:01:4f:c2/00:00:00:00:00/00 tag 0
> > [  227.900591]          res 40/00:00:af:88:e0/00:00:e8:00:00/e0 Emask 0x4
> > (timeout)
> > [  227.900594] ata8.00: status: { DRDY }
> > [  227.900598] ata8: hard resetting link
> > [  228.384016] ata8: softreset failed (device not ready)
> > [  228.384020] ata8: failed due to HW bug, retry pmp=0
> > [  228.548024] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > [  228.560372] ata8.00: configured for UDMA/133
> > [  228.560385] ata8: EH complete
> > [  238.805198] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> > frozen
> > [  238.805218] ata8.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
> > [  238.805221]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4
> > (timeout)
> > [  238.805229] ata8.00: status: { DRDY }
> > [  238.805241] ata8: hard resetting link
> > [  239.404154] ata8: softreset failed (device not ready)
> > [  239.404163] ata8: failed due to HW bug, retry pmp=0
> > [  239.568186] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > [  239.580343] ata8.00: configured for UDMA/133
> > [  239.580374] ata8: EH complete
> > [  246.808086] ata8.00: NCQ disabled due to excessive errors
> > [  246.808099] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> > frozen
> > [  246.808115] ata8.00: cmd b0/d8:00:01:4f:c2/00:00:00:00:00/00 tag 0
> > [  246.808119]          res 40/00:00:af:88:e0/00:00:e8:00:00/e0 Emask 0x4
> > (timeout)
> > [  246.808126] ata8.00: status: { DRDY }
> > [  246.808138] ata8: hard resetting link
> > [  247.292158] ata8: softreset failed (device not ready)
> > [  247.292167] ata8: failed due to HW bug, retry pmp=0
> > [  247.456174] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > [  247.468955] ata8.00: configured for UDMA/133
> > [  247.468984] ata8: EH complete
> > [  272.804207] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> > frozen
> > [  272.804227] ata8.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0
> > [  272.804231]          res 40/00:00:af:88:e0/00:00:e8:00:00/e0 Emask 0x4
> > (timeout)
> > [  272.804238] ata8.00: status: { DRDY }
> > [  272.804250] ata8: hard resetting link
> > [  273.292161] ata8: softreset failed (device not ready)
> > [  273.292169] ata8: failed due to HW bug, retry pmp=0
> > [  273.456173] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > [  273.468892] ata8.00: configured for UDMA/133
> > [  273.468916] ata8: EH complete
> >
> > On boot, it seemed to hang the disk up for a good few minutes, even
> > though nothing is using it at the moment (I have to manually bring up the
> > mdraid0 array, so it can't possibly be mounted), and smartctl was
> > erroring out for a while, but now its fine, and smart shows no issues.
> >
> > I'm going to try without noapic on 2.6.30, and 2.6.31-rc5 and see what
> > happens.
>
> back with 2.6.30 apic enabled, all the traces are gone, but I still get the
> SATA errors, and a new message: ata8: SError: { HostInt }
>
> [  415.781659] ata8.00: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x6
> frozen
> [  415.781672] ata8: SError: { HostInt }
> [  415.781687] ata8.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
> [  415.781690]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x44
> (timeout)
> [  415.781697] ata8.00: status: { DRDY }
> [  415.781708] ata8: hard resetting link
> [  416.264190] ata8: softreset failed (device not ready)
> [  416.264199] ata8: failed due to HW bug, retry pmp=0
> [  416.428196] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [  416.440517] ata8.00: configured for UDMA/133
> [  416.440544] ata8: EH complete
> [  424.781778] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [  424.781798] ata8.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
> [  424.781801]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4
> (timeout)
> [  424.781809] ata8.00: status: { DRDY }
> [  424.781820] ata8: hard resetting link
> [  425.265677] ata8: softreset failed (device not ready)
> [  425.265686] ata8: failed due to HW bug, retry pmp=0
> [  425.429546] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [  425.442153] ata8.00: configured for UDMA/133
> [  425.442179] ata8: EH complete
> [  458.482002] CE: hpet increasing min_delta_ns to 15000 nsec
> [  499.780213] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [  499.780233] ata8.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
> [  499.780237]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4
> (timeout)
> [  499.780244] ata8.00: status: { DRDY }
> [  499.780256] ata8: hard resetting link
> [  500.320191] ata8: softreset failed (device not ready)
> [  500.320200] ata8: failed due to HW bug, retry pmp=0
> [  500.485084] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [  500.497235] ata8.00: configured for UDMA/133
> [  500.497256] ata8: EH complete
>
> And now with 2.6.31-rc5, instant ata exceptions, same as before (just no
> SError line this time).

I think I have an answer finally. Bad drive. I've plugged a new sata 
controller in, and I get the following from it:

[ 1581.172316] ata15: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 
0xb/00/00
[ 1581.172325] ata15: status=0x01 { Error }
[ 1581.172333] ata15: error=0x04 { DriveStatusError }

I assume that means the drive is at fault.

(one system lock up later...)

shortly after I wrote the above this appeared in dmesg:

Sep  3 16:33:59 natasha kernel: [ 1679.334788] ------------[ cut here 
]------------
Sep  3 16:33:59 natasha kernel: [ 1679.334864] WARNING: at drivers/ata/libata-
core.c:4910 __ata_qc_complete+0x5a/0xe1 [libata]()
Sep  3 16:33:59 natasha kernel: [ 1679.334870] Hardware name: GA-MA790FXT-UD5P
Sep  3 16:33:59 natasha kernel: [ 1679.334874] Modules linked in: fuse raid0 
kvm_amd kvm powernow_k8 cpufreq_conservative cpufreq_stats cpufreq_userspace 
cpufreq_powersave nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc 
bridge stp ext4 jbd2 crc16 md_mod it87 hwmon_vid adt7473 firewire_sbp2 loop 
snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss 
snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event 
amd64_edac_mod snd_seq edac_core snd_timer snd_seq_device evdev snd soundcore 
i2c_piix4 snd_page_alloc i2c_core parport_pc parport wmi processor button ext3 
jbd mbcache dm_mod usbhid hid sd_mod ata_generic pata_jmicron crc_t10dif 
atiixp firewire_ohci mvsas firewire_core crc_itu_t libsas scsi_transport_sas 
ohci_hcd ide_pci_generic ehci_hcd ide_core ahci libata scsi_mod r8169 mii 
floppy thermal fan thermal_sys [last unloaded: scsi_wait_scan]
Sep  3 16:33:59 natasha kernel: [ 1679.334995] Pid: 0, comm: swapper Tainted: 
G        W  2.6.31-rc5 #1
Sep  3 16:33:59 natasha kernel: [ 1679.335001] Call Trace:
Sep  3 16:33:59 natasha kernel: [ 1679.335006]  <IRQ>  [<ffffffffa006a2ca>] ? 
__ata_qc_complete+0x5a/0xe1 [libata]
Sep  3 16:33:59 natasha kernel: [ 1679.335097]  [<ffffffffa006a2ca>] ? 
__ata_qc_complete+0x5a/0xe1 [libata]
Sep  3 16:33:59 natasha kernel: [ 1679.335109]  [<ffffffff8104a57b>] ? 
warn_slowpath_common+0x77/0xa3
Sep  3 16:33:59 natasha kernel: [ 1679.335154]  [<ffffffffa006a2ca>] ? 
__ata_qc_complete+0x5a/0xe1 [libata]
Sep  3 16:33:59 natasha kernel: [ 1679.335178]  [<ffffffffa00ed9d3>] ? 
sas_ata_task_done+0x178/0x210 [libsas]
Sep  3 16:33:59 natasha kernel: [ 1679.335199]  [<ffffffffa01042ce>] ? 
mvs_slot_complete+0x3df/0x41b [mvsas]
Sep  3 16:33:59 natasha kernel: [ 1679.335209]  [<ffffffff810610fa>] ? 
ktime_get+0xc/0x41
Sep  3 16:33:59 natasha kernel: [ 1679.335218]  [<ffffffff81023a86>] ? 
lapic_next_event+0x18/0x1d
Sep  3 16:33:59 natasha kernel: [ 1679.335236]  [<ffffffffa010439c>] ? 
mvs_int_rx+0x92/0x101 [mvsas]
Sep  3 16:33:59 natasha kernel: [ 1679.335254]  [<ffffffffa01072b6>] ? 
mvs_int_full+0x25/0x88 [mvsas]
Sep  3 16:33:59 natasha kernel: [ 1679.335272]  [<ffffffffa010734a>] ? 
mvs_64xx_isr+0x31/0x40 [mvsas]
Sep  3 16:33:59 natasha kernel: [ 1679.335289]  [<ffffffffa01030e5>] ? 
mvs_interrupt+0x61/0x78 [mvsas]
Sep  3 16:33:59 natasha kernel: [ 1679.335298]  [<ffffffff81089ea8>] ? 
handle_IRQ_event+0x58/0x135
Sep  3 16:33:59 natasha kernel: [ 1679.335306]  [<ffffffff8108b595>] ? 
handle_fasteoi_irq+0x7d/0xb5
Sep  3 16:33:59 natasha kernel: [ 1679.335314]  [<ffffffff8101388d>] ? 
handle_irq+0x17/0x1d
Sep  3 16:33:59 natasha kernel: [ 1679.335321]  [<ffffffff81012ee5>] ? 
do_IRQ+0x57/0xb6
Sep  3 16:33:59 natasha kernel: [ 1679.335329]  [<ffffffff81011413>] ? 
ret_from_intr+0x0/0x11
Sep  3 16:33:59 natasha kernel: [ 1679.335333]  <EOI>  [<ffffffff8102b518>] ? 
native_safe_halt+0x2/0x3
Sep  3 16:33:59 natasha kernel: [ 1679.335349]  [<ffffffff81017c3d>] ? 
default_idle+0x40/0x68
Sep  3 16:33:59 natasha kernel: [ 1679.335357]  [<ffffffff8106790c>] ? 
clockevents_notify+0x2b/0x75
Sep  3 16:33:59 natasha kernel: [ 1679.335365]  [<ffffffff8101804c>] ? 
c1e_idle+0xe5/0x10d
Sep  3 16:33:59 natasha kernel: [ 1679.335374]  [<ffffffff8100fd9b>] ? 
cpu_idle+0x50/0x91
Sep  3 16:33:59 natasha kernel: [ 1679.335380] ---[ end trace 56631d4c7d46fc1b 
]---
Sep  3 16:34:00 natasha kernel: [ 1679.792256] CPU 2 
Sep  3 16:34:00 natasha kernel: [ 1679.792260] Modules linked in: fuse raid0 
kvm_amd kvm powernow_k8 cpufreq_conservative cpufreq_stats cpufreq_userspace 
cpufreq_powersave nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc 
bridge stp ext4 jbd2 crc16 md_mod it87 hwmon_vid adt7473 firewire_sbp2 loop 
snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss 
snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event 
amd64_edac_mod snd_seq edac_core snd_timer snd_seq_device evdev snd soundcore 
i2c_piix4 snd_page_alloc i2c_core parport_pc parport wmi processor button ext3 
jbd mbcache dm_mod usbhid hid sd_mod ata_generic pata_jmicron crc_t10dif 
atiixp firewire_ohci mvsas firewire_core crc_itu_t libsas scsi_transport_sas 
ohci_hcd ide_pci_generic ehci_hcd ide_core ahci libata scsi_mod r8169 mii 
floppy thermal fan thermal_sys [last unloaded: scsi_wait_scan]
Sep  3 16:34:00 natasha kernel: [ 1679.792384] Pid: 17, comm: events/2 
Tainted: G        W  2.6.31-rc5 #1 GA-MA790FXT-UD5P
Sep  3 16:34:00 natasha kernel: [ 1679.792390] RIP: 0010:[<ffffffff810d23c3>]  
[<ffffffff810d23c3>] free_block+0x59/0x11e
Sep  3 16:34:00 natasha kernel: [ 1679.792407] RSP: 0018:ffff88012fb97d90  
EFLAGS: 00010046
Sep  3 16:34:00 natasha kernel: [ 1679.792413] RAX: ffffea0001be4700 RBX: 
ffff88007f8201c0 RCX: 0000000000000000
Sep  3 16:34:00 natasha kernel: [ 1679.792419] RDX: ffffea0000000000 RSI: 
ffff88012b96ca18 RDI: ffff88007f8201c0
Sep  3 16:34:00 natasha kernel: [ 1679.792425] RBP: ffff88012ad7c6c0 R08: 
0000000000000000 R09: 00000000000000fa
Sep  3 16:34:00 natasha kernel: [ 1679.792430] R10: 0000000000000000 R11: 
ffff88002806bf20 R12: 0000000000000000
Sep  3 16:34:00 natasha kernel: [ 1679.792435] R13: ffff88012b96ca18 R14: 
000000000000020c R15: 0000000000000001
Sep  3 16:34:00 natasha kernel: [ 1679.792443] FS:  00007fe13b44e750(0000) 
GS:ffff88002805c000(0000) knlGS:00000000f65ffbb0
Sep  3 16:34:00 natasha kernel: [ 1679.792449] CS:  0010 DS: 0018 ES: 0018 
CR0: 000000008005003b
Sep  3 16:34:00 natasha kernel: [ 1679.792454] CR2: 00007f5d72294000 CR3: 
00000000cb536000 CR4: 00000000000006e0
Sep  3 16:34:00 natasha kernel: [ 1679.792460] DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000
Sep  3 16:34:00 natasha kernel: [ 1679.792465] DR3: 0000000000000000 DR6: 
00000000ffff0ff0 DR7: 0000000000000400
Sep  3 16:34:00 natasha kernel: [ 1679.792472] Process events/2 (pid: 17, 
threadinfo ffff88012fb96000, task ffff88012fb740c0)
Sep  3 16:34:00 natasha kernel: [ 1679.792480]  ffff88012e1ed500 
ffff88012b96ca18 0000000000000001 ffff88012b96ca00
Sep  3 16:34:00 natasha kernel: [ 1679.792488] <0> ffff88012b9bb4c0 
0000000000000000 ffff88012ad7c6c0 ffffffff810d25b2
Sep  3 16:34:00 natasha kernel: [ 1679.792498] <0> ffffffffa03d6840 
0000000000011ab0 ffff88012b9bb4c0 ffff88012ad7c6c0
Sep  3 16:34:00 natasha kernel: [ 1679.792517]  [<ffffffff810d25b2>] ? 
drain_array+0x89/0xc1
Sep  3 16:34:00 natasha kernel: [ 1679.792526]  [<ffffffff810d299c>] ? 
cache_reap+0xde/0x212
Sep  3 16:34:00 natasha kernel: [ 1679.792535]  [<ffffffff810d28be>] ? 
cache_reap+0x0/0x212
Sep  3 16:34:00 natasha kernel: [ 1679.792545]  [<ffffffff8105a3da>] ? 
worker_thread+0x174/0x211
Sep  3 16:34:00 natasha kernel: [ 1679.792553]  [<ffffffff8105e326>] ? 
autoremove_wake_function+0x0/0x2e
Sep  3 16:34:00 natasha kernel: [ 1679.792561]  [<ffffffff8105a266>] ? 
worker_thread+0x0/0x211
Sep  3 16:34:00 natasha kernel: [ 1679.792568]  [<ffffffff8105dfd3>] ? 
kthread+0x84/0x8c
Sep  3 16:34:00 natasha kernel: [ 1679.792577]  [<ffffffff81011aea>] ? 
child_rip+0xa/0x20
Sep  3 16:34:00 natasha kernel: [ 1679.792587]  [<ffffffff81027574>] ? 
flat_send_IPI_mask+0x0/0x5
Sep  3 16:34:00 natasha kernel: [ 1679.792594]  [<ffffffff8105df4f>] ? 
kthread+0x0/0x8c
Sep  3 16:34:00 natasha kernel: [ 1679.792600]  [<ffffffff81011ae0>] ? 
child_rip+0x0/0x20
Sep  3 16:34:00 natasha kernel: [ 1679.792675]  RSP <ffff88012fb97d90>
Sep  3 16:34:00 natasha kernel: [ 1679.792681] ---[ end trace 56631d4c7d46fc1c 
]---

fun times. It seems as though the mvsas driver is a little less robust than 
the ahci driver.

-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: BIOS update == more errors (was Re: sata exception frozen timeout?)
  2009-09-03 19:38 ` BIOS update == more errors (was Re: sata exception frozen timeout?) Thomas Fjellstrom
  2009-09-03 20:11   ` Thomas Fjellstrom
@ 2009-09-04  3:43   ` Tejun Heo
  1 sibling, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2009-09-04  3:43 UTC (permalink / raw)
  To: tfjellstrom; +Cc: linux-kernel

Hello,

Thomas Fjellstrom wrote:
> I've updated my bios to try and see if it would help at all (it did seem to 
> fix other issues).
> 
> But now I'm getting the following warnings and errors from dmesg on boot: 
> (debian sid 2.6.30, with "noapic" to see if the original problem was an 
> interupt issue, as everyone seems to have hinted at).
> 
> [    1.024337] ------------[ cut here ]------------
> [    1.024408] WARNING: at /build/buildd-linux-2.6_2.6.30-6-amd64-
> s9DPiZ/linux-2.6-2.6.30/debian/build/source_amd64_none/drivers/ata/libata-
> core.c:6174 ata_host_activate+0x47/0xe0 [libata]()
> [    1.024492] Hardware name: GA-MA790FXT-UD5P
> [    1.024546] Modules linked in: crc_itu_t atiixp(+) ide_core pata_jmicron 
> ahci(+) ehci_hcd(+) libata scsi_mod r8169 mii thermal fan thermal_sys
> [    1.025196] Pid: 796, comm: work_for_cpu Not tainted 2.6.30-1-amd64 #1
> [    1.025252] Call Trace:
> [    1.025318]  [<ffffffffa0056101>] ? ata_host_activate+0x47/0xe0 [libata]
> [    1.025385]  [<ffffffffa0056101>] ? ata_host_activate+0x47/0xe0 [libata]
> [    1.025445]  [<ffffffff8024237f>] ? warn_slowpath_common+0x77/0xa3
> [    1.025505]  [<ffffffffa008e598>] ? ahci_interrupt+0x0/0x454 [ahci]
> [    1.025572]  [<ffffffffa0056101>] ? ata_host_activate+0x47/0xe0 [libata]
> [    1.025632]  [<ffffffffa008e578>] ? ahci_init_one+0xbb4/0xbd4 [ahci]
> [    1.025691]  [<ffffffff8025136e>] ? do_work_for_cpu+0x0/0x1b
> [    1.025749]  [<ffffffff8035fa16>] ? local_pci_probe+0x12/0x16
> [    1.025806]  [<ffffffff80251379>] ? do_work_for_cpu+0xb/0x1b
> [    1.025862]  [<ffffffff80254382>] ? kthread+0x54/0x80
> [    1.025918]  [<ffffffff80210aca>] ? child_rip+0xa/0x20
> [    1.025975]  [<ffffffff8025432e>] ? kthread+0x0/0x80
> [    1.026030]  [<ffffffff80210ac0>] ? child_rip+0x0/0x20
> [    1.026085] ---[ end trace 54d3fd405814ad85 ]---

This means that the controller wasn't assigned an irq.  Newer BIOSen
often break IRQ routing on !acpi.

> [   14.872233] ata10.15: qc timeout (cmd 0xe4)
> [   14.872925] ata10.15: failed to read PMP GSCR[0] (Emask=0x4)
> [   14.873256] ata9.15: qc timeout (cmd 0xe4)
> [   14.873344] ata10: limiting SATA link speed to 1.5 Gbps
> [   14.874008] ata9.15: failed to read PMP GSCR[0] (Emask=0x4)
> [   14.874271] ata9: limiting SATA link speed to 1.5 Gbps

So, probing of course can't be done properly.

> [   15.548002] irq 7: nobody cared (try booting with the "irqpoll" option)
> [   15.548121] Pid: 0, comm: swapper Tainted: G        W  2.6.30-1-amd64 #1
> [   15.548240] Call Trace:
> [   15.548346]  <IRQ>  [<ffffffff8027fe0f>] ? __report_bad_irq+0x30/0x7d
> [   15.548569]  [<ffffffff8027ff61>] ? note_interrupt+0x105/0x170
> [   15.548689]  [<ffffffff802805f1>] ? handle_level_irq+0x7c/0xaf
> [   15.548809]  [<ffffffff80212655>] ? handle_irq+0x17/0x1d
> [   15.548929]  [<ffffffff80211e7c>] ? do_IRQ+0x57/0xbf
> [   15.549044]  [<ffffffff80210453>] ? ret_from_intr+0x0/0x11
> [   15.549161]  <EOI>  [<ffffffff8064e140>] ? early_idt_handler+0x0/0x71
> [   15.549379]  [<ffffffff80227520>] ? native_safe_halt+0x2/0x3
> [   15.549497]  [<ffffffff80216995>] ? default_idle+0x40/0x68
> [   15.549612]  [<ffffffff8025d784>] ? clockevents_notify+0x2b/0x75
> [   15.549730]  [<ffffffff80216d48>] ? c1e_idle+0xe5/0x10d
> [   15.549848]  [<ffffffff8020edda>] ? cpu_idle+0x50/0x91
> [   15.549963]  [<ffffffff8064ec62>] ? start_kernel+0x37a/0x386
> [   15.550081]  [<ffffffff8064e3b7>] ? x86_64_start_kernel+0xf9/0x106
> [   15.550199] handlers:
> [   15.550311] [<ffffffff803de867>] (usb_hcd_irq+0x0/0x7e)
> [   15.550630] Disabling IRQ #7
> [   15.636304] irq 11: nobody cared (try booting with the "irqpoll" option)
> [   15.636377] Pid: 0, comm: swapper Tainted: G        W  2.6.30-1-amd64 #1
> [   15.636434] Call Trace:
> [   15.636486]  <IRQ>  [<ffffffff8027fe0f>] ? __report_bad_irq+0x30/0x7d
> [   15.636586]  [<ffffffff8027ff61>] ? note_interrupt+0x105/0x170
> [   15.636643]  [<ffffffff802805f1>] ? handle_level_irq+0x7c/0xaf
> [   15.636699]  [<ffffffff80212655>] ? handle_irq+0x17/0x1d
> [   15.636755]  [<ffffffff80211e7c>] ? do_IRQ+0x57/0xbf
> [   15.636811]  [<ffffffff80210453>] ? ret_from_intr+0x0/0x11
> [   15.636866]  <EOI>  [<ffffffff8064e140>] ? early_idt_handler+0x0/0x71
> [   15.636966]  [<ffffffff80227520>] ? native_safe_halt+0x2/0x3
> [   15.637022]  [<ffffffff80216995>] ? default_idle+0x40/0x68
> [   15.637078]  [<ffffffff8025d784>] ? clockevents_notify+0x2b/0x75
> [   15.637135]  [<ffffffff80216d48>] ? c1e_idle+0xe5/0x10d
> [   15.637191]  [<ffffffff8020edda>] ? cpu_idle+0x50/0x91
> [   15.637247]  [<ffffffff8064ec62>] ? start_kernel+0x37a/0x386
> [   15.637304]  [<ffffffff8064e3b7>] ? x86_64_start_kernel+0xf9/0x106
> [   15.637359] handlers:
> [   15.637411] [<ffffffff803de867>] (usb_hcd_irq+0x0/0x7e)
> [   15.637549] [<ffffffff803de867>] (usb_hcd_irq+0x0/0x7e)
> [   15.637687] [<ffffffff803de867>] (usb_hcd_irq+0x0/0x7e)
> [   15.637825] Disabling IRQ #11

And two nobody cared.  As the kernel doesn't have much idea how the
IRQs are being routed.  Hell breaking loose isn't much of a surprise.

> [  198.322072] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
> frozen
> [  198.322080] ata8.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0
> [  198.322081]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 
> (timeout)
> [  198.322083] ata8.00: status: { DRDY }
> [  198.322088] ata8: hard resetting link
> [  198.916035] ata8: softreset failed (device not ready)
> [  198.916039] ata8: failed due to HW bug, retry pmp=0
> [  199.080024] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [  199.092059] ata8.00: configured for UDMA/133
> [  199.092072] ata8: EH complete
> [  227.900583] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
> frozen
> [  227.900590] ata8.00: cmd b0/d8:00:01:4f:c2/00:00:00:00:00/00 tag 0
> [  227.900591]          res 40/00:00:af:88:e0/00:00:e8:00:00/e0 Emask 0x4 
> (timeout)

Does disabling smartd make any difference?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-09-04  3:43 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-01  9:16 sata exception frozen timeout? Thomas Fjellstrom
2009-09-01 10:28 ` Thomas Fjellstrom
2009-09-01 20:34   ` Thomas Fjellstrom
2009-09-02  4:32 ` Robert Hancock
2009-09-02  6:42   ` Thomas Fjellstrom
2009-09-03 19:38 ` BIOS update == more errors (was Re: sata exception frozen timeout?) Thomas Fjellstrom
2009-09-03 20:11   ` Thomas Fjellstrom
2009-09-03 22:54     ` Thomas Fjellstrom
2009-09-04  3:43   ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.