All of lore.kernel.org
 help / color / mirror / Atom feed
* 2.6.34 PDC20268 PATA IO error loop makes system unusable
@ 2010-06-13 15:48 Andi Kleen
  2010-06-14  7:43 ` Tejun Heo
  0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2010-06-13 15:48 UTC (permalink / raw)
  To: linux-ide


Hi,

On 2.6.34:

While writing some data to an old PATA Maxtor disk connected to a PDC20268 Promise 
controller using the libata driver there were some IO errors.

After some time those resulted in a endless error message loop that made the system essentially 
unusable: (console was flooded and unusable, ssh was extremly slow etc.):

This does not exactly look like graceful error handling.

Excerpts from the log (full version available on request)

ata12.00: configured for UDMA/100
ata12: EH complete
ata12.00: configured for UDMA/100
ata12: EH complete
ata12.00: configured for UDMA/100
ata12: EH complete
ata12.00: configured for UDMA/100
ata12: EH complete
ata12.00: configured for UDMA/100
ata12: EH complete
ata12.00: configured for UDMA/100
sd 11:0:0:0: [sdd] Unhandled sense code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
       72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
       00 5b 23 bb 
sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 4f 00 00 80 00
ata12: EH complete
ata12.00: limiting speed to UDMA/66:PIO4
ata12: soft resetting link
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
sd 11:0:0:0: [sdd] Unhandled sense code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
       72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
       00 5b 23 bb 
sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
quiet_error: 10 callbacks suppressed
ata12: EH complete
EXT4-fs (dm-0): mounted filesystem with ordered data mode
kjournald starting.  Commit interval 5 seconds
EXT3-fs (dm-1): using internal journal
EXT3-fs (dm-1): mounted filesystem with writeback data mode
EXT4-fs (dm-2): mounted filesystem with ordered data mode
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66

...

sd 11:0:0:0: [sdd] Unhandled sense code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
       72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
       00 5b 23 bb 
sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
quiet_error: 10 callbacks suppressed
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
sd 11:0:0:0: [sdd] Unhandled sense code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
       72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
       00 5b 23 bb 
sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33


... lots of similar messages until it goes down to PIO0 then some more errors ....


ata12: soft resetting link
ata12: soft resetting link
ata12: link is slow to respond, please be patient (ready=0)
ata12.00: qc timeout (cmd 0xec)
ata12.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata12: link is slow to respond, please be patient (ready=0)
ata12: soft resetting link
ata12.00: disabled
ata12: soft resetting link
ata12: EH complete
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e7 08 00 01 00 00
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
sd 11:0:0:0: [sdd] Unhandled error code

and finally and endless flood of 

sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e8 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e9 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ea 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 eb 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ec 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ed 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ee 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ef 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f0 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f1 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f2 08 00 01 00 00

same messages repeating forever, just with CDB changing occasionally.

....

not stopping until I reset the box.


-Andi










^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
  2010-06-13 15:48 2.6.34 PDC20268 PATA IO error loop makes system unusable Andi Kleen
@ 2010-06-14  7:43 ` Tejun Heo
  2010-06-14  7:53   ` Andi Kleen
  2010-06-14 11:17   ` Alan Cox
  0 siblings, 2 replies; 8+ messages in thread
From: Tejun Heo @ 2010-06-14  7:43 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-ide, linux-scsi

Hello, Andi.

(cc'ing linux-scsi and quoting whole message)

On 06/13/2010 05:48 PM, Andi Kleen wrote:
> 
> Hi,
> 
> On 2.6.34:
> 
> While writing some data to an old PATA Maxtor disk connected to a
> PDC20268 Promise controller using the libata driver there were some
> IO errors.
> 
> After some time those resulted in a endless error message loop that
> made the system essentially unusable: (console was flooded and
> unusable, ssh was extremly slow etc.):
> 
> This does not exactly look like graceful error handling.
> 
> Excerpts from the log (full version available on request)
> 
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> sd 11:0:0:0: [sdd] Unhandled sense code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
>        72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
>        00 5b 23 bb 
> sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 4f 00 00 80 00
> ata12: EH complete
> ata12.00: limiting speed to UDMA/66:PIO4
> ata12: soft resetting link
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> sd 11:0:0:0: [sdd] Unhandled sense code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
>        72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
>        00 5b 23 bb 
> sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
> quiet_error: 10 callbacks suppressed
> ata12: EH complete
> EXT4-fs (dm-0): mounted filesystem with ordered data mode
> kjournald starting.  Commit interval 5 seconds
> EXT3-fs (dm-1): using internal journal
> EXT3-fs (dm-1): mounted filesystem with writeback data mode
> EXT4-fs (dm-2): mounted filesystem with ordered data mode
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> 
> ...
> 
> sd 11:0:0:0: [sdd] Unhandled sense code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
>        72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
>        00 5b 23 bb 
> sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
> quiet_error: 10 callbacks suppressed
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> sd 11:0:0:0: [sdd] Unhandled sense code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
>        72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
>        00 5b 23 bb 
> sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> 
> 
> ... lots of similar messages until it goes down to PIO0 then some more errors ....
> 
> 
> ata12: soft resetting link
> ata12: soft resetting link
> ata12: link is slow to respond, please be patient (ready=0)
> ata12.00: qc timeout (cmd 0xec)
> ata12.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata12: link is slow to respond, please be patient (ready=0)
> ata12: soft resetting link
> ata12.00: disabled
> ata12: soft resetting link
> ata12: EH complete

At this point, the drive stopped responding and libata removed the
drive from the system.

> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e7 08 00 01 00 00

As the device is gone, any command is immediately failed with
DID_BAD_TARGET.

> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> sd 11:0:0:0: [sdd] Unhandled error code
> 
> and finally and endless flood of 
> 
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e8 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e9 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ea 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 eb 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ec 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ed 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ee 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ef 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f0 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f1 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f2 08 00 01 00 00
> 
> same messages repeating forever, just with CDB changing occasionally.
> 
> ....
> 
> not stopping until I reset the box.

Did you have a lot of dirty pages?  It looks like upper layer is
trying to flush all the dirty buffers and SCSI is a tad bit too
verbose about failing each IO w/ DID_BAD_TARGET thus taking a very
long time if there are many to fail.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
  2010-06-14  7:43 ` Tejun Heo
@ 2010-06-14  7:53   ` Andi Kleen
  2010-06-14  7:59     ` Tejun Heo
  2010-06-14 11:17   ` Alan Cox
  1 sibling, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2010-06-14  7:53 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Andi Kleen, linux-ide, linux-scsi

On Mon, Jun 14, 2010 at 09:43:28AM +0200, Tejun Heo wrote:
> > sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> > sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f0 08 00 01 00 00
> > sd 11:0:0:0: [sdd] Unhandled error code
> > sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> > sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f1 08 00 01 00 00
> > sd 11:0:0:0: [sdd] Unhandled error code
> > sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> > sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f2 08 00 01 00 00
> > 
> > same messages repeating forever, just with CDB changing occasionally.
> > 
> > ....
> > 
> > not stopping until I reset the box.
> 
> Did you have a lot of dirty pages?  It looks like upper layer is

Yes, there was a dd running.

> trying to flush all the dirty buffers and SCSI is a tad bit too
> verbose about failing each IO w/ DID_BAD_TARGET thus taking a very

A bit too verbose?  That's really an euphemism ...

During the CDB: Write loop the console was totally unusable!

And I think the fsyncs in syslogd completely made the performance
tank.

So basically it was a "reset button only" situation.

When the device is gone what's the point in giving a message 
more than once? Can't the requests just be silently failed in this
case?

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
  2010-06-14  7:53   ` Andi Kleen
@ 2010-06-14  7:59     ` Tejun Heo
  2010-06-14  9:22       ` Andi Kleen
  2010-06-14 16:20       ` James Bottomley
  0 siblings, 2 replies; 8+ messages in thread
From: Tejun Heo @ 2010-06-14  7:59 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-ide, linux-scsi

Hello,

On 06/14/2010 09:53 AM, Andi Kleen wrote:
> On Mon, Jun 14, 2010 at 09:43:28AM +0200, Tejun Heo wrote:
>>> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>>> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f0 08 00 01 00 00
>>> sd 11:0:0:0: [sdd] Unhandled error code
>>> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>>> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f1 08 00 01 00 00
>>> sd 11:0:0:0: [sdd] Unhandled error code
>>> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>>> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f2 08 00 01 00 00
>>>
>>> same messages repeating forever, just with CDB changing occasionally.
>>>
>>> ....
>>>
>>> not stopping until I reset the box.
>>
>> Did you have a lot of dirty pages?  It looks like upper layer is
> 
> Yes, there was a dd running.
> 
>> trying to flush all the dirty buffers and SCSI is a tad bit too
>> verbose about failing each IO w/ DID_BAD_TARGET thus taking a very
> 
> A bit too verbose?  That's really an euphemism ...

Yeap, of course it was. :-)

> During the CDB: Write loop the console was totally unusable!
> 
> And I think the fsyncs in syslogd completely made the performance
> tank.

Console often becomes the bottleneck too when there are a lot of
kernel messages.

> So basically it was a "reset button only" situation.
> 
> When the device is gone what's the point in giving a message 
> more than once? Can't the requests just be silently failed in this
> case?

Yeah, it would be better to somehow summarize those error message
instead of spitting out all of them.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
  2010-06-14  7:59     ` Tejun Heo
@ 2010-06-14  9:22       ` Andi Kleen
  2010-06-14 16:20       ` James Bottomley
  1 sibling, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2010-06-14  9:22 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Andi Kleen, linux-ide, linux-scsi

> > So basically it was a "reset button only" situation.
> > 
> > When the device is gone what's the point in giving a message 
> > more than once? Can't the requests just be silently failed in this
> > case?
> 
> Yeah, it would be better to somehow summarize those error message
> instead of spitting out all of them.

Hmm I think I have some ideas on that, I'll take a look.
-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
  2010-06-14  7:43 ` Tejun Heo
  2010-06-14  7:53   ` Andi Kleen
@ 2010-06-14 11:17   ` Alan Cox
  2010-06-14 11:29     ` Andi Kleen
  1 sibling, 1 reply; 8+ messages in thread
From: Alan Cox @ 2010-06-14 11:17 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Andi Kleen, linux-ide, linux-scsi

> > sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]

That is the drive complaining about a bad block.

> > Descriptor sense data with sense descriptors (in hex):
> >        72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
> >        00 5b 23 bb 
> > sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> > sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 4f 00 00 80 00

Which in my experience usually comes from

- The disc being unable to find the data block

- A corrupt indirect block in the file system leading to a requst for an
  out of range data block

In theory it could be an on the wire bitflip but thats a bit too
consistent to believe.

Alan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
  2010-06-14 11:17   ` Alan Cox
@ 2010-06-14 11:29     ` Andi Kleen
  0 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2010-06-14 11:29 UTC (permalink / raw)
  To: Alan Cox; +Cc: Tejun Heo, Andi Kleen, linux-ide, linux-scsi

On Mon, Jun 14, 2010 at 12:17:53PM +0100, Alan Cox wrote:
> > > sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
> 
> That is the drive complaining about a bad block.
> 
> > > Descriptor sense data with sense descriptors (in hex):
> > >        72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
> > >        00 5b 23 bb 
> > > sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> > > sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 4f 00 00 80 00
> 
> Which in my experience usually comes from
> 
> - The disc being unable to find the data block
> 
> - A corrupt indirect block in the file system leading to a requst for an
>   out of range data block
> 
> In theory it could be an on the wire bitflip but thats a bit too
> consistent to believe.

Thank, Alan. I really don't care about the drive (nothing valuable on it).
It was an old IDE disk and I am not surprised it has some bad sectors.
In fact I was clearing it for junking it (although I may keep it now
just in case I need to test IDE error handling again :)

But I cared about the system becoming unusable due to error handling running
amok.

That was the point of this bug report.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
  2010-06-14  7:59     ` Tejun Heo
  2010-06-14  9:22       ` Andi Kleen
@ 2010-06-14 16:20       ` James Bottomley
  1 sibling, 0 replies; 8+ messages in thread
From: James Bottomley @ 2010-06-14 16:20 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Andi Kleen, linux-ide, linux-scsi

On Mon, 2010-06-14 at 09:59 +0200, Tejun Heo wrote:
> Hello,
> 
> On 06/14/2010 09:53 AM, Andi Kleen wrote:
> > On Mon, Jun 14, 2010 at 09:43:28AM +0200, Tejun Heo wrote:
> >>> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> >>> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f0 08 00 01 00 00
> >>> sd 11:0:0:0: [sdd] Unhandled error code
> >>> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> >>> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f1 08 00 01 00 00
> >>> sd 11:0:0:0: [sdd] Unhandled error code
> >>> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> >>> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f2 08 00 01 00 00
> >>>
> >>> same messages repeating forever, just with CDB changing occasionally.
> >>>
> >>> ....
> >>>
> >>> not stopping until I reset the box.
> >>
> >> Did you have a lot of dirty pages?  It looks like upper layer is
> > 
> > Yes, there was a dd running.
> > 
> >> trying to flush all the dirty buffers and SCSI is a tad bit too
> >> verbose about failing each IO w/ DID_BAD_TARGET thus taking a very
> > 
> > A bit too verbose?  That's really an euphemism ...
> 
> Yeap, of course it was. :-)
> 
> > During the CDB: Write loop the console was totally unusable!
> > 
> > And I think the fsyncs in syslogd completely made the performance
> > tank.
> 
> Console often becomes the bottleneck too when there are a lot of
> kernel messages.
> 
> > So basically it was a "reset button only" situation.
> > 
> > When the device is gone what's the point in giving a message 
> > more than once? Can't the requests just be silently failed in this
> > case?
> 
> Yeah, it would be better to somehow summarize those error message
> instead of spitting out all of them.

I don't think we can summarize.  However, when things start to go wrong,
it's usually only the first set of errors that are significant, so we
could do a simple ratelimit.

James

---

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1646fe7..c8c7483 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -896,7 +896,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	case ACTION_FAIL:
 		/* Give up and fail the remainder of the request */
 		scsi_release_buffers(cmd);
-		if (!(req->cmd_flags & REQ_QUIET)) {
+		if (!(req->cmd_flags & REQ_QUIET) && printk_ratelimit()) {
 			if (description)
 				scmd_printk(KERN_INFO, cmd, "%s\n",
 					    description);



^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2010-06-14 16:20 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-13 15:48 2.6.34 PDC20268 PATA IO error loop makes system unusable Andi Kleen
2010-06-14  7:43 ` Tejun Heo
2010-06-14  7:53   ` Andi Kleen
2010-06-14  7:59     ` Tejun Heo
2010-06-14  9:22       ` Andi Kleen
2010-06-14 16:20       ` James Bottomley
2010-06-14 11:17   ` Alan Cox
2010-06-14 11:29     ` Andi Kleen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.