* 2.6.34 PDC20268 PATA IO error loop makes system unusable
@ 2010-06-13 15:48 Andi Kleen
2010-06-14 7:43 ` Tejun Heo
0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2010-06-13 15:48 UTC (permalink / raw)
To: linux-ide
Hi,
On 2.6.34:
While writing some data to an old PATA Maxtor disk connected to a PDC20268 Promise
controller using the libata driver there were some IO errors.
After some time those resulted in a endless error message loop that made the system essentially
unusable: (console was flooded and unusable, ssh was extremly slow etc.):
This does not exactly look like graceful error handling.
Excerpts from the log (full version available on request)
ata12.00: configured for UDMA/100
ata12: EH complete
ata12.00: configured for UDMA/100
ata12: EH complete
ata12.00: configured for UDMA/100
ata12: EH complete
ata12.00: configured for UDMA/100
ata12: EH complete
ata12.00: configured for UDMA/100
ata12: EH complete
ata12.00: configured for UDMA/100
sd 11:0:0:0: [sdd] Unhandled sense code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00
00 5b 23 bb
sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 4f 00 00 80 00
ata12: EH complete
ata12.00: limiting speed to UDMA/66:PIO4
ata12: soft resetting link
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
sd 11:0:0:0: [sdd] Unhandled sense code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00
00 5b 23 bb
sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
quiet_error: 10 callbacks suppressed
ata12: EH complete
EXT4-fs (dm-0): mounted filesystem with ordered data mode
kjournald starting. Commit interval 5 seconds
EXT3-fs (dm-1): using internal journal
EXT3-fs (dm-1): mounted filesystem with writeback data mode
EXT4-fs (dm-2): mounted filesystem with ordered data mode
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
ata12: EH complete
ata12.00: configured for UDMA/66
...
sd 11:0:0:0: [sdd] Unhandled sense code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00
00 5b 23 bb
sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
quiet_error: 10 callbacks suppressed
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
sd 11:0:0:0: [sdd] Unhandled sense code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00
00 5b 23 bb
sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
ata12: EH complete
ata12.00: configured for UDMA/33
... lots of similar messages until it goes down to PIO0 then some more errors ....
ata12: soft resetting link
ata12: soft resetting link
ata12: link is slow to respond, please be patient (ready=0)
ata12.00: qc timeout (cmd 0xec)
ata12.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata12: link is slow to respond, please be patient (ready=0)
ata12: soft resetting link
ata12.00: disabled
ata12: soft resetting link
ata12: EH complete
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e7 08 00 01 00 00
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
lost page write due to I/O error on sdd
sd 11:0:0:0: [sdd] Unhandled error code
and finally and endless flood of
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e8 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e9 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ea 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 eb 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ec 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ed 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ee 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ef 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f0 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f1 08 00 01 00 00
sd 11:0:0:0: [sdd] Unhandled error code
sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f2 08 00 01 00 00
same messages repeating forever, just with CDB changing occasionally.
....
not stopping until I reset the box.
-Andi
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
2010-06-13 15:48 2.6.34 PDC20268 PATA IO error loop makes system unusable Andi Kleen
@ 2010-06-14 7:43 ` Tejun Heo
2010-06-14 7:53 ` Andi Kleen
2010-06-14 11:17 ` Alan Cox
0 siblings, 2 replies; 8+ messages in thread
From: Tejun Heo @ 2010-06-14 7:43 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-ide, linux-scsi
Hello, Andi.
(cc'ing linux-scsi and quoting whole message)
On 06/13/2010 05:48 PM, Andi Kleen wrote:
>
> Hi,
>
> On 2.6.34:
>
> While writing some data to an old PATA Maxtor disk connected to a
> PDC20268 Promise controller using the libata driver there were some
> IO errors.
>
> After some time those resulted in a endless error message loop that
> made the system essentially unusable: (console was flooded and
> unusable, ssh was extremly slow etc.):
>
> This does not exactly look like graceful error handling.
>
> Excerpts from the log (full version available on request)
>
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> ata12: EH complete
> ata12.00: configured for UDMA/100
> sd 11:0:0:0: [sdd] Unhandled sense code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
> 72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00
> 00 5b 23 bb
> sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 4f 00 00 80 00
> ata12: EH complete
> ata12.00: limiting speed to UDMA/66:PIO4
> ata12: soft resetting link
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> sd 11:0:0:0: [sdd] Unhandled sense code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
> 72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00
> 00 5b 23 bb
> sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
> quiet_error: 10 callbacks suppressed
> ata12: EH complete
> EXT4-fs (dm-0): mounted filesystem with ordered data mode
> kjournald starting. Commit interval 5 seconds
> EXT3-fs (dm-1): using internal journal
> EXT3-fs (dm-1): mounted filesystem with writeback data mode
> EXT4-fs (dm-2): mounted filesystem with ordered data mode
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
> ata12: EH complete
> ata12.00: configured for UDMA/66
>
> ...
>
> sd 11:0:0:0: [sdd] Unhandled sense code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
> 72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00
> 00 5b 23 bb
> sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
> quiet_error: 10 callbacks suppressed
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> sd 11:0:0:0: [sdd] Unhandled sense code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
> 72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00
> 00 5b 23 bb
> sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 bb 00 00 04 00
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
> ata12: EH complete
> ata12.00: configured for UDMA/33
>
>
> ... lots of similar messages until it goes down to PIO0 then some more errors ....
>
>
> ata12: soft resetting link
> ata12: soft resetting link
> ata12: link is slow to respond, please be patient (ready=0)
> ata12.00: qc timeout (cmd 0xec)
> ata12.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata12: link is slow to respond, please be patient (ready=0)
> ata12: soft resetting link
> ata12.00: disabled
> ata12: soft resetting link
> ata12: EH complete
At this point, the drive stopped responding and libata removed the
drive from the system.
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e7 08 00 01 00 00
As the device is gone, any command is immediately failed with
DID_BAD_TARGET.
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> lost page write due to I/O error on sdd
> sd 11:0:0:0: [sdd] Unhandled error code
>
> and finally and endless flood of
>
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e8 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 e9 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ea 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 eb 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ec 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ed 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ee 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 ef 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f0 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f1 08 00 01 00 00
> sd 11:0:0:0: [sdd] Unhandled error code
> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f2 08 00 01 00 00
>
> same messages repeating forever, just with CDB changing occasionally.
>
> ....
>
> not stopping until I reset the box.
Did you have a lot of dirty pages? It looks like upper layer is
trying to flush all the dirty buffers and SCSI is a tad bit too
verbose about failing each IO w/ DID_BAD_TARGET thus taking a very
long time if there are many to fail.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
2010-06-14 7:43 ` Tejun Heo
@ 2010-06-14 7:53 ` Andi Kleen
2010-06-14 7:59 ` Tejun Heo
2010-06-14 11:17 ` Alan Cox
1 sibling, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2010-06-14 7:53 UTC (permalink / raw)
To: Tejun Heo; +Cc: Andi Kleen, linux-ide, linux-scsi
On Mon, Jun 14, 2010 at 09:43:28AM +0200, Tejun Heo wrote:
> > sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> > sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f0 08 00 01 00 00
> > sd 11:0:0:0: [sdd] Unhandled error code
> > sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> > sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f1 08 00 01 00 00
> > sd 11:0:0:0: [sdd] Unhandled error code
> > sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> > sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f2 08 00 01 00 00
> >
> > same messages repeating forever, just with CDB changing occasionally.
> >
> > ....
> >
> > not stopping until I reset the box.
>
> Did you have a lot of dirty pages? It looks like upper layer is
Yes, there was a dd running.
> trying to flush all the dirty buffers and SCSI is a tad bit too
> verbose about failing each IO w/ DID_BAD_TARGET thus taking a very
A bit too verbose? That's really an euphemism ...
During the CDB: Write loop the console was totally unusable!
And I think the fsyncs in syslogd completely made the performance
tank.
So basically it was a "reset button only" situation.
When the device is gone what's the point in giving a message
more than once? Can't the requests just be silently failed in this
case?
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
2010-06-14 7:53 ` Andi Kleen
@ 2010-06-14 7:59 ` Tejun Heo
2010-06-14 9:22 ` Andi Kleen
2010-06-14 16:20 ` James Bottomley
0 siblings, 2 replies; 8+ messages in thread
From: Tejun Heo @ 2010-06-14 7:59 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-ide, linux-scsi
Hello,
On 06/14/2010 09:53 AM, Andi Kleen wrote:
> On Mon, Jun 14, 2010 at 09:43:28AM +0200, Tejun Heo wrote:
>>> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>>> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f0 08 00 01 00 00
>>> sd 11:0:0:0: [sdd] Unhandled error code
>>> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>>> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f1 08 00 01 00 00
>>> sd 11:0:0:0: [sdd] Unhandled error code
>>> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>>> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f2 08 00 01 00 00
>>>
>>> same messages repeating forever, just with CDB changing occasionally.
>>>
>>> ....
>>>
>>> not stopping until I reset the box.
>>
>> Did you have a lot of dirty pages? It looks like upper layer is
>
> Yes, there was a dd running.
>
>> trying to flush all the dirty buffers and SCSI is a tad bit too
>> verbose about failing each IO w/ DID_BAD_TARGET thus taking a very
>
> A bit too verbose? That's really an euphemism ...
Yeap, of course it was. :-)
> During the CDB: Write loop the console was totally unusable!
>
> And I think the fsyncs in syslogd completely made the performance
> tank.
Console often becomes the bottleneck too when there are a lot of
kernel messages.
> So basically it was a "reset button only" situation.
>
> When the device is gone what's the point in giving a message
> more than once? Can't the requests just be silently failed in this
> case?
Yeah, it would be better to somehow summarize those error message
instead of spitting out all of them.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
2010-06-14 7:59 ` Tejun Heo
@ 2010-06-14 9:22 ` Andi Kleen
2010-06-14 16:20 ` James Bottomley
1 sibling, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2010-06-14 9:22 UTC (permalink / raw)
To: Tejun Heo; +Cc: Andi Kleen, linux-ide, linux-scsi
> > So basically it was a "reset button only" situation.
> >
> > When the device is gone what's the point in giving a message
> > more than once? Can't the requests just be silently failed in this
> > case?
>
> Yeah, it would be better to somehow summarize those error message
> instead of spitting out all of them.
Hmm I think I have some ideas on that, I'll take a look.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
2010-06-14 7:43 ` Tejun Heo
2010-06-14 7:53 ` Andi Kleen
@ 2010-06-14 11:17 ` Alan Cox
2010-06-14 11:29 ` Andi Kleen
1 sibling, 1 reply; 8+ messages in thread
From: Alan Cox @ 2010-06-14 11:17 UTC (permalink / raw)
To: Tejun Heo; +Cc: Andi Kleen, linux-ide, linux-scsi
> > sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
That is the drive complaining about a bad block.
> > Descriptor sense data with sense descriptors (in hex):
> > 72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00
> > 00 5b 23 bb
> > sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> > sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 4f 00 00 80 00
Which in my experience usually comes from
- The disc being unable to find the data block
- A corrupt indirect block in the file system leading to a requst for an
out of range data block
In theory it could be an on the wire bitflip but thats a bit too
consistent to believe.
Alan
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
2010-06-14 11:17 ` Alan Cox
@ 2010-06-14 11:29 ` Andi Kleen
0 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2010-06-14 11:29 UTC (permalink / raw)
To: Alan Cox; +Cc: Tejun Heo, Andi Kleen, linux-ide, linux-scsi
On Mon, Jun 14, 2010 at 12:17:53PM +0100, Alan Cox wrote:
> > > sd 11:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > sd 11:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
>
> That is the drive complaining about a bad block.
>
> > > Descriptor sense data with sense descriptors (in hex):
> > > 72 03 13 00 00 00 00 0c 00 0a 80 00 00 00 00 00
> > > 00 5b 23 bb
> > > sd 11:0:0:0: [sdd] Add. Sense: Address mark not found for data field
> > > sd 11:0:0:0: [sdd] CDB: Read(10): 28 00 00 5b 23 4f 00 00 80 00
>
> Which in my experience usually comes from
>
> - The disc being unable to find the data block
>
> - A corrupt indirect block in the file system leading to a requst for an
> out of range data block
>
> In theory it could be an on the wire bitflip but thats a bit too
> consistent to believe.
Thank, Alan. I really don't care about the drive (nothing valuable on it).
It was an old IDE disk and I am not surprised it has some bad sectors.
In fact I was clearing it for junking it (although I may keep it now
just in case I need to test IDE error handling again :)
But I cared about the system becoming unusable due to error handling running
amok.
That was the point of this bug report.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
2010-06-14 7:59 ` Tejun Heo
2010-06-14 9:22 ` Andi Kleen
@ 2010-06-14 16:20 ` James Bottomley
1 sibling, 0 replies; 8+ messages in thread
From: James Bottomley @ 2010-06-14 16:20 UTC (permalink / raw)
To: Tejun Heo; +Cc: Andi Kleen, linux-ide, linux-scsi
On Mon, 2010-06-14 at 09:59 +0200, Tejun Heo wrote:
> Hello,
>
> On 06/14/2010 09:53 AM, Andi Kleen wrote:
> > On Mon, Jun 14, 2010 at 09:43:28AM +0200, Tejun Heo wrote:
> >>> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> >>> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f0 08 00 01 00 00
> >>> sd 11:0:0:0: [sdd] Unhandled error code
> >>> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> >>> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f1 08 00 01 00 00
> >>> sd 11:0:0:0: [sdd] Unhandled error code
> >>> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> >>> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f2 08 00 01 00 00
> >>>
> >>> same messages repeating forever, just with CDB changing occasionally.
> >>>
> >>> ....
> >>>
> >>> not stopping until I reset the box.
> >>
> >> Did you have a lot of dirty pages? It looks like upper layer is
> >
> > Yes, there was a dd running.
> >
> >> trying to flush all the dirty buffers and SCSI is a tad bit too
> >> verbose about failing each IO w/ DID_BAD_TARGET thus taking a very
> >
> > A bit too verbose? That's really an euphemism ...
>
> Yeap, of course it was. :-)
>
> > During the CDB: Write loop the console was totally unusable!
> >
> > And I think the fsyncs in syslogd completely made the performance
> > tank.
>
> Console often becomes the bottleneck too when there are a lot of
> kernel messages.
>
> > So basically it was a "reset button only" situation.
> >
> > When the device is gone what's the point in giving a message
> > more than once? Can't the requests just be silently failed in this
> > case?
>
> Yeah, it would be better to somehow summarize those error message
> instead of spitting out all of them.
I don't think we can summarize. However, when things start to go wrong,
it's usually only the first set of errors that are significant, so we
could do a simple ratelimit.
James
---
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1646fe7..c8c7483 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -896,7 +896,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
case ACTION_FAIL:
/* Give up and fail the remainder of the request */
scsi_release_buffers(cmd);
- if (!(req->cmd_flags & REQ_QUIET)) {
+ if (!(req->cmd_flags & REQ_QUIET) && printk_ratelimit()) {
if (description)
scmd_printk(KERN_INFO, cmd, "%s\n",
description);
^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2010-06-14 16:20 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-13 15:48 2.6.34 PDC20268 PATA IO error loop makes system unusable Andi Kleen
2010-06-14 7:43 ` Tejun Heo
2010-06-14 7:53 ` Andi Kleen
2010-06-14 7:59 ` Tejun Heo
2010-06-14 9:22 ` Andi Kleen
2010-06-14 16:20 ` James Bottomley
2010-06-14 11:17 ` Alan Cox
2010-06-14 11:29 ` Andi Kleen
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.