[SCSI][REGRESSION][BISECTED] Disk errors loop forever in 2.6.29

* [SCSI][REGRESSION][BISECTED] Disk errors loop forever in 2.6.29
@ 2009-02-19 13:53 Sitsofe Wheeler
  2009-02-19 14:58 ` Ingo Molnar
  2009-02-19 15:02 ` Alan Stern
  0 siblings, 2 replies; 8+ messages in thread
From: Sitsofe Wheeler @ 2009-02-19 13:53 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alan Stern, James Bottomley

Hi,

There appears to be a regression from 2.6.28 in how disk errors are
handled in 2.6.29rc5 - rather than trying and eventually giving up, it
appears to try (and report) forever. 

Here is the output where it aborts in 2.6.28:

ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata2.01: BMDMA stat 0x65
ata2.01: cmd c8/00:00:f7:e2:9c/00:00:00:00:00/f1 tag 0 dma 131072 in
         res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
ata2: soft resetting link
ata2.00: configured for UDMA/66
ata2.01: configured for UDMA/66
ata2: EH complete
ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata2.01: BMDMA stat 0x65
ata2.01: cmd c8/00:00:f7:e2:9c/00:00:00:00:00/f1 tag 0 dma 131072 in
         res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
ata2: soft resetting link
ata2.00: configured for UDMA/66
ata2.01: configured for UDMA/66
ata2: EH complete
sd 1:0:0:0: [sda] 7880544 512-byte hardware sectors: (4.03 GB/3.75 GiB)
ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata2.01: BMDMA stat 0x65
ata2.01: cmd c8/00:00:f7:e2:9c/00:00:00:00:00/f1 tag 0 dma 131072 in
         res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
ata2: soft resetting link
ata2.00: configured for UDMA/66
ata2.01: configured for UDMA/66
ata2: EH complete
sd 1:0:0:0: [sda] Write Protect is off
sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00
ata2.01: limiting speed to UDMA/44:PIO4
ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata2.01: BMDMA stat 0x65
ata2.01: cmd c8/00:00:f7:e2:9c/00:00:00:00:00/f1 tag 0 dma 131072 in
         res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
ata2: soft resetting link
ata2.00: configured for UDMA/66
ata2.01: configured for UDMA/44
ata2: EH complete
ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata2.01: BMDMA stat 0x65
ata2.01: cmd c8/00:00:f7:e2:9c/00:00:00:00:00/f1 tag 0 dma 131072 in
         res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
ata2: soft resetting link
ata2.00: configured for UDMA/66
ata2.01: configured for UDMA/44
ata2: EH complete
sd 1:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sda: detected capacity change from 0 to 4034838528
ata2.01: limiting speed to UDMA/33:PIO4
ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata2.01: BMDMA stat 0x65
ata2.01: cmd c8/00:00:f7:e2:9c/00:00:00:00:00/f1 tag 0 dma 131072 in
         res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
ata2: soft resetting link
ata2.00: configured for UDMA/66
ata2.01: configured for UDMA/33
sd 1:0:1:0: [sdb] Result: hostbyte=0x00 driverbyte=0x08
sd 1:0:1:0: [sdb] Sense Key : 0xb [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
        00 00 00 00 
sd 1:0:1:0: [sdb] ASC=0x0 ASCQ=0x0
end_request: I/O error, dev sdb, sector 27058935
Buffer I/O error on device sdb2, logical block 444480
Buffer I/O error on device sdb2, logical block 444481
Buffer I/O error on device sdb2, logical block 444482
Buffer I/O error on device sdb2, logical block 444483
Buffer I/O error on device sdb2, logical block 444484
Buffer I/O error on device sdb2, logical block 444485
Buffer I/O error on device sdb2, logical block 444486
Buffer I/O error on device sdb2, logical block 444487
Buffer I/O error on device sdb2, logical block 444488
Buffer I/O error on device sdb2, logical block 444489
ata2: EH complete

It never gets to end_request on 2.6.29. I've bisected the problem down
to the following:

[b60af5b0adf0da24c673598c8d3fb4d4189a15ce] [SCSI] simplify scsi_io_completion()

Author: Alan Stern <stern@rowland.harvard.edu>
Date:   Mon Nov 3 15:56:47 2008 -0500

    [SCSI] simplify scsi_io_completion()

    This patch (as1142b) consolidates a lot of repetitious code in
    scsi_io_completion().  It also fixes a few comments.  Most
    importantly, however, it clearly distinguishes among the three sorts
    of retries that can be done when a command fails to complete:

        Unprepare the request and resubmit it, so that a new
        command will be created for it.

        Requeue the request directly so that it will be retried
        immediately using the same command.

        Requeue the request so that it will be retried following
        a short delay.

        Complete the remainder of the request with an I/O error.

    [jejb: Updates
	 1. For several error conditions, we would now print the sense twice
            in slightly different ways, so unify the location of sense
            printing.
         2. I added more descriptions to actual failure conditions for
            better debugging
         3. according to spec, ABORTED_COMMAND is supposed to be retried
            (except on DIF failure).  Our old behaviour of erroring it looks
            to be a bug.
         4. I'd prefer not to default initialise the action variable because
            that ensures that every leg of the error handler has an
            associated action and the compiler will warn if someone later
            accidentally misses one or removes one.
    ]
    Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
    Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

Reverting this made the endless looping go away. Here is the bisection
log:

git-bisect start
# bad: [38c8e6180939e5619140b2e9e479cb26029ff8b1] do_mpage_readpage():
# don't submit lots of small bios on boundary
git-bisect bad 38c8e6180939e5619140b2e9e479cb26029ff8b1
# good: [4a6908a3a050aacc9c3a2f36b276b46c0629ad91] Linux 2.6.28
git-bisect good 4a6908a3a050aacc9c3a2f36b276b46c0629ad91
# good: [3c92ec8ae91ecf59d88c798301833d7cf83f2179] Merge branch 'next'
# of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
git-bisect good 3c92ec8ae91ecf59d88c798301833d7cf83f2179
# bad: [590cf28580c999c8ba70dc39b40bab09d69e2630] Merge
# git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
git-bisect bad 590cf28580c999c8ba70dc39b40bab09d69e2630
# good: [14a3c4ab0e58d143c7928c9eb2f2610205e13bf2] Merge branch 'devel'
# of master.kernel.org:/home/rmk/linux-2.6-arm
git-bisect good 14a3c4ab0e58d143c7928c9eb2f2610205e13bf2
# good: [121520a7084b48cb26437c6e89d4b491c3e4d4d5] V4L/DVB (9853):
# gspca: Webcam 093a:2622 added in pac7311.
git-bisect good 121520a7084b48cb26437c6e89d4b491c3e4d4d5
# good: [9414de39e8e07d90bdb6524be501fae0e013d37b] V4L/DVB (10079):
# sh_mobile_ceu: use new pixel format translation code
git-bisect good 9414de39e8e07d90bdb6524be501fae0e013d37b
# bad: [dedbc2b3cb8404c618975bd2811c7605a4ccb51e] [SCSI] zfcp: Simplify
# SBAL allocation to fix sparse warnings
git-bisect bad dedbc2b3cb8404c618975bd2811c7605a4ccb51e
# bad: [ae15f80172d95f978b60d40408353943d5bc099b] [SCSI] libiscsi:
# handle init task failures.
git-bisect bad ae15f80172d95f978b60d40408353943d5bc099b
# good: [dd7c34e9a27be521d480e5b84c84e92de2ea7b95] [SCSI] fusion: use
# ARRAY_SIZE
git-bisect good dd7c34e9a27be521d480e5b84c84e92de2ea7b95
# bad: [b29f841378460c37e99a2398d0015d9bd7901a09] [SCSI] remove timeout
# from struct scsi_device
git-bisect bad b29f841378460c37e99a2398d0015d9bd7901a09
# good: [67e6d58d812ec3e95918266076d4c836cdec849a] [SCSI] ibmvfc: Driver
# version 1.0.4
git-bisect good 67e6d58d812ec3e95918266076d4c836cdec849a
# bad: [09e13e91670b69736b5da0a869a076a55a326394] [SCSI] m68k: mac_esp
# asm fix
git-bisect bad 09e13e91670b69736b5da0a869a076a55a326394
# bad: [749af3d54a6d5619088ecadb1010c6ab7766f519] [SCSI] qla2xxx: Code
# changes for vport bus reset
git-bisect bad 749af3d54a6d5619088ecadb1010c6ab7766f519
# bad: [b60af5b0adf0da24c673598c8d3fb4d4189a15ce] [SCSI] simplify
# scsi_io_completion()
git-bisect bad b60af5b0adf0da24c673598c8d3fb4d4189a15ce

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 8+ messages in thread