* mvsas errors in 2.6.36
@ 2010-10-29 12:50 Thomas Fjellstrom
  2010-10-31 15:11 ` Thomas Fjellstrom
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Fjellstrom @ 2010-10-29 12:50 UTC (permalink / raw)
  To: Linux Kernel List; +Cc: linux-scsi

Good news and bad news: the current mvsas driver in 2.6.36 seems to work
better than in older kernels with my setup (2 port sas + 5 SATA disks). But
I've gotten the following messages so far:

[  213.856050] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123200000 task=ffff880122545a40 slot=ffff880123226628 slot_idx=x3
[  213.856064] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
[  213.856094] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x89800.
[  213.856100] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1001
[  213.856111] drivers/scsi/mvsas/mv_sas.c 2111:phy0 Unplug Notice
[  213.866069] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
[  213.866078] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1081
[  213.887617] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
[  213.887625] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x10000
[  213.887632] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[0]
[  213.991191] drivers/scsi/mvsas/mv_sas.c 1224:port 0 attach dev info is 0
[  213.991191] drivers/scsi/mvsas/mv_sas.c 1226:port 0 attach sas addr is 0
[  213.995701] drivers/scsi/mvsas/mv_sas.c 378:phy 0 byte dmaded.
[  216.064032] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[0]:rc= 0
[  216.064049] ata9: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[  216.064647] ata9.00: device reported invalid CHS sector 0
[  216.065226] ata9: status=0x01 { Error }
[  216.065815] ata9: error=0x04 { DriveStatusError }
[ 1519.840061] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123200000 task=ffff88011f944700 slot=ffff880123226680 slot_idx=x4
[ 1519.840075] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
[ 1519.840107] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x89800.
[ 1519.840113] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x1001
[ 1519.840124] drivers/scsi/mvsas/mv_sas.c 2111:phy3 Unplug Notice
[ 1519.850080] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x199800.
[ 1519.850086] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x1081
[ 1519.854247] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x199800.
[ 1519.854250] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x10000
[ 1519.854252] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[3]
[ 1519.951698] drivers/scsi/mvsas/mv_sas.c 1224:port 3 attach dev info is 2000000
[ 1519.951698] drivers/scsi/mvsas/mv_sas.c 1226:port 3 attach sas addr is 3
[ 1519.963251] drivers/scsi/mvsas/mv_sas.c 378:phy 3 byte dmaded.
[ 1522.048039] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[3]:rc= 0
[ 1522.048056] ata12: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[ 1522.048679] ata12.00: device reported invalid CHS sector 0
[ 1522.049268] ata12: status=0x01 { Error }
[ 1522.049856] ata12: error=0x04 { DriveStatusError }
[ 1558.816044] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123200000 task=ffff880124d89500 slot=ffff880123226680 slot_idx=x4
[ 1558.816058] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
[ 1558.816086] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x89800.
[ 1558.816092] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x1001
[ 1558.816103] drivers/scsi/mvsas/mv_sas.c 2111:phy3 Unplug Notice
[ 1558.826059] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x199800.
[ 1558.826066] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x1081
[ 1558.829663] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x199800.
[ 1558.829670] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x10000
[ 1558.829677] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[3]
[ 1558.904494] drivers/scsi/mvsas/mv_sas.c 1224:port 3 attach dev info is 2000000
[ 1558.904494] drivers/scsi/mvsas/mv_sas.c 1226:port 3 attach sas addr is 3
[ 1558.938424] drivers/scsi/mvsas/mv_sas.c 378:phy 3 byte dmaded.
[ 1561.024027] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[3]:rc= 0
[ 1561.024044] ata12: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[ 1561.024652] ata12.00: device reported invalid CHS sector 0
[ 1561.025242] ata12: status=0x01 { Error }
[ 1561.025834] ata12: error=0x04 { DriveStatusError }
[ 1594.800036] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123200000 task=ffff88011f945a40 slot=ffff880123226680 slot_idx=x4
[ 1594.800051] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
[ 1594.800077] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x89800.
[ 1594.800083] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x1001
[ 1594.800094] drivers/scsi/mvsas/mv_sas.c 2111:phy3 Unplug Notice
[ 1594.810048] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x199800.
[ 1594.810055] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x1081
[ 1594.814327] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x199800.
[ 1594.814330] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x10000
[ 1594.814332] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[3]
[ 1594.882000] drivers/scsi/mvsas/mv_sas.c 1224:port 3 attach dev info is 2000000
[ 1594.882000] drivers/scsi/mvsas/mv_sas.c 1226:port 3 attach sas addr is 3
[ 1594.923382] drivers/scsi/mvsas/mv_sas.c 378:phy 3 byte dmaded.
[ 1597.008031] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[3]:rc= 0
[ 1597.008048] ata12: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[ 1597.008675] ata12.00: device reported invalid CHS sector 0
[ 1597.009271] ata12: status=0x01 { Error }
[ 1597.009871] ata12: error=0x04 { DriveStatusError }
[ 2193.824051] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123200000 task=ffff880009c7c540 slot=ffff8801232265d0 slot_idx=x2
[ 2193.824065] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
[ 2193.824092] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x89800.
[ 2193.824099] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1001001
[ 2193.824109] drivers/scsi/mvsas/mv_sas.c 2111:phy0 Unplug Notice
[ 2193.834062] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
[ 2193.834067] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1001081
[ 2193.855272] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
[ 2193.855279] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x10000
[ 2193.855286] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[0]
[ 2193.859234] drivers/scsi/mvsas/mv_sas.c 1224:port 0 attach dev info is 0
[ 2193.859234] drivers/scsi/mvsas/mv_sas.c 1226:port 0 attach sas addr is 0
[ 2193.959270] drivers/scsi/mvsas/mv_sas.c 378:phy 0 byte dmaded.
[ 2196.032026] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[0]:rc= 0
[ 2196.032045] ata9: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[ 2196.032676] ata9: status=0x01 { Error }
[ 2196.033274] ata9: error=0x04 { DriveStatusError }
[ 2440.800047] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123200000 task=ffff880010f36700 slot=ffff880123226628 slot_idx=x3
[ 2440.800061] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
[ 2440.800090] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x89800.
[ 2440.800096] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x1001
[ 2440.800107] drivers/scsi/mvsas/mv_sas.c 2111:phy7 Unplug Notice
[ 2440.810060] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x199800.
[ 2440.810065] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x1081
[ 2440.831453] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x199800.
[ 2440.831460] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x10000
[ 2440.831467] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[7]
[ 2440.880053] drivers/scsi/mvsas/mv_sas.c 1224:port 7 attach dev info is 4000000
[ 2440.880053] drivers/scsi/mvsas/mv_sas.c 1226:port 7 attach sas addr is 7
[ 2440.940497] drivers/scsi/mvsas/mv_sas.c 378:phy 7 byte dmaded.
[ 2443.008033] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[4]:rc= 0
[ 2443.008052] ata13: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[ 2443.008685] ata13: status=0x01 { Error }
[ 2443.009295] ata13: error=0x04 { DriveStatusError }
[ 2675.808044] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123200000 task=ffff88011aae3500 slot=ffff880123226578 slot_idx=x1
[ 2675.808058] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
[ 2675.808088] drivers/scsi/mvsas/mv_sas.c 2083:port 2 ctrl sts=0x89800.
[ 2675.808094] drivers/scsi/mvsas/mv_sas.c 2085:Port 2 irq sts = 0x1001
[ 2675.808104] drivers/scsi/mvsas/mv_sas.c 2111:phy2 Unplug Notice
[ 2675.818051] drivers/scsi/mvsas/mv_sas.c 2083:port 2 ctrl sts=0x199800.
[ 2675.818057] drivers/scsi/mvsas/mv_sas.c 2085:Port 2 irq sts = 0x1081
[ 2675.839505] drivers/scsi/mvsas/mv_sas.c 2083:port 2 ctrl sts=0x199800.
[ 2675.839513] drivers/scsi/mvsas/mv_sas.c 2085:Port 2 irq sts = 0x10000
[ 2675.839519] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[2]
[ 2675.874139] drivers/scsi/mvsas/mv_sas.c 1224:port 2 attach dev info is 4
[ 2675.874139] drivers/scsi/mvsas/mv_sas.c 1226:port 2 attach sas addr is 2
[ 2675.936683] drivers/scsi/mvsas/mv_sas.c 378:phy 2 byte dmaded.
[ 2678.016055] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[2]:rc= 0
[ 2678.016075] ata11: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[ 2678.016706] ata11: status=0x01 { Error }
[ 2678.017315] ata11: error=0x04 { DriveStatusError }
[ 2678.017964] ata9: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[ 2678.018573] ata9: status=0x01 { Error }
[ 2678.019175] ata9: error=0x04 { DriveStatusError }

I did not unplug a disk; the errors seem to be spurious.
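
To see whether the unplug notices cluster on one port, the log above can be tallied per phy. This is only a minimal Python sketch: the regexes are written against the message formats as they appear in this log, and the function name `summarize` is made up for illustration, not part of any tool.

```python
import re
from collections import Counter

# Patterns match the mvsas messages as they appear in the log above.
ABORT_RE = re.compile(r"mv_abort_task\(\)")
UNPLUG_RE = re.compile(r"phy(\d+) Unplug Notice")


def summarize(dmesg_text):
    """Count abort events and per-phy unplug notices in a dmesg dump."""
    aborts = 0
    unplugs = Counter()
    for line in dmesg_text.splitlines():
        if ABORT_RE.search(line):
            aborts += 1
        m = UNPLUG_RE.search(line)
        if m:
            unplugs[int(m.group(1))] += 1
    return aborts, unplugs


# Four lines copied from the log above, used as sample input.
sample = """\
[  213.856050] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123200000 task=ffff880122545a40 slot=ffff880123226628 slot_idx=x3
[  213.856111] drivers/scsi/mvsas/mv_sas.c 2111:phy0 Unplug Notice
[ 1519.840061] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123200000 task=ffff88011f944700 slot=ffff880123226680 slot_idx=x4
[ 1519.840124] drivers/scsi/mvsas/mv_sas.c 2111:phy3 Unplug Notice
"""

aborts, unplugs = summarize(sample)
print(aborts, dict(unplugs))
```

Counting by hand over the full log above gives seven aborts: three on phy3, two on phy0, and one each on phy7 and phy2, so the problem is not confined to a single port.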

Otherwise, things seem to be working, at least so far. The mv_abort_task
part is very familiar: the older version of this driver would do it right
after attempting to build/activate the md raid5 array that lives on this
controller, except that the controller would then lock up and all drives
would become inaccessible.

I'm going to attempt to grow this array today, so long as the xfs_fsr that I
started doesn't cause the array to fail.

If I keep getting mv_abort_task errors, I'll have to back down to the copy of
the driver I got from Andy Yan. I've managed to patch it up to compile for
2.6.36 just now; I just hope it'll work at least as well as it did with
2.6.34. At the very least, I didn't get these errors with it.

Some background: the disks attached to the card are (5) Seagate 7200.12 1TB
disks, using SAS->SATA cables. The machine is an amd64 Phenom II X4 810 w/4G
RAM running Debian sid and a vanilla 2.6.36 kernel. The card is an
AOC-SASLP-MV8, according to lspci:

04:00.0 SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B (rev 01)

according to dmesg:

[    2.819325] mvsas 0000:04:00.0: mvsas: driver version 0.8.2
[    2.819394] mvsas 0000:04:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[    2.819454] mvsas 0000:04:00.0: setting latency timer to 64
[    2.820952] mvsas 0000:04:00.0: mvsas: PCI-E x4, Bandwidth Usage: 2.5 Gbps
[    7.203222] drivers/scsi/mvsas/mv_sas.c 1224:port 0 attach dev info is 0
[    7.203225] drivers/scsi/mvsas/mv_sas.c 1226:port 0 attach sas addr is 0
[    7.403220] drivers/scsi/mvsas/mv_sas.c 1224:port 1 attach dev info is 0
[    7.403223] drivers/scsi/mvsas/mv_sas.c 1226:port 1 attach sas addr is 1
[    7.603221] drivers/scsi/mvsas/mv_sas.c 1224:port 2 attach dev info is 4
[    7.603223] drivers/scsi/mvsas/mv_sas.c 1226:port 2 attach sas addr is 2
[    7.803221] drivers/scsi/mvsas/mv_sas.c 1224:port 3 attach dev info is 2000000
[    7.803224] drivers/scsi/mvsas/mv_sas.c 1226:port 3 attach sas addr is 3
[    7.904015] drivers/scsi/mvsas/mv_sas.c 1224:port 4 attach dev info is 0
[    7.904018] drivers/scsi/mvsas/mv_sas.c 1226:port 4 attach sas addr is 0
[    8.008014] drivers/scsi/mvsas/mv_sas.c 1224:port 5 attach dev info is 0
[    8.008017] drivers/scsi/mvsas/mv_sas.c 1226:port 5 attach sas addr is 0
[    8.112014] drivers/scsi/mvsas/mv_sas.c 1224:port 6 attach dev info is 0
[    8.112016] drivers/scsi/mvsas/mv_sas.c 1226:port 6 attach sas addr is 0
[    8.315223] drivers/scsi/mvsas/mv_sas.c 1224:port 7 attach dev info is 4000000
[    8.315226] drivers/scsi/mvsas/mv_sas.c 1226:port 7 attach sas addr is 7
[    8.315230] scsi8 : mvsas
[    8.315620] drivers/scsi/mvsas/mv_sas.c 378:phy 0 byte dmaded.
[    8.315624] drivers/scsi/mvsas/mv_sas.c 378:phy 1 byte dmaded.
[    8.315628] drivers/scsi/mvsas/mv_sas.c 378:phy 2 byte dmaded.
[    8.315632] drivers/scsi/mvsas/mv_sas.c 378:phy 3 byte dmaded.
[    8.315636] drivers/scsi/mvsas/mv_sas.c 378:phy 7 byte dmaded.
[    8.316762] drivers/scsi/mvsas/mv_sas.c 1388:found dev[0:5] is gone.
[    8.384626] drivers/scsi/mvsas/mv_sas.c 1388:found dev[1:5] is gone.
[    8.452444] drivers/scsi/mvsas/mv_sas.c 1388:found dev[2:5] is gone.
[    8.520181] drivers/scsi/mvsas/mv_sas.c 1388:found dev[3:5] is gone.
[    8.523810] drivers/scsi/mvsas/mv_sas.c 1388:found dev[4:5] is gone.

I just hope the raid5 reshape I'm about to do doesn't crap its pants because
of the errors above.

I'd like to help test any fixes or changes if needed. Let me know.

Thanks again.

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca


* Re: mvsas errors in 2.6.36
  2010-10-29 12:50 mvsas errors in 2.6.36 Thomas Fjellstrom
@ 2010-10-31 15:11 ` Thomas Fjellstrom
  2010-11-02 17:02   ` Audio Haven
                     ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Thomas Fjellstrom @ 2010-10-31 15:11 UTC (permalink / raw)
  To: Linux Kernel List; +Cc: linux-scsi

After a couple days of uptime, the messages are still happening:

[175665.888045] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123b00000 task=ffff88010e77e000 slot=ffff880123b26680 slot_idx=x4
[175665.888059] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
[175665.888086] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x89800.
[175665.888092] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1001
[175665.888103] drivers/scsi/mvsas/mv_sas.c 2111:phy0 Unplug Notice
[175665.898053] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
[175665.898061] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1081
[175665.919498] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
[175665.919501] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x10000
[175665.919503] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[0]
[175666.018302] drivers/scsi/mvsas/mv_sas.c 1224:port 0 attach dev info is 0
[175666.018302] drivers/scsi/mvsas/mv_sas.c 1226:port 0 attach sas addr is 0
[175666.028291] drivers/scsi/mvsas/mv_sas.c 378:phy 0 byte dmaded.
[175668.096048] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[0]:rc= 0
[175668.096066] ata9: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[175668.096739] ata9.00: device reported invalid CHS sector 0
[175668.097379] ata9: status=0x01 { Error }
[175668.098022] ata9: error=0x04 { DriveStatusError }

No fatal errors yet.
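
For a rough idea of how long each incident lasts, the gap between the abort message and the following nexus reset can be read off the dmesg timestamps. A minimal Python sketch, assuming the two lines have been copied out of the log above; the helper name `timestamp` is hypothetical:

```python
import re

# Leading kernel timestamp, e.g. "[175665.888045]" or "[  213.856050]".
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def timestamp(line):
    """Return the dmesg timestamp of a log line, in seconds since boot."""
    m = TS_RE.match(line)
    if m is None:
        raise ValueError("no timestamp in line: %r" % line)
    return float(m.group(1))


# Both lines are taken verbatim from the log above.
abort_line = "[175665.888045] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123b00000 task=ffff88010e77e000 slot=ffff880123b26680 slot_idx=x4"
reset_line = "[175668.096048] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[0]:rc= 0"

stall = timestamp(reset_line) - timestamp(abort_line)
print("abort to nexus reset: %.3f s" % stall)
```

For this incident the gap works out to roughly 2.2 seconds.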

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca


* Re: mvsas errors in 2.6.36
  2010-10-31 15:11 ` Thomas Fjellstrom
@ 2010-11-02 17:02   ` Audio Haven
  2010-11-17  7:53   ` Thomas Fjellstrom
  2010-12-07 19:45   ` tomm
  2 siblings, 0 replies; 26+ messages in thread
From: Audio Haven @ 2010-11-02 17:02 UTC (permalink / raw)
  To: thomas; +Cc: Linux Kernel List

Hello Thomas,

I'm seeing similar errors on 2.6.36-rc3 (which should have the same
mvsas driver).
When copying lots of data over samba to a software raid6 on top of
mvsas, my samba connection stalls and the following is reported:

drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[7]:rc= 0
ata18: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata18: status=0x01 { Error }
ata18: error=0x04 { DriveStatusError }
drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[7]:rc= 0
ata18: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata18: status=0x01 { Error }
ata18: error=0x04 { DriveStatusError }
drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[6]:rc= 0
ata17: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata17: status=0x01 { Error }
ata17: error=0x04 { DriveStatusError }
drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[7]:rc= 0
ata18: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata18: status=0x01 { Error }
ata18: error=0x04 { DriveStatusError }
drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[7]:rc= 0
ata18: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata18: status=0x01 { Error }
ata18: error=0x04 { DriveStatusError }

No data is lost and the raid never breaks, but this is a nuisance. The
above can be reproduced by doing lots of I/O to mvsas, independent of
samba: e.g. a large local file copy to mvsas, or even an rm -rf of a large
subdirectory on xfs, can trigger it. These operations then become
terribly slow.
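
For reference, the stat/err values in those libata lines can be decoded by bit. The Python sketch below is only illustrative: the tables list a subset of the ATA status and error register bits and a single SCSI sense key, and the names `decode` and `LINE_RE` are made up for this example.

```python
import re

# Subset of ATA status register bits (bit mask -> name).
STATUS_BITS = {0x01: "ERR", 0x08: "DRQ", 0x20: "DF", 0x40: "DRDY", 0x80: "BSY"}
# Subset of ATA error register bits.
ERROR_BITS = {0x04: "ABRT", 0x10: "IDNF", 0x40: "UNC", 0x80: "ICRC"}
# Only the sense key seen in this thread is listed.
SENSE_KEYS = {0x0B: "ABORTED COMMAND"}

# Matches the libata translation message format shown above.
LINE_RE = re.compile(
    r"translated ATA stat/err 0x([0-9a-f]+)/([0-9a-f]+) "
    r"to SCSI SK/ASC/ASCQ 0x([0-9a-f]+)/"
)


def decode(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in sorted(table.items()) if value & bit]


line = "ata18: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00"
m = LINE_RE.search(line)
stat, err, sk = (int(g, 16) for g in m.groups())

status_names = decode(stat, STATUS_BITS)
error_names = decode(err, ERROR_BITS)
sense_name = SENSE_KEYS.get(sk, "unknown")
print(status_names, error_names, sense_name)
```

So status=0x01 is the ERR bit, error=0x04 is ABRT (which libata prints as DriveStatusError), and sense key 0xb is ABORTED COMMAND. That reads as a command that was aborted rather than a media error.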

Best regards,

Frederic Vanden Poel

On Sun, Oct 31, 2010 at 4:11 PM, Thomas Fjellstrom <thomas@fjellstrom.ca> wrote:
> On October 29, 2010, Thomas Fjellstrom wrote:
>> Good news and bad news, the current mvsas driver in 2.6.36 seems to work
>> better than older kernels with my setup (2 port sas + 5 SATA disks). But I
>> gotten the following messages so far:
>>
>> [  213.856050] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123200000 task=ffff880122545a40 slot=ffff880123226628 slot_idx=x3
>> [  213.856064] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
>> [  213.856094] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x89800.
>> [  213.856100] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1001
>> [  213.856111] drivers/scsi/mvsas/mv_sas.c 2111:phy0 Unplug Notice
>> [  213.866069] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
>> [  213.866078] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1081
>> [  213.887617] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
>> [  213.887625] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x10000
>> [  213.887632] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[0]
>> [  213.991191] drivers/scsi/mvsas/mv_sas.c 1224:port 0 attach dev info is 0
>> [  213.991191] drivers/scsi/mvsas/mv_sas.c 1226:port 0 attach sas addr is 0
>> [  213.995701] drivers/scsi/mvsas/mv_sas.c 378:phy 0 byte dmaded.
>> [  216.064032] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[0]:rc= 0
>> [  216.064049] ata9: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> [  216.064647] ata9.00: device reported invalid CHS sector 0
>> [  216.065226] ata9: status=0x01 { Error }
>> [  216.065815] ata9: error=0x04 { DriveStatusError }
>> [ 1519.840061] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123200000 task=ffff88011f944700 slot=ffff880123226680 slot_idx=x4
>> [ 1519.840075] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
>> [ 1519.840107] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x89800.
>> [ 1519.840113] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x1001
>> [ 1519.840124] drivers/scsi/mvsas/mv_sas.c 2111:phy3 Unplug Notice
>> [ 1519.850080] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x199800.
>> [ 1519.850086] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x1081
>> [ 1519.854247] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x199800.
>> [ 1519.854250] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x10000
>> [ 1519.854252] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[3]
>> [ 1519.951698] drivers/scsi/mvsas/mv_sas.c 1224:port 3 attach dev info is 2000000
>> [ 1519.951698] drivers/scsi/mvsas/mv_sas.c 1226:port 3 attach sas addr is 3
>> [ 1519.963251] drivers/scsi/mvsas/mv_sas.c 378:phy 3 byte dmaded.
>> [ 1522.048039] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[3]:rc= 0
>> [ 1522.048056] ata12: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> [ 1522.048679] ata12.00: device reported invalid CHS sector 0
>> [ 1522.049268] ata12: status=0x01 { Error }
>> [ 1522.049856] ata12: error=0x04 { DriveStatusError }
>> [ 1558.816044] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123200000 task=ffff880124d89500 slot=ffff880123226680 slot_idx=x4
>> [ 1558.816058] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
>> [ 1558.816086] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x89800.
>> [ 1558.816092] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x1001
>> [ 1558.816103] drivers/scsi/mvsas/mv_sas.c 2111:phy3 Unplug Notice
>> [ 1558.826059] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x199800.
>> [ 1558.826066] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x1081
>> [ 1558.829663] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x199800.
>> [ 1558.829670] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x10000
>> [ 1558.829677] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[3]
>> [ 1558.904494] drivers/scsi/mvsas/mv_sas.c 1224:port 3 attach dev info is 2000000
>> [ 1558.904494] drivers/scsi/mvsas/mv_sas.c 1226:port 3 attach sas addr is 3
>> [ 1558.938424] drivers/scsi/mvsas/mv_sas.c 378:phy 3 byte dmaded.
>> [ 1561.024027] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[3]:rc= 0
>> [ 1561.024044] ata12: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> [ 1561.024652] ata12.00: device reported invalid CHS sector 0
>> [ 1561.025242] ata12: status=0x01 { Error }
>> [ 1561.025834] ata12: error=0x04 { DriveStatusError }
>> [ 1594.800036] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123200000 task=ffff88011f945a40 slot=ffff880123226680 slot_idx=x4
>> [ 1594.800051] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
>> [ 1594.800077] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x89800.
>> [ 1594.800083] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x1001
>> [ 1594.800094] drivers/scsi/mvsas/mv_sas.c 2111:phy3 Unplug Notice
>> [ 1594.810048] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x199800.
>> [ 1594.810055] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x1081
>> [ 1594.814327] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x199800.
>> [ 1594.814330] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x10000
>> [ 1594.814332] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[3]
>> [ 1594.882000] drivers/scsi/mvsas/mv_sas.c 1224:port 3 attach dev info is 2000000
>> [ 1594.882000] drivers/scsi/mvsas/mv_sas.c 1226:port 3 attach sas addr is 3
>> [ 1594.923382] drivers/scsi/mvsas/mv_sas.c 378:phy 3 byte dmaded.
>> [ 1597.008031] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[3]:rc= 0
>> [ 1597.008048] ata12: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> [ 1597.008675] ata12.00: device reported invalid CHS sector 0
>> [ 1597.009271] ata12: status=0x01 { Error }
>> [ 1597.009871] ata12: error=0x04 { DriveStatusError }
>> [ 2193.824051] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123200000 task=ffff880009c7c540 slot=ffff8801232265d0 slot_idx=x2
>> [ 2193.824065] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
>> [ 2193.824092] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x89800.
>> [ 2193.824099] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1001001
>> [ 2193.824109] drivers/scsi/mvsas/mv_sas.c 2111:phy0 Unplug Notice
>> [ 2193.834062] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
>> [ 2193.834067] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1001081
>> [ 2193.855272] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
>> [ 2193.855279] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x10000
>> [ 2193.855286] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[0]
>> [ 2193.859234] drivers/scsi/mvsas/mv_sas.c 1224:port 0 attach dev info is 0
>> [ 2193.859234] drivers/scsi/mvsas/mv_sas.c 1226:port 0 attach sas addr is 0
>> [ 2193.959270] drivers/scsi/mvsas/mv_sas.c 378:phy 0 byte dmaded.
>> [ 2196.032026] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[0]:rc= 0
>> [ 2196.032045] ata9: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> [ 2196.032676] ata9: status=0x01 { Error }
>> [ 2196.033274] ata9: error=0x04 { DriveStatusError }
>> [ 2440.800047] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123200000 task=ffff880010f36700 slot=ffff880123226628 slot_idx=x3
>> [ 2440.800061] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
>> [ 2440.800090] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x89800.
>> [ 2440.800096] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x1001
>> [ 2440.800107] drivers/scsi/mvsas/mv_sas.c 2111:phy7 Unplug Notice
>> [ 2440.810060] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x199800.
>> [ 2440.810065] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x1081
>> [ 2440.831453] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x199800.
>> [ 2440.831460] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x10000
>> [ 2440.831467] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[7]
>> [ 2440.880053] drivers/scsi/mvsas/mv_sas.c 1224:port 7 attach dev info is 4000000
>> [ 2440.880053] drivers/scsi/mvsas/mv_sas.c 1226:port 7 attach sas addr is 7
>> [ 2440.940497] drivers/scsi/mvsas/mv_sas.c 378:phy 7 byte dmaded.
>> [ 2443.008033] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[4]:rc= 0
>> [ 2443.008052] ata13: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> [ 2443.008685] ata13: status=0x01 { Error }
>> [ 2443.009295] ata13: error=0x04 { DriveStatusError }
>> [ 2675.808044] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123200000 task=ffff88011aae3500 slot=ffff880123226578 slot_idx=x1
>> [ 2675.808058] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
>> [ 2675.808088] drivers/scsi/mvsas/mv_sas.c 2083:port 2 ctrl sts=0x89800.
>> [ 2675.808094] drivers/scsi/mvsas/mv_sas.c 2085:Port 2 irq sts = 0x1001
>> [ 2675.808104] drivers/scsi/mvsas/mv_sas.c 2111:phy2 Unplug Notice
>> [ 2675.818051] drivers/scsi/mvsas/mv_sas.c 2083:port 2 ctrl sts=0x199800.
>> [ 2675.818057] drivers/scsi/mvsas/mv_sas.c 2085:Port 2 irq sts = 0x1081
>> [ 2675.839505] drivers/scsi/mvsas/mv_sas.c 2083:port 2 ctrl sts=0x199800.
>> [ 2675.839513] drivers/scsi/mvsas/mv_sas.c 2085:Port 2 irq sts = 0x10000
>> [ 2675.839519] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[2]
>> [ 2675.874139] drivers/scsi/mvsas/mv_sas.c 1224:port 2 attach dev info is 4
>> [ 2675.874139] drivers/scsi/mvsas/mv_sas.c 1226:port 2 attach sas addr is 2
>> [ 2675.936683] drivers/scsi/mvsas/mv_sas.c 378:phy 2 byte dmaded.
>> [ 2678.016055] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[2]:rc= 0
>> [ 2678.016075] ata11: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> [ 2678.016706] ata11: status=0x01 { Error }
>> [ 2678.017315] ata11: error=0x04 { DriveStatusError }
>> [ 2678.017964] ata9: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> [ 2678.018573] ata9: status=0x01 { Error }
>> [ 2678.019175] ata9: error=0x04 { DriveStatusError }
>>
>> I did not unplug a disk, the errors seem to be spurious.
>>
>> Otherwise though things seem to be working. At least so far. The
>> mv_abort_task part is very familiar, the older version of this driver
>> would do it right after attempting to build/activate the md raid5 array
>> that lives on this controller. Except the controller would lock up, and
>> all drives would become inaccessible.
>>
>> I'm going to attempt to grow this array today, so long as the xfs_fsr that
>> I started doesn't cause the array to fail.
>>
>> If I keep getting mv_abort_task errors, I'll have to back down to the copy
>> of the driver I got from Andy Yan. I've managed to patch it up to compile
>> for 2.6.36 just now, I just hope it'll work at least as well as it did
>> with 2.6.34. At the very least I didn't get these errors.
>>
>> Some background, the disks attached to the card are (5) Seagate 7200.12 1TB
>> disks, using SAS->SATA cables. Machine is a amd64 Phenom II X4 810 w/4G
>> ram running debian sid and a vanila 2.6.36 kernel. The card is a
>> AOC-SASLP-MV8, according to lspci:
>>
>> 04:00.0 SCSI storage controller: Marvell Technology Group Ltd.
>> MV64460/64461/64462 System Controller, Revision B (rev 01)
>>
>> according to dmesg:
>>
>> [    2.819325] mvsas 0000:04:00.0: mvsas: driver version 0.8.2
>> [    2.819394] mvsas 0000:04:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
>> [    2.819454] mvsas 0000:04:00.0: setting latency timer to 64
>> [    2.820952] mvsas 0000:04:00.0: mvsas: PCI-E x4, Bandwidth Usage: 2.5 Gbps
>> [    7.203222] drivers/scsi/mvsas/mv_sas.c 1224:port 0 attach dev info is 0
>> [    7.203225] drivers/scsi/mvsas/mv_sas.c 1226:port 0 attach sas addr is 0
>> [    7.403220] drivers/scsi/mvsas/mv_sas.c 1224:port 1 attach dev info is 0
>> [    7.403223] drivers/scsi/mvsas/mv_sas.c 1226:port 1 attach sas addr is 1
>> [    7.603221] drivers/scsi/mvsas/mv_sas.c 1224:port 2 attach dev info is 4
>> [    7.603223] drivers/scsi/mvsas/mv_sas.c 1226:port 2 attach sas addr is 2
>> [    7.803221] drivers/scsi/mvsas/mv_sas.c 1224:port 3 attach dev info is 2000000
>> [    7.803224] drivers/scsi/mvsas/mv_sas.c 1226:port 3 attach sas addr is 3
>> [    7.904015] drivers/scsi/mvsas/mv_sas.c 1224:port 4 attach dev info is 0
>> [    7.904018] drivers/scsi/mvsas/mv_sas.c 1226:port 4 attach sas addr is 0
>> [    8.008014] drivers/scsi/mvsas/mv_sas.c 1224:port 5 attach dev info is 0
>> [    8.008017] drivers/scsi/mvsas/mv_sas.c 1226:port 5 attach sas addr is 0
>> [    8.112014] drivers/scsi/mvsas/mv_sas.c 1224:port 6 attach dev info is 0
>> [    8.112016] drivers/scsi/mvsas/mv_sas.c 1226:port 6 attach sas addr is 0
>> [    8.315223] drivers/scsi/mvsas/mv_sas.c 1224:port 7 attach dev info is 4000000
>> [    8.315226] drivers/scsi/mvsas/mv_sas.c 1226:port 7 attach sas addr is 7
>> [    8.315230] scsi8 : mvsas
>> [    8.315620] drivers/scsi/mvsas/mv_sas.c 378:phy 0 byte dmaded.
>> [    8.315624] drivers/scsi/mvsas/mv_sas.c 378:phy 1 byte dmaded.
>> [    8.315628] drivers/scsi/mvsas/mv_sas.c 378:phy 2 byte dmaded.
>> [    8.315632] drivers/scsi/mvsas/mv_sas.c 378:phy 3 byte dmaded.
>> [    8.315636] drivers/scsi/mvsas/mv_sas.c 378:phy 7 byte dmaded.
>> [    8.316762] drivers/scsi/mvsas/mv_sas.c 1388:found dev[0:5] is gone.
>> [    8.384626] drivers/scsi/mvsas/mv_sas.c 1388:found dev[1:5] is gone.
>> [    8.452444] drivers/scsi/mvsas/mv_sas.c 1388:found dev[2:5] is gone.
>> [    8.520181] drivers/scsi/mvsas/mv_sas.c 1388:found dev[3:5] is gone.
>> [    8.523810] drivers/scsi/mvsas/mv_sas.c 1388:found dev[4:5] is gone.
>>
>> I just hope the raid5 reshape I'm about to do doesn't crap its pants
>> because of the errors above.
>>
>> I'd like to help test any fixes or changes if needed. Let me know.
>>
>> Thanks again.
>
> After a couple days of uptime, the messages are still happening:
>
> [175665.888045] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880123b00000 task=ffff88010e77e000 slot=ffff880123b26680 slot_idx=x4
> [175665.888059] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
> [175665.888086] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x89800.
> [175665.888092] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1001
> [175665.888103] drivers/scsi/mvsas/mv_sas.c 2111:phy0 Unplug Notice
> [175665.898053] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
> [175665.898061] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1081
> [175665.919498] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
> [175665.919501] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x10000
> [175665.919503] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[0]
> [175666.018302] drivers/scsi/mvsas/mv_sas.c 1224:port 0 attach dev info is 0
> [175666.018302] drivers/scsi/mvsas/mv_sas.c 1226:port 0 attach sas addr is 0
> [175666.028291] drivers/scsi/mvsas/mv_sas.c 378:phy 0 byte dmaded.
> [175668.096048] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[0]:rc= 0
> [175668.096066] ata9: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> [175668.096739] ata9.00: device reported invalid CHS sector 0
> [175668.097379] ata9: status=0x01 { Error }
> [175668.098022] ata9: error=0x04 { DriveStatusError }
>
> No fatal errors yet.
>
> --
> Thomas Fjellstrom
> thomas@fjellstrom.ca
>

* Re: mvsas errors in 2.6.36
  2010-10-31 15:11 ` Thomas Fjellstrom
  2010-11-02 17:02   ` Audio Haven
@ 2010-11-17  7:53   ` Thomas Fjellstrom
  2010-11-17  8:24     ` Andre Tomt
  2010-12-07 19:45   ` tomm
  2 siblings, 1 reply; 26+ messages in thread
From: Thomas Fjellstrom @ 2010-11-17  7:53 UTC (permalink / raw)
  To: Linux Kernel List; +Cc: linux-scsi

On October 31, 2010, Thomas Fjellstrom wrote:
> On October 29, 2010, Thomas Fjellstrom wrote:
> > Good news and bad news, the current mvsas driver in 2.6.36 seems to work
> > better than older kernels with my setup (2 port sas + 5 SATA disks). But
> > I gotten the following messages so far:
> > 
[snip]
> > I did not unplug a disk, the errors seem to be spurious.
> > 
> > Otherwise though things seem to be working. At least so far. The
> > mv_abort_task part is very familiar, the older version of this driver
> > would do it right after attempting to build/activate the md raid5 array
> > that lives on this controller. Except the controller would lock up, and
> > all drives would become inaccessible.
> > 
> > I'm going to attempt to grow this array today, so long as the xfs_fsr
> > that I started doesn't cause the array to fail.
> > 
> > If I keep getting mv_abort_task errors, I'll have to back down to the
> > copy of the driver I got from Andy Yan. I've managed to patch it up to
> > compile for 2.6.36 just now, I just hope it'll work at least as well as
> > it did with 2.6.34. At the very least I didn't get these errors.
> > 
> > Some background, the disks attached to the card are (5) Seagate 7200.12
> > 1TB disks, using SAS->SATA cables. Machine is a amd64 Phenom II X4 810
> > w/4G ram running debian sid and a vanila 2.6.36 kernel. The card is a
> > AOC-SASLP-MV8, according to lspci:
> > 
> > 04:00.0 SCSI storage controller: Marvell Technology Group Ltd.
> > MV64460/64461/64462 System Controller, Revision B (rev 01)
> > 
> > according to dmesg:
> > 
[snip]
> > I just hope the raid5 reshape I'm about to do doesn't crap its pants
> > because of the errors above.
> > 
> > I'd like to help test any fixes or changes if needed. Let me know.
> > 
> > Thanks again.
> 
> After a couple days of uptime, the messages are still happening:
> 
[snip]
> No fatal errors yet.

Still no fatal errors, but the problem is still happening regularly. It causes
a pause in disk I/O of at least a couple of seconds. Really quite annoying.

One thing that's got me wondering: could this be a power issue? From the
messages, it almost seems like a single drive (any drive) is freaking out and
returning an error that probably shouldn't happen (no CHS 0?), which could
mean the drive is underpowered and the firmware is flipping out. I'm not
entirely sure. The system has a decent-quality 750w Antec power supply. The
total power use of the system shouldn't come to over half that (phenom II x4
810 cpu, gigabyte ma790fxtud5p mb, low profile nvidia 9400GS gpu, 8 sata hdds,
3 fans, etc). I'm /mostly/ sure the 12v rails are spread out evenly, but I
have yet to make absolutely sure.

But then it doesn't seem as if the root drives are ever flipping out. There
are two 500GB Seagate 7200.12 drives in md raid1 on the motherboard's (SB750)
SATA II controller. They work fine, with no messages regarding them at all the
entire time. However, I get frequent and repeated messages from all drives on
the mvsas-based controller.
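
To check whether the resets really are spread across all drives on the
controller rather than tied to one, the log can be tallied per phy and per
ATA port. The following is only a minimal sketch in Python, assuming the
dmesg text is available as a string; `summarize` and `SAMPLE` are names made
up for illustration, and the sample lines are copied from the log quoted
earlier in this thread.

```python
import re
from collections import Counter

# Matches mvsas lines such as "... 2111:phy3 Unplug Notice".
UNPLUG_RE = re.compile(r"phy(\d+) Unplug Notice")
# Matches libata lines such as "ata12: error=0x04 { DriveStatusError }".
ATA_ERR_RE = re.compile(r"(ata\d+): error=0x[0-9a-fA-F]+ \{ DriveStatusError \}")


def summarize(log_text):
    """Count unplug notices per phy and DriveStatusError reports per ATA port."""
    unplugs = Counter()
    ata_errors = Counter()
    for line in log_text.splitlines():
        m = UNPLUG_RE.search(line)
        if m:
            unplugs[int(m.group(1))] += 1
        m = ATA_ERR_RE.search(line)
        if m:
            ata_errors[m.group(1)] += 1
    return unplugs, ata_errors


# A few lines taken from the log earlier in the thread (illustrative subset).
SAMPLE = """\
[  213.856111] drivers/scsi/mvsas/mv_sas.c 2111:phy0 Unplug Notice
[  216.065815] ata9: error=0x04 { DriveStatusError }
[ 1519.840124] drivers/scsi/mvsas/mv_sas.c 2111:phy3 Unplug Notice
[ 1522.049856] ata12: error=0x04 { DriveStatusError }
[ 1558.816103] drivers/scsi/mvsas/mv_sas.c 2111:phy3 Unplug Notice
[ 1561.025834] ata12: error=0x04 { DriveStatusError }
"""

if __name__ == "__main__":
    unplugs, ata_errors = summarize(SAMPLE)
    print("unplug notices per phy:", dict(unplugs))
    print("DriveStatusError per port:", dict(ata_errors))
```

In practice the input would be the saved output of dmesg rather than the
embedded sample; a tally that is spread over several phys points away from a
single bad drive or cable.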

So color me stumped.

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca

* Re: mvsas errors in 2.6.36
  2010-11-17  7:53   ` Thomas Fjellstrom
@ 2010-11-17  8:24     ` Andre Tomt
  2010-12-02  6:29       ` Thomas Fjellstrom
  0 siblings, 1 reply; 26+ messages in thread
From: Andre Tomt @ 2010-11-17  8:24 UTC (permalink / raw)
  To: thomas; +Cc: Linux Kernel List, linux-scsi

On 11/17/2010 08:53 AM, Thomas Fjellstrom wrote:
[snip]
> Still no fatal errors, but the problem is still happening regularly. It causes
> a pause in disk io of a couple seconds at least. Really quite annoying.
[snip]

After the mvsas update in 2.6.35 this started happening to me as well;
at least it's better than the previous state - not working.. ;-) However,
after rolling a new 2.6.35 with the following fix that is queued up for
the upcoming 2.6.35 and 2.6.36 stable releases, the errors seem to have
disappeared - 3 days and counting.

http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob_plain;f=queue-2.6.33/libsas-fix-ncq-mixing-with-non-ncq.patch;h=b6d7c92094d95ad67a3b23c2e09c25d4fbd0f46b;hb=HEAD

The fix is queued up for the next 2.6.36 and 2.6.35 stable point-releases.

* Re: mvsas errors in 2.6.36
  2010-11-17  8:24     ` Andre Tomt
@ 2010-12-02  6:29       ` Thomas Fjellstrom
  2010-12-02  9:48         ` Thomas Fjellstrom
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Fjellstrom @ 2010-12-02  6:29 UTC (permalink / raw)
  To: Andre Tomt; +Cc: Linux Kernel List, linux-scsi

On November 17, 2010, you wrote:
> On 11/17/2010 08:53 AM, Thomas Fjellstrom wrote:
> [snip]
> 
> > Still no fatal errors, but the problem is still happening regularly. It
> > causes a pause in disk io of a couple seconds at least. Really quite
> > annoying.
> >
> > One thing thats got me wondering, is could this be a power issue?
> > It almost seems like (from the messages) that a single drive (any drive)
> > is freaking out, and returning an error that probably shouldn't happen (no 
> > CHS 0?), which could mean the drive is underpowered and the firmware is 
> > flipping out. I'm not entirely sure. The system has a 750w decent quality
> > Antec power supply. The total power use of the system shouldn't come over 
> > half that (phenom II x4 810 cpu, gigabyte ma790fxtud5p mb, low profile 
> > nvidia 9400GS gpu, 8 sata hdds, 3 fans, etc). I'm mostly sure the 12v 
> > rails are spread out evenly, but I have yet to make absolutely sure.

Made absolutely sure. I had been worrying that I was overloading one of the
rails on the PSU, but it turns out that it isn't a multi 12v rail PSU after
all. The box and advertising say it is, but the electronics inside all say
it's a single 12v rail device.

> [snip]
> 
> After the mvsas update in 2.6.35 this started happening to me as well;
> at least its better than the previous state - not working.. ;-) However,
> after rolling a new 2.6.35 with the following fix that is queued up for
> the upcoming 2.6.35 and 2.6.36 stable releases, they seem to have
> dissapeared - 3 days and counting.
> 
> http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob_plain;f=queue-2.6.33/libsas-fix-ncq-mixing-with-non-ncq.patch;h=b6d7c92094d95ad67a3b23c2e09c25d4fbd0f46b;hb=HEAD
> 
> The fix is queued up for the next 2.6.36 and 2.6.35 stable point-releases.

Ahah. I wonder how I missed that when I first read it. I'll have to give the 
stable .36 kernel a try. Thanks!


-- 
Thomas Fjellstrom
thomas@fjellstrom.ca

* Re: mvsas errors in 2.6.36
  2010-12-02  6:29       ` Thomas Fjellstrom
@ 2010-12-02  9:48         ` Thomas Fjellstrom
  2010-12-02 13:17           ` Spelic
  2010-12-03 16:39           ` Thomas Fjellstrom
  0 siblings, 2 replies; 26+ messages in thread
From: Thomas Fjellstrom @ 2010-12-02  9:48 UTC (permalink / raw)
  To: Andre Tomt; +Cc: Linux Kernel List, linux-scsi

On December 1, 2010, Thomas Fjellstrom wrote:
> On November 17, 2010, you wrote:
> > On 11/17/2010 08:53 AM, Thomas Fjellstrom wrote:
> > [snip]
> > 
> > > Still no fatal errors, but the problem is still happening regularly. It
> > > causes a pause in disk io of a couple seconds at least. Really quite
> > > annoying.
> > > 
> > > One thing thats got me wondering, is could this be a power issue?
> > > It almost seems like (from the messages) that a single drive (any
> > > drive) is freaking out, and returning an error that probably shouldn't
> > > happen (no CHS 0?), which could mean the drive is underpowered and the
> > > firmware is flipping out. I'm not entirely sure. The system has a 750w
> > > decent quality Antec power supply. The total power use of the system
> > > shouldn't come over half that (phenom II x4 810 cpu, gigabyte
> > > ma790fxtud5p mb, low profile nvidia 9400GS gpu, 8 sata hdds, 3 fans,
> > > etc). I'm mostly sure the 12v rails are spread out evenly, but I have
> > > yet to make absolutely sure.
> 
> Made absolute sure. I had been worrying that I was overloading one of the
> rails on the PSU, but it turns out that it isn't a multi 12v rail PSU after
> all. The box and advertising says it is, but the electronics inside all say
> its a single 12v rail device.
> 
> > [snip]
> > 
> > After the mvsas update in 2.6.35 this started happening to me as well;
> > at least its better than the previous state - not working.. ;-) However,
> > after rolling a new 2.6.35 with the following fix that is queued up for
> > the upcoming 2.6.35 and 2.6.36 stable releases, they seem to have
> > dissapeared - 3 days and counting.
> > 
> > http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob_plain;f=queue-2.6.33/libsas-fix-ncq-mixing-with-non-ncq.patch;h=b6d7c92094d95ad67a3b23c2e09c25d4fbd0f46b;hb=HEAD
> > 
> > The fix is queued up for the next 2.6.36 and 2.6.35 stable
> > point-releases.
> 
> Ahah. I wonder how I missed that when I first read it. I'll have to give
> the stable .36 kernel a try. Thanks!

No fix so far:

[ 2539.040104] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880222f00000 task=ffff88018b3e2980 slot=ffff880222f265d0 slot_idx=x2
[ 2539.040118] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
[ 2539.040154] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x89800.
[ 2539.040163] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x1001001
[ 2539.040176] drivers/scsi/mvsas/mv_sas.c 2111:phy7 Unplug Notice
[ 2539.050220] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x199800.
[ 2539.050229] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x1001081
[ 2539.071157] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x199800.
[ 2539.071165] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x10000
[ 2539.071173] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[7]
[ 2539.081142] drivers/scsi/mvsas/mv_sas.c 1224:port 7 attach dev info is 5000002
[ 2539.081142] drivers/scsi/mvsas/mv_sas.c 1226:port 7 attach sas addr is 7
[ 2539.081142] drivers/scsi/mvsas/mv_sas.c 378:phy 7 byte dmaded.
[ 2541.270047] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[5]:rc= 0
[ 2541.270066] ata14: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[ 2541.270926] ata14: status=0x01 { Error }
[ 2541.271747] ata14: error=0x04 { DriveStatusError }

That appeared after about 42 minutes of uptime.

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca

* Re: mvsas errors in 2.6.36
  2010-12-02  9:48         ` Thomas Fjellstrom
@ 2010-12-02 13:17           ` Spelic
  2010-12-02 13:37             ` Thomas Fjellstrom
                               ` (2 more replies)
  2010-12-03 16:39           ` Thomas Fjellstrom
  1 sibling, 3 replies; 26+ messages in thread
From: Spelic @ 2010-12-02 13:17 UTC (permalink / raw)
  To: thomas; +Cc: linux-scsi

On 12/02/2010 10:48 AM, Thomas Fjellstrom wrote:
>
>>> http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob_plain;f=queue-2.6.33/libsas-fix-ncq-mixing-with-non-ncq.patch;h=b6d7c92094d95ad67a3b23c2e09c25d4fbd0f46b;hb=HEAD
>>>
>>> The fix is queued up for the next 2.6.36 and 2.6.35 stable
>>> point-releases.
>>>        
>> Ahah. I wonder how I missed that when I first read it. I'll have to give
>> the stable .36 kernel a try. Thanks!
>>      
> No fix so far:
>    

If you tried v2.6.36 or v2.6.36.1 as you said, you are not getting the
above-mentioned commit.
So far it is applied only in 2.6.37-rc1..rc4.
Which version exactly have you tried?

* Re: mvsas errors in 2.6.36
  2010-12-02 13:17           ` Spelic
@ 2010-12-02 13:37             ` Thomas Fjellstrom
  2010-12-03  2:16             ` Thomas Fjellstrom
  2010-12-05 10:45             ` Audio Haven
  2 siblings, 0 replies; 26+ messages in thread
From: Thomas Fjellstrom @ 2010-12-02 13:37 UTC (permalink / raw)
  To: Spelic; +Cc: linux-scsi

On December 2, 2010, you wrote:
> On 12/02/2010 10:48 AM, Thomas Fjellstrom wrote:
> >
> >>> http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob_plain;f=queue-2.6.33/libsas-fix-ncq-mixing-with-non-ncq.patch;h=b6d7c92094d95ad67a3b23c2e09c25d4fbd0f46b;hb=HEAD
> >>>
> >>> The fix is queued up for the next 2.6.36 and 2.6.35 stable
> >>> point-releases.
> >>>        
> >> Ahah. I wonder how I missed that when I first read it. I'll have to give
> >> the stable .36 kernel a try. Thanks!
> >>      
> > No fix so far:
> >    
> 
> If you tried v2.6.36 or v2.6.36.1 like you said you are not getting the 
> above mentioned commit.
> It's already applied only in 2.6.37-rc1..rc4
> What version have you tried exactly?
> 

2.6.36, and from the machine itself:

root@boris:~# uname -a
Linux boris 2.6.36.1+ #2 SMP Thu Dec 2 00:37:57 MST 2010 x86_64 GNU/Linux

There seem to be fewer errors, though. It's been a few hours and there's only
one set, rather than one every 30-60 minutes or so.
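
To put a number on "one every 30-60 minutes", the gaps between successive
mv_abort_task() entries can be computed from the kernel timestamps. A rough
sketch in Python, under the assumption that each event starts with exactly
one mv_abort_task() line; `abort_gaps_minutes` and `SAMPLE` are hypothetical
names, and the sample timestamps are taken from the first report in this
thread with the mvi/task/slot fields trimmed.

```python
import re

# Captures the dmesg timestamp of lines that contain mv_abort_task().
ABORT_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s.*mv_abort_task\(\)")


def abort_gaps_minutes(log_text):
    """Return the gaps, in minutes, between consecutive mv_abort_task() lines."""
    times = []
    for line in log_text.splitlines():
        m = ABORT_RE.search(line)
        if m:
            times.append(float(m.group(1)))
    # Pair each timestamp with its successor and convert seconds to minutes.
    return [(later - earlier) / 60.0 for earlier, later in zip(times, times[1:])]


# Abort lines from the first report, trimmed after the function name.
SAMPLE = """\
[  213.856050] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task()
[  213.856064] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
[ 1519.840061] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task()
[ 1558.816044] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task()
"""

if __name__ == "__main__":
    for gap in abort_gaps_minutes(SAMPLE):
        print("%.1f minutes" % gap)
```

Comparing these gaps before and after a kernel change gives a less
impression-based answer to whether the errors have become rarer.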

I pulled 2.6.36.1 directly from the kernel.org 2.6.36.y git repo, and in case
you're wondering, the extra patch giving it the + is the group scheduling
patch. Otherwise it's a vanilla 2.6.36.1.

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca

* Re: mvsas errors in 2.6.36
  2010-12-02 13:17           ` Spelic
  2010-12-02 13:37             ` Thomas Fjellstrom
@ 2010-12-03  2:16             ` Thomas Fjellstrom
  2010-12-05 10:45             ` Audio Haven
  2 siblings, 0 replies; 26+ messages in thread
From: Thomas Fjellstrom @ 2010-12-03  2:16 UTC (permalink / raw)
  To: Spelic; +Cc: linux-scsi

On December 2, 2010, Spelic wrote:
> On 12/02/2010 10:48 AM, Thomas Fjellstrom wrote:
> >>> http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob_plain;f=queue-2.6.33/libsas-fix-ncq-mixing-with-non-ncq.patch;h=b6d7c92094d95ad67a3b23c2e09c25d4fbd0f46b;hb=HEAD
> >>> 
> >>> The fix is queued up for the next 2.6.36 and 2.6.35 stable
> >>> point-releases.
> >> 
> >> Ahah. I wonder how I missed that when I first read it. I'll have to give
> >> the stable .36 kernel a try. Thanks!
> > 
> > No fix so far:
> If you tried v2.6.36 or v2.6.36.1 like you said you are not getting the
> above mentioned commit.
> It's already applied only in 2.6.37-rc1..rc4
> What version have you tried exactly?

I see I misread a message again. I checked the 2.6.36.1 release changelog and 
it mentions a "libsas fix ncq mixing with non ncq" changeset.

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.36.y.git;a=commit;h=82fa8bea5ecadf3c2278f677b500905f9ddb7ac0

That seems to be part of 2.6.36.1, so if there's a different patch that I'm
missing, please let me know.

Thank you :)

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca

* Re: mvsas errors in 2.6.36
  2010-12-02  9:48         ` Thomas Fjellstrom
  2010-12-02 13:17           ` Spelic
@ 2010-12-03 16:39           ` Thomas Fjellstrom
  2010-12-03 20:31             ` David Milburn
  1 sibling, 1 reply; 26+ messages in thread
From: Thomas Fjellstrom @ 2010-12-03 16:39 UTC (permalink / raw)
  To: Andre Tomt; +Cc: Linux Kernel List, linux-scsi

On December 2, 2010, Thomas Fjellstrom wrote:
> On December 1, 2010, Thomas Fjellstrom wrote:
> > On November 17, 2010, you wrote:
> > > On 11/17/2010 08:53 AM, Thomas Fjellstrom wrote:
> > > [snip]
> > > 
> > > > Still no fatal errors, but the problem is still happening regularly.
> > > > It causes a pause in disk io of a couple seconds at least. Really
> > > > quite annoying.
> > > > 
> > > > One thing thats got me wondering, is could this be a power issue?
> > > > It almost seems like (from the messages) that a single drive (any
> > > > drive) is freaking out, and returning an error that probably
> > > > shouldn't happen (no CHS 0?), which could mean the drive is
> > > > underpowered and the firmware is flipping out. I'm not entirely
> > > > sure. The system has a 750w decent quality Antec power supply. The
> > > > total power use of the system shouldn't come over half that (phenom
> > > > II x4 810 cpu, gigabyte ma790fxtud5p mb, low profile nvidia 9400GS
> > > > gpu, 8 sata hdds, 3 fans, etc). I'm mostly sure the 12v rails are
> > > > spread out evenly, but I have yet to make absolutely sure.
> > 
> > Made absolute sure. I had been worrying that I was overloading one of the
> > rails on the PSU, but it turns out that it isn't a multi 12v rail PSU
> > after all. The box and advertising says it is, but the electronics
> > inside all say its a single 12v rail device.
> > 
> > > [snip]
> > > 
> > > After the mvsas update in 2.6.35 this started happening to me as well;
> > > at least its better than the previous state - not working.. ;-)
> > > However, after rolling a new 2.6.35 with the following fix that is
> > > queued up for the upcoming 2.6.35 and 2.6.36 stable releases, they
> > > seem to have dissapeared - 3 days and counting.
> > > 
> > > http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob_plain;f=queue-2.6.33/libsas-fix-ncq-mixing-with-non-ncq.patch;h=b6d7c92094d95ad67a3b23c2e09c25d4fbd0f46b;hb=HEAD
> > > 
> > > The fix is queued up for the next 2.6.36 and 2.6.35 stable
> > > point-releases.
> > 
> > Ahah. I wonder how I missed that when I first read it. I'll have to give
> > the stable .36 kernel a try. Thanks!
> 
> No fix so far:
> 
> [ 2539.040104] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880222f00000 task=ffff88018b3e2980 slot=ffff880222f265d0 slot_idx=x2
> [ 2539.040118] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
> [ 2539.040154] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x89800.
> [ 2539.040163] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x1001001
> [ 2539.040176] drivers/scsi/mvsas/mv_sas.c 2111:phy7 Unplug Notice
> [ 2539.050220] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x199800.
> [ 2539.050229] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x1001081
> [ 2539.071157] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x199800.
> [ 2539.071165] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x10000
> [ 2539.071173] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[7]
> [ 2539.081142] drivers/scsi/mvsas/mv_sas.c 1224:port 7 attach dev info is 5000002
> [ 2539.081142] drivers/scsi/mvsas/mv_sas.c 1226:port 7 attach sas addr is 7
> [ 2539.081142] drivers/scsi/mvsas/mv_sas.c 378:phy 7 byte dmaded.
> [ 2541.270047] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[5]:rc= 0
> [ 2541.270066] ata14: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> [ 2541.270926] ata14: status=0x01 { Error }
> [ 2541.271747] ata14: error=0x04 { DriveStatusError }
> 
> That appeared after about 42 minutes of uptime.

So after about 32 hours of uptime there have been 36 separate events. Each spits
out messages similar to those above, and each comes with a noticeable pause while
the drive is reset.
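
For anyone wanting to reproduce a tally like this from a saved dmesg, here is a minimal sketch. It assumes every event logs exactly one mv_abort_task() line, which matches the logs in this thread but may not hold in general. The function name count_abort_events is made up for illustration.

```python
import re

# Matches the mvsas debug line that opens each event in the logs above.
# Assumption: one mv_abort_task() line per event.
ABORT_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s.*mv_abort_task\(\)")


def count_abort_events(lines):
    """Return the kernel timestamps (in seconds) of every mv_abort_task() line."""
    stamps = []
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            stamps.append(float(match.group(1)))
    return stamps


if __name__ == "__main__":
    sample = [
        "[ 2539.040104] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() "
        "mvi=ffff880222f00000 task=ffff88018b3e2980 slot=ffff880222f265d0 slot_idx=x2",
        "[ 2539.040118] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5",
    ]
    stamps = count_abort_events(sample)
    print(len(stamps), "event(s), first at", stamps[0], "seconds of uptime")
```

Feeding it the lines of a full dmesg capture instead of the sample gives the event count and the uptime at which each event occurred.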

There are a number of possible reasons that I'm still having issues:
 - I managed to mess up the git checkout
 - My problem isn't related to the fix
 - The fix doesn't cover all cases of the problem it was meant to fix

I'm not certain which of them it is. I'd be more inclined to think I messed up
the checkout, as I did patch something in, but the patches were completely
unrelated and shouldn't have affected the scsi or ata systems at all. At this
point I'm just grasping at straws.

In case my card is somehow different than expected, I'll paste the lspci info
for it: (AOC-SASLP-MV8)

04:00.0 SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B (rev 01)
        Subsystem: Super Micro Computer Inc Device 0500
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 19
        Region 2: I/O ports at df00 [size=128]
        Region 4: Memory at fdef0000 (64-bit, non-prefetchable) [size=64K]
        [virtual] Expansion ROM at fdd00000 [disabled] [size=256K]
        Capabilities: [48] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] Express (v1) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 2048 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 <256ns, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Kernel driver in use: mvsas

It's installed in a Phenom II X4 810 based system with a 790FX/SB750 chipset,
8G DDR3 1333 RAM, 6 1TB Seagate 7200.12 SATAII drives connected to the
card via sas->sata breakout cables, and a couple of 4-drive SATA hotswap bays.
There are also two Seagate 7200.12 500G drives hooked up to the motherboard
SATA controller. The system is powered via an Antec Neopower Blue 650W PSU
which is probably only half loaded. The system also has a discrete gfx card, but
it's a low end, low profile, fanless card that draws next to no power.

I'm still willing to help test any fixes for the mvsas driver on this card.

Thank you.

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca


* Re: mvsas errors in 2.6.36
  2010-12-03 16:39           ` Thomas Fjellstrom
@ 2010-12-03 20:31             ` David Milburn
  2010-12-04  6:57               ` Thomas Fjellstrom
       [not found]               ` <201012041550372348573@usish.com>
  0 siblings, 2 replies; 26+ messages in thread
From: David Milburn @ 2010-12-03 20:31 UTC (permalink / raw)
  To: thomas; +Cc: Andre Tomt, Linux Kernel List, linux-scsi

Thomas Fjellstrom wrote:
> On December 2, 2010, Thomas Fjellstrom wrote:
>> On December 1, 2010, Thomas Fjellstrom wrote:
>>> On November 17, 2010, you wrote:
>>>> On 11/17/2010 08:53 AM, Thomas Fjellstrom wrote:
>>>> [snip]
>>>>
>>>>> Still no fatal errors, but the problem is still happening regularly.
>>>>> It causes a pause in disk io of a couple seconds at least. Really
>>>>> quite annoying.
>>>>>
>>>>> One thing thats got me wondering, is could this be a power issue?
>>>>> It almost seems like (from the messages) that a single drive (any
>>>>> drive) is freaking out, and returning an error that probably
>>>>> shouldn't happen (no CHS 0?), which could mean the drive is
>>>>> underpowered and the firmware is flipping out. I'm not entirely
>>>>> sure. The system has a 750w decent quality Antec power supply. The
>>>>> total power use of the system shouldn't come over half that (phenom
>>>>> II x4 810 cpu, gigabyte ma790fxtud5p mb, low profile nvidia 9400GS
>>>>> gpu, 8 sata hdds, 3 fans, etc). I'm mostly sure the 12v rails are
>>>>> spread out evenly, but I have yet to make absolutely sure.
>>> Made absolute sure. I had been worrying that I was overloading one of the
>>> rails on the PSU, but it turns out that it isn't a multi 12v rail PSU
>>> after all. The box and advertising says it is, but the electronics
>>> inside all say its a single 12v rail device.
>>>
>>>> [snip]
>>>>
>>>> After the mvsas update in 2.6.35 this started happening to me as well;
>>>> at least its better than the previous state - not working.. ;-)
>>>> However, after rolling a new 2.6.35 with the following fix that is
>>>> queued up for the upcoming 2.6.35 and 2.6.36 stable releases, they
>>>> seem to have dissapeared - 3 days and counting.
>>>>
>>>> http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob_plain;f=queue-2.6.33/libsas-fix-ncq-mixing-with-non-ncq.patch;h=b6d7c92094d95ad67a3b23c2e09c25d4fbd0f46b;hb=HEAD
>>>>
>>>> The fix is queued up for the next 2.6.36 and 2.6.35 stable
>>>> point-releases.
>>> Ahah. I wonder how I missed that when I first read it. I'll have to give
>>> the stable .36 kernel a try. Thanks!
>> No fix so far:
>>
>> [ 2539.040104] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880222f00000 task=ffff88018b3e2980 slot=ffff880222f265d0 slot_idx=x2
>> [ 2539.040118] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
>> [ 2539.040154] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x89800.
>> [ 2539.040163] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x1001001
>> [ 2539.040176] drivers/scsi/mvsas/mv_sas.c 2111:phy7 Unplug Notice
>> [ 2539.050220] drivers/scsi/mvsas/mv_sas.c

The controller is reporting a phy ready state change, which is why you see
the unplug notice.

Can you enable SCSI_SAS_LIBSAS_DEBUG and see if libsas reports anything
before the abort?

You should be able to turn it on in your kernel config:

Device Drivers
  SCSI device support
   SCSI Transports
    Compile the SAS Domain Transport Attributes in debug mode
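
For those editing .config by hand rather than going through the menus, the fragment below is a sketch of the corresponding entry. It assumes the usual CONFIG_ prefix on the symbol named above; check it against your own tree.

```
# Assumed .config entry for the libsas debug option named above.
CONFIG_SCSI_SAS_LIBSAS_DEBUG=y
```

After changing it, the kernel (or at least the libsas and mvsas modules) has to be rebuilt before the extra "sas:" messages show up in dmesg.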

Thanks,
David

>> 2083:port 7 ctrl sts=0x199800.
>> [ 2539.050229] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x1001081
>> [ 2539.071157] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x199800.
>> [ 2539.071165] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x10000
>> [ 2539.071173] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[7]
>> [ 2539.081142] drivers/scsi/mvsas/mv_sas.c 1224:port 7 attach dev info is 5000002
>> [ 2539.081142] drivers/scsi/mvsas/mv_sas.c 1226:port 7 attach sas addr is 7
>> [ 2539.081142] drivers/scsi/mvsas/mv_sas.c 378:phy 7 byte dmaded.
>> [ 2541.270047] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[5]:rc= 0
>> [ 2541.270066] ata14: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
>> [ 2541.270926] ata14: status=0x01 { Error }
>> [ 2541.271747] ata14: error=0x04 { DriveStatusError }
>>
>> That appeared after about 42 minutes of uptime.
> 
> So after about 32 hours of uptime theres been 36 separate events. Each spits
> out similar messages as above, and each comes with a noticeable pause while
> the drive is reset.
> 
> There are a number of possible reasons that I'm still having issues:
>  - I managed to mess up the git checkout
>  - My problem isn't related to the fix
>  - The fix doesn't cover all cases of the problem it meant to fix
> 
> I'm not certain which of them it is, I'd be more inclined to think I messed up
> the checkout, as I did patch something in, but the patches were completely
> unrelated and shouldn't have affected the scsi or ata systems at all. At this
> point I'm just grasping at straws.
> 
> In case my card is somehow different than expected, I'll paste the lspci info
> for it: (AOC-SASLP-MV8)
> 
> 04:00.0 SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B (rev 01)
>         Subsystem: Super Micro Computer Inc Device 0500
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 19
>         Region 2: I/O ports at df00 [size=128]
>         Region 4: Memory at fdef0000 (64-bit, non-prefetchable) [size=64K]
>         [virtual] Expansion ROM at fdd00000 [disabled] [size=256K]
>         Capabilities: [48] Power Management version 2
>                 Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
>         Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>                 Address: 0000000000000000  Data: 0000
>         Capabilities: [e0] Express (v1) Legacy Endpoint, MSI 00
>                 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                         MaxPayload 128 bytes, MaxReadReq 2048 bytes
>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>                 LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 <256ns, L1 unlimited
>                         ClockPM- Surprise- LLActRep- BwNot-
>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>         Capabilities: [100 v1] Advanced Error Reporting
>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>                 CESta:  RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>                 AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
>         Kernel driver in use: mvsas
> 
> Its installed in a Phenom II X4 810 based system with a 790FX/SB750 chipset,
> 8G DDR3 1333 RAM, 6 1TB Seagate 7200.12 SATAII drives connected to the
> card via sas->sata breakout cables, and a couple 4 drive SATA hotswap bays.
> There are also two Seagate 7200.12 500G drives hooked up to the motherboard
> SATA controller. The system is powered via an Antec Neopower Blue 650W PSU
> which is probably only half loaded. System also has a discreet gfx card, but its
> a low end, low profile, fanless card that takes up next to no power.
> 
> I'm still willing to help test any fixes for the mvsas driver on this card.
> 
> Thank you.
> 



* Re: mvsas errors in 2.6.36
  2010-12-03 20:31             ` David Milburn
@ 2010-12-04  6:57               ` Thomas Fjellstrom
       [not found]               ` <201012041550372348573@usish.com>
  1 sibling, 0 replies; 26+ messages in thread
From: Thomas Fjellstrom @ 2010-12-04  6:57 UTC (permalink / raw)
  To: David Milburn; +Cc: Andre Tomt, Linux Kernel List, linux-scsi

On December 3, 2010, David Milburn wrote:
> Thomas Fjellstrom wrote:
> > On December 2, 2010, Thomas Fjellstrom wrote:
> >> On December 1, 2010, Thomas Fjellstrom wrote:
> >>> On November 17, 2010, you wrote:
> >>>> On 11/17/2010 08:53 AM, Thomas Fjellstrom wrote:
> >>>> [snip]
> >>>> 
> >>>>> Still no fatal errors, but the problem is still happening regularly.
> >>>>> It causes a pause in disk io of a couple seconds at least. Really
> >>>>> quite annoying.
> >>>>> 
> >>>>> One thing thats got me wondering, is could this be a power issue?
> >>>>> It almost seems like (from the messages) that a single drive (any
> >>>>> drive) is freaking out, and returning an error that probably
> >>>>> shouldn't happen (no CHS 0?), which could mean the drive is
> >>>>> underpowered and the firmware is flipping out. I'm not entirely
> >>>>> sure. The system has a 750w decent quality Antec power supply. The
> >>>>> total power use of the system shouldn't come over half that (phenom
> >>>>> II x4 810 cpu, gigabyte ma790fxtud5p mb, low profile nvidia 9400GS
> >>>>> gpu, 8 sata hdds, 3 fans, etc). I'm mostly sure the 12v rails are
> >>>>> spread out evenly, but I have yet to make absolutely sure.
> >>> 
> >>> Made absolute sure. I had been worrying that I was overloading one of
> >>> the rails on the PSU, but it turns out that it isn't a multi 12v rail
> >>> PSU after all. The box and advertising says it is, but the electronics
> >>> inside all say its a single 12v rail device.
> >>> 
> >>>> [snip]
> >>>> 
> >>>> After the mvsas update in 2.6.35 this started happening to me as well;
> >>>> at least its better than the previous state - not working.. ;-)
> >>>> However, after rolling a new 2.6.35 with the following fix that is
> >>>> queued up for the upcoming 2.6.35 and 2.6.36 stable releases, they
> >>>> seem to have dissapeared - 3 days and counting.
> >>>> 
> >>>> http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob_plain;f=queue-2.6.33/libsas-fix-ncq-mixing-with-non-ncq.patch;h=b6d7c92094d95ad67a3b23c2e09c25d4fbd0f46b;hb=HEAD
> >>>> 
> >>>> The fix is queued up for the next 2.6.36 and 2.6.35 stable
> >>>> point-releases.
> >>> 
> >>> Ahah. I wonder how I missed that when I first read it. I'll have to
> >>> give the stable .36 kernel a try. Thanks!
> >> 
> >> No fix so far:
> >> 
> >> [ 2539.040104] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880222f00000 task=ffff88018b3e2980 slot=ffff880222f265d0 slot_idx=x2
> >> [ 2539.040118] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
> >> [ 2539.040154] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x89800.
> >> [ 2539.040163] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x1001001
> >> [ 2539.040176] drivers/scsi/mvsas/mv_sas.c 2111:phy7 Unplug Notice
> >> [ 2539.050220] drivers/scsi/mvsas/mv_sas.c
> 
> The controller is reporting a phy ready state change, which is why you see
> the unplug notice.
> 
> Can you enable SCSI_SAS_LIBSAS_DEBUG and see if libsas reports anything
> before the abort?
> 
> You should be able to turn on in your kernel config:
> 
> Device Drivers
>   SCSI device support
>    SCSI Transports
>     Compile the SAS Domain Transport Attributes in debug mode

Hi, I've done as you requested.

Here's all of the output from the first (and currently only) event:

[ 1428.000080] sas: command 0xffff880184ed1680, task 0xffff88017a0f2680, timed out: BLK_EH_NOT_HANDLED
[ 1428.080051] sas: command 0xffff880224e03880, task 0xffff88017a0f24c0, timed out: BLK_EH_NOT_HANDLED
[ 1428.080077] sas: Enter sas_scsi_recover_host
[ 1428.080085] sas: trying to find task 0xffff88017a0f2680
[ 1428.080092] sas: sas_scsi_find_task: aborting task 0xffff88017a0f2680
[ 1428.080102] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880224040000 task=ffff88017a0f2680 slot=ffff880224066680 slot_idx=x4
[ 1428.080113] sas: sas_scsi_find_task: querying task 0xffff88017a0f2680
[ 1428.080119] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
[ 1428.080125] sas: sas_scsi_find_task: task 0xffff88017a0f2680 failed to abort
[ 1428.080130] sas: task 0xffff88017a0f2680 is not at LU: I_T recover
[ 1428.080135] sas: I_T nexus reset for dev 0000000000000000
[ 1428.080172] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x89800.
[ 1428.080180] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1001
[ 1428.080193] drivers/scsi/mvsas/mv_sas.c 2111:phy0 Unplug Notice
[ 1428.090228] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
[ 1428.090236] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1081
[ 1428.111954] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
[ 1428.111962] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x10000
[ 1428.111969] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[0]
[ 1428.146351] drivers/scsi/mvsas/mv_sas.c 1224:port 0 attach dev info is 20004
[ 1428.146351] drivers/scsi/mvsas/mv_sas.c 1226:port 0 attach sas addr is 0
[ 1428.222044] drivers/scsi/mvsas/mv_sas.c 378:phy 0 byte dmaded.
[ 1428.222109] sas: sas_form_port: phy0 belongs to port0 already(1)!
[ 1430.300028] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[0]:rc= 0
[ 1430.300040] sas: I_T 0000000000000000 recovered
[ 1430.300048] sas: sas_ata_task_done: SAS error 8d
[ 1430.300059] ata9: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[ 1430.300883] ata9.00: device reported invalid CHS sector 0
[ 1430.300888] ata9: status=0x01 { Error }
[ 1430.300894] ata9: error=0x04 { DriveStatusError }
[ 1430.300950] sas: trying to find task 0xffff88017a0f24c0
[ 1430.300956] sas: sas_scsi_find_task: aborting task 0xffff88017a0f24c0
[ 1430.300963] sas: sas_scsi_find_task: task 0xffff88017a0f24c0 is done
[ 1430.300968] sas: sas_eh_handle_sas_errors: task 0xffff88017a0f24c0 is done
[ 1430.300974] sas: sas_ata_task_done: SAS error 8d
[ 1430.300982] ata12: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[ 1430.301777] ata12.00: device reported invalid CHS sector 0
[ 1430.301782] ata12: status=0x01 { Error }
[ 1430.301788] ata12: error=0x04 { DriveStatusError }
[ 1430.301808] sas: --- Exit sas_scsi_recover_host
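
For reference, the "stat/err 0x01/04" pair in the ata lines can be decoded by hand. The sketch below assumes the standard ATA status and error register bit assignments and lists only a subset of them. The helper names are made up for illustration and are not part of libata.

```python
# Bit names follow the ATA status and error registers (subset only).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode(value, names):
    """Return the names of all bits from `names` that are set in `value`."""
    return [name for bit, name in names.items() if value & bit]


def decode_stat_err(text):
    """Decode a 'stat/err' pair as printed in the log, e.g. '0x01/04'."""
    stat_text, err_text = text.split("/")
    stat = int(stat_text, 16)
    err = int(err_text, 16)
    return decode(stat, STATUS_BITS), decode(err, ERROR_BITS)


if __name__ == "__main__":
    print(decode_stat_err("0x01/04"))
```

For the pair in this log, that gives ERR in the status register and ABRT (command aborted) in the error register, which lines up with the "{ Error }" and "{ DriveStatusError }" annotations printed next to them.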

Thanks.

> Thanks,
> David
> 
> >> 2083:port 7 ctrl sts=0x199800.
> >> [ 2539.050229] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x1001081
> >> [ 2539.071157] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x199800.
> >> [ 2539.071165] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x10000
> >> [ 2539.071173] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[7]
> >> [ 2539.081142] drivers/scsi/mvsas/mv_sas.c 1224:port 7 attach dev info is 5000002
> >> [ 2539.081142] drivers/scsi/mvsas/mv_sas.c 1226:port 7 attach sas addr is 7
> >> [ 2539.081142] drivers/scsi/mvsas/mv_sas.c 378:phy 7 byte dmaded.
> >> [ 2541.270047] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[5]:rc= 0
> >> [ 2541.270066] ata14: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> >> [ 2541.270926] ata14: status=0x01 { Error }
> >> [ 2541.271747] ata14: error=0x04 { DriveStatusError }
> >> 
> >> That appeared after about 42 minutes of uptime.
> > 
> > So after about 32 hours of uptime theres been 36 separate events. Each
> > spits out similar messages as above, and each comes with a noticeable
> > pause while the drive is reset.
> > 
> > There are a number of possible reasons that I'm still having issues:
> >  - I managed to mess up the git checkout
> >  - My problem isn't related to the fix
> >  - The fix doesn't cover all cases of the problem it meant to fix
> > 
> > I'm not certain which of them it is, I'd be more inclined to think I
> > messed up the checkout, as I did patch something in, but the patches
> > were completely unrelated and shouldn't have affected the scsi or ata
> > systems at all. At this point I'm just grasping at straws.
> > 
> > In case my card is somehow different than expected, I'll paste the lspci
> > info for it: (AOC-SASLP-MV8)
> > 
> > 04:00.0 SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B (rev 01)
> >         Subsystem: Super Micro Computer Inc Device 0500
> >         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >         Latency: 0, Cache Line Size: 64 bytes
> >         Interrupt: pin A routed to IRQ 19
> >         Region 2: I/O ports at df00 [size=128]
> >         Region 4: Memory at fdef0000 (64-bit, non-prefetchable) [size=64K]
> >         [virtual] Expansion ROM at fdd00000 [disabled] [size=256K]
> >         Capabilities: [48] Power Management version 2
> >                 Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
> >                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
> >         Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> >                 Address: 0000000000000000  Data: 0000
> >         Capabilities: [e0] Express (v1) Legacy Endpoint, MSI 00
> >                 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
> >                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
> >                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> >                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >                         MaxPayload 128 bytes, MaxReadReq 2048 bytes
> >                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> >                 LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 <256ns, L1 unlimited
> >                         ClockPM- Surprise- LLActRep- BwNot-
> >                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
> >                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >                 LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >         Capabilities: [100 v1] Advanced Error Reporting
> >                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >                 CESta:  RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> >                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> >                 AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
> >         Kernel driver in use: mvsas
> > 
> > Its installed in a Phenom II X4 810 based system with a 790FX/SB750
> > chipset, 8G DDR3 1333 RAM, 6 1TB Seagate 7200.12 SATAII drives connected
> > to the card via sas->sata breakout cables, and a couple 4 drive SATA
> > hotswap bays. There are also two Seagate 7200.12 500G drives hooked up
> > to the motherboard SATA controller. The system is powered via an Antec
> > Neopower Blue 650W PSU which is probably only half loaded. System also
> > has a discreet gfx card, but its a low end, low profile, fanless card
> > that takes up next to no power.
> > 
> > I'm still willing to help test any fixes for the mvsas driver on this
> > card.
> > 
> > Thank you.


-- 
Thomas Fjellstrom
thomas@fjellstrom.ca


* Re: mvsas errors in 2.6.36
       [not found]               ` <201012041550372348573@usish.com>
@ 2010-12-04  8:37                 ` Thomas Fjellstrom
  2010-12-04 11:52                 ` Thomas Fjellstrom
  2010-12-04 12:33                 ` jack_wang
  2 siblings, 0 replies; 26+ messages in thread
From: Thomas Fjellstrom @ 2010-12-04  8:37 UTC (permalink / raw)
  To: jack_wang; +Cc: David Milburn, Andre Tomt, Linux Kernel List, linux-scsi

On December 4, 2010, jack_wang wrote:
> Hi, I've done as you requested.
> here's all of the output from the first (and currently only) event:
[snip]
> Thanks.
> 
> [Jack] The error shows that two commands got no response before the timer expired, so the SCSI host entered the error handler
> to query and abort the tasks, but these all failed, so it tried to reset the device.
> 
> I looked into mvs_abort_task in mv_sas.c:
> if (SATA_DEV == dev->dev_type) {
>         struct mvs_slot_info *slot = task->lldd_task;
>         struct task_status_struct *tstat;
>         u32 slot_idx = (u32)(slot - mvi->slot_info);
>         tstat = &task->task_status;
>         mv_dprintk(KERN_DEBUG "mv_abort_task() mvi=%p task=%p "
>                    "slot=%p slot_idx=x%x\n",
>                    mvi, task, slot, slot_idx);
>         tstat->stat = SAS_ABORTED_TASK;
>         if (mvi_dev && mvi_dev->running_req)
>                 mvi_dev->running_req--;
>         if (sas_protocol_ata(task->task_proto))
>                 mvs_free_reg_set(mvi, mvi_dev);
>         mvs_slot_task_free(mvi, task, slot, slot_idx);
>         return -1;
>         // The return -1 here looks suspicious; you can try removing it.
> 

I commented out that return, and am waiting for another event. But this
time, instead of rebooting, I just rmmod'ed the driver and modprobed it,
and got the following messages, some of which don't seem quite right
(to me at least):

[ 6741.390235] sd 0:0:0:0: [sdc] Synchronizing SCSI cache
[ 6741.390474] sd 0:0:0:0: [sdc] Stopping disk
[ 6741.917971] drivers/scsi/mvsas/mv_sas.c 1388:found dev[0:5] is gone.
[ 6742.000262] sd 0:0:1:0: [sdd] Synchronizing SCSI cache
[ 6742.000474] sd 0:0:1:0: [sdd] Stopping disk
[ 6742.525251] drivers/scsi/mvsas/mv_sas.c 1388:found dev[1:5] is gone.
[ 6742.560242] sd 0:0:2:0: [sde] Synchronizing SCSI cache
[ 6742.560534] sd 0:0:2:0: [sde] Stopping disk
[ 6743.088472] drivers/scsi/mvsas/mv_sas.c 1388:found dev[2:5] is gone.
[ 6743.200184] sd 0:0:3:0: [sdf] Synchronizing SCSI cache
[ 6743.200531] sd 0:0:3:0: [sdf] Stopping disk
[ 6743.713619] drivers/scsi/mvsas/mv_sas.c 1388:found dev[3:5] is gone.
[ 6743.750267] sd 0:0:4:0: [sdg] Synchronizing SCSI cache
[ 6743.750842] sd 0:0:4:0: [sdg] Stopping disk
[ 6744.272381] drivers/scsi/mvsas/mv_sas.c 1388:found dev[4:5] is gone.
[ 6744.310257] sd 0:0:5:0: [sdh] Synchronizing SCSI cache
[ 6744.311120] sd 0:0:5:0: [sdh] Stopping disk
[ 6744.827928] drivers/scsi/mvsas/mv_sas.c 1388:found dev[5:5] is gone.
[ 6744.829125] mvsas 0000:04:00.0: PCI INT A disabled
[ 6764.553594] mvsas 0000:04:00.0: mvsas: driver version 0.8.2
[ 6764.553621] mvsas 0000:04:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[ 6764.553633] mvsas 0000:04:00.0: setting latency timer to 64
[ 6764.557330] mvsas 0000:04:00.0: mvsas: PCI-E x4, Bandwidth Usage: 2.5 Gbps
[ 6769.020123] drivers/scsi/mvsas/mv_sas.c 1224:port 0 attach dev info is 20004
[ 6769.020131] drivers/scsi/mvsas/mv_sas.c 1226:port 0 attach sas addr is 0
[ 6769.230098] drivers/scsi/mvsas/mv_sas.c 1224:port 1 attach dev info is 0
[ 6769.230107] drivers/scsi/mvsas/mv_sas.c 1226:port 1 attach sas addr is 1
[ 6769.440189] drivers/scsi/mvsas/mv_sas.c 1224:port 2 attach dev info is 2000200
[ 6769.440197] drivers/scsi/mvsas/mv_sas.c 1226:port 2 attach sas addr is 2
[ 6769.650098] drivers/scsi/mvsas/mv_sas.c 1224:port 3 attach dev info is 2000000
[ 6769.650106] drivers/scsi/mvsas/mv_sas.c 1226:port 3 attach sas addr is 3
[ 6769.760044] drivers/scsi/mvsas/mv_sas.c 1224:port 4 attach dev info is 0
[ 6769.760053] drivers/scsi/mvsas/mv_sas.c 1226:port 4 attach sas addr is 0
[ 6769.870131] drivers/scsi/mvsas/mv_sas.c 1224:port 5 attach dev info is 0
[ 6769.870140] drivers/scsi/mvsas/mv_sas.c 1226:port 5 attach sas addr is 0
[ 6770.080099] drivers/scsi/mvsas/mv_sas.c 1224:port 6 attach dev info is 2000000
[ 6770.080108] drivers/scsi/mvsas/mv_sas.c 1226:port 6 attach sas addr is 6
[ 6770.290091] drivers/scsi/mvsas/mv_sas.c 1224:port 7 attach dev info is 5000002
[ 6770.290100] drivers/scsi/mvsas/mv_sas.c 1226:port 7 attach sas addr is 7
[ 6770.290111] scsi9 : mvsas
[ 6770.291608] drivers/scsi/mvsas/mv_sas.c 378:phy 0 byte dmaded.
[ 6770.291618] drivers/scsi/mvsas/mv_sas.c 378:phy 1 byte dmaded.
[ 6770.291626] drivers/scsi/mvsas/mv_sas.c 378:phy 2 byte dmaded.
[ 6770.291632] drivers/scsi/mvsas/mv_sas.c 378:phy 3 byte dmaded.
[ 6770.291639] drivers/scsi/mvsas/mv_sas.c 378:phy 6 byte dmaded.
[ 6770.291646] drivers/scsi/mvsas/mv_sas.c 378:phy 7 byte dmaded.
[ 6770.291749] sas: phy-9:0 added to port-9:0, phy_mask:0x1 (               0)
[ 6770.291869] sas: phy-9:1 added to port-9:1, phy_mask:0x2 ( 100000000000000)
[ 6770.291949] sas: phy-9:2 added to port-9:2, phy_mask:0x4 ( 200000000000000)
[ 6770.292010] sas: phy-9:3 added to port-9:3, phy_mask:0x8 ( 300000000000000)
[ 6770.292073] sas: phy-9:6 added to port-9:4, phy_mask:0x40 ( 600000000000000)
[ 6770.292135] sas: phy-9:7 added to port-9:5, phy_mask:0x80 ( 700000000000000)
[ 6770.292154] sas: DOING DISCOVERY on port 0, pid:7283
[ 6770.293877] drivers/scsi/mvsas/mv_sas.c 1388:found dev[0:5] is gone.
[ 6770.294212] sas: sas_ata_phy_reset: Found ATA device.
[ 6770.321998] ata15.00: ATA-8: ST31000528AS, CC34, max UDMA/133
[ 6770.322008] ata15.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 6775.320090] ata15.00: qc timeout (cmd 0xef)
[ 6775.320103] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880224040000 task=ffff880224cf8dc0 slot=ffff880224066520 slot_idx=x0
[ 6775.320116] drivers/scsi/mvsas/mv_sas.c 1718:mvs_abort_task:rc= 5
[ 6775.320125] ata15.00: failed to set xfermode (err_mask=0x4)
[ 6775.320925] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x89800.
[ 6775.320931] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1001001
[ 6775.320942] drivers/scsi/mvsas/mv_sas.c 2111:phy0 Unplug Notice
[ 6775.331012] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
[ 6775.331019] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1001081
[ 6776.068899] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
[ 6776.068908] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x10000
[ 6776.068916] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[0]
[ 6776.078883] drivers/scsi/mvsas/mv_sas.c 1224:port 0 attach dev info is 20004
[ 6776.078883] drivers/scsi/mvsas/mv_sas.c 1226:port 0 attach sas addr is 0
[ 6776.078883] drivers/scsi/mvsas/mv_sas.c 378:phy 0 byte dmaded.
[ 6777.540093] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[0]:rc= 0
[ 6777.540105] sas: sas_ata_phy_reset: Found ATA device.
[ 6777.540164] sas: sas_to_ata_err: Saw error 2.  What to do?
[ 6777.540172] sas: sas_ata_task_done: SAS error 2
[ 6777.540248] ata15.00: failed to IDENTIFY (I/O error, err_mask=0x100)
[ 6777.540254] sas: STUB sas_ata_scr_read
[ 6777.540260] ata15: limiting SATA link speed to 1.5 Gbps
[ 6777.540266] ata15.00: limiting speed to UDMA/133:PIO3
[ 6777.540293] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x89800.
[ 6777.540299] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1001
[ 6777.540310] drivers/scsi/mvsas/mv_sas.c 2111:phy0 Unplug Notice
[ 6777.550384] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
[ 6777.550390] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1081
[ 6777.571696] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
[ 6777.571704] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x10000
[ 6777.571712] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[0]
[ 6777.581683] drivers/scsi/mvsas/mv_sas.c 1224:port 0 attach dev info is 20004
[ 6777.581683] drivers/scsi/mvsas/mv_sas.c 1226:port 0 attach sas addr is 0
[ 6777.581683] drivers/scsi/mvsas/mv_sas.c 378:phy 0 byte dmaded.
[ 6779.760042] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[0]:rc= 0
[ 6779.760054] sas: sas_ata_phy_reset: Found ATA device.
[ 6779.787123] ata15.00: ATA-8: ST31000528AS, CC34, max UDMA/133
[ 6779.787133] ata15.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 6779.826838] ata15.00: configured for UDMA/133
[ 6779.827081] scsi 9:0:0:0: Direct-Access     ATA      ST31000528AS     CC34 PQ: 0 ANSI: 5
[ 6779.827164] sas: DONE DISCOVERY on port 0, pid:7283, result:0
[ 6779.827188] sas: DOING DISCOVERY on port 1, pid:7283
[ 6779.828926] drivers/scsi/mvsas/mv_sas.c 1388:found dev[1:5] is gone.
[ 6779.829306] sas: sas_ata_phy_reset: Found ATA device.
[ 6779.856997] ata16.00: ATA-8: ST31000528AS, CC34, max UDMA/133
[ 6779.857007] ata16.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 6784.860080] ata16.00: qc timeout (cmd 0xec)
[ 6784.860093] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880224040000 task=ffff88022dd92c40 slot=ffff880224066520 slot_idx=x0
[ 6784.860106] drivers/scsi/mvsas/mv_sas.c 1718:mvs_abort_task:rc= 5
[ 6784.860115] ata16.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 6784.860122] ata16.00: revalidation failed (errno=-5)
[ 6784.860923] drivers/scsi/mvsas/mv_sas.c 2083:port 1 ctrl sts=0x89800.
[ 6784.860930] drivers/scsi/mvsas/mv_sas.c 2085:Port 1 irq sts = 0x1001001
[ 6784.860941] drivers/scsi/mvsas/mv_sas.c 2111:phy1 Unplug Notice
[ 6784.871016] drivers/scsi/mvsas/mv_sas.c 2083:port 1 ctrl sts=0x199800.
[ 6784.871022] drivers/scsi/mvsas/mv_sas.c 2085:Port 1 irq sts = 0x1001081
[ 6785.740356] drivers/scsi/mvsas/mv_sas.c 2083:port 1 ctrl sts=0x199800.
[ 6785.740365] drivers/scsi/mvsas/mv_sas.c 2085:Port 1 irq sts = 0x10000
[ 6785.740373] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[1]
[ 6785.785279] drivers/scsi/mvsas/mv_sas.c 1224:port 1 attach dev info is 0
[ 6785.785279] drivers/scsi/mvsas/mv_sas.c 1226:port 1 attach sas addr is 1
[ 6785.850460] drivers/scsi/mvsas/mv_sas.c 378:phy 1 byte dmaded.
[ 6787.080025] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[1]:rc= 0
[ 6787.080031] sas: sas_ata_phy_reset: Found ATA device.
[ 6787.080080] sas: sas_to_ata_err: Saw error 2.  What to do?
[ 6787.080088] sas: sas_ata_task_done: SAS error 2
[ 6787.080165] ata16.00: failed to IDENTIFY (I/O error, err_mask=0x100)
[ 6787.080171] sas: STUB sas_ata_scr_read
[ 6787.080176] ata16: limiting SATA link speed to 1.5 Gbps
[ 6787.080183] ata16.00: limiting speed to UDMA/133:PIO3
[ 6787.080213] drivers/scsi/mvsas/mv_sas.c 2083:port 1 ctrl sts=0x89800.
[ 6787.080219] drivers/scsi/mvsas/mv_sas.c 2085:Port 1 irq sts = 0x1001
[ 6787.080230] drivers/scsi/mvsas/mv_sas.c 2111:phy1 Unplug Notice
[ 6787.090298] drivers/scsi/mvsas/mv_sas.c 2083:port 1 ctrl sts=0x199800.
[ 6787.090304] drivers/scsi/mvsas/mv_sas.c 2085:Port 1 irq sts = 0x1081
[ 6787.111165] drivers/scsi/mvsas/mv_sas.c 2083:port 1 ctrl sts=0x199800.
[ 6787.111173] drivers/scsi/mvsas/mv_sas.c 2085:Port 1 irq sts = 0x10000
[ 6787.111181] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[1]
[ 6787.153464] drivers/scsi/mvsas/mv_sas.c 1224:port 1 attach dev info is 0
[ 6787.153464] drivers/scsi/mvsas/mv_sas.c 1226:port 1 attach sas addr is 1
[ 6787.153464] drivers/scsi/mvsas/mv_sas.c 378:phy 1 byte dmaded.
[ 6789.300083] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[1]:rc= 0
[ 6789.300095] sas: sas_ata_phy_reset: Found ATA device.
[ 6789.327213] ata16.00: ATA-8: ST31000528AS, CC34, max UDMA/133
[ 6789.327223] ata16.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 6789.366778] ata16.00: configured for UDMA/133
[ 6789.367007] scsi 9:0:1:0: Direct-Access     ATA      ST31000528AS     CC34 PQ: 0 ANSI: 5
[ 6789.367134] sas: DONE DISCOVERY on port 1, pid:7283, result:0
[ 6789.367172] sas: DOING DISCOVERY on port 2, pid:7283
[ 6789.368918] drivers/scsi/mvsas/mv_sas.c 1388:found dev[2:5] is gone.
[ 6789.369288] sas: sas_ata_phy_reset: Found ATA device.
[ 6789.397016] ata17.00: ATA-8: ST31000528AS, CC34, max UDMA/133
[ 6789.397026] ata17.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 6794.390105] ata17.00: qc timeout (cmd 0xef)
[ 6794.390121] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880224040000 task=ffff880224130cc0 slot=ffff880224066520 slot_idx=x0
[ 6794.390134] drivers/scsi/mvsas/mv_sas.c 1718:mvs_abort_task:rc= 5
[ 6794.390143] ata17.00: failed to set xfermode (err_mask=0x4)
[ 6794.390952] drivers/scsi/mvsas/mv_sas.c 2083:port 2 ctrl sts=0x89800.
[ 6794.390958] drivers/scsi/mvsas/mv_sas.c 2085:Port 2 irq sts = 0x1001
[ 6794.390969] drivers/scsi/mvsas/mv_sas.c 2111:phy2 Unplug Notice
[ 6794.401034] drivers/scsi/mvsas/mv_sas.c 2083:port 2 ctrl sts=0x199800.
[ 6794.401040] drivers/scsi/mvsas/mv_sas.c 2085:Port 2 irq sts = 0x1081
[ 6795.412489] drivers/scsi/mvsas/mv_sas.c 2083:port 2 ctrl sts=0x199800.
[ 6795.412498] drivers/scsi/mvsas/mv_sas.c 2085:Port 2 irq sts = 0x10000
[ 6795.412506] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[2]
[ 6795.422471] drivers/scsi/mvsas/mv_sas.c 1224:port 2 attach dev info is 2000200
[ 6795.422471] drivers/scsi/mvsas/mv_sas.c 1226:port 2 attach sas addr is 2
[ 6795.521980] drivers/scsi/mvsas/mv_sas.c 378:phy 2 byte dmaded.
[ 6796.610091] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[2]:rc= 0
[ 6796.610103] sas: sas_ata_phy_reset: Found ATA device.
[ 6796.610147] sas: sas_to_ata_err: Saw error 2.  What to do?
[ 6796.610153] sas: sas_ata_task_done: SAS error 2
[ 6796.610183] ata17.00: failed to IDENTIFY (I/O error, err_mask=0x100)
[ 6796.610192] sas: STUB sas_ata_scr_read
[ 6796.610198] ata17: limiting SATA link speed to 1.5 Gbps
[ 6796.610206] ata17.00: limiting speed to UDMA/133:PIO3
[ 6796.610233] drivers/scsi/mvsas/mv_sas.c 2083:port 2 ctrl sts=0x89800.
[ 6796.610240] drivers/scsi/mvsas/mv_sas.c 2085:Port 2 irq sts = 0x1001
[ 6796.610251] drivers/scsi/mvsas/mv_sas.c 2111:phy2 Unplug Notice
[ 6796.620285] drivers/scsi/mvsas/mv_sas.c 2083:port 2 ctrl sts=0x199800.
[ 6796.620295] drivers/scsi/mvsas/mv_sas.c 2085:Port 2 irq sts = 0x1081
[ 6796.642316] drivers/scsi/mvsas/mv_sas.c 2083:port 2 ctrl sts=0x199800.
[ 6796.642319] drivers/scsi/mvsas/mv_sas.c 2085:Port 2 irq sts = 0x10000
[ 6796.642322] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[2]
[ 6796.743532] drivers/scsi/mvsas/mv_sas.c 1224:port 2 attach dev info is 2000200
[ 6796.743532] drivers/scsi/mvsas/mv_sas.c 1226:port 2 attach sas addr is 2
[ 6796.743532] drivers/scsi/mvsas/mv_sas.c 378:phy 2 byte dmaded.
[ 6798.830036] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[2]:rc= 0
[ 6798.830050] sas: sas_ata_phy_reset: Found ATA device.
[ 6798.857151] ata17.00: ATA-8: ST31000528AS, CC34, max UDMA/133
[ 6798.857161] ata17.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 6798.896658] ata17.00: configured for UDMA/133
[ 6798.896874] scsi 9:0:2:0: Direct-Access     ATA      ST31000528AS     CC34 PQ: 0 ANSI: 5
[ 6798.896937] sas: DONE DISCOVERY on port 2, pid:7283, result:0
[ 6798.896964] sas: DOING DISCOVERY on port 3, pid:7283
[ 6798.898669] drivers/scsi/mvsas/mv_sas.c 1388:found dev[3:5] is gone.
[ 6798.899043] sas: sas_ata_phy_reset: Found ATA device.
[ 6798.900882] ata18.00: ATA-8: ST31000523AS, CC35, max UDMA/133
[ 6798.900885] ata18.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 6803.900077] ata18.00: qc timeout (cmd 0xef)
[ 6803.900091] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880224040000 task=ffff880224130400 slot=ffff880224066520 slot_idx=x0
[ 6803.900103] drivers/scsi/mvsas/mv_sas.c 1718:mvs_abort_task:rc= 5
[ 6803.900113] ata18.00: failed to set xfermode (err_mask=0x4)
[ 6803.900942] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x89800.
[ 6803.900949] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x1001
[ 6803.900961] drivers/scsi/mvsas/mv_sas.c 2111:phy3 Unplug Notice
[ 6803.911030] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x199800.
[ 6803.911035] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x1081
[ 6804.517325] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x199800.
[ 6804.517333] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x10000
[ 6804.517341] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[3]
[ 6804.528992] drivers/scsi/mvsas/mv_sas.c 1224:port 3 attach dev info is 2000000
[ 6804.528992] drivers/scsi/mvsas/mv_sas.c 1226:port 3 attach sas addr is 3
[ 6804.627313] drivers/scsi/mvsas/mv_sas.c 378:phy 3 byte dmaded.
[ 6806.120086] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[3]:rc= 0
[ 6806.120103] sas: sas_ata_phy_reset: Found ATA device.
[ 6806.120147] sas: sas_to_ata_err: Saw error 2.  What to do?
[ 6806.120153] sas: sas_ata_task_done: SAS error 2
[ 6806.120244] ata18.00: failed to IDENTIFY (I/O error, err_mask=0x100)
[ 6806.120252] sas: STUB sas_ata_scr_read
[ 6806.120258] ata18: limiting SATA link speed to 1.5 Gbps
[ 6806.120266] ata18.00: limiting speed to UDMA/133:PIO3
[ 6806.120293] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x89800.
[ 6806.120300] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x1001
[ 6806.120311] drivers/scsi/mvsas/mv_sas.c 2111:phy3 Unplug Notice
[ 6806.130385] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x199800.
[ 6806.130392] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x1081
[ 6806.134101] drivers/scsi/mvsas/mv_sas.c 2083:port 3 ctrl sts=0x199800.
[ 6806.134103] drivers/scsi/mvsas/mv_sas.c 2085:Port 3 irq sts = 0x10000
[ 6806.134106] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[3]
[ 6806.171284] drivers/scsi/mvsas/mv_sas.c 1224:port 3 attach dev info is 2000000
[ 6806.171284] drivers/scsi/mvsas/mv_sas.c 1226:port 3 attach sas addr is 3
[ 6806.171284] drivers/scsi/mvsas/mv_sas.c 378:phy 3 byte dmaded.
[ 6808.340071] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[3]:rc= 0
[ 6808.340084] sas: sas_ata_phy_reset: Found ATA device.
[ 6808.341287] ata18.00: ATA-8: ST31000523AS, CC35, max UDMA/133
[ 6808.341297] ata18.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 6808.342630] ata18.00: configured for UDMA/133
[ 6808.342854] scsi 9:0:3:0: Direct-Access     ATA      ST31000523AS     CC35 PQ: 0 ANSI: 5
[ 6808.342940] sas: DONE DISCOVERY on port 3, pid:7283, result:0
[ 6808.342968] sas: DOING DISCOVERY on port 4, pid:7283
[ 6808.344645] drivers/scsi/mvsas/mv_sas.c 1388:found dev[4:5] is gone.
[ 6808.345028] sas: sas_ata_phy_reset: Found ATA device.
[ 6808.372766] ata19.00: ATA-8: ST31000528AS, CC34, max UDMA/133
[ 6808.372775] ata19.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 6813.370091] ata19.00: qc timeout (cmd 0xef)
[ 6813.370109] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880224040000 task=ffff8802241305c0 slot=ffff880224066520 slot_idx=x0
[ 6813.370121] drivers/scsi/mvsas/mv_sas.c 1718:mvs_abort_task:rc= 5
[ 6813.370130] ata19.00: failed to set xfermode (err_mask=0x4)
[ 6813.370911] drivers/scsi/mvsas/mv_sas.c 2083:port 6 ctrl sts=0x89800.
[ 6813.370917] drivers/scsi/mvsas/mv_sas.c 2085:Port 6 irq sts = 0x1001
[ 6813.370932] drivers/scsi/mvsas/mv_sas.c 2111:phy6 Unplug Notice
[ 6813.381007] drivers/scsi/mvsas/mv_sas.c 2083:port 6 ctrl sts=0x199800.
[ 6813.381014] drivers/scsi/mvsas/mv_sas.c 2085:Port 6 irq sts = 0x1081
[ 6814.132874] drivers/scsi/mvsas/mv_sas.c 2083:port 6 ctrl sts=0x199800.
[ 6814.132883] drivers/scsi/mvsas/mv_sas.c 2085:Port 6 irq sts = 0x10000
[ 6814.132891] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[6]
[ 6814.200424] drivers/scsi/mvsas/mv_sas.c 1224:port 6 attach dev info is 2000000
[ 6814.200424] drivers/scsi/mvsas/mv_sas.c 1226:port 6 attach sas addr is 6
[ 6814.242707] drivers/scsi/mvsas/mv_sas.c 378:phy 6 byte dmaded.
[ 6815.590038] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[4]:rc= 0
[ 6815.590050] sas: sas_ata_phy_reset: Found ATA device.
[ 6815.590095] sas: sas_to_ata_err: Saw error 2.  What to do?
[ 6815.590101] sas: sas_ata_task_done: SAS error 2
[ 6815.590118] ata19.00: failed to IDENTIFY (I/O error, err_mask=0x100)
[ 6815.590124] sas: STUB sas_ata_scr_read
[ 6815.590130] ata19: limiting SATA link speed to 1.5 Gbps
[ 6815.590138] ata19.00: limiting speed to UDMA/133:PIO3
[ 6815.590161] drivers/scsi/mvsas/mv_sas.c 2083:port 6 ctrl sts=0x89800.
[ 6815.590167] drivers/scsi/mvsas/mv_sas.c 2085:Port 6 irq sts = 0x1001
[ 6815.590178] drivers/scsi/mvsas/mv_sas.c 2111:phy6 Unplug Notice
[ 6815.600244] drivers/scsi/mvsas/mv_sas.c 2083:port 6 ctrl sts=0x199800.
[ 6815.600251] drivers/scsi/mvsas/mv_sas.c 2085:Port 6 irq sts = 0x1081
[ 6815.621709] drivers/scsi/mvsas/mv_sas.c 2083:port 6 ctrl sts=0x199800.
[ 6815.621712] drivers/scsi/mvsas/mv_sas.c 2085:Port 6 irq sts = 0x10000
[ 6815.621715] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[6]
[ 6815.631701] drivers/scsi/mvsas/mv_sas.c 1224:port 6 attach dev info is 2000000
[ 6815.631701] drivers/scsi/mvsas/mv_sas.c 1226:port 6 attach sas addr is 6
[ 6815.631701] drivers/scsi/mvsas/mv_sas.c 378:phy 6 byte dmaded.
[ 6817.810034] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[4]:rc= 0
[ 6817.810047] sas: sas_ata_phy_reset: Found ATA device.
[ 6817.837142] ata19.00: ATA-8: ST31000528AS, CC34, max UDMA/133
[ 6817.837151] ata19.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 6817.876702] ata19.00: configured for UDMA/133
[ 6817.876963] scsi 9:0:4:0: Direct-Access     ATA      ST31000528AS     CC34 PQ: 0 ANSI: 5
[ 6817.877066] sas: DONE DISCOVERY on port 4, pid:7283, result:0
[ 6817.877093] sas: DOING DISCOVERY on port 5, pid:7283
[ 6817.878838] drivers/scsi/mvsas/mv_sas.c 1388:found dev[5:5] is gone.
[ 6817.879216] sas: sas_ata_phy_reset: Found ATA device.
[ 6817.906941] ata20.00: ATA-8: ST31000528AS, CC34, max UDMA/133
[ 6817.906950] ata20.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 6822.900108] ata20.00: qc timeout (cmd 0xef)
[ 6822.900120] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880224040000 task=ffff88022ae51d00 slot=ffff880224066520 slot_idx=x0
[ 6822.900133] drivers/scsi/mvsas/mv_sas.c 1718:mvs_abort_task:rc= 5
[ 6822.900142] ata20.00: failed to set xfermode (err_mask=0x4)
[ 6822.900956] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x89800.
[ 6822.900963] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x1001
[ 6822.900974] drivers/scsi/mvsas/mv_sas.c 2111:phy7 Unplug Notice
[ 6822.911042] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x199800.
[ 6822.911048] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x1081
[ 6823.803507] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x199800.
[ 6823.803516] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x10000
[ 6823.803524] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[7]
[ 6823.895054] drivers/scsi/mvsas/mv_sas.c 1224:port 7 attach dev info is 5000002
[ 6823.895054] drivers/scsi/mvsas/mv_sas.c 1226:port 7 attach sas addr is 7
[ 6823.913276] drivers/scsi/mvsas/mv_sas.c 378:phy 7 byte dmaded.
[ 6825.120085] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[5]:rc= 0
[ 6825.120097] sas: sas_ata_phy_reset: Found ATA device.
[ 6825.120140] sas: sas_to_ata_err: Saw error 2.  What to do?
[ 6825.120146] sas: sas_ata_task_done: SAS error 2
[ 6825.120176] ata20.00: failed to IDENTIFY (I/O error, err_mask=0x100)
[ 6825.120184] sas: STUB sas_ata_scr_read
[ 6825.120190] ata20: limiting SATA link speed to 1.5 Gbps
[ 6825.120198] ata20.00: limiting speed to UDMA/133:PIO3
[ 6825.120224] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x89800.
[ 6825.120231] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x1001
[ 6825.120242] drivers/scsi/mvsas/mv_sas.c 2111:phy7 Unplug Notice
[ 6825.130285] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x199800.
[ 6825.130296] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x1081
[ 6825.151317] drivers/scsi/mvsas/mv_sas.c 2083:port 7 ctrl sts=0x199800.
[ 6825.151325] drivers/scsi/mvsas/mv_sas.c 2085:Port 7 irq sts = 0x10000
[ 6825.151333] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[7]
[ 6825.232377] drivers/scsi/mvsas/mv_sas.c 1224:port 7 attach dev info is 5000002
[ 6825.232377] drivers/scsi/mvsas/mv_sas.c 1226:port 7 attach sas addr is 7
[ 6825.232377] drivers/scsi/mvsas/mv_sas.c 378:phy 7 byte dmaded.
[ 6827.340101] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[5]:rc= 0
[ 6827.340112] sas: sas_ata_phy_reset: Found ATA device.
[ 6827.367259] ata20.00: ATA-8: ST31000528AS, CC34, max UDMA/133
[ 6827.367268] ata20.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 6827.406888] ata20.00: configured for UDMA/133
[ 6827.407105] scsi 9:0:5:0: Direct-Access     ATA      ST31000528AS     CC34 PQ: 0 ANSI: 5
[ 6827.407229] sas: DONE DISCOVERY on port 5, pid:7283, result:0
[ 6827.407244] sas: sas_form_port: phy0 belongs to port0 already(1)!
[ 6827.407252] sas: sas_form_port: phy1 belongs to port1 already(1)!
[ 6827.407259] sas: sas_form_port: phy2 belongs to port2 already(1)!
[ 6827.407266] sas: sas_form_port: phy3 belongs to port3 already(1)!
[ 6827.407272] sas: sas_form_port: phy6 belongs to port4 already(1)!
[ 6827.407279] sas: sas_form_port: phy7 belongs to port5 already(1)!
[ 6827.407688] sd 9:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[ 6827.408241] sd 9:0:0:0: [sdc] Write Protect is off
[ 6827.408251] sd 9:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 6827.408341] sd 9:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 6827.408403] sd 9:0:2:0: [sde] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[ 6827.408429] sd 9:0:1:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[ 6827.408577] sd 9:0:1:0: [sdd] Write Protect is off
[ 6827.408585] sd 9:0:1:0: [sdd] Mode Sense: 00 3a 00 00
[ 6827.408594] sd 9:0:2:0: [sde] Write Protect is off
[ 6827.408602] sd 9:0:2:0: [sde] Mode Sense: 00 3a 00 00
[ 6827.408640] sd 9:0:1:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 6827.408664] sd 9:0:2:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 6827.408946] sd 9:0:3:0: [sdf] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[ 6827.409203] sd 9:0:3:0: [sdf] Write Protect is off
[ 6827.409211] sd 9:0:3:0: [sdf] Mode Sense: 00 3a 00 00
[ 6827.409257] sd 9:0:3:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 6827.409520] sd 9:0:4:0: [sdg] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[ 6827.409653] sd 9:0:5:0: [sdh] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[ 6827.409689] sd 9:0:4:0: [sdg] Write Protect is off
[ 6827.409696] sd 9:0:4:0: [sdg] Mode Sense: 00 3a 00 00
[ 6827.409761] sd 9:0:4:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 6827.409786] sd 9:0:5:0: [sdh] Write Protect is off
[ 6827.409794] sd 9:0:5:0: [sdh] Mode Sense: 00 3a 00 00
[ 6827.409893] sd 9:0:5:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 6827.423376]  sdc: unknown partition table
[ 6827.423617] sd 9:0:0:0: [sdc] Attached SCSI disk
[ 6827.427047]  sdf: unknown partition table
[ 6827.427091]  sdg: unknown partition table
[ 6827.427193] sd 9:0:3:0: [sdf] Attached SCSI disk
[ 6827.427253] sd 9:0:4:0: [sdg] Attached SCSI disk
[ 6827.427723]  sde: unknown partition table
[ 6827.427859]  sdd: unknown partition table
[ 6827.427917] sd 9:0:2:0: [sde] Attached SCSI disk
[ 6827.428016] sd 9:0:1:0: [sdd] Attached SCSI disk
[ 6827.429659]  sdh: unknown partition table
[ 6827.429792] sd 9:0:5:0: [sdh] Attached SCSI disk

When I see another event, I'll paste it, and then reboot, and see if it
happens the same after a fresh reboot (as this test isn't quite
equivalent to the last test).

Now given I know next to nothing about SAS and SATA controllers,
this is probably a bit presumptuous of me, but I've been thinking about
this for a while. Assuming for a moment that it isn't a hardware problem
(that is, my card, drives, motherboard or cables are not contributing),
then somehow the driver is not handling something entirely correctly.
At some point the card is getting a command it does not like, or can't
handle properly when connected to SATA disks. Possibly it's passing
through a SAS or SCSI command to SATA disks that don't know how to handle
it, and decide to completely ignore the command.

That, of course, is just my best (likely incorrect) guess as to what could
be happening. Please correct me if I'm off base.

At any rate, those messages from rmmod'ing the old driver and modprobing
the new one are strange. It's finding the drive on each port just fine,
but then it immediately fails the xfermode command, decides to reprobe
the port, and for some reason decides to connect in 1.5 Gbps mode rather
than the native 3 Gbps.
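
To make that sequence easier to follow in a long dmesg dump, the per-port libata events can be picked out mechanically. The following is only a sketch in C: the names (`ata_port_of`, `ata_event_of`, `ATA_EV_*`) are made up for illustration, and the matched substrings are simply copied from the log above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical event classes for the libata lines seen in the log. */
enum ata_event {
    ATA_EV_NONE,
    ATA_EV_QC_TIMEOUT,     /* "qc timeout (cmd 0x..)" */
    ATA_EV_XFERMODE_FAIL,  /* "failed to set xfermode" */
    ATA_EV_IDENTIFY_FAIL,  /* "failed to IDENTIFY" */
    ATA_EV_SPEED_LIMIT,    /* "limiting SATA link speed to ..." */
    ATA_EV_CONFIGURED      /* "configured for UDMA/133" */
};

/* Return N for a dmesg line of the form "[ time] ataN...", or -1
 * if the line does not come from an ATA port. */
int ata_port_of(const char *line)
{
    const char *p = strstr(line, "] ata");
    int port;

    if (!p || sscanf(p + 2, "ata%d", &port) != 1)
        return -1;
    return port;
}

/* Classify one dmesg line by substring match. */
enum ata_event ata_event_of(const char *line)
{
    if (ata_port_of(line) < 0)
        return ATA_EV_NONE;
    if (strstr(line, "qc timeout"))
        return ATA_EV_QC_TIMEOUT;
    if (strstr(line, "failed to set xfermode"))
        return ATA_EV_XFERMODE_FAIL;
    if (strstr(line, "failed to IDENTIFY"))
        return ATA_EV_IDENTIFY_FAIL;
    if (strstr(line, "limiting SATA link speed"))
        return ATA_EV_SPEED_LIMIT;
    if (strstr(line, "configured for"))
        return ATA_EV_CONFIGURED;
    return ATA_EV_NONE;
}
```

Feeding each line of the dmesg output through `ata_event_of` and grouping by `ata_port_of` would show whether every port goes through the same timeout, failure, speed-limit and configure steps.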

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca


* Re: mvsas errors in 2.6.36
       [not found]               ` <201012041550372348573@usish.com>
  2010-12-04  8:37                 ` Thomas Fjellstrom
@ 2010-12-04 11:52                 ` Thomas Fjellstrom
  2010-12-04 12:33                 ` jack_wang
  2 siblings, 0 replies; 26+ messages in thread
From: Thomas Fjellstrom @ 2010-12-04 11:52 UTC (permalink / raw)
  To: jack_wang; +Cc: David Milburn, Andre Tomt, Linux Kernel List, linux-scsi

On December 4, 2010, jack_wang wrote:
> Hi, I've done as you requested.
> here's all of the output from the first (and currently only) event:
> [ 1428.000080] sas: command 0xffff880184ed1680, task 0xffff88017a0f2680, timed out: BLK_EH_NOT_HANDLED
> [ 1428.080051] sas: command 0xffff880224e03880, task 0xffff88017a0f24c0, timed out: BLK_EH_NOT_HANDLED
> [ 1428.080077] sas: Enter sas_scsi_recover_host
> [ 1428.080085] sas: trying to find task 0xffff88017a0f2680
> [ 1428.080092] sas: sas_scsi_find_task: aborting task 0xffff88017a0f2680
> [ 1428.080102] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880224040000 task=ffff88017a0f2680 slot=ffff880224066680 slot_idx=x4
> [ 1428.080113] sas: sas_scsi_find_task: querying task 0xffff88017a0f2680
> [ 1428.080119] drivers/scsi/mvsas/mv_sas.c 1632:mvs_query_task:rc= 5
> [ 1428.080125] sas: sas_scsi_find_task: task 0xffff88017a0f2680 failed to abort
> [ 1428.080130] sas: task 0xffff88017a0f2680 is not at LU: I_T recover
> [ 1428.080135] sas: I_T nexus reset for dev 0000000000000000
> [ 1428.080172] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x89800.
> [ 1428.080180] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1001
> [ 1428.080193] drivers/scsi/mvsas/mv_sas.c 2111:phy0 Unplug Notice
> [ 1428.090228] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
> [ 1428.090236] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x1081
> [ 1428.111954] drivers/scsi/mvsas/mv_sas.c 2083:port 0 ctrl sts=0x199800.
> [ 1428.111962] drivers/scsi/mvsas/mv_sas.c 2085:Port 0 irq sts = 0x10000
> [ 1428.111969] drivers/scsi/mvsas/mv_sas.c 2138:notify plug in on phy[0]
> [ 1428.146351] drivers/scsi/mvsas/mv_sas.c 1224:port 0 attach dev info is 20004
> [ 1428.146351] drivers/scsi/mvsas/mv_sas.c 1226:port 0 attach sas addr is 0
> [ 1428.222044] drivers/scsi/mvsas/mv_sas.c 378:phy 0 byte dmaded.
> [ 1428.222109] sas: sas_form_port: phy0 belongs to port0 already(1)!
> [ 1430.300028] drivers/scsi/mvsas/mv_sas.c 1586:mvs_I_T_nexus_reset for device[0]:rc= 0
> [ 1430.300040] sas: I_T 0000000000000000 recovered
> [ 1430.300048] sas: sas_ata_task_done: SAS error 8d
> [ 1430.300059] ata9: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> [ 1430.300883] ata9.00: device reported invalid CHS sector 0
> [ 1430.300888] ata9: status=0x01 { Error }
> [ 1430.300894] ata9: error=0x04 { DriveStatusError }
> [ 1430.300950] sas: trying to find task 0xffff88017a0f24c0
> [ 1430.300956] sas: sas_scsi_find_task: aborting task 0xffff88017a0f24c0
> [ 1430.300963] sas: sas_scsi_find_task: task 0xffff88017a0f24c0 is done
> [ 1430.300968] sas: sas_eh_handle_sas_errors: task 0xffff88017a0f24c0 is done
> [ 1430.300974] sas: sas_ata_task_done: SAS error 8d
> [ 1430.300982] ata12: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> [ 1430.301777] ata12.00: device reported invalid CHS sector 0
> [ 1430.301782] ata12: status=0x01 { Error }
> [ 1430.301788] ata12: error=0x04 { DriveStatusError }
> [ 1430.301808] sas: --- Exit sas_scsi_recover_host
> Thanks.
> 
> [Jack] The error shows that two commands got no response before the timer expired, so the SCSI host entered the error handler
> to query and abort the tasks; both attempts failed, so it tried to reset the device.
> 
> I looked into mvs_abort_task in mv_sas.c:
> 	if (SATA_DEV == dev->dev_type) {
> 		struct mvs_slot_info *slot = task->lldd_task;
> 		struct task_status_struct *tstat;
> 		u32 slot_idx = (u32)(slot - mvi->slot_info);
> 		tstat = &task->task_status;
> 		mv_dprintk(KERN_DEBUG "mv_abort_task() mvi=%p task=%p "
> 			   "slot=%p slot_idx=x%x\n",
> 			   mvi, task, slot, slot_idx);
> 		tstat->stat = SAS_ABORTED_TASK;
> 		if (mvi_dev && mvi_dev->running_req)
> 			mvi_dev->running_req--;
> 		if (sas_protocol_ata(task->task_proto))
> 			mvs_free_reg_set(mvi, mvi_dev);
> 		mvs_slot_task_free(mvi, task, slot, slot_idx);
> 		return -1;
> 		// the return -1 here looks suspicious; you can try removing it
> >
> [Jack] Sorry, please try using return 0 instead to see if it helps.

Here is what I get with that returning 0 rather than -1 as you requested:

[19107.040031] sas: command 0xffff88011c77f9c0, task 0xffff88022ae51600, timed out: BLK_EH_NOT_HANDLED
[19107.040062] sas: Enter sas_scsi_recover_host
[19107.040072] sas: trying to find task 0xffff88022ae51600
[19107.040079] sas: sas_scsi_find_task: aborting task 0xffff88022ae51600
[19107.040089] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880224040000 task=ffff88022ae51600 slot=ffff880224066680 slot_idx=x4
[19107.040101] sas: sas_scsi_find_task: task 0xffff88022ae51600 is aborted
[19107.040107] sas: sas_eh_handle_sas_errors: task 0xffff88022ae51600 is aborted
[19107.040113] sas: sas_ata_task_done: SAS error 8d
[19107.040124] ata21: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[19107.040860] ata21: status=0x01 { Error }
[19107.040866] ata21: error=0x04 { DriveStatusError }
[19107.040886] sas: --- Exit sas_scsi_recover_host
[19318.000085] sas: command 0xffff8801250291c0, task 0xffff88018a8e5b80, timed out: BLK_EH_NOT_HANDLED
[19318.000125] sas: Enter sas_scsi_recover_host
[19318.000135] sas: trying to find task 0xffff88018a8e5b80
[19318.000141] sas: sas_scsi_find_task: aborting task 0xffff88018a8e5b80
[19318.000152] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880224040000 task=ffff88018a8e5b80 slot=ffff8802240666d8 slot_idx=x5
[19318.000163] sas: sas_scsi_find_task: task 0xffff88018a8e5b80 is aborted
[19318.000169] sas: sas_eh_handle_sas_errors: task 0xffff88018a8e5b80 is aborted
[19318.000175] sas: sas_ata_task_done: SAS error 8d
[19318.000185] ata24: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[19318.000896] ata24: status=0x01 { Error }
[19318.000902] ata24: error=0x04 { DriveStatusError }
[19318.000922] sas: --- Exit sas_scsi_recover_host
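
The difference between the two traces (query task plus I_T nexus reset with `return -1`, "task is aborted" with `return 0`) is consistent with how the error handler appears to treat the abort return code. What follows is a simplified userspace model, not the libsas source: `classify_abort` and `needs_nexus_reset` are hypothetical names, the check for an already completed task is left out, and the TMF response values are assumed to match the kernel's SAS headers (the `rc= 5` in the log fits a "function failed" response).

```c
/* TMF response codes; values assumed to match the kernel SAS headers. */
enum {
    TMF_RESP_FUNC_COMPLETE = 0x00,
    TMF_RESP_FUNC_FAILED   = 0x05,
    TMF_RESP_FUNC_SUCC     = 0x08
};

/* Outcomes named after the messages printed by the error handler. */
enum task_disposition {
    TASK_IS_ABORTED,    /* "task ... is aborted" */
    TASK_IS_AT_LU,      /* task still at the logical unit */
    TASK_IS_NOT_AT_LU,  /* "task ... is not at LU" */
    TASK_ABORT_FAILED   /* "task ... failed to abort" */
};

/* Model: abort_rc is what the driver's abort hook returned, query_rc
 * what its query hook returned. The query result only matters when
 * the abort did not report completion. */
enum task_disposition classify_abort(int abort_rc, int query_rc)
{
    if (abort_rc == TMF_RESP_FUNC_COMPLETE)
        return TASK_IS_ABORTED;

    switch (query_rc) {
    case TMF_RESP_FUNC_SUCC:
        return TASK_IS_AT_LU;
    case TMF_RESP_FUNC_COMPLETE:
        return TASK_IS_NOT_AT_LU;
    default:
        return TASK_ABORT_FAILED;
    }
}

/* Assumption drawn from the first trace: a failed abort is followed
 * by "is not at LU: I_T recover", i.e. an I_T nexus reset. */
int needs_nexus_reset(enum task_disposition d)
{
    return d == TASK_IS_NOT_AT_LU || d == TASK_ABORT_FAILED;
}
```

Under this model `classify_abort(-1, 5)` ends in the reset path seen in the first trace, while `classify_abort(0, 5)` stops at "aborted" without touching the phy, which matches the absence of unplug/plug-in messages above.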

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca


* Re: Re: mvsas errors in 2.6.36
       [not found]               ` <201012041550372348573@usish.com>
  2010-12-04  8:37                 ` Thomas Fjellstrom
  2010-12-04 11:52                 ` Thomas Fjellstrom
@ 2010-12-04 12:33                 ` jack_wang
  2010-12-04 12:54                   ` Thomas Fjellstrom
  2 siblings, 1 reply; 26+ messages in thread
From: jack_wang @ 2010-12-04 12:33 UTC (permalink / raw)
  To: thomas; +Cc: David Milburn, Andre Tomt, Linux Kernel List, linux-scsi


Here is what I get with that returning 0 rather than -1 as you requested:
[19107.040031] sas: command 0xffff88011c77f9c0, task 0xffff88022ae51600, timed out: BLK_EH_NOT_HANDLED
[19107.040062] sas: Enter sas_scsi_recover_host
[19107.040072] sas: trying to find task 0xffff88022ae51600
[19107.040079] sas: sas_scsi_find_task: aborting task 0xffff88022ae51600
[19107.040089] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880224040000 task=ffff88022ae51600 slot=ffff880224066680 slot_idx=x4
[19107.040101] sas: sas_scsi_find_task: task 0xffff88022ae51600 is aborted
[19107.040107] sas: sas_eh_handle_sas_errors: task 0xffff88022ae51600 is aborted
[19107.040113] sas: sas_ata_task_done: SAS error 8d
[19107.040124] ata21: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[19107.040860] ata21: status=0x01 { Error }
[19107.040866] ata21: error=0x04 { DriveStatusError }
[19107.040886] sas: --- Exit sas_scsi_recover_host
[19318.000085] sas: command 0xffff8801250291c0, task 0xffff88018a8e5b80, timed out: BLK_EH_NOT_HANDLED
[19318.000125] sas: Enter sas_scsi_recover_host
[19318.000135] sas: trying to find task 0xffff88018a8e5b80
[19318.000141] sas: sas_scsi_find_task: aborting task 0xffff88018a8e5b80
[19318.000152] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880224040000 task=ffff88018a8e5b80 slot=ffff8802240666d8 slot_idx=x5
[19318.000163] sas: sas_scsi_find_task: task 0xffff88018a8e5b80 is aborted
[19318.000169] sas: sas_eh_handle_sas_errors: task 0xffff88018a8e5b80 is aborted
[19318.000175] sas: sas_ata_task_done: SAS error 8d
[19318.000185] ata24: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[19318.000896] ata24: status=0x01 { Error }
[19318.000902] ata24: error=0x04 { DriveStatusError }
[19318.000922] sas: --- Exit sas_scsi_recover_host



[Jack] Are all the drives discovered? Commands are still timing out; maybe the disks need more time to respond, or
something is wrong with the driver. I'm not sure.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: mvsas errors in 2.6.36
  2010-12-04 12:33                 ` jack_wang
@ 2010-12-04 12:54                   ` Thomas Fjellstrom
  2010-12-04 15:44                     ` Thomas Fjellstrom
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Fjellstrom @ 2010-12-04 12:54 UTC (permalink / raw)
  To: jack_wang; +Cc: David Milburn, Andre Tomt, Linux Kernel List, linux-scsi

On December 4, 2010, jack_wang wrote:
> 
> Here is what I get with that returning 0 rather than -1 as you requested:
> [19107.040031] sas: command 0xffff88011c77f9c0, task 0xffff88022ae51600, timed out: BLK_EH_NOT_HANDLED
> [19107.040062] sas: Enter sas_scsi_recover_host
> [19107.040072] sas: trying to find task 0xffff88022ae51600
> [19107.040079] sas: sas_scsi_find_task: aborting task 0xffff88022ae51600
> [19107.040089] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880224040000 task=ffff88022ae51600 slot=ffff880224066680 slot_idx=x4
> [19107.040101] sas: sas_scsi_find_task: task 0xffff88022ae51600 is aborted
> [19107.040107] sas: sas_eh_handle_sas_errors: task 0xffff88022ae51600 is aborted
> [19107.040113] sas: sas_ata_task_done: SAS error 8d
> [19107.040124] ata21: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> [19107.040860] ata21: status=0x01 { Error }
> [19107.040866] ata21: error=0x04 { DriveStatusError }
> [19107.040886] sas: --- Exit sas_scsi_recover_host
> [19318.000085] sas: command 0xffff8801250291c0, task 0xffff88018a8e5b80, timed out: BLK_EH_NOT_HANDLED
> [19318.000125] sas: Enter sas_scsi_recover_host
> [19318.000135] sas: trying to find task 0xffff88018a8e5b80
> [19318.000141] sas: sas_scsi_find_task: aborting task 0xffff88018a8e5b80
> [19318.000152] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880224040000 task=ffff88018a8e5b80 slot=ffff8802240666d8 slot_idx=x5
> [19318.000163] sas: sas_scsi_find_task: task 0xffff88018a8e5b80 is aborted
> [19318.000169] sas: sas_eh_handle_sas_errors: task 0xffff88018a8e5b80 is aborted
> [19318.000175] sas: sas_ata_task_done: SAS error 8d
> [19318.000185] ata24: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> [19318.000896] ata24: status=0x01 { Error }
> [19318.000902] ata24: error=0x04 { DriveStatusError }
> [19318.000922] sas: --- Exit sas_scsi_recover_host
> 
> 
> 
> [Jack] Are all the drives discovered? Commands are still timing out; maybe the disks need more time to respond, or
> something is wrong with the driver. I'm not sure.

All drives come up. That last set of logs is something that happens once
or twice an hour while running. I just rebooted again to see what
difference the change makes with a fresh startup. So far it seems that
the controller is running properly in SATA II/3Gbps mode after the reboot.

Just to contrast what the kernel reports in the two scenarios:
rmmod+modprobe:
sas: DOING DISCOVERY on port 0, pid:7283
drivers/scsi/mvsas/mv_sas.c 1388:found dev[0:5] is gone.
sas: sas_ata_phy_reset: Found ATA device.
ata15.00: ATA-8: ST31000528AS, CC34, max UDMA/133
ata15.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata15.00: qc timeout (cmd 0xef)
[snip mvsas reset]
sas: sas_ata_phy_reset: Found ATA device.
sas: sas_to_ata_err: Saw error 2.  What to do?
sas: sas_ata_task_done: SAS error 2
ata15.00: failed to IDENTIFY (I/O error, err_mask=0x100)
sas: STUB sas_ata_scr_read
ata15: limiting SATA link speed to 1.5 Gbps
ata15.00: limiting speed to UDMA/133:PIO3

fresh boot:
sas: DOING DISCOVERY on port 0, pid:312
drivers/scsi/mvsas/mv_sas.c 1388:found dev[0:5] is gone.
sas: sas_ata_phy_reset: Found ATA device.
ata9.00: ATA-8: ST31000528AS, CC34, max UDMA/133
ata9.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata9.00: configured for UDMA/133

This seems to happen on all ports. As does my original issue, though it
(the original issue) doesn't happen to all ports at the same time, rather
events seem to randomly happen, to one or more ports at random times.

As you can see, the drives are 1TB Seagate SATA II drives. They are set up
in an md-raid 5 array. Luckily these events don't bubble any errors up
the stack and cause a rebuild.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


-- 
Thomas Fjellstrom
thomas@fjellstrom.ca

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: mvsas errors in 2.6.36
  2010-12-04 12:54                   ` Thomas Fjellstrom
@ 2010-12-04 15:44                     ` Thomas Fjellstrom
  2010-12-04 18:22                       ` Thomas Fjellstrom
  2010-12-05  2:08                       ` jack_wang
  0 siblings, 2 replies; 26+ messages in thread
From: Thomas Fjellstrom @ 2010-12-04 15:44 UTC (permalink / raw)
  To: jack_wang; +Cc: David Milburn, Andre Tomt, Linux Kernel List, linux-scsi

On December 4, 2010, Thomas Fjellstrom wrote:
> On December 4, 2010, jack_wang wrote:
> > 
> > Here is what I get with that returning 0 rather than -1 as you requested:
> > [19107.040031] sas: command 0xffff88011c77f9c0, task 0xffff88022ae51600, timed out: BLK_EH_NOT_HANDLED
> > [19107.040062] sas: Enter sas_scsi_recover_host
> > [19107.040072] sas: trying to find task 0xffff88022ae51600
> > [19107.040079] sas: sas_scsi_find_task: aborting task 0xffff88022ae51600
> > [19107.040089] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880224040000 task=ffff88022ae51600 slot=ffff880224066680 slot_idx=x4
> > [19107.040101] sas: sas_scsi_find_task: task 0xffff88022ae51600 is aborted
> > [19107.040107] sas: sas_eh_handle_sas_errors: task 0xffff88022ae51600 is aborted
> > [19107.040113] sas: sas_ata_task_done: SAS error 8d
> > [19107.040124] ata21: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> > [19107.040860] ata21: status=0x01 { Error }
> > [19107.040866] ata21: error=0x04 { DriveStatusError }
> > [19107.040886] sas: --- Exit sas_scsi_recover_host
> > [19318.000085] sas: command 0xffff8801250291c0, task 0xffff88018a8e5b80, timed out: BLK_EH_NOT_HANDLED
> > [19318.000125] sas: Enter sas_scsi_recover_host
> > [19318.000135] sas: trying to find task 0xffff88018a8e5b80
> > [19318.000141] sas: sas_scsi_find_task: aborting task 0xffff88018a8e5b80
> > [19318.000152] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880224040000 task=ffff88018a8e5b80 slot=ffff8802240666d8 slot_idx=x5
> > [19318.000163] sas: sas_scsi_find_task: task 0xffff88018a8e5b80 is aborted
> > [19318.000169] sas: sas_eh_handle_sas_errors: task 0xffff88018a8e5b80 is aborted
> > [19318.000175] sas: sas_ata_task_done: SAS error 8d
> > [19318.000185] ata24: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
> > [19318.000896] ata24: status=0x01 { Error }
> > [19318.000902] ata24: error=0x04 { DriveStatusError }
> > [19318.000922] sas: --- Exit sas_scsi_recover_host
> > 
> > 
> > 
> > [Jack] Are all the drives discovered? Commands are still timing out; maybe the disks need more time to respond, or
> > something is wrong with the driver. I'm not sure.
> 
> All drives come up. That last set of logs is something that happens once
> or twice an hour while running. I just rebooted again to see what
> difference the change makes with a fresh startup. So far it seems that
> the controller is running properly in SATA II/3Gbps mode after the reboot.
> 
> Just to contrast what the kernel reports in the two scenarios:
> rmmod+modprobe:
> sas: DOING DISCOVERY on port 0, pid:7283
> drivers/scsi/mvsas/mv_sas.c 1388:found dev[0:5] is gone.
> sas: sas_ata_phy_reset: Found ATA device.
> ata15.00: ATA-8: ST31000528AS, CC34, max UDMA/133
> ata15.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> ata15.00: qc timeout (cmd 0xef)
> [snip mvsas reset]
> sas: sas_ata_phy_reset: Found ATA device.
> sas: sas_to_ata_err: Saw error 2.  What to do?
> sas: sas_ata_task_done: SAS error 2
> ata15.00: failed to IDENTIFY (I/O error, err_mask=0x100)
> sas: STUB sas_ata_scr_read
> ata15: limiting SATA link speed to 1.5 Gbps
> ata15.00: limiting speed to UDMA/133:PIO3
> 
> fresh boot:
> sas: DOING DISCOVERY on port 0, pid:312
> drivers/scsi/mvsas/mv_sas.c 1388:found dev[0:5] is gone.
> sas: sas_ata_phy_reset: Found ATA device.
> ata9.00: ATA-8: ST31000528AS, CC34, max UDMA/133
> ata9.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> ata9.00: configured for UDMA/133
> 
> This seems to happen on all ports. As does my original issue, though it
> (the original issue) doesn't happen to all ports at the same time, rather
> events seem to randomly happen, to one or more ports at random times.
> 
> As you can see, the drives are 1TB Seagate SATA II drives. They are set up
> in an md-raid 5 array. Luckily these events don't bubble any errors up
> the stack and cause a rebuild.

Even after the reboot it still happens, though with that change, it /seems/
as if the pause is gone, but I can't be sure yet.

[ 6080.020026] sas: command 0xffff880172dfbe80, task 0xffff8800379cbb40, timed out: BLK_EH_NOT_HANDLED
[ 6080.020053] sas: Enter sas_scsi_recover_host
[ 6080.020062] sas: trying to find task 0xffff8800379cbb40
[ 6080.020069] sas: sas_scsi_find_task: aborting task 0xffff8800379cbb40
[ 6080.020079] drivers/scsi/mvsas/mv_sas.c 1703:<7>mv_abort_task() mvi=ffff880222a00000 task=ffff8800379cbb40 slot=ffff880222a26680 slot_idx=x4
[ 6080.020090] sas: sas_scsi_find_task: task 0xffff8800379cbb40 is aborted
[ 6080.020096] sas: sas_eh_handle_sas_errors: task 0xffff8800379cbb40 is aborted
[ 6080.020102] sas: sas_ata_task_done: SAS error 8d
[ 6080.020113] ata9: translated ATA stat/err 0x01/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[ 6080.020931] ata9: status=0x01 { Error }
[ 6080.020937] ata9: error=0x04 { DriveStatusError }
[ 6080.021008] sas: --- Exit sas_scsi_recover_host

Hopefully we can figure out what's causing these errors.



-- 
Thomas Fjellstrom
thomas@fjellstrom.ca

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: mvsas errors in 2.6.36
  2010-12-04 15:44                     ` Thomas Fjellstrom
@ 2010-12-04 18:22                       ` Thomas Fjellstrom
  2010-12-05  2:08                       ` jack_wang
  1 sibling, 0 replies; 26+ messages in thread
From: Thomas Fjellstrom @ 2010-12-04 18:22 UTC (permalink / raw)
  To: jack_wang; +Cc: David Milburn, Andre Tomt, Linux Kernel List, linux-scsi

On December 4, 2010, Thomas Fjellstrom wrote:
> On December 4, 2010, Thomas Fjellstrom wrote:
> > On December 4, 2010, jack_wang wrote:
> > > 
[snip]
> 
> Even after the reboot it still happens, though with that change, it /seems/
> as if the pause is gone, but I can't be sure yet.
> 

Nope, pauses are still here, but they are shorter.

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: mvsas errors in 2.6.36
  2010-12-04 15:44                     ` Thomas Fjellstrom
  2010-12-04 18:22                       ` Thomas Fjellstrom
@ 2010-12-05  2:08                       ` jack_wang
  2010-12-05 20:01                         ` Thomas Fjellstrom
  1 sibling, 1 reply; 26+ messages in thread
From: jack_wang @ 2010-12-05  2:08 UTC (permalink / raw)
  To: thomas; +Cc: David Milburn, Andre Tomt, Linux Kernel List, linux-scsi

On December 4, 2010, Thomas Fjellstrom wrote:
> On December 4, 2010, Thomas Fjellstrom wrote:
> > On December 4, 2010, jack_wang wrote:
> > > 
[snip]
> 
> Even after the reboot it still happens, though with that change, it /seems/
> as if the pause is gone, but I can't be sure yet.
> 
Nope, pauses are still here, but they are shorter.

[Jack] Yes, once the host enters error handling, the SCSI core holds the host (it does not send I/O to the host
until the errors are corrected, which is the pause you see). The main reason the host goes into error handling is that
some commands get no response before the command timer expires; maybe the disks need more time, or the host lost an
interrupt, or there is some other reason. You may need to swap out the disks and the host, part by part, to see what
causes the command timeouts.
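
The command timer mentioned above is exposed per SCSI device in sysfs as
/sys/block/<dev>/device/timeout, in seconds. A minimal sketch for listing it,
assuming the disks show up as sd*; the function name show_scsi_timeouts and
its optional directory argument are only illustrative:

```shell
#!/bin/sh
# Print "<device> <timeout in seconds>" for every sd* disk.
# $1 is the sysfs block directory; it defaults to /sys/block.
show_scsi_timeouts() {
    root=${1:-/sys/block}
    for f in "$root"/sd*/device/timeout; do
        [ -r "$f" ] || continue          # glob did not match or file unreadable
        dev=${f#"$root"/}                # strip the leading directory
        dev=${dev%%/*}                   # keep only the device name, e.g. sdc
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
}
```

Calling show_scsi_timeouts with no argument reads the live values. Writing a
larger number to one disk's timeout file as root (for example
echo 60 > /sys/block/sdc/device/timeout) is one way to check whether the disks
merely need more time; it would not help if the host is losing interrupts.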
-- 
Thomas Fjellstrom
thomas@fjellstrom.ca
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: mvsas errors in 2.6.36
  2010-12-02 13:17           ` Spelic
  2010-12-02 13:37             ` Thomas Fjellstrom
  2010-12-03  2:16             ` Thomas Fjellstrom
@ 2010-12-05 10:45             ` Audio Haven
  2010-12-05 10:58               ` Mikael Abrahamsson
  2 siblings, 1 reply; 26+ messages in thread
From: Audio Haven @ 2010-12-05 10:45 UTC (permalink / raw)
  To: Spelic; +Cc: thomas, linux-scsi

On Thu, Dec 2, 2010 at 2:17 PM, Spelic <spelic@shiftmail.org> wrote:
> It's already applied only in 2.6.37-rc1..rc4

After one year of mvsas troubles, I can finally say that 2.6.37-rc4
fixed all of my mvsas issues. I have been stress testing 2 machines
for days and can't trigger the usual mvsas issues (nexus errors and
unplug notices) anymore.

In the past, on one system all 8 1TB drives of 2 different brands
(Hitachi & Samsung) on a Supermicro AOC-SASLP-MV8 gave me random
errors, such as unplug notices. Large file copies over samba stalled
or were aborted. This could easily be triggered by doing 1 hour of
large IO.

On a second system, newer 1.5 TB drives gave me almost no issues. I
had to stress much longer before something appeared in the log. Both
machines have identical motherboards and the same kernel version. So my
suspicion that some of the older 1 TB drives were bad is now confirmed.

With 2.6.37-rc4, only the real faulty drives are now reported in dmesg
instead of ALL drives after waiting long enough. I found one drive had
12 uncorrectable sectors according to smart, while another drive
always reported DriveStatusError under heavy load. These two drives
were always slowing down the array. After removing them from the RAID6
software raid, speed is now stable.

On the second system with all good drives, the nexus errors and unplug
notices are completely gone.

Thanks to everyone who helped make mvsas stable; it finally works!

Best regards,

Audio Haven (Frederic Vanden Poel)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: mvsas errors in 2.6.36
  2010-12-05 10:45             ` Audio Haven
@ 2010-12-05 10:58               ` Mikael Abrahamsson
  2010-12-06 11:11                 ` Audio Haven
  0 siblings, 1 reply; 26+ messages in thread
From: Mikael Abrahamsson @ 2010-12-05 10:58 UTC (permalink / raw)
  To: Audio Haven; +Cc: linux-scsi

On Sun, 5 Dec 2010, Audio Haven wrote:

> Thanks to everyone who helped making mvsas stable as it finally works!

Did you include smartctl reading under load as well, does that now work 
reliably?

I've had my AOC-SASLP-MV8 lying in a box for about a year because I just 
gave up trying to get it to work, would be very welcome news to see it 
working in 2.6.37.

The changes made, are they backportable to 2.6.32 by any chance?

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: mvsas errors in 2.6.36
  2010-12-05  2:08                       ` jack_wang
@ 2010-12-05 20:01                         ` Thomas Fjellstrom
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Fjellstrom @ 2010-12-05 20:01 UTC (permalink / raw)
  To: jack_wang; +Cc: David Milburn, Andre Tomt, Linux Kernel List, linux-scsi

On December 4, 2010, jack_wang wrote:
> On December 4, 2010, Thomas Fjellstrom wrote:
> > On December 4, 2010, Thomas Fjellstrom wrote:
> > > On December 4, 2010, jack_wang wrote:
> > > > 
> [snip]
> > 
> > Even after the reboot it still happens, though with that change, it /seems/
> > as if the pause is gone, but I can't be sure yet.
> > 
> Nope, pauses are still here, but they are shorter.
> 
> [Jack] Yes, once the host enters error handling, the SCSI core holds the host (it does not send I/O to the host
> until the errors are corrected, which is the pause you see). The main reason the host goes into error handling is that
> some commands get no response before the command timer expires; maybe the disks need more time, or the host lost an
> interrupt, or there is some other reason. You may need to swap out the disks and the host, part by part, to see what
> causes the command timeouts.
> 

Well, so far I see errors from 4 of my 6 disks since I rebooted 30 hours ago. 
And in the past I've seen these errors come from all disks. I'm more inclined 
to believe it's some kind of handling issue than that all of those drives are 
in some way bad. Especially since that older driver I got from Andy Yan did 
not suffer from any of these issues. Of course it had other problems, like 
hotswap oopsing the kernel, but I almost never use hotswap, so it was never an 
issue for me.
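
For reference, a rough sketch of how the per-port tally can be taken from the
kernel log. It assumes the libata line format shown in the logs above
("ataN: error=0x04 { DriveStatusError }"); the function name
count_drive_status_errors is made up for illustration:

```shell
#!/bin/sh
# Read kernel log text on stdin and print "<port> <count>" for every
# ATA port that reported a DriveStatusError.
count_drive_status_errors() {
    awk '/DriveStatusError/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^ata[0-9]+(\.[0-9]+)?:$/) {
                port = $i
                sub(/[.:].*/, "", port)   # "ata9.00:" or "ata9:" becomes "ata9"
                n[port]++
            }
    }
    END { for (p in n) print p, n[p] }' | LC_ALL=C sort
}
```

It would be used as: dmesg | count_drive_status_errors. Mapping an ataN port
back to a physical disk still has to be done by hand from the discovery
messages.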

Now I'm not sure it's related, but I do see this:
[  342.353646] hrtimer: interrupt took 61135 ns
in my dmesg. But that really isn't that long of a pause, at least not by human 
standards. And there's only the one. It happens once just after boot up, and 
then never again (I assume because at bootup the machine is starting up 4 kvm 
VMs /at the same time/).

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: mvsas errors in 2.6.36
  2010-12-05 10:58               ` Mikael Abrahamsson
@ 2010-12-06 11:11                 ` Audio Haven
  2010-12-07 16:30                   ` Benjamin LaHaise
  0 siblings, 1 reply; 26+ messages in thread
From: Audio Haven @ 2010-12-06 11:11 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: linux-scsi

Hello Mikael,

smartctl reporting seems to work fine while xfs_fsr is defragmenting
xfs on top of the raid6 software raid + lvm2:

# for disk in /dev/sd[c-j]; do  smartctl -T permissive -s on -A $disk|grep Raw_Read_Error_Rate ; done
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       2
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  1 Raw_Read_Error_Rate     0x000b   086   086   016    Pre-fail  Always       -       4587581
  1 Raw_Read_Error_Rate     0x000b   086   086   016    Pre-fail  Always       -       2883644
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       12
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       1

No issues appear in dmesg. Notice the difference in Hitachi disks
(very high numbers, likely the actual raw error rate before error
correction, both are Hitachi HDT721010SLA360) vs Samsung disks (only
the unrecoverable, SAMSUNG HD103SI -> they lie, shouldn't be named
'raw').

On Sun, Dec 5, 2010 at 11:58 AM, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
> On Sun, 5 Dec 2010, Audio Haven wrote:
>
>> Thanks to everyone who helped making mvsas stable as it finally works!
>
> Did you include smartctl reading under load as well, does that now work
> reliably?
>
> I've had my AOC-SASLP-MV8 lying in a box for about a year because I just
> gave up trying to get it to work, would be very welcome news to see it
> working in 2.6.37.
>
> The changes made, are they backportable to 2.6.32 by any chance?
>
> --
> Mikael Abrahamsson    email: swmike@swm.pp.se
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: mvsas errors in 2.6.36
  2010-12-06 11:11                 ` Audio Haven
@ 2010-12-07 16:30                   ` Benjamin LaHaise
  0 siblings, 0 replies; 26+ messages in thread
From: Benjamin LaHaise @ 2010-12-07 16:30 UTC (permalink / raw)
  To: Audio Haven; +Cc: Mikael Abrahamsson, linux-scsi

On Mon, Dec 06, 2010 at 12:11:11PM +0100, Audio Haven wrote:
> No issues appear in dmesg. Notice the difference in Hitachi disks
> (very high numbers, likely the actual raw error rate before error
> correction, both are Hitachi HDT721010SLA360) vs Samsung disks (only
> the unrecoverable, SAMSUNG HD103SI -> they lie, shouldn't be named
> 'raw').

This appears to have been the problem I was having after adding in the 
PCI ids for the HighPoint 2720s.  So far I've been running tests on 
2.6.37-rc4+ for the past day without any device errors which is a good 
sign.

		-ben

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: mvsas errors in 2.6.36
  2010-10-31 15:11 ` Thomas Fjellstrom
  2010-11-02 17:02   ` Audio Haven
  2010-11-17  7:53   ` Thomas Fjellstrom
@ 2010-12-07 19:45   ` tomm
  2 siblings, 0 replies; 26+ messages in thread
From: tomm @ 2010-12-07 19:45 UTC (permalink / raw)
  To: linux-scsi

Thomas Fjellstrom <thomas <at> fjellstrom.ca> writes:

[snip]
> > The card is a AOC-SASLP-MV8

Been following this discussion and wondering if you have
finally found a solution to this problem?

Also, have you considered this might be a heat problem with
the card?  That particular Supermicro card has a large heat
sink on the Marvell chip and during operation I've noticed
it gets quite hot.  Perhaps you can try running a fan over
the card and re-running your tests to rule this out.

-tom


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2010-12-07 19:50 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-29 12:50 mvsas errors in 2.6.36 Thomas Fjellstrom
2010-10-31 15:11 ` Thomas Fjellstrom
2010-11-02 17:02   ` Audio Haven
2010-11-17  7:53   ` Thomas Fjellstrom
2010-11-17  8:24     ` Andre Tomt
2010-12-02  6:29       ` Thomas Fjellstrom
2010-12-02  9:48         ` Thomas Fjellstrom
2010-12-02 13:17           ` Spelic
2010-12-02 13:37             ` Thomas Fjellstrom
2010-12-03  2:16             ` Thomas Fjellstrom
2010-12-05 10:45             ` Audio Haven
2010-12-05 10:58               ` Mikael Abrahamsson
2010-12-06 11:11                 ` Audio Haven
2010-12-07 16:30                   ` Benjamin LaHaise
2010-12-03 16:39           ` Thomas Fjellstrom
2010-12-03 20:31             ` David Milburn
2010-12-04  6:57               ` Thomas Fjellstrom
     [not found]               ` <201012041550372348573@usish.com>
2010-12-04  8:37                 ` Thomas Fjellstrom
2010-12-04 11:52                 ` Thomas Fjellstrom
2010-12-04 12:33                 ` jack_wang
2010-12-04 12:54                   ` Thomas Fjellstrom
2010-12-04 15:44                     ` Thomas Fjellstrom
2010-12-04 18:22                       ` Thomas Fjellstrom
2010-12-05  2:08                       ` jack_wang
2010-12-05 20:01                         ` Thomas Fjellstrom
2010-12-07 19:45   ` tomm
