* aic94xx driver woes
@ 2007-03-31 16:48 Douglas Gilbert
  2007-03-31 17:28 ` James Bottomley
  2007-03-31 18:01 ` Darrick J. Wong
  0 siblings, 2 replies; 9+ messages in thread
From: Douglas Gilbert @ 2007-03-31 16:48 UTC
  To: SCSI Mailing List

Every 3 months or so I complain about the aic94xx
SAS low-level driver. Here I go again. Same old story,
so most could just stop reading here.

-----------------------------------------------

I have been asked to look at SMP (SAS Management
Protocol) commands going via the bsg driver to
the SAS transport and onto the aic94xx driver.

My SAS hardware external to my HBAs (i.e. SAS+SATA disks
and some expanders) works just fine if it is connected
to:
  - an LSI Fusion HBA (I have two in the 34xx family)
  - an Adaptec 48300 HBA if and only if it is running
    the _real_ Luben Tuikov aic94xx driver (or a W2K
    driver)

Unfortunately, to run the above test I need to forgo
Luben's driver and use the mainline kernel version.
[The mainline version also has Luben's name on it, but
I think that should be changed, as others have hacked it.]

So what happens when I run the aic94xx driver found
in linux-2.6-block.git bsg branch which says it is
lk 2.6.21-rc5? See below. Basically it times out
sending a REPORT GENERAL SMP request to an expander
(probably the first SMP request sent) and that is it.
No disks or expanders are found. However, the 48300
card's POST scan sees everything (as does the W2K driver).

So I have been reporting this driver as broken for
almost 12 months. Is it just me or my hardware?


Doug Gilbert

Edited highlights from my log:

aic94xx: found Adaptec AIC-9410W SAS/SATA Host Adapter, device 0000:03:04.0
scsi5 : aic94xx
aic94xx: BIOS present (1,1), 1822
aic94xx: ue num:4, ue size:88
aic94xx: manuf sect SAS_ADDR 50000d10002dc000
aic94xx: manuf sect PCBA SN 0BB0C54904WZ
aic94xx: ms: num_phy_desc: 8
aic94xx: ms: phy0: ENABLED
aic94xx: ms: phy1: ENABLED
aic94xx: ms: phy2: ENABLED
aic94xx: ms: phy3: ENABLED
aic94xx: ms: phy4: ENABLED
aic94xx: ms: phy5: ENABLED
aic94xx: ms: phy6: ENABLED
aic94xx: ms: phy7: ENABLED
aic94xx: ms: max_phys:0x8, num_phys:0x8
aic94xx: ms: enabled_phys:0xff
aic94xx: ctrla: phy0: sas_addr: 50000d10002dc000, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
aic94xx: ctrla: phy1: sas_addr: 50000d10002dc000, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
aic94xx: ctrla: phy2: sas_addr: 50000d10002dc000, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
aic94xx: ctrla: phy3: sas_addr: 50000d10002dc000, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
aic94xx: ctrla: phy4: sas_addr: 50000d10002dc000, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
aic94xx: ctrla: phy5: sas_addr: 50000d10002dc000, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
aic94xx: ctrla: phy6: sas_addr: 50000d10002dc000, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
aic94xx: ctrla: phy7: sas_addr: 50000d10002dc000, sas rate:0x9-0x8, sata rate:0x0-0x0, flags:0x0
aic94xx: max_scbs:512, max_ddbs:128
aic94xx: setting phy0 addr to 50000d10002dc000
aic94xx: setting phy1 addr to 50000d10002dc000
aic94xx: setting phy2 addr to 50000d10002dc000
aic94xx: setting phy3 addr to 50000d10002dc000
aic94xx: setting phy4 addr to 50000d10002dc000
aic94xx: setting phy5 addr to 50000d10002dc000
aic94xx: setting phy6 addr to 50000d10002dc000
aic94xx: setting phy7 addr to 50000d10002dc000
aic94xx: Found sequencer Firmware version 1.1 (V17/10c6)
aic94xx: downloading CSEQ...
aic94xx: dma-ing 8192 bytes
aic94xx: verified 8192 bytes, passed
aic94xx: downloading LSEQs...
aic94xx: dma-ing 14336 bytes
aic94xx: LSEQ0 verified 14336 bytes, passed
aic94xx: LSEQ1 verified 14336 bytes, passed
aic94xx: LSEQ2 verified 14336 bytes, passed
aic94xx: LSEQ3 verified 14336 bytes, passed
aic94xx: LSEQ4 verified 14336 bytes, passed
aic94xx: LSEQ5 verified 14336 bytes, passed
aic94xx: LSEQ6 verified 14336 bytes, passed
aic94xx: LSEQ7 verified 14336 bytes, passed
aic94xx: max_scbs:446
aic94xx: first_scb_site_no:0x20
aic94xx: last_scb_site_no:0x1fe
aic94xx: First SCB dma_handle: 0x35189000
aic94xx: device 0000:03:04.0: SAS addr 50000d10002dc000, PCBA SN 0BB0C54904WZ, 8 phys, 8 enabled phys, flash present, BIOS build 1822
aic94xx: posting 3 escbs
aic94xx: escbs posted
aic94xx: posting 8 control phy scbs
aic94xx: control_phy_tasklet_complete: phy0, lrate:0x9, proto:0xe
aic94xx: escb_tasklet_complete: phy0: BYTES_DMAED
aic94xx: SAS proto IDENTIFY:
aic94xx: 00: 20 00 00 02
aic94xx: 04: 00 00 00 00
aic94xx: 08: 00 00 00 00
aic94xx: 0c: 50 06 05 b0
aic94xx: 10: 00 00 33 ef
aic94xx: 14: 06 00 00 00
aic94xx: 18: 00 00 00 00
aic94xx: asd_form_port: updating phy_mask 0x1 for phy0
sas: phy0 added to port0, phy_mask:0x1
sas: DOING DISCOVERY on port 0, pid:2100
aic94xx: scb:0x80 timed out
sas last message repeated 6 times
sas: smp task timed out or aborted
aic94xx: tmf timed out
aic94xx: tmf came back
aic94xx: task not done, clearing nexus
aic94xx: asd_clear_nexus_index: PRE
aic94xx: asd_clear_nexus_index: POST
aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
aic94xx: came back from clear nexus
aic94xx: task not done, clearing nexus
aic94xx: asd_clear_nexus_index: PRE
aic94xx: asd_clear_nexus_index: POST
aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
aic94xx: came back from clear nexus
aic94xx: task 0xf4568ea8 aborted, res: 0x5
sas: SMP task aborted and not done
sas: RG to ex 500605b0000033ef failed:0xffffff06
sas: DONE DISCOVERY on port 0, pid:2100, result:-250




* Re: aic94xx driver woes
  2007-03-31 16:48 aic94xx driver woes Douglas Gilbert
@ 2007-03-31 17:28 ` James Bottomley
  2007-03-31 18:01 ` Darrick J. Wong
  1 sibling, 0 replies; 9+ messages in thread
From: James Bottomley @ 2007-03-31 17:28 UTC
  To: dougg; +Cc: SCSI Mailing List

On Sat, 2007-03-31 at 12:48 -0400, Douglas Gilbert wrote:
> Every 3 months or so I complain about the aic94xx
> SAS low level driver. Here I go again. Same old story
> so most could just stop reading here.
> 
> -----------------------------------------------
> 
> I have been asked to look at SMP (SAS Management
> Protocol) commands going via the bsg driver to
> the SAS transport and onto the aic94xx driver.
> 
> My SAS hardware external to my HBAs (i.e. SAS+SATA disks
> and some expanders) works just fine if it is connected
> to:
>   - a LSI Fusion HBA (I have two in the 34xx family)
>   - an adaptec 48300 HBA if and only if it is running
>     the _real_ Luben Tuikov aic94xx driver (or a W2K
>     driver)
> 
> Unfortunately to run the above test I need to forego
> Luben's driver and use the mainline kernel version.
> [The mainline version also has Luben's name on it but
> I think that should be changed as others have hacked it.]
> 
> So what happens when I run the aic94xx driver found
> in linux-2.6-block.git bsg branch which says it is
> lk 2.6.21-rc5? See below. Basically it times out
> sending a REPORT GENERAL SMP request to an expander
> (probably the first SMP request sent) and that is it.
> No disks or expanders are found. However the 48300
> card's POST scan sees everything (as does the W2K driver).

Hopefully you're right ... there haven't been too many updates to
aic94xx recently.  However, when reporting bugs it is preferable to
confirm them against either a vanilla kernel or scsi-misc-2.6.

> So that is almost 12 months that I have been reporting
> this driver as broken. Is it just me or my hardware?

Impossible to say ... I do know it works for me(tm).

> 
> Doug Gilbert
> 
> Edited highlights from my log:
> 
> [... driver initialization log snipped ...]
> aic94xx: control_phy_tasklet_complete: phy0, lrate:0x9, proto:0xe
> aic94xx: escb_tasklet_complete: phy0: BYTES_DMAED
> aic94xx: SAS proto IDENTIFY:
> aic94xx: 00: 20 00 00 02

Edge Expander talking SMP ... that looks fairly standard

> aic94xx: 04: 00 00 00 00
> aic94xx: 08: 00 00 00 00
> aic94xx: 0c: 50 06 05 b0
> aic94xx: 10: 00 00 33 ef

SAS address 500605b0000033ef

That looks slightly odd for an expander ... expander addresses usually end
in a zero ... is that what the other SAS drivers report the address to be?

> aic94xx: 14: 06 00 00 00

Plugged into expander phy6

> aic94xx: 18: 00 00 00 00
> aic94xx: asd_form_port: updating phy_mask 0x1 for phy0
> sas: phy0 added to port0, phy_mask:0x1
> sas: DOING DISCOVERY on port 0, pid:2100
> aic94xx: scb:0x80 timed out

Definitely a timeout ... my first guess is address mismatch, but it
could be many other things.

> sas last message repeated 6 times
> sas: smp task timed out or aborted
> aic94xx: tmf timed out
> aic94xx: tmf came back
> aic94xx: task not done, clearing nexus
> aic94xx: asd_clear_nexus_index: PRE
> aic94xx: asd_clear_nexus_index: POST
> aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
> aic94xx: asd_clear_nexus_timedout: here
> aic94xx: came back from clear nexus
> aic94xx: task not done, clearing nexus
> aic94xx: asd_clear_nexus_index: PRE
> aic94xx: asd_clear_nexus_index: POST
> aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
> aic94xx: asd_clear_nexus_timedout: here
> aic94xx: came back from clear nexus
> aic94xx: task 0xf4568ea8 aborted, res: 0x5
> sas: SMP task aborted and not done
> sas: RG to ex 500605b0000033ef failed:0xffffff06
> sas: DONE DISCOVERY on port 0, pid:2100, result:-250

Details of your topology would be helpful ... as well as whether you can
get the HBA to see a directly attached device (just in case phy0 is bad
on the HBA).

James



* Re: aic94xx driver woes
  2007-03-31 16:48 aic94xx driver woes Douglas Gilbert
  2007-03-31 17:28 ` James Bottomley
@ 2007-03-31 18:01 ` Darrick J. Wong
  2007-03-31 19:05   ` Douglas Gilbert
  1 sibling, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2007-03-31 18:01 UTC
  To: dougg; +Cc: SCSI Mailing List

Douglas Gilbert wrote:

> So that is almost 12 months that I have been reporting
> this driver as broken. Is it just me or my hardware?

I seem to recall you saying that the LSI Fusion card was plugged into
the same expander as the 48300?  If so, does unplugging the Fusion card
from the expander make it work?

> aic94xx: Found sequencer Firmware version 1.1 (V17/10c6)

Have you tried the V30 sequencer?

--D


* Re: aic94xx driver woes
  2007-03-31 18:01 ` Darrick J. Wong
@ 2007-03-31 19:05   ` Douglas Gilbert
  2007-03-31 23:17     ` James Bottomley
  0 siblings, 1 reply; 9+ messages in thread
From: Douglas Gilbert @ 2007-03-31 19:05 UTC
  To: Darrick J. Wong; +Cc: SCSI Mailing List, James.Bottomley

Darrick J. Wong wrote:
> Douglas Gilbert wrote:
> 
>> So that is almost 12 months that I have been reporting
>> this driver as broken. Is it just me or my hardware?
> 
> I seem to recall you saying that the LSI Fusion card was plugged into
> the same expander as the 48300?  If so, does unplugging the Fusion card
> from the expander make it work?

Darrick,
There is an LSI Fusion card in the adjacent PCI-X
slot, but it wasn't connected to anything, so it
should not have been interfering.

I have another Fusion card in a second machine
which was off. I'll turn the second machine on
now to show the topology of my SAS domain.

Topology (seen from the second machine's MPT Fusion
phy which is both an initiator and a target):

# smp_discover -mb
Device <500605b0000033ef>, expander (only connected phys shown):
  phy   3:S:attached:[500605b00006f260:00  i(SSP+STP+SMP) t(SSP)]  3 Gbps
  phy   5:T:attached:[500605b000000af0:02 exp t(SMP)]  3 Gbps
  phy   6:T:attached:[50000d10002dc000:00  i(SSP+STP+SMP)]  3 Gbps
  phy   9:T:attached:[5000c500005208ee:01  t(SSP)]  3 Gbps
  phy  11:T:attached:[5000c50001b0213a:01  t(SSP)]  3 Gbps

# smp_discover -mb -s 0x500605b000000af0
Device <500605b000000af0>, expander (only connected phys shown):
  phy   2:S:attached:[500605b0000033ef:05 exp t(SMP)]  3 Gbps
  phy  10:T:attached:[5000c50001b02139:00  t(SSP)]  3 Gbps
  phy  11:T:attached:[5000c500005208ed:00  t(SSP)]  3 Gbps

James, note the SAS address of the first expander.

So with the second machine off, the expander entry
on 0x500605b0000033ef phy_id 3 is not there. [The
mainline aic94xx driver fails the same way with the
second machine off or on.]

>> aic94xx: Found sequencer Firmware version 1.1 (V17/10c6)
> 
> Have you tried the V30 sequencer?

No. But I note that Luben's driver is still using
V17/10c6 successfully (in lk 2.6.21-rc4).

How would I know that the official driver needs firmware,
where to get it, and which version is recommended, given
a Kconfig entry like this:

config SCSI_AIC94XX
        tristate "Adaptec AIC94xx SAS/SATA support"
        depends on PCI
        select SCSI_SAS_LIBSAS
        select FW_LOADER
        help
                This driver supports Adaptec's SAS/SATA 3Gb/s 64 bit PCI-X
                AIC94xx chip based host adapters.

config AIC94XX_DEBUG
        bool "Compile in debug mode"
        default y
        depends on SCSI_AIC94XX
        help
                Compiles the aic94xx driver in debug mode.  In debug mode,
                the driver prints some messages to the console.

??

Is there some useful documentation somewhere else?
If so, perhaps a link to it could be placed in the
Kconfig entry.


Doug Gilbert




* Re: aic94xx driver woes
  2007-03-31 19:05   ` Douglas Gilbert
@ 2007-03-31 23:17     ` James Bottomley
  2007-04-01 20:29       ` Douglas Gilbert
  0 siblings, 1 reply; 9+ messages in thread
From: James Bottomley @ 2007-03-31 23:17 UTC
  To: dougg; +Cc: Darrick J. Wong, SCSI Mailing List

On Sat, 2007-03-31 at 15:05 -0400, Douglas Gilbert wrote:
> James, note the SAS address of the first expander.

Thanks, just checking ... what happens when you directly attach a disk?
Or even try the other expander?

James




* Re: aic94xx driver woes
  2007-03-31 23:17     ` James Bottomley
@ 2007-04-01 20:29       ` Douglas Gilbert
  2007-04-02 17:52         ` James Bottomley
  0 siblings, 1 reply; 9+ messages in thread
From: Douglas Gilbert @ 2007-04-01 20:29 UTC
  To: James Bottomley; +Cc: Darrick J. Wong, SCSI Mailing List

James Bottomley wrote:
> On Sat, 2007-03-31 at 15:05 -0400, Douglas Gilbert wrote:
>> James, note the SAS address of the first expander.
> 
> Thanks, just checking ... what happens when you directly attach a disk?

Then I get what I call "udev hell": FC6 gets to the
point during boot-up where it says "Starting udev: ",
hangs for about 5 minutes and then continues.

I don't think my log records what happens during that
long pause. Later attempts to talk to the single SAS
disk (only one of its ports connected) during boot-up
are shown below, starting from the first sign of
trouble. The SAS address of the disk port is
0x5000c50001b02139.

> Or even try the other expander?

Same as yesterday's report:
  sas: RG to ex 500605b000000af0 failed:0xffffff06


If I fiddle with the cabling long enough (i.e. shorten
it), then it will work some of the time. But how come the
card POST, Luben's driver and Adaptec's Windows driver have
no problem with exactly the same wiring all of the
time? I suspect that either the HBA's phys are not
being set up properly, or the first blemish on the
link (e.g. loss of dword synchronization) knocks the
production driver off its perch, while the other
drivers recover and continue.

Doug Gilbert


...
sas: phy3 added to port0, phy_mask:0x8
sas: DOING DISCOVERY on port 0, pid:2110
aic94xx: scb:0x80 timed out
last message repeated 6 times
sas: command 0xf57d5edc, task 0xf527bea8, timed out: EH_NOT_HANDLED
sas: Enter sas_scsi_recover_host
sas: trying to find task 0xf527bea8
sas: sas_scsi_find_task: aborting task 0xf527bea8
aic94xx: tmf timed out
aic94xx: tmf came back
aic94xx: task not done, clearing nexus
aic94xx: asd_clear_nexus_index: PRE
aic94xx: asd_clear_nexus_index: POST
aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
aic94xx: came back from clear nexus
aic94xx: task not done, clearing nexus
aic94xx: asd_clear_nexus_index: PRE
aic94xx: asd_clear_nexus_index: POST
aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
aic94xx: came back from clear nexus
aic94xx: task 0xf527bea8 aborted, res: 0x5
sas: sas_scsi_find_task: querying task 0xf527bea8
aic94xx: tmf timed out
sas: sas_scsi_find_task: task 0xf527bea8 failed to abort
sas: task 0xf527bea8 is not at LU: I_T recover
sas: I_T nexus reset for dev 5000c50001b02139
sas: clearing nexus for port:0
aic94xx: asd_clear_nexus_port: PRE
aic94xx: asd_clear_nexus_port: POST
aic94xx: asd_clear_nexus_port: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
sas: clear nexus ha
aic94xx: asd_clear_nexus_ha: PRE
aic94xx: asd_clear_nexus_ha: POST
aic94xx: asd_clear_nexus_ha: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
sas: error from  device 5000c50001b02139, LUN 0 couldn't be recovered in any way
sas: --- Exit sas_eh_handle_sas_errors -- clear_q
sas: --- Exit sas_scsi_recover_host
sas: command 0xf57d5edc, task 0xf527bea8, timed out: EH_NOT_HANDLED
sas: Enter sas_scsi_recover_host
sas: trying to find task 0xf527bea8
sas: sas_scsi_find_task: aborting task 0xf527bea8
aic94xx: tmf timed out
aic94xx: tmf came back
aic94xx: task not done, clearing nexus
aic94xx: asd_clear_nexus_index: PRE
aic94xx: asd_clear_nexus_index: POST
aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
aic94xx: came back from clear nexus
aic94xx: task not done, clearing nexus
aic94xx: asd_clear_nexus_index: PRE
aic94xx: asd_clear_nexus_index: POST
aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
aic94xx: came back from clear nexus
aic94xx: task 0xf527bea8 aborted, res: 0x5
sas: sas_scsi_find_task: querying task 0xf527bea8
aic94xx: tmf timed out
sas: sas_scsi_find_task: task 0xf527bea8 failed to abort
sas: task 0xf527bea8 is not at LU: I_T recover
sas: I_T nexus reset for dev 5000c50001b02139
sas: clearing nexus for port:0
aic94xx: asd_clear_nexus_port: PRE
aic94xx: asd_clear_nexus_port: POST
aic94xx: asd_clear_nexus_port: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
sas: clear nexus ha
aic94xx: asd_clear_nexus_ha: PRE
aic94xx: asd_clear_nexus_ha: POST
aic94xx: asd_clear_nexus_ha: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
sas: error from  device 5000c50001b02139, LUN 0 couldn't be recovered in any way
sas: --- Exit sas_eh_handle_sas_errors -- clear_q
sas: --- Exit sas_scsi_recover_host
sas: command 0xf57d5edc, task 0xf527bea8, timed out: EH_NOT_HANDLED
sas: Enter sas_scsi_recover_host
sas: trying to find task 0xf527bea8
sas: sas_scsi_find_task: aborting task 0xf527bea8
aic94xx: tmf timed out
aic94xx: tmf came back
aic94xx: task not done, clearing nexus
aic94xx: asd_clear_nexus_index: PRE
aic94xx: asd_clear_nexus_index: POST
aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
aic94xx: came back from clear nexus
aic94xx: task not done, clearing nexus
aic94xx: asd_clear_nexus_index: PRE
aic94xx: asd_clear_nexus_index: POST
aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
aic94xx: came back from clear nexus
aic94xx: task 0xf527bea8 aborted, res: 0x5
sas: sas_scsi_find_task: querying task 0xf527bea8
aic94xx: tmf timed out
sas: sas_scsi_find_task: task 0xf527bea8 failed to abort
sas: task 0xf527bea8 is not at LU: I_T recover
sas: I_T nexus reset for dev 5000c50001b02139
sas: clearing nexus for port:0
aic94xx: asd_clear_nexus_port: PRE
aic94xx: asd_clear_nexus_port: POST
aic94xx: asd_clear_nexus_port: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
sas: clear nexus ha
aic94xx: asd_clear_nexus_ha: PRE
aic94xx: asd_clear_nexus_ha: POST
aic94xx: asd_clear_nexus_ha: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
sas: error from  device 5000c50001b02139, LUN 0 couldn't be recovered in any way
sas: --- Exit sas_eh_handle_sas_errors -- clear_q
sas: --- Exit sas_scsi_recover_host
sas: command 0xf57d5edc, task 0xf527bea8, timed out: EH_NOT_HANDLED
sas: Enter sas_scsi_recover_host
sas: trying to find task 0xf527bea8
sas: sas_scsi_find_task: aborting task 0xf527bea8
aic94xx: tmf timed out
aic94xx: tmf came back
aic94xx: task not done, clearing nexus
aic94xx: asd_clear_nexus_index: PRE
aic94xx: asd_clear_nexus_index: POST
aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
aic94xx: came back from clear nexus
aic94xx: task not done, clearing nexus
aic94xx: asd_clear_nexus_index: PRE
aic94xx: asd_clear_nexus_index: POST
aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
aic94xx: came back from clear nexus
aic94xx: task 0xf527bea8 aborted, res: 0x5
sas: sas_scsi_find_task: querying task 0xf527bea8
aic94xx: tmf timed out
sas: sas_scsi_find_task: task 0xf527bea8 failed to abort
sas: task 0xf527bea8 is not at LU: I_T recover
sas: I_T nexus reset for dev 5000c50001b02139
sas: clearing nexus for port:0
aic94xx: asd_clear_nexus_port: PRE
aic94xx: asd_clear_nexus_port: POST
aic94xx: asd_clear_nexus_port: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
sas: clear nexus ha
aic94xx: asd_clear_nexus_ha: PRE
aic94xx: asd_clear_nexus_ha: POST
aic94xx: asd_clear_nexus_ha: clear nexus posted, waiting...
aic94xx: asd_clear_nexus_timedout: here
sas: error from  device 5000c50001b02139, LUN 0 couldn't be recovered in any way
sas: --- Exit sas_eh_handle_sas_errors -- clear_q
sas: --- Exit sas_scsi_recover_host
sas: DONE DISCOVERY on port 0, pid:2110, result:0







* Re: aic94xx driver woes
  2007-04-01 20:29       ` Douglas Gilbert
@ 2007-04-02 17:52         ` James Bottomley
  2007-04-02 23:36           ` Douglas Gilbert
  0 siblings, 1 reply; 9+ messages in thread
From: James Bottomley @ 2007-04-02 17:52 UTC
  To: dougg; +Cc: Darrick J. Wong, SCSI Mailing List

On Sun, 2007-04-01 at 16:29 -0400, Douglas Gilbert wrote:
> ...
> sas: phy3 added to port0, phy_mask:0x8
> sas: DOING DISCOVERY on port 0, pid:2110
> aic94xx: scb:0x80 timed out

This might be the problem.

I see this periodically when a phy goes out to lunch on my system ...
with me, it always seems to be phy0 of a port containing phy0-4 ... so
phy1-3 still function to get messages.

Can you try sending a link reset to phy3?

It should be something like

echo 1 > /sys/class/sas_phy/phy-X:3/link_reset

and see if it just produces 

aic94xx: scb:0x80 timed out

Again?

Thanks,

James




* Re: aic94xx driver woes
  2007-04-02 17:52         ` James Bottomley
@ 2007-04-02 23:36           ` Douglas Gilbert
  2007-04-03  0:37             ` James Bottomley
  0 siblings, 1 reply; 9+ messages in thread
From: Douglas Gilbert @ 2007-04-02 23:36 UTC
  To: James Bottomley; +Cc: Darrick J. Wong, SCSI Mailing List

James Bottomley wrote:
> On Sun, 2007-04-01 at 16:29 -0400, Douglas Gilbert wrote:
>> ...
>> sas: phy3 added to port0, phy_mask:0x8
>> sas: DOING DISCOVERY on port 0, pid:2110
>> aic94xx: scb:0x80 timed out
> 
> This might be the problem.
> 
> I see this periodically when a phy goes out to lunch on my system ...
> with me, it always seems to be phy0 of a port containing phy0-4 ... so
> phy1-3 still function to get messages.
> 
> Can you try sending a link reset to phy3?
> 
> It should be something like
> 
> echo 1 > /sys/class/sas_phy/phy-X:3/link_reset
> 
> and see if it just produces 
> 
> aic94xx: scb:0x80 timed out

Yes it does.

> Again?

It is repeatable.

Also, when I connect to phy 0 it works (both direct
connect and via the expander). However, phys 1 and 2
react like phy 3 shown above.

Doug Gilbert



* Re: aic94xx driver woes
  2007-04-02 23:36           ` Douglas Gilbert
@ 2007-04-03  0:37             ` James Bottomley
  0 siblings, 0 replies; 9+ messages in thread
From: James Bottomley @ 2007-04-03  0:37 UTC
  To: dougg; +Cc: Darrick J. Wong, SCSI Mailing List

On Mon, 2007-04-02 at 19:36 -0400, Douglas Gilbert wrote:
> > echo 1 > /sys/class/sas_phy/phy-X:3/link_reset
> > 
> > and see if it just produces 
> > 
> > aic94xx: scb:0x80 timed out
> 
> Yes it does.
> 
> > Again?
> 
> It is repeatable.
> 
> Also when I connect to phy 0 it works (both direct
> connect and expander). However phys 1 and 2 react
> like phy 3 shown above.

OK, well, I know what it is, I just don't know how to fix it.

Under certain error conditions, whatever controls the phy SCB processing
seems to freeze on a particular phy.  On my system, it's externally
induced (an expander->expander->satapi configuration).  I can recover my
system by powering the expander setup off and on again.

Whatever this condition is, it blows away all error recovery, since
error recovery is also done by means of ascbs, so any error recovery
ascbs also get stuck until they time out.  Someone with the specs needs
to look and see if there's a way we can kick the phy queue (or whatever
queue it's stuck in).  I'm assuming it's a phy queue because I can get
packets out via other phys, just nothing via the stuck one.

James




Thread overview: 9+ messages
2007-03-31 16:48 aic94xx driver woes Douglas Gilbert
2007-03-31 17:28 ` James Bottomley
2007-03-31 18:01 ` Darrick J. Wong
2007-03-31 19:05   ` Douglas Gilbert
2007-03-31 23:17     ` James Bottomley
2007-04-01 20:29       ` Douglas Gilbert
2007-04-02 17:52         ` James Bottomley
2007-04-02 23:36           ` Douglas Gilbert
2007-04-03  0:37             ` James Bottomley
