linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable
@ 2022-05-25 22:03 Tony Battersby
  2022-05-28  0:27 ` Arun Easi
  0 siblings, 1 reply; 9+ messages in thread
From: Tony Battersby @ 2022-05-25 22:03 UTC (permalink / raw)
  To: Saurav Kashyap, Nilesh Javali, GR-QLogic-Storage-Upstream
  Cc: linux-scsi, linux-kernel, regressions

#regzbot introduced: 44c57f205876

I have several different QLogic FC HBAs (8, 16, 32 Gbps) and several
different FC LTO tape drives (IBM Ultrium 8 & 9).  When I plug in the FC
cable, the tape drive shows up as a SCSI device as expected.  With older
kernels, when I unplug the FC cable, the tape drive SCSI device would
disappear after about 30 seconds.  But with newer kernels (including
5.18), when I unplug the FC cable, the tape drive SCSI device never
disappears.  I have bisected the change in behavior to the following
commit in kernel 5.15:

44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target")

This commit has been backported to various -stable kernels, so they are
also affected.

When testing with two different tape drives:
1) Plug FC cable into tape drive A.  Tape drive A shows up as a SCSI device.
2) Unplug FC cable; wait 60 seconds.  Tape drive A does not disappear.
3) Plug FC cable into tape drive B.  Tape drive A disappears 30 seconds
later, but tape drive B does not show up.
4) Unplug FC cable and plug it back into tape drive B.  Tape drive B
shows up as a SCSI device.

So I can actually make a tape drive disappear by plugging the cable into
a different tape drive, but then I have to reseat the cable again to
make the new tape drive show up.

lspci -n
83:00.0 0c04: 1077:2031 (rev 02)
83:00.1 0c04: 1077:2031 (rev 02)

When plugging in cable:
qla2xxx [0000:83:00.1]-500a:7: LOOP UP detected (8 Gbps).

When unplugging cable with old kernel:
qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
rport-7:0-2: blocked FC remote port time out: removing target and saving binding

When unplugging cable with new kernel:
qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).

/sys/class/fc_remote_ports/rport-*/
dev_loss_tmo: 30
supported_classes: Class 3
port_state: Online
(port_state remains Online even when FC cable unplugged)

/proc/scsi/scsi
Host: scsi7 Channel: 00 Id: 01 Lun: 00
  Vendor: IBM      Model: ULTRIUM-HH8      Rev: K4K1
  Type:   Sequential-Access                ANSI  SCSI revision: 06

Tony Battersby
Cybernetics


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable
  2022-05-25 22:03 [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable Tony Battersby
@ 2022-05-28  0:27 ` Arun Easi
  2022-06-20  6:56   ` Thorsten Leemhuis
  0 siblings, 1 reply; 9+ messages in thread
From: Arun Easi @ 2022-05-28  0:27 UTC (permalink / raw)
  To: Tony Battersby
  Cc: Saurav Kashyap, Nilesh Javali, GR-QLogic-Storage-Upstream,
	linux-scsi, linux-kernel, regressions

[-- Attachment #1: Type: text/plain, Size: 2590 bytes --]

Hi Tony,

Thanks for reporting the issue. We are trying to recreate this issue in 
house. I will reach out to you for logs, if we cannot repro. Typically, we 
get sufficient context to the issue when the problem is reproduced with 
module parameter "ql2xextended_error_logging=1".

Anyway, I will let you know the status.

Regards,
-Arun

On Wed, 25 May 2022, 3:03pm, Tony Battersby wrote:

> #regzbot introduced: 44c57f205876
> 
> I have several different QLogic FC HBAs (8, 16, 32 Gbps) and several
> different FC LTO tape drives (IBM Ultrium 8 & 9).  When I plug in the FC
> cable, the tape drive shows up as a SCSI device as expected.  With older
> kernels, when I unplug the FC cable, the tape drive SCSI device would
> disappear after about 30 seconds.  But with newer kernels (including
> 5.18), when I unplug the FC cable, the tape drive SCSI device never
> disappears.  I have bisected the change in behavior to the following
> commit in kernel 5.15:
> 
> 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target")
> 
> This commit has been backported to various -stable kernels, so they are
> also affected.
> 
> When testing with two different tape drives:
> 1) Plug FC cable into tape drive A.  Tape drive A shows up as a SCSI device.
> 2) Unplug FC cable; wait 60 seconds.  Tape drive A does not disappear.
> 3) Plug FC cable into tape drive B.  Tape drive A disappears 30 seconds
> later, but tape drive B does not show up.
> 4) Unplug FC cable and plug it back into tape drive B.  Tape drive B
> shows up as a SCSI device.
> 
> So I can actually make a tape drive disappear by plugging the cable into
> a different tape drive, but then I have to reseat the cable again to
> make the new tape drive show up.
> 
> lspci -n
> 83:00.0 0c04: 1077:2031 (rev 02)
> 83:00.1 0c04: 1077:2031 (rev 02)
> 
> When plugging in cable:
> qla2xxx [0000:83:00.1]-500a:7: LOOP UP detected (8 Gbps).
> 
> When unplugging cable with old kernel:
> qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
> rport-7:0-2: blocked FC remote port time out: removing target and saving binding
> 
> When unplugging cable with new kernel:
> qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
> 
> /sys/class/fc_remote_ports/rport-*/
> dev_loss_tmo: 30
> supported_classes: Class 3
> port_state: Online
> (port_state remains Online even when FC cable unplugged)
> 
> /proc/scsi/scsi
> Host: scsi7 Channel: 00 Id: 01 Lun: 00
>   Vendor: IBM      Model: ULTRIUM-HH8      Rev: K4K1
>   Type:   Sequential-Access                ANSI  SCSI revision: 06
> 
> Tony Battersby
> Cybernetics
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable
  2022-05-28  0:27 ` Arun Easi
@ 2022-06-20  6:56   ` Thorsten Leemhuis
  2022-06-20 14:33     ` Tony Battersby
  0 siblings, 1 reply; 9+ messages in thread
From: Thorsten Leemhuis @ 2022-06-20  6:56 UTC (permalink / raw)
  To: Arun Easi, Tony Battersby
  Cc: Saurav Kashyap, Nilesh Javali, GR-QLogic-Storage-Upstream,
	linux-scsi, linux-kernel, regressions

Hi!

On 28.05.22 02:27, Arun Easi wrote:
> Hi Tony,
> 
> Thanks for reporting the issue. We are trying to recreate this issue in 
> house. I will reach out to you for logs, if we cannot repro. Typically, we 
> get sufficient context to the issue when the problem is reproduced with 
> module parameter "ql2xextended_error_logging=1".
> 
> Anyway, I will let you know the status.

What's the status here? Tony, did you provide the info Arun asked for=
Or was some progress made somehow without it?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

> On Wed, 25 May 2022, 3:03pm, Tony Battersby wrote:
> 
>> #regzbot introduced: 44c57f205876
>>
>> I have several different QLogic FC HBAs (8, 16, 32 Gbps) and several
>> different FC LTO tape drives (IBM Ultrium 8 & 9).  When I plug in the FC
>> cable, the tape drive shows up as a SCSI device as expected.  With older
>> kernels, when I unplug the FC cable, the tape drive SCSI device would
>> disappear after about 30 seconds.  But with newer kernels (including
>> 5.18), when I unplug the FC cable, the tape drive SCSI device never
>> disappears.  I have bisected the change in behavior to the following
>> commit in kernel 5.15:
>>
>> 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target")
>>
>> This commit has been backported to various -stable kernels, so they are
>> also affected.
>>
>> When testing with two different tape drives:
>> 1) Plug FC cable into tape drive A.  Tape drive A shows up as a SCSI device.
>> 2) Unplug FC cable; wait 60 seconds.  Tape drive A does not disappear.
>> 3) Plug FC cable into tape drive B.  Tape drive A disappears 30 seconds
>> later, but tape drive B does not show up.
>> 4) Unplug FC cable and plug it back into tape drive B.  Tape drive B
>> shows up as a SCSI device.
>>
>> So I can actually make a tape drive disappear by plugging the cable into
>> a different tape drive, but then I have to reseat the cable again to
>> make the new tape drive show up.
>>
>> lspci -n
>> 83:00.0 0c04: 1077:2031 (rev 02)
>> 83:00.1 0c04: 1077:2031 (rev 02)
>>
>> When plugging in cable:
>> qla2xxx [0000:83:00.1]-500a:7: LOOP UP detected (8 Gbps).
>>
>> When unplugging cable with old kernel:
>> qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
>> rport-7:0-2: blocked FC remote port time out: removing target and saving binding
>>
>> When unplugging cable with new kernel:
>> qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
>>
>> /sys/class/fc_remote_ports/rport-*/
>> dev_loss_tmo: 30
>> supported_classes: Class 3
>> port_state: Online
>> (port_state remains Online even when FC cable unplugged)
>>
>> /proc/scsi/scsi
>> Host: scsi7 Channel: 00 Id: 01 Lun: 00
>>   Vendor: IBM      Model: ULTRIUM-HH8      Rev: K4K1
>>   Type:   Sequential-Access                ANSI  SCSI revision: 06
>>
>> Tony Battersby
>> Cybernetics
>>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable
  2022-06-20  6:56   ` Thorsten Leemhuis
@ 2022-06-20 14:33     ` Tony Battersby
  2022-06-21 22:05       ` [EXT] " Arun Easi
  0 siblings, 1 reply; 9+ messages in thread
From: Tony Battersby @ 2022-06-20 14:33 UTC (permalink / raw)
  To: Thorsten Leemhuis, Arun Easi
  Cc: Saurav Kashyap, Nilesh Javali, GR-QLogic-Storage-Upstream,
	linux-scsi, linux-kernel, regressions

On 6/20/22 02:56, Thorsten Leemhuis wrote:
> Hi!
>
> On 28.05.22 02:27, Arun Easi wrote:
>> Hi Tony,
>>
>> Thanks for reporting the issue. We are trying to recreate this issue in 
>> house. I will reach out to you for logs, if we cannot repro. Typically, we 
>> get sufficient context to the issue when the problem is reproduced with 
>> module parameter "ql2xextended_error_logging=1".
>>
>> Anyway, I will let you know the status.
> What's the status here? Tony, did you provide the info Arun asked for=
> Or was some progress made somehow without it?
>
I was waiting for them to reproduce the problem, or ask for logs. 
Anyway, here are the logs they said they might want:

modprobe qla2xxx ql2xextended_error_logging=1

Jun 20 13:51:46 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.02.07.400-k-debug.
Jun 20 13:51:46 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-011c: : MSI-X vector count: 32.
Jun 20 13:51:46 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-001d: : Found an ISP2031 irq 37 iobase 0x00000000875338f3.
Jun 20 13:51:46 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00c6:6: MSI-X: Using 26 vectors
Jun 20 13:51:47 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-0075:6: ZIO mode 6 enabled; timer delay (200 us).
Jun 20 13:51:47 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-ffff:6: FC4 priority set to NVMe
Jun 20 13:51:48 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-507b:6: SFP detect: Short-Range SFP  (nvr=0 ll=40 lr=0 lrd=0).
Jun 20 13:51:50 tony13 kern.info kernel: scsi host6: qla2xxx
Jun 20 13:51:50 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4000:6: DPC handler sleeping.
Jun 20 13:51:51 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-400f:6: Loop resync scheduled.
Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00fb:6: QLogic QLE2672 - QLE2672 QLogic 2-port 16Gb Fibre Channel Adapter.
Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00fc:6: ISP2031: PCIe (8.0GT/s x8) @ 0000:83:00.0 hdma- host#=6 fw=8.07.12 (d0d5).
Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-011c: : MSI-X vector count: 32.
Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-001d: : Found an ISP2031 irq 135 iobase 0x000000007cfbf1eb.
Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00c6:7: MSI-X: Using 26 vectors
Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-0075:7: ZIO mode 6 enabled; timer delay (200 us).
Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-ffff:7: FC4 priority set to NVMe
Jun 20 13:51:57 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-507b:7: SFP detect: Short-Range SFP  (nvr=0 ll=40 lr=0 lrd=0).
Jun 20 13:51:58 tony13 kern.info kernel: scsi host7: qla2xxx
Jun 20 13:51:58 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.
Jun 20 13:51:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400f:7: Loop resync scheduled.
Jun 20 13:52:03 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00fb:7: QLogic QLE2672 - QLE2672 QLogic 2-port 16Gb Fibre Channel Adapter.
Jun 20 13:52:03 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00fc:7: ISP2031: PCIe (8.0GT/s x8) @ 0000:83:00.1 hdma- host#=7 fw=8.07.12 (d0d5).
Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-8038:6: Cable is unplugged...
Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-803a:6: fw_state=4 (7, 141, 0, 800 0) curr time=ffffc764.
Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-206c:6: qla2x00_loop_resync *** FAILED ***.
Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4010:6: Loop resync end.
Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4000:6: DPC handler sleeping.
Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-8038:7: Cable is unplugged...
Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-803a:7: fw_state=4 (7, 141, 0, 800 0) curr time=ffffcaac.
Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-206c:7: qla2x00_loop_resync *** FAILED ***.
Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4010:7: Loop resync end.
Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.

Plug in cable

Jun 20 13:52:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5009:7: LIP occurred (0).
Jun 20 13:52:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b7:7: Format 0 : Number of VPs setup 254, number of VPs acquired 1.
Jun 20 13:52:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b8:7: Primary port id 0000ef.
Jun 20 13:52:58 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-500a:7: LOOP UP detected (8 Gbps).
Jun 20 13:52:58 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5012:7: Port database changed ffff 0006 0000.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400f:7: Loop resync scheduled.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-8037:7: F/W Ready - OK.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-803a:7: fw_state=3 (7, 141, 0, 800 0) curr time=ffffd9e8.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-200b:7: HBA in NL topology.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2014:7: Configure loop -- dpc flags = 0x112e0.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2011:7: Entries in ID list (1).
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20d8:7: qla24xx_fcport_handle_login 50:00:e1:11:c5:27:f0:70 DS 0 LS 7 P 0 fl 0 confl 0000000000000000 rscn 0|0 login 0 lid 0 scan 2 fc4type 0
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-307b:7: qla_chk_n2n_b4_login 50:00:e1:11:c5:27:f0:70 DS 0 LS 7 lid 0 retries=30
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20bf:7: qla_chk_n2n_b4_login 1628 50:00:e1:11:c5:27:f0:70 post login
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2069:7: LOOP READY.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-206b:7: qla2x00_configure_loop: exiting normally. local port wwpn 2100000e1e2227a1 id 0000ef)
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4010:7: Loop resync end.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: DELETED to LOGIN_PEND - portid=000026.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2072:7: Async-login - 50:00:e1:11:c5:27:f0:70 hdl=2, loopid=0 portid=000026 retries=29 .
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5036:7: Async-login complete: handle=2 pid=000026 wwpn=50:00:e1:11:c5:27:f0:70 iop0=312
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20dd:7: qla2x00_async_login_sp_done 50:00:e1:11:c5:27:f0:70 res 0 
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-ffff:7: qla24xx_handle_plogi_done_event 50:00:e1:11:c5:27:f0:70 DS 3 LS 7 rc 0 login 0|0 rscn 0|0 data 4000|0 iop 0|0
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ea:7: qla24xx_handle_plogi_done_event 2232 50:00:e1:11:c5:27:f0:70 LoopID 0x0 in use with 000026. post gpdb
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: LOGIN_PEND to GPDB - portid=000026.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20dc:7: Async-gpdb 50:00:e1:11:c5:27:f0:70 hndl 0 opt 0
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20db:7: Async done-gpdb res 0, WWPN 50:00:e1:11:c5:27:f0:70 mb[1]=0 mb[2]=ffdd 
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20d2:7: qla24xx_handle_gpdb_event 50:00:e1:11:c5:27:f0:70 DS 5 LS 6 fc4_type 0 rc 0
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2062:7: 50:00:e1:11:c5:27:f0:70 SVC Param w3 0312
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: GPDB to UPD_FCPORT - portid=000026.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ef:7: qla2x00_update_fcport 50:00:e1:11:c5:27:f0:70
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: UPD_FCPORT to UPD_FCPORT - portid=000026.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-207d:7: FCPort 50:00:e1:11:c5:27:f0:70 state transitioned from UNCONFIGURED to ONLINE - portid=000026.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ee:7: qla2x00_reg_remote_port: 5000e111c527f070. rport 7:0:0 (00000000bd175424) is tgt mode
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: UPD_FCPORT to LOGIN_COMPLETE - portid=000026.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-911e:7: qla_register_fcport_fn rscn gen 0/0 next DS 0
Jun 20 13:52:59 tony13 kern.notice kernel: scsi 7:0:0:0: Sequential-Access IBM      ULTRIUM-HH8      K4K1 PQ: 0 ANSI: 6
Jun 20 13:52:59 tony13 kern.notice kernel: scsi 7:0:0:0: Attached scsi generic sg1 type 1
Jun 20 13:52:59 tony13 kern.info kernel: st: Version 20160209, fixed bufsize 32768, s/g segs 256
Jun 20 13:52:59 tony13 kern.notice kernel: st 7:0:0:0: Attached scsi tape st0
Jun 20 13:52:59 tony13 kern.info kernel: st 7:0:0:0: st0: try direct i/o: yes (alignment 8 B)

Unplug cable

Jun 20 13:54:03 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
Jun 20 13:54:03 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20f1:7: Mark all dev lost

sleep 180

cat /proc/scsi/scsi 
Attached devices:
Host: scsi7 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM      Model: ULTRIUM-HH8      Rev: K4K1
  Type:   Sequential-Access                ANSI  SCSI revision: 06

rmmod qla2xxx

Jun 20 14:00:18 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-211a:7: Delaying session delete for FCP2 flags 0x4 port_type = 0x4 port_id=000026 50
Jun 20 14:00:18 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-b079:7: Removing driver
Jun 20 14:00:18 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00af:7: Performing ISP error recovery - ha=00000000c584070c.
Jun 20 14:00:18 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20f1:7: Mark all dev lost
Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-211a:7: Delaying session delete for FCP2 flags 0x4 port_type = 0x4 port_id=000026 50
Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4011:7: DPC handler exiting.
Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-207d:7: FCPort 50:00:e1:11:c5:27:f0:70 state transitioned from ONLINE to DEAD - portid=000026.
Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-b079:6: Removing driver
Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00af:6: Performing ISP error recovery - ha=000000008a8560d1.
Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-20f1:6: Mark all dev lost
Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4011:6: DPC handler exiting.

Tony Battersby
Cybernetics

> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>
> P.S.: As the Linux kernel's regression tracker I deal with a lot of
> reports and sometimes miss something important when writing mails like
> this. If that's the case here, don't hesitate to tell me in a public
> reply, it's in everyone's interest to set the public record straight.
>
>> On Wed, 25 May 2022, 3:03pm, Tony Battersby wrote:
>>
>>> #regzbot introduced: 44c57f205876
>>>
>>> I have several different QLogic FC HBAs (8, 16, 32 Gbps) and several
>>> different FC LTO tape drives (IBM Ultrium 8 & 9).  When I plug in the FC
>>> cable, the tape drive shows up as a SCSI device as expected.  With older
>>> kernels, when I unplug the FC cable, the tape drive SCSI device would
>>> disappear after about 30 seconds.  But with newer kernels (including
>>> 5.18), when I unplug the FC cable, the tape drive SCSI device never
>>> disappears.  I have bisected the change in behavior to the following
>>> commit in kernel 5.15:
>>>
>>> 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target")
>>>
>>> This commit has been backported to various -stable kernels, so they are
>>> also affected.
>>>
>>> When testing with two different tape drives:
>>> 1) Plug FC cable into tape drive A.  Tape drive A shows up as a SCSI device.
>>> 2) Unplug FC cable; wait 60 seconds.  Tape drive A does not disappear.
>>> 3) Plug FC cable into tape drive B.  Tape drive A disappears 30 seconds
>>> later, but tape drive B does not show up.
>>> 4) Unplug FC cable and plug it back into tape drive B.  Tape drive B
>>> shows up as a SCSI device.
>>>
>>> So I can actually make a tape drive disappear by plugging the cable into
>>> a different tape drive, but then I have to reseat the cable again to
>>> make the new tape drive show up.
>>>
>>> lspci -n
>>> 83:00.0 0c04: 1077:2031 (rev 02)
>>> 83:00.1 0c04: 1077:2031 (rev 02)
>>>
>>> When plugging in cable:
>>> qla2xxx [0000:83:00.1]-500a:7: LOOP UP detected (8 Gbps).
>>>
>>> When unplugging cable with old kernel:
>>> qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
>>> rport-7:0-2: blocked FC remote port time out: removing target and saving binding
>>>
>>> When unplugging cable with new kernel:
>>> qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
>>>
>>> /sys/class/fc_remote_ports/rport-*/
>>> dev_loss_tmo: 30
>>> supported_classes: Class 3
>>> port_state: Online
>>> (port_state remains Online even when FC cable unplugged)
>>>
>>> /proc/scsi/scsi
>>> Host: scsi7 Channel: 00 Id: 01 Lun: 00
>>>   Vendor: IBM      Model: ULTRIUM-HH8      Rev: K4K1
>>>   Type:   Sequential-Access                ANSI  SCSI revision: 06
>>>
>>> Tony Battersby
>>> Cybernetics
>>>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [EXT] Re: [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable
  2022-06-20 14:33     ` Tony Battersby
@ 2022-06-21 22:05       ` Arun Easi
  2022-06-22 14:56         ` Tony Battersby
  0 siblings, 1 reply; 9+ messages in thread
From: Arun Easi @ 2022-06-21 22:05 UTC (permalink / raw)
  To: Tony Battersby
  Cc: Thorsten Leemhuis, Saurav Kashyap, Nilesh Javali,
	GR-QLogic-Storage-Upstream, linux-scsi, linux-kernel,
	regressions

[-- Attachment #1: Type: text/plain, Size: 13541 bytes --]

Hi Tony,

Apologies for the delay. This was not forgotten, there were repro attempts 
and discussions happening internally.

See further comments inline..

On Mon, 20 Jun 2022, 7:33am, Tony Battersby wrote:

> 
> ----------------------------------------------------------------------
> On 6/20/22 02:56, Thorsten Leemhuis wrote:
> > Hi!
> >
> > On 28.05.22 02:27, Arun Easi wrote:
> >> Hi Tony,
> >>
> >> Thanks for reporting the issue. We are trying to recreate this issue in 
> >> house. I will reach out to you for logs, if we cannot repro. Typically, we 
> >> get sufficient context to the issue when the problem is reproduced with 
> >> module parameter "ql2xextended_error_logging=1".
> >>
> >> Anyway, I will let you know the status.
> > What's the status here? Tony, did you provide the info Arun asked for=
> > Or was some progress made somehow without it?
> >
> I was waiting for them to reproduce the problem, or ask for logs. 
> Anyway, here are the logs they said they might want:
> 
> modprobe qla2xxx ql2xextended_error_logging=1
> 
> Jun 20 13:51:46 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.02.07.400-k-debug.
> Jun 20 13:51:46 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-011c: : MSI-X vector count: 32.
> Jun 20 13:51:46 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-001d: : Found an ISP2031 irq 37 iobase 0x00000000875338f3.
> Jun 20 13:51:46 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00c6:6: MSI-X: Using 26 vectors
> Jun 20 13:51:47 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-0075:6: ZIO mode 6 enabled; timer delay (200 us).
> Jun 20 13:51:47 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-ffff:6: FC4 priority set to NVMe
> Jun 20 13:51:48 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-507b:6: SFP detect: Short-Range SFP  (nvr=0 ll=40 lr=0 lrd=0).
> Jun 20 13:51:50 tony13 kern.info kernel: scsi host6: qla2xxx
> Jun 20 13:51:50 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4000:6: DPC handler sleeping.
> Jun 20 13:51:51 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-400f:6: Loop resync scheduled.
> Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00fb:6: QLogic QLE2672 - QLE2672 QLogic 2-port 16Gb Fibre Channel Adapter.
> Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00fc:6: ISP2031: PCIe (8.0GT/s x8) @ 0000:83:00.0 hdma- host#=6 fw=8.07.12 (d0d5).
> Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-011c: : MSI-X vector count: 32.
> Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-001d: : Found an ISP2031 irq 135 iobase 0x000000007cfbf1eb.
> Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00c6:7: MSI-X: Using 26 vectors
> Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-0075:7: ZIO mode 6 enabled; timer delay (200 us).
> Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-ffff:7: FC4 priority set to NVMe
> Jun 20 13:51:57 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-507b:7: SFP detect: Short-Range SFP  (nvr=0 ll=40 lr=0 lrd=0).
> Jun 20 13:51:58 tony13 kern.info kernel: scsi host7: qla2xxx
> Jun 20 13:51:58 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.
> Jun 20 13:51:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400f:7: Loop resync scheduled.
> Jun 20 13:52:03 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00fb:7: QLogic QLE2672 - QLE2672 QLogic 2-port 16Gb Fibre Channel Adapter.
> Jun 20 13:52:03 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00fc:7: ISP2031: PCIe (8.0GT/s x8) @ 0000:83:00.1 hdma- host#=7 fw=8.07.12 (d0d5).
> Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-8038:6: Cable is unplugged...
> Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-803a:6: fw_state=4 (7, 141, 0, 800 0) curr time=ffffc764.
> Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-206c:6: qla2x00_loop_resync *** FAILED ***.
> Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4010:6: Loop resync end.
> Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4000:6: DPC handler sleeping.
> Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-8038:7: Cable is unplugged...
> Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-803a:7: fw_state=4 (7, 141, 0, 800 0) curr time=ffffcaac.
> Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-206c:7: qla2x00_loop_resync *** FAILED ***.
> Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4010:7: Loop resync end.
> Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.
> 
> Plug in cable
> 
> Jun 20 13:52:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5009:7: LIP occurred (0).
> Jun 20 13:52:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b7:7: Format 0 : Number of VPs setup 254, number of VPs acquired 1.
> Jun 20 13:52:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b8:7: Primary port id 0000ef.
> Jun 20 13:52:58 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-500a:7: LOOP UP detected (8 Gbps).
> Jun 20 13:52:58 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5012:7: Port database changed ffff 0006 0000.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400f:7: Loop resync scheduled.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-8037:7: F/W Ready - OK.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-803a:7: fw_state=3 (7, 141, 0, 800 0) curr time=ffffd9e8.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-200b:7: HBA in NL topology.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2014:7: Configure loop -- dpc flags = 0x112e0.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2011:7: Entries in ID list (1).
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20d8:7: qla24xx_fcport_handle_login 50:00:e1:11:c5:27:f0:70 DS 0 LS 7 P 0 fl 0 confl 0000000000000000 rscn 0|0 login 0 lid 0 scan 2 fc4type 0
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-307b:7: qla_chk_n2n_b4_login 50:00:e1:11:c5:27:f0:70 DS 0 LS 7 lid 0 retries=30
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20bf:7: qla_chk_n2n_b4_login 1628 50:00:e1:11:c5:27:f0:70 post login
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2069:7: LOOP READY.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-206b:7: qla2x00_configure_loop: exiting normally. local port wwpn 2100000e1e2227a1 id 0000ef)
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4010:7: Loop resync end.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: DELETED to LOGIN_PEND - portid=000026.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2072:7: Async-login - 50:00:e1:11:c5:27:f0:70 hdl=2, loopid=0 portid=000026 retries=29 .
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5036:7: Async-login complete: handle=2 pid=000026 wwpn=50:00:e1:11:c5:27:f0:70 iop0=312
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20dd:7: qla2x00_async_login_sp_done 50:00:e1:11:c5:27:f0:70 res 0 
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-ffff:7: qla24xx_handle_plogi_done_event 50:00:e1:11:c5:27:f0:70 DS 3 LS 7 rc 0 login 0|0 rscn 0|0 data 4000|0 iop 0|0
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ea:7: qla24xx_handle_plogi_done_event 2232 50:00:e1:11:c5:27:f0:70 LoopID 0x0 in use with 000026. post gpdb
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: LOGIN_PEND to GPDB - portid=000026.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20dc:7: Async-gpdb 50:00:e1:11:c5:27:f0:70 hndl 0 opt 0
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20db:7: Async done-gpdb res 0, WWPN 50:00:e1:11:c5:27:f0:70 mb[1]=0 mb[2]=ffdd 
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20d2:7: qla24xx_handle_gpdb_event 50:00:e1:11:c5:27:f0:70 DS 5 LS 6 fc4_type 0 rc 0
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2062:7: 50:00:e1:11:c5:27:f0:70 SVC Param w3 0312
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: GPDB to UPD_FCPORT - portid=000026.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ef:7: qla2x00_update_fcport 50:00:e1:11:c5:27:f0:70
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: UPD_FCPORT to UPD_FCPORT - portid=000026.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-207d:7: FCPort 50:00:e1:11:c5:27:f0:70 state transitioned from UNCONFIGURED to ONLINE - portid=000026.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ee:7: qla2x00_reg_remote_port: 5000e111c527f070. rport 7:0:0 (00000000bd175424) is tgt mode
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: UPD_FCPORT to LOGIN_COMPLETE - portid=000026.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-911e:7: qla_register_fcport_fn rscn gen 0/0 next DS 0
> Jun 20 13:52:59 tony13 kern.notice kernel: scsi 7:0:0:0: Sequential-Access IBM      ULTRIUM-HH8      K4K1 PQ: 0 ANSI: 6
> Jun 20 13:52:59 tony13 kern.notice kernel: scsi 7:0:0:0: Attached scsi generic sg1 type 1
> Jun 20 13:52:59 tony13 kern.info kernel: st: Version 20160209, fixed bufsize 32768, s/g segs 256
> Jun 20 13:52:59 tony13 kern.notice kernel: st 7:0:0:0: Attached scsi tape st0
> Jun 20 13:52:59 tony13 kern.info kernel: st 7:0:0:0: st0: try direct i/o: yes (alignment 8 B)
> 
> Unplug cable
> 
> Jun 20 13:54:03 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
> Jun 20 13:54:03 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20f1:7: Mark all dev lost
> 
> sleep 180
> 
> cat /proc/scsi/scsi 
> Attached devices:
> Host: scsi7 Channel: 00 Id: 00 Lun: 00
>   Vendor: IBM      Model: ULTRIUM-HH8      Rev: K4K1
>   Type:   Sequential-Access                ANSI  SCSI revision: 06

Thanks for the info. Just to reiterate, you've reported two issues (though 
this log was showing only 1 of them).

Issue 1 - Tape device never disappears when removed
Issue 2 - When a direct connected tape 1 was replaced with tape 2, tape 2 
          was not discovered.

For Issue-2, please try the attached patch. This may not be the final fix, 
but wanted to check if that would fix the issue for you.

For Issue-1, the behavior was intentional, though that behavior needs 
refinement. These tape drives support something called FC sequence level 
error recovery (added in FCP-2), which can make tape I/Os survive even 
across a short cable pull. This is not a simple retry of the I/O, rather a 
retry done at the FC sequence level that gives the IO a better chance of
revival. In other words, the said patch that caused regression, while 
introduces an incorrect reporting of the state of the device, makes backup 
more resilient.

Now, onto the behavior when device state is reported immediately. What we 
have observed, at least with one tape drive from a major vendor, is that, 
across a device loss and device back case with both the events reported to 
upper layers, the backup operation was getting failed. This is due to a 
REPORT LUNS command being issued during device reappearance reporting 
(fc_remote_port_add -> SCSI scan), which the tape drive was not expecting 
and caused the backup to fail.

I know that some tape drives do not support multiple commands to it at the 
same time, but not sure if that is still the norm these days.

So, perhaps one way to make the behavior better, is to either report the 
disappearing device a bit delayed or have intelligence added in SCSI scan 
to detect ongoing tape IO operations and delay/avoid the REPORT LUNs. 
Former is a more contained (in the LLD) fix.

Regards,
-Arun

> 
> rmmod qla2xxx
> 
> Jun 20 14:00:18 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-211a:7: Delaying session delete for FCP2 flags 0x4 port_type = 0x4 port_id=000026 50
> Jun 20 14:00:18 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-b079:7: Removing driver
> Jun 20 14:00:18 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00af:7: Performing ISP error recovery - ha=00000000c584070c.
> Jun 20 14:00:18 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20f1:7: Mark all dev lost
> Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-211a:7: Delaying session delete for FCP2 flags 0x4 port_type = 0x4 port_id=000026 50
> Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4011:7: DPC handler exiting.
> Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-207d:7: FCPort 50:00:e1:11:c5:27:f0:70 state transitioned from ONLINE to DEAD - portid=000026.
> Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-b079:6: Removing driver
> Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00af:6: Performing ISP error recovery - ha=000000008a8560d1.
> Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-20f1:6: Mark all dev lost
> Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4011:6: DPC handler exiting.
> 

[-- Attachment #2: Type: text/plain, Size: 3776 bytes --]

From af58373c6a61f74b3b190e17333b3b1e4d151919 Mon Sep 17 00:00:00 2001
From: Arun Easi <aeasi@marvell.com>
Date: Thu, 16 Jun 2022 17:31:43 -0700
Subject: [PATCH] qla2xxx: Fix discovery issues in FC-AL topology

Topology: Arbitrated loop
Device: Tape (FCP-2 device)
Issue:
  1. Connect Tape-A directly to the HBA port
  2. Disconnect it and connect Tape-B
  3. Tape-B is not discovered by the HBA driver
  4. When Tape-B is disconnected and connected back, it shows up

To fix this, when Get ID List reports no devices and loop map
reports there are devices, issue a re-init so that device login
could be performed during next link initialization.

Reported-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
---
 drivers/scsi/qla2xxx/qla_gbl.h  |  3 ++-
 drivers/scsi/qla2xxx/qla_init.c | 29 +++++++++++++++++++++++++++++
 drivers/scsi/qla2xxx/qla_mbx.c  |  5 ++++-
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_gbl.h b/drivers/scsi/qla2xxx/qla_gbl.h
index dac27b5..b9f106a 100644
--- a/drivers/scsi/qla2xxx/qla_gbl.h
+++ b/drivers/scsi/qla2xxx/qla_gbl.h
@@ -433,7 +433,8 @@ extern int
 qla2x00_get_resource_cnts(scsi_qla_host_t *);
 
 extern int
-qla2x00_get_fcal_position_map(scsi_qla_host_t *ha, char *pos_map);
+qla2x00_get_fcal_position_map(scsi_qla_host_t *ha, char *pos_map,
+		u8 *num_entries);
 
 extern int
 qla2x00_get_link_status(scsi_qla_host_t *, uint16_t, struct link_statistics *,
diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index 3f3417a..3defbea 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -5488,6 +5488,22 @@ static int qla2x00_configure_n2n_loop(scsi_qla_host_t *vha)
 	return QLA_FUNCTION_FAILED;
 }
 
+static void
+qla_reinitialize_link(scsi_qla_host_t *vha)
+{
+	int rval;
+
+	atomic_set(&vha->loop_state, LOOP_DOWN);
+	atomic_set(&vha->loop_down_timer, LOOP_DOWN_TIME);
+	rval = qla2x00_full_login_lip(vha);
+	if (rval == QLA_SUCCESS) {
+		ql_dbg(ql_dbg_disc, vha, 0xd050, "Link reinitialized\n");
+	} else {
+		ql_dbg(ql_dbg_disc, vha, 0xd051,
+			"Link reinitialization failed (%d)\n", rval);
+	}
+}
+
 /*
  * qla2x00_configure_local_loop
  *	Updates Fibre Channel Device Database with local loop devices.
@@ -5539,6 +5555,19 @@ qla2x00_configure_local_loop(scsi_qla_host_t *vha)
 		spin_unlock_irqrestore(&vha->work_lock, flags);
 
 		if (vha->scan.scan_retry < MAX_SCAN_RETRIES) {
+			u8 loop_map_entries = 0;
+			int rc;
+
+			rc = qla2x00_get_fcal_position_map(vha, NULL,
+						&loop_map_entries);
+			if (rc == QLA_SUCCESS && loop_map_entries > 1) {
+				/*
+				 * There are devices that are still not logged
+				 * in. Reinitialize to give them a chance.
+				 */
+				qla_reinitialize_link(vha);
+				return QLA_FUNCTION_FAILED;
+			}
 			set_bit(LOCAL_LOOP_UPDATE, &vha->dpc_flags);
 			set_bit(LOOP_RESYNC_NEEDED, &vha->dpc_flags);
 		}
diff --git a/drivers/scsi/qla2xxx/qla_mbx.c b/drivers/scsi/qla2xxx/qla_mbx.c
index 892caf2..4a33874 100644
--- a/drivers/scsi/qla2xxx/qla_mbx.c
+++ b/drivers/scsi/qla2xxx/qla_mbx.c
@@ -3066,7 +3066,8 @@ qla2x00_get_resource_cnts(scsi_qla_host_t *vha)
  *	Kernel context.
  */
 int
-qla2x00_get_fcal_position_map(scsi_qla_host_t *vha, char *pos_map)
+qla2x00_get_fcal_position_map(scsi_qla_host_t *vha, char *pos_map,
+		u8 *num_entries)
 {
 	int rval;
 	mbx_cmd_t mc;
@@ -3106,6 +3107,8 @@ qla2x00_get_fcal_position_map(scsi_qla_host_t *vha, char *pos_map)
 
 		if (pos_map)
 			memcpy(pos_map, pmap, FCAL_MAP_SIZE);
+		if (num_entries)
+			*num_entries = pmap[0];
 	}
 	dma_pool_free(ha->s_dma_pool, pmap, pmap_dma);
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [EXT] Re: [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable
  2022-06-21 22:05       ` [EXT] " Arun Easi
@ 2022-06-22 14:56         ` Tony Battersby
  2022-06-22 23:03           ` Arun Easi
  0 siblings, 1 reply; 9+ messages in thread
From: Tony Battersby @ 2022-06-22 14:56 UTC (permalink / raw)
  To: Arun Easi
  Cc: Thorsten Leemhuis, Saurav Kashyap, Nilesh Javali,
	GR-QLogic-Storage-Upstream, linux-scsi, linux-kernel,
	regressions

On 6/21/22 18:05, Arun Easi wrote:
> Thanks for the info. Just to reiterate, you've reported two issues (though 
> this log was showing only 1 of them).
>
> Issue 1 - Tape device never disappears when removed
> Issue 2 - When a direct connected tape 1 was replaced with tape 2, tape 2 
>           was not discovered.
>
> For Issue-2, please try the attached patch. This may not be the final fix, 
> but wanted to check if that would fix the issue for you.
>
> For Issue-1, the behavior was intentional, though that behavior needs 
> refinement. These tape drives support something called FC sequence level 
> error recovery (added in FCP-2), which can make tape I/Os survive even 
> across a short cable pull. This is not a simple retry of the I/O, rather a 
> retry done at the FC sequence level that gives the IO a better chance of
> revival. In other words, the said patch that caused regression, while 
> introduces an incorrect reporting of the state of the device, makes backup 
> more resilient.
>
> Now, onto the behavior when device state is reported immediately. What we 
> have observed, at least with one tape drive from a major vendor, is that, 
> across a device loss and device back case with both the events reported to 
> upper layers, the backup operation was getting failed. This is due to a 
> REPORT LUNS command being issued during device reappearance reporting 
> (fc_remote_port_add -> SCSI scan), which the tape drive was not expecting 
> and caused the backup to fail.
>
> I know that some tape drives do not support multiple commands to it at the 
> same time, but not sure if that is still the norm these days.
>
> So, perhaps one way to make the behavior better, is to either report the 
> disappearing device a bit delayed or have intelligence added in SCSI scan 
> to detect ongoing tape IO operations and delay/avoid the REPORT LUNs. 
> Former is a more contained (in the LLD) fix.
>
> Regards,
> -Arun

Your patch does fix Issue-2 for me.  For Issue-1, it would be fine with
me if qla2xxx reported device removal to the upper level a bit delayed,
as you said.

Below is a log of the test for Issue-2 with your patch applied with
kernel 5.18.2:

modprobe qla2xxx ql2xextended_error_logging=1

Jun 22 14:08:23 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.02.07.400-k-debug.
Jun 22 14:08:23 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-011c: : MSI-X vector count: 32.
Jun 22 14:08:23 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-001d: : Found an ISP2031 irq 37 iobase 0x00000000d7dfad2d.
Jun 22 14:08:23 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00c6:6: MSI-X: Using 26 vectors
Jun 22 14:08:23 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-0075:6: ZIO mode 6 enabled; timer delay (200 us).
Jun 22 14:08:23 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-ffff:6: FC4 priority set to NVMe
Jun 22 14:08:24 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-507b:6: SFP detect: Short-Range SFP  (nvr=0 ll=40 lr=0 lrd=0).
Jun 22 14:08:26 tony13 kern.info kernel: scsi host6: qla2xxx
Jun 22 14:08:26 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4000:6: DPC handler sleeping.
Jun 22 14:08:27 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-400f:6: Loop resync scheduled.
Jun 22 14:08:31 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00fb:6: QLogic QLE2672 - QLE2672 QLogic 2-port 16Gb Fibre Channel Adapter.
Jun 22 14:08:31 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00fc:6: ISP2031: PCIe (8.0GT/s x8) @ 0000:83:00.0 hdma- host#=6 fw=8.07.12 (d0d5).
Jun 22 14:08:31 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-011c: : MSI-X vector count: 32.
Jun 22 14:08:31 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-001d: : Found an ISP2031 irq 135 iobase 0x0000000083e02c30.
Jun 22 14:08:31 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00c6:7: MSI-X: Using 26 vectors
Jun 22 14:08:31 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-0075:7: ZIO mode 6 enabled; timer delay (200 us).
Jun 22 14:08:31 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-ffff:7: FC4 priority set to NVMe
Jun 22 14:08:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-507b:7: SFP detect: Short-Range SFP  (nvr=0 ll=40 lr=0 lrd=0).
Jun 22 14:08:34 tony13 kern.info kernel: scsi host7: qla2xxx
Jun 22 14:08:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.
Jun 22 14:08:36 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400f:7: Loop resync scheduled.
Jun 22 14:08:39 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00fb:7: QLogic QLE2672 - QLE2672 QLogic 2-port 16Gb Fibre Channel Adapter.
Jun 22 14:08:39 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00fc:7: ISP2031: PCIe (8.0GT/s x8) @ 0000:83:00.1 hdma- host#=7 fw=8.07.12 (d0d5).
Jun 22 14:08:47 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-8038:6: Cable is unplugged...
Jun 22 14:08:47 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-803a:6: fw_state=4 (7, 141, 0, 800 0) curr time=10000dd24.
Jun 22 14:08:47 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-206c:6: qla2x00_loop_resync *** FAILED ***.
Jun 22 14:08:47 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4010:6: Loop resync end.
Jun 22 14:08:47 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4000:6: DPC handler sleeping.
Jun 22 14:08:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-8038:7: Cable is unplugged...
Jun 22 14:08:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-803a:7: fw_state=4 (7, 141, 0, 800 0) curr time=10000e075.
Jun 22 14:08:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-206c:7: qla2x00_loop_resync *** FAILED ***.
Jun 22 14:08:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4010:7: Loop resync end.
Jun 22 14:08:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.

Plug in cable to tape drive A

Jun 22 14:09:32 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5009:7: LIP occurred (0).
Jun 22 14:09:32 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b7:7: Format 0 : Number of VPs setup 254, number of VPs acquired 1.
Jun 22 14:09:32 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b8:7: Primary port id 0000ef.
Jun 22 14:09:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-500a:7: LOOP UP detected (8 Gbps).
Jun 22 14:09:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5012:7: Port database changed ffff 0006 0000.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400f:7: Loop resync scheduled.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-8037:7: F/W Ready - OK.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-803a:7: fw_state=3 (7, 141, 0, 800 0) curr time=10000efb0.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-200b:7: HBA in NL topology.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2014:7: Configure loop -- dpc flags = 0x112e0.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2011:7: Entries in ID list (1).
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20d8:7: qla24xx_fcport_handle_login 50:00:e1:11:c5:27:f0:70 DS 0 LS 7 P 0 fl 0 confl 0000000000000000 rscn 0|0 login 0 lid 0 scan 2 fc4type 0
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-307b:7: qla_chk_n2n_b4_login 50:00:e1:11:c5:27:f0:70 DS 0 LS 7 lid 0 retries=30
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20bf:7: qla_chk_n2n_b4_login 1628 50:00:e1:11:c5:27:f0:70 post login
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2069:7: LOOP READY.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-206b:7: qla2x00_configure_loop: exiting normally. local port wwpn 2100000e1e2227a1 id 0000ef)
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4010:7: Loop resync end.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: DELETED to LOGIN_PEND - portid=000026.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2072:7: Async-login - 50:00:e1:11:c5:27:f0:70 hdl=2, loopid=0 portid=000026 retries=29 .
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5036:7: Async-login complete: handle=2 pid=000026 wwpn=50:00:e1:11:c5:27:f0:70 iop0=312
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20dd:7: qla2x00_async_login_sp_done 50:00:e1:11:c5:27:f0:70 res 0 
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-ffff:7: qla24xx_handle_plogi_done_event 50:00:e1:11:c5:27:f0:70 DS 3 LS 7 rc 0 login 0|0 rscn 0|0 data 4000|0 iop 0|0
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ea:7: qla24xx_handle_plogi_done_event 2232 50:00:e1:11:c5:27:f0:70 LoopID 0x0 in use with 000026. post gpdb
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: LOGIN_PEND to GPDB - portid=000026.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20dc:7: Async-gpdb 50:00:e1:11:c5:27:f0:70 hndl 0 opt 0
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20db:7: Async done-gpdb res 0, WWPN 50:00:e1:11:c5:27:f0:70 mb[1]=0 mb[2]=ffd3 
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20d2:7: qla24xx_handle_gpdb_event 50:00:e1:11:c5:27:f0:70 DS 5 LS 6 fc4_type 0 rc 0
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2062:7: 50:00:e1:11:c5:27:f0:70 SVC Param w3 0312
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: GPDB to UPD_FCPORT - portid=000026.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ef:7: qla2x00_update_fcport 50:00:e1:11:c5:27:f0:70
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: UPD_FCPORT to UPD_FCPORT - portid=000026.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-207d:7: FCPort 50:00:e1:11:c5:27:f0:70 state transitioned from UNCONFIGURED to ONLINE - portid=000026.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ee:7: qla2x00_reg_remote_port: 5000e111c527f070. rport 7:0:0 (00000000380d4127) is tgt mode
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: UPD_FCPORT to LOGIN_COMPLETE - portid=000026.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-911e:7: qla_register_fcport_fn rscn gen 0/0 next DS 0
Jun 22 14:09:35 tony13 kern.notice kernel: scsi 7:0:0:0: Sequential-Access IBM      ULTRIUM-HH8      K4K1 PQ: 0 ANSI: 6
Jun 22 14:09:35 tony13 kern.notice kernel: st 7:0:0:0: Attached scsi tape st0
Jun 22 14:09:35 tony13 kern.info kernel: st 7:0:0:0: st0: try direct i/o: yes (alignment 8 B)

cat /proc/scsi/scsi

Attached devices:
Host: scsi7 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM      Model: ULTRIUM-HH8      Rev: K4K1
  Type:   Sequential-Access                ANSI  SCSI revision: 06

COMMENT: Tape drive shows up as expected after cable plugged in.

Unplug cable

Jun 22 14:10:54 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
Jun 22 14:10:54 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20f1:7: Mark all dev lost

sleep 120

cat /proc/scsi/scsi

Attached devices:
Host: scsi7 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM      Model: ULTRIUM-HH8      Rev: K4K1
  Type:   Sequential-Access                ANSI  SCSI revision: 06

COMMENT: Tape drive does not disappear after cable unplugged.

Plug in cable to tape drive B

Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-211a:7: Delaying session delete for FCP2 flags 0x4 port_type = 0x4 port_id=000026 50
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5009:7: LIP occurred (0).
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b7:7: Format 0 : Number of VPs setup 254, number of VPs acquired 1.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b8:7: Primary port id 0000ef.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5010:7: Port logout 0000 0007 0300.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-508a:7: Marking port lost loopid=0000 portid=000026.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: LOGIN_COMPLETE to DELETE_PEND - portid=000026.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-210a:7: qlt_unreg_sess sess 00000000fd4cb6a1 for deletion 50:00:e1:11:c5:27:f0:70
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: DELETE_PEND to DELETE_PEND - portid=000026.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-f084:7: qlt_free_session_done: se_sess 0000000000000000 / sess 00000000fd4cb6a1 from port 50:00:e1:11:c5:27:f0:70 loop_id 0x00 s_id 00:00:26 logout 0 keep 1 els_logo 0
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-207d:7: FCPort 50:00:e1:11:c5:27:f0:70 state transitioned from ONLINE to LOST - portid=000026.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2109:7: qla2x00_schedule_rport_del 5000e111c527f070. rport 00000000380d4127 roles 1
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-500a:7: LOOP UP detected (8 Gbps).
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5012:7: Port database changed ffff 0006 0000.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: DELETE_PEND to DELETED - portid=000026.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-f001:7: Unregistration of sess 00000000fd4cb6a1 50:00:e1:11:c5:27:f0:70 finished fcp_cnt 0
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400f:7: Loop resync scheduled.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-8037:7: F/W Ready - OK.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-803a:7: fw_state=3 (7, 141, 0, 4 0) curr time=100016448.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-200b:7: HBA in NL topology.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2014:7: Configure loop -- dpc flags = 0x1260.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2011:7: Entries in ID list (0).
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-d050:7: Link reinitialized
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-206a:7: qla2x00_configure_loop *** FAILED ***.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4010:7: Loop resync end.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-500c:7: LIP reset occurred (f7f7).
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5009:7: LIP occurred (f7f7).
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b7:7: Format 0 : Number of VPs setup 254, number of VPs acquired 1.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b8:7: Primary port id 0000ef.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5012:7: Port database changed ffff 0006 0000.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400b:7: Reset marker scheduled.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400c:7: Reset marker end.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400f:7: Loop resync scheduled.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-8037:7: F/W Ready - OK.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-803a:7: fw_state=3 (7, 141, 0, 4 0) curr time=1000164b0.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-200b:7: HBA in NL topology.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2014:7: Configure loop -- dpc flags = 0x1260.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2011:7: Entries in ID list (1).
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20d8:7: qla24xx_fcport_handle_login 50:00:e1:11:c5:27:f0:5c DS 0 LS 7 P 0 fl 0 confl 0000000000000000 rscn 0|0 login 0 lid 0 scan 2 fc4type 0
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-307b:7: qla_chk_n2n_b4_login 50:00:e1:11:c5:27:f0:5c DS 0 LS 7 lid 0 retries=30
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20bf:7: qla_chk_n2n_b4_login 1628 50:00:e1:11:c5:27:f0:5c post login
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2069:7: LOOP READY.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-206b:7: qla2x00_configure_loop: exiting normally. local port wwpn 2100000e1e2227a1 id 0000ef)
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4010:7: Loop resync end.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:5c disc_state transition: DELETED to LOGIN_PEND - portid=000026.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2072:7: Async-login - 50:00:e1:11:c5:27:f0:5c hdl=4, loopid=0 portid=000026 retries=29 .
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5036:7: Async-login complete: handle=4 pid=000026 wwpn=50:00:e1:11:c5:27:f0:5c iop0=312
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20dd:7: qla2x00_async_login_sp_done 50:00:e1:11:c5:27:f0:5c res 0 
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-ffff:7: qla24xx_handle_plogi_done_event 50:00:e1:11:c5:27:f0:5c DS 3 LS 7 rc 0 login 0|0 rscn 0|0 data 4000|0 iop 0|0
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ea:7: qla24xx_handle_plogi_done_event 2232 50:00:e1:11:c5:27:f0:5c LoopID 0x0 in use with 000026. post gpdb
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:5c disc_state transition: LOGIN_PEND to GPDB - portid=000026.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20dc:7: Async-gpdb 50:00:e1:11:c5:27:f0:5c hndl 0 opt 0
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20db:7: Async done-gpdb res 0, WWPN 50:00:e1:11:c5:27:f0:5c mb[1]=0 mb[2]=ffd3 
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20d2:7: qla24xx_handle_gpdb_event 50:00:e1:11:c5:27:f0:5c DS 5 LS 6 fc4_type 0 rc 0
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2062:7: 50:00:e1:11:c5:27:f0:5c SVC Param w3 0312
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:5c disc_state transition: GPDB to UPD_FCPORT - portid=000026.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ef:7: qla2x00_update_fcport 50:00:e1:11:c5:27:f0:5c
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:5c disc_state transition: UPD_FCPORT to UPD_FCPORT - portid=000026.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-207d:7: FCPort 50:00:e1:11:c5:27:f0:5c state transitioned from UNCONFIGURED to ONLINE - portid=000026.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ee:7: qla2x00_reg_remote_port: 5000e111c527f05c. rport 7:0:1 (0000000091a61914) is tgt mode
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:5c disc_state transition: UPD_FCPORT to LOGIN_COMPLETE - portid=000026.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-911e:7: qla_register_fcport_fn rscn gen 0/0 next DS 0
Jun 22 14:14:34 tony13 kern.notice kernel: scsi 7:0:1:0: Sequential-Access IBM      ULTRIUM-HH9      NCA1 PQ: 0 ANSI: 6
Jun 22 14:14:34 tony13 kern.notice kernel: st 7:0:1:0: Attached scsi tape st1
Jun 22 14:14:34 tony13 kern.info kernel: st 7:0:1:0: st1: try direct i/o: yes (alignment 8 B)
Jun 22 14:14:34 tony13 kern.notice kernel: scsi 7:0:1:1: Medium Changer    CYBERNET CY-LTOLIB        1.40 PQ: 0 ANSI: 5

cat /proc/scsi/scsi

Attached devices:
Host: scsi7 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM      Model: ULTRIUM-HH8      Rev: K4K1
  Type:   Sequential-Access                ANSI  SCSI revision: 06
Host: scsi7 Channel: 00 Id: 01 Lun: 00
  Vendor: IBM      Model: ULTRIUM-HH9      Rev: NCA1
  Type:   Sequential-Access                ANSI  SCSI revision: 06
Host: scsi7 Channel: 00 Id: 01 Lun: 01
  Vendor: CYBERNET Model: CY-LTOLIB        Rev: 1.40
  Type:   Medium Changer                   ANSI  SCSI revision: 05

COMMENT: Tape drive B shows up as expected; tape drive A disappears 30 seconds later:

Jun 22 14:15:03 tony13 kern.err kernel:  rport-7:0-0: blocked FC remote port time out: removing target and saving binding
Jun 22 14:15:03 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-207d:7: FCPort 50:00:e1:11:c5:27:f0:70 state transitioned from LOST to DEAD - portid=000026.

cat /proc/scsi/scsi

Attached devices:
Host: scsi7 Channel: 00 Id: 01 Lun: 00
  Vendor: IBM      Model: ULTRIUM-HH9      Rev: NCA1
  Type:   Sequential-Access                ANSI  SCSI revision: 06
Host: scsi7 Channel: 00 Id: 01 Lun: 01
  Vendor: CYBERNET Model: CY-LTOLIB        Rev: 1.40
  Type:   Medium Changer                   ANSI  SCSI revision: 05

Unplug cable

Jun 22 14:15:31 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
Jun 22 14:15:31 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20f1:7: Mark all dev lost

cat /proc/scsi/scsi

Attached devices:
Host: scsi7 Channel: 00 Id: 01 Lun: 00
  Vendor: IBM      Model: ULTRIUM-HH9      Rev: NCA1
  Type:   Sequential-Access                ANSI  SCSI revision: 06
Host: scsi7 Channel: 00 Id: 01 Lun: 01
  Vendor: CYBERNET Model: CY-LTOLIB        Rev: 1.40
  Type:   Medium Changer                   ANSI  SCSI revision: 05

COMMENT: Tape drive does not disappear after cable unplugged.

rmmod qla2xxx

Jun 22 14:16:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-211a:7: Delaying session delete for FCP2 flags 0x4 port_type = 0x4 port_id=000026 50
Jun 22 14:16:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-b079:7: Removing driver
Jun 22 14:16:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00af:7: Performing ISP error recovery - ha=0000000070375427.
Jun 22 14:16:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20f1:7: Mark all dev lost
Jun 22 14:16:39 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-211a:7: Delaying session delete for FCP2 flags 0x4 port_type = 0x4 port_id=000026 50
Jun 22 14:16:39 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4011:7: DPC handler exiting.
Jun 22 14:16:39 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-207d:7: FCPort 50:00:e1:11:c5:27:f0:5c state transitioned from ONLINE to DEAD - portid=000026.
Jun 22 14:16:39 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-b079:6: Removing driver
Jun 22 14:16:39 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00af:6: Performing ISP error recovery - ha=000000006f17f9d2.
Jun 22 14:16:39 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-20f1:6: Mark all dev lost
Jun 22 14:16:39 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4011:6: DPC handler exiting.

Tony Battersby
Cybernetics


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [EXT] Re: [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable
  2022-06-22 14:56         ` Tony Battersby
@ 2022-06-22 23:03           ` Arun Easi
  2022-07-04 12:06             ` Thorsten Leemhuis
  0 siblings, 1 reply; 9+ messages in thread
From: Arun Easi @ 2022-06-22 23:03 UTC (permalink / raw)
  To: Tony Battersby
  Cc: Thorsten Leemhuis, Saurav Kashyap, Nilesh Javali,
	GR-QLogic-Storage-Upstream, linux-scsi, linux-kernel,
	regressions

[-- Attachment #1: Type: text/plain, Size: 2356 bytes --]

Hi Tony,

On Wed, 22 Jun 2022, 7:56am, Tony Battersby wrote:

> On 6/21/22 18:05, Arun Easi wrote:
> > Thanks for the info. Just to reiterate, you've reported two issues (though 
> > this log was showing only 1 of them).
> >
> > Issue 1 - Tape device never disappears when removed
> > Issue 2 - When a direct connected tape 1 was replaced with tape 2, tape 2 
> >           was not discovered.
> >
> > For Issue-2, please try the attached patch. This may not be the final fix, 
> > but wanted to check if that would fix the issue for you.
> >
> > For Issue-1, the behavior was intentional, though that behavior needs 
> > refinement. These tape drives support something called FC sequence level 
> > error recovery (added in FCP-2), which can make tape I/Os survive even 
> > across a short cable pull. This is not a simple retry of the I/O, rather a 
> > retry done at the FC sequence level that gives the IO a better chance of
> > revival. In other words, the said patch that caused regression, while 
> > introduces an incorrect reporting of the state of the device, makes backup 
> > more resilient.
> >
> > Now, onto the behavior when device state is reported immediately. What we 
> > have observed, at least with one tape drive from a major vendor, is that, 
> > across a device loss and device back case with both the events reported to 
> > upper layers, the backup operation was getting failed. This is due to a 
> > REPORT LUNS command being issued during device reappearance reporting 
> > (fc_remote_port_add -> SCSI scan), which the tape drive was not expecting 
> > and caused the backup to fail.
> >
> > I know that some tape drives do not support multiple commands to it at the 
> > same time, but not sure if that is still the norm these days.
> >
> > So, perhaps one way to make the behavior better, is to either report the 
> > disappearing device a bit delayed or have intelligence added in SCSI scan 
> > to detect ongoing tape IO operations and delay/avoid the REPORT LUNs. 
> > Former is a more contained (in the LLD) fix.
> >
> > Regards,
> > -Arun
> 
> Your patch does fix Issue-2 for me.  For Issue-1, it would be fine with
> me if qla2xxx reported device removal to the upper level a bit delayed,
> as you said.
> 

Thanks for testing and verifying the patch. We will post the patch 
upstream after due testing.

Regards,
-Arun

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [EXT] Re: [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable
  2022-06-22 23:03           ` Arun Easi
@ 2022-07-04 12:06             ` Thorsten Leemhuis
  2022-07-06 17:29               ` Arun Easi
  0 siblings, 1 reply; 9+ messages in thread
From: Thorsten Leemhuis @ 2022-07-04 12:06 UTC (permalink / raw)
  To: Arun Easi, Tony Battersby
  Cc: Saurav Kashyap, Nilesh Javali, GR-QLogic-Storage-Upstream,
	linux-scsi, linux-kernel, regressions

On 23.06.22 01:03, Arun Easi wrote:
> On Wed, 22 Jun 2022, 7:56am, Tony Battersby wrote:
> 
>> On 6/21/22 18:05, Arun Easi wrote:
>>> Thanks for the info. Just to reiterate, you've reported two issues (though 
>>> this log was showing only 1 of them).
>>>
>>> Issue 1 - Tape device never disappears when removed
>>> Issue 2 - When a direct connected tape 1 was replaced with tape 2, tape 2 
>>>           was not discovered.
>>>
>>> For Issue-2, please try the attached patch. This may not be the final fix, 
>>> but wanted to check if that would fix the issue for you.
>>>
>>> For Issue-1, the behavior was intentional, though that behavior needs 
>>> refinement. These tape drives support something called FC sequence level 
>>> error recovery (added in FCP-2), which can make tape I/Os survive even 
>>> across a short cable pull. This is not a simple retry of the I/O, rather a 
>>> retry done at the FC sequence level that gives the IO a better chance of
>>> revival. In other words, the said patch that caused regression, while 
>>> introduces an incorrect reporting of the state of the device, makes backup 
>>> more resilient.
>>>
>>> Now, onto the behavior when device state is reported immediately. What we 
>>> have observed, at least with one tape drive from a major vendor, is that, 
>>> across a device loss and device back case with both the events reported to 
>>> upper layers, the backup operation was getting failed. This is due to a 
>>> REPORT LUNS command being issued during device reappearance reporting 
>>> (fc_remote_port_add -> SCSI scan), which the tape drive was not expecting 
>>> and caused the backup to fail.
>>>
>>> I know that some tape drives do not support multiple commands to it at the 
>>> same time, but not sure if that is still the norm these days.
>>>
>>> So, perhaps one way to make the behavior better, is to either report the 
>>> disappearing device a bit delayed or have intelligence added in SCSI scan 
>>> to detect ongoing tape IO operations and delay/avoid the REPORT LUNs. 
>>> Former is a more contained (in the LLD) fix.
>>>
>>> Regards,
>>> -Arun
>>
>> Your patch does fix Issue-2 for me.  For Issue-1, it would be fine with
>> me if qla2xxx reported device removal to the upper level a bit delayed,
>> as you said.
>>
> 
> Thanks for testing and verifying the patch.

BTW, that patch should have 'Link:' tags pointing to all reports about
this issue, e.g. the start of this thread.

These tags are important, as they allow others to look into the
backstory now and years from now. That is why they should be placed in
cases like this, as Documentation/process/submitting-patches.rst and
Documentation/process/5.Posting.rst explain in more detail.
Additionally, my regression tracking bot ‘regzbot’ relies on these tags
to automatically connect reports with patches that are posted or
committed to fix the reported issue. BTW, let me tell regzbot to monitor
this thread:

> We will post the patch upstream after due testing.
That was more than two weeks ago now and I didn't see any progress. Or
did I miss it?

Reminder, things should take this long. For details see the section
"Prioritize work on fixing regressions" in this document:
https://docs.kernel.org/process/handling-regressions.html

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

#regzbot poke

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [EXT] Re: [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable
  2022-07-04 12:06             ` Thorsten Leemhuis
@ 2022-07-06 17:29               ` Arun Easi
  0 siblings, 0 replies; 9+ messages in thread
From: Arun Easi @ 2022-07-06 17:29 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Tony Battersby, Saurav Kashyap, Nilesh Javali,
	GR-QLogic-Storage-Upstream, linux-scsi, linux-kernel,
	regressions

[-- Attachment #1: Type: text/plain, Size: 3370 bytes --]

Hi Thorsten,

On Mon, 4 Jul 2022, 5:06am, Thorsten Leemhuis wrote:

> On 23.06.22 01:03, Arun Easi wrote:
> > On Wed, 22 Jun 2022, 7:56am, Tony Battersby wrote:
> > 
> >> On 6/21/22 18:05, Arun Easi wrote:
> >>> Thanks for the info. Just to reiterate, you've reported two issues (though 
> >>> this log was showing only 1 of them).
> >>>
> >>> Issue 1 - Tape device never disappears when removed
> >>> Issue 2 - When a direct connected tape 1 was replaced with tape 2, tape 2 
> >>>           was not discovered.
> >>>
> >>> For Issue-2, please try the attached patch. This may not be the final fix, 
> >>> but wanted to check if that would fix the issue for you.
> >>>
> >>> For Issue-1, the behavior was intentional, though that behavior needs 
> >>> refinement. These tape drives support something called FC sequence level 
> >>> error recovery (added in FCP-2), which can make tape I/Os survive even 
> >>> across a short cable pull. This is not a simple retry of the I/O, rather a 
> >>> retry done at the FC sequence level that gives the IO a better chance of
> >>> revival. In other words, the said patch that caused regression, while 
> >>> introduces an incorrect reporting of the state of the device, makes backup 
> >>> more resilient.
> >>>
> >>> Now, onto the behavior when device state is reported immediately. What we 
> >>> have observed, at least with one tape drive from a major vendor, is that, 
> >>> across a device loss and device back case with both the events reported to 
> >>> upper layers, the backup operation was getting failed. This is due to a 
> >>> REPORT LUNS command being issued during device reappearance reporting 
> >>> (fc_remote_port_add -> SCSI scan), which the tape drive was not expecting 
> >>> and caused the backup to fail.
> >>>
> >>> I know that some tape drives do not support multiple commands to it at the 
> >>> same time, but not sure if that is still the norm these days.
> >>>
> >>> So, perhaps one way to make the behavior better, is to either report the 
> >>> disappearing device a bit delayed or have intelligence added in SCSI scan 
> >>> to detect ongoing tape IO operations and delay/avoid the REPORT LUNs. 
> >>> Former is a more contained (in the LLD) fix.
> >>>
> >>> Regards,
> >>> -Arun
> >>
> >> Your patch does fix Issue-2 for me.  For Issue-1, it would be fine with
> >> me if qla2xxx reported device removal to the upper level a bit delayed,
> >> as you said.
> >>
> > 
> > Thanks for testing and verifying the patch.
> 
> BTW, that patch should have 'Link:' tags pointing to all reports about
> this issue, e.g. the start of this thread.

Thanks, will add, I was not aware of this.

> 
> These tags are important, as they allow others to look into the
> backstory now and years from now. That is why they should be placed in
> cases like this, as Documentation/process/submitting-patches.rst and
> Documentation/process/5.Posting.rst explain in more detail.
> Additionally, my regression tracking bot ‘regzbot’ relies on these tags
> to automatically connect reports with patches that are posted or
> committed to fix the reported issue. BTW, let me tell regzbot to monitor
> this thread:
> 
> > We will post the patch upstream after due testing.
> That was more than two weeks ago now and I didn't see any progress. Or
> did I miss it?

No, the fix is being prepared to be posted.

Regards,
-Arun

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-07-06 17:29 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-25 22:03 [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable Tony Battersby
2022-05-28  0:27 ` Arun Easi
2022-06-20  6:56   ` Thorsten Leemhuis
2022-06-20 14:33     ` Tony Battersby
2022-06-21 22:05       ` [EXT] " Arun Easi
2022-06-22 14:56         ` Tony Battersby
2022-06-22 23:03           ` Arun Easi
2022-07-04 12:06             ` Thorsten Leemhuis
2022-07-06 17:29               ` Arun Easi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).