linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] scsi: qla2xxx: Bug fixes
@ 2019-09-12  0:39 Roman Bolshakov
  2019-09-12  0:39 ` [PATCH 1/4] scsi: qla2xxx: Ignore NULL pointer in tcm_qla2xxx_free_mcmd Roman Bolshakov
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Roman Bolshakov @ 2019-09-12  0:39 UTC (permalink / raw)
  To: linux-scsi; +Cc: Roman Bolshakov, Quinn Tran, Himanshu Madhani, Bart Van Assche

Hi Martin,

This series has a few bug fixes for the driver.

Note, #1 only fixes the crash in the kernel. The complete fix for clean
ACL deletion from initiator side is in works and requires a discussion.

As of now initiator is not aware that target no longer wants talking to
it, that implies unneeded timeout. It might be fixed by making LOGO
explicit on session deletion but it's an issue I want to raise first
before making the change. Whether we need implicit LOGO in qla2xxx,
explicit or use both.

Also, an unsolicited ABTS from a port without session would still result
in BA_RJT response instead of frame discard and LOGO ELS, as specified
in FCP (12.3.3 Target FCP_Port response to Exchange termination):

   When an ABTS-LS is received at the target FCP_Port, it shall abort
   the designated Exchange and return one of the following responses:

   a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO
      ELS if the Nx_Port issuing the ABTS-LS is not currently logged in
      (i.e., no N_Port Login exists);

FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN
RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission
requests, exchange or sequence errors. IIRC, some initiators requeue
SCSI commands if BA_RJT is received. Therefore, a timely LOGO will
prevent a perceived session freeze on the initiators.

Thanks,
Roman

Roman Bolshakov (4):
  scsi: qla2xxx: Ignore NULL pointer in tcm_qla2xxx_free_mcmd
  scsi: qla2xxx: Initialize free_work before flushing it
  scsi: qla2xxx: Drop superfluous INIT_WORK of del_work
  scsi: qla2xxx: Change discovery state before PLOGI

 drivers/scsi/qla2xxx/qla_init.c    | 2 ++
 drivers/scsi/qla2xxx/qla_target.c  | 2 --
 drivers/scsi/qla2xxx/tcm_qla2xxx.c | 2 ++
 3 files changed, 4 insertions(+), 2 deletions(-)

-- 
2.22.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/4] scsi: qla2xxx: Ignore NULL pointer in tcm_qla2xxx_free_mcmd
  2019-09-12  0:39 [PATCH 0/4] scsi: qla2xxx: Bug fixes Roman Bolshakov
@ 2019-09-12  0:39 ` Roman Bolshakov
  2019-09-12  0:39 ` [PATCH 2/4] scsi: qla2xxx: Initialize free_work before flushing it Roman Bolshakov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Roman Bolshakov @ 2019-09-12  0:39 UTC (permalink / raw)
  To: linux-scsi
  Cc: Roman Bolshakov, Quinn Tran, Himanshu Madhani, Bart Van Assche, stable

If ABTS cannot be completed in target mode, the driver attempts to free
related management command and crashes:

  NIP [d000000019181ee8] tcm_qla2xxx_free_mcmd+0x40/0x80 [tcm_qla2xxx]
  LR [d00000001dc1e6f8] qlt_response_pkt+0x190/0xa10 [qla2xxx]
  Call Trace:
  [c000003fff27bb50] [c000003fff27bc10] 0xc000003fff27bc10 (unreliable)
  [c000003fff27bb70] [d00000001dc1e6f8] qlt_response_pkt+0x190/0xa10 [qla2xxx]
  [c000003fff27bc10] [d00000001dbc2be0] qla24xx_process_response_queue+0x5d8/0xbd0 [qla2xxx]
  [c000003fff27bd50] [d00000001dbc632c] qla24xx_msix_rsp_q+0x64/0x150 [qla2xxx]
  [c000003fff27bde0] [c000000000187200] __handle_irq_event_percpu+0x90/0x310
  [c000003fff27bea0] [c0000000001874b8] handle_irq_event_percpu+0x38/0x90
  [c000003fff27bee0] [c000000000187574] handle_irq_event+0x64/0xb0
  [c000003fff27bf10] [c00000000018cd38] handle_fasteoi_irq+0xe8/0x280
  [c000003fff27bf40] [c000000000185ccc] generic_handle_irq+0x4c/0x70
  [c000003fff27bf60] [c000000000016cec] __do_irq+0x7c/0x1d0
  [c000003fff27bf90] [c00000000002a530] call_do_irq+0x14/0x24
  [c00000207d2cba90] [c000000000016edc] do_IRQ+0x9c/0x130
  [c00000207d2cbae0] [c000000000008bf4] hardware_interrupt_common+0x114/0x120
  --- interrupt: 501 at arch_local_irq_restore+0x74/0x90
      LR = arch_local_irq_restore+0x74/0x90
  [c00000207d2cbdd0] [c0000000001c64fc] tick_broadcast_oneshot_control+0x4c/0x60 (unreliable)
  [c00000207d2cbdf0] [c0000000007ac840] cpuidle_enter_state+0xf0/0x450
  [c00000207d2cbe50] [c00000000016b81c] call_cpuidle+0x4c/0x90
  [c00000207d2cbe70] [c00000000016bc30] do_idle+0x2b0/0x330
  [c00000207d2cbec0] [c00000000016beec] cpu_startup_entry+0x3c/0x50
  [c00000207d2cbef0] [c00000000004a06c] start_secondary+0x63c/0x670
  [c00000207d2cbf90] [c00000000000aa6c] start_secondary_prolog+0x10/0x14

The crash can be triggered by ACL deletion when there's active I/O.

During ACL deletion, qla2xxx performs implicit LOGO that's invisible for
the initiator. Only the driver and firmware are aware of the logout.
Therefore the initiator continues to send SCSI commands and the target
always responds with SAM STATUS BUSY as it can't find the session.

The command times out after a while and initiator invokes ABORT TASK TMF
for the command. The TMF is mapped to ABTS-LS in FCP. The target can't
find session for S_ID originating ABTS-LS so it never allocates mcmd.
And since N_Port handle was deleted after LOGO, it is no longer valid
and ABTS Response IOCB is returned from firmware with status 31. Then
free_mcmd is invoked on NULL pointer and the kernel crashes.

[ 7734.578642] qla2xxx [0000:00:0c.0]-e837:6: ABTS_RECV_24XX: instance 0
[ 7734.578644] qla2xxx [0000:00:0c.0]-f811:6: qla_target(0): task abort (s_id=1:2:0, tag=1209504, param=0)
[ 7734.578645] find_sess_by_s_id: 0x010200
[ 7734.578645] Unable to locate s_id: 0x010200
[ 7734.578646] qla2xxx [0000:00:0c.0]-f812:6: qla_target(0): task abort for non-existent session
[ 7734.578648] qla2xxx [0000:00:0c.0]-e806:6: Sending task mgmt ABTS response (ha=c0000000d5819000, atio=c0000000d3fd4700, status=4
[ 7734.578730] qla2xxx [0000:00:0c.0]-e838:6: ABTS_RESP_24XX: compl_status 31
[ 7734.578732] qla2xxx [0000:00:0c.0]-e863:6: qla_target(0): ABTS_RESP_24XX failed 31 (subcode 19:a)
[ 7734.578740] Unable to handle kernel paging request for data at address 0x00000200

Fixes: 6b0431d6fa20b ("scsi: qla2xxx: Fix out of order Termination and ABTS response")
Cc: Quinn Tran <qtran@marvell.com>
Cc: Himanshu Madhani <hmadhani@marvell.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: stable@vger.kernel.org
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
---
 drivers/scsi/qla2xxx/tcm_qla2xxx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
index 042a24314edc..bab2073c1f72 100644
--- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
+++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
@@ -246,6 +246,8 @@ static void tcm_qla2xxx_complete_mcmd(struct work_struct *work)
  */
 static void tcm_qla2xxx_free_mcmd(struct qla_tgt_mgmt_cmd *mcmd)
 {
+	if (!mcmd)
+		return;
 	INIT_WORK(&mcmd->free_work, tcm_qla2xxx_complete_mcmd);
 	queue_work(tcm_qla2xxx_free_wq, &mcmd->free_work);
 }
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/4] scsi: qla2xxx: Initialize free_work before flushing it
  2019-09-12  0:39 [PATCH 0/4] scsi: qla2xxx: Bug fixes Roman Bolshakov
  2019-09-12  0:39 ` [PATCH 1/4] scsi: qla2xxx: Ignore NULL pointer in tcm_qla2xxx_free_mcmd Roman Bolshakov
@ 2019-09-12  0:39 ` Roman Bolshakov
  2019-09-12  0:39 ` [PATCH 3/4] scsi: qla2xxx: Drop superfluous INIT_WORK of del_work Roman Bolshakov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Roman Bolshakov @ 2019-09-12  0:39 UTC (permalink / raw)
  To: linux-scsi; +Cc: Roman Bolshakov, Quinn Tran, Himanshu Madhani, stable

Target creation triggers a new BUG_ON introduced in in 4d43d395fed12
("workqueue: Try to catch flush_work() without INIT_WORK().").
The BUG_ON reveals an attempt to flush free_work in qla24xx_do_nack_work
before it's initialized in qlt_unreg_sess:

  WARNING: CPU: 7 PID: 211 at kernel/workqueue.c:3031 __flush_work.isra.38+0x40/0x2e0
  CPU: 7 PID: 211 Comm: kworker/7:1 Kdump: loaded Tainted: G            E     5.3.0-rc7-vanilla+ #2
  Workqueue: qla2xxx_wq qla2x00_iocb_work_fn [qla2xxx]
  NIP:  c000000000159620 LR: c0080000009d91b0 CTR: c0000000001598c0
  REGS: c000000005f3f730 TRAP: 0700   Tainted: G            E      (5.3.0-rc7-vanilla+)
  MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002222  XER: 00000000
  CFAR: c0000000001598d0 IRQMASK: 0
  GPR00: c0080000009d91b0 c000000005f3f9c0 c000000001670a00 c0000003f8655ca8
  GPR04: c0000003f8655c00 000000000000ffff 0000000000000011 ffffffffffffffff
  GPR08: c008000000949228 0000000000000000 0000000000000001 c0080000009e7780
  GPR12: 0000000000002200 c00000003fff6200 c000000000161bc8 0000000000000004
  GPR16: c0000003f9d68280 0000000002000000 0000000000000005 0000000000000003
  GPR20: 0000000000000002 000000000000ffff 0000000000000000 fffffffffffffef7
  GPR24: c000000004f73848 c000000004f73838 c000000004f73f28 c000000005f3fb60
  GPR28: c000000004f73e48 c000000004f73c80 c000000004f73818 c0000003f9d68280
  NIP [c000000000159620] __flush_work.isra.38+0x40/0x2e0
  LR [c0080000009d91b0] qla24xx_do_nack_work+0x88/0x180 [qla2xxx]
  Call Trace:
  [c000000005f3f9c0] [c000000000159644] __flush_work.isra.38+0x64/0x2e0 (unreliable)
  [c000000005f3fa50] [c0080000009d91a0] qla24xx_do_nack_work+0x78/0x180 [qla2xxx]
  [c000000005f3fae0] [c0080000009496ec] qla2x00_do_work+0x604/0xb90 [qla2xxx]
  [c000000005f3fc40] [c008000000949cd8] qla2x00_iocb_work_fn+0x60/0xe0 [qla2xxx]
  [c000000005f3fc80] [c000000000157bb8] process_one_work+0x2c8/0x5b0
  [c000000005f3fd10] [c000000000157f28] worker_thread+0x88/0x660
  [c000000005f3fdb0] [c000000000161d64] kthread+0x1a4/0x1b0
  [c000000005f3fe20] [c00000000000b960] ret_from_kernel_thread+0x5c/0x7c
  Instruction dump:
  3d22001d 892966b1 7d908026 91810008 f821ff71 69290001 0b090000 2e290000
  40920200 e9230018 7d2a0074 794ad182 <0b0a0000> 2fa90000 419e01e8 7c0802a6
  ---[ end trace 5ccf335d4f90fcb8 ]---

Fixes: 1021f0bc2f3d6 ("scsi: qla2xxx: allow session delete to finish before create.")
Cc: Quinn Tran <qtran@marvell.com>
Cc: Himanshu Madhani <hmadhani@marvell.com>
Cc: stable@vger.kernel.org
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
---
 drivers/scsi/qla2xxx/qla_init.c   | 1 +
 drivers/scsi/qla2xxx/qla_target.c | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index cb08004ef3f8..c8d89912d044 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -4803,6 +4803,7 @@ qla2x00_alloc_fcport(scsi_qla_host_t *vha, gfp_t flags)
 	}
 
 	INIT_WORK(&fcport->del_work, qla24xx_delete_sess_fn);
+	INIT_WORK(&fcport->free_work, qlt_free_session_done);
 	INIT_WORK(&fcport->reg_work, qla_register_fcport_fn);
 	INIT_LIST_HEAD(&fcport->gnl_entry);
 	INIT_LIST_HEAD(&fcport->list);
diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 0ffda6171614..bd1d30524de5 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -1160,7 +1160,6 @@ void qlt_unreg_sess(struct fc_port *sess)
 	sess->last_rscn_gen = sess->rscn_gen;
 	sess->last_login_gen = sess->login_gen;
 
-	INIT_WORK(&sess->free_work, qlt_free_session_done);
 	schedule_work(&sess->free_work);
 }
 EXPORT_SYMBOL(qlt_unreg_sess);
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/4] scsi: qla2xxx: Drop superfluous INIT_WORK of del_work
  2019-09-12  0:39 [PATCH 0/4] scsi: qla2xxx: Bug fixes Roman Bolshakov
  2019-09-12  0:39 ` [PATCH 1/4] scsi: qla2xxx: Ignore NULL pointer in tcm_qla2xxx_free_mcmd Roman Bolshakov
  2019-09-12  0:39 ` [PATCH 2/4] scsi: qla2xxx: Initialize free_work before flushing it Roman Bolshakov
@ 2019-09-12  0:39 ` Roman Bolshakov
  2019-09-12  0:39 ` [PATCH 4/4] scsi: qla2xxx: Change discovery state before PLOGI Roman Bolshakov
  2019-09-12  5:37 ` [PATCH 0/4] scsi: qla2xxx: Bug fixes Bart Van Assche
  4 siblings, 0 replies; 12+ messages in thread
From: Roman Bolshakov @ 2019-09-12  0:39 UTC (permalink / raw)
  To: linux-scsi; +Cc: Roman Bolshakov, Quinn Tran, Himanshu Madhani, Hannes Reinecke

del_work is already initialized inside qla2x00_alloc_fcport, there's no
need to overwrite it. Indeed, it might prevent complete traversal of
workqueue list.

Fixes: a01c77d2cbc45 ("scsi: qla2xxx: Move session delete to driver work queue")
Cc: Quinn Tran <qtran@marvell.com>
Cc: Himanshu Madhani <hmadhani@marvell.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
---
 drivers/scsi/qla2xxx/qla_target.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index bd1d30524de5..13fd617d6b96 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -1264,7 +1264,6 @@ void qlt_schedule_sess_for_deletion(struct fc_port *sess)
 	    "Scheduling sess %p for deletion %8phC\n",
 	    sess, sess->port_name);
 
-	INIT_WORK(&sess->del_work, qla24xx_delete_sess_fn);
 	WARN_ON(!queue_work(sess->vha->hw->wq, &sess->del_work));
 }
 
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/4] scsi: qla2xxx: Change discovery state before PLOGI
  2019-09-12  0:39 [PATCH 0/4] scsi: qla2xxx: Bug fixes Roman Bolshakov
                   ` (2 preceding siblings ...)
  2019-09-12  0:39 ` [PATCH 3/4] scsi: qla2xxx: Drop superfluous INIT_WORK of del_work Roman Bolshakov
@ 2019-09-12  0:39 ` Roman Bolshakov
  2019-09-12  5:37 ` [PATCH 0/4] scsi: qla2xxx: Bug fixes Bart Van Assche
  4 siblings, 0 replies; 12+ messages in thread
From: Roman Bolshakov @ 2019-09-12  0:39 UTC (permalink / raw)
  To: linux-scsi; +Cc: Roman Bolshakov, Quinn Tran, Himanshu Madhani, stable

When a port sends PLOGI, discovery state should be changed to login
pending, otherwise RELOGIN_NEEDED bit is set in
qla24xx_handle_plogi_done_event(). RELOGIN_NEEDED triggers another
PLOGI, and it never goes out of the loop until login timer expires.

Fixes: 8777e4314d397 ("scsi: qla2xxx: Migrate NVME N2N handling into state machine")
Fixes: 8b5292bcfcacf ("scsi: qla2xxx: Fix Relogin to prevent modifying scan_state flag")
Cc: Quinn Tran <qtran@marvell.com>
Cc: Himanshu Madhani <hmadhani@marvell.com>
Cc: stable@vger.kernel.org
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
---
 drivers/scsi/qla2xxx/qla_init.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index c8d89912d044..e4857ef0e5c4 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -516,6 +516,7 @@ static int qla_post_els_plogi_work(struct scsi_qla_host *vha, fc_port_t *fcport)
 
 	e->u.fcport.fcport = fcport;
 	fcport->flags |= FCF_ASYNC_ACTIVE;
+	fcport->disc_state = DSC_LOGIN_PEND;
 	return qla2x00_post_work(vha, e);
 }
 
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] scsi: qla2xxx: Bug fixes
  2019-09-12  0:39 [PATCH 0/4] scsi: qla2xxx: Bug fixes Roman Bolshakov
                   ` (3 preceding siblings ...)
  2019-09-12  0:39 ` [PATCH 4/4] scsi: qla2xxx: Change discovery state before PLOGI Roman Bolshakov
@ 2019-09-12  5:37 ` Bart Van Assche
  2019-09-12 13:49   ` Roman Bolshakov
  4 siblings, 1 reply; 12+ messages in thread
From: Bart Van Assche @ 2019-09-12  5:37 UTC (permalink / raw)
  To: Roman Bolshakov, linux-scsi; +Cc: Quinn Tran, Himanshu Madhani

On 9/12/19 1:39 AM, Roman Bolshakov wrote:
> This series has a few bug fixes for the driver.
> 
> Note, #1 only fixes the crash in the kernel. The complete fix for clean
> ACL deletion from initiator side is in works and requires a discussion.
> 
> As of now initiator is not aware that target no longer wants talking to
> it, that implies unneeded timeout. It might be fixed by making LOGO
> explicit on session deletion but it's an issue I want to raise first
> before making the change. Whether we need implicit LOGO in qla2xxx,
> explicit or use both.
> 
> Also, an unsolicited ABTS from a port without session would still result
> in BA_RJT response instead of frame discard and LOGO ELS, as specified
> in FCP (12.3.3 Target FCP_Port response to Exchange termination):
> 
>    When an ABTS-LS is received at the target FCP_Port, it shall abort
>    the designated Exchange and return one of the following responses:
> 
>    a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO
>       ELS if the Nx_Port issuing the ABTS-LS is not currently logged in
>       (i.e., no N_Port Login exists);
> 
> FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN
> RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission
> requests, exchange or sequence errors. IIRC, some initiators requeue
> SCSI commands if BA_RJT is received. Therefore, a timely LOGO will
> prevent a perceived session freeze on the initiators.

Hi Roman,

Has this patch series been prepared against Linus' master branch,
against Martin's 5.3/scsi-fixes or against Martin's 5.4/scsi-queue
branch? I'm asking this because some patches in this series look similar
to patches that are already present in the 5.4/scsi-queue branch.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] scsi: qla2xxx: Bug fixes
  2019-09-12  5:37 ` [PATCH 0/4] scsi: qla2xxx: Bug fixes Bart Van Assche
@ 2019-09-12 13:49   ` Roman Bolshakov
  2019-09-12 13:53     ` Himanshu Madhani
  0 siblings, 1 reply; 12+ messages in thread
From: Roman Bolshakov @ 2019-09-12 13:49 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-scsi, Quinn Tran, Himanshu Madhani

On Thu, Sep 12, 2019 at 06:37:22AM +0100, Bart Van Assche wrote:
> On 9/12/19 1:39 AM, Roman Bolshakov wrote:
> > This series has a few bug fixes for the driver.
> > 
> > Note, #1 only fixes the crash in the kernel. The complete fix for clean
> > ACL deletion from initiator side is in works and requires a discussion.
> > 
> > As of now initiator is not aware that target no longer wants talking to
> > it, that implies unneeded timeout. It might be fixed by making LOGO
> > explicit on session deletion but it's an issue I want to raise first
> > before making the change. Whether we need implicit LOGO in qla2xxx,
> > explicit or use both.
> > 
> > Also, an unsolicited ABTS from a port without session would still result
> > in BA_RJT response instead of frame discard and LOGO ELS, as specified
> > in FCP (12.3.3 Target FCP_Port response to Exchange termination):
> > 
> >    When an ABTS-LS is received at the target FCP_Port, it shall abort
> >    the designated Exchange and return one of the following responses:
> > 
> >    a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO
> >       ELS if the Nx_Port issuing the ABTS-LS is not currently logged in
> >       (i.e., no N_Port Login exists);
> > 
> > FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN
> > RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission
> > requests, exchange or sequence errors. IIRC, some initiators requeue
> > SCSI commands if BA_RJT is received. Therefore, a timely LOGO will
> > prevent a perceived session freeze on the initiators.
> 
> Hi Roman,
> 
> Has this patch series been prepared against Linus' master branch,
> against Martin's 5.3/scsi-fixes or against Martin's 5.4/scsi-queue
> branch? I'm asking this because some patches in this series look similar
> to patches that are already present in the 5.4/scsi-queue branch.
> 
> Thanks,
> 
> Bart.
> 

Hi Bart,

To be honest it was prepared against next-20190904 but it applies to
5.4/scsi-queue cleanly. The fixes made two weeks ago look promising but
are related to stuck PRLI and unhandled RSCN while #4 is related to
stuck PLOGI after qla_post_els_plogi_work.

Thank you,
Roman

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] scsi: qla2xxx: Bug fixes
  2019-09-12 13:49   ` Roman Bolshakov
@ 2019-09-12 13:53     ` Himanshu Madhani
  2019-11-07 19:00       ` Roman Bolshakov
  0 siblings, 1 reply; 12+ messages in thread
From: Himanshu Madhani @ 2019-09-12 13:53 UTC (permalink / raw)
  To: Roman Bolshakov, Bart Van Assche; +Cc: linux-scsi, Quinn Tran

Adding Correct Quinn. Please use "qutran@mavell.com"

We'll take a look at the series

On 9/12/19, 8:49 AM, "linux-scsi-owner@vger.kernel.org on behalf of Roman Bolshakov" <linux-scsi-owner@vger.kernel.org on behalf of r.bolshakov@yadro.com> wrote:

    On Thu, Sep 12, 2019 at 06:37:22AM +0100, Bart Van Assche wrote:
    > On 9/12/19 1:39 AM, Roman Bolshakov wrote:
    > > This series has a few bug fixes for the driver.
    > > 
    > > Note, #1 only fixes the crash in the kernel. The complete fix for clean
    > > ACL deletion from initiator side is in works and requires a discussion.
    > > 
    > > As of now initiator is not aware that target no longer wants talking to
    > > it, that implies unneeded timeout. It might be fixed by making LOGO
    > > explicit on session deletion but it's an issue I want to raise first
    > > before making the change. Whether we need implicit LOGO in qla2xxx,
    > > explicit or use both.
    > > 
    > > Also, an unsolicited ABTS from a port without session would still result
    > > in BA_RJT response instead of frame discard and LOGO ELS, as specified
    > > in FCP (12.3.3 Target FCP_Port response to Exchange termination):
    > > 
    > >    When an ABTS-LS is received at the target FCP_Port, it shall abort
    > >    the designated Exchange and return one of the following responses:
    > > 
    > >    a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO
    > >       ELS if the Nx_Port issuing the ABTS-LS is not currently logged in
    > >       (i.e., no N_Port Login exists);
    > > 
    > > FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN
    > > RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission
    > > requests, exchange or sequence errors. IIRC, some initiators requeue
    > > SCSI commands if BA_RJT is received. Therefore, a timely LOGO will
    > > prevent a perceived session freeze on the initiators.
    > 
    > Hi Roman,
    > 
    > Has this patch series been prepared against Linus' master branch,
    > against Martin's 5.3/scsi-fixes or against Martin's 5.4/scsi-queue
    > branch? I'm asking this because some patches in this series look similar
    > to patches that are already present in the 5.4/scsi-queue branch.
    > 
    > Thanks,
    > 
    > Bart.
    > 
    
    Hi Bart,
    
    To be honest it was prepared against next-20190904 but it applies to
    5.4/scsi-queue cleanly. The fixes made two weeks ago look promising but
    are related to stuck PRLI and unhandled RSCN while #4 is related to
    stuck PLOGI after qla_post_els_plogi_work.
    
    Thank you,
    Roman
    


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] scsi: qla2xxx: Bug fixes
  2019-09-12 13:53     ` Himanshu Madhani
@ 2019-11-07 19:00       ` Roman Bolshakov
  2019-11-13 18:54         ` Roman Bolshakov
  0 siblings, 1 reply; 12+ messages in thread
From: Roman Bolshakov @ 2019-11-07 19:00 UTC (permalink / raw)
  To: Himanshu Madhani; +Cc: Bart Van Assche, linux-scsi, Quinn Tran

Hi Himanshu,

Could you please take a look at the series and anwser if we should stop
doing BA_RJT as a response on ABTS when there's no session?

Thank you,
Roman

On Thu, Sep 12, 2019 at 01:53:03PM +0000, Himanshu Madhani wrote:
> Adding Correct Quinn. Please use "qutran@mavell.com"
> 
> We'll take a look at the series
> 
> On 9/12/19, 8:49 AM, "linux-scsi-owner@vger.kernel.org on behalf of Roman Bolshakov" <linux-scsi-owner@vger.kernel.org on behalf of r.bolshakov@yadro.com> wrote:
> 
>     On Thu, Sep 12, 2019 at 06:37:22AM +0100, Bart Van Assche wrote:
>     > On 9/12/19 1:39 AM, Roman Bolshakov wrote:
>     > > This series has a few bug fixes for the driver.
>     > > 
>     > > Note, #1 only fixes the crash in the kernel. The complete fix for clean
>     > > ACL deletion from initiator side is in works and requires a discussion.
>     > > 
>     > > As of now initiator is not aware that target no longer wants talking to
>     > > it, that implies unneeded timeout. It might be fixed by making LOGO
>     > > explicit on session deletion but it's an issue I want to raise first
>     > > before making the change. Whether we need implicit LOGO in qla2xxx,
>     > > explicit or use both.
>     > > 
>     > > Also, an unsolicited ABTS from a port without session would still result
>     > > in BA_RJT response instead of frame discard and LOGO ELS, as specified
>     > > in FCP (12.3.3 Target FCP_Port response to Exchange termination):
>     > > 
>     > >    When an ABTS-LS is received at the target FCP_Port, it shall abort
>     > >    the designated Exchange and return one of the following responses:
>     > > 
>     > >    a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO
>     > >       ELS if the Nx_Port issuing the ABTS-LS is not currently logged in
>     > >       (i.e., no N_Port Login exists);
>     > > 
>     > > FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN
>     > > RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission
>     > > requests, exchange or sequence errors. IIRC, some initiators requeue
>     > > SCSI commands if BA_RJT is received. Therefore, a timely LOGO will
>     > > prevent a perceived session freeze on the initiators.
>     > 
>     > Hi Roman,
>     > 
>     > Has this patch series been prepared against Linus' master branch,
>     > against Martin's 5.3/scsi-fixes or against Martin's 5.4/scsi-queue
>     > branch? I'm asking this because some patches in this series look similar
>     > to patches that are already present in the 5.4/scsi-queue branch.
>     > 
>     > Thanks,
>     > 
>     > Bart.
>     > 
>     
>     Hi Bart,
>     
>     To be honest it was prepared against next-20190904 but it applies to
>     5.4/scsi-queue cleanly. The fixes made two weeks ago look promising but
>     are related to stuck PRLI and unhandled RSCN while #4 is related to
>     stuck PLOGI after qla_post_els_plogi_work.
>     
>     Thank you,
>     Roman
>     
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] scsi: qla2xxx: Bug fixes
  2019-11-07 19:00       ` Roman Bolshakov
@ 2019-11-13 18:54         ` Roman Bolshakov
       [not found]           ` <0B40AFCA-8CB0-4F21-BDD1-DFE7A66DAA07@marvell.com>
  0 siblings, 1 reply; 12+ messages in thread
From: Roman Bolshakov @ 2019-11-13 18:54 UTC (permalink / raw)
  To: Himanshu Madhani; +Cc: Bart Van Assche, linux-scsi, Quinn Tran

Hi Himanshu,

Could you please answer the questions below?
And if we should start doing explicit LOGO on session shutdown?

Thank you,
Roman

On Thu, Nov 07, 2019 at 10:00:32PM +0300, Roman Bolshakov wrote:
> Hi Himanshu,
> 
> Could you please take a look at the series and anwser if we should stop
> doing BA_RJT as a response on ABTS when there's no session?
> 
> Thank you,
> Roman
> 
> On Thu, Sep 12, 2019 at 01:53:03PM +0000, Himanshu Madhani wrote:
> > Adding Correct Quinn. Please use "qutran@mavell.com"
> > 
> > We'll take a look at the series
> > 
> > On 9/12/19, 8:49 AM, "linux-scsi-owner@vger.kernel.org on behalf of Roman Bolshakov" <linux-scsi-owner@vger.kernel.org on behalf of r.bolshakov@yadro.com> wrote:
> > 
> >     On Thu, Sep 12, 2019 at 06:37:22AM +0100, Bart Van Assche wrote:
> >     > On 9/12/19 1:39 AM, Roman Bolshakov wrote:
> >     > > This series has a few bug fixes for the driver.
> >     > > 
> >     > > Note, #1 only fixes the crash in the kernel. The complete fix for clean
> >     > > ACL deletion from initiator side is in works and requires a discussion.
> >     > > 
> >     > > As of now initiator is not aware that target no longer wants talking to
> >     > > it, that implies unneeded timeout. It might be fixed by making LOGO
> >     > > explicit on session deletion but it's an issue I want to raise first
> >     > > before making the change. Whether we need implicit LOGO in qla2xxx,
> >     > > explicit or use both.
> >     > > 
> >     > > Also, an unsolicited ABTS from a port without session would still result
> >     > > in BA_RJT response instead of frame discard and LOGO ELS, as specified
> >     > > in FCP (12.3.3 Target FCP_Port response to Exchange termination):
> >     > > 
> >     > >    When an ABTS-LS is received at the target FCP_Port, it shall abort
> >     > >    the designated Exchange and return one of the following responses:
> >     > > 
> >     > >    a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO
> >     > >       ELS if the Nx_Port issuing the ABTS-LS is not currently logged in
> >     > >       (i.e., no N_Port Login exists);
> >     > > 
> >     > > FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN
> >     > > RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission
> >     > > requests, exchange or sequence errors. IIRC, some initiators requeue
> >     > > SCSI commands if BA_RJT is received. Therefore, a timely LOGO will
> >     > > prevent a perceived session freeze on the initiators.
> >     > 
> >     > Hi Roman,
> >     > 
> >     > Has this patch series been prepared against Linus' master branch,
> >     > against Martin's 5.3/scsi-fixes or against Martin's 5.4/scsi-queue
> >     > branch? I'm asking this because some patches in this series look similar
> >     > to patches that are already present in the 5.4/scsi-queue branch.
> >     > 
> >     > Thanks,
> >     > 
> >     > Bart.
> >     > 
> >     
> >     Hi Bart,
> >     
> >     To be honest it was prepared against next-20190904 but it applies to
> >     5.4/scsi-queue cleanly. The fixes made two weeks ago look promising but
> >     are related to stuck PRLI and unhandled RSCN while #4 is related to
> >     stuck PLOGI after qla_post_els_plogi_work.
> >     
> >     Thank you,
> >     Roman
> >     
> > 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [EXT] [PATCH 0/4] scsi: qla2xxx: Bug fixes
       [not found]           ` <0B40AFCA-8CB0-4F21-BDD1-DFE7A66DAA07@marvell.com>
@ 2019-11-19 21:46             ` Roman Bolshakov
  2019-11-20  0:09               ` Roman Bolshakov
  0 siblings, 1 reply; 12+ messages in thread
From: Roman Bolshakov @ 2019-11-19 21:46 UTC (permalink / raw)
  To: Himanshu Madhani; +Cc: Bart Van Assche, linux-scsi, Quinn Tran

Hi Himanshu,

I've tried the patch and it seems that LOGO doesn't succeed yet:
[ 1079.073246] qla2xxx [0000:00:06.0]-2870:1: Async-logout - hdl=0 loop-id=0 portid=000002 21:00:00:24:ff:7f:35:c6.
[ 1079.073333] qla2xxx [0000:00:06.0]-5837:1: Async-logout failed - 21:00:00:24:ff:7f:35:c6 hdl=12 portid=000002 comp=31 iop0=19 iop1=c.

It means that firmware detected IOCB parameter error at offset 0xc.
I'll examine IOCB parameter dumps tomorrow.

Are you ok if I send the patch (keeping Quinn's authorship) in the my
patch set once I get it fixed?

I also consider to add one more patch to address the issue with BA_RJT.
The idea is to discard a frame that has no session (not logged in) and
send explicit LOGO ELS instead of BA_RJT to follow FCP spec 12.3.3:
   When an ABTS-LS is received at the target FCP_Port, it shall abort
   the designated Exchange and return one of the following responses:

   a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO
      ELS if the Nx_Port issuing the ABTS-LS is not currently logged in
      (i.e., no N_Port Login exists);

Although, I don't know if firmware can handle that yet. We'll see.

Thank you,
Roman

On Wed, Nov 13, 2019 at 10:33:35PM +0000, Himanshu Madhani wrote:
> Hi Roman,
> 
> 
> > On Nov 13, 2019, at 12:54 PM, Roman Bolshakov <r.bolshakov@yadro.com> wrote:
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > Hi Himanshu,
> >
> > Could you please answer the questions below?
> > And if we should start doing explicit LOGO on session shutdown?
> >
> 
> You are correct in target mode driver should do explicit logo if target does not want to talk to initiator anymore.
> 
> We had patch for the explicit_logo in our internal code base, but did not sent it upstream. I’ve attached it here for you to try it out and see if it helps in your env.
> 
> Thanks,
> Himanshu
> 
> 
> 
> > Thank you,
> > Roman
> >
> > On Thu, Nov 07, 2019 at 10:00:32PM +0300, Roman Bolshakov wrote:
> >> Hi Himanshu,
> >>
> >> Could you please take a look at the series and anwser if we should stop
> >> doing BA_RJT as a response on ABTS when there's no session?
> >>
> >> Thank you,
> >> Roman
> >>
> >> On Thu, Sep 12, 2019 at 01:53:03PM +0000, Himanshu Madhani wrote:
> >>> Adding Correct Quinn. Please use "qutran@mavell.com"
> >>>
> >>> We'll take a look at the series
> >>>
> >>> On 9/12/19, 8:49 AM, "linux-scsi-owner@vger.kernel.org on behalf of Roman Bolshakov" <linux-scsi-owner@vger.kernel.org on behalf of r.bolshakov@yadro.com> wrote:
> >>>
> >>>    On Thu, Sep 12, 2019 at 06:37:22AM +0100, Bart Van Assche wrote:
> >>>> On 9/12/19 1:39 AM, Roman Bolshakov wrote:
> >>>>> This series has a few bug fixes for the driver.
> >>>>>
> >>>>> Note, #1 only fixes the crash in the kernel. The complete fix for clean
> >>>>> ACL deletion from initiator side is in works and requires a discussion.
> >>>>>
> >>>>> As of now initiator is not aware that target no longer wants talking to
> >>>>> it, that implies unneeded timeout. It might be fixed by making LOGO
> >>>>> explicit on session deletion but it's an issue I want to raise first
> >>>>> before making the change. Whether we need implicit LOGO in qla2xxx,
> >>>>> explicit or use both.
> >>>>>
> >>>>> Also, an unsolicited ABTS from a port without session would still result
> >>>>> in BA_RJT response instead of frame discard and LOGO ELS, as specified
> >>>>> in FCP (12.3.3 Target FCP_Port response to Exchange termination):
> >>>>>
> >>>>>   When an ABTS-LS is received at the target FCP_Port, it shall abort
> >>>>>   the designated Exchange and return one of the following responses:
> >>>>>
> >>>>>   a) the target FCP_Port shall discard the ABTS-LS and transmit a LOGO
> >>>>>      ELS if the Nx_Port issuing the ABTS-LS is not currently logged in
> >>>>>      (i.e., no N_Port Login exists);
> >>>>>
> >>>>> FWIW, the target driver can receive ABTS as part of ABORT TASK/LUN
> >>>>> RESET/CLEAR TASK SET TMFs and in case of failed sequence retransmission
> >>>>> requests, exchange or sequence errors. IIRC, some initiators requeue
> >>>>> SCSI commands if BA_RJT is received. Therefore, a timely LOGO will
> >>>>> prevent a perceived session freeze on the initiators.
> >>>>
> >>>> Hi Roman,
> >>>>
> >>>> Has this patch series been prepared against Linus' master branch,
> >>>> against Martin's 5.3/scsi-fixes or against Martin's 5.4/scsi-queue
> >>>> branch? I'm asking this because some patches in this series look similar
> >>>> to patches that are already present in the 5.4/scsi-queue branch.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Bart.
> >>>>
> >>>
> >>>    Hi Bart,
> >>>
> >>>    To be honest it was prepared against next-20190904 but it applies to
> >>>    5.4/scsi-queue cleanly. The fixes made two weeks ago look promising but
> >>>    are related to stuck PRLI and unhandled RSCN while #4 is related to
> >>>    stuck PLOGI after qla_post_els_plogi_work.
> >>>
> >>>    Thank you,
> >>>    Roman
> >>>
> >>>
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [EXT] [PATCH 0/4] scsi: qla2xxx: Bug fixes
  2019-11-19 21:46             ` [EXT] " Roman Bolshakov
@ 2019-11-20  0:09               ` Roman Bolshakov
  0 siblings, 0 replies; 12+ messages in thread
From: Roman Bolshakov @ 2019-11-20  0:09 UTC (permalink / raw)
  To: Himanshu Madhani; +Cc: Bart Van Assche, linux-scsi, Quinn Tran

On Wed, Nov 20, 2019 at 12:46:56AM +0300, Roman Bolshakov wrote:
> Hi Himanshu,
> 
> I've tried the patch and it seems that LOGO doesn't succeed yet:
> [ 1079.073246] qla2xxx [0000:00:06.0]-2870:1: Async-logout - hdl=0 loop-id=0 portid=000002 21:00:00:24:ff:7f:35:c6.
> [ 1079.073333] qla2xxx [0000:00:06.0]-5837:1: Async-logout failed - 21:00:00:24:ff:7f:35:c6 hdl=12 portid=000002 comp=31 iop0=19 iop1=c.
> 
> It means that firmware detected IOCB parameter error at offset 0xc.
> I'll examine IOCB parameter dumps tomorrow.
> 

FWIW, it was an easy fix, we have to set either explicit LOGO bit or implicit
LOGO bit. Free N_Port handle is not allowed to be set alone. I corrected
the patch:
https://github.com/roolebo/linux/commit/6e86300a60552bfc0a4c49d65d89e5011dd90f10

--
Roman

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-11-20  0:09 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-12  0:39 [PATCH 0/4] scsi: qla2xxx: Bug fixes Roman Bolshakov
2019-09-12  0:39 ` [PATCH 1/4] scsi: qla2xxx: Ignore NULL pointer in tcm_qla2xxx_free_mcmd Roman Bolshakov
2019-09-12  0:39 ` [PATCH 2/4] scsi: qla2xxx: Initialize free_work before flushing it Roman Bolshakov
2019-09-12  0:39 ` [PATCH 3/4] scsi: qla2xxx: Drop superfluous INIT_WORK of del_work Roman Bolshakov
2019-09-12  0:39 ` [PATCH 4/4] scsi: qla2xxx: Change discovery state before PLOGI Roman Bolshakov
2019-09-12  5:37 ` [PATCH 0/4] scsi: qla2xxx: Bug fixes Bart Van Assche
2019-09-12 13:49   ` Roman Bolshakov
2019-09-12 13:53     ` Himanshu Madhani
2019-11-07 19:00       ` Roman Bolshakov
2019-11-13 18:54         ` Roman Bolshakov
     [not found]           ` <0B40AFCA-8CB0-4F21-BDD1-DFE7A66DAA07@marvell.com>
2019-11-19 21:46             ` [EXT] " Roman Bolshakov
2019-11-20  0:09               ` Roman Bolshakov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).