All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN
@ 2020-05-18 18:31 ` Roman Bolshakov
  0 siblings, 0 replies; 18+ messages in thread
From: Roman Bolshakov @ 2020-05-18 18:31 UTC (permalink / raw)
  To: linux-scsi
  Cc: GR-QLogic-Storage-Upstream, target-devel, linux, Roman Bolshakov,
	Quinn Tran, Arun Easi, Nilesh Javali, Bart Van Assche,
	Daniel Wagner, Himanshu Madhani, Martin Wilck, stable

The driver performs SCR (state change registration) in all modes
including pure target mode.

For each RSCN, scan_needed flag is set in qla2x00_handle_rscn() for the
port mentioned in the RSCN and fabric rescan is scheduled. During the
rescan, GNN_FT handler, qla24xx_async_gnnft_done() deletes session of
the port that caused the RSCN.

In target mode, the session deletion has an impact on ATIO handler,
qlt_24xx_atio_pkt(). Target responds with SAM STATUS BUSY to I/O
incoming from the deleted session. qlt_handle_cmd_for_atio() and
qlt_handle_task_mgmt() return -EFAULT if they are not able to find
session of the command/TMF, and that results in invocation of
qlt_send_busy():

  qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 0014
  qla_target(0): Unable to send command to target, sending BUSY status

Such response causes command timeout on the initiator. Error handler
thread on the initiator will be spawned to abort the commands:

  scsi 23:0:0:0: tag#0 abort scheduled
  scsi 23:0:0:0: tag#0 aborting command
  qla2xxx [0000:af:00.0]-188c:23: Entered qla24xx_abort_command.
  qla2xxx [0000:af:00.0]-801c:23: Abort command issued nexus#:0:0 -- 0 2003.

Command abort is rejected by target and fails (2003), error handler then
tries to perform DEVICE RESET and TARGET RESET but they're also doomed
to fail because TMFs are ignored for the deleted sessions.

Then initiator makes BUS RESET that resets the link via
qla2x00_full_login_lip(). BUS RESET succeeds and brings initiator port
up, SAN switch detects that and sends RSCN to the target port and it
fails again the same way as described above. It never goes out of the
loop.

The change breaks the RSCN loop by keeping initiator sessions mentioned
in RSCN payload in all modes, including dual and pure target mode.

Fixes: 2037ce49d30a ("scsi: qla2xxx: Fix stale session")
Cc: Quinn Tran <qutran@marvell.com>
Cc: Arun Easi <aeasi@marvell.com>
Cc: Nilesh Javali <njavali@marvell.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Cc: Martin Wilck <mwilck@suse.com>
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
---
 drivers/scsi/qla2xxx/qla_gs.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Hi Martin,

Please apply the patch to scsi-fixes/5.7 at your earliest convenience.

qla2xxx in target and, likely, dual mode is unusable in some SAN fabrics
due to the bug.

Thanks,
Roman

diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
index 42c3ad27f1cb..b9955af5cffe 100644
--- a/drivers/scsi/qla2xxx/qla_gs.c
+++ b/drivers/scsi/qla2xxx/qla_gs.c
@@ -3495,8 +3495,10 @@ void qla24xx_async_gnnft_done(scsi_qla_host_t *vha, srb_t *sp)
 			if ((fcport->flags & FCF_FABRIC_DEVICE) = 0) {
 				qla2x00_clear_loop_id(fcport);
 				fcport->flags |= FCF_FABRIC_DEVICE;
-			} else if (fcport->d_id.b24 != rp->id.b24 ||
-				fcport->scan_needed) {
+			} else if ((fcport->d_id.b24 != rp->id.b24 ||
+				    fcport->scan_needed) &&
+				   (fcport->port_type != FCT_INITIATOR &&
+				    fcport->port_type != FCT_NVME_INITIATOR)) {
 				qlt_schedule_sess_for_deletion(fcport);
 			}
 			fcport->d_id.b24 = rp->id.b24;
-- 
2.26.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN
@ 2020-05-18 18:31 ` Roman Bolshakov
  0 siblings, 0 replies; 18+ messages in thread
From: Roman Bolshakov @ 2020-05-18 18:31 UTC (permalink / raw)
  To: linux-scsi
  Cc: GR-QLogic-Storage-Upstream, target-devel, linux, Roman Bolshakov,
	Quinn Tran, Arun Easi, Nilesh Javali, Bart Van Assche,
	Daniel Wagner, Himanshu Madhani, Martin Wilck, stable

The driver performs SCR (state change registration) in all modes
including pure target mode.

For each RSCN, scan_needed flag is set in qla2x00_handle_rscn() for the
port mentioned in the RSCN and fabric rescan is scheduled. During the
rescan, GNN_FT handler, qla24xx_async_gnnft_done() deletes session of
the port that caused the RSCN.

In target mode, the session deletion has an impact on ATIO handler,
qlt_24xx_atio_pkt(). Target responds with SAM STATUS BUSY to I/O
incoming from the deleted session. qlt_handle_cmd_for_atio() and
qlt_handle_task_mgmt() return -EFAULT if they are not able to find
session of the command/TMF, and that results in invocation of
qlt_send_busy():

  qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 0014
  qla_target(0): Unable to send command to target, sending BUSY status

Such response causes command timeout on the initiator. Error handler
thread on the initiator will be spawned to abort the commands:

  scsi 23:0:0:0: tag#0 abort scheduled
  scsi 23:0:0:0: tag#0 aborting command
  qla2xxx [0000:af:00.0]-188c:23: Entered qla24xx_abort_command.
  qla2xxx [0000:af:00.0]-801c:23: Abort command issued nexus=23:0:0 -- 0 2003.

Command abort is rejected by target and fails (2003), error handler then
tries to perform DEVICE RESET and TARGET RESET but they're also doomed
to fail because TMFs are ignored for the deleted sessions.

Then initiator makes BUS RESET that resets the link via
qla2x00_full_login_lip(). BUS RESET succeeds and brings initiator port
up, SAN switch detects that and sends RSCN to the target port and it
fails again the same way as described above. It never goes out of the
loop.

The change breaks the RSCN loop by keeping initiator sessions mentioned
in RSCN payload in all modes, including dual and pure target mode.

Fixes: 2037ce49d30a ("scsi: qla2xxx: Fix stale session")
Cc: Quinn Tran <qutran@marvell.com>
Cc: Arun Easi <aeasi@marvell.com>
Cc: Nilesh Javali <njavali@marvell.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Cc: Martin Wilck <mwilck@suse.com>
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
---
 drivers/scsi/qla2xxx/qla_gs.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Hi Martin,

Please apply the patch to scsi-fixes/5.7 at your earliest convenience.

qla2xxx in target and, likely, dual mode is unusable in some SAN fabrics
due to the bug.

Thanks,
Roman

diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
index 42c3ad27f1cb..b9955af5cffe 100644
--- a/drivers/scsi/qla2xxx/qla_gs.c
+++ b/drivers/scsi/qla2xxx/qla_gs.c
@@ -3495,8 +3495,10 @@ void qla24xx_async_gnnft_done(scsi_qla_host_t *vha, srb_t *sp)
 			if ((fcport->flags & FCF_FABRIC_DEVICE) == 0) {
 				qla2x00_clear_loop_id(fcport);
 				fcport->flags |= FCF_FABRIC_DEVICE;
-			} else if (fcport->d_id.b24 != rp->id.b24 ||
-				fcport->scan_needed) {
+			} else if ((fcport->d_id.b24 != rp->id.b24 ||
+				    fcport->scan_needed) &&
+				   (fcport->port_type != FCT_INITIATOR &&
+				    fcport->port_type != FCT_NVME_INITIATOR)) {
 				qlt_schedule_sess_for_deletion(fcport);
 			}
 			fcport->d_id.b24 = rp->id.b24;
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN
  2020-05-18 18:31 ` Roman Bolshakov
@ 2020-05-18 22:40   ` Himanshu Madhani
  -1 siblings, 0 replies; 18+ messages in thread
From: Himanshu Madhani @ 2020-05-18 22:40 UTC (permalink / raw)
  To: Roman Bolshakov, linux-scsi
  Cc: GR-QLogic-Storage-Upstream, target-devel, linux, Quinn Tran,
	Arun Easi, Nilesh Javali, Bart Van Assche, Daniel Wagner,
	Martin Wilck, stable



On 5/18/20 1:31 PM, Roman Bolshakov wrote:
> The driver performs SCR (state change registration) in all modes
> including pure target mode.
> 
> For each RSCN, scan_needed flag is set in qla2x00_handle_rscn() for the
> port mentioned in the RSCN and fabric rescan is scheduled. During the
> rescan, GNN_FT handler, qla24xx_async_gnnft_done() deletes session of
> the port that caused the RSCN.
> 
> In target mode, the session deletion has an impact on ATIO handler,
> qlt_24xx_atio_pkt(). Target responds with SAM STATUS BUSY to I/O
> incoming from the deleted session. qlt_handle_cmd_for_atio() and
> qlt_handle_task_mgmt() return -EFAULT if they are not able to find
> session of the command/TMF, and that results in invocation of
> qlt_send_busy():
> 
>    qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 0014
>    qla_target(0): Unable to send command to target, sending BUSY status
> 
> Such response causes command timeout on the initiator. Error handler
> thread on the initiator will be spawned to abort the commands:
> 
>    scsi 23:0:0:0: tag#0 abort scheduled
>    scsi 23:0:0:0: tag#0 aborting command
>    qla2xxx [0000:af:00.0]-188c:23: Entered qla24xx_abort_command.
>    qla2xxx [0000:af:00.0]-801c:23: Abort command issued nexus#:0:0 -- 0 2003.
> 
> Command abort is rejected by target and fails (2003), error handler then
> tries to perform DEVICE RESET and TARGET RESET but they're also doomed
> to fail because TMFs are ignored for the deleted sessions.
> 
> Then initiator makes BUS RESET that resets the link via
> qla2x00_full_login_lip(). BUS RESET succeeds and brings initiator port
> up, SAN switch detects that and sends RSCN to the target port and it
> fails again the same way as described above. It never goes out of the
> loop.
> 
> The change breaks the RSCN loop by keeping initiator sessions mentioned
> in RSCN payload in all modes, including dual and pure target mode.
> 
> Fixes: 2037ce49d30a ("scsi: qla2xxx: Fix stale session")
> Cc: Quinn Tran <qutran@marvell.com>
> Cc: Arun Easi <aeasi@marvell.com>
> Cc: Nilesh Javali <njavali@marvell.com>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Daniel Wagner <dwagner@suse.de>
> Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
> Cc: Martin Wilck <mwilck@suse.com>
> Cc: stable@vger.kernel.org # v5.4+
> Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
> ---
>   drivers/scsi/qla2xxx/qla_gs.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> Hi Martin,
> 
> Please apply the patch to scsi-fixes/5.7 at your earliest convenience.
> 
> qla2xxx in target and, likely, dual mode is unusable in some SAN fabrics
> due to the bug.
> 
> Thanks,
> Roman
> 
> diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
> index 42c3ad27f1cb..b9955af5cffe 100644
> --- a/drivers/scsi/qla2xxx/qla_gs.c
> +++ b/drivers/scsi/qla2xxx/qla_gs.c
> @@ -3495,8 +3495,10 @@ void qla24xx_async_gnnft_done(scsi_qla_host_t *vha, srb_t *sp)
>   			if ((fcport->flags & FCF_FABRIC_DEVICE) = 0) {
>   				qla2x00_clear_loop_id(fcport);
>   				fcport->flags |= FCF_FABRIC_DEVICE;
> -			} else if (fcport->d_id.b24 != rp->id.b24 ||
> -				fcport->scan_needed) {
> +			} else if ((fcport->d_id.b24 != rp->id.b24 ||
> +				    fcport->scan_needed) &&
> +				   (fcport->port_type != FCT_INITIATOR &&
> +				    fcport->port_type != FCT_NVME_INITIATOR)) {
>   				qlt_schedule_sess_for_deletion(fcport);
>   			}
>   			fcport->d_id.b24 = rp->id.b24;
> 
Looks okay.

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>

-- 
Himanshu Madhani                     Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN
@ 2020-05-18 22:40   ` Himanshu Madhani
  0 siblings, 0 replies; 18+ messages in thread
From: Himanshu Madhani @ 2020-05-18 22:40 UTC (permalink / raw)
  To: Roman Bolshakov, linux-scsi
  Cc: GR-QLogic-Storage-Upstream, target-devel, linux, Quinn Tran,
	Arun Easi, Nilesh Javali, Bart Van Assche, Daniel Wagner,
	Martin Wilck, stable



On 5/18/20 1:31 PM, Roman Bolshakov wrote:
> The driver performs SCR (state change registration) in all modes
> including pure target mode.
> 
> For each RSCN, scan_needed flag is set in qla2x00_handle_rscn() for the
> port mentioned in the RSCN and fabric rescan is scheduled. During the
> rescan, GNN_FT handler, qla24xx_async_gnnft_done() deletes session of
> the port that caused the RSCN.
> 
> In target mode, the session deletion has an impact on ATIO handler,
> qlt_24xx_atio_pkt(). Target responds with SAM STATUS BUSY to I/O
> incoming from the deleted session. qlt_handle_cmd_for_atio() and
> qlt_handle_task_mgmt() return -EFAULT if they are not able to find
> session of the command/TMF, and that results in invocation of
> qlt_send_busy():
> 
>    qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 0014
>    qla_target(0): Unable to send command to target, sending BUSY status
> 
> Such response causes command timeout on the initiator. Error handler
> thread on the initiator will be spawned to abort the commands:
> 
>    scsi 23:0:0:0: tag#0 abort scheduled
>    scsi 23:0:0:0: tag#0 aborting command
>    qla2xxx [0000:af:00.0]-188c:23: Entered qla24xx_abort_command.
>    qla2xxx [0000:af:00.0]-801c:23: Abort command issued nexus=23:0:0 -- 0 2003.
> 
> Command abort is rejected by target and fails (2003), error handler then
> tries to perform DEVICE RESET and TARGET RESET but they're also doomed
> to fail because TMFs are ignored for the deleted sessions.
> 
> Then initiator makes BUS RESET that resets the link via
> qla2x00_full_login_lip(). BUS RESET succeeds and brings initiator port
> up, SAN switch detects that and sends RSCN to the target port and it
> fails again the same way as described above. It never goes out of the
> loop.
> 
> The change breaks the RSCN loop by keeping initiator sessions mentioned
> in RSCN payload in all modes, including dual and pure target mode.
> 
> Fixes: 2037ce49d30a ("scsi: qla2xxx: Fix stale session")
> Cc: Quinn Tran <qutran@marvell.com>
> Cc: Arun Easi <aeasi@marvell.com>
> Cc: Nilesh Javali <njavali@marvell.com>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Daniel Wagner <dwagner@suse.de>
> Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
> Cc: Martin Wilck <mwilck@suse.com>
> Cc: stable@vger.kernel.org # v5.4+
> Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
> ---
>   drivers/scsi/qla2xxx/qla_gs.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> Hi Martin,
> 
> Please apply the patch to scsi-fixes/5.7 at your earliest convenience.
> 
> qla2xxx in target and, likely, dual mode is unusable in some SAN fabrics
> due to the bug.
> 
> Thanks,
> Roman
> 
> diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
> index 42c3ad27f1cb..b9955af5cffe 100644
> --- a/drivers/scsi/qla2xxx/qla_gs.c
> +++ b/drivers/scsi/qla2xxx/qla_gs.c
> @@ -3495,8 +3495,10 @@ void qla24xx_async_gnnft_done(scsi_qla_host_t *vha, srb_t *sp)
>   			if ((fcport->flags & FCF_FABRIC_DEVICE) == 0) {
>   				qla2x00_clear_loop_id(fcport);
>   				fcport->flags |= FCF_FABRIC_DEVICE;
> -			} else if (fcport->d_id.b24 != rp->id.b24 ||
> -				fcport->scan_needed) {
> +			} else if ((fcport->d_id.b24 != rp->id.b24 ||
> +				    fcport->scan_needed) &&
> +				   (fcport->port_type != FCT_INITIATOR &&
> +				    fcport->port_type != FCT_NVME_INITIATOR)) {
>   				qlt_schedule_sess_for_deletion(fcport);
>   			}
>   			fcport->d_id.b24 = rp->id.b24;
> 
Looks okay.

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>

-- 
Himanshu Madhani                     Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN
  2020-05-18 18:31 ` Roman Bolshakov
@ 2020-05-18 23:22   ` Roman Bolshakov
  -1 siblings, 0 replies; 18+ messages in thread
From: Roman Bolshakov @ 2020-05-18 23:22 UTC (permalink / raw)
  To: linux-scsi
  Cc: GR-QLogic-Storage-Upstream, target-devel, linux, Quinn Tran,
	Arun Easi, Nilesh Javali, Bart Van Assche, Daniel Wagner,
	Himanshu Madhani, Martin Wilck, stable

On Mon, May 18, 2020 at 09:31:42PM +0300, Roman Bolshakov wrote:
> The driver performs SCR (state change registration) in all modes
> including pure target mode.
> 
> For each RSCN, scan_needed flag is set in qla2x00_handle_rscn() for the
> port mentioned in the RSCN and fabric rescan is scheduled. During the
> rescan, GNN_FT handler, qla24xx_async_gnnft_done() deletes session of
> the port that caused the RSCN.
> 
> In target mode, the session deletion has an impact on ATIO handler,
> qlt_24xx_atio_pkt(). Target responds with SAM STATUS BUSY to I/O
> incoming from the deleted session. qlt_handle_cmd_for_atio() and
> qlt_handle_task_mgmt() return -EFAULT if they are not able to find
> session of the command/TMF, and that results in invocation of
> qlt_send_busy():
> 
>   qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 0014
>   qla_target(0): Unable to send command to target, sending BUSY status
> 
> Such response causes command timeout on the initiator. Error handler
> thread on the initiator will be spawned to abort the commands:
> 
>   scsi 23:0:0:0: tag#0 abort scheduled
>   scsi 23:0:0:0: tag#0 aborting command
>   qla2xxx [0000:af:00.0]-188c:23: Entered qla24xx_abort_command.
>   qla2xxx [0000:af:00.0]-801c:23: Abort command issued nexus#:0:0 -- 0 2003.
> 
> Command abort is rejected by target and fails (2003), error handler then
> tries to perform DEVICE RESET and TARGET RESET but they're also doomed
> to fail because TMFs are ignored for the deleted sessions.
> 
> Then initiator makes BUS RESET that resets the link via
> qla2x00_full_login_lip(). BUS RESET succeeds and brings initiator port
> up, SAN switch detects that and sends RSCN to the target port and it
> fails again the same way as described above. It never goes out of the
> loop.
> 
> The change breaks the RSCN loop by keeping initiator sessions mentioned
> in RSCN payload in all modes, including dual and pure target mode.
> 
> Fixes: 2037ce49d30a ("scsi: qla2xxx: Fix stale session")
> Cc: Quinn Tran <qutran@marvell.com>
> Cc: Arun Easi <aeasi@marvell.com>
> Cc: Nilesh Javali <njavali@marvell.com>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Daniel Wagner <dwagner@suse.de>
> Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
> Cc: Martin Wilck <mwilck@suse.com>
> Cc: stable@vger.kernel.org # v5.4+
> Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
> ---
>  drivers/scsi/qla2xxx/qla_gs.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> Hi Martin,
> 
> Please apply the patch to scsi-fixes/5.7 at your earliest convenience.
> 
> qla2xxx in target and, likely, dual mode is unusable in some SAN fabrics
> due to the bug.
> 
> Thanks,
> Roman
> 
> diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
> index 42c3ad27f1cb..b9955af5cffe 100644
> --- a/drivers/scsi/qla2xxx/qla_gs.c
> +++ b/drivers/scsi/qla2xxx/qla_gs.c
> @@ -3495,8 +3495,10 @@ void qla24xx_async_gnnft_done(scsi_qla_host_t *vha, srb_t *sp)
>  			if ((fcport->flags & FCF_FABRIC_DEVICE) = 0) {
>  				qla2x00_clear_loop_id(fcport);
>  				fcport->flags |= FCF_FABRIC_DEVICE;
> -			} else if (fcport->d_id.b24 != rp->id.b24 ||
> -				fcport->scan_needed) {
> +			} else if ((fcport->d_id.b24 != rp->id.b24 ||
> +				    fcport->scan_needed) &&
> +				   (fcport->port_type != FCT_INITIATOR &&
> +				    fcport->port_type != FCT_NVME_INITIATOR)) {
>  				qlt_schedule_sess_for_deletion(fcport);
>  			}
>  			fcport->d_id.b24 = rp->id.b24;
> -- 
> 2.26.1
> 

P.S. A little bit cleaner alternative would be to avoid session deletion
only if scan needed is set (that allows session deletion of initiator
ports after fabric discovery if N_Port ID change happened). Please let
me know if I should submit v2 like this:

diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
index 42c3ad27f1cb..b9955af5cffe 100644
--- a/drivers/scsi/qla2xxx/qla_gs.c
+++ b/drivers/scsi/qla2xxx/qla_gs.c
@@ -3495,8 +3495,10 @@ void qla24xx_async_gnnft_done(scsi_qla_host_t *vha, srb_t *sp)
 			if ((fcport->flags & FCF_FABRIC_DEVICE) = 0) {
 				qla2x00_clear_loop_id(fcport);
 				fcport->flags |= FCF_FABRIC_DEVICE;
-			} else if (fcport->d_id.b24 != rp->id.b24 ||
-				fcport->scan_needed) {
+			} else if (fcport->d_id.b24 != rp->id.b24 ||
+				   (fcport->scan_needed &&
+				    fcport->port_type != FCT_INITIATOR &&
+				    fcport->port_type != FCT_NVME_INITIATOR)) {
 				qlt_schedule_sess_for_deletion(fcport);
 			}
 			fcport->d_id.b24 = rp->id.b24;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN
@ 2020-05-18 23:22   ` Roman Bolshakov
  0 siblings, 0 replies; 18+ messages in thread
From: Roman Bolshakov @ 2020-05-18 23:22 UTC (permalink / raw)
  To: linux-scsi
  Cc: GR-QLogic-Storage-Upstream, target-devel, linux, Quinn Tran,
	Arun Easi, Nilesh Javali, Bart Van Assche, Daniel Wagner,
	Himanshu Madhani, Martin Wilck, stable

On Mon, May 18, 2020 at 09:31:42PM +0300, Roman Bolshakov wrote:
> The driver performs SCR (state change registration) in all modes
> including pure target mode.
> 
> For each RSCN, scan_needed flag is set in qla2x00_handle_rscn() for the
> port mentioned in the RSCN and fabric rescan is scheduled. During the
> rescan, GNN_FT handler, qla24xx_async_gnnft_done() deletes session of
> the port that caused the RSCN.
> 
> In target mode, the session deletion has an impact on ATIO handler,
> qlt_24xx_atio_pkt(). Target responds with SAM STATUS BUSY to I/O
> incoming from the deleted session. qlt_handle_cmd_for_atio() and
> qlt_handle_task_mgmt() return -EFAULT if they are not able to find
> session of the command/TMF, and that results in invocation of
> qlt_send_busy():
> 
>   qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 0014
>   qla_target(0): Unable to send command to target, sending BUSY status
> 
> Such response causes command timeout on the initiator. Error handler
> thread on the initiator will be spawned to abort the commands:
> 
>   scsi 23:0:0:0: tag#0 abort scheduled
>   scsi 23:0:0:0: tag#0 aborting command
>   qla2xxx [0000:af:00.0]-188c:23: Entered qla24xx_abort_command.
>   qla2xxx [0000:af:00.0]-801c:23: Abort command issued nexus=23:0:0 -- 0 2003.
> 
> Command abort is rejected by target and fails (2003), error handler then
> tries to perform DEVICE RESET and TARGET RESET but they're also doomed
> to fail because TMFs are ignored for the deleted sessions.
> 
> Then initiator makes BUS RESET that resets the link via
> qla2x00_full_login_lip(). BUS RESET succeeds and brings initiator port
> up, SAN switch detects that and sends RSCN to the target port and it
> fails again the same way as described above. It never goes out of the
> loop.
> 
> The change breaks the RSCN loop by keeping initiator sessions mentioned
> in RSCN payload in all modes, including dual and pure target mode.
> 
> Fixes: 2037ce49d30a ("scsi: qla2xxx: Fix stale session")
> Cc: Quinn Tran <qutran@marvell.com>
> Cc: Arun Easi <aeasi@marvell.com>
> Cc: Nilesh Javali <njavali@marvell.com>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Daniel Wagner <dwagner@suse.de>
> Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
> Cc: Martin Wilck <mwilck@suse.com>
> Cc: stable@vger.kernel.org # v5.4+
> Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
> ---
>  drivers/scsi/qla2xxx/qla_gs.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> Hi Martin,
> 
> Please apply the patch to scsi-fixes/5.7 at your earliest convenience.
> 
> qla2xxx in target and, likely, dual mode is unusable in some SAN fabrics
> due to the bug.
> 
> Thanks,
> Roman
> 
> diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
> index 42c3ad27f1cb..b9955af5cffe 100644
> --- a/drivers/scsi/qla2xxx/qla_gs.c
> +++ b/drivers/scsi/qla2xxx/qla_gs.c
> @@ -3495,8 +3495,10 @@ void qla24xx_async_gnnft_done(scsi_qla_host_t *vha, srb_t *sp)
>  			if ((fcport->flags & FCF_FABRIC_DEVICE) == 0) {
>  				qla2x00_clear_loop_id(fcport);
>  				fcport->flags |= FCF_FABRIC_DEVICE;
> -			} else if (fcport->d_id.b24 != rp->id.b24 ||
> -				fcport->scan_needed) {
> +			} else if ((fcport->d_id.b24 != rp->id.b24 ||
> +				    fcport->scan_needed) &&
> +				   (fcport->port_type != FCT_INITIATOR &&
> +				    fcport->port_type != FCT_NVME_INITIATOR)) {
>  				qlt_schedule_sess_for_deletion(fcport);
>  			}
>  			fcport->d_id.b24 = rp->id.b24;
> -- 
> 2.26.1
> 

P.S. A little bit cleaner alternative would be to avoid session deletion
only if scan needed is set (that allows session deletion of initiator
ports after fabric discovery if N_Port ID change happened). Please let
me know if I should submit v2 like this:

diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
index 42c3ad27f1cb..b9955af5cffe 100644
--- a/drivers/scsi/qla2xxx/qla_gs.c
+++ b/drivers/scsi/qla2xxx/qla_gs.c
@@ -3495,8 +3495,10 @@ void qla24xx_async_gnnft_done(scsi_qla_host_t *vha, srb_t *sp)
 			if ((fcport->flags & FCF_FABRIC_DEVICE) == 0) {
 				qla2x00_clear_loop_id(fcport);
 				fcport->flags |= FCF_FABRIC_DEVICE;
-			} else if (fcport->d_id.b24 != rp->id.b24 ||
-				fcport->scan_needed) {
+			} else if (fcport->d_id.b24 != rp->id.b24 ||
+				   (fcport->scan_needed &&
+				    fcport->port_type != FCT_INITIATOR &&
+				    fcport->port_type != FCT_NVME_INITIATOR)) {
 				qlt_schedule_sess_for_deletion(fcport);
 			}
 			fcport->d_id.b24 = rp->id.b24;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN
  2020-05-18 18:31 ` Roman Bolshakov
@ 2020-05-19  8:46   ` Martin Wilck
  -1 siblings, 0 replies; 18+ messages in thread
From: Martin Wilck @ 2020-05-19  8:46 UTC (permalink / raw)
  To: Roman Bolshakov, linux-scsi
  Cc: GR-QLogic-Storage-Upstream, target-devel, linux, Quinn Tran,
	Arun Easi, Nilesh Javali, Bart Van Assche, Daniel Wagner,
	Himanshu Madhani, stable

On Mon, 2020-05-18 at 21:31 +0300, Roman Bolshakov wrote:
> The driver performs SCR (state change registration) in all modes
> including pure target mode.
> 
> For each RSCN, scan_needed flag is set in qla2x00_handle_rscn() for
> the
> port mentioned in the RSCN and fabric rescan is scheduled. During the
> rescan, GNN_FT handler, qla24xx_async_gnnft_done() deletes session of
> the port that caused the RSCN.
> 
> In target mode, the session deletion has an impact on ATIO handler,
> qlt_24xx_atio_pkt(). Target responds with SAM STATUS BUSY to I/O
> incoming from the deleted session. qlt_handle_cmd_for_atio() and
> qlt_handle_task_mgmt() return -EFAULT if they are not able to find
> session of the command/TMF, and that results in invocation of
> qlt_send_busy():
> 
>   qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 0014
>   qla_target(0): Unable to send command to target, sending BUSY
> status
> 
> Such response causes command timeout on the initiator. Error handler
> thread on the initiator will be spawned to abort the commands:
> 
>   scsi 23:0:0:0: tag#0 abort scheduled
>   scsi 23:0:0:0: tag#0 aborting command
>   qla2xxx [0000:af:00.0]-188c:23: Entered qla24xx_abort_command.
>   qla2xxx [0000:af:00.0]-801c:23: Abort command issued nexus#:0:0
> -- 0 2003.
> 
> Command abort is rejected by target and fails (2003), error handler
> then
> tries to perform DEVICE RESET and TARGET RESET but they're also
> doomed
> to fail because TMFs are ignored for the deleted sessions.
> 
> Then initiator makes BUS RESET that resets the link via
> qla2x00_full_login_lip(). BUS RESET succeeds and brings initiator
> port
> up, SAN switch detects that and sends RSCN to the target port and it
> fails again the same way as described above. It never goes out of the
> loop.
> 
> The change breaks the RSCN loop by keeping initiator sessions
> mentioned
> in RSCN payload in all modes, including dual and pure target mode.
> 
> Fixes: 2037ce49d30a ("scsi: qla2xxx: Fix stale session")
> Cc: Quinn Tran <qutran@marvell.com>
> Cc: Arun Easi <aeasi@marvell.com>
> Cc: Nilesh Javali <njavali@marvell.com>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Daniel Wagner <dwagner@suse.de>
> Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
> Cc: Martin Wilck <mwilck@suse.com>
> Cc: stable@vger.kernel.org # v5.4+
> Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
> ---
>  drivers/scsi/qla2xxx/qla_gs.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> Hi Martin,
> 
> Please apply the patch to scsi-fixes/5.7 at your earliest
> convenience.
> 
> qla2xxx in target and, likely, dual mode is unusable in some SAN
> fabrics
> due to the bug.
> 
> Thanks,
> Roman
> 
> diff --git a/drivers/scsi/qla2xxx/qla_gs.c
> b/drivers/scsi/qla2xxx/qla_gs.c
> index 42c3ad27f1cb..b9955af5cffe 100644
> --- a/drivers/scsi/qla2xxx/qla_gs.c
> +++ b/drivers/scsi/qla2xxx/qla_gs.c
> @@ -3495,8 +3495,10 @@ void qla24xx_async_gnnft_done(scsi_qla_host_t
> *vha, srb_t *sp)
>  			if ((fcport->flags & FCF_FABRIC_DEVICE) = 0) {
>  				qla2x00_clear_loop_id(fcport);
>  				fcport->flags |= FCF_FABRIC_DEVICE;
> -			} else if (fcport->d_id.b24 != rp->id.b24 ||
> -				fcport->scan_needed) {
> +			} else if ((fcport->d_id.b24 != rp->id.b24 ||
> +				    fcport->scan_needed) &&
> +				   (fcport->port_type != FCT_INITIATOR
> &&
> +				    fcport->port_type !> FCT_NVME_INITIATOR)) {
>  				qlt_schedule_sess_for_deletion(fcport);
>  			}
>  			fcport->d_id.b24 = rp->id.b24;

Hi Roman,

what if the session does need to be deleted eventually? E.g. if after
the RSCN the connection to the initiator is actually lost, either
temporarily or for good? Would the session be deleted in some other
code path, or would it just continue to lurk around?

Martin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN
@ 2020-05-19  8:46   ` Martin Wilck
  0 siblings, 0 replies; 18+ messages in thread
From: Martin Wilck @ 2020-05-19  8:46 UTC (permalink / raw)
  To: Roman Bolshakov, linux-scsi
  Cc: GR-QLogic-Storage-Upstream, target-devel, linux, Quinn Tran,
	Arun Easi, Nilesh Javali, Bart Van Assche, Daniel Wagner,
	Himanshu Madhani, stable

On Mon, 2020-05-18 at 21:31 +0300, Roman Bolshakov wrote:
> The driver performs SCR (state change registration) in all modes
> including pure target mode.
> 
> For each RSCN, scan_needed flag is set in qla2x00_handle_rscn() for
> the
> port mentioned in the RSCN and fabric rescan is scheduled. During the
> rescan, GNN_FT handler, qla24xx_async_gnnft_done() deletes session of
> the port that caused the RSCN.
> 
> In target mode, the session deletion has an impact on ATIO handler,
> qlt_24xx_atio_pkt(). Target responds with SAM STATUS BUSY to I/O
> incoming from the deleted session. qlt_handle_cmd_for_atio() and
> qlt_handle_task_mgmt() return -EFAULT if they are not able to find
> session of the command/TMF, and that results in invocation of
> qlt_send_busy():
> 
>   qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 0014
>   qla_target(0): Unable to send command to target, sending BUSY
> status
> 
> Such response causes command timeout on the initiator. Error handler
> thread on the initiator will be spawned to abort the commands:
> 
>   scsi 23:0:0:0: tag#0 abort scheduled
>   scsi 23:0:0:0: tag#0 aborting command
>   qla2xxx [0000:af:00.0]-188c:23: Entered qla24xx_abort_command.
>   qla2xxx [0000:af:00.0]-801c:23: Abort command issued nexus=23:0:0
> -- 0 2003.
> 
> Command abort is rejected by target and fails (2003), error handler
> then
> tries to perform DEVICE RESET and TARGET RESET but they're also
> doomed
> to fail because TMFs are ignored for the deleted sessions.
> 
> Then initiator makes BUS RESET that resets the link via
> qla2x00_full_login_lip(). BUS RESET succeeds and brings initiator
> port
> up, SAN switch detects that and sends RSCN to the target port and it
> fails again the same way as described above. It never goes out of the
> loop.
> 
> The change breaks the RSCN loop by keeping initiator sessions
> mentioned
> in RSCN payload in all modes, including dual and pure target mode.
> 
> Fixes: 2037ce49d30a ("scsi: qla2xxx: Fix stale session")
> Cc: Quinn Tran <qutran@marvell.com>
> Cc: Arun Easi <aeasi@marvell.com>
> Cc: Nilesh Javali <njavali@marvell.com>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Daniel Wagner <dwagner@suse.de>
> Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
> Cc: Martin Wilck <mwilck@suse.com>
> Cc: stable@vger.kernel.org # v5.4+
> Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
> ---
>  drivers/scsi/qla2xxx/qla_gs.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> Hi Martin,
> 
> Please apply the patch to scsi-fixes/5.7 at your earliest
> convenience.
> 
> qla2xxx in target and, likely, dual mode is unusable in some SAN
> fabrics
> due to the bug.
> 
> Thanks,
> Roman
> 
> diff --git a/drivers/scsi/qla2xxx/qla_gs.c
> b/drivers/scsi/qla2xxx/qla_gs.c
> index 42c3ad27f1cb..b9955af5cffe 100644
> --- a/drivers/scsi/qla2xxx/qla_gs.c
> +++ b/drivers/scsi/qla2xxx/qla_gs.c
> @@ -3495,8 +3495,10 @@ void qla24xx_async_gnnft_done(scsi_qla_host_t
> *vha, srb_t *sp)
>  			if ((fcport->flags & FCF_FABRIC_DEVICE) == 0) {
>  				qla2x00_clear_loop_id(fcport);
>  				fcport->flags |= FCF_FABRIC_DEVICE;
> -			} else if (fcport->d_id.b24 != rp->id.b24 ||
> -				fcport->scan_needed) {
> +			} else if ((fcport->d_id.b24 != rp->id.b24 ||
> +				    fcport->scan_needed) &&
> +				   (fcport->port_type != FCT_INITIATOR
> &&
> +				    fcport->port_type !=
> FCT_NVME_INITIATOR)) {
>  				qlt_schedule_sess_for_deletion(fcport);
>  			}
>  			fcport->d_id.b24 = rp->id.b24;

Hi Roman,

what if the session does need to be deleted eventually? E.g. if after
the RSCN the connection to the initiator is actually lost, either
temporarily or for good? Would the session be deleted in some other
code path, or would it just continue to lurk around?

Martin



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN
  2020-05-19  8:46   ` Martin Wilck
@ 2020-05-21 15:17     ` Roman Bolshakov
  -1 siblings, 0 replies; 18+ messages in thread
From: Roman Bolshakov @ 2020-05-21 15:17 UTC (permalink / raw)
  To: Martin Wilck
  Cc: linux-scsi, GR-QLogic-Storage-Upstream, target-devel, linux,
	Quinn Tran, Arun Easi, Nilesh Javali, Bart Van Assche,
	Daniel Wagner, Himanshu Madhani, stable

On Tue, May 19, 2020 at 10:46:54AM +0200, Martin Wilck wrote:
> On Mon, 2020-05-18 at 21:31 +0300, Roman Bolshakov wrote:
> > The driver performs SCR (state change registration) in all modes
> > including pure target mode.
> > 
> > For each RSCN, scan_needed flag is set in qla2x00_handle_rscn() for
> > the
> > port mentioned in the RSCN and fabric rescan is scheduled. During the
> > rescan, GNN_FT handler, qla24xx_async_gnnft_done() deletes session of
> > the port that caused the RSCN.
> > 
> > In target mode, the session deletion has an impact on ATIO handler,
> > qlt_24xx_atio_pkt(). Target responds with SAM STATUS BUSY to I/O
> > incoming from the deleted session. qlt_handle_cmd_for_atio() and
> > qlt_handle_task_mgmt() return -EFAULT if they are not able to find
> > session of the command/TMF, and that results in invocation of
> > qlt_send_busy():
> > 
> >   qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 0014
> >   qla_target(0): Unable to send command to target, sending BUSY
> > status
> > 
> > Such response causes command timeout on the initiator. Error handler
> > thread on the initiator will be spawned to abort the commands:
> > 
> >   scsi 23:0:0:0: tag#0 abort scheduled
> >   scsi 23:0:0:0: tag#0 aborting command
> >   qla2xxx [0000:af:00.0]-188c:23: Entered qla24xx_abort_command.
> >   qla2xxx [0000:af:00.0]-801c:23: Abort command issued nexus#:0:0
> > -- 0 2003.
> > 
> > Command abort is rejected by target and fails (2003), error handler
> > then
> > tries to perform DEVICE RESET and TARGET RESET but they're also
> > doomed
> > to fail because TMFs are ignored for the deleted sessions.
> > 
> > Then initiator makes BUS RESET that resets the link via
> > qla2x00_full_login_lip(). BUS RESET succeeds and brings initiator
> > port
> > up, SAN switch detects that and sends RSCN to the target port and it
> > fails again the same way as described above. It never goes out of the
> > loop.
> > 
> > The change breaks the RSCN loop by keeping initiator sessions
> > mentioned
> > in RSCN payload in all modes, including dual and pure target mode.
> > 
> > Fixes: 2037ce49d30a ("scsi: qla2xxx: Fix stale session")
> > Cc: Quinn Tran <qutran@marvell.com>
> > Cc: Arun Easi <aeasi@marvell.com>
> > Cc: Nilesh Javali <njavali@marvell.com>
> > Cc: Bart Van Assche <bvanassche@acm.org>
> > Cc: Daniel Wagner <dwagner@suse.de>
> > Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
> > Cc: Martin Wilck <mwilck@suse.com>
> > Cc: stable@vger.kernel.org # v5.4+
> > Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
> > ---
> >  drivers/scsi/qla2xxx/qla_gs.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > Hi Martin,
> > 
> > Please apply the patch to scsi-fixes/5.7 at your earliest
> > convenience.
> > 
> > qla2xxx in target and, likely, dual mode is unusable in some SAN
> > fabrics
> > due to the bug.
> > 
> > Thanks,
> > Roman
> > 
> > diff --git a/drivers/scsi/qla2xxx/qla_gs.c
> > b/drivers/scsi/qla2xxx/qla_gs.c
> > index 42c3ad27f1cb..b9955af5cffe 100644
> > --- a/drivers/scsi/qla2xxx/qla_gs.c
> > +++ b/drivers/scsi/qla2xxx/qla_gs.c
> > @@ -3495,8 +3495,10 @@ void qla24xx_async_gnnft_done(scsi_qla_host_t
> > *vha, srb_t *sp)
> >  			if ((fcport->flags & FCF_FABRIC_DEVICE) = 0) {
> >  				qla2x00_clear_loop_id(fcport);
> >  				fcport->flags |= FCF_FABRIC_DEVICE;
> > -			} else if (fcport->d_id.b24 != rp->id.b24 ||
> > -				fcport->scan_needed) {
> > +			} else if ((fcport->d_id.b24 != rp->id.b24 ||
> > +				    fcport->scan_needed) &&
> > +				   (fcport->port_type != FCT_INITIATOR
> > &&
> > +				    fcport->port_type !> > FCT_NVME_INITIATOR)) {
> >  				qlt_schedule_sess_for_deletion(fcport);
> >  			}
> >  			fcport->d_id.b24 = rp->id.b24;
> 
> Hi Roman,
> 
> what if the session does need to be deleted eventually?

Hi Martin,

I'm sorry for the delay, it took a while to complete the answer :)

In the other reply to the patch I sent a proposal for v2. We might go
with it to handle N_Port_ID changes. N_Port_ID may change in the
switched fabric topology if initiator cable is replugged to another
physical port on the SAN switch (some fabrics assign physical port
number to domain area). Physical reconnection means initiator is going
to relogin anyway and previous session is no longer needed.

Third variant would be to check non-NULL TCM session on the fcport instead of
checking against fc_port type but that will be less reliable if qla2xxx_nvmet
is added to the mainline, i.e.:

     } else if (fcport->d_id.b24 != rp->id.b24 ||
                (fcport->scan_needed &&
                 !fcport->se_sess)) {
             qlt_schedule_sess_for_deletion(fcport);
     }

Fourth variant would be to disable SCR in pure target mode as Firmware
Interface Specification suggests. Although, the issue would still exit
in dual mode. Proposal #2 and #3 fix the issue in target and dual modes.

> E.g. if after the RSCN the connection to the initiator is actually
> lost, either temporarily or for good? Would the session be deleted in
> some other code path, or would it just continue to lurk around?
> 

RSCN per-se shouldn't be a reason to delete initiator IMO, since it's
just a notification that state of the entity in the fabric has changed.
Nx_Port should figure out what changed and then decide if it's worth to
keep session of the port.

There are multiple cases in v5.7 when session deletion happens. I tried
to cover them below in cscope order and inspected the code around.
Pardon me for mistakes if I overlooked something:

1.
2.
3.  GPN_ID request may be issued to find if something has changed with a
    port after an RSCN.

    I don't see if it's called anywhere though. There's an request of
    GPN_ID inside GPN_ID response handling but it's never called
    outside. So it looks a kind of dead code:

    qla24xx_post_gpnid_work() sends QLA_EVT_GPNID ->
    qla2x00_do_work() dispatches the event to qla24xx_async_gpnid() ->
    qla24xx_async_gpnid() sends IOCB with GPN_ID to fimrware and sets up
    call back to qla2x00_async_gpnid_sp_done() ->
    qla2x00_async_gpnid_sp_done() calls qla24xx_post_gpnid_work() if the
    IOCB timed out or RSCN arrived in the middle of the request.


    If GPN_ID request fails, all session on the host are deleted in
    qla24xx_handle_gpnid_event(). It's kind of weird that it deletes
    everything. May be it was expected to look like this to delete only
    the port that was requested in the GPN_ID request (ea->id.b24):

      if (ea->rc) {
              /* cable is disconnected */
              list_for_each_entry_safe(fcport, t, &vha->vp_fcports, list) {
                      if (fcport->d_id.b24 = ea->id.b24) {
                              fcport->scan_state = QLA_FCPORT_SCAN;

         		      qlt_schedule_sess_for_deletion(fcport);
         	     }
              }
      } else {

    Instead of (5.7/scsi-fixes):

      if (ea->rc) {
              /* cable is disconnected */
              list_for_each_entry_safe(fcport, t, &vha->vp_fcports, list) {
                      if (fcport->d_id.b24 = ea->id.b24)
                              fcport->scan_state = QLA_FCPORT_SCAN;

                      qlt_schedule_sess_for_deletion(fcport);
              }
      } else {


    If GPN_ID request suceeds, and there's a session for the port with
    the same N_Port_Name, it deletes all sessions on the host in
    qla24xx_handle_gpnid_event(). It looks like too heavy hammer IMO,
    maybe it was expected to behave like this to replace only session
    session with the same N_Port_Name (including deleted sessions):

      list_for_each_entry_safe(conflict, t, &vha->vp_fcports,
          list) {
              if ((conflict->d_id.b24 = ea->id.b24) &&
                  (fcport != conflict)) {
                      /*
                       * 2 fcports with conflict Nport ID or
                       * an existing fcport is having nport ID
                       * conflict with new fcport.
                       */

                      conflict->scan_state = QLA_FCPORT_SCAN;

         	      qlt_schedule_sess_for_deletion(conflict);
              }
      }

    Instead of (5.7/scsi-fixes):

      list_for_each_entry_safe(conflict, t, &vha->vp_fcports,
          list) {
              if ((conflict->d_id.b24 = ea->id.b24) &&
                  (fcport != conflict))
                      /*
                       * 2 fcports with conflict Nport ID or
                       * an existing fcport is having nport ID
                       * conflict with new fcport.
                       */

                      conflict->scan_state = QLA_FCPORT_SCAN;

              qlt_schedule_sess_for_deletion(conflict);
      }

    If GPN_ID request succeeds, no sessions were were found for the
    N_Port_Name but there's a session with conflicting N_Port_ID, it gets
    deleted in qla24xx_handle_gpnid_event().

4.
5.  Session is deleted during fabric rescan in qla24xx_async_gnnft_done()
    if RSCN had N_Port_ID of the session (**The patch aims to address
    the issue**) or N_Port_ID was changed in GPN_FT reply.

    Sessions missing in GPN_FT response are deleted after fabric
    discovery in qla24xx_async_gnnft_done() if driver is running in pure
    initiator or dual mode. So, if zone definition changed or some ports
    logged out, sessions of the ports that somehow disappeared from the
    zone are going to be deleted.

6.
7.  Session is deleted if ADISC to an N_Port fails
    qla24xx_handle_adisc_event(). It's also going to be deleted if
    GPN_ID was done for the port or RSCN for the port arrived during ADISC.

    I'm not sure but it may cause session loss for the target
    mode and then I/O timeout on the attached initiator.

8.
9.  RSCN or fabric rescan during GNL will trigger session deletion in
    qla24xx_handle_gnl_done_event().

10.
11. GPDB handling during login will delete session in
    qla24xx_handle_gpdb_event() if RSCN has happened or GPN_ID was made
    in the middle of the GPDB for the port. If ports are detected by
    GPDB as logged out or logout is pending, their session is also
    deleted.

12. Session is going to be deleted in P2P mode if a spare N_Port handle
    can't be found in qla_chk_n2n_b4_login().

13. Failed PLOGI in all topologies but P2P is going to trigger session
    deletion in qla24xx_fcport_handle_login().

14. Session is deleted after an unsuccessful PRLI in
    qla24xx_handle_prli_done_event(). I wonder if dual FCP/FC-NVMe
    sessions are impacted by that.

15.
16. RSCN during PLOGI or duplicated N_Port handle will trigger session
    deletion in qla24xx_handle_plogi_done_event()

17. Missing target, switch, name server and NVMe sessions are deleted in
    after rescan of arbitrated loop by qla2x00_configure_local_loop() in
    dual and pure initiator mode.

    This check is similar to 5. Code duplication can be removed.

18. An RSCN during port registration is going to trigger deletion of the
    related session in qla_register_fcport_fn().

19. Missing target, switch, name server and NVMe sessions are deleted
    after sync fabric scan by qla2x00_find_all_fabric_devs() in dual and
    pure initiator mode.

    The code is similar to 5 and 17, perhaps the duplication can be
    removed.

20. Session is deleted in qla2x00_els_dcmd2_sp_done() if duplicate
    N_Port handle was detected by firmware during Login IOCB.

21. Session is deleted in initiator mode if firmware detects port logout
    in qla2x00_async_event()

22. Session is deleted in during initiator I/O in qla2x00_status_entry()
    if firmware returns a kind of logged out state.

23. Conflicting session with the same N_Port_ID is deleted in cases when
    login is performed by firmware in qla24xx_report_id_acquisition().

24. In case of link loss, all session are deleted in
    qla2x00_mark_all_devices_lost().

25. Another session with the same N_Port_ID is deleted during the
    session init in qla24xx_create_new_sess().

26. All target sessions are deleted in qlt_clear_tgt_db() as part of
    target shutdown in qlt_stop_phase1() and in case of LIP or if Port
    configuration of the target port was changed.

27. Session deletion happens in the dead code, qlt_fc_port_deleted().
    Perhaps the function can be safely removed.

28. Session is deleted for a port that issued LOGO, TPRLO and PRLO in
    qlt_xmit_tm_rsp().

29. Session is deleted in qlt_do_ctio_completion() during the I/O in
    target mode if firmware detects initiator logout and responds to the
    driver with a related error.

30.
31.
32. A PLOGI handler is going to delete stale sessions in
    qlt_find_sess_invalidate_other() if S_ID or N_Port handle of
    the PLOGI matches N_Port_ID or N_Port handle of an existing session.

33. PLOGI, PRLI will result in session deletion if the session is
    already established.

    FWIW, NVMe PRLI may destroy FCP session in case of simultaneous
    FCP/NVMe target, current code doesn't handle multiple process logins
    well. It shouldn't be a big deal since qla2xxx_nvmet hasn't been
    merged to mainline yet.

34. TPRLO, PRLO, LOGO result in session deletion in qlt_24xx_handle_els()
    After inspecting the code I think, only LOGO should fully delete the
    session.

    In the case of mixed FC-NVMe/FCP initiator/target, PRLO/TPRLO
    to/from SCSI target may break sessions of FC-NVMe, likewise FC-NVMe
    PRLO/TPRLO may break FCP sessions.

    PRLO and TPRLO (which is like PRLO from all ports) should only
    downgrade session of the requested FC-4 type to a kind of "process
    (i.e. FCP/FC-NVMe) logged out" state rather than delete it. TPRLO
    should log out all sessions of the FC-4 type on the target port,
    while PRLO should do it only for the originator of the ELS request.


> E.g. if after the RSCN the connection to the initiator is actually
> lost, either temporarily or for good? Would the session be deleted in
> some other code path, or would it just continue to lurk around?
> 

So, as a summary, it seems that the existing code mostly keeps sessions
until:
 - a conflict of N_Port_ID, N_Port_Name or N_Port handle detected;
 - the session is in the middle of refresh/rescan and RSCN arrives;
 - there's an explicit process or port logout from the session, i.e.
   PRLO, TPRLO, LOGO
 - there's a new port or process login from the same session, i.e.
   PLOGI, PRLI;
 - target is shut down;
 - target port link is reset;

And if a session of an initiator is deleted when driver works in target
mode, new session won't be established until a PLOGI and PRLI come from
the initiator.

The assumption is used qlt_handle_cmd_for_atio() and
qlt_handle_task_mgmt() to discard commands and TMFs from initiators that
are not logged in.

Thanks,
Roman

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN
@ 2020-05-21 15:17     ` Roman Bolshakov
  0 siblings, 0 replies; 18+ messages in thread
From: Roman Bolshakov @ 2020-05-21 15:17 UTC (permalink / raw)
  To: Martin Wilck
  Cc: linux-scsi, GR-QLogic-Storage-Upstream, target-devel, linux,
	Quinn Tran, Arun Easi, Nilesh Javali, Bart Van Assche,
	Daniel Wagner, Himanshu Madhani, stable

On Tue, May 19, 2020 at 10:46:54AM +0200, Martin Wilck wrote:
> On Mon, 2020-05-18 at 21:31 +0300, Roman Bolshakov wrote:
> > The driver performs SCR (state change registration) in all modes
> > including pure target mode.
> > 
> > For each RSCN, scan_needed flag is set in qla2x00_handle_rscn() for
> > the
> > port mentioned in the RSCN and fabric rescan is scheduled. During the
> > rescan, GNN_FT handler, qla24xx_async_gnnft_done() deletes session of
> > the port that caused the RSCN.
> > 
> > In target mode, the session deletion has an impact on ATIO handler,
> > qlt_24xx_atio_pkt(). Target responds with SAM STATUS BUSY to I/O
> > incoming from the deleted session. qlt_handle_cmd_for_atio() and
> > qlt_handle_task_mgmt() return -EFAULT if they are not able to find
> > session of the command/TMF, and that results in invocation of
> > qlt_send_busy():
> > 
> >   qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 0014
> >   qla_target(0): Unable to send command to target, sending BUSY
> > status
> > 
> > Such response causes command timeout on the initiator. Error handler
> > thread on the initiator will be spawned to abort the commands:
> > 
> >   scsi 23:0:0:0: tag#0 abort scheduled
> >   scsi 23:0:0:0: tag#0 aborting command
> >   qla2xxx [0000:af:00.0]-188c:23: Entered qla24xx_abort_command.
> >   qla2xxx [0000:af:00.0]-801c:23: Abort command issued nexus=23:0:0
> > -- 0 2003.
> > 
> > Command abort is rejected by target and fails (2003), error handler
> > then
> > tries to perform DEVICE RESET and TARGET RESET but they're also
> > doomed
> > to fail because TMFs are ignored for the deleted sessions.
> > 
> > Then initiator makes BUS RESET that resets the link via
> > qla2x00_full_login_lip(). BUS RESET succeeds and brings initiator
> > port
> > up, SAN switch detects that and sends RSCN to the target port and it
> > fails again the same way as described above. It never goes out of the
> > loop.
> > 
> > The change breaks the RSCN loop by keeping initiator sessions
> > mentioned
> > in RSCN payload in all modes, including dual and pure target mode.
> > 
> > Fixes: 2037ce49d30a ("scsi: qla2xxx: Fix stale session")
> > Cc: Quinn Tran <qutran@marvell.com>
> > Cc: Arun Easi <aeasi@marvell.com>
> > Cc: Nilesh Javali <njavali@marvell.com>
> > Cc: Bart Van Assche <bvanassche@acm.org>
> > Cc: Daniel Wagner <dwagner@suse.de>
> > Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
> > Cc: Martin Wilck <mwilck@suse.com>
> > Cc: stable@vger.kernel.org # v5.4+
> > Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
> > ---
> >  drivers/scsi/qla2xxx/qla_gs.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > Hi Martin,
> > 
> > Please apply the patch to scsi-fixes/5.7 at your earliest
> > convenience.
> > 
> > qla2xxx in target and, likely, dual mode is unusable in some SAN
> > fabrics
> > due to the bug.
> > 
> > Thanks,
> > Roman
> > 
> > diff --git a/drivers/scsi/qla2xxx/qla_gs.c
> > b/drivers/scsi/qla2xxx/qla_gs.c
> > index 42c3ad27f1cb..b9955af5cffe 100644
> > --- a/drivers/scsi/qla2xxx/qla_gs.c
> > +++ b/drivers/scsi/qla2xxx/qla_gs.c
> > @@ -3495,8 +3495,10 @@ void qla24xx_async_gnnft_done(scsi_qla_host_t
> > *vha, srb_t *sp)
> >  			if ((fcport->flags & FCF_FABRIC_DEVICE) == 0) {
> >  				qla2x00_clear_loop_id(fcport);
> >  				fcport->flags |= FCF_FABRIC_DEVICE;
> > -			} else if (fcport->d_id.b24 != rp->id.b24 ||
> > -				fcport->scan_needed) {
> > +			} else if ((fcport->d_id.b24 != rp->id.b24 ||
> > +				    fcport->scan_needed) &&
> > +				   (fcport->port_type != FCT_INITIATOR
> > &&
> > +				    fcport->port_type !=
> > FCT_NVME_INITIATOR)) {
> >  				qlt_schedule_sess_for_deletion(fcport);
> >  			}
> >  			fcport->d_id.b24 = rp->id.b24;
> 
> Hi Roman,
> 
> what if the session does need to be deleted eventually?

Hi Martin,

I'm sorry for the delay, it took a while to complete the answer :)

In the other reply to the patch I sent a proposal for v2. We might go
with it to handle N_Port_ID changes. N_Port_ID may change in the
switched fabric topology if initiator cable is replugged to another
physical port on the SAN switch (some fabrics assign physical port
number to domain area). Physical reconnection means initiator is going
to relogin anyway and previous session is no longer needed.

Third variant would be to check non-NULL TCM session on the fcport instead of
checking against fc_port type but that will be less reliable if qla2xxx_nvmet
is added to the mainline, i.e.:

     } else if (fcport->d_id.b24 != rp->id.b24 ||
                (fcport->scan_needed &&
                 !fcport->se_sess)) {
             qlt_schedule_sess_for_deletion(fcport);
     }

Fourth variant would be to disable SCR in pure target mode as Firmware
Interface Specification suggests. Although, the issue would still exit
in dual mode. Proposal #2 and #3 fix the issue in target and dual modes.

> E.g. if after the RSCN the connection to the initiator is actually
> lost, either temporarily or for good? Would the session be deleted in
> some other code path, or would it just continue to lurk around?
> 

RSCN per-se shouldn't be a reason to delete initiator IMO, since it's
just a notification that state of the entity in the fabric has changed.
Nx_Port should figure out what changed and then decide if it's worth to
keep session of the port.

There are multiple cases in v5.7 when session deletion happens. I tried
to cover them below in cscope order and inspected the code around.
Pardon me for mistakes if I overlooked something:

1.
2.
3.  GPN_ID request may be issued to find if something has changed with a
    port after an RSCN.

    I don't see if it's called anywhere though. There's an request of
    GPN_ID inside GPN_ID response handling but it's never called
    outside. So it looks a kind of dead code:

    qla24xx_post_gpnid_work() sends QLA_EVT_GPNID ->
    qla2x00_do_work() dispatches the event to qla24xx_async_gpnid() ->
    qla24xx_async_gpnid() sends IOCB with GPN_ID to fimrware and sets up
    call back to qla2x00_async_gpnid_sp_done() ->
    qla2x00_async_gpnid_sp_done() calls qla24xx_post_gpnid_work() if the
    IOCB timed out or RSCN arrived in the middle of the request.


    If GPN_ID request fails, all session on the host are deleted in
    qla24xx_handle_gpnid_event(). It's kind of weird that it deletes
    everything. May be it was expected to look like this to delete only
    the port that was requested in the GPN_ID request (ea->id.b24):

      if (ea->rc) {
              /* cable is disconnected */
              list_for_each_entry_safe(fcport, t, &vha->vp_fcports, list) {
                      if (fcport->d_id.b24 == ea->id.b24) {
                              fcport->scan_state = QLA_FCPORT_SCAN;

         		      qlt_schedule_sess_for_deletion(fcport);
         	     }
              }
      } else {

    Instead of (5.7/scsi-fixes):

      if (ea->rc) {
              /* cable is disconnected */
              list_for_each_entry_safe(fcport, t, &vha->vp_fcports, list) {
                      if (fcport->d_id.b24 == ea->id.b24)
                              fcport->scan_state = QLA_FCPORT_SCAN;

                      qlt_schedule_sess_for_deletion(fcport);
              }
      } else {


    If GPN_ID request suceeds, and there's a session for the port with
    the same N_Port_Name, it deletes all sessions on the host in
    qla24xx_handle_gpnid_event(). It looks like too heavy hammer IMO,
    maybe it was expected to behave like this to replace only session
    session with the same N_Port_Name (including deleted sessions):

      list_for_each_entry_safe(conflict, t, &vha->vp_fcports,
          list) {
              if ((conflict->d_id.b24 == ea->id.b24) &&
                  (fcport != conflict)) {
                      /*
                       * 2 fcports with conflict Nport ID or
                       * an existing fcport is having nport ID
                       * conflict with new fcport.
                       */

                      conflict->scan_state = QLA_FCPORT_SCAN;

         	      qlt_schedule_sess_for_deletion(conflict);
              }
      }

    Instead of (5.7/scsi-fixes):

      list_for_each_entry_safe(conflict, t, &vha->vp_fcports,
          list) {
              if ((conflict->d_id.b24 == ea->id.b24) &&
                  (fcport != conflict))
                      /*
                       * 2 fcports with conflict Nport ID or
                       * an existing fcport is having nport ID
                       * conflict with new fcport.
                       */

                      conflict->scan_state = QLA_FCPORT_SCAN;

              qlt_schedule_sess_for_deletion(conflict);
      }

    If GPN_ID request succeeds, no sessions were were found for the
    N_Port_Name but there's a session with conflicting N_Port_ID, it gets
    deleted in qla24xx_handle_gpnid_event().

4.
5.  Session is deleted during fabric rescan in qla24xx_async_gnnft_done()
    if RSCN had N_Port_ID of the session (**The patch aims to address
    the issue**) or N_Port_ID was changed in GPN_FT reply.

    Sessions missing in GPN_FT response are deleted after fabric
    discovery in qla24xx_async_gnnft_done() if driver is running in pure
    initiator or dual mode. So, if zone definition changed or some ports
    logged out, sessions of the ports that somehow disappeared from the
    zone are going to be deleted.

6.
7.  Session is deleted if ADISC to an N_Port fails
    qla24xx_handle_adisc_event(). It's also going to be deleted if
    GPN_ID was done for the port or RSCN for the port arrived during ADISC.

    I'm not sure but it may cause session loss for the target
    mode and then I/O timeout on the attached initiator.

8.
9.  RSCN or fabric rescan during GNL will trigger session deletion in
    qla24xx_handle_gnl_done_event().

10.
11. GPDB handling during login will delete session in
    qla24xx_handle_gpdb_event() if RSCN has happened or GPN_ID was made
    in the middle of the GPDB for the port. If ports are detected by
    GPDB as logged out or logout is pending, their session is also
    deleted.

12. Session is going to be deleted in P2P mode if a spare N_Port handle
    can't be found in qla_chk_n2n_b4_login().

13. Failed PLOGI in all topologies but P2P is going to trigger session
    deletion in qla24xx_fcport_handle_login().

14. Session is deleted after an unsuccessful PRLI in
    qla24xx_handle_prli_done_event(). I wonder if dual FCP/FC-NVMe
    sessions are impacted by that.

15.
16. RSCN during PLOGI or duplicated N_Port handle will trigger session
    deletion in qla24xx_handle_plogi_done_event()

17. Missing target, switch, name server and NVMe sessions are deleted in
    after rescan of arbitrated loop by qla2x00_configure_local_loop() in
    dual and pure initiator mode.

    This check is similar to 5. Code duplication can be removed.

18. An RSCN during port registration is going to trigger deletion of the
    related session in qla_register_fcport_fn().

19. Missing target, switch, name server and NVMe sessions are deleted
    after sync fabric scan by qla2x00_find_all_fabric_devs() in dual and
    pure initiator mode.

    The code is similar to 5 and 17, perhaps the duplication can be
    removed.

20. Session is deleted in qla2x00_els_dcmd2_sp_done() if duplicate
    N_Port handle was detected by firmware during Login IOCB.

21. Session is deleted in initiator mode if firmware detects port logout
    in qla2x00_async_event()

22. Session is deleted in during initiator I/O in qla2x00_status_entry()
    if firmware returns a kind of logged out state.

23. Conflicting session with the same N_Port_ID is deleted in cases when
    login is performed by firmware in qla24xx_report_id_acquisition().

24. In case of link loss, all session are deleted in
    qla2x00_mark_all_devices_lost().

25. Another session with the same N_Port_ID is deleted during the
    session init in qla24xx_create_new_sess().

26. All target sessions are deleted in qlt_clear_tgt_db() as part of
    target shutdown in qlt_stop_phase1() and in case of LIP or if Port
    configuration of the target port was changed.

27. Session deletion happens in the dead code, qlt_fc_port_deleted().
    Perhaps the function can be safely removed.

28. Session is deleted for a port that issued LOGO, TPRLO and PRLO in
    qlt_xmit_tm_rsp().

29. Session is deleted in qlt_do_ctio_completion() during the I/O in
    target mode if firmware detects initiator logout and responds to the
    driver with a related error.

30.
31.
32. A PLOGI handler is going to delete stale sessions in
    qlt_find_sess_invalidate_other() if S_ID or N_Port handle of
    the PLOGI matches N_Port_ID or N_Port handle of an existing session.

33. PLOGI, PRLI will result in session deletion if the session is
    already established.

    FWIW, NVMe PRLI may destroy FCP session in case of simultaneous
    FCP/NVMe target, current code doesn't handle multiple process logins
    well. It shouldn't be a big deal since qla2xxx_nvmet hasn't been
    merged to mainline yet.

34. TPRLO, PRLO, LOGO result in session deletion in qlt_24xx_handle_els()
    After inspecting the code I think, only LOGO should fully delete the
    session.

    In the case of mixed FC-NVMe/FCP initiator/target, PRLO/TPRLO
    to/from SCSI target may break sessions of FC-NVMe, likewise FC-NVMe
    PRLO/TPRLO may break FCP sessions.

    PRLO and TPRLO (which is like PRLO from all ports) should only
    downgrade session of the requested FC-4 type to a kind of "process
    (i.e. FCP/FC-NVMe) logged out" state rather than delete it. TPRLO
    should log out all sessions of the FC-4 type on the target port,
    while PRLO should do it only for the originator of the ELS request.


> E.g. if after the RSCN the connection to the initiator is actually
> lost, either temporarily or for good? Would the session be deleted in
> some other code path, or would it just continue to lurk around?
> 

So, as a summary, it seems that the existing code mostly keeps sessions
until:
 - a conflict of N_Port_ID, N_Port_Name or N_Port handle detected;
 - the session is in the middle of refresh/rescan and RSCN arrives;
 - there's an explicit process or port logout from the session, i.e.
   PRLO, TPRLO, LOGO
 - there's a new port or process login from the same session, i.e.
   PLOGI, PRLI;
 - target is shut down;
 - target port link is reset;

And if a session of an initiator is deleted when driver works in target
mode, new session won't be established until a PLOGI and PRLI come from
the initiator.

The assumption is used qlt_handle_cmd_for_atio() and
qlt_handle_task_mgmt() to discard commands and TMFs from initiators that
are not logged in.

Thanks,
Roman

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN
  2020-05-18 23:22   ` Roman Bolshakov
@ 2020-05-26 19:17     ` Martin Wilck
  -1 siblings, 0 replies; 18+ messages in thread
From: Martin Wilck @ 2020-05-26 19:17 UTC (permalink / raw)
  To: Roman Bolshakov, linux-scsi
  Cc: GR-QLogic-Storage-Upstream, target-devel, linux, Quinn Tran,
	Arun Easi, Nilesh Javali, Bart Van Assche, Daniel Wagner,
	Himanshu Madhani, stable

On Tue, 2020-05-19 at 02:22 +0300, Roman Bolshakov wrote:
> On Mon, May 18, 2020 at 09:31:42PM +0300, Roman Bolshakov wrote:
> > 
> 
> P.S. A little bit cleaner alternative would be to avoid session
> deletion
> only if scan needed is set (that allows session deletion of initiator
> ports after fabric discovery if N_Port ID change happened). Please
> let
> me know if I should submit v2 like this:
> 
> diff --git a/drivers/scsi/qla2xxx/qla_gs.c
> b/drivers/scsi/qla2xxx/qla_gs.c
> index 42c3ad27f1cb..b9955af5cffe 100644
> --- a/drivers/scsi/qla2xxx/qla_gs.c
> +++ b/drivers/scsi/qla2xxx/qla_gs.c
> @@ -3495,8 +3495,10 @@ void qla24xx_async_gnnft_done(scsi_qla_host_t
> *vha, srb_t *sp)
>  			if ((fcport->flags & FCF_FABRIC_DEVICE) = 0) {
>  				qla2x00_clear_loop_id(fcport);
>  				fcport->flags |= FCF_FABRIC_DEVICE;
> -			} else if (fcport->d_id.b24 != rp->id.b24 ||
> -				fcport->scan_needed) {
> +			} else if (fcport->d_id.b24 != rp->id.b24 ||
> +				   (fcport->scan_needed &&
> +				    fcport->port_type != FCT_INITIATOR
> &&
> +				    fcport->port_type !> FCT_NVME_INITIATOR)) {
>  				qlt_schedule_sess_for_deletion(fcport);
>  			}
>  			fcport->d_id.b24 = rp->id.b24;


Yes, this looks like an improvement to me. To express it in my own
words (as the logic is subtle): wrt the original code, this only
changes the behavior if the NPort ID is unchanged but "scan_needed" is
set - in this case the session used to be deleted, but with this patch
it isn't any more.

Thanks,
Martin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN
@ 2020-05-26 19:17     ` Martin Wilck
  0 siblings, 0 replies; 18+ messages in thread
From: Martin Wilck @ 2020-05-26 19:17 UTC (permalink / raw)
  To: Roman Bolshakov, linux-scsi
  Cc: GR-QLogic-Storage-Upstream, target-devel, linux, Quinn Tran,
	Arun Easi, Nilesh Javali, Bart Van Assche, Daniel Wagner,
	Himanshu Madhani, stable

On Tue, 2020-05-19 at 02:22 +0300, Roman Bolshakov wrote:
> On Mon, May 18, 2020 at 09:31:42PM +0300, Roman Bolshakov wrote:
> > 
> 
> P.S. A little bit cleaner alternative would be to avoid session
> deletion
> only if scan needed is set (that allows session deletion of initiator
> ports after fabric discovery if N_Port ID change happened). Please
> let
> me know if I should submit v2 like this:
> 
> diff --git a/drivers/scsi/qla2xxx/qla_gs.c
> b/drivers/scsi/qla2xxx/qla_gs.c
> index 42c3ad27f1cb..b9955af5cffe 100644
> --- a/drivers/scsi/qla2xxx/qla_gs.c
> +++ b/drivers/scsi/qla2xxx/qla_gs.c
> @@ -3495,8 +3495,10 @@ void qla24xx_async_gnnft_done(scsi_qla_host_t
> *vha, srb_t *sp)
>  			if ((fcport->flags & FCF_FABRIC_DEVICE) == 0) {
>  				qla2x00_clear_loop_id(fcport);
>  				fcport->flags |= FCF_FABRIC_DEVICE;
> -			} else if (fcport->d_id.b24 != rp->id.b24 ||
> -				fcport->scan_needed) {
> +			} else if (fcport->d_id.b24 != rp->id.b24 ||
> +				   (fcport->scan_needed &&
> +				    fcport->port_type != FCT_INITIATOR
> &&
> +				    fcport->port_type !=
> FCT_NVME_INITIATOR)) {
>  				qlt_schedule_sess_for_deletion(fcport);
>  			}
>  			fcport->d_id.b24 = rp->id.b24;


Yes, this looks like an improvement to me. To express it in my own
words (as the logic is subtle): wrt the original code, this only
changes the behavior if the NPort ID is unchanged but "scan_needed" is
set - in this case the session used to be deleted, but with this patch
it isn't any more.

Thanks,
Martin



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN
  2020-05-21 15:17     ` Roman Bolshakov
@ 2020-05-26 19:21       ` Martin Wilck
  -1 siblings, 0 replies; 18+ messages in thread
From: Martin Wilck @ 2020-05-26 19:21 UTC (permalink / raw)
  To: Roman Bolshakov
  Cc: linux-scsi, GR-QLogic-Storage-Upstream, target-devel, linux,
	Quinn Tran, Arun Easi, Nilesh Javali, Bart Van Assche,
	Daniel Wagner, Himanshu Madhani, stable

Roman,

> [...]
>
> So, as a summary, it seems that the existing code mostly keeps
> sessions
> until:
>  - a conflict of N_Port_ID, N_Port_Name or N_Port handle detected;
>  - the session is in the middle of refresh/rescan and RSCN arrives;
>  - there's an explicit process or port logout from the session, i.e.
>    PRLO, TPRLO, LOGO
>  - there's a new port or process login from the same session, i.e.
>    PLOGI, PRLI;
>  - target is shut down;
>  - target port link is reset;
> 
> And if a session of an initiator is deleted when driver works in
> target
> mode, new session won't be established until a PLOGI and PRLI come
> from
> the initiator.
> 
> The assumption is used qlt_handle_cmd_for_atio() and
> qlt_handle_task_mgmt() to discard commands and TMFs from initiators
> that
> are not logged in.
> 

thank you for this incredibly extensive response. I'll bookmark it,
I guess it can serve as a reference for future qla2xxx work. I hope you
didn't do all this work just for my little question.

Forgive me that I won't immediately try to review this document in
detail. You've studied and understood this driver in much more depth
than I probably ever will.

Thanks again,
Martin



> Thanks,
> Roman

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN
@ 2020-05-26 19:21       ` Martin Wilck
  0 siblings, 0 replies; 18+ messages in thread
From: Martin Wilck @ 2020-05-26 19:21 UTC (permalink / raw)
  To: Roman Bolshakov
  Cc: linux-scsi, GR-QLogic-Storage-Upstream, target-devel, linux,
	Quinn Tran, Arun Easi, Nilesh Javali, Bart Van Assche,
	Daniel Wagner, Himanshu Madhani, stable

Roman,

> [...]
>
> So, as a summary, it seems that the existing code mostly keeps
> sessions
> until:
>  - a conflict of N_Port_ID, N_Port_Name or N_Port handle detected;
>  - the session is in the middle of refresh/rescan and RSCN arrives;
>  - there's an explicit process or port logout from the session, i.e.
>    PRLO, TPRLO, LOGO
>  - there's a new port or process login from the same session, i.e.
>    PLOGI, PRLI;
>  - target is shut down;
>  - target port link is reset;
> 
> And if a session of an initiator is deleted when driver works in
> target
> mode, new session won't be established until a PLOGI and PRLI come
> from
> the initiator.
> 
> The assumption is used qlt_handle_cmd_for_atio() and
> qlt_handle_task_mgmt() to discard commands and TMFs from initiators
> that
> are not logged in.
> 

thank you for this incredibly extensive response. I'll bookmark it,
I guess it can serve as a reference for future qla2xxx work. I hope you
didn't do all this work just for my little question.

Forgive me that I won't immediately try to review this document in
detail. You've studied and understood this driver in much more depth
than I probably ever will.

Thanks again,
Martin



> Thanks,
> Roman



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN
  2020-05-18 18:31 ` Roman Bolshakov
@ 2020-06-03  1:42   ` Martin K. Petersen
  -1 siblings, 0 replies; 18+ messages in thread
From: Martin K. Petersen @ 2020-06-03  1:42 UTC (permalink / raw)
  To: Roman Bolshakov
  Cc: linux-scsi, GR-QLogic-Storage-Upstream, target-devel, linux,
	Quinn Tran, Arun Easi, Nilesh Javali, Bart Van Assche,
	Daniel Wagner, Himanshu Madhani, Martin Wilck, stable


Roman,

> The driver performs SCR (state change registration) in all modes
> including pure target mode.

What is the current status of this patch? There was a bunch of going
back and forth wrt. whether this was the correct approach.

Thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN
@ 2020-06-03  1:42   ` Martin K. Petersen
  0 siblings, 0 replies; 18+ messages in thread
From: Martin K. Petersen @ 2020-06-03  1:42 UTC (permalink / raw)
  To: Roman Bolshakov
  Cc: linux-scsi, GR-QLogic-Storage-Upstream, target-devel, linux,
	Quinn Tran, Arun Easi, Nilesh Javali, Bart Van Assche,
	Daniel Wagner, Himanshu Madhani, Martin Wilck, stable


Roman,

> The driver performs SCR (state change registration) in all modes
> including pure target mode.

What is the current status of this patch? There was a bunch of going
back and forth wrt. whether this was the correct approach.

Thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN
  2020-06-03  1:42   ` Martin K. Petersen
@ 2020-06-05 14:23     ` Roman Bolshakov
  -1 siblings, 0 replies; 18+ messages in thread
From: Roman Bolshakov @ 2020-06-05 14:23 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-scsi, GR-QLogic-Storage-Upstream, target-devel, linux,
	Quinn Tran, Arun Easi, Nilesh Javali, Bart Van Assche,
	Daniel Wagner, Himanshu Madhani, Martin Wilck, stable

On Tue, Jun 02, 2020 at 09:42:06PM -0400, Martin K. Petersen wrote:
> 
> Roman,
> 
> > The driver performs SCR (state change registration) in all modes
> > including pure target mode.
> 
> What is the current status of this patch? There was a bunch of going
> back and forth wrt. whether this was the correct approach.
> 
> Thanks!
> 

Hi Martin,

The patch has a logic error that prevents session deletion if N_Port ID
was changed after fabric rescan and shouldn't be merged IMO. I'll send a
correct v2 shortly.

Thanks,
Roman

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN
@ 2020-06-05 14:23     ` Roman Bolshakov
  0 siblings, 0 replies; 18+ messages in thread
From: Roman Bolshakov @ 2020-06-05 14:23 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-scsi, GR-QLogic-Storage-Upstream, target-devel, linux,
	Quinn Tran, Arun Easi, Nilesh Javali, Bart Van Assche,
	Daniel Wagner, Himanshu Madhani, Martin Wilck, stable

On Tue, Jun 02, 2020 at 09:42:06PM -0400, Martin K. Petersen wrote:
> 
> Roman,
> 
> > The driver performs SCR (state change registration) in all modes
> > including pure target mode.
> 
> What is the current status of this patch? There was a bunch of going
> back and forth wrt. whether this was the correct approach.
> 
> Thanks!
> 

Hi Martin,

The patch has a logic error that prevents session deletion if N_Port ID
was changed after fabric rescan and shouldn't be merged IMO. I'll send a
correct v2 shortly.

Thanks,
Roman

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-06-05 14:23 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-18 18:31 [PATCH] scsi: qla2xxx: Keep initiator ports after RSCN Roman Bolshakov
2020-05-18 18:31 ` Roman Bolshakov
2020-05-18 22:40 ` Himanshu Madhani
2020-05-18 22:40   ` Himanshu Madhani
2020-05-18 23:22 ` Roman Bolshakov
2020-05-18 23:22   ` Roman Bolshakov
2020-05-26 19:17   ` Martin Wilck
2020-05-26 19:17     ` Martin Wilck
2020-05-19  8:46 ` Martin Wilck
2020-05-19  8:46   ` Martin Wilck
2020-05-21 15:17   ` Roman Bolshakov
2020-05-21 15:17     ` Roman Bolshakov
2020-05-26 19:21     ` Martin Wilck
2020-05-26 19:21       ` Martin Wilck
2020-06-03  1:42 ` Martin K. Petersen
2020-06-03  1:42   ` Martin K. Petersen
2020-06-05 14:23   ` Roman Bolshakov
2020-06-05 14:23     ` Roman Bolshakov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.