All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] qla2xxx patches for kernel v5.6
@ 2019-12-09 18:02 Bart Van Assche
  2019-12-09 18:02 ` [PATCH 1/4] qla2xxx: Check locking assumptions at runtime in qla2x00_abort_srb() Bart Van Assche
                   ` (4 more replies)
  0 siblings, 5 replies; 37+ messages in thread
From: Bart Van Assche @ 2019-12-09 18:02 UTC (permalink / raw)
  To: Martin K . Petersen, James E . J . Bottomley; +Cc: linux-scsi, Bart Van Assche

Hi Martin,

Please consider these four small qla2xxx patches for kernel v5.6.

Thanks,

Bart.

Bart Van Assche (4):
  qla2xxx: Check locking assumptions at runtime in qla2x00_abort_srb()
  qla2xxx: Simplify the code for aborting SCSI commands
  qla2xxx: Fix point-to-point mode for qla25xx and older
  qla2xxx: Micro-optimize qla2x00_configure_local_loop()

 drivers/scsi/qla2xxx/qla_def.h  |  5 +----
 drivers/scsi/qla2xxx/qla_init.c |  9 ++++-----
 drivers/scsi/qla2xxx/qla_iocb.c | 14 ++++++++++----
 drivers/scsi/qla2xxx/qla_isr.c  |  5 -----
 drivers/scsi/qla2xxx/qla_os.c   | 17 ++++-------------
 5 files changed, 19 insertions(+), 31 deletions(-)


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 1/4] qla2xxx: Check locking assumptions at runtime in qla2x00_abort_srb()
  2019-12-09 18:02 [PATCH 0/4] qla2xxx patches for kernel v5.6 Bart Van Assche
@ 2019-12-09 18:02 ` Bart Van Assche
  2019-12-10 11:06   ` Martin Wilck
                     ` (2 more replies)
  2019-12-09 18:02 ` [PATCH 2/4] qla2xxx: Simplify the code for aborting SCSI commands Bart Van Assche
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 37+ messages in thread
From: Bart Van Assche @ 2019-12-09 18:02 UTC (permalink / raw)
  To: Martin K . Petersen, James E . J . Bottomley
  Cc: linux-scsi, Bart Van Assche, Quinn Tran, Martin Wilck,
	Daniel Wagner, Roman Bolshakov

Document the locking assumptions this function relies on and also verify these
locking assumptions at runtime.

Cc: Quinn Tran <qutran@marvell.com>
Cc: Martin Wilck <mwilck@suse.com>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/qla2xxx/qla_os.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 8b84bc4a6ac8..145ea93206f0 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -1700,6 +1700,8 @@ static void qla2x00_abort_srb(struct qla_qpair *qp, srb_t *sp, const int res,
 	bool ret_cmd;
 	uint32_t ratov_j;
 
+	lockdep_assert_held(qp->qp_lock_ptr);
+
 	if (qla2x00_chip_is_down(vha)) {
 		sp->done(sp, res);
 		return;

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 2/4] qla2xxx: Simplify the code for aborting SCSI commands
  2019-12-09 18:02 [PATCH 0/4] qla2xxx patches for kernel v5.6 Bart Van Assche
  2019-12-09 18:02 ` [PATCH 1/4] qla2xxx: Check locking assumptions at runtime in qla2x00_abort_srb() Bart Van Assche
@ 2019-12-09 18:02 ` Bart Van Assche
  2019-12-10 11:47   ` Martin Wilck
                     ` (2 more replies)
  2019-12-09 18:02 ` [PATCH 3/4] qla2xxx: Fix point-to-point mode for qla25xx and older Bart Van Assche
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 37+ messages in thread
From: Bart Van Assche @ 2019-12-09 18:02 UTC (permalink / raw)
  To: Martin K . Petersen, James E . J . Bottomley
  Cc: linux-scsi, Bart Van Assche, Quinn Tran, Martin Wilck,
	Daniel Wagner, Roman Bolshakov

Since the SCSI core does not reuse the tag of the SCSI command that is
being aborted by .eh_abort() before .eh_abort() has finished it is not
necessary to check from inside that callback whether or not the SCSI command
has already completed. Instead, rely on the firmware to return an error code
when attempting to abort a command that has already completed. Additionally,
rely on the firmware to return an error code when attempting to abort an
already aborted command.

In qla2x00_abort_srb(), use blk_mq_request_started() instead of
sp->completed and sp->aborted.

This patch eliminates several race conditions triggered by the removed member
variables.

Cc: Quinn Tran <qutran@marvell.com>
Cc: Martin Wilck <mwilck@suse.com>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/qla2xxx/qla_def.h |  3 ---
 drivers/scsi/qla2xxx/qla_isr.c |  5 -----
 drivers/scsi/qla2xxx/qla_os.c  | 15 ++-------------
 3 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
index 460f443f6471..ab7424318ee8 100644
--- a/drivers/scsi/qla2xxx/qla_def.h
+++ b/drivers/scsi/qla2xxx/qla_def.h
@@ -597,9 +597,6 @@ typedef struct srb {
 	struct fc_port *fcport;
 	struct scsi_qla_host *vha;
 	unsigned int start_timer:1;
-	unsigned int abort:1;
-	unsigned int aborted:1;
-	unsigned int completed:1;
 
 	uint32_t handle;
 	uint16_t flags;
diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index 2601d7673c37..721a8d83e350 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -2487,11 +2487,6 @@ qla2x00_status_entry(scsi_qla_host_t *vha, struct rsp_que *rsp, void *pkt)
 		return;
 	}
 
-	if (sp->abort)
-		sp->aborted = 1;
-	else
-		sp->completed = 1;
-
 	if (sp->cmd_type != TYPE_SRB) {
 		req->outstanding_cmds[handle] = NULL;
 		ql_dbg(ql_dbg_io, vha, 0x3015,
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 145ea93206f0..2231d99d311b 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -1253,17 +1253,6 @@ qla2xxx_eh_abort(struct scsi_cmnd *cmd)
 		return SUCCESS;
 
 	spin_lock_irqsave(qpair->qp_lock_ptr, flags);
-	if (sp->completed) {
-		spin_unlock_irqrestore(qpair->qp_lock_ptr, flags);
-		return SUCCESS;
-	}
-
-	if (sp->abort || sp->aborted) {
-		spin_unlock_irqrestore(qpair->qp_lock_ptr, flags);
-		return FAILED;
-	}
-
-	sp->abort = 1;
 	sp->comp = &comp;
 	spin_unlock_irqrestore(qpair->qp_lock_ptr, flags);
 
@@ -1696,6 +1685,7 @@ static void qla2x00_abort_srb(struct qla_qpair *qp, srb_t *sp, const int res,
 	DECLARE_COMPLETION_ONSTACK(comp);
 	scsi_qla_host_t *vha = qp->vha;
 	struct qla_hw_data *ha = vha->hw;
+	struct scsi_cmnd *cmd = GET_CMD_SP(sp);
 	int rval;
 	bool ret_cmd;
 	uint32_t ratov_j;
@@ -1717,7 +1707,6 @@ static void qla2x00_abort_srb(struct qla_qpair *qp, srb_t *sp, const int res,
 		}
 
 		sp->comp = &comp;
-		sp->abort =  1;
 		spin_unlock_irqrestore(qp->qp_lock_ptr, *flags);
 
 		rval = ha->isp_ops->abort_command(sp);
@@ -1741,7 +1730,7 @@ static void qla2x00_abort_srb(struct qla_qpair *qp, srb_t *sp, const int res,
 		}
 
 		spin_lock_irqsave(qp->qp_lock_ptr, *flags);
-		if (ret_cmd && (!sp->completed || !sp->aborted))
+		if (ret_cmd && blk_mq_request_started(cmd->request))
 			sp->done(sp, res);
 	} else {
 		sp->done(sp, res);

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 3/4] qla2xxx: Fix point-to-point mode for qla25xx and older
  2019-12-09 18:02 [PATCH 0/4] qla2xxx patches for kernel v5.6 Bart Van Assche
  2019-12-09 18:02 ` [PATCH 1/4] qla2xxx: Check locking assumptions at runtime in qla2x00_abort_srb() Bart Van Assche
  2019-12-09 18:02 ` [PATCH 2/4] qla2xxx: Simplify the code for aborting SCSI commands Bart Van Assche
@ 2019-12-09 18:02 ` Bart Van Assche
  2019-12-10 10:52   ` Martin Wilck
  2019-12-10 19:40   ` Himanshu Madhani
  2019-12-09 18:02 ` [PATCH 4/4] qla2xxx: Micro-optimize qla2x00_configure_local_loop() Bart Van Assche
  2020-02-06 13:23 ` [PATCH 0/4] qla2xxx patches for kernel v5.6 Martin Wilck
  4 siblings, 2 replies; 37+ messages in thread
From: Bart Van Assche @ 2019-12-09 18:02 UTC (permalink / raw)
  To: Martin K . Petersen, James E . J . Bottomley
  Cc: linux-scsi, Bart Van Assche, Quinn Tran, Martin Wilck,
	Daniel Wagner, Roman Bolshakov

Restore point-to-point for qla25xx and older. Although this patch initializes
a field (s_id) that has been marked as "reserved" in the firmware manual, it
works fine on my setup.

Cc: Quinn Tran <qutran@marvell.com>
Cc: Martin Wilck <mwilck@suse.com>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: Roman Bolshakov <r.bolshakov@yadro.com>
Fixes: 0aabb6b699f7 ("scsi: qla2xxx: Fix Nport ID display value")
Fixes: edd05de19759 ("scsi: qla2xxx: Changes to support N2N logins")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/qla2xxx/qla_iocb.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
index b25f87ff8cde..e2e91b3f2e65 100644
--- a/drivers/scsi/qla2xxx/qla_iocb.c
+++ b/drivers/scsi/qla2xxx/qla_iocb.c
@@ -2656,10 +2656,16 @@ qla24xx_els_logo_iocb(srb_t *sp, struct els_entry_24xx *els_iocb)
 	els_iocb->port_id[0] = sp->fcport->d_id.b.al_pa;
 	els_iocb->port_id[1] = sp->fcport->d_id.b.area;
 	els_iocb->port_id[2] = sp->fcport->d_id.b.domain;
-	/* For SID the byte order is different than DID */
-	els_iocb->s_id[1] = vha->d_id.b.al_pa;
-	els_iocb->s_id[2] = vha->d_id.b.area;
-	els_iocb->s_id[0] = vha->d_id.b.domain;
+	if (IS_QLA23XX(vha->hw) || IS_QLA24XX(vha->hw) || IS_QLA25XX(vha->hw)) {
+		els_iocb->s_id[0] = vha->d_id.b.al_pa;
+		els_iocb->s_id[1] = vha->d_id.b.area;
+		els_iocb->s_id[2] = vha->d_id.b.domain;
+	} else {
+		/* For SID the byte order is different than DID */
+		els_iocb->s_id[1] = vha->d_id.b.al_pa;
+		els_iocb->s_id[2] = vha->d_id.b.area;
+		els_iocb->s_id[0] = vha->d_id.b.domain;
+	}
 
 	if (elsio->u.els_logo.els_cmd == ELS_DCMD_PLOGI) {
 		els_iocb->control_flags = 0;

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 4/4] qla2xxx: Micro-optimize qla2x00_configure_local_loop()
  2019-12-09 18:02 [PATCH 0/4] qla2xxx patches for kernel v5.6 Bart Van Assche
                   ` (2 preceding siblings ...)
  2019-12-09 18:02 ` [PATCH 3/4] qla2xxx: Fix point-to-point mode for qla25xx and older Bart Van Assche
@ 2019-12-09 18:02 ` Bart Van Assche
  2019-12-10 10:46   ` Martin Wilck
  2019-12-14 12:33   ` Roman Bolshakov
  2020-02-06 13:23 ` [PATCH 0/4] qla2xxx patches for kernel v5.6 Martin Wilck
  4 siblings, 2 replies; 37+ messages in thread
From: Bart Van Assche @ 2019-12-09 18:02 UTC (permalink / raw)
  To: Martin K . Petersen, James E . J . Bottomley
  Cc: linux-scsi, Bart Van Assche, Quinn Tran, Martin Wilck,
	Daniel Wagner, Roman Bolshakov

Instead of changing endianness in-place and copying the data in two steps,
do this in one step. This patch makes is a preparation step for fixing the
endianness warnings reported by 'sparse' for the qla2xxx driver.

Cc: Quinn Tran <qutran@marvell.com>
Cc: Martin Wilck <mwilck@suse.com>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/qla2xxx/qla_def.h  | 2 +-
 drivers/scsi/qla2xxx/qla_init.c | 9 ++++-----
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
index ab7424318ee8..cf23f10d27fe 100644
--- a/drivers/scsi/qla2xxx/qla_def.h
+++ b/drivers/scsi/qla2xxx/qla_def.h
@@ -414,7 +414,7 @@ struct els_logo_payload {
 struct els_plogi_payload {
 	uint8_t opcode;
 	uint8_t rsvd[3];
-	uint8_t data[112];
+	__be32	data[112 / 4];
 };
 
 struct ct_arg {
diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index 6c28f38f8021..ddd8bf7997a8 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -5047,13 +5047,12 @@ qla2x00_configure_local_loop(scsi_qla_host_t *vha)
 			rval = qla24xx_get_port_login_templ(vha,
 			    ha->init_cb_dma, (void *)ha->init_cb, sz);
 			if (rval == QLA_SUCCESS) {
+				__be32 *q = &ha->plogi_els_payld.data[0];
+
 				bp = (uint32_t *)ha->init_cb;
-				for (i = 0; i < sz/4 ; i++, bp++)
-					*bp = cpu_to_be32(*bp);
+				for (i = 0; i < sz/4 ; i++, bp++, q++)
+					*q = cpu_to_be32(*bp);
 
-				memcpy(&ha->plogi_els_payld.data,
-				    (void *)ha->init_cb,
-				    sizeof(ha->plogi_els_payld.data));
 				set_bit(RELOGIN_NEEDED, &vha->dpc_flags);
 			} else {
 				ql_dbg(ql_dbg_init, vha, 0x00d1,

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/4] qla2xxx: Micro-optimize qla2x00_configure_local_loop()
  2019-12-09 18:02 ` [PATCH 4/4] qla2xxx: Micro-optimize qla2x00_configure_local_loop() Bart Van Assche
@ 2019-12-10 10:46   ` Martin Wilck
  2019-12-14 12:30     ` Roman Bolshakov
  2019-12-15  0:12     ` Bart Van Assche
  2019-12-14 12:33   ` Roman Bolshakov
  1 sibling, 2 replies; 37+ messages in thread
From: Martin Wilck @ 2019-12-10 10:46 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen, James E . J . Bottomley,
	Quinn Tran, Himanshu Madhani
  Cc: linux-scsi, Martin Wilck, Daniel Wagner, Roman Bolshakov

Hello Bart,

On Mon, 2019-12-09 at 10:02 -0800, Bart Van Assche wrote:
> Instead of changing endianness in-place and copying the data in two
> steps,
> do this in one step. This patch makes is a preparation step for
> fixing the
> endianness warnings reported by 'sparse' for the qla2xxx driver.
> 
> Cc: Quinn Tran <qutran@marvell.com>
> Cc: Martin Wilck <mwilck@suse.com>
> Cc: Daniel Wagner <dwagner@suse.de>
> Cc: Roman Bolshakov <r.bolshakov@yadro.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/qla2xxx/qla_def.h  | 2 +-
>  drivers/scsi/qla2xxx/qla_init.c | 9 ++++-----
>  2 files changed, 5 insertions(+), 6 deletions(-)

I suppose you're aware that this patch conflicts with Roman's pending
patch "scsi: qla2xxx: Don't defer relogin unconditonally".

> diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
> index 6c28f38f8021..ddd8bf7997a8 100644
> --- a/drivers/scsi/qla2xxx/qla_init.c
> +++ b/drivers/scsi/qla2xxx/qla_init.c
> @@ -5047,13 +5047,12 @@ qla2x00_configure_local_loop(scsi_qla_host_t *vha)
>  			rval = qla24xx_get_port_login_templ(vha,
>  			    ha->init_cb_dma, (void *)ha->init_cb, sz);
>  			if (rval == QLA_SUCCESS) {
> +				__be32 *q = &ha->plogi_els_payld.data[0];
> +
>  				bp = (uint32_t *)ha->init_cb;
> -				for (i = 0; i < sz/4 ; i++, bp++)
> -					*bp = cpu_to_be32(*bp);
> +				for (i = 0; i < sz/4 ; i++, bp++, q++)
> +					*q = cpu_to_be32(*bp);
>  
> -				memcpy(&ha->plogi_els_payld.data,
> -				    (void *)ha->init_cb,
> -				    sizeof(ha->plogi_els_payld.data));
>  				set_bit(RELOGIN_NEEDED, &vha->dpc_flags);
>  			} else {
>  				ql_dbg(ql_dbg_init, vha, 0x00d1,

A side effect of this patch would be that after return from
qla2x00_configure_local_loop(), the byte ordering in ha->init_cb
remains in CPU byte ordering, whereas before your patch, it would have
been converted to be32. I'm uncertain if that would matter later on.

The following is not a problems with your patch, but what's really
weird is that in qla24xx_get_port_login_templ() (which is only called
from here), the buffer is converted from le32 to CPU endianness, and
then here, in a second step, from CPU to be32. I'm wondering which byte
order this buffer is supposed to have, and whether that's different
depending on which mode the controller is operating in (the be32
conversion seems to be applied in N2N mode only). Moreover, looking at
the definition of init_cb_t in qla_def.h, this data structure actually
has mixed endianness, making me doubt that changing the endianness of
the whole buffer makes sense at all. Or is ha->init_cb simply being
abused in this part of the code?

I guess only Himanshu or Quinn can tell.

Martin



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/4] qla2xxx: Fix point-to-point mode for qla25xx and older
  2019-12-09 18:02 ` [PATCH 3/4] qla2xxx: Fix point-to-point mode for qla25xx and older Bart Van Assche
@ 2019-12-10 10:52   ` Martin Wilck
  2019-12-10 18:00     ` Bart Van Assche
  2019-12-12 20:07     ` Roman Bolshakov
  2019-12-10 19:40   ` Himanshu Madhani
  1 sibling, 2 replies; 37+ messages in thread
From: Martin Wilck @ 2019-12-10 10:52 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen, James E . J . Bottomley
  Cc: linux-scsi, Quinn Tran, Martin Wilck, Daniel Wagner, Roman Bolshakov

Hello Bart,

On Mon, 2019-12-09 at 10:02 -0800, Bart Van Assche wrote:
> Restore point-to-point for qla25xx and older. Although this patch
> initializes
> a field (s_id) that has been marked as "reserved" in the firmware
> manual, it
> works fine on my setup.
> 
> Cc: Quinn Tran <qutran@marvell.com>
> Cc: Martin Wilck <mwilck@suse.com>
> Cc: Daniel Wagner <dwagner@suse.de>
> Cc: Roman Bolshakov <r.bolshakov@yadro.com>
> Fixes: 0aabb6b699f7 ("scsi: qla2xxx: Fix Nport ID display value")
> Fixes: edd05de19759 ("scsi: qla2xxx: Changes to support N2N logins")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/qla2xxx/qla_iocb.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)

Having followed the discussion between you and Roman, I guess this is
ok. However I'd like to understand better in what ways the N2N topology
was broken for you. After all, this patch affects only the LOGO
payload. Was it a logout / relogin issue? Were wrong ports being logged
out?

Regards,
Martin



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] qla2xxx: Check locking assumptions at runtime in qla2x00_abort_srb()
  2019-12-09 18:02 ` [PATCH 1/4] qla2xxx: Check locking assumptions at runtime in qla2x00_abort_srb() Bart Van Assche
@ 2019-12-10 11:06   ` Martin Wilck
  2019-12-10 19:39   ` Himanshu Madhani
  2019-12-11 19:03   ` Roman Bolshakov
  2 siblings, 0 replies; 37+ messages in thread
From: Martin Wilck @ 2019-12-10 11:06 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen, James E . J . Bottomley
  Cc: linux-scsi, Quinn Tran, Martin Wilck, Daniel Wagner, Roman Bolshakov

On Mon, 2019-12-09 at 10:02 -0800, Bart Van Assche wrote:
> Document the locking assumptions this function relies on and also
> verify these
> locking assumptions at runtime.
> 
> Cc: Quinn Tran <qutran@marvell.com>
> Cc: Martin Wilck <mwilck@suse.com>
> Cc: Daniel Wagner <dwagner@suse.de>
> Cc: Roman Bolshakov <r.bolshakov@yadro.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/qla2xxx/qla_os.c | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Martin Wilck <mwilck@suse.com>



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/4] qla2xxx: Simplify the code for aborting SCSI commands
  2019-12-09 18:02 ` [PATCH 2/4] qla2xxx: Simplify the code for aborting SCSI commands Bart Van Assche
@ 2019-12-10 11:47   ` Martin Wilck
  2019-12-10 17:57     ` Bart Van Assche
  2019-12-12 16:01     ` Roman Bolshakov
  2019-12-10 19:41   ` Himanshu Madhani
  2019-12-12 16:04   ` Roman Bolshakov
  2 siblings, 2 replies; 37+ messages in thread
From: Martin Wilck @ 2019-12-10 11:47 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen, James E . J . Bottomley,
	Quinn Tran, Himanshu Madhani
  Cc: linux-scsi, Martin Wilck, Daniel Wagner, Roman Bolshakov

Hi Bart,

On Mon, 2019-12-09 at 10:02 -0800, Bart Van Assche wrote:
> Since the SCSI core does not reuse the tag of the SCSI command that
> is
> being aborted by .eh_abort() before .eh_abort() has finished it is
> not
> necessary to check from inside that callback whether or not the SCSI
> command
> has already completed. Instead, rely on the firmware to return an
> error code
> when attempting to abort a command that has already completed.
> Additionally,
> rely on the firmware to return an error code when attempting to abort
> an
> already aborted command.

Do we have evidence that the firmware can truly be relied upon in this
respect? It's at least not impossible that the FW (or some version of
it) relies on the driver not trying to abort commands that have been
aborted or completed already, and crashes if that assumption is
violated.

> 
> In qla2x00_abort_srb(), use blk_mq_request_started() instead of
> sp->completed and sp->aborted.

See below.

> This patch eliminates several race conditions triggered by the
> removed member
> variables.
> 
> Cc: Quinn Tran <qutran@marvell.com>
> Cc: Martin Wilck <mwilck@suse.com>
> Cc: Daniel Wagner <dwagner@suse.de>
> Cc: Roman Bolshakov <r.bolshakov@yadro.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/qla2xxx/qla_def.h |  3 ---
>  drivers/scsi/qla2xxx/qla_isr.c |  5 -----
>  drivers/scsi/qla2xxx/qla_os.c  | 15 ++-------------
>  3 files changed, 2 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_def.h
> b/drivers/scsi/qla2xxx/qla_def.h
> index 460f443f6471..ab7424318ee8 100644
> --- a/drivers/scsi/qla2xxx/qla_def.h
> +++ b/drivers/scsi/qla2xxx/qla_def.h
> @@ -597,9 +597,6 @@ typedef struct srb {
>  	struct fc_port *fcport;
>  	struct scsi_qla_host *vha;
>  	unsigned int start_timer:1;
> -	unsigned int abort:1;
> -	unsigned int aborted:1;
> -	unsigned int completed:1;
>  
>  	uint32_t handle;
>  	uint16_t flags;
> diff --git a/drivers/scsi/qla2xxx/qla_isr.c
> b/drivers/scsi/qla2xxx/qla_isr.c
> index 2601d7673c37..721a8d83e350 100644
> --- a/drivers/scsi/qla2xxx/qla_isr.c
> +++ b/drivers/scsi/qla2xxx/qla_isr.c
> @@ -2487,11 +2487,6 @@ qla2x00_status_entry(scsi_qla_host_t *vha,
> struct rsp_que *rsp, void *pkt)
>  		return;
>  	}
>  
> -	if (sp->abort)
> -		sp->aborted = 1;
> -	else
> -		sp->completed = 1;
> -
>  	if (sp->cmd_type != TYPE_SRB) {
>  		req->outstanding_cmds[handle] = NULL;
>  		ql_dbg(ql_dbg_io, vha, 0x3015,
> diff --git a/drivers/scsi/qla2xxx/qla_os.c
> b/drivers/scsi/qla2xxx/qla_os.c
> index 145ea93206f0..2231d99d311b 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -1253,17 +1253,6 @@ qla2xxx_eh_abort(struct scsi_cmnd *cmd)
>  		return SUCCESS;
>  
>  	spin_lock_irqsave(qpair->qp_lock_ptr, flags);
> -	if (sp->completed) {
> -		spin_unlock_irqrestore(qpair->qp_lock_ptr, flags);
> -		return SUCCESS;
> -	}
> -
> -	if (sp->abort || sp->aborted) {
> -		spin_unlock_irqrestore(qpair->qp_lock_ptr, flags);
> -		return FAILED;
> -	}
> -
> -	sp->abort = 1;
>  	sp->comp = &comp;
>  	spin_unlock_irqrestore(qpair->qp_lock_ptr, flags);
>  
> @@ -1696,6 +1685,7 @@ static void qla2x00_abort_srb(struct qla_qpair
> *qp, srb_t *sp, const int res,
>  	DECLARE_COMPLETION_ONSTACK(comp);
>  	scsi_qla_host_t *vha = qp->vha;
>  	struct qla_hw_data *ha = vha->hw;
> +	struct scsi_cmnd *cmd = GET_CMD_SP(sp);
>  	int rval;
>  	bool ret_cmd;
>  	uint32_t ratov_j;
> @@ -1717,7 +1707,6 @@ static void qla2x00_abort_srb(struct qla_qpair
> *qp, srb_t *sp, const int res,
>  		}
>  
>  		sp->comp = &comp;
> -		sp->abort =  1;
>  		spin_unlock_irqrestore(qp->qp_lock_ptr, *flags);
>  
>  		rval = ha->isp_ops->abort_command(sp);
> @@ -1741,7 +1730,7 @@ static void qla2x00_abort_srb(struct qla_qpair
> *qp, srb_t *sp, const int res,
>  		}
>  
>  		spin_lock_irqsave(qp->qp_lock_ptr, *flags);
> -		if (ret_cmd && (!sp->completed || !sp->aborted))
> +		if (ret_cmd && blk_mq_request_started(cmd->request))
>  			sp->done(sp, res);
>  	} else {
>  		sp->done(sp, res);

blk_mq_request_started() returns true for requests in MQ_RQ_COMPLETE
state. Is this really an equivalent condition?

That said, the condition in the current code is sort of strange, as
it's equivalent to !(sp->completed && sp->aborted). I'm wondering what
it means if a command is both completed and aborted. Naïvely thinking
(and inferring from the current code) this condition could never be
met, and thus its negation would hold for every command. Perhaps Quinn
meant "!(sp->completed || sp->aborted)" ?

Regards,
Martin



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/4] qla2xxx: Simplify the code for aborting SCSI commands
  2019-12-10 11:47   ` Martin Wilck
@ 2019-12-10 17:57     ` Bart Van Assche
  2019-12-10 20:29       ` Martin Wilck
  2019-12-12 16:01     ` Roman Bolshakov
  1 sibling, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2019-12-10 17:57 UTC (permalink / raw)
  To: Martin Wilck, Martin K . Petersen, James E . J . Bottomley,
	Quinn Tran, Himanshu Madhani
  Cc: linux-scsi, Martin Wilck, Daniel Wagner, Roman Bolshakov

On 12/10/19 6:47 AM, Martin Wilck wrote:
> blk_mq_request_started() returns true for requests in MQ_RQ_COMPLETE
> state. Is this really an equivalent condition?
> 
> That said, the condition in the current code is sort of strange, as
> it's equivalent to !(sp->completed && sp->aborted). I'm wondering what
> it means if a command is both completed and aborted. Naïvely thinking
> (and inferring from the current code) this condition could never be
> met, and thus its negation would hold for every command. Perhaps Quinn
> meant "!(sp->completed || sp->aborted)" ?

Hi Martin,

The only caller of qla2x00_abort_srb() is qla2x00_abort_all_cmds(). That
function should only be called after completion interrupts have been
disabled. In other words, I don't think that we have to worry about
blk_mq_request_started() encountering the MQ_RQ_COMPLETE state. No
request should have that state when qla2x00_abort_all_cmds() is called.

Bart.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/4] qla2xxx: Fix point-to-point mode for qla25xx and older
  2019-12-10 10:52   ` Martin Wilck
@ 2019-12-10 18:00     ` Bart Van Assche
  2019-12-10 19:44       ` Martin Wilck
  2019-12-12 20:07     ` Roman Bolshakov
  1 sibling, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2019-12-10 18:00 UTC (permalink / raw)
  To: Martin Wilck, Martin K . Petersen, James E . J . Bottomley
  Cc: linux-scsi, Quinn Tran, Martin Wilck, Daniel Wagner, Roman Bolshakov

On 12/10/19 5:52 AM, Martin Wilck wrote:
> Hello Bart,
> 
> On Mon, 2019-12-09 at 10:02 -0800, Bart Van Assche wrote:
>> Restore point-to-point for qla25xx and older. Although this patch
>> initializes
>> a field (s_id) that has been marked as "reserved" in the firmware
>> manual, it
>> works fine on my setup.
>>
>> Cc: Quinn Tran <qutran@marvell.com>
>> Cc: Martin Wilck <mwilck@suse.com>
>> Cc: Daniel Wagner <dwagner@suse.de>
>> Cc: Roman Bolshakov <r.bolshakov@yadro.com>
>> Fixes: 0aabb6b699f7 ("scsi: qla2xxx: Fix Nport ID display value")
>> Fixes: edd05de19759 ("scsi: qla2xxx: Changes to support N2N logins")
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> ---
>>  drivers/scsi/qla2xxx/qla_iocb.c | 14 ++++++++++----
>>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> Having followed the discussion between you and Roman, I guess this is
> ok. However I'd like to understand better in what ways the N2N topology
> was broken for you. After all, this patch affects only the LOGO
> payload. Was it a logout / relogin issue? Were wrong ports being logged
> out?

Hi Martin,

Without this patch N2N login fails for 25xx adapters. With this patch
N2N login succeeds for 25xx adapters. You may have noticed that Martin
Petersen asked Himanshu for advice in another e-mail thread about how to
address this regression.

Bart.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] qla2xxx: Check locking assumptions at runtime in qla2x00_abort_srb()
  2019-12-09 18:02 ` [PATCH 1/4] qla2xxx: Check locking assumptions at runtime in qla2x00_abort_srb() Bart Van Assche
  2019-12-10 11:06   ` Martin Wilck
@ 2019-12-10 19:39   ` Himanshu Madhani
  2019-12-11 19:03   ` Roman Bolshakov
  2 siblings, 0 replies; 37+ messages in thread
From: Himanshu Madhani @ 2019-12-10 19:39 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen, James E . J . Bottomley
  Cc: linux-scsi, Quinn Tran, Martin Wilck, Daniel Wagner, Roman Bolshakov



On 12/9/19, 12:02 PM, "linux-scsi-owner@vger.kernel.org on behalf of Bart Van Assche" <linux-scsi-owner@vger.kernel.org on behalf of bvanassche@acm.org> wrote:

    Document the locking assumptions this function relies on and also verify these
    locking assumptions at runtime.
    
    Cc: Quinn Tran <qutran@marvell.com>
    Cc: Martin Wilck <mwilck@suse.com>
    Cc: Daniel Wagner <dwagner@suse.de>
    Cc: Roman Bolshakov <r.bolshakov@yadro.com>
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    ---
     drivers/scsi/qla2xxx/qla_os.c | 2 ++
     1 file changed, 2 insertions(+)
    
    diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
    index 8b84bc4a6ac8..145ea93206f0 100644
    --- a/drivers/scsi/qla2xxx/qla_os.c
    +++ b/drivers/scsi/qla2xxx/qla_os.c
    @@ -1700,6 +1700,8 @@ static void qla2x00_abort_srb(struct qla_qpair *qp, srb_t *sp, const int res,
     	bool ret_cmd;
     	uint32_t ratov_j;
     
    +	lockdep_assert_held(qp->qp_lock_ptr);
    +
     	if (qla2x00_chip_is_down(vha)) {
     		sp->done(sp, res);
     		return;
    
Looks Good. 

Acked-by: Himanshu Madhani <hmadhani@marvell.com>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/4] qla2xxx: Fix point-to-point mode for qla25xx and older
  2019-12-09 18:02 ` [PATCH 3/4] qla2xxx: Fix point-to-point mode for qla25xx and older Bart Van Assche
  2019-12-10 10:52   ` Martin Wilck
@ 2019-12-10 19:40   ` Himanshu Madhani
  1 sibling, 0 replies; 37+ messages in thread
From: Himanshu Madhani @ 2019-12-10 19:40 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen, James E . J . Bottomley
  Cc: linux-scsi, Quinn Tran, Martin Wilck, Daniel Wagner, Roman Bolshakov



On 12/9/19, 12:03 PM, "linux-scsi-owner@vger.kernel.org on behalf of Bart Van Assche" <linux-scsi-owner@vger.kernel.org on behalf of bvanassche@acm.org> wrote:

    Restore point-to-point for qla25xx and older. Although this patch initializes
    a field (s_id) that has been marked as "reserved" in the firmware manual, it
    works fine on my setup.
    
    Cc: Quinn Tran <qutran@marvell.com>
    Cc: Martin Wilck <mwilck@suse.com>
    Cc: Daniel Wagner <dwagner@suse.de>
    Cc: Roman Bolshakov <r.bolshakov@yadro.com>
    Fixes: 0aabb6b699f7 ("scsi: qla2xxx: Fix Nport ID display value")
    Fixes: edd05de19759 ("scsi: qla2xxx: Changes to support N2N logins")
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    ---
     drivers/scsi/qla2xxx/qla_iocb.c | 14 ++++++++++----
     1 file changed, 10 insertions(+), 4 deletions(-)
    
    diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
    index b25f87ff8cde..e2e91b3f2e65 100644
    --- a/drivers/scsi/qla2xxx/qla_iocb.c
    +++ b/drivers/scsi/qla2xxx/qla_iocb.c
    @@ -2656,10 +2656,16 @@ qla24xx_els_logo_iocb(srb_t *sp, struct els_entry_24xx *els_iocb)
     	els_iocb->port_id[0] = sp->fcport->d_id.b.al_pa;
     	els_iocb->port_id[1] = sp->fcport->d_id.b.area;
     	els_iocb->port_id[2] = sp->fcport->d_id.b.domain;
    -	/* For SID the byte order is different than DID */
    -	els_iocb->s_id[1] = vha->d_id.b.al_pa;
    -	els_iocb->s_id[2] = vha->d_id.b.area;
    -	els_iocb->s_id[0] = vha->d_id.b.domain;
    +	if (IS_QLA23XX(vha->hw) || IS_QLA24XX(vha->hw) || IS_QLA25XX(vha->hw)) {
    +		els_iocb->s_id[0] = vha->d_id.b.al_pa;
    +		els_iocb->s_id[1] = vha->d_id.b.area;
    +		els_iocb->s_id[2] = vha->d_id.b.domain;
    +	} else {
    +		/* For SID the byte order is different than DID */
    +		els_iocb->s_id[1] = vha->d_id.b.al_pa;
    +		els_iocb->s_id[2] = vha->d_id.b.area;
    +		els_iocb->s_id[0] = vha->d_id.b.domain;
    +	}
     
     	if (elsio->u.els_logo.els_cmd == ELS_DCMD_PLOGI) {
     		els_iocb->control_flags = 0;
    
Acked-by: Himanshu Madhani <hmadhani@marvell.com>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/4] qla2xxx: Simplify the code for aborting SCSI commands
  2019-12-09 18:02 ` [PATCH 2/4] qla2xxx: Simplify the code for aborting SCSI commands Bart Van Assche
  2019-12-10 11:47   ` Martin Wilck
@ 2019-12-10 19:41   ` Himanshu Madhani
  2019-12-12 16:04   ` Roman Bolshakov
  2 siblings, 0 replies; 37+ messages in thread
From: Himanshu Madhani @ 2019-12-10 19:41 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen, James E . J . Bottomley
  Cc: linux-scsi, Quinn Tran, Martin Wilck, Daniel Wagner, Roman Bolshakov



On 12/9/19, 12:02 PM, "linux-scsi-owner@vger.kernel.org on behalf of Bart Van Assche" <linux-scsi-owner@vger.kernel.org on behalf of bvanassche@acm.org> wrote:

    Since the SCSI core does not reuse the tag of the SCSI command that is
    being aborted by .eh_abort() before .eh_abort() has finished it is not
    necessary to check from inside that callback whether or not the SCSI command
    has already completed. Instead, rely on the firmware to return an error code
    when attempting to abort a command that has already completed. Additionally,
    rely on the firmware to return an error code when attempting to abort an
    already aborted command.
    
    In qla2x00_abort_srb(), use blk_mq_request_started() instead of
    sp->completed and sp->aborted.
    
    This patch eliminates several race conditions triggered by the removed member
    variables.
    
    Cc: Quinn Tran <qutran@marvell.com>
    Cc: Martin Wilck <mwilck@suse.com>
    Cc: Daniel Wagner <dwagner@suse.de>
    Cc: Roman Bolshakov <r.bolshakov@yadro.com>
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    ---
     drivers/scsi/qla2xxx/qla_def.h |  3 ---
     drivers/scsi/qla2xxx/qla_isr.c |  5 -----
     drivers/scsi/qla2xxx/qla_os.c  | 15 ++-------------
     3 files changed, 2 insertions(+), 21 deletions(-)
    
    diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
    index 460f443f6471..ab7424318ee8 100644
    --- a/drivers/scsi/qla2xxx/qla_def.h
    +++ b/drivers/scsi/qla2xxx/qla_def.h
    @@ -597,9 +597,6 @@ typedef struct srb {
     	struct fc_port *fcport;
     	struct scsi_qla_host *vha;
     	unsigned int start_timer:1;
    -	unsigned int abort:1;
    -	unsigned int aborted:1;
    -	unsigned int completed:1;
     
     	uint32_t handle;
     	uint16_t flags;
    diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
    index 2601d7673c37..721a8d83e350 100644
    --- a/drivers/scsi/qla2xxx/qla_isr.c
    +++ b/drivers/scsi/qla2xxx/qla_isr.c
    @@ -2487,11 +2487,6 @@ qla2x00_status_entry(scsi_qla_host_t *vha, struct rsp_que *rsp, void *pkt)
     		return;
     	}
     
    -	if (sp->abort)
    -		sp->aborted = 1;
    -	else
    -		sp->completed = 1;
    -
     	if (sp->cmd_type != TYPE_SRB) {
     		req->outstanding_cmds[handle] = NULL;
     		ql_dbg(ql_dbg_io, vha, 0x3015,
    diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
    index 145ea93206f0..2231d99d311b 100644
    --- a/drivers/scsi/qla2xxx/qla_os.c
    +++ b/drivers/scsi/qla2xxx/qla_os.c
    @@ -1253,17 +1253,6 @@ qla2xxx_eh_abort(struct scsi_cmnd *cmd)
     		return SUCCESS;
     
     	spin_lock_irqsave(qpair->qp_lock_ptr, flags);
    -	if (sp->completed) {
    -		spin_unlock_irqrestore(qpair->qp_lock_ptr, flags);
    -		return SUCCESS;
    -	}
    -
    -	if (sp->abort || sp->aborted) {
    -		spin_unlock_irqrestore(qpair->qp_lock_ptr, flags);
    -		return FAILED;
    -	}
    -
    -	sp->abort = 1;
     	sp->comp = &comp;
     	spin_unlock_irqrestore(qpair->qp_lock_ptr, flags);
     
    @@ -1696,6 +1685,7 @@ static void qla2x00_abort_srb(struct qla_qpair *qp, srb_t *sp, const int res,
     	DECLARE_COMPLETION_ONSTACK(comp);
     	scsi_qla_host_t *vha = qp->vha;
     	struct qla_hw_data *ha = vha->hw;
    +	struct scsi_cmnd *cmd = GET_CMD_SP(sp);
     	int rval;
     	bool ret_cmd;
     	uint32_t ratov_j;
    @@ -1717,7 +1707,6 @@ static void qla2x00_abort_srb(struct qla_qpair *qp, srb_t *sp, const int res,
     		}
     
     		sp->comp = &comp;
    -		sp->abort =  1;
     		spin_unlock_irqrestore(qp->qp_lock_ptr, *flags);
     
     		rval = ha->isp_ops->abort_command(sp);
    @@ -1741,7 +1730,7 @@ static void qla2x00_abort_srb(struct qla_qpair *qp, srb_t *sp, const int res,
     		}
     
     		spin_lock_irqsave(qp->qp_lock_ptr, *flags);
    -		if (ret_cmd && (!sp->completed || !sp->aborted))
    +		if (ret_cmd && blk_mq_request_started(cmd->request))
     			sp->done(sp, res);
     	} else {
     		sp->done(sp, res);
    
Acked-By: Himanshu Madhani <hmadhani@marvell.com>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/4] qla2xxx: Fix point-to-point mode for qla25xx and older
  2019-12-10 18:00     ` Bart Van Assche
@ 2019-12-10 19:44       ` Martin Wilck
  0 siblings, 0 replies; 37+ messages in thread
From: Martin Wilck @ 2019-12-10 19:44 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen, James E . J . Bottomley
  Cc: linux-scsi, Quinn Tran, Martin Wilck, Daniel Wagner, Roman Bolshakov

Hi Bart,

On Tue, 2019-12-10 at 13:00 -0500, Bart Van Assche wrote:
> On 12/10/19 5:52 AM, Martin Wilck wrote:
> > Hello Bart,
> > 
> > On Mon, 2019-12-09 at 10:02 -0800, Bart Van Assche wrote:
> > > Restore point-to-point for qla25xx and older. Although this patch
> > > initializes
> > > a field (s_id) that has been marked as "reserved" in the firmware
> > > manual, it
> > > works fine on my setup.
> > > 
> > > Cc: Quinn Tran <qutran@marvell.com>
> > > Cc: Martin Wilck <mwilck@suse.com>
> > > Cc: Daniel Wagner <dwagner@suse.de>
> > > Cc: Roman Bolshakov <r.bolshakov@yadro.com>
> > > Fixes: 0aabb6b699f7 ("scsi: qla2xxx: Fix Nport ID display value")
> > > Fixes: edd05de19759 ("scsi: qla2xxx: Changes to support N2N
> > > logins")
> > > Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> > > ---
> > >  drivers/scsi/qla2xxx/qla_iocb.c | 14 ++++++++++----
> > >  1 file changed, 10 insertions(+), 4 deletions(-)
> > 
> > Having followed the discussion between you and Roman, I guess this
> > is
> > ok. However I'd like to understand better in what ways the N2N
> > topology
> > was broken for you. After all, this patch affects only the LOGO
> > payload. Was it a logout / relogin issue? Were wrong ports being
> > logged
> > out?
> 
> Hi Martin,
> 
> Without this patch N2N login fails for 25xx adapters. With this patch
> N2N login succeeds for 25xx adapters. You may have noticed that
> Martin
> Petersen asked Himanshu for advice in another e-mail thread about how
> to
> address this regression.

Of course I did. It still kind of baffles me that a change in the LOGO
iocb causes LOGIN to fail, but whatever.

Martin



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/4] qla2xxx: Simplify the code for aborting SCSI commands
  2019-12-10 17:57     ` Bart Van Assche
@ 2019-12-10 20:29       ` Martin Wilck
  2019-12-10 20:45         ` Bart Van Assche
  0 siblings, 1 reply; 37+ messages in thread
From: Martin Wilck @ 2019-12-10 20:29 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen, James E . J . Bottomley,
	Quinn Tran, Himanshu Madhani
  Cc: linux-scsi, Martin Wilck, Daniel Wagner, Roman Bolshakov

On Tue, 2019-12-10 at 12:57 -0500, Bart Van Assche wrote:
> On 12/10/19 6:47 AM, Martin Wilck wrote:
> > blk_mq_request_started() returns true for requests in
> > MQ_RQ_COMPLETE
> > state. Is this really an equivalent condition?
> > 
> > That said, the condition in the current code is sort of strange, as
> > it's equivalent to !(sp->completed && sp->aborted). I'm wondering
> > what
> > it means if a command is both completed and aborted. Naïvely
> > thinking
> > (and inferring from the current code) this condition could never be
> > met, and thus its negation would hold for every command. Perhaps
> > Quinn
> > meant "!(sp->completed || sp->aborted)" ?
> 
> Hi Martin,
> 
> The only caller of qla2x00_abort_srb() is qla2x00_abort_all_cmds().
> That
> function should only be called after completion interrupts have been
> disabled. In other words, I don't think that we have to worry about
> blk_mq_request_started() encountering the MQ_RQ_COMPLETE state. No
> request should have that state when qla2x00_abort_all_cmds() is
> called.

I thought avoiding a race between completion and abort was the whole
point of f45bca8c5052 ("scsi: qla2xxx: Fix double scsi_done for abort
path"), which introduced the code that you're now changing. But I must
be overlooking something then, as Himanshu has acked this.

Martin




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/4] qla2xxx: Simplify the code for aborting SCSI commands
  2019-12-10 20:29       ` Martin Wilck
@ 2019-12-10 20:45         ` Bart Van Assche
  2019-12-11 10:26           ` Martin Wilck
  0 siblings, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2019-12-10 20:45 UTC (permalink / raw)
  To: Martin Wilck, Martin K . Petersen, James E . J . Bottomley,
	Quinn Tran, Himanshu Madhani
  Cc: linux-scsi, Martin Wilck, Daniel Wagner, Roman Bolshakov

On 12/10/19 3:29 PM, Martin Wilck wrote:
> On Tue, 2019-12-10 at 12:57 -0500, Bart Van Assche wrote:
>> On 12/10/19 6:47 AM, Martin Wilck wrote:
>>> blk_mq_request_started() returns true for requests in
>>> MQ_RQ_COMPLETE
>>> state. Is this really an equivalent condition?
>>>
>>> That said, the condition in the current code is sort of strange, as
>>> it's equivalent to !(sp->completed && sp->aborted). I'm wondering
>>> what
>>> it means if a command is both completed and aborted. Naïvely
>>> thinking
>>> (and inferring from the current code) this condition could never be
>>> met, and thus its negation would hold for every command. Perhaps
>>> Quinn
>>> meant "!(sp->completed || sp->aborted)" ?
>>
>> Hi Martin,
>>
>> The only caller of qla2x00_abort_srb() is qla2x00_abort_all_cmds().
>> That
>> function should only be called after completion interrupts have been
>> disabled. In other words, I don't think that we have to worry about
>> blk_mq_request_started() encountering the MQ_RQ_COMPLETE state. No
>> request should have that state when qla2x00_abort_all_cmds() is
>> called.
> 
> I thought avoiding a race between completion and abort was the whole
> point of f45bca8c5052 ("scsi: qla2xxx: Fix double scsi_done for abort
> path"), which introduced the code that you're now changing. But I must
> be overlooking something then, as Himanshu has acked this.

Hi Martin,

My understanding of commit f45bca8c5052 ("scsi: qla2xxx: Fix double
scsi_done for abort path") is as follows:
* A long time ago a scsi_done() call was introduced in
qla2xxx_eh_abort(). Maybe this commit introduced that call: 083a469db4ec
("[SCSI] qla2xxx: Correct use-after-free oops seen during EH-abort.").
* Calling scsi_done() from qla2xxx_eh_abort() is only fine if the
firmware does not report a completion after having processed an abort
request. My conclusion from commit f45bca8c5052 is that the firmware
does report a completion after having processed an abort request. Hence
the removal of that scsi_done call from qla2xxx_eh_abort().

Bart.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/4] qla2xxx: Simplify the code for aborting SCSI commands
  2019-12-10 20:45         ` Bart Van Assche
@ 2019-12-11 10:26           ` Martin Wilck
  0 siblings, 0 replies; 37+ messages in thread
From: Martin Wilck @ 2019-12-11 10:26 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen, James E . J . Bottomley,
	Quinn Tran, Himanshu Madhani
  Cc: linux-scsi, Martin Wilck, Daniel Wagner, Roman Bolshakov

Hi Bart,

On Tue, 2019-12-10 at 15:45 -0500, Bart Van Assche wrote:
> On 12/10/19 3:29 PM, Martin Wilck wrote:
> > On Tue, 2019-12-10 at 12:57 -0500, Bart Van Assche wrote:
> > > On 12/10/19 6:47 AM, Martin Wilck wrote:
> > > > blk_mq_request_started() returns true for requests in
> > > > MQ_RQ_COMPLETE
> > > > state. Is this really an equivalent condition?
> > > > 
> > > > That said, the condition in the current code is sort of
> > > > strange, as
> > > > it's equivalent to !(sp->completed && sp->aborted). I'm
> > > > wondering
> > > > what
> > > > it means if a command is both completed and aborted. Naïvely
> > > > thinking
> > > > (and inferring from the current code) this condition could
> > > > never be
> > > > met, and thus its negation would hold for every command.
> > > > Perhaps
> > > > Quinn
> > > > meant "!(sp->completed || sp->aborted)" ?
> > > 
> > > Hi Martin,
> > > 
> > > The only caller of qla2x00_abort_srb() is
> > > qla2x00_abort_all_cmds().
> > > That
> > > function should only be called after completion interrupts have
> > > been
> > > disabled. In other words, I don't think that we have to worry
> > > about
> > > blk_mq_request_started() encountering the MQ_RQ_COMPLETE state.
> > > No
> > > request should have that state when qla2x00_abort_all_cmds() is
> > > called.
> > 
> > I thought avoiding a race between completion and abort was the
> > whole
> > point of f45bca8c5052 ("scsi: qla2xxx: Fix double scsi_done for
> > abort
> > path"), which introduced the code that you're now changing. But I
> > must
> > be overlooking something then, as Himanshu has acked this.
> 
> Hi Martin,
> 
> My understanding of commit f45bca8c5052 ("scsi: qla2xxx: Fix double
> scsi_done for abort path") is as follows:
> * A long time ago a scsi_done() call was introduced in
> qla2xxx_eh_abort(). Maybe this commit introduced that call:
> 083a469db4ec
> ("[SCSI] qla2xxx: Correct use-after-free oops seen during EH-
> abort.").
> * Calling scsi_done() from qla2xxx_eh_abort() is only fine if the
> firmware does not report a completion after having processed an abort
> request. My conclusion from commit f45bca8c5052 is that the firmware
> does report a completion after having processed an abort request.
> Hence
> the removal of that scsi_done call from qla2xxx_eh_abort().

I think it depends on the scenario in which commands are aborted.
For "regular" aborts, you're certainly right. But in certain scenarios,
e.g. controller removal or module unloading, the current driver shuts
down the firmware early in the shutdown process, and once the firmware
is stopped, I suppose it will not report completions any more.

Whether it was a good idea in the first place to shut down the FW so
early is a different issue, which I tried to adress with the 2 patches
I submitted on Nov. 29th.

Regards,
Martin



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/4] qla2xxx: Check locking assumptions at runtime in qla2x00_abort_srb()
  2019-12-09 18:02 ` [PATCH 1/4] qla2xxx: Check locking assumptions at runtime in qla2x00_abort_srb() Bart Van Assche
  2019-12-10 11:06   ` Martin Wilck
  2019-12-10 19:39   ` Himanshu Madhani
@ 2019-12-11 19:03   ` Roman Bolshakov
  2 siblings, 0 replies; 37+ messages in thread
From: Roman Bolshakov @ 2019-12-11 19:03 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K . Petersen, James E . J . Bottomley, linux-scsi,
	Quinn Tran, Martin Wilck, Daniel Wagner

On Mon, Dec 09, 2019 at 10:02:20AM -0800, Bart Van Assche wrote:
> Document the locking assumptions this function relies on and also verify these
> locking assumptions at runtime.
> 
> Cc: Quinn Tran <qutran@marvell.com>
> Cc: Martin Wilck <mwilck@suse.com>
> Cc: Daniel Wagner <dwagner@suse.de>
> Cc: Roman Bolshakov <r.bolshakov@yadro.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/qla2xxx/qla_os.c | 2 ++
>  1 file changed, 2 insertions(+)
> 

Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/4] qla2xxx: Simplify the code for aborting SCSI commands
  2019-12-10 11:47   ` Martin Wilck
  2019-12-10 17:57     ` Bart Van Assche
@ 2019-12-12 16:01     ` Roman Bolshakov
  1 sibling, 0 replies; 37+ messages in thread
From: Roman Bolshakov @ 2019-12-12 16:01 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Bart Van Assche, Martin K . Petersen, James E . J . Bottomley,
	Quinn Tran, Himanshu Madhani, linux-scsi, Martin Wilck,
	Daniel Wagner

On Tue, Dec 10, 2019 at 12:47:57PM +0100, Martin Wilck wrote:
> Hi Bart,
> 
> On Mon, 2019-12-09 at 10:02 -0800, Bart Van Assche wrote:
> > Since the SCSI core does not reuse the tag of the SCSI command that
> > is
> > being aborted by .eh_abort() before .eh_abort() has finished it is
> > not
> > necessary to check from inside that callback whether or not the SCSI
> > command
> > has already completed. Instead, rely on the firmware to return an
> > error code
> > when attempting to abort a command that has already completed.
> > Additionally,
> > rely on the firmware to return an error code when attempting to abort
> > an
> > already aborted command.
> 
> Do we have evidence that the firmware can truly be relied upon in this
> respect? It's at least not impossible that the FW (or some version of
> it) relies on the driver not trying to abort commands that have been
> aborted or completed already, and crashes if that assumption is
> violated.
> 

Hi Martin,

IMO yes, FW spec defines the behavior mentioned by Bart in the commit
message. So an Abort IOCB is going to be completed by FW (including an
error if the command was already aborted) or timed out if FW is down.

Thanks,
Roman

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/4] qla2xxx: Simplify the code for aborting SCSI commands
  2019-12-09 18:02 ` [PATCH 2/4] qla2xxx: Simplify the code for aborting SCSI commands Bart Van Assche
  2019-12-10 11:47   ` Martin Wilck
  2019-12-10 19:41   ` Himanshu Madhani
@ 2019-12-12 16:04   ` Roman Bolshakov
  2 siblings, 0 replies; 37+ messages in thread
From: Roman Bolshakov @ 2019-12-12 16:04 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K . Petersen, James E . J . Bottomley, linux-scsi,
	Quinn Tran, Martin Wilck, Daniel Wagner

On Mon, Dec 09, 2019 at 10:02:21AM -0800, Bart Van Assche wrote:
> Since the SCSI core does not reuse the tag of the SCSI command that is
> being aborted by .eh_abort() before .eh_abort() has finished it is not
> necessary to check from inside that callback whether or not the SCSI command
> has already completed. Instead, rely on the firmware to return an error code
> when attempting to abort a command that has already completed. Additionally,
> rely on the firmware to return an error code when attempting to abort an
> already aborted command.
> 
> In qla2x00_abort_srb(), use blk_mq_request_started() instead of
> sp->completed and sp->aborted.
> 
> This patch eliminates several race conditions triggered by the removed member
> variables.
> 

Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>

Thanks,
Roman

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/4] qla2xxx: Fix point-to-point mode for qla25xx and older
  2019-12-10 10:52   ` Martin Wilck
  2019-12-10 18:00     ` Bart Van Assche
@ 2019-12-12 20:07     ` Roman Bolshakov
  2019-12-14  1:04       ` Roman Bolshakov
  1 sibling, 1 reply; 37+ messages in thread
From: Roman Bolshakov @ 2019-12-12 20:07 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Bart Van Assche, Martin K . Petersen, James E . J . Bottomley,
	linux-scsi, Quinn Tran, Martin Wilck, Daniel Wagner

On Tue, Dec 10, 2019 at 11:52:11AM +0100, Martin Wilck wrote:
> Hello Bart,
> 
> On Mon, 2019-12-09 at 10:02 -0800, Bart Van Assche wrote:
> > Restore point-to-point for qla25xx and older. Although this patch
> > initializes
> > a field (s_id) that has been marked as "reserved" in the firmware
> > manual, it
> > works fine on my setup.
> > 
> > Cc: Quinn Tran <qutran@marvell.com>
> > Cc: Martin Wilck <mwilck@suse.com>
> > Cc: Daniel Wagner <dwagner@suse.de>
> > Cc: Roman Bolshakov <r.bolshakov@yadro.com>
> > Fixes: 0aabb6b699f7 ("scsi: qla2xxx: Fix Nport ID display value")
> > Fixes: edd05de19759 ("scsi: qla2xxx: Changes to support N2N logins")
> > Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> > ---
> >  drivers/scsi/qla2xxx/qla_iocb.c | 14 ++++++++++----
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> Having followed the discussion between you and Roman, I guess this is
> ok. However I'd like to understand better in what ways the N2N topology
> was broken for you. After all, this patch affects only the LOGO
> payload. Was it a logout / relogin issue? Were wrong ports being logged
> out?
> 
> Regards,
> Martin
> 
> 

Hi Martin,

I think your concern is indeed valid. I didn't have a chance to test the
P2P target on QLE2562 earlier. Here's the log that shows the issue
Bart is experiencing:

[  179.925639] qla2xxx [0000:00:05.0]-1959:0: Entered qla24xx_get_port_login_templ.
[  179.925689] qla2xxx [0000:00:05.0]-1800:0: Entered qla2x00_mailbox_command.
[  179.925731] qla2xxx [0000:00:05.0]-1806:0: Prepare to issue mbox cmd=0x5a.
[  179.925771] qla2xxx [0000:00:05.0]-1911:0: Mailbox registers (OUT):
[  179.925813] qla2xxx [0000:00:05.0]-1912:0: mbox[0]<-0x005a
[  179.925856] qla2xxx [0000:00:05.0]-1912:0: mbox[1]<-0x0700
[  179.925897] qla2xxx [0000:00:05.0]-1912:0: mbox[2]<-0x3000
[  179.925939] qla2xxx [0000:00:05.0]-1912:0: mbox[3]<-0x0000
[  179.925981] qla2xxx [0000:00:05.0]-1912:0: mbox[4]<-0x0003
[  179.926021] qla2xxx [0000:00:05.0]-1912:0: mbox[5]<-0x0000
[  179.926063] qla2xxx [0000:00:05.0]-1912:0: mbox[6]<-0x0000
[  179.926105] qla2xxx [0000:00:05.0]-1912:0: mbox[7]<-0x0000
[  179.926147] qla2xxx [0000:00:05.0]-1912:0: mbox[8]<-0x001d
[  179.926188] qla2xxx [0000:00:05.0]-1917:0: I/O Address = 00000000167c2ddb.
[  179.926230] qla2xxx [0000:00:05.0]-180f:0: Going to unlock irq & waiting for interrupts. jiffies=ffffd115.
[  179.926367] qla2xxx [0000:00:05.0]-1814:0: Cmd=5a completed.
[  179.926413] qla2xxx [0000:00:05.0]-1913:0: Mailbox registers (IN):
[  179.926456] qla2xxx [0000:00:05.0]-1914:0: mbox[0]->0x4000
[  179.926489] qla2xxx [0000:00:05.0]-1914:0: mbox[1]->0x0700
[  179.926522] qla2xxx [0000:00:05.0]-1821:0: Done qla2x00_mailbox_command.
[  179.926566] qla2xxx [0000:00:05.0]-195b:0: Done qla24xx_get_port_login_templ.
[  179.926617] qla2xxx [0000:00:05.0]-28d8:0: qla24xx_fcport_handle_login 21:00:00:24:ff:7f:35:c7 DS 0 LS 4 P 0 fl 0 confl 00000000f1f36df6 rscn 0|0 login 1 lid 0 scan 2
[  179.926709] qla2xxx [0000:00:05.0]-2869:0: LOOP READY.
[  179.926751] qla2xxx [0000:00:05.0]-286b:0: qla2x00_configure_loop: exiting normally.
[  179.926801] qla2xxx [0000:00:05.0]-4810:0: Loop resync end.
[  179.926836] qla2xxx [0000:00:05.0]-4800:0: DPC handler sleeping.
[  179.926879] qla2xxx [0000:00:05.0]-28d9:0: Async-gnlist WWPN 21:00:00:24:ff:7f:35:c7
[  179.926928] qla2xxx [0000:00:05.0]-28da:0: Async-gnlist - OUT WWPN 21:00:00:24:ff:7f:35:c7 hndl 0
[  179.928148] qla2xxx [0000:00:05.0]-28e7:0: Async done-gnlist res 0 mb[1]=50 mb[2]=301a
[  179.928222] qla2xxx [0000:00:05.0]-28e8:0: qla24xx_async_gnl_sp_done 21:00:00:24:ff:7f:35:c7 00:00:01 CLS 4/4 lid 0
[  179.928295] qla2xxx [0000:00:05.0]-28e8:0: qla24xx_async_gnl_sp_done 21:00:00:24:ff:7f:35:c7 ff:ff:fe CLS 4/4 lid 7fe
[  179.928368] qla2xxx [0000:00:05.0]-107ff:0: qla24xx_handle_gnl_done_event 21:00:00:24:ff:7f:35:c7 DS 2 LS rc 4 0 login 1|1 rscn 0|0 lid 0
[  179.928449] qla2xxx [0000:00:05.0]-28e1:0: qla24xx_handle_gnl_done_event 725 21:00:00:24:ff:7f:35:c7 n 2 000001 lid 0
[  179.928520] qla2xxx [0000:00:05.0]-28e2:0: qla24xx_handle_gnl_done_event found 21:00:00:24:ff:7f:35:c7 CLS [4|4] fc4_type 1 ID[000001|000001] lid[0|0]
[  179.928611] qla2xxx [0000:00:05.0]-28d8:0: qla24xx_fcport_handle_login 21:00:00:24:ff:7f:35:c7 DS 2 LS 4 P 0 fl 0 confl 00000000f1f36df6 rscn 0|0 login 1 lid 0 scan 2

If discovery state is DSC_GNL and fcport->current_login_state & 0xf ==
6, target tries to do PRLI (which should be prohibited because
qlini_mode=disabled on the machine and scsi_host active_mode is
"Target"):

[  179.928706] qla2xxx [0000:00:05.0]-2918:0: qla24xx_fcport_handle_login 1595 21:00:00:24:ff:7f:35:c7 post FC PRLI
[  179.928828] qla2xxx [0000:00:05.0]-291b:0: Async-prli - 21:00:00:24:ff:7f:35:c7 hdl=0, loopid=0 portid=000001 retries=30 fc.
[  179.928969] qla2xxx [0000:00:05.0]-5836:0: Async-prli complete - 21:00:00:24:ff:7f:35:c7 hdl=4 portid=000001 iop0=422.
[  179.929053] qla2xxx [0000:00:05.0]-2929:0: qla2x00_async_prli_sp_done 21:00:00:24:ff:7f:35:c7 res 0
[  179.929118] qla2xxx [0000:00:05.0]-2918:0: qla24xx_handle_prli_done_event 1872 21:00:00:24:ff:7f:35:c7 post gpdb
[  179.929224] qla2xxx [0000:00:05.0]-28dc:0: Async-gpdb 21:00:00:24:ff:7f:35:c7 hndl 0 opt 0
[  179.929320] qla2xxx [0000:00:05.0]-28db:0: Async done-gpdb res 0, WWPN 21:00:00:24:ff:7f:35:c7 mb[1]=0 mb[2]=3011
[  179.929394] qla2xxx [0000:00:05.0]-28d2:0: qla24xx_handle_gpdb_event 21:00:00:24:ff:7f:35:c7 DS 5 LS 6 fc4_type 1 rc 0
[  179.929490] qla2xxx [0000:00:05.0]-28ef:0: qla2x00_update_fcport 21:00:00:24:ff:7f:35:c7

qla2x00_update_fcport sets discovery state to DSC_LOGIN_COMPLETE.

[  179.929720] qla2xxx [0000:00:05.0]-f806:0: Adding sess 000000005ec6d065 se_sess 00000000f0daac7c  to tgt 00000000dde86879 sess_count 1
[  179.929803] qla2xxx [0000:00:05.0]-f84b:0: qla_target(0): session for wwn 21:00:00:24:ff:7f:35:c7 (loop_id 0, s_id 0:0:1, confirmed completion not supported) added
[  179.929900] qla2xxx [0000:00:05.0]-287d:0: FCPort 21:00:00:24:ff:7f:35:c7 state transitioned from UNCONFIGURED to ONLINE - portid=000001.
[  180.773031] qla2xxx [0000:00:05.0]-e872:0: qlt_24xx_atio_pkt_all_vps: qla_target(0): type d ox_id 0000
[  180.773123] qla2xxx [0000:00:05.0]-e82e:0: IMMED_NOTIFY ATIO
[  180.773126] qla2xxx [0000:00:05.0]-f826:0: qla_target(0): Port ID: 00:00:01 ELS opcode: 0x03 lid 0 21:00:00:24:ff:7f:35:c7
[  180.773243] qla2xxx [0000:00:05.0]-f897:0: Linking sess 000000005ec6d065 [0] wwn 21:00:00:24:ff:7f:35:c7 with PLOGI ACK to wwn 21:00:00:24:ff:7f:35:c7 s_id 00:00:01, ref=1 pla 00000000e7b769c4 link 0
[  180.773354] qla2xxx [0000:00:05.0]-28f9:0: qlt_handle_login 4802 21:00:00:24:ff:7f:35:c7  DS 7

So, if Discovery State is DSC_LOGIN_COMPLETE, let's nuke the session:

[  180.773413] qla2xxx [0000:00:05.0]-28f9:0: qlt_handle_login 4834 21:00:00:24:ff:7f:35:c7 post del sess
[  180.773471] qla2xxx [0000:00:05.0]-e801:0: Scheduling sess 000000005ec6d065 for deletion 21:00:00:24:ff:7f:35:c7
[  180.773544] qla2xxx [0000:00:05.0]-f826:0: qla_target(0): Exit ELS opcode: 0x03 res 0
[  180.773619] qla2xxx [0000:00:05.0]-290a:0: qlt_unreg_sess sess 000000005ec6d065 for deletion 21:00:00:24:ff:7f:35:c7
[  180.773692] qla2xxx [0000:00:05.0]-f884:0: qlt_free_session_done: se_sess 00000000f0daac7c / sess 000000005ec6d065 from port 21:00:00:24:ff:7f:35:c7 loop_id 0x00 s_id 00:00:01 logout 1 keep 1 els_logo 0
[  180.773800] qla2xxx [0000:00:05.0]-287d:0: FCPort 21:00:00:24:ff:7f:35:c7 state transitioned from ONLINE to LOST - portid=000001.
[  180.773886] qla2xxx [0000:00:05.0]-2870:0: Async-logout - hdl=0 loop-id=0 portid=000001 21:00:00:24:ff:7f:35:c7.

And here's the LOGO you was worried about:

[  180.774077] qla2xxx [0000:00:05.0]-5836:0: Async-logout complete - 21:00:00:24:ff:7f:35:c7 hdl=6 portid=000001 iop0=0.
[  180.774150] qla2xxx [0000:00:05.0]-f893:0: qlt_logo_completion_handler: se_sess 00000000f0daac7c / sess 000000005ec6d065 from port 21:00:00:24:ff:7f:35:c7 loop_id 0x00 s_id 00:00:01 LOGO failed: 0x0
[  180.829063] qla2xxx [0000:00:05.0]-f887:0: qlt_free_session_done: sess 000000005ec6d065 logout completed
[  180.829182] qla2xxx [0000:00:05.0]-f89a:0: se_sess 00000000f1f36df6 / sess 000000005ec6d065 port 21:00:00:24:ff:7f:35:c7 is gone, releasing own PLOGI (ref=1)
[  180.829276] qla2xxx [0000:00:05.0]-5889:0: Sending PLOGI ACK to wwn 21:00:00:24:ff:7f:35:c7 s_id 00:00:01 loop_id 0x00 exch 0x1155f0 ox_id 0x4
[  180.829364] qla2xxx [0000:00:05.0]-f801:0: Unregistration of sess 000000005ec6d065 21:00:00:24:ff:7f:35:c7 finished fcp_cnt 0
[  180.829445] qla2xxx [0000:00:05.0]-28f4:0: Async-nack 21:00:00:24:ff:7f:35:c7 hndl 0 PLOGI
[  180.829566] qla2xxx [0000:00:05.0]-28f2:0: Async done-nack res 0 21:00:00:24:ff:7f:35:c7  type 16

Then initiator sends a PLOGI:

[  181.810543] qla2xxx [0000:00:05.0]-e872:0: qlt_24xx_atio_pkt_all_vps: qla_target(0): type d ox_id 0000
[  181.810641] qla2xxx [0000:00:05.0]-e82e:0: IMMED_NOTIFY ATIO
[  181.810645] qla2xxx [0000:00:05.0]-f826:0: qla_target(0): Port ID: 00:00:01 ELS opcode: 0x03 lid 0 21:00:00:24:ff:7f:35:c7
[  181.810785] qla2xxx [0000:00:05.0]-f897:0: Linking sess 000000005ec6d065 [0] wwn 21:00:00:24:ff:7f:35:c7 with PLOGI ACK to wwn 21:00:00:24:ff:7f:35:c7 s_id 00:00:01, ref=1 pla 00000000eebd5532 link 0
[  181.810918] qla2xxx [0000:00:05.0]-28f9:0: qlt_handle_login 4802 21:00:00:24:ff:7f:35:c7  DS 0
[  181.810992] qla2xxx [0000:00:05.0]-5889:0: Sending PLOGI ACK to wwn 21:00:00:24:ff:7f:35:c7 s_id 00:00:01 loop_id 0x00 exch 0x115620 ox_id 0x5
[  181.811106] qla2xxx [0000:00:05.0]-f826:0: qla_target(0): Exit ELS opcode: 0x03 res 0
[  181.811251] qla2xxx [0000:00:05.0]-28f4:0: Async-nack 21:00:00:24:ff:7f:35:c7 hndl 0 PLOGI
[  181.811361] qla2xxx [0000:00:05.0]-28f2:0: Async done-nack res 0 21:00:00:24:ff:7f:35:c7  type 16

The block of messages just above can be seen a few more times which
means initiator sends multiple PLOGI requests.

I'll try to figure out the reason why qla24xx_fcport_handle_login
doesn't quit immediately.

Thanks,
Roman

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/4] qla2xxx: Fix point-to-point mode for qla25xx and older
  2019-12-12 20:07     ` Roman Bolshakov
@ 2019-12-14  1:04       ` Roman Bolshakov
  2019-12-15  0:09         ` Bart Van Assche
  2019-12-16  8:37         ` Daniel Wagner
  0 siblings, 2 replies; 37+ messages in thread
From: Roman Bolshakov @ 2019-12-14  1:04 UTC (permalink / raw)
  To: Martin Wilck, Bart Van Assche
  Cc: Martin K . Petersen, James E . J . Bottomley, linux-scsi,
	Quinn Tran, Martin Wilck, Daniel Wagner, linux

On Thu, Dec 12, 2019 at 11:07:20PM +0300, Roman Bolshakov wrote:
> On Tue, Dec 10, 2019 at 11:52:11AM +0100, Martin Wilck wrote:
> > On Mon, 2019-12-09 at 10:02 -0800, Bart Van Assche wrote:
> > > Restore point-to-point for qla25xx and older. Although this patch
> > > initializes
> > > a field (s_id) that has been marked as "reserved" in the firmware
> > > manual, it
> > > works fine on my setup.
> > > 
> > 
> > Having followed the discussion between you and Roman, I guess this is
> > ok. However I'd like to understand better in what ways the N2N topology
> > was broken for you. After all, this patch affects only the LOGO
> > payload. Was it a logout / relogin issue? Were wrong ports being logged
> > out?
> > 
> 
> I think your concern is indeed valid. I didn't have a chance to test the
> P2P target on QLE2562 earlier. Here's the log that shows the issue
> Bart is experiencing:
> 
> [  179.925639] qla2xxx [0000:00:05.0]-1959:0: Entered qla24xx_get_port_login_templ.
> [  179.925689] qla2xxx [0000:00:05.0]-1800:0: Entered qla2x00_mailbox_command.
> [  179.925731] qla2xxx [0000:00:05.0]-1806:0: Prepare to issue mbox cmd=0x5a.
> [  179.925771] qla2xxx [0000:00:05.0]-1911:0: Mailbox registers (OUT):
> [  179.925813] qla2xxx [0000:00:05.0]-1912:0: mbox[0]<-0x005a
> [  179.925856] qla2xxx [0000:00:05.0]-1912:0: mbox[1]<-0x0700
> [  179.925897] qla2xxx [0000:00:05.0]-1912:0: mbox[2]<-0x3000
> [  179.925939] qla2xxx [0000:00:05.0]-1912:0: mbox[3]<-0x0000
> [  179.925981] qla2xxx [0000:00:05.0]-1912:0: mbox[4]<-0x0003
> [  179.926021] qla2xxx [0000:00:05.0]-1912:0: mbox[5]<-0x0000
> [  179.926063] qla2xxx [0000:00:05.0]-1912:0: mbox[6]<-0x0000
> [  179.926105] qla2xxx [0000:00:05.0]-1912:0: mbox[7]<-0x0000
> [  179.926147] qla2xxx [0000:00:05.0]-1912:0: mbox[8]<-0x001d
> [  179.926188] qla2xxx [0000:00:05.0]-1917:0: I/O Address = 00000000167c2ddb.
> [  179.926230] qla2xxx [0000:00:05.0]-180f:0: Going to unlock irq & waiting for interrupts. jiffies=ffffd115.
> [  179.926367] qla2xxx [0000:00:05.0]-1814:0: Cmd=5a completed.
> [  179.926413] qla2xxx [0000:00:05.0]-1913:0: Mailbox registers (IN):
> [  179.926456] qla2xxx [0000:00:05.0]-1914:0: mbox[0]->0x4000
> [  179.926489] qla2xxx [0000:00:05.0]-1914:0: mbox[1]->0x0700
> [  179.926522] qla2xxx [0000:00:05.0]-1821:0: Done qla2x00_mailbox_command.
> [  179.926566] qla2xxx [0000:00:05.0]-195b:0: Done qla24xx_get_port_login_templ.
> [  179.926617] qla2xxx [0000:00:05.0]-28d8:0: qla24xx_fcport_handle_login 21:00:00:24:ff:7f:35:c7 DS 0 LS 4 P 0 fl 0 confl 00000000f1f36df6 rscn 0|0 login 1 lid 0 scan 2
> [  179.926709] qla2xxx [0000:00:05.0]-2869:0: LOOP READY.
> [  179.926751] qla2xxx [0000:00:05.0]-286b:0: qla2x00_configure_loop: exiting normally.
> [  179.926801] qla2xxx [0000:00:05.0]-4810:0: Loop resync end.
> [  179.926836] qla2xxx [0000:00:05.0]-4800:0: DPC handler sleeping.
> [  179.926879] qla2xxx [0000:00:05.0]-28d9:0: Async-gnlist WWPN 21:00:00:24:ff:7f:35:c7
> [  179.926928] qla2xxx [0000:00:05.0]-28da:0: Async-gnlist - OUT WWPN 21:00:00:24:ff:7f:35:c7 hndl 0
> [  179.928148] qla2xxx [0000:00:05.0]-28e7:0: Async done-gnlist res 0 mb[1]=50 mb[2]=301a
> [  179.928222] qla2xxx [0000:00:05.0]-28e8:0: qla24xx_async_gnl_sp_done 21:00:00:24:ff:7f:35:c7 00:00:01 CLS 4/4 lid 0
> [  179.928295] qla2xxx [0000:00:05.0]-28e8:0: qla24xx_async_gnl_sp_done 21:00:00:24:ff:7f:35:c7 ff:ff:fe CLS 4/4 lid 7fe
> [  179.928368] qla2xxx [0000:00:05.0]-107ff:0: qla24xx_handle_gnl_done_event 21:00:00:24:ff:7f:35:c7 DS 2 LS rc 4 0 login 1|1 rscn 0|0 lid 0
> [  179.928449] qla2xxx [0000:00:05.0]-28e1:0: qla24xx_handle_gnl_done_event 725 21:00:00:24:ff:7f:35:c7 n 2 000001 lid 0
> [  179.928520] qla2xxx [0000:00:05.0]-28e2:0: qla24xx_handle_gnl_done_event found 21:00:00:24:ff:7f:35:c7 CLS [4|4] fc4_type 1 ID[000001|000001] lid[0|0]
> [  179.928611] qla2xxx [0000:00:05.0]-28d8:0: qla24xx_fcport_handle_login 21:00:00:24:ff:7f:35:c7 DS 2 LS 4 P 0 fl 0 confl 00000000f1f36df6 rscn 0|0 login 1 lid 0 scan 2
> 
> If discovery state is DSC_GNL and fcport->current_login_state & 0xf ==
> 6, target tries to do PRLI (which should be prohibited because
> qlini_mode=disabled on the machine and scsi_host active_mode is
> "Target"):
> 
> [  179.928706] qla2xxx [0000:00:05.0]-2918:0: qla24xx_fcport_handle_login 1595 21:00:00:24:ff:7f:35:c7 post FC PRLI
> [  179.928828] qla2xxx [0000:00:05.0]-291b:0: Async-prli - 21:00:00:24:ff:7f:35:c7 hdl=0, loopid=0 portid=000001 retries=30 fc.
> [  179.928969] qla2xxx [0000:00:05.0]-5836:0: Async-prli complete - 21:00:00:24:ff:7f:35:c7 hdl=4 portid=000001 iop0=422.
> [  179.929053] qla2xxx [0000:00:05.0]-2929:0: qla2x00_async_prli_sp_done 21:00:00:24:ff:7f:35:c7 res 0
> [  179.929118] qla2xxx [0000:00:05.0]-2918:0: qla24xx_handle_prli_done_event 1872 21:00:00:24:ff:7f:35:c7 post gpdb
> [  179.929224] qla2xxx [0000:00:05.0]-28dc:0: Async-gpdb 21:00:00:24:ff:7f:35:c7 hndl 0 opt 0
> [  179.929320] qla2xxx [0000:00:05.0]-28db:0: Async done-gpdb res 0, WWPN 21:00:00:24:ff:7f:35:c7 mb[1]=0 mb[2]=3011
> [  179.929394] qla2xxx [0000:00:05.0]-28d2:0: qla24xx_handle_gpdb_event 21:00:00:24:ff:7f:35:c7 DS 5 LS 6 fc4_type 1 rc 0
> [  179.929490] qla2xxx [0000:00:05.0]-28ef:0: qla2x00_update_fcport 21:00:00:24:ff:7f:35:c7
> 
> qla2x00_update_fcport sets discovery state to DSC_LOGIN_COMPLETE.
> 
> [  179.929720] qla2xxx [0000:00:05.0]-f806:0: Adding sess 000000005ec6d065 se_sess 00000000f0daac7c  to tgt 00000000dde86879 sess_count 1
> [  179.929803] qla2xxx [0000:00:05.0]-f84b:0: qla_target(0): session for wwn 21:00:00:24:ff:7f:35:c7 (loop_id 0, s_id 0:0:1, confirmed completion not supported) added
> [  179.929900] qla2xxx [0000:00:05.0]-287d:0: FCPort 21:00:00:24:ff:7f:35:c7 state transitioned from UNCONFIGURED to ONLINE - portid=000001.
> [  180.773031] qla2xxx [0000:00:05.0]-e872:0: qlt_24xx_atio_pkt_all_vps: qla_target(0): type d ox_id 0000
> [  180.773123] qla2xxx [0000:00:05.0]-e82e:0: IMMED_NOTIFY ATIO
> [  180.773126] qla2xxx [0000:00:05.0]-f826:0: qla_target(0): Port ID: 00:00:01 ELS opcode: 0x03 lid 0 21:00:00:24:ff:7f:35:c7
> [  180.773243] qla2xxx [0000:00:05.0]-f897:0: Linking sess 000000005ec6d065 [0] wwn 21:00:00:24:ff:7f:35:c7 with PLOGI ACK to wwn 21:00:00:24:ff:7f:35:c7 s_id 00:00:01, ref=1 pla 00000000e7b769c4 link 0
> [  180.773354] qla2xxx [0000:00:05.0]-28f9:0: qlt_handle_login 4802 21:00:00:24:ff:7f:35:c7  DS 7
> 
> So, if Discovery State is DSC_LOGIN_COMPLETE, let's nuke the session:
> 
> [  180.773413] qla2xxx [0000:00:05.0]-28f9:0: qlt_handle_login 4834 21:00:00:24:ff:7f:35:c7 post del sess
> [  180.773471] qla2xxx [0000:00:05.0]-e801:0: Scheduling sess 000000005ec6d065 for deletion 21:00:00:24:ff:7f:35:c7
> [  180.773544] qla2xxx [0000:00:05.0]-f826:0: qla_target(0): Exit ELS opcode: 0x03 res 0
> [  180.773619] qla2xxx [0000:00:05.0]-290a:0: qlt_unreg_sess sess 000000005ec6d065 for deletion 21:00:00:24:ff:7f:35:c7
> [  180.773692] qla2xxx [0000:00:05.0]-f884:0: qlt_free_session_done: se_sess 00000000f0daac7c / sess 000000005ec6d065 from port 21:00:00:24:ff:7f:35:c7 loop_id 0x00 s_id 00:00:01 logout 1 keep 1 els_logo 0
> [  180.773800] qla2xxx [0000:00:05.0]-287d:0: FCPort 21:00:00:24:ff:7f:35:c7 state transitioned from ONLINE to LOST - portid=000001.
> [  180.773886] qla2xxx [0000:00:05.0]-2870:0: Async-logout - hdl=0 loop-id=0 portid=000001 21:00:00:24:ff:7f:35:c7.
> 
> And here's the LOGO you was worried about:
> 
> [  180.774077] qla2xxx [0000:00:05.0]-5836:0: Async-logout complete - 21:00:00:24:ff:7f:35:c7 hdl=6 portid=000001 iop0=0.
> [  180.774150] qla2xxx [0000:00:05.0]-f893:0: qlt_logo_completion_handler: se_sess 00000000f0daac7c / sess 000000005ec6d065 from port 21:00:00:24:ff:7f:35:c7 loop_id 0x00 s_id 00:00:01 LOGO failed: 0x0
> [  180.829063] qla2xxx [0000:00:05.0]-f887:0: qlt_free_session_done: sess 000000005ec6d065 logout completed
> [  180.829182] qla2xxx [0000:00:05.0]-f89a:0: se_sess 00000000f1f36df6 / sess 000000005ec6d065 port 21:00:00:24:ff:7f:35:c7 is gone, releasing own PLOGI (ref=1)
> [  180.829276] qla2xxx [0000:00:05.0]-5889:0: Sending PLOGI ACK to wwn 21:00:00:24:ff:7f:35:c7 s_id 00:00:01 loop_id 0x00 exch 0x1155f0 ox_id 0x4
> [  180.829364] qla2xxx [0000:00:05.0]-f801:0: Unregistration of sess 000000005ec6d065 21:00:00:24:ff:7f:35:c7 finished fcp_cnt 0
> [  180.829445] qla2xxx [0000:00:05.0]-28f4:0: Async-nack 21:00:00:24:ff:7f:35:c7 hndl 0 PLOGI
> [  180.829566] qla2xxx [0000:00:05.0]-28f2:0: Async done-nack res 0 21:00:00:24:ff:7f:35:c7  type 16
> 
> Then initiator sends a PLOGI:
> 
> [  181.810543] qla2xxx [0000:00:05.0]-e872:0: qlt_24xx_atio_pkt_all_vps: qla_target(0): type d ox_id 0000
> [  181.810641] qla2xxx [0000:00:05.0]-e82e:0: IMMED_NOTIFY ATIO
> [  181.810645] qla2xxx [0000:00:05.0]-f826:0: qla_target(0): Port ID: 00:00:01 ELS opcode: 0x03 lid 0 21:00:00:24:ff:7f:35:c7
> [  181.810785] qla2xxx [0000:00:05.0]-f897:0: Linking sess 000000005ec6d065 [0] wwn 21:00:00:24:ff:7f:35:c7 with PLOGI ACK to wwn 21:00:00:24:ff:7f:35:c7 s_id 00:00:01, ref=1 pla 00000000eebd5532 link 0
> [  181.810918] qla2xxx [0000:00:05.0]-28f9:0: qlt_handle_login 4802 21:00:00:24:ff:7f:35:c7  DS 0
> [  181.810992] qla2xxx [0000:00:05.0]-5889:0: Sending PLOGI ACK to wwn 21:00:00:24:ff:7f:35:c7 s_id 00:00:01 loop_id 0x00 exch 0x115620 ox_id 0x5
> [  181.811106] qla2xxx [0000:00:05.0]-f826:0: qla_target(0): Exit ELS opcode: 0x03 res 0
> [  181.811251] qla2xxx [0000:00:05.0]-28f4:0: Async-nack 21:00:00:24:ff:7f:35:c7 hndl 0 PLOGI
> [  181.811361] qla2xxx [0000:00:05.0]-28f2:0: Async done-nack res 0 21:00:00:24:ff:7f:35:c7  type 16
> 
> The block of messages just above can be seen a few more times which
> means initiator sends multiple PLOGI requests.
> 
> I'll try to figure out the reason why qla24xx_fcport_handle_login
> doesn't quit immediately.
> 

Martin, Bart,

I have noticed the issue only when I use SLE15 as initiator (I haven't
tried centos/rhel kernels yet). SLE15 initiator sends PLOGI multiple
times by a reason unknown to me in spite of LS_ACC response from target
on each PLOGI.

Login works flawlessly if I use the latest kernel on initiator (5.5/scsi
fixes + the first two patches from the patchset).
qla24xx_fcport_handle_login is invoked on target when current link state
is PRLI COMPLETED, so it doesn't do PRLI and doesn't invoke LOGO.

Bart, what kernel do you use on initiator? Should we bring in
workarounds for initiators into mainline?

Best regards,
Roman

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/4] qla2xxx: Micro-optimize qla2x00_configure_local_loop()
  2019-12-10 10:46   ` Martin Wilck
@ 2019-12-14 12:30     ` Roman Bolshakov
  2019-12-15  0:12     ` Bart Van Assche
  1 sibling, 0 replies; 37+ messages in thread
From: Roman Bolshakov @ 2019-12-14 12:30 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Bart Van Assche, Martin K . Petersen, James E . J . Bottomley,
	Quinn Tran, Himanshu Madhani, linux-scsi, Martin Wilck,
	Daniel Wagner

On Tue, Dec 10, 2019 at 11:46:17AM +0100, Martin Wilck wrote:
> Hello Bart,
> 
> On Mon, 2019-12-09 at 10:02 -0800, Bart Van Assche wrote:
> > Instead of changing endianness in-place and copying the data in two
> > steps,
> > do this in one step. This patch makes is a preparation step for
> > fixing the
> > endianness warnings reported by 'sparse' for the qla2xxx driver.
> > 
> > diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
> > index 6c28f38f8021..ddd8bf7997a8 100644
> > --- a/drivers/scsi/qla2xxx/qla_init.c
> > +++ b/drivers/scsi/qla2xxx/qla_init.c
> > @@ -5047,13 +5047,12 @@ qla2x00_configure_local_loop(scsi_qla_host_t *vha)
> >  			rval = qla24xx_get_port_login_templ(vha,
> >  			    ha->init_cb_dma, (void *)ha->init_cb, sz);
> >  			if (rval == QLA_SUCCESS) {
> > +				__be32 *q = &ha->plogi_els_payld.data[0];
> > +
> >  				bp = (uint32_t *)ha->init_cb;
> > -				for (i = 0; i < sz/4 ; i++, bp++)
> > -					*bp = cpu_to_be32(*bp);
> > +				for (i = 0; i < sz/4 ; i++, bp++, q++)
> > +					*q = cpu_to_be32(*bp);
> >  
> > -				memcpy(&ha->plogi_els_payld.data,
> > -				    (void *)ha->init_cb,
> > -				    sizeof(ha->plogi_els_payld.data));
> >  				set_bit(RELOGIN_NEEDED, &vha->dpc_flags);
> >  			} else {
> >  				ql_dbg(ql_dbg_init, vha, 0x00d1,
> 
> A side effect of this patch would be that after return from
> qla2x00_configure_local_loop(), the byte ordering in ha->init_cb
> remains in CPU byte ordering, whereas before your patch, it would have
> been converted to be32. I'm uncertain if that would matter later on.
> 
> The following is not a problems with your patch, but what's really
> weird is that in qla24xx_get_port_login_templ() (which is only called
> from here), the buffer is converted from le32 to CPU endianness, and
> then here, in a second step, from CPU to be32. I'm wondering which byte
> order this buffer is supposed to have, and whether that's different
> depending on which mode the controller is operating in (the be32
> conversion seems to be applied in N2N mode only). Moreover, looking at
> the definition of init_cb_t in qla_def.h, this data structure actually
> has mixed endianness, making me doubt that changing the endianness of
> the whole buffer makes sense at all. Or is ha->init_cb simply being
> abused in this part of the code?
> 
> I guess only Himanshu or Quinn can tell.
> 

PLOGI payload is returned from FW in little-endian 32-bit words, but ELS
IOCB should have big-endian payload. That's why the conversion happens.
init_cb is only temporary container to get the results from FW.
The big-endian ELS payload is copied from plogi_els_payld.data to
elsio->u.els_plogi.els_plogi_pyld->data in qla24xx_els_dcmd2_iocb()
before being sent to FW/ASIC.

Thanks,
Roman

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/4] qla2xxx: Micro-optimize qla2x00_configure_local_loop()
  2019-12-09 18:02 ` [PATCH 4/4] qla2xxx: Micro-optimize qla2x00_configure_local_loop() Bart Van Assche
  2019-12-10 10:46   ` Martin Wilck
@ 2019-12-14 12:33   ` Roman Bolshakov
  2019-12-15  0:14     ` Bart Van Assche
  1 sibling, 1 reply; 37+ messages in thread
From: Roman Bolshakov @ 2019-12-14 12:33 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K . Petersen, James E . J . Bottomley, linux-scsi,
	Quinn Tran, Martin Wilck, Daniel Wagner

On Mon, Dec 09, 2019 at 10:02:23AM -0800, Bart Van Assche wrote:
> Instead of changing endianness in-place and copying the data in two steps,
> do this in one step. This patch makes is a preparation step for fixing the
> endianness warnings reported by 'sparse' for the qla2xxx driver.
> 
> Cc: Quinn Tran <qutran@marvell.com>
> Cc: Martin Wilck <mwilck@suse.com>
> Cc: Daniel Wagner <dwagner@suse.de>
> Cc: Roman Bolshakov <r.bolshakov@yadro.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/qla2xxx/qla_def.h  | 2 +-
>  drivers/scsi/qla2xxx/qla_init.c | 9 ++++-----
>  2 files changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
> index ab7424318ee8..cf23f10d27fe 100644
> --- a/drivers/scsi/qla2xxx/qla_def.h
> +++ b/drivers/scsi/qla2xxx/qla_def.h
> @@ -414,7 +414,7 @@ struct els_logo_payload {
>  struct els_plogi_payload {
>  	uint8_t opcode;
>  	uint8_t rsvd[3];
> -	uint8_t data[112];
> +	__be32	data[112 / 4];
>  };
>  
>  struct ct_arg {
> diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
> index 6c28f38f8021..ddd8bf7997a8 100644
> --- a/drivers/scsi/qla2xxx/qla_init.c
> +++ b/drivers/scsi/qla2xxx/qla_init.c
> @@ -5047,13 +5047,12 @@ qla2x00_configure_local_loop(scsi_qla_host_t *vha)
>  			rval = qla24xx_get_port_login_templ(vha,
>  			    ha->init_cb_dma, (void *)ha->init_cb, sz);
>  			if (rval == QLA_SUCCESS) {
> +				__be32 *q = &ha->plogi_els_payld.data[0];
> +
>  				bp = (uint32_t *)ha->init_cb;
> -				for (i = 0; i < sz/4 ; i++, bp++)
> -					*bp = cpu_to_be32(*bp);
> +				for (i = 0; i < sz/4 ; i++, bp++, q++)
> +					*q = cpu_to_be32(*bp);
>  

How about cpu_to_be32_array() instead of the hand-written loop?

> -				memcpy(&ha->plogi_els_payld.data,
> -				    (void *)ha->init_cb,
> -				    sizeof(ha->plogi_els_payld.data));
>  				set_bit(RELOGIN_NEEDED, &vha->dpc_flags);
>  			} else {
>  				ql_dbg(ql_dbg_init, vha, 0x00d1,

Thanks,
Roman

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/4] qla2xxx: Fix point-to-point mode for qla25xx and older
  2019-12-14  1:04       ` Roman Bolshakov
@ 2019-12-15  0:09         ` Bart Van Assche
  2019-12-16 11:45           ` Roman Bolshakov
  2019-12-16  8:37         ` Daniel Wagner
  1 sibling, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2019-12-15  0:09 UTC (permalink / raw)
  To: Roman Bolshakov, Martin Wilck
  Cc: Martin K . Petersen, James E . J . Bottomley, linux-scsi,
	Quinn Tran, Martin Wilck, Daniel Wagner, linux

On 2019-12-13 17:04, Roman Bolshakov wrote:
> Bart, what kernel do you use on initiator? Should we bring in
> workarounds for initiators into mainline?

Hi Roman,

The setup I use to test point-to-point mode is as follows:
* A single server equipped with a QLE2562 adapter.
* The two ports of the QLE2562 adapter directly connected to each other.
* Both FC ports are assigned to a VM such that if a kernel crash occurs
  only the VM has to be rebooted (PCIe-passthrough).
* The following in /etc/modprobe.d/qla2xxx.conf of the VM used for
  testing:

options qla2xxx qlini_mode=dual ql2xextended_error_logging=0x5200b000

With this setup it is guaranteed that initiator and target are running
the same kernel version.

I run all my tests with mainline kernels. Typically I run my tests on
top of a merge of Martin's scsi-fixes and scsi-queue branches. Since the
regression that I reported is a regression in the upstream kernel I
think the fix should go into the upstream kernel.

Bart.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/4] qla2xxx: Micro-optimize qla2x00_configure_local_loop()
  2019-12-10 10:46   ` Martin Wilck
  2019-12-14 12:30     ` Roman Bolshakov
@ 2019-12-15  0:12     ` Bart Van Assche
  1 sibling, 0 replies; 37+ messages in thread
From: Bart Van Assche @ 2019-12-15  0:12 UTC (permalink / raw)
  To: Martin Wilck, Martin K . Petersen, James E . J . Bottomley,
	Quinn Tran, Himanshu Madhani
  Cc: linux-scsi, Martin Wilck, Daniel Wagner, Roman Bolshakov

On 2019-12-10 02:46, Martin Wilck wrote:
> On Mon, 2019-12-09 at 10:02 -0800, Bart Van Assche wrote:
>> diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
>> index 6c28f38f8021..ddd8bf7997a8 100644
>> --- a/drivers/scsi/qla2xxx/qla_init.c
>> +++ b/drivers/scsi/qla2xxx/qla_init.c
>> @@ -5047,13 +5047,12 @@ qla2x00_configure_local_loop(scsi_qla_host_t *vha)
>>  			rval = qla24xx_get_port_login_templ(vha,
>>  			    ha->init_cb_dma, (void *)ha->init_cb, sz);
>>  			if (rval == QLA_SUCCESS) {
>> +				__be32 *q = &ha->plogi_els_payld.data[0];
>> +
>>  				bp = (uint32_t *)ha->init_cb;
>> -				for (i = 0; i < sz/4 ; i++, bp++)
>> -					*bp = cpu_to_be32(*bp);
>> +				for (i = 0; i < sz/4 ; i++, bp++, q++)
>> +					*q = cpu_to_be32(*bp);
>>  
>> -				memcpy(&ha->plogi_els_payld.data,
>> -				    (void *)ha->init_cb,
>> -				    sizeof(ha->plogi_els_payld.data));
>>  				set_bit(RELOGIN_NEEDED, &vha->dpc_flags);
>>  			} else {
>>  				ql_dbg(ql_dbg_init, vha, 0x00d1,
> 
> A side effect of this patch would be that after return from
> qla2x00_configure_local_loop(), the byte ordering in ha->init_cb
> remains in CPU byte ordering, whereas before your patch, it would have
> been converted to be32. I'm uncertain if that would matter later on.
> 
> The following is not a problems with your patch, but what's really
> weird is that in qla24xx_get_port_login_templ() (which is only called
> from here), the buffer is converted from le32 to CPU endianness, and
> then here, in a second step, from CPU to be32. I'm wondering which byte
> order this buffer is supposed to have, and whether that's different
> depending on which mode the controller is operating in (the be32
> conversion seems to be applied in N2N mode only). Moreover, looking at
> the definition of init_cb_t in qla_def.h, this data structure actually
> has mixed endianness, making me doubt that changing the endianness of
> the whole buffer makes sense at all. Or is ha->init_cb simply being
> abused in this part of the code?
> 
> I guess only Himanshu or Quinn can tell.

Hi Martin,

I will rework this patch such that this function again changes the
ha->init_cb buffer into big endian byte order.

Bart.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/4] qla2xxx: Micro-optimize qla2x00_configure_local_loop()
  2019-12-14 12:33   ` Roman Bolshakov
@ 2019-12-15  0:14     ` Bart Van Assche
  0 siblings, 0 replies; 37+ messages in thread
From: Bart Van Assche @ 2019-12-15  0:14 UTC (permalink / raw)
  To: Roman Bolshakov
  Cc: Martin K . Petersen, James E . J . Bottomley, linux-scsi,
	Quinn Tran, Martin Wilck, Daniel Wagner

On 2019-12-14 04:33, Roman Bolshakov wrote:
> How about cpu_to_be32_array() instead of the hand-written loop?

Hi Roman,

That sounds like a good idea to me. I will look into this.

Bart.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/4] qla2xxx: Fix point-to-point mode for qla25xx and older
  2019-12-14  1:04       ` Roman Bolshakov
  2019-12-15  0:09         ` Bart Van Assche
@ 2019-12-16  8:37         ` Daniel Wagner
  1 sibling, 0 replies; 37+ messages in thread
From: Daniel Wagner @ 2019-12-16  8:37 UTC (permalink / raw)
  To: Roman Bolshakov
  Cc: Martin Wilck, Bart Van Assche, Martin K . Petersen,
	James E . J . Bottomley, linux-scsi, Quinn Tran, Martin Wilck,
	linux

Hi Roman,

On Sat, Dec 14, 2019 at 04:04:00AM +0300, Roman Bolshakov wrote:
> I have noticed the issue only when I use SLE15 as initiator (I haven't
> tried centos/rhel kernels yet). SLE15 initiator sends PLOGI multiple
> times by a reason unknown to me in spite of LS_ACC response from target
> on each PLOGI.

The currently released SLE15 kernel runs still an older version of the
qla2xxx driver. Though we have backported the current qla2xxx driver
for the next version of SLE15.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/4] qla2xxx: Fix point-to-point mode for qla25xx and older
  2019-12-15  0:09         ` Bart Van Assche
@ 2019-12-16 11:45           ` Roman Bolshakov
  2019-12-16 20:08             ` Roman Bolshakov
  0 siblings, 1 reply; 37+ messages in thread
From: Roman Bolshakov @ 2019-12-16 11:45 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin Wilck, Martin K . Petersen, James E . J . Bottomley,
	linux-scsi, Quinn Tran, Martin Wilck, Daniel Wagner, linux

On Sat, Dec 14, 2019 at 04:09:41PM -0800, Bart Van Assche wrote:
> On 2019-12-13 17:04, Roman Bolshakov wrote:
> > Bart, what kernel do you use on initiator? Should we bring in
> > workarounds for initiators into mainline?
> 
> The setup I use to test point-to-point mode is as follows:
> * A single server equipped with a QLE2562 adapter.
> * The two ports of the QLE2562 adapter directly connected to each other.
> * Both FC ports are assigned to a VM such that if a kernel crash occurs
>   only the VM has to be rebooted (PCIe-passthrough).
> * The following in /etc/modprobe.d/qla2xxx.conf of the VM used for
>   testing:
> 
> options qla2xxx qlini_mode=dual ql2xextended_error_logging=0x5200b000
> 
> With this setup it is guaranteed that initiator and target are running
> the same kernel version.
> 
> I run all my tests with mainline kernels. Typically I run my tests on
> top of a merge of Martin's scsi-fixes and scsi-queue branches. Since the
> regression that I reported is a regression in the upstream kernel I
> think the fix should go into the upstream kernel.
> 

Hi Bart,

The setup details are helpful. Yes, I can reproduce it with
qlini_mode=dual. But it works with qlini_mode=disabled.

But, qlini_mode=dual works if target is created before initiator is
started.  So, the transition from initiator mode to target is the one
that needs a fix.

There first PLOGI after target creation fails because it complains about
wrong D_ID in Login/Logout IOCB:

qla2xxx [0000:00:07.0]-3873:1: Enter: PLOGI portid=0000ef
qla2xxx [0000:00:07.0]-3873:1: PLOGI 00000000003b33d0 00000000f1800318
qla2xxx [0000:00:07.0]-3873:1: PLOGI buffer:
qla2xxx [0000:00:07.0]-0909:1: +116   0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
qla2xxx [0000:00:07.0]-0909:1: ----- -----------------------------------------------
qla2xxx [0000:00:07.0]-0909:1: 0000: 03 00 00 00 20 20 00 0c 88 00 08 00 00 ff 00 1f
qla2xxx [0000:00:07.0]-0909:1: 0010: 00 00 07 d0 21 00 00 24 ff 7f 35 c7 20 00 00 24
qla2xxx [0000:00:07.0]-0909:1: 0020: ff 7f 35 c7 00 00 00 00 00 00 00 00 00 00 00 00
qla2xxx [0000:00:07.0]-0909:1: 0030: 00 00 00 00 00 00 00 00 20 00 08 00 00 ff 00 04
qla2xxx [0000:00:07.0]-0909:1: 0040: 00 01 00 00 80 00 00 00 00 00 08 00 00 ff 00 00
qla2xxx [0000:00:07.0]-0909:1: 0050: 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
qla2xxx [0000:00:07.0]-0909:1: 0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
qla2xxx [0000:00:07.0]-0909:1: 0070: 00 00 00 00
qla2xxx [0000:00:07.0]-3873:1: PLOGI ELS IOCB:
qla2xxx [0000:00:07.0]-0909:1: +64    0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
qla2xxx [0000:00:07.0]-0909:1: ----- -----------------------------------------------
qla2xxx [0000:00:07.0]-0909:1: 0000: 53 01 00 00 05 00 00 00 00 00 00 00 01 00 00 10
qla2xxx [0000:00:07.0]-0909:1: 0010: 00 00 00 00 01 00 03 00 ef 00 00 00 ef 00 00 00
qla2xxx [0000:00:07.0]-0909:1: 0020: 74 00 00 00 74 00 00 00 00 00 a5 30 00 00 00 00
qla2xxx [0000:00:07.0]-0909:1: 0030: 74 00 00 00 00 00 a6 30 00 00 00 00 74 00 00 00
qla2xxx [0000:00:07.0]-3874:1: ELS_DCMD PLOGI sent, hdl=5, loopid=0, to port_id 0000ef from port_id 0000ef
qla2xxx [0000:00:07.0]-583f:1: ELS IOCB Done -Driver ELS logo error hdl=5 comp_status=0x31 error subcode 1=0x19 error subcode 2=0x18 total_byte=0x74
qla2xxx [0000:00:07.0]-3872:1: ELS_DCMD ELS done rc 458752 hdl=5, portid=0000ef 21:00:00:24:ff:12:b1:a0
qla2xxx [0000:00:07.0]-28eb:1: qla2x00_els_dcmd2_sp_done 21:00:00:24:ff:12:b1:a0 cmd error fw_status 0x31 0x19 0x18

Also note, that the IOCB is sent from port 0000ef to port 0000ef. This
is definitely wrong and should be fixed. I'm still looking how it
reached the invalid state.

If qla2xxx target is started with qlini_mode=disabled in P2P mode,
initiator/target have port ids 000001/000002. The port with bigger WWPN
in the pair typically assigns 000001 to itself and 000002 to the peer.

Roman

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/4] qla2xxx: Fix point-to-point mode for qla25xx and older
  2019-12-16 11:45           ` Roman Bolshakov
@ 2019-12-16 20:08             ` Roman Bolshakov
  0 siblings, 0 replies; 37+ messages in thread
From: Roman Bolshakov @ 2019-12-16 20:08 UTC (permalink / raw)
  To: Bart Van Assche, Quinn Tran, Himanshu Madhani
  Cc: Martin Wilck, Martin K . Petersen, James E . J . Bottomley,
	linux-scsi, Martin Wilck, Daniel Wagner, linux

On Mon, Dec 16, 2019 at 02:45:35PM +0300, Roman Bolshakov wrote:
> On Sat, Dec 14, 2019 at 04:09:41PM -0800, Bart Van Assche wrote:
> > On 2019-12-13 17:04, Roman Bolshakov wrote:
> > > Bart, what kernel do you use on initiator? Should we bring in
> > > workarounds for initiators into mainline?
> > 
> > The setup I use to test point-to-point mode is as follows:
> > * A single server equipped with a QLE2562 adapter.
> > * The two ports of the QLE2562 adapter directly connected to each other.
> > * Both FC ports are assigned to a VM such that if a kernel crash occurs
> >   only the VM has to be rebooted (PCIe-passthrough).
> > * The following in /etc/modprobe.d/qla2xxx.conf of the VM used for
> >   testing:
> > 
> > options qla2xxx qlini_mode=dual ql2xextended_error_logging=0x5200b000
> > 
> > With this setup it is guaranteed that initiator and target are running
> > the same kernel version.
> > 
> > I run all my tests with mainline kernels. Typically I run my tests on
> > top of a merge of Martin's scsi-fixes and scsi-queue branches. Since the
> > regression that I reported is a regression in the upstream kernel I
> > think the fix should go into the upstream kernel.
> > 
> 
> Hi Bart,
> 
> The setup details are helpful. Yes, I can reproduce it with
> qlini_mode=dual. But it works with qlini_mode=disabled.
> 
> But, qlini_mode=dual works if target is created before initiator is
> started.  So, the transition from initiator mode to target is the one
> that needs a fix.
> 
> There first PLOGI after target creation fails because it complains about
> wrong D_ID in Login/Logout IOCB:
> 
> qla2xxx [0000:00:07.0]-3873:1: Enter: PLOGI portid=0000ef
> qla2xxx [0000:00:07.0]-3873:1: PLOGI 00000000003b33d0 00000000f1800318
> qla2xxx [0000:00:07.0]-3873:1: PLOGI buffer:
> qla2xxx [0000:00:07.0]-0909:1: +116   0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
> qla2xxx [0000:00:07.0]-0909:1: ----- -----------------------------------------------
> qla2xxx [0000:00:07.0]-0909:1: 0000: 03 00 00 00 20 20 00 0c 88 00 08 00 00 ff 00 1f
> qla2xxx [0000:00:07.0]-0909:1: 0010: 00 00 07 d0 21 00 00 24 ff 7f 35 c7 20 00 00 24
> qla2xxx [0000:00:07.0]-0909:1: 0020: ff 7f 35 c7 00 00 00 00 00 00 00 00 00 00 00 00
> qla2xxx [0000:00:07.0]-0909:1: 0030: 00 00 00 00 00 00 00 00 20 00 08 00 00 ff 00 04
> qla2xxx [0000:00:07.0]-0909:1: 0040: 00 01 00 00 80 00 00 00 00 00 08 00 00 ff 00 00
> qla2xxx [0000:00:07.0]-0909:1: 0050: 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> qla2xxx [0000:00:07.0]-0909:1: 0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> qla2xxx [0000:00:07.0]-0909:1: 0070: 00 00 00 00
> qla2xxx [0000:00:07.0]-3873:1: PLOGI ELS IOCB:
> qla2xxx [0000:00:07.0]-0909:1: +64    0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
> qla2xxx [0000:00:07.0]-0909:1: ----- -----------------------------------------------
> qla2xxx [0000:00:07.0]-0909:1: 0000: 53 01 00 00 05 00 00 00 00 00 00 00 01 00 00 10
> qla2xxx [0000:00:07.0]-0909:1: 0010: 00 00 00 00 01 00 03 00 ef 00 00 00 ef 00 00 00
> qla2xxx [0000:00:07.0]-0909:1: 0020: 74 00 00 00 74 00 00 00 00 00 a5 30 00 00 00 00
> qla2xxx [0000:00:07.0]-0909:1: 0030: 74 00 00 00 00 00 a6 30 00 00 00 00 74 00 00 00
> qla2xxx [0000:00:07.0]-3874:1: ELS_DCMD PLOGI sent, hdl=5, loopid=0, to port_id 0000ef from port_id 0000ef
> qla2xxx [0000:00:07.0]-583f:1: ELS IOCB Done -Driver ELS logo error hdl=5 comp_status=0x31 error subcode 1=0x19 error subcode 2=0x18 total_byte=0x74
> qla2xxx [0000:00:07.0]-3872:1: ELS_DCMD ELS done rc 458752 hdl=5, portid=0000ef 21:00:00:24:ff:12:b1:a0
> qla2xxx [0000:00:07.0]-28eb:1: qla2x00_els_dcmd2_sp_done 21:00:00:24:ff:12:b1:a0 cmd error fw_status 0x31 0x19 0x18
> 
> Also note, that the IOCB is sent from port 0000ef to port 0000ef. This
> is definitely wrong and should be fixed. I'm still looking how it
> reached the invalid state.
> 

Hi all again,

Bart,

In my setup I use QLE2560 as target (it has lower WWPN) and one port of
QLE2742 as initiator. The proposed fix doesn't work for me. The same
PLOGI error is returned.

When target is started with qlini_mode=dual, the connection is
initialized in AL_PA topology. FW assigns AL_PA for both ports. It sets
AL_PA ef for the target port and e8 for initiator on my setup.

LOOP DOWN event comes right after then target is created and existing
rport is deleted by initiator without explicit LOGO and N_Port handle
cleanup.

Then the topology changes to N2N, the session is getting re-initialized,
but initiator doesn't set correct D_ID for PLOGI, because FW returns
wrong port id for initiator (ef instead of e8) in the response of Get ID
mailbox command.

Quinn, Himanshu, does it tell something to you?

> If qla2xxx target is started with qlini_mode=disabled in P2P mode,
> initiator/target have port ids 000001/000002. The port with bigger WWPN
> in the pair typically assigns 000001 to itself and 000002 to the peer.
> 

Thank you,
Roman

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/4] qla2xxx patches for kernel v5.6
  2019-12-09 18:02 [PATCH 0/4] qla2xxx patches for kernel v5.6 Bart Van Assche
                   ` (3 preceding siblings ...)
  2019-12-09 18:02 ` [PATCH 4/4] qla2xxx: Micro-optimize qla2x00_configure_local_loop() Bart Van Assche
@ 2020-02-06 13:23 ` Martin Wilck
  2020-02-06 20:49   ` Bart Van Assche
  4 siblings, 1 reply; 37+ messages in thread
From: Martin Wilck @ 2020-02-06 13:23 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen, James E . J . Bottomley
  Cc: linux-scsi, dwagner

Hi Bart, hi Martin,

On Mon, 2019-12-09 at 10:02 -0800, Bart Van Assche wrote:
> Hi Martin,
> 
> Please consider these four small qla2xxx patches for kernel v5.6.
> 
> Thanks,
> 
> Bart.

it seems this series got lost somehow, including the point-to-point
mode fix?

I made some remarks back in December, but they were mostly nitpicks.
The only real issue with this set is that the last patch doesn't apply
cleanly.

Bart, could you resend?

Martin



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/4] qla2xxx patches for kernel v5.6
  2020-02-06 13:23 ` [PATCH 0/4] qla2xxx patches for kernel v5.6 Martin Wilck
@ 2020-02-06 20:49   ` Bart Van Assche
  2020-02-06 20:52     ` Martin Wilck
  0 siblings, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2020-02-06 20:49 UTC (permalink / raw)
  To: Martin Wilck, Martin K . Petersen, James E . J . Bottomley
  Cc: linux-scsi, dwagner

On 2/6/20 5:23 AM, Martin Wilck wrote:
> Hi Bart, hi Martin,
> 
> On Mon, 2019-12-09 at 10:02 -0800, Bart Van Assche wrote:
>> Hi Martin,
>>
>> Please consider these four small qla2xxx patches for kernel v5.6.
>>
>> Thanks,
>>
>> Bart.
> 
> it seems this series got lost somehow, including the point-to-point
> mode fix?
> 
> I made some remarks back in December, but they were mostly nitpicks.
> The only real issue with this set is that the last patch doesn't apply
> cleanly.

Hi Martin,

The first patch of this series has been queued by Martin Petersen for 
the v5.7 merge window. I plan to repost patches 2/4 and 4/4 of this 
series after the merge window has closed. Patch 3/4 (the fix for 
point-to-point mode) has been dropped because I'm not confident that my 
fix is a proper fix.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/4] qla2xxx patches for kernel v5.6
  2020-02-06 20:49   ` Bart Van Assche
@ 2020-02-06 20:52     ` Martin Wilck
  2020-02-06 21:01       ` Bart Van Assche
  0 siblings, 1 reply; 37+ messages in thread
From: Martin Wilck @ 2020-02-06 20:52 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen, James E . J . Bottomley
  Cc: linux-scsi, dwagner

On Thu, 2020-02-06 at 12:49 -0800, Bart Van Assche wrote:
> 
> The first patch of this series has been queued by Martin Petersen
> for 
> the v5.7 merge window. I plan to repost patches 2/4 and 4/4 of this 
> series after the merge window has closed. Patch 3/4 (the fix for 
> point-to-point mode) has been dropped because I'm not confident that
> my 
> fix is a proper fix.

Well, Himanshu had ack'd it, and Roman too IIRC ... have you given up
on the subject?

Martin



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/4] qla2xxx patches for kernel v5.6
  2020-02-06 20:52     ` Martin Wilck
@ 2020-02-06 21:01       ` Bart Van Assche
  2020-02-07  9:09         ` Martin Wilck
  0 siblings, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2020-02-06 21:01 UTC (permalink / raw)
  To: Martin Wilck, Martin K . Petersen, James E . J . Bottomley
  Cc: linux-scsi, dwagner

On 2/6/20 12:52 PM, Martin Wilck wrote:
> On Thu, 2020-02-06 at 12:49 -0800, Bart Van Assche wrote:
>>
>> The first patch of this series has been queued by Martin Petersen
>> for
>> the v5.7 merge window. I plan to repost patches 2/4 and 4/4 of this
>> series after the merge window has closed. Patch 3/4 (the fix for
>> point-to-point mode) has been dropped because I'm not confident that
>> my
>> fix is a proper fix.
> 
> Well, Himanshu had ack'd it, and Roman too IIRC ... have you given up
> on the subject?

That patch is not sufficient to make point-to-point mode work reliably 
on 8 Gb/s adapters, the adapter type on which I noticed that 
point-to-point mode is broken. If anyone else wants to fix 
point-to-point mode on older adapters that would be welcome.

Bart.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/4] qla2xxx patches for kernel v5.6
  2020-02-06 21:01       ` Bart Van Assche
@ 2020-02-07  9:09         ` Martin Wilck
  2020-02-08  3:17           ` Bart Van Assche
  0 siblings, 1 reply; 37+ messages in thread
From: Martin Wilck @ 2020-02-07  9:09 UTC (permalink / raw)
  To: Bart Van Assche, Himanshu Madhani, Quinn Tran
  Cc: linux-scsi, dwagner, Roman Bolshakov

Bart,

On Thu, 2020-02-06 at 13:01 -0800, Bart Van Assche wrote:
> On 2/6/20 12:52 PM, Martin Wilck wrote:
> > On Thu, 2020-02-06 at 12:49 -0800, Bart Van Assche wrote:
> > > The first patch of this series has been queued by Martin Petersen
> > > for
> > > the v5.7 merge window. I plan to repost patches 2/4 and 4/4 of
> > > this
> > > series after the merge window has closed. Patch 3/4 (the fix for
> > > point-to-point mode) has been dropped because I'm not confident
> > > that
> > > my
> > > fix is a proper fix.
> > 
> > Well, Himanshu had ack'd it, and Roman too IIRC ... have you given
> > up
> > on the subject?
> 
> That patch is not sufficient to make point-to-point mode work
> reliably 
> on 8 Gb/s adapters, the adapter type on which I noticed that 
> point-to-point mode is broken. If anyone else wants to fix 
> point-to-point mode on older adapters that would be welcome.

Sorry for insisting - in your original submission "Revert "qla2xxx: Fix
Nport ID display value" from 11/09/2019, you'd identified this as a
regression caused by 0aabb6b699f7 ("scsi: qla2xxx: Fix Nport ID display
value"). Are you saying that that wasn't the whole truth? Had N2N been
broken before already? Did N2N with "old" adapters ever work with
previous versions of the driver? Which ones?

@Himanshu, Quinn, can you say anything about this subject? Do you
intend to work on fixing this any time soon?

Martin



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/4] qla2xxx patches for kernel v5.6
  2020-02-07  9:09         ` Martin Wilck
@ 2020-02-08  3:17           ` Bart Van Assche
  0 siblings, 0 replies; 37+ messages in thread
From: Bart Van Assche @ 2020-02-08  3:17 UTC (permalink / raw)
  To: Martin Wilck, Himanshu Madhani, Quinn Tran
  Cc: linux-scsi, dwagner, Roman Bolshakov

On 2020-02-07 01:09, Martin Wilck wrote:
> Sorry for insisting - in your original submission "Revert "qla2xxx: Fix
> Nport ID display value" from 11/09/2019, you'd identified this as a
> regression caused by 0aabb6b699f7 ("scsi: qla2xxx: Fix Nport ID display
> value"). Are you saying that that wasn't the whole truth? Had N2N been
> broken before already? Did N2N with "old" adapters ever work with
> previous versions of the driver? Which ones?

Hi Martin,

My conclusion is indeed that N2N is broken since a long time for at
least the 8 Gb/s QLogic FC adapter. The test I run is to set the
qlini_mode kernel module parameter to dual, to connect two ports of a
dual port adapter to each other and to make both ports log in to each
other. If I run this test against a 32 Gb/s dual port adapter then I
always see that both ports log in to each other. If I run this test
against an 8 Gb/s dual port adapter then I see that most of the time
only one port logs in to another port. Sometimes no login happens at all.

Bart.

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2020-02-08  3:17 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-09 18:02 [PATCH 0/4] qla2xxx patches for kernel v5.6 Bart Van Assche
2019-12-09 18:02 ` [PATCH 1/4] qla2xxx: Check locking assumptions at runtime in qla2x00_abort_srb() Bart Van Assche
2019-12-10 11:06   ` Martin Wilck
2019-12-10 19:39   ` Himanshu Madhani
2019-12-11 19:03   ` Roman Bolshakov
2019-12-09 18:02 ` [PATCH 2/4] qla2xxx: Simplify the code for aborting SCSI commands Bart Van Assche
2019-12-10 11:47   ` Martin Wilck
2019-12-10 17:57     ` Bart Van Assche
2019-12-10 20:29       ` Martin Wilck
2019-12-10 20:45         ` Bart Van Assche
2019-12-11 10:26           ` Martin Wilck
2019-12-12 16:01     ` Roman Bolshakov
2019-12-10 19:41   ` Himanshu Madhani
2019-12-12 16:04   ` Roman Bolshakov
2019-12-09 18:02 ` [PATCH 3/4] qla2xxx: Fix point-to-point mode for qla25xx and older Bart Van Assche
2019-12-10 10:52   ` Martin Wilck
2019-12-10 18:00     ` Bart Van Assche
2019-12-10 19:44       ` Martin Wilck
2019-12-12 20:07     ` Roman Bolshakov
2019-12-14  1:04       ` Roman Bolshakov
2019-12-15  0:09         ` Bart Van Assche
2019-12-16 11:45           ` Roman Bolshakov
2019-12-16 20:08             ` Roman Bolshakov
2019-12-16  8:37         ` Daniel Wagner
2019-12-10 19:40   ` Himanshu Madhani
2019-12-09 18:02 ` [PATCH 4/4] qla2xxx: Micro-optimize qla2x00_configure_local_loop() Bart Van Assche
2019-12-10 10:46   ` Martin Wilck
2019-12-14 12:30     ` Roman Bolshakov
2019-12-15  0:12     ` Bart Van Assche
2019-12-14 12:33   ` Roman Bolshakov
2019-12-15  0:14     ` Bart Van Assche
2020-02-06 13:23 ` [PATCH 0/4] qla2xxx patches for kernel v5.6 Martin Wilck
2020-02-06 20:49   ` Bart Van Assche
2020-02-06 20:52     ` Martin Wilck
2020-02-06 21:01       ` Bart Van Assche
2020-02-07  9:09         ` Martin Wilck
2020-02-08  3:17           ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.