All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/11] qla2xxx driver bug fixes
@ 2021-03-23  4:42 Nilesh Javali
  2021-03-23  4:42 ` [PATCH 01/11] qla2xxx: Fix IOPS drop seen in some adapters Nilesh Javali
                   ` (11 more replies)
  0 siblings, 12 replies; 29+ messages in thread
From: Nilesh Javali @ 2021-03-23  4:42 UTC (permalink / raw)
  To: martin.petersen; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

Martin,

Please apply the qla2xxx driver bug fixes to the scsi tree at your
earliest convenience.

Thanks,
Nilesh

Arun Easi (3):
  qla2xxx: Fix IOPS drop seen in some adapters
  qla2xxx: Add H:C:T info in the log message for fc ports
  qla2xxx: Fix crash in qla2xxx_mqueuecommand

Nilesh Javali (1):
  qla2xxx: Update version to 10.02.00.106-k

Quinn Tran (7):
  qla2xxx: fix stuck session
  qla2xxx: consolidate zio threshold setting for both fcp & nvme
  qla2xxx: Fix use after free in bsg
  qla2xxx: fix RISC RESET completion polling
  qla2xxx: fix crash in PCIe error handling
  qla2xxx: fix mailbox recovery during PCIe error
  qla2xxx: include AER debug mask to default

 drivers/scsi/qla2xxx/qla_bsg.c     |   3 +-
 drivers/scsi/qla2xxx/qla_dbg.c     |  32 +++++
 drivers/scsi/qla2xxx/qla_dbg.h     |   2 +-
 drivers/scsi/qla2xxx/qla_def.h     |  12 +-
 drivers/scsi/qla2xxx/qla_gbl.h     |   3 +
 drivers/scsi/qla2xxx/qla_init.c    | 115 ++++++++++++----
 drivers/scsi/qla2xxx/qla_inline.h  |  29 ++++
 drivers/scsi/qla2xxx/qla_iocb.c    |  79 +++++++++--
 drivers/scsi/qla2xxx/qla_isr.c     |   9 +-
 drivers/scsi/qla2xxx/qla_mbx.c     |  37 +++--
 drivers/scsi/qla2xxx/qla_nvme.c    |  10 +-
 drivers/scsi/qla2xxx/qla_os.c      | 212 ++++++++++++++++-------------
 drivers/scsi/qla2xxx/qla_target.c  |   2 +-
 drivers/scsi/qla2xxx/qla_version.h |   4 +-
 14 files changed, 395 insertions(+), 154 deletions(-)


base-commit: f749d8b7a9896bc6e5ffe104cc64345037e0b152
-- 
2.19.0.rc0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 01/11] qla2xxx: Fix IOPS drop seen in some adapters
  2021-03-23  4:42 [PATCH 00/11] qla2xxx driver bug fixes Nilesh Javali
@ 2021-03-23  4:42 ` Nilesh Javali
  2021-03-24 15:46   ` Himanshu Madhani
  2021-03-23  4:42 ` [PATCH 02/11] qla2xxx: Add H:C:T info in the log message for fc ports Nilesh Javali
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Nilesh Javali @ 2021-03-23  4:42 UTC (permalink / raw)
  To: martin.petersen; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

From: Arun Easi <aeasi@marvell.com>

Removing the response queue processing in the send path is showing IOPS
drop. Add back the process_response_queue() call in the send path.

Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
---
 drivers/scsi/qla2xxx/qla_iocb.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
index 8b41cbaf8535..f26a7a14fce9 100644
--- a/drivers/scsi/qla2xxx/qla_iocb.c
+++ b/drivers/scsi/qla2xxx/qla_iocb.c
@@ -1600,12 +1600,14 @@ qla24xx_start_scsi(srb_t *sp)
 	uint16_t	req_cnt;
 	uint16_t	tot_dsds;
 	struct req_que *req = NULL;
+	struct rsp_que *rsp;
 	struct scsi_cmnd *cmd = GET_CMD_SP(sp);
 	struct scsi_qla_host *vha = sp->vha;
 	struct qla_hw_data *ha = vha->hw;
 
 	/* Setup device pointers. */
 	req = vha->req;
+	rsp = req->rsp;
 
 	/* So we know we haven't pci_map'ed anything yet */
 	tot_dsds = 0;
@@ -1707,6 +1709,11 @@ qla24xx_start_scsi(srb_t *sp)
 	/* Set chip new ring index. */
 	wrt_reg_dword(req->req_q_in, req->ring_index);
 
+	/* Manage unprocessed RIO/ZIO commands in response queue. */
+	if (vha->flags.process_response_queue &&
+	    rsp->ring_ptr->signature != RESPONSE_PROCESSED)
+		qla24xx_process_response_queue(vha, rsp);
+
 	spin_unlock_irqrestore(&ha->hardware_lock, flags);
 	return QLA_SUCCESS;
 
@@ -1897,6 +1904,11 @@ qla24xx_dif_start_scsi(srb_t *sp)
 	/* Set chip new ring index. */
 	wrt_reg_dword(req->req_q_in, req->ring_index);
 
+	/* Manage unprocessed RIO/ZIO commands in response queue. */
+	if (vha->flags.process_response_queue &&
+	    rsp->ring_ptr->signature != RESPONSE_PROCESSED)
+		qla24xx_process_response_queue(vha, rsp);
+
 	spin_unlock_irqrestore(&ha->hardware_lock, flags);
 
 	return QLA_SUCCESS;
@@ -1931,6 +1943,7 @@ qla2xxx_start_scsi_mq(srb_t *sp)
 	uint16_t	req_cnt;
 	uint16_t	tot_dsds;
 	struct req_que *req = NULL;
+	struct rsp_que *rsp;
 	struct scsi_cmnd *cmd = GET_CMD_SP(sp);
 	struct scsi_qla_host *vha = sp->fcport->vha;
 	struct qla_hw_data *ha = vha->hw;
@@ -1941,6 +1954,7 @@ qla2xxx_start_scsi_mq(srb_t *sp)
 
 	/* Setup qpair pointers */
 	req = qpair->req;
+	rsp = qpair->rsp;
 
 	/* So we know we haven't pci_map'ed anything yet */
 	tot_dsds = 0;
@@ -2041,6 +2055,11 @@ qla2xxx_start_scsi_mq(srb_t *sp)
 	/* Set chip new ring index. */
 	wrt_reg_dword(req->req_q_in, req->ring_index);
 
+	/* Manage unprocessed RIO/ZIO commands in response queue. */
+	if (vha->flags.process_response_queue &&
+	    rsp->ring_ptr->signature != RESPONSE_PROCESSED)
+		qla24xx_process_response_queue(vha, rsp);
+
 	spin_unlock_irqrestore(&qpair->qp_lock, flags);
 	return QLA_SUCCESS;
 
-- 
2.19.0.rc0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 02/11] qla2xxx: Add H:C:T info in the log message for fc ports
  2021-03-23  4:42 [PATCH 00/11] qla2xxx driver bug fixes Nilesh Javali
  2021-03-23  4:42 ` [PATCH 01/11] qla2xxx: Fix IOPS drop seen in some adapters Nilesh Javali
@ 2021-03-23  4:42 ` Nilesh Javali
  2021-03-24 15:48   ` Himanshu Madhani
  2021-03-23  4:42 ` [PATCH 03/11] qla2xxx: fix stuck session Nilesh Javali
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Nilesh Javali @ 2021-03-23  4:42 UTC (permalink / raw)
  To: martin.petersen; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

From: Arun Easi <aeasi@marvell.com>

The host:channel:scsi_target_id information is helpful in matching
an fc port with a scsi device, so add it. For initiator fc ports,
a -1 would be displayed for "target" part.

Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
---
 drivers/scsi/qla2xxx/qla_init.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index f01f07116bd3..af237c485389 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -5512,13 +5512,14 @@ qla2x00_reg_remote_port(scsi_qla_host_t *vha, fc_port_t *fcport)
 	if (fcport->port_type & FCT_NVME_DISCOVERY)
 		rport_ids.roles |= FC_PORT_ROLE_NVME_DISCOVERY;
 
+	fc_remote_port_rolechg(rport, rport_ids.roles);
+
 	ql_dbg(ql_dbg_disc, vha, 0x20ee,
-	    "%s %8phN. rport %p is %s mode\n",
-	    __func__, fcport->port_name, rport,
+	    "%s: %8phN. rport %ld:0:%d (%p) is %s mode\n",
+	    __func__, fcport->port_name, vha->host_no,
+	    rport->scsi_target_id, rport,
 	    (fcport->port_type == FCT_TARGET) ? "tgt" :
 	    ((fcport->port_type & FCT_NVME) ? "nvme" : "ini"));
-
-	fc_remote_port_rolechg(rport, rport_ids.roles);
 }
 
 /*
-- 
2.19.0.rc0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 03/11] qla2xxx: fix stuck session
  2021-03-23  4:42 [PATCH 00/11] qla2xxx driver bug fixes Nilesh Javali
  2021-03-23  4:42 ` [PATCH 01/11] qla2xxx: Fix IOPS drop seen in some adapters Nilesh Javali
  2021-03-23  4:42 ` [PATCH 02/11] qla2xxx: Add H:C:T info in the log message for fc ports Nilesh Javali
@ 2021-03-23  4:42 ` Nilesh Javali
  2021-03-23 16:31   ` Bart Van Assche
  2021-03-23  4:42 ` [PATCH 04/11] qla2xxx: consolidate zio threshold setting for both fcp & nvme Nilesh Javali
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Nilesh Javali @ 2021-03-23  4:42 UTC (permalink / raw)
  To: martin.petersen; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

From: Quinn Tran <qutran@marvell.com>

Session was stuck due to explicit logout to target was timed out.
The target was in an unresponsive state. This timeout induced an error
to the GNL command from moving forward.

Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
---
 drivers/scsi/qla2xxx/qla_init.c   | 1 +
 drivers/scsi/qla2xxx/qla_target.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index af237c485389..f6dc8166e7ba 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -718,6 +718,7 @@ static void qla24xx_handle_gnl_done_event(scsi_qla_host_t *vha,
 		ql_dbg(ql_dbg_disc, vha, 0x20e0,
 		    "%s %8phC login gen changed\n",
 		    __func__, fcport->port_name);
+		set_bit(RELOGIN_NEEDED, &vha->dpc_flags);
 		return;
 	}
 
diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index c48daf52725d..fa8c4dae8dce 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -1029,7 +1029,7 @@ void qlt_free_session_done(struct work_struct *work)
 			}
 			msleep(100);
 			cnt++;
-			if (cnt > 200)
+			if (cnt > 230)
 				break;
 		}
 
-- 
2.19.0.rc0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 04/11] qla2xxx: consolidate zio threshold setting for both fcp & nvme
  2021-03-23  4:42 [PATCH 00/11] qla2xxx driver bug fixes Nilesh Javali
                   ` (2 preceding siblings ...)
  2021-03-23  4:42 ` [PATCH 03/11] qla2xxx: fix stuck session Nilesh Javali
@ 2021-03-23  4:42 ` Nilesh Javali
  2021-03-24 15:55   ` Himanshu Madhani
  2021-03-23  4:42 ` [PATCH 05/11] qla2xxx: Fix use after free in bsg Nilesh Javali
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Nilesh Javali @ 2021-03-23  4:42 UTC (permalink / raw)
  To: martin.petersen; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

From: Quinn Tran <qutran@marvell.com>

consolidate zio threshold setting for both fcp & nvme to prevent
one protocol from clobbering the setting of the other protocol.

Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
---
 drivers/scsi/qla2xxx/qla_def.h |  1 -
 drivers/scsi/qla2xxx/qla_os.c  | 34 ++++++++++++++--------------------
 2 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
index 49b42b430df4..3d09f31895e7 100644
--- a/drivers/scsi/qla2xxx/qla_def.h
+++ b/drivers/scsi/qla2xxx/qla_def.h
@@ -4727,7 +4727,6 @@ typedef struct scsi_qla_host {
 #define FX00_CRITEMP_RECOVERY	25
 #define FX00_HOST_INFO_RESEND	26
 #define QPAIR_ONLINE_CHECK_NEEDED	27
-#define SET_NVME_ZIO_THRESHOLD_NEEDED	28
 #define DETECT_SFP_CHANGE	29
 #define N2N_LOGIN_NEEDED	30
 #define IOCB_WORK_ACTIVE	31
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 074392560f3d..6563d69706ba 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -6969,28 +6969,23 @@ qla2x00_do_dpc(void *data)
 			mutex_unlock(&ha->mq_lock);
 		}
 
-		if (test_and_clear_bit(SET_NVME_ZIO_THRESHOLD_NEEDED,
-		    &base_vha->dpc_flags)) {
+		if (test_and_clear_bit(SET_ZIO_THRESHOLD_NEEDED,
+				       &base_vha->dpc_flags)) {
+			u16 threshold = ha->nvme_last_rptd_aen + ha->last_zio_threshold;
+
+			if (threshold > ha->orig_fw_xcb_count)
+				threshold = ha->orig_fw_xcb_count;
+
 			ql_log(ql_log_info, base_vha, 0xffffff,
-				"nvme: SET ZIO Activity exchange threshold to %d.\n",
-						ha->nvme_last_rptd_aen);
-			if (qla27xx_set_zio_threshold(base_vha,
-			    ha->nvme_last_rptd_aen)) {
+			       "SET ZIO Activity exchange threshold to %d.\n",
+			       threshold);
+			if (qla27xx_set_zio_threshold(base_vha, threshold)) {
 				ql_log(ql_log_info, base_vha, 0xffffff,
-				    "nvme: Unable to SET ZIO Activity exchange threshold to %d.\n",
-				    ha->nvme_last_rptd_aen);
+				       "Unable to SET ZIO Activity exchange threshold to %d.\n",
+				       threshold);
 			}
 		}
 
-		if (test_and_clear_bit(SET_ZIO_THRESHOLD_NEEDED,
-		    &base_vha->dpc_flags)) {
-			ql_log(ql_log_info, base_vha, 0xffffff,
-			    "SET ZIO Activity exchange threshold to %d.\n",
-			    ha->last_zio_threshold);
-			qla27xx_set_zio_threshold(base_vha,
-			    ha->last_zio_threshold);
-		}
-
 		if (!IS_QLAFX00(ha))
 			qla2x00_do_dpc_all_vps(base_vha);
 
@@ -7218,14 +7213,13 @@ qla2x00_timer(struct timer_list *t)
 	index = atomic_read(&ha->nvme_active_aen_cnt);
 	if (!vha->vp_idx &&
 	    (index != ha->nvme_last_rptd_aen) &&
-	    (index >= DEFAULT_ZIO_THRESHOLD) &&
 	    ha->zio_mode == QLA_ZIO_MODE_6 &&
 	    !ha->flags.host_shutting_down) {
+		ha->nvme_last_rptd_aen = atomic_read(&ha->nvme_active_aen_cnt);
 		ql_log(ql_log_info, vha, 0x3002,
 		    "nvme: Sched: Set ZIO exchange threshold to %d.\n",
 		    ha->nvme_last_rptd_aen);
-		ha->nvme_last_rptd_aen = atomic_read(&ha->nvme_active_aen_cnt);
-		set_bit(SET_NVME_ZIO_THRESHOLD_NEEDED, &vha->dpc_flags);
+		set_bit(SET_ZIO_THRESHOLD_NEEDED, &vha->dpc_flags);
 		start_dpc++;
 	}
 
-- 
2.19.0.rc0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 05/11] qla2xxx: Fix use after free in bsg
  2021-03-23  4:42 [PATCH 00/11] qla2xxx driver bug fixes Nilesh Javali
                   ` (3 preceding siblings ...)
  2021-03-23  4:42 ` [PATCH 04/11] qla2xxx: consolidate zio threshold setting for both fcp & nvme Nilesh Javali
@ 2021-03-23  4:42 ` Nilesh Javali
  2021-03-24 15:57   ` Himanshu Madhani
  2021-03-23  4:42 ` [PATCH 06/11] qla2xxx: Fix crash in qla2xxx_mqueuecommand Nilesh Javali
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Nilesh Javali @ 2021-03-23  4:42 UTC (permalink / raw)
  To: martin.petersen; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

From: Quinn Tran <qutran@marvell.com>

On bsg command completion, bsg_job_done was called while
qla driver continue to access the bsg_job buffer. The bsg_job_done
can free up resources and reuse by other task, qla continue access
of the same resource can read garbage data.

localhost kernel: BUG: KASAN: use-after-free in sg_next+0x64/0x80
localhost kernel: Read of size 8 at addr ffff8883228a3330 by task swapper/26/0
localhost kernel:
localhost kernel: CPU: 26 PID: 0 Comm: swapper/26 Kdump:
loaded Tainted: G          OE    --------- -  - 4.18.0-193.el8.x86_64+debug #1
localhost kernel: Hardware name: HP ProLiant DL360
Gen9/ProLiant DL360 Gen9, BIOS P89 08/12/2016
localhost kernel: Call Trace:
localhost kernel: <IRQ>
localhost kernel: dump_stack+0x9a/0xf0
localhost kernel: print_address_description.cold.3+0x9/0x23b
localhost kernel: kasan_report.cold.4+0x65/0x95
localhost kernel: debug_dma_unmap_sg.part.12+0x10d/0x2d0
localhost kernel: qla2x00_bsg_sp_free+0xaf6/0x1010 [qla2xxx]

Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
---
 drivers/scsi/qla2xxx/qla_bsg.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_bsg.c b/drivers/scsi/qla2xxx/qla_bsg.c
index bee8cf9f8123..d021e51344f5 100644
--- a/drivers/scsi/qla2xxx/qla_bsg.c
+++ b/drivers/scsi/qla2xxx/qla_bsg.c
@@ -25,10 +25,11 @@ void qla2x00_bsg_job_done(srb_t *sp, int res)
 	struct bsg_job *bsg_job = sp->u.bsg_job;
 	struct fc_bsg_reply *bsg_reply = bsg_job->reply;
 
+	sp->free(sp);
+
 	bsg_reply->result = res;
 	bsg_job_done(bsg_job, bsg_reply->result,
 		       bsg_reply->reply_payload_rcv_len);
-	sp->free(sp);
 }
 
 void qla2x00_bsg_sp_free(srb_t *sp)
-- 
2.19.0.rc0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 06/11] qla2xxx: Fix crash in qla2xxx_mqueuecommand
  2021-03-23  4:42 [PATCH 00/11] qla2xxx driver bug fixes Nilesh Javali
                   ` (4 preceding siblings ...)
  2021-03-23  4:42 ` [PATCH 05/11] qla2xxx: Fix use after free in bsg Nilesh Javali
@ 2021-03-23  4:42 ` Nilesh Javali
  2021-03-24 15:57   ` Himanshu Madhani
  2021-03-23  4:42 ` [PATCH 07/11] qla2xxx: fix RISC RESET completion polling Nilesh Javali
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Nilesh Javali @ 2021-03-23  4:42 UTC (permalink / raw)
  To: martin.petersen; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

From: Arun Easi <aeasi@marvell.com>

    RIP: 0010:kmem_cache_free+0xfa/0x1b0
    Call Trace:
       qla2xxx_mqueuecommand+0x2b5/0x2c0 [qla2xxx]
       scsi_queue_rq+0x5e2/0xa40
       __blk_mq_try_issue_directly+0x128/0x1d0
       blk_mq_request_issue_directly+0x4e/0xb0

Fix incorrect call to free srb in qla2xxx_mqueuecommand, as
srb is now allocated by upper layers. This fixes smatch warning of
srb unintended free.

Fixes: af2a0c51b120 ("scsi: qla2xxx: Fix SRB leak on switch command timeout")
Cc: stable@vger.kernel.org # 5.5
Reported-by: Laurence Oberman <loberman@redhat.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
---
 drivers/scsi/qla2xxx/qla_os.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 6563d69706ba..6a57399b515f 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -1013,8 +1013,6 @@ qla2xxx_mqueuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd,
 	if (rval != QLA_SUCCESS) {
 		ql_dbg(ql_dbg_io + ql_dbg_verbose, vha, 0x3078,
 		    "Start scsi failed rval=%d for cmd=%p.\n", rval, cmd);
-		if (rval == QLA_INTERFACE_ERROR)
-			goto qc24_free_sp_fail_command;
 		goto qc24_host_busy_free_sp;
 	}
 
@@ -1026,11 +1024,6 @@ qla2xxx_mqueuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd,
 qc24_target_busy:
 	return SCSI_MLQUEUE_TARGET_BUSY;
 
-qc24_free_sp_fail_command:
-	sp->free(sp);
-	CMD_SP(cmd) = NULL;
-	qla2xxx_rel_qpair_sp(sp->qpair, sp);
-
 qc24_fail_command:
 	cmd->scsi_done(cmd);
 
-- 
2.19.0.rc0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 07/11] qla2xxx: fix RISC RESET completion polling
  2021-03-23  4:42 [PATCH 00/11] qla2xxx driver bug fixes Nilesh Javali
                   ` (5 preceding siblings ...)
  2021-03-23  4:42 ` [PATCH 06/11] qla2xxx: Fix crash in qla2xxx_mqueuecommand Nilesh Javali
@ 2021-03-23  4:42 ` Nilesh Javali
  2021-03-24 16:03   ` Himanshu Madhani
  2021-03-23  4:42 ` [PATCH 08/11] qla2xxx: fix crash in PCIe error handling Nilesh Javali
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Nilesh Javali @ 2021-03-23  4:42 UTC (permalink / raw)
  To: martin.petersen; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

From: Quinn Tran <qutran@marvell.com>

After risc reset, the poll time for risc reset completion is
too short. Fix the completion polling time.

Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
---
 drivers/scsi/qla2xxx/qla_init.c | 65 ++++++++++++++++++++++++++++++---
 1 file changed, 59 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index f6dc8166e7ba..19681d3c5b7a 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -2767,6 +2767,49 @@ qla81xx_reset_mpi(scsi_qla_host_t *vha)
 	return qla81xx_write_mpi_register(vha, mb);
 }
 
+static int
+qla_chk_risc_recovery(scsi_qla_host_t *vha)
+{
+	struct qla_hw_data *ha = vha->hw;
+	struct device_reg_24xx __iomem *reg = &ha->iobase->isp24;
+	__le16 __iomem *mbptr = &reg->mailbox0;
+	int i;
+	u16 mb[32];
+	int rc = QLA_SUCCESS;
+
+	if (!IS_QLA27XX(ha) && !IS_QLA28XX(ha))
+		return rc;
+
+	/* this check is only valid after RISC reset */
+	mb[0] = rd_reg_word(mbptr);
+	mbptr++;
+	if (mb[0] == 0xf) {
+		rc = QLA_FUNCTION_FAILED;
+
+		for (i = 1; i < 32; i++) {
+			mb[i] = rd_reg_word(mbptr);
+			mbptr++;
+		}
+
+		ql_log(ql_log_warn, vha, 0x1015,
+		       "RISC reset failed. mb[0-7] %04xh %04xh %04xh %04xh %04xh %04xh %04xh %04xh\n",
+		       mb[0], mb[1], mb[2], mb[3], mb[4], mb[5], mb[6], mb[7]);
+		ql_log(ql_log_warn, vha, 0x1015,
+		       "RISC reset failed. mb[8-15] %04xh %04xh %04xh %04xh %04xh %04xh %04xh %04xh\n",
+		       mb[8], mb[9], mb[10], mb[11], mb[12], mb[13], mb[14],
+		       mb[15]);
+		ql_log(ql_log_warn, vha, 0x1015,
+		       "RISC reset failed. mb[16-23] %04xh %04xh %04xh %04xh %04xh %04xh %04xh %04xh\n",
+		       mb[16], mb[17], mb[18], mb[19], mb[20], mb[21], mb[22],
+		       mb[23]);
+		ql_log(ql_log_warn, vha, 0x1015,
+		       "RISC reset failed. mb[24-31] %04xh %04xh %04xh %04xh %04xh %04xh %04xh %04xh\n",
+		       mb[24], mb[25], mb[26], mb[27], mb[28], mb[29], mb[30],
+		       mb[31]);
+	}
+	return rc;
+}
+
 /**
  * qla24xx_reset_risc() - Perform full reset of ISP24xx RISC.
  * @vha: HA context
@@ -2783,6 +2826,7 @@ qla24xx_reset_risc(scsi_qla_host_t *vha)
 	uint16_t wd;
 	static int abts_cnt; /* ISP abort retry counts */
 	int rval = QLA_SUCCESS;
+	int print = 1;
 
 	spin_lock_irqsave(&ha->hardware_lock, flags);
 
@@ -2871,17 +2915,26 @@ qla24xx_reset_risc(scsi_qla_host_t *vha)
 	rd_reg_dword(&reg->hccr);
 
 	wrt_reg_dword(&reg->hccr, HCCRX_CLR_RISC_RESET);
+	mdelay(10);
 	rd_reg_dword(&reg->hccr);
 
-	rd_reg_word(&reg->mailbox0);
-	for (cnt = 60; rd_reg_word(&reg->mailbox0) != 0 &&
-	    rval == QLA_SUCCESS; cnt--) {
+	wd = rd_reg_word(&reg->mailbox0);
+	for (cnt = 300; wd != 0 && rval == QLA_SUCCESS; cnt--) {
 		barrier();
-		if (cnt)
-			udelay(5);
-		else
+		if (cnt) {
+			mdelay(1);
+			if (print && qla_chk_risc_recovery(vha))
+				print = 0;
+
+			wd = rd_reg_word(&reg->mailbox0);
+		} else {
 			rval = QLA_FUNCTION_TIMEOUT;
+
+			ql_log(ql_log_warn, vha, 0x015e,
+			       "RISC reset timeout\n");
+		}
 	}
+
 	if (rval == QLA_SUCCESS)
 		set_bit(RISC_RDY_AFT_RESET, &ha->fw_dump_cap_flags);
 
-- 
2.19.0.rc0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 08/11] qla2xxx: fix crash in PCIe error handling
  2021-03-23  4:42 [PATCH 00/11] qla2xxx driver bug fixes Nilesh Javali
                   ` (6 preceding siblings ...)
  2021-03-23  4:42 ` [PATCH 07/11] qla2xxx: fix RISC RESET completion polling Nilesh Javali
@ 2021-03-23  4:42 ` Nilesh Javali
  2021-03-24 16:21   ` Himanshu Madhani
  2021-03-23  4:42 ` [PATCH 09/11] qla2xxx: fix mailbox recovery during PCIe error Nilesh Javali
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Nilesh Javali @ 2021-03-23  4:42 UTC (permalink / raw)
  To: martin.petersen; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

From: Quinn Tran <qutran@marvell.com>

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: qla2x00_abort_isp+0x21/0x6b0 [qla2xxx] PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 1715 Comm: kworker/0:2
Tainted: GOE 4.12.14-122.37-default #1 SLE12-SP5
Hardware name: HPE Superdome Flex/Superdome Flex, BIOS
Bundle:3.30.100 SFW:IP147.007.004.017.000.2009211957 09/21/2020
Workqueue: events aer_recover_work_func
task: ffff9e399c14ca80 task.stack: ffffc1c58e4ac000
RIP: 0010:qla2x00_abort_isp+0x21/0x6b0 [qla2xxx]
RSP: 0018:ffffc1c58e4afd50 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff9e419cdef480 RCX: 0000000000000000
RDX: ffff9e399c14ca80 RSI: 0000000000000246 RDI: ffff9e419bbc27b8
RBP: ffff9e419bbc27b8 R08: 0000000000000004 R09: 00000000a0440000
R10: 0000000000000000 R11: ffff9e399416d1a0 R12: ffff9e419cdef000
R13: ffff9e3a7cfae800 R14: ffff9e3a7cfae800 R15: 00000000000000c0
FS:  0000000000000000(0000) GS:ffff9e39a0000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000006cd00a005 CR4: 00000000007606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
  qla2xxx_pci_slot_reset+0x141/0x160 [qla2xxx]
  report_slot_reset+0x41/0x80
  ? merge_result.part.4+0x30/0x30
  pci_walk_bus+0x70/0x90
  pcie_do_recovery+0x1db/0x2e0
  aer_recover_work_func+0xc2/0xf0
  process_one_work+0x14c/0x390

Disable board_disable logic where driver resources are freed
while OS is in the process of recovering the adapter.

Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
---
 drivers/scsi/qla2xxx/qla_dbg.c    |  32 ++++++
 drivers/scsi/qla2xxx/qla_def.h    |  11 ++
 drivers/scsi/qla2xxx/qla_gbl.h    |   3 +
 drivers/scsi/qla2xxx/qla_init.c   |  40 ++++---
 drivers/scsi/qla2xxx/qla_inline.h |  29 +++++
 drivers/scsi/qla2xxx/qla_iocb.c   |  60 +++++++++--
 drivers/scsi/qla2xxx/qla_isr.c    |   9 +-
 drivers/scsi/qla2xxx/qla_mbx.c    |   3 +-
 drivers/scsi/qla2xxx/qla_nvme.c   |  10 +-
 drivers/scsi/qla2xxx/qla_os.c     | 171 ++++++++++++++++++------------
 10 files changed, 264 insertions(+), 104 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_dbg.c b/drivers/scsi/qla2xxx/qla_dbg.c
index 144a893e7335..059f6f9b8698 100644
--- a/drivers/scsi/qla2xxx/qla_dbg.c
+++ b/drivers/scsi/qla2xxx/qla_dbg.c
@@ -113,8 +113,18 @@ qla27xx_dump_mpi_ram(struct qla_hw_data *ha, uint32_t addr, uint32_t *ram,
 	uint32_t stat;
 	ulong i, j, timer = 6000000;
 	int rval = QLA_FUNCTION_FAILED;
+	scsi_qla_host_t *vha = pci_get_drvdata(ha->pdev);
 
 	clear_bit(MBX_INTERRUPT, &ha->mbx_cmd_flags);
+
+	stat = rd_reg_dword(&reg->host_status);
+	if (stat == 0xffffffff) {
+		ql_log(ql_log_info, vha, 0x8041,
+		       "dump mpi ram detect PCI disconnect, exiting.\n");
+		qla_schedule_eeh_work(vha);
+		return rval;
+	}
+
 	for (i = 0; i < ram_dwords; i += dwords, addr += dwords) {
 		if (i + dwords > ram_dwords)
 			dwords = ram_dwords - i;
@@ -139,6 +149,13 @@ qla27xx_dump_mpi_ram(struct qla_hw_data *ha, uint32_t addr, uint32_t *ram,
 			udelay(5);
 
 			stat = rd_reg_dword(&reg->host_status);
+			if (stat == 0xffffffff) {
+				ql_log(ql_log_info, vha, 0x8041,
+				       "dump mpi ram detect PCI disconnect, exiting.\n");
+				qla_schedule_eeh_work(vha);
+				return rval;
+			}
+
 			/* Check for pending interrupts. */
 			if (!(stat & HSRX_RISC_INT))
 				continue;
@@ -192,9 +209,18 @@ qla24xx_dump_ram(struct qla_hw_data *ha, uint32_t addr, __be32 *ram,
 	uint32_t dwords = qla2x00_gid_list_size(ha) / 4;
 	uint32_t stat;
 	ulong i, j, timer = 6000000;
+	scsi_qla_host_t *vha = pci_get_drvdata(ha->pdev);
 
 	clear_bit(MBX_INTERRUPT, &ha->mbx_cmd_flags);
 
+	stat = rd_reg_dword(&reg->host_status);
+	if (stat == 0xffffffff) {
+		ql_log(ql_log_info, vha, 0x8041,
+		       "dump ram detect PCI disconnect, exiting.\n");
+		qla_schedule_eeh_work(vha);
+		return rval;
+	}
+
 	for (i = 0; i < ram_dwords; i += dwords, addr += dwords) {
 		if (i + dwords > ram_dwords)
 			dwords = ram_dwords - i;
@@ -217,6 +243,12 @@ qla24xx_dump_ram(struct qla_hw_data *ha, uint32_t addr, __be32 *ram,
 		while (timer--) {
 			udelay(5);
 			stat = rd_reg_dword(&reg->host_status);
+			if (stat == 0xffffffff) {
+				ql_log(ql_log_info, vha, 0x8041,
+				       "dump ram detect PCI disconnect, exiting.\n");
+				qla_schedule_eeh_work(vha);
+				return rval;
+			}
 
 			/* Check for pending interrupts. */
 			if (!(stat & HSRX_RISC_INT))
diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
index 3d09f31895e7..b6b4d4a5b2e8 100644
--- a/drivers/scsi/qla2xxx/qla_def.h
+++ b/drivers/scsi/qla2xxx/qla_def.h
@@ -396,6 +396,7 @@ typedef union {
 	} b;
 } port_id_t;
 #define INVALID_PORT_ID	0xFFFFFF
+#define ISP_REG16_DISCONNECT 0xFFFF
 
 static inline le_id_t be_id_to_le(be_id_t id)
 {
@@ -3857,6 +3858,13 @@ struct qla_hw_data_stat {
 	u32 num_mpi_reset;
 };
 
+/* refer to pcie_do_recovery reference */
+typedef enum {
+	QLA_PCI_RESUME,
+	QLA_PCI_ERR_DETECTED,
+	QLA_PCI_MMIO_ENABLED,
+	QLA_PCI_SLOT_RESET,
+} pci_error_state_t;
 /*
  * Qlogic host adapter specific data structure.
 */
@@ -3928,6 +3936,7 @@ struct qla_hw_data {
 		uint32_t	max_req_queue_warned:1;
 		uint32_t	plogi_template_valid:1;
 		uint32_t	port_isolated:1;
+		uint32_t	eeh_work_pending:1;
 	} flags;
 
 	uint16_t max_exchg;
@@ -4607,6 +4616,7 @@ struct qla_hw_data {
 #define DEFAULT_ZIO_THRESHOLD 5
 
 	struct qla_hw_data_stat stat;
+	pci_error_state_t pci_error_state;
 };
 
 struct active_regions {
@@ -4727,6 +4737,7 @@ typedef struct scsi_qla_host {
 #define FX00_CRITEMP_RECOVERY	25
 #define FX00_HOST_INFO_RESEND	26
 #define QPAIR_ONLINE_CHECK_NEEDED	27
+#define DO_EEH_RECOVERY		28
 #define DETECT_SFP_CHANGE	29
 #define N2N_LOGIN_NEEDED	30
 #define IOCB_WORK_ACTIVE	31
diff --git a/drivers/scsi/qla2xxx/qla_gbl.h b/drivers/scsi/qla2xxx/qla_gbl.h
index 6486f97d649e..fae5cae6f0a8 100644
--- a/drivers/scsi/qla2xxx/qla_gbl.h
+++ b/drivers/scsi/qla2xxx/qla_gbl.h
@@ -224,6 +224,7 @@ extern int qla2x00_post_uevent_work(struct scsi_qla_host *, u32);
 
 extern int qla2x00_post_uevent_work(struct scsi_qla_host *, u32);
 extern void qla2x00_disable_board_on_pci_error(struct work_struct *);
+extern void qla_eeh_work(struct work_struct *);
 extern void qla2x00_sp_compl(srb_t *sp, int);
 extern void qla2xxx_qpair_sp_free_dma(srb_t *sp);
 extern void qla2xxx_qpair_sp_compl(srb_t *sp, int);
@@ -235,6 +236,8 @@ int qla24xx_post_relogin_work(struct scsi_qla_host *vha);
 void qla2x00_wait_for_sess_deletion(scsi_qla_host_t *);
 void qla24xx_process_purex_rdp(struct scsi_qla_host *vha,
 			       struct purex_item *pkt);
+void qla_pci_set_eeh_busy(struct scsi_qla_host *);
+void qla_schedule_eeh_work(struct scsi_qla_host *);
 
 /*
  * Global Functions in qla_mid.c source file.
diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index 19681d3c5b7a..cd051eee8cd1 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -6932,22 +6932,18 @@ qla2x00_abort_isp_cleanup(scsi_qla_host_t *vha)
 	}
 	spin_unlock_irqrestore(&ha->vport_slock, flags);
 
-	if (!ha->flags.eeh_busy) {
-		/* Make sure for ISP 82XX IO DMA is complete */
-		if (IS_P3P_TYPE(ha)) {
-			qla82xx_chip_reset_cleanup(vha);
-			ql_log(ql_log_info, vha, 0x00b4,
-			    "Done chip reset cleanup.\n");
-
-			/* Done waiting for pending commands.
-			 * Reset the online flag.
-			 */
-			vha->flags.online = 0;
-		}
+	/* Make sure for ISP 82XX IO DMA is complete */
+	if (IS_P3P_TYPE(ha)) {
+		qla82xx_chip_reset_cleanup(vha);
+		ql_log(ql_log_info, vha, 0x00b4,
+		       "Done chip reset cleanup.\n");
 
-		/* Requeue all commands in outstanding command list. */
-		qla2x00_abort_all_cmds(vha, DID_RESET << 16);
+		/* Done waiting for pending commands. Reset online flag */
+		vha->flags.online = 0;
 	}
+
+	/* Requeue all commands in outstanding command list. */
+	qla2x00_abort_all_cmds(vha, DID_RESET << 16);
 	/* memory barrier */
 	wmb();
 }
@@ -6978,6 +6974,12 @@ qla2x00_abort_isp(scsi_qla_host_t *vha)
 		if (vha->hw->flags.port_isolated)
 			return status;
 
+		if (qla2x00_isp_reg_stat(ha)) {
+			ql_log(ql_log_info, vha, 0x803f,
+			       "PCI/Register disconnect 1, exiting.\n");
+			return status;
+		}
+
 		if (test_and_clear_bit(ISP_ABORT_TO_ROM, &vha->dpc_flags)) {
 			ha->flags.chip_reset_done = 1;
 			vha->flags.online = 1;
@@ -7017,8 +7019,18 @@ qla2x00_abort_isp(scsi_qla_host_t *vha)
 
 		ha->isp_ops->get_flash_version(vha, req->ring);
 
+		if (qla2x00_isp_reg_stat(ha)) {
+			ql_log(ql_log_info, vha, 0x803f,
+			       "PCI/Register disconnect 2, exiting.\n");
+			return status;
+		}
 		ha->isp_ops->nvram_config(vha);
 
+		if (qla2x00_isp_reg_stat(ha)) {
+			ql_log(ql_log_info, vha, 0x803f,
+			       "PCI/Register disconnect 3, exiting.\n");
+			return status;
+		}
 		if (!qla2x00_restart_isp(vha)) {
 			clear_bit(RESET_MARKER_NEEDED, &vha->dpc_flags);
 
diff --git a/drivers/scsi/qla2xxx/qla_inline.h b/drivers/scsi/qla2xxx/qla_inline.h
index e80e41b6c9e1..12eb74ae1859 100644
--- a/drivers/scsi/qla2xxx/qla_inline.h
+++ b/drivers/scsi/qla2xxx/qla_inline.h
@@ -432,3 +432,32 @@ qla_put_iocbs(struct qla_qpair *qp, struct iocb_resource *iores)
 	}
 	iores->res_type = RESOURCE_NONE;
 }
+
+#define ISP_REG_DISCONNECT 0xffffffffU
+/**************************************************************************
+ * qla2x00_isp_reg_stat
+ *
+ * Description:
+ *        Read the host status register of ISP before aborting the command.
+ *
+ * Input:
+ *       ha = pointer to host adapter structure.
+ *
+ *
+ * Returns:
+ *       Either true or false.
+ *
+ * Note: Return true if there is register disconnect.
+ **************************************************************************/
+static inline
+uint32_t qla2x00_isp_reg_stat(struct qla_hw_data *ha)
+{
+	struct device_reg_24xx __iomem *reg = &ha->iobase->isp24;
+	struct device_reg_82xx __iomem *reg82 = &ha->iobase->isp82;
+
+	if (IS_P3P_TYPE(ha))
+		return ((rd_reg_dword(&reg82->host_int)) == ISP_REG_DISCONNECT);
+	else
+		return ((rd_reg_dword(&reg->host_status)) ==
+			ISP_REG_DISCONNECT);
+}
diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
index f26a7a14fce9..a86a856215c5 100644
--- a/drivers/scsi/qla2xxx/qla_iocb.c
+++ b/drivers/scsi/qla2xxx/qla_iocb.c
@@ -1645,8 +1645,14 @@ qla24xx_start_scsi(srb_t *sp)
 		goto queuing_error;
 
 	if (req->cnt < (req_cnt + 2)) {
-		cnt = IS_SHADOW_REG_CAPABLE(ha) ? *req->out_ptr :
-		    rd_reg_dword_relaxed(req->req_q_out);
+		if (IS_SHADOW_REG_CAPABLE(ha)) {
+			cnt = *req->out_ptr;
+		} else {
+			cnt = rd_reg_dword_relaxed(req->req_q_out);
+			if (qla2x00_check_reg16_for_disconnect(vha, cnt))
+				goto queuing_error;
+		}
+
 		if (req->ring_index < cnt)
 			req->cnt = cnt - req->ring_index;
 		else
@@ -1842,8 +1848,13 @@ qla24xx_dif_start_scsi(srb_t *sp)
 		goto queuing_error;
 
 	if (req->cnt < (req_cnt + 2)) {
-		cnt = IS_SHADOW_REG_CAPABLE(ha) ? *req->out_ptr :
-		    rd_reg_dword_relaxed(req->req_q_out);
+		if (IS_SHADOW_REG_CAPABLE(ha)) {
+			cnt = *req->out_ptr;
+		} else {
+			cnt = rd_reg_dword_relaxed(req->req_q_out);
+			if (qla2x00_check_reg16_for_disconnect(vha, cnt))
+				goto queuing_error;
+		}
 		if (req->ring_index < cnt)
 			req->cnt = cnt - req->ring_index;
 		else
@@ -1922,6 +1933,7 @@ qla24xx_dif_start_scsi(srb_t *sp)
 
 	qla_put_iocbs(sp->qpair, &sp->iores);
 	spin_unlock_irqrestore(&ha->hardware_lock, flags);
+
 	return QLA_FUNCTION_FAILED;
 }
 
@@ -1991,8 +2003,14 @@ qla2xxx_start_scsi_mq(srb_t *sp)
 		goto queuing_error;
 
 	if (req->cnt < (req_cnt + 2)) {
-		cnt = IS_SHADOW_REG_CAPABLE(ha) ? *req->out_ptr :
-		    rd_reg_dword_relaxed(req->req_q_out);
+		if (IS_SHADOW_REG_CAPABLE(ha)) {
+			cnt = *req->out_ptr;
+		} else {
+			cnt = rd_reg_dword_relaxed(req->req_q_out);
+			if (qla2x00_check_reg16_for_disconnect(vha, cnt))
+				goto queuing_error;
+		}
+
 		if (req->ring_index < cnt)
 			req->cnt = cnt - req->ring_index;
 		else
@@ -2203,8 +2221,14 @@ qla2xxx_dif_start_scsi_mq(srb_t *sp)
 		goto queuing_error;
 
 	if (req->cnt < (req_cnt + 2)) {
-		cnt = IS_SHADOW_REG_CAPABLE(ha) ? *req->out_ptr :
-		    rd_reg_dword_relaxed(req->req_q_out);
+		if (IS_SHADOW_REG_CAPABLE(ha)) {
+			cnt = *req->out_ptr;
+		} else {
+			cnt = rd_reg_dword_relaxed(req->req_q_out);
+			if (qla2x00_check_reg16_for_disconnect(vha, cnt))
+				goto queuing_error;
+		}
+
 		if (req->ring_index < cnt)
 			req->cnt = cnt - req->ring_index;
 		else
@@ -2281,6 +2305,7 @@ qla2xxx_dif_start_scsi_mq(srb_t *sp)
 
 	qla_put_iocbs(sp->qpair, &sp->iores);
 	spin_unlock_irqrestore(&qpair->qp_lock, flags);
+
 	return QLA_FUNCTION_FAILED;
 }
 
@@ -2325,6 +2350,11 @@ __qla2x00_alloc_iocbs(struct qla_qpair *qpair, srb_t *sp)
 			cnt = qla2x00_debounce_register(
 			    ISP_REQ_Q_OUT(ha, &reg->isp));
 
+		if (!qpair->use_shadow_reg && cnt == ISP_REG16_DISCONNECT) {
+			qla_schedule_eeh_work(vha);
+			return NULL;
+		}
+
 		if  (req->ring_index < cnt)
 			req->cnt = cnt - req->ring_index;
 		else
@@ -3739,6 +3769,9 @@ qla2x00_start_sp(srb_t *sp)
 	void *pkt;
 	unsigned long flags;
 
+	if (vha->hw->flags.eeh_busy)
+		return -EIO;
+
 	spin_lock_irqsave(qp->qp_lock_ptr, flags);
 	pkt = __qla2x00_alloc_iocbs(sp->qpair, sp);
 	if (!pkt) {
@@ -3956,8 +3989,14 @@ qla2x00_start_bidir(srb_t *sp, struct scsi_qla_host *vha, uint32_t tot_dsds)
 
 	/* Check for room on request queue. */
 	if (req->cnt < req_cnt + 2) {
-		cnt = IS_SHADOW_REG_CAPABLE(ha) ? *req->out_ptr :
-		    rd_reg_dword_relaxed(req->req_q_out);
+		if (IS_SHADOW_REG_CAPABLE(ha)) {
+			cnt = *req->out_ptr;
+		} else {
+			cnt = rd_reg_dword_relaxed(req->req_q_out);
+			if (qla2x00_check_reg16_for_disconnect(vha, cnt))
+				goto queuing_error;
+		}
+
 		if  (req->ring_index < cnt)
 			req->cnt = cnt - req->ring_index;
 		else
@@ -3996,5 +4035,6 @@ qla2x00_start_bidir(srb_t *sp, struct scsi_qla_host *vha, uint32_t tot_dsds)
 	qla2x00_start_iocbs(vha, req);
 queuing_error:
 	spin_unlock_irqrestore(&ha->hardware_lock, flags);
+
 	return rval;
 }
diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index 5e188375c871..f21007c8ca51 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -270,12 +270,7 @@ qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *vha, uint32_t reg)
 		if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags) &&
 		    !test_bit(PFLG_DRIVER_REMOVING, &vha->pci_flags) &&
 		    !test_bit(PFLG_DRIVER_PROBING, &vha->pci_flags)) {
-			/*
-			 * Schedule this (only once) on the default system
-			 * workqueue so that all the adapter workqueues and the
-			 * DPC thread can be shutdown cleanly.
-			 */
-			schedule_work(&vha->hw->board_disable);
+			qla_schedule_eeh_work(vha);
 		}
 		return true;
 	} else
@@ -1657,8 +1652,6 @@ qla2x00_async_event(scsi_qla_host_t *vha, struct rsp_que *rsp, uint16_t *mb)
 	case MBA_TEMPERATURE_ALERT:
 		ql_dbg(ql_dbg_async, vha, 0x505e,
 		    "TEMPERATURE ALERT: %04x %04x %04x\n", mb[1], mb[2], mb[3]);
-		if (mb[1] == 0x12)
-			schedule_work(&ha->board_disable);
 		break;
 
 	case MBA_TRANS_INSERT:
diff --git a/drivers/scsi/qla2xxx/qla_mbx.c b/drivers/scsi/qla2xxx/qla_mbx.c
index 06c99963b2c9..0149f84cdd8e 100644
--- a/drivers/scsi/qla2xxx/qla_mbx.c
+++ b/drivers/scsi/qla2xxx/qla_mbx.c
@@ -161,7 +161,8 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
 	/* check if ISP abort is active and return cmd with timeout */
 	if ((test_bit(ABORT_ISP_ACTIVE, &base_vha->dpc_flags) ||
 	    test_bit(ISP_ABORT_RETRY, &base_vha->dpc_flags) ||
-	    test_bit(ISP_ABORT_NEEDED, &base_vha->dpc_flags)) &&
+	    test_bit(ISP_ABORT_NEEDED, &base_vha->dpc_flags) ||
+	    ha->flags.eeh_busy) &&
 	    !is_rom_cmd(mcp->mb[0])) {
 		ql_log(ql_log_info, vha, 0x1005,
 		    "Cmd 0x%x aborted with timeout since ISP Abort is pending\n",
diff --git a/drivers/scsi/qla2xxx/qla_nvme.c b/drivers/scsi/qla2xxx/qla_nvme.c
index 0237588f48b0..0cacb667a88b 100644
--- a/drivers/scsi/qla2xxx/qla_nvme.c
+++ b/drivers/scsi/qla2xxx/qla_nvme.c
@@ -398,8 +398,13 @@ static inline int qla2x00_start_nvme_mq(srb_t *sp)
 	}
 	req_cnt = qla24xx_calc_iocbs(vha, tot_dsds);
 	if (req->cnt < (req_cnt + 2)) {
-		cnt = IS_SHADOW_REG_CAPABLE(ha) ? *req->out_ptr :
-		    rd_reg_dword_relaxed(req->req_q_out);
+		if (IS_SHADOW_REG_CAPABLE(ha)) {
+			cnt = *req->out_ptr;
+		} else {
+			cnt = rd_reg_dword_relaxed(req->req_q_out);
+			if (qla2x00_check_reg16_for_disconnect(vha, cnt))
+				goto queuing_error;
+		}
 
 		if (req->ring_index < cnt)
 			req->cnt = cnt - req->ring_index;
@@ -536,6 +541,7 @@ static inline int qla2x00_start_nvme_mq(srb_t *sp)
 
 queuing_error:
 	spin_unlock_irqrestore(&qpair->qp_lock, flags);
+
 	return rval;
 }
 
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 6a57399b515f..135cadecdaa4 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -971,6 +971,13 @@ qla2xxx_mqueuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd,
 		goto qc24_fail_command;
 	}
 
+	if (!qpair->online) {
+		ql_dbg(ql_dbg_io, vha, 0x3077,
+		       "qpair not online. eeh_busy=%d.\n", ha->flags.eeh_busy);
+		cmd->result = DID_NO_CONNECT << 16;
+		goto qc24_fail_command;
+	}
+
 	if (!fcport || fcport->deleted) {
 		cmd->result = DID_IMM_RETRY << 16;
 		goto qc24_fail_command;
@@ -1200,35 +1207,6 @@ qla2x00_wait_for_chip_reset(scsi_qla_host_t *vha)
 	return return_status;
 }
 
-#define ISP_REG_DISCONNECT 0xffffffffU
-/**************************************************************************
-* qla2x00_isp_reg_stat
-*
-* Description:
-*	Read the host status register of ISP before aborting the command.
-*
-* Input:
-*	ha = pointer to host adapter structure.
-*
-*
-* Returns:
-*	Either true or false.
-*
-* Note:	Return true if there is register disconnect.
-**************************************************************************/
-static inline
-uint32_t qla2x00_isp_reg_stat(struct qla_hw_data *ha)
-{
-	struct device_reg_24xx __iomem *reg = &ha->iobase->isp24;
-	struct device_reg_82xx __iomem *reg82 = &ha->iobase->isp82;
-
-	if (IS_P3P_TYPE(ha))
-		return ((rd_reg_dword(&reg82->host_int)) == ISP_REG_DISCONNECT);
-	else
-		return ((rd_reg_dword(&reg->host_status)) ==
-			ISP_REG_DISCONNECT);
-}
-
 /**************************************************************************
 * qla2xxx_eh_abort
 *
@@ -1262,6 +1240,7 @@ qla2xxx_eh_abort(struct scsi_cmnd *cmd)
 	if (qla2x00_isp_reg_stat(ha)) {
 		ql_log(ql_log_info, vha, 0x8042,
 		    "PCI/Register disconnect, exiting.\n");
+		qla_pci_set_eeh_busy(vha);
 		return FAILED;
 	}
 
@@ -1455,6 +1434,7 @@ qla2xxx_eh_device_reset(struct scsi_cmnd *cmd)
 	if (qla2x00_isp_reg_stat(ha)) {
 		ql_log(ql_log_info, vha, 0x803e,
 		    "PCI/Register disconnect, exiting.\n");
+		qla_pci_set_eeh_busy(vha);
 		return FAILED;
 	}
 
@@ -1471,6 +1451,7 @@ qla2xxx_eh_target_reset(struct scsi_cmnd *cmd)
 	if (qla2x00_isp_reg_stat(ha)) {
 		ql_log(ql_log_info, vha, 0x803f,
 		    "PCI/Register disconnect, exiting.\n");
+		qla_pci_set_eeh_busy(vha);
 		return FAILED;
 	}
 
@@ -1506,6 +1487,7 @@ qla2xxx_eh_bus_reset(struct scsi_cmnd *cmd)
 	if (qla2x00_isp_reg_stat(ha)) {
 		ql_log(ql_log_info, vha, 0x8040,
 		    "PCI/Register disconnect, exiting.\n");
+		qla_pci_set_eeh_busy(vha);
 		return FAILED;
 	}
 
@@ -1583,7 +1565,7 @@ qla2xxx_eh_host_reset(struct scsi_cmnd *cmd)
 	if (qla2x00_isp_reg_stat(ha)) {
 		ql_log(ql_log_info, vha, 0x8041,
 		    "PCI/Register disconnect, exiting.\n");
-		schedule_work(&ha->board_disable);
+		qla_pci_set_eeh_busy(vha);
 		return SUCCESS;
 	}
 
@@ -6669,6 +6651,9 @@ qla2x00_do_dpc(void *data)
 
 		schedule();
 
+		if (test_and_clear_bit(DO_EEH_RECOVERY, &base_vha->dpc_flags))
+			qla_pci_set_eeh_busy(base_vha);
+
 		if (!base_vha->flags.init_done || ha->flags.mbox_busy)
 			goto end_loop;
 
@@ -7385,6 +7370,8 @@ static void qla_pci_error_cleanup(scsi_qla_host_t *vha)
 	int i;
 	unsigned long flags;
 
+	ql_dbg(ql_dbg_aer, vha, 0x9000,
+	       "%s\n", __func__);
 	ha->chip_reset++;
 
 	ha->base_qpair->chip_reset = ha->chip_reset;
@@ -7394,28 +7381,15 @@ static void qla_pci_error_cleanup(scsi_qla_host_t *vha)
 			    ha->base_qpair->chip_reset;
 	}
 
-	/* purge MBox commands */
-	if (atomic_read(&ha->num_pend_mbx_stage3)) {
-		clear_bit(MBX_INTR_WAIT, &ha->mbx_cmd_flags);
-		complete(&ha->mbx_intr_comp);
-	}
-
-	i = 0;
-
-	while (atomic_read(&ha->num_pend_mbx_stage3) ||
-	    atomic_read(&ha->num_pend_mbx_stage2) ||
-	    atomic_read(&ha->num_pend_mbx_stage1)) {
-		msleep(20);
-		i++;
-		if (i > 50)
-			break;
-	}
-
-	ha->flags.purge_mbox = 0;
+	/* purge mailbox might take a while. Slot Reset/chip reset
+	 * will take care of the purge
+	 */
 
 	mutex_lock(&ha->mq_lock);
+	ha->base_qpair->online = 0;
 	list_for_each_entry(qpair, &base_vha->qp_list, qp_list_elem)
 		qpair->online = 0;
+	wmb();
 	mutex_unlock(&ha->mq_lock);
 
 	qla2x00_mark_all_devices_lost(vha);
@@ -7452,14 +7426,17 @@ qla2xxx_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
 {
 	scsi_qla_host_t *vha = pci_get_drvdata(pdev);
 	struct qla_hw_data *ha = vha->hw;
+	pci_ers_result_t ret = PCI_ERS_RESULT_NEED_RESET;
 
-	ql_dbg(ql_dbg_aer, vha, 0x9000,
-	    "PCI error detected, state %x.\n", state);
+	ql_log(ql_log_warn, vha, 0x9000,
+	       "PCI error detected, state %x.\n", state);
+	ha->pci_error_state = QLA_PCI_ERR_DETECTED;
 
 	if (!atomic_read(&pdev->enable_cnt)) {
 		ql_log(ql_log_info, vha, 0xffff,
 			"PCI device is disabled,state %x\n", state);
-		return PCI_ERS_RESULT_NEED_RESET;
+		ret = PCI_ERS_RESULT_NEED_RESET;
+		goto out;
 	}
 
 	switch (state) {
@@ -7469,11 +7446,12 @@ qla2xxx_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
 			set_bit(QPAIR_ONLINE_CHECK_NEEDED, &vha->dpc_flags);
 			qla2xxx_wake_dpc(vha);
 		}
-		return PCI_ERS_RESULT_CAN_RECOVER;
+		ret = PCI_ERS_RESULT_CAN_RECOVER;
+		break;
 	case pci_channel_io_frozen:
-		ha->flags.eeh_busy = 1;
-		qla_pci_error_cleanup(vha);
-		return PCI_ERS_RESULT_NEED_RESET;
+		qla_pci_set_eeh_busy(vha);
+		ret = PCI_ERS_RESULT_NEED_RESET;
+		break;
 	case pci_channel_io_perm_failure:
 		ha->flags.pci_channel_io_perm_failure = 1;
 		qla2x00_abort_all_cmds(vha, DID_NO_CONNECT << 16);
@@ -7481,9 +7459,12 @@ qla2xxx_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
 			set_bit(QPAIR_ONLINE_CHECK_NEEDED, &vha->dpc_flags);
 			qla2xxx_wake_dpc(vha);
 		}
-		return PCI_ERS_RESULT_DISCONNECT;
+		ret = PCI_ERS_RESULT_DISCONNECT;
 	}
-	return PCI_ERS_RESULT_NEED_RESET;
+out:
+	ql_dbg(ql_dbg_aer, vha, 0x600d,
+	       "PCI error detected returning [%x].\n", ret);
+	return ret;
 }
 
 static pci_ers_result_t
@@ -7497,6 +7478,10 @@ qla2xxx_pci_mmio_enabled(struct pci_dev *pdev)
 	struct device_reg_2xxx __iomem *reg = &ha->iobase->isp;
 	struct device_reg_24xx __iomem *reg24 = &ha->iobase->isp24;
 
+	ql_log(ql_log_warn, base_vha, 0x9000,
+	       "mmio enabled\n");
+
+	ha->pci_error_state = QLA_PCI_MMIO_ENABLED;
 	if (IS_QLA82XX(ha))
 		return PCI_ERS_RESULT_RECOVERED;
 
@@ -7520,10 +7505,11 @@ qla2xxx_pci_mmio_enabled(struct pci_dev *pdev)
 		ql_log(ql_log_info, base_vha, 0x9003,
 		    "RISC paused -- mmio_enabled, Dumping firmware.\n");
 		qla2xxx_dump_fw(base_vha);
-
-		return PCI_ERS_RESULT_NEED_RESET;
-	} else
-		return PCI_ERS_RESULT_RECOVERED;
+	}
+	/* set PCI_ERS_RESULT_NEED_RESET to trigger call to qla2xxx_pci_slot_reset */
+	ql_dbg(ql_dbg_aer, base_vha, 0x600d,
+	       "mmio enabled returning.\n");
+	return PCI_ERS_RESULT_NEED_RESET;
 }
 
 static pci_ers_result_t
@@ -7535,9 +7521,10 @@ qla2xxx_pci_slot_reset(struct pci_dev *pdev)
 	int rc;
 	struct qla_qpair *qpair = NULL;
 
-	ql_dbg(ql_dbg_aer, base_vha, 0x9004,
-	    "Slot Reset.\n");
+	ql_log(ql_log_warn, base_vha, 0x9004,
+	       "Slot Reset.\n");
 
+	ha->pci_error_state = QLA_PCI_SLOT_RESET;
 	/* Workaround: qla2xxx driver which access hardware earlier
 	 * needs error state to be pci_channel_io_online.
 	 * Otherwise mailbox command timesout.
@@ -7571,16 +7558,24 @@ qla2xxx_pci_slot_reset(struct pci_dev *pdev)
 		qpair->online = 1;
 	mutex_unlock(&ha->mq_lock);
 
+	ha->flags.eeh_busy = 0;
 	base_vha->flags.online = 1;
 	set_bit(ABORT_ISP_ACTIVE, &base_vha->dpc_flags);
-	if (ha->isp_ops->abort_isp(base_vha) == QLA_SUCCESS)
-		ret =  PCI_ERS_RESULT_RECOVERED;
+	ha->isp_ops->abort_isp(base_vha);
 	clear_bit(ABORT_ISP_ACTIVE, &base_vha->dpc_flags);
 
+	if (qla2x00_isp_reg_stat(ha)) {
+		ha->flags.eeh_busy = 1;
+		qla_pci_error_cleanup(base_vha);
+		ql_log(ql_log_warn, base_vha, 0x9005,
+		       "Device unable to recover from PCI error.\n");
+	} else {
+		ret =  PCI_ERS_RESULT_RECOVERED;
+	}
 
 exit_slot_reset:
 	ql_dbg(ql_dbg_aer, base_vha, 0x900e,
-	    "slot_reset return %x.\n", ret);
+	    "Slot Reset returning %x.\n", ret);
 
 	return ret;
 }
@@ -7592,16 +7587,54 @@ qla2xxx_pci_resume(struct pci_dev *pdev)
 	struct qla_hw_data *ha = base_vha->hw;
 	int ret;
 
-	ql_dbg(ql_dbg_aer, base_vha, 0x900f,
-	    "pci_resume.\n");
+	ql_log(ql_log_warn, base_vha, 0x900f,
+	       "Pci Resume.\n");
 
-	ha->flags.eeh_busy = 0;
 
 	ret = qla2x00_wait_for_hba_online(base_vha);
 	if (ret != QLA_SUCCESS) {
 		ql_log(ql_log_fatal, base_vha, 0x9002,
 		    "The device failed to resume I/O from slot/link_reset.\n");
 	}
+	ha->pci_error_state = QLA_PCI_RESUME;
+	ql_dbg(ql_dbg_aer, base_vha, 0x600d,
+	       "Pci Resume returning.\n");
+}
+
+void qla_pci_set_eeh_busy(struct scsi_qla_host *vha)
+{
+	struct qla_hw_data *ha = vha->hw;
+	struct scsi_qla_host *base_vha = pci_get_drvdata(ha->pdev);
+	bool do_cleanup = false;
+	unsigned long flags;
+
+	if (ha->flags.eeh_busy)
+		return;
+
+	spin_lock_irqsave(&base_vha->work_lock, flags);
+	if (!ha->flags.eeh_busy) {
+		ha->flags.eeh_busy = 1;
+		do_cleanup = true;
+	}
+	spin_unlock_irqrestore(&base_vha->work_lock, flags);
+
+	if (do_cleanup)
+		qla_pci_error_cleanup(base_vha);
+}
+
+/* this routine will schedule a task to pause IO from interrupt context
+ * if caller sees a PCIE error event (register read = 0xf's)
+ */
+void qla_schedule_eeh_work(struct scsi_qla_host *vha)
+{
+	struct qla_hw_data *ha = vha->hw;
+	struct scsi_qla_host *base_vha = pci_get_drvdata(ha->pdev);
+
+	if (ha->flags.eeh_busy)
+		return;
+
+	set_bit(DO_EEH_RECOVERY, &base_vha->dpc_flags);
+	qla2xxx_wake_dpc(base_vha);
 }
 
 static void
-- 
2.19.0.rc0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 09/11] qla2xxx: fix mailbox recovery during PCIe error
  2021-03-23  4:42 [PATCH 00/11] qla2xxx driver bug fixes Nilesh Javali
                   ` (7 preceding siblings ...)
  2021-03-23  4:42 ` [PATCH 08/11] qla2xxx: fix crash in PCIe error handling Nilesh Javali
@ 2021-03-23  4:42 ` Nilesh Javali
  2021-03-24 16:23   ` Himanshu Madhani
  2021-03-23  4:42 ` [PATCH 10/11] qla2xxx: include AER debug mask to default Nilesh Javali
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Nilesh Javali @ 2021-03-23  4:42 UTC (permalink / raw)
  To: martin.petersen; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

From: Quinn Tran <qutran@marvell.com>

For the mailbox thread that encounter PCIe error, pause that
thread until PCIe link reset/recovery completed to prevent
the thread from possibly unmapping any type of DMA resource that might
be in progress at the same time.

Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
---
 drivers/scsi/qla2xxx/qla_mbx.c | 38 ++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_mbx.c b/drivers/scsi/qla2xxx/qla_mbx.c
index 0149f84cdd8e..3bc6020cfb8d 100644
--- a/drivers/scsi/qla2xxx/qla_mbx.c
+++ b/drivers/scsi/qla2xxx/qla_mbx.c
@@ -102,7 +102,7 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
 	int		rval, i;
 	unsigned long    flags = 0;
 	device_reg_t *reg;
-	uint8_t		abort_active;
+	uint8_t		abort_active, eeh_delay;
 	uint8_t		io_lock_on;
 	uint16_t	command = 0;
 	uint16_t	*iptr;
@@ -136,7 +136,7 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
 		    "PCI error, exiting.\n");
 		return QLA_FUNCTION_TIMEOUT;
 	}
-
+	eeh_delay = 0;
 	reg = ha->iobase;
 	io_lock_on = base_vha->flags.init_done;
 
@@ -159,11 +159,10 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
 	}
 
 	/* check if ISP abort is active and return cmd with timeout */
-	if ((test_bit(ABORT_ISP_ACTIVE, &base_vha->dpc_flags) ||
-	    test_bit(ISP_ABORT_RETRY, &base_vha->dpc_flags) ||
-	    test_bit(ISP_ABORT_NEEDED, &base_vha->dpc_flags) ||
-	    ha->flags.eeh_busy) &&
-	    !is_rom_cmd(mcp->mb[0])) {
+	if (((test_bit(ABORT_ISP_ACTIVE, &base_vha->dpc_flags) ||
+	      test_bit(ISP_ABORT_RETRY, &base_vha->dpc_flags) ||
+	      test_bit(ISP_ABORT_NEEDED, &base_vha->dpc_flags)) &&
+	      !is_rom_cmd(mcp->mb[0])) || ha->flags.eeh_busy) {
 		ql_log(ql_log_info, vha, 0x1005,
 		    "Cmd 0x%x aborted with timeout since ISP Abort is pending\n",
 		    mcp->mb[0]);
@@ -186,7 +185,11 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
 		return QLA_FUNCTION_TIMEOUT;
 	}
 	atomic_dec(&ha->num_pend_mbx_stage1);
-	if (ha->flags.purge_mbox || chip_reset != ha->chip_reset) {
+	if (ha->flags.purge_mbox || chip_reset != ha->chip_reset ||
+	    ha->flags.eeh_busy) {
+		ql_log(ql_log_warn, vha, 0xd035,
+		       "Error detected: purge[%d] eeh[%d] cmd=0x%x, Exiting.\n",
+		       ha->flags.purge_mbox, ha->flags.eeh_busy, mcp->mb[0]);
 		rval = QLA_ABORTED;
 		goto premature_exit;
 	}
@@ -266,6 +269,8 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
 		if (!wait_for_completion_timeout(&ha->mbx_intr_comp,
 		    mcp->tov * HZ)) {
 			if (chip_reset != ha->chip_reset) {
+				eeh_delay = ha->flags.eeh_busy ? 1 : 0;
+
 				spin_lock_irqsave(&ha->hardware_lock, flags);
 				ha->flags.mbox_busy = 0;
 				spin_unlock_irqrestore(&ha->hardware_lock,
@@ -283,6 +288,8 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
 
 		} else if (ha->flags.purge_mbox ||
 		    chip_reset != ha->chip_reset) {
+			eeh_delay = ha->flags.eeh_busy ? 1 : 0;
+
 			spin_lock_irqsave(&ha->hardware_lock, flags);
 			ha->flags.mbox_busy = 0;
 			spin_unlock_irqrestore(&ha->hardware_lock, flags);
@@ -324,6 +331,8 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
 		while (!ha->flags.mbox_int) {
 			if (ha->flags.purge_mbox ||
 			    chip_reset != ha->chip_reset) {
+				eeh_delay = ha->flags.eeh_busy ? 1 : 0;
+
 				spin_lock_irqsave(&ha->hardware_lock, flags);
 				ha->flags.mbox_busy = 0;
 				spin_unlock_irqrestore(&ha->hardware_lock,
@@ -532,7 +541,8 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
 				clear_bit(ISP_ABORT_NEEDED, &vha->dpc_flags);
 				/* Allow next mbx cmd to come in. */
 				complete(&ha->mbx_cmd_comp);
-				if (ha->isp_ops->abort_isp(vha)) {
+				if (ha->isp_ops->abort_isp(vha) &&
+				    !ha->flags.eeh_busy) {
 					/* Failed. retry later. */
 					set_bit(ISP_ABORT_NEEDED,
 					    &vha->dpc_flags);
@@ -585,6 +595,16 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
 		ql_dbg(ql_dbg_mbx, base_vha, 0x1021, "Done %s.\n", __func__);
 	}
 
+	i = 500;
+	while (i && eeh_delay && (ha->pci_error_state < QLA_PCI_SLOT_RESET)) {
+		/* The caller of this mailbox encounter pci error.
+		 * Hold the thread until PCIE link reset complete to make
+		 * sure caller does not unmap dma while recovery is
+		 * in progress.
+		 */
+		msleep(1);
+		i--;
+	}
 	return rval;
 }
 
-- 
2.19.0.rc0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 10/11] qla2xxx: include AER debug mask to default
  2021-03-23  4:42 [PATCH 00/11] qla2xxx driver bug fixes Nilesh Javali
                   ` (8 preceding siblings ...)
  2021-03-23  4:42 ` [PATCH 09/11] qla2xxx: fix mailbox recovery during PCIe error Nilesh Javali
@ 2021-03-23  4:42 ` Nilesh Javali
  2021-03-24 16:24   ` Himanshu Madhani
  2021-03-23  4:42 ` [PATCH 11/11] qla2xxx: Update version to 10.02.00.106-k Nilesh Javali
  2021-03-24 20:55 ` [PATCH 00/11] qla2xxx driver bug fixes Laurence Oberman
  11 siblings, 1 reply; 29+ messages in thread
From: Nilesh Javali @ 2021-03-23  4:42 UTC (permalink / raw)
  To: martin.petersen; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

From: Quinn Tran <qutran@marvell.com>

Use PCIe AER debug mask as default.

Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
---
 drivers/scsi/qla2xxx/qla_dbg.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_dbg.h b/drivers/scsi/qla2xxx/qla_dbg.h
index 2e59e75c62b5..9eb708e5e22e 100644
--- a/drivers/scsi/qla2xxx/qla_dbg.h
+++ b/drivers/scsi/qla2xxx/qla_dbg.h
@@ -308,7 +308,7 @@ struct qla2xxx_fw_dump {
 };
 
 #define QL_MSGHDR "qla2xxx"
-#define QL_DBG_DEFAULT1_MASK    0x1e400000
+#define QL_DBG_DEFAULT1_MASK    0x1e600000
 
 #define ql_log_fatal		0 /* display fatal errors */
 #define ql_log_warn		1 /* display critical errors */
-- 
2.19.0.rc0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 11/11] qla2xxx: Update version to 10.02.00.106-k
  2021-03-23  4:42 [PATCH 00/11] qla2xxx driver bug fixes Nilesh Javali
                   ` (9 preceding siblings ...)
  2021-03-23  4:42 ` [PATCH 10/11] qla2xxx: include AER debug mask to default Nilesh Javali
@ 2021-03-23  4:42 ` Nilesh Javali
  2021-03-24 16:24   ` Himanshu Madhani
  2021-03-24 20:55 ` [PATCH 00/11] qla2xxx driver bug fixes Laurence Oberman
  11 siblings, 1 reply; 29+ messages in thread
From: Nilesh Javali @ 2021-03-23  4:42 UTC (permalink / raw)
  To: martin.petersen; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

Signed-off-by: Nilesh Javali <njavali@marvell.com>
---
 drivers/scsi/qla2xxx/qla_version.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_version.h b/drivers/scsi/qla2xxx/qla_version.h
index 72c648442e8d..da11829fa12d 100644
--- a/drivers/scsi/qla2xxx/qla_version.h
+++ b/drivers/scsi/qla2xxx/qla_version.h
@@ -6,9 +6,9 @@
 /*
  * Driver version
  */
-#define QLA2XXX_VERSION      "10.02.00.105-k"
+#define QLA2XXX_VERSION      "10.02.00.106-k"
 
 #define QLA_DRIVER_MAJOR_VER	10
 #define QLA_DRIVER_MINOR_VER	2
 #define QLA_DRIVER_PATCH_VER	0
-#define QLA_DRIVER_BETA_VER	105
+#define QLA_DRIVER_BETA_VER	106
-- 
2.19.0.rc0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 03/11] qla2xxx: fix stuck session
  2021-03-23  4:42 ` [PATCH 03/11] qla2xxx: fix stuck session Nilesh Javali
@ 2021-03-23 16:31   ` Bart Van Assche
  2021-03-24 15:53     ` Himanshu Madhani
  0 siblings, 1 reply; 29+ messages in thread
From: Bart Van Assche @ 2021-03-23 16:31 UTC (permalink / raw)
  To: Nilesh Javali, martin.petersen; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

On 3/22/21 9:42 PM, Nilesh Javali wrote:
> diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
> index c48daf52725d..fa8c4dae8dce 100644
> --- a/drivers/scsi/qla2xxx/qla_target.c
> +++ b/drivers/scsi/qla2xxx/qla_target.c
> @@ -1029,7 +1029,7 @@ void qlt_free_session_done(struct work_struct *work)
>  			}
>  			msleep(100);
>  			cnt++;
> -			if (cnt > 200)
> +			if (cnt > 230)
>  				break;
>  		}

One magic constant is changed into another magic constant and that is
sufficient to fix a bug? Please add a comment that explains the meaning
of that constant.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/11] qla2xxx: Fix IOPS drop seen in some adapters
  2021-03-23  4:42 ` [PATCH 01/11] qla2xxx: Fix IOPS drop seen in some adapters Nilesh Javali
@ 2021-03-24 15:46   ` Himanshu Madhani
  2021-03-26  0:53     ` Arun Easi
  0 siblings, 1 reply; 29+ messages in thread
From: Himanshu Madhani @ 2021-03-24 15:46 UTC (permalink / raw)
  To: Nilesh Javali; +Cc: Martin Petersen, linux-scsi, GR-QLogic-Storage-Upstream



> On Mar 22, 2021, at 11:42 PM, Nilesh Javali <njavali@marvell.com> wrote:
> 
> From: Arun Easi <aeasi@marvell.com>
> 
> Removing the response queue processing in the send path is showing IOPS
> drop. Add back the process_response_queue() call in the send path.
> 

Can you share some details of what effect this change made into IOPS? 

> Signed-off-by: Arun Easi <aeasi@marvell.com>
> Signed-off-by: Nilesh Javali <njavali@marvell.com>
> ---
> drivers/scsi/qla2xxx/qla_iocb.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
> index 8b41cbaf8535..f26a7a14fce9 100644
> --- a/drivers/scsi/qla2xxx/qla_iocb.c
> +++ b/drivers/scsi/qla2xxx/qla_iocb.c
> @@ -1600,12 +1600,14 @@ qla24xx_start_scsi(srb_t *sp)
> 	uint16_t	req_cnt;
> 	uint16_t	tot_dsds;
> 	struct req_que *req = NULL;
> +	struct rsp_que *rsp;
> 	struct scsi_cmnd *cmd = GET_CMD_SP(sp);
> 	struct scsi_qla_host *vha = sp->vha;
> 	struct qla_hw_data *ha = vha->hw;
> 
> 	/* Setup device pointers. */
> 	req = vha->req;
> +	rsp = req->rsp;
> 
> 	/* So we know we haven't pci_map'ed anything yet */
> 	tot_dsds = 0;
> @@ -1707,6 +1709,11 @@ qla24xx_start_scsi(srb_t *sp)
> 	/* Set chip new ring index. */
> 	wrt_reg_dword(req->req_q_in, req->ring_index);
> 
> +	/* Manage unprocessed RIO/ZIO commands in response queue. */
> +	if (vha->flags.process_response_queue &&
> +	    rsp->ring_ptr->signature != RESPONSE_PROCESSED)
> +		qla24xx_process_response_queue(vha, rsp);
> +
> 	spin_unlock_irqrestore(&ha->hardware_lock, flags);
> 	return QLA_SUCCESS;
> 
> @@ -1897,6 +1904,11 @@ qla24xx_dif_start_scsi(srb_t *sp)
> 	/* Set chip new ring index. */
> 	wrt_reg_dword(req->req_q_in, req->ring_index);
> 
> +	/* Manage unprocessed RIO/ZIO commands in response queue. */
> +	if (vha->flags.process_response_queue &&
> +	    rsp->ring_ptr->signature != RESPONSE_PROCESSED)
> +		qla24xx_process_response_queue(vha, rsp);
> +
> 	spin_unlock_irqrestore(&ha->hardware_lock, flags);
> 
> 	return QLA_SUCCESS;
> @@ -1931,6 +1943,7 @@ qla2xxx_start_scsi_mq(srb_t *sp)
> 	uint16_t	req_cnt;
> 	uint16_t	tot_dsds;
> 	struct req_que *req = NULL;
> +	struct rsp_que *rsp;
> 	struct scsi_cmnd *cmd = GET_CMD_SP(sp);
> 	struct scsi_qla_host *vha = sp->fcport->vha;
> 	struct qla_hw_data *ha = vha->hw;
> @@ -1941,6 +1954,7 @@ qla2xxx_start_scsi_mq(srb_t *sp)
> 
> 	/* Setup qpair pointers */
> 	req = qpair->req;
> +	rsp = qpair->rsp;
> 
> 	/* So we know we haven't pci_map'ed anything yet */
> 	tot_dsds = 0;
> @@ -2041,6 +2055,11 @@ qla2xxx_start_scsi_mq(srb_t *sp)
> 	/* Set chip new ring index. */
> 	wrt_reg_dword(req->req_q_in, req->ring_index);
> 
> +	/* Manage unprocessed RIO/ZIO commands in response queue. */
> +	if (vha->flags.process_response_queue &&
> +	    rsp->ring_ptr->signature != RESPONSE_PROCESSED)
> +		qla24xx_process_response_queue(vha, rsp);
> +
> 	spin_unlock_irqrestore(&qpair->qp_lock, flags);
> 	return QLA_SUCCESS;
> 
> -- 
> 2.19.0.rc0
> 

Patch itself looks good. After you add more details in commit message you can add 

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>

--
Himanshu Madhani	 Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 02/11] qla2xxx: Add H:C:T info in the log message for fc ports
  2021-03-23  4:42 ` [PATCH 02/11] qla2xxx: Add H:C:T info in the log message for fc ports Nilesh Javali
@ 2021-03-24 15:48   ` Himanshu Madhani
  0 siblings, 0 replies; 29+ messages in thread
From: Himanshu Madhani @ 2021-03-24 15:48 UTC (permalink / raw)
  To: Nilesh Javali; +Cc: Martin Petersen, linux-scsi, GR-QLogic-Storage-Upstream



> On Mar 22, 2021, at 11:42 PM, Nilesh Javali <njavali@marvell.com> wrote:
> 
> From: Arun Easi <aeasi@marvell.com>
> 
> The host:channel:scsi_target_id information is helpful in matching
> an fc port with a scsi device, so add it. For initiator fc ports,
> a -1 would be displayed for "target" part.
> 
> Signed-off-by: Arun Easi <aeasi@marvell.com>
> Signed-off-by: Nilesh Javali <njavali@marvell.com>
> ---
> drivers/scsi/qla2xxx/qla_init.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
> index f01f07116bd3..af237c485389 100644
> --- a/drivers/scsi/qla2xxx/qla_init.c
> +++ b/drivers/scsi/qla2xxx/qla_init.c
> @@ -5512,13 +5512,14 @@ qla2x00_reg_remote_port(scsi_qla_host_t *vha, fc_port_t *fcport)
> 	if (fcport->port_type & FCT_NVME_DISCOVERY)
> 		rport_ids.roles |= FC_PORT_ROLE_NVME_DISCOVERY;
> 
> +	fc_remote_port_rolechg(rport, rport_ids.roles);
> +
> 	ql_dbg(ql_dbg_disc, vha, 0x20ee,
> -	    "%s %8phN. rport %p is %s mode\n",
> -	    __func__, fcport->port_name, rport,
> +	    "%s: %8phN. rport %ld:0:%d (%p) is %s mode\n",
> +	    __func__, fcport->port_name, vha->host_no,
> +	    rport->scsi_target_id, rport,
> 	    (fcport->port_type == FCT_TARGET) ? "tgt" :
> 	    ((fcport->port_type & FCT_NVME) ? "nvme" : "ini"));
> -
> -	fc_remote_port_rolechg(rport, rport_ids.roles);
> }
> 
> /*
> -- 
> 2.19.0.rc0
> 

Looks good.

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>

--
Himanshu Madhani	 Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 03/11] qla2xxx: fix stuck session
  2021-03-23 16:31   ` Bart Van Assche
@ 2021-03-24 15:53     ` Himanshu Madhani
  2021-03-26 17:46       ` Quinn Tran
  0 siblings, 1 reply; 29+ messages in thread
From: Himanshu Madhani @ 2021-03-24 15:53 UTC (permalink / raw)
  To: Nilesh Javali
  Cc: Martin Petersen, linux-scsi, Bart Van Assche, GR-QLogic-Storage-Upstream



> On Mar 23, 2021, at 11:31 AM, Bart Van Assche <bvanassche@acm.org> wrote:
> 
> On 3/22/21 9:42 PM, Nilesh Javali wrote:
>> diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
>> index c48daf52725d..fa8c4dae8dce 100644
>> --- a/drivers/scsi/qla2xxx/qla_target.c
>> +++ b/drivers/scsi/qla2xxx/qla_target.c
>> @@ -1029,7 +1029,7 @@ void qlt_free_session_done(struct work_struct *work)
>> 			}
>> 			msleep(100);
>> 			cnt++;
>> -			if (cnt > 200)
>> +			if (cnt > 230)
>> 				break;
>> 		}
> 
> One magic constant is changed into another magic constant and that is
> sufficient to fix a bug? Please add a comment that explains the meaning
> of that constant.
> 

Agree with Bart here. 

How did you come up with this new count value?  Some more details (either in commit message or actual comment in code) would definitely help understand. If you have some log message snippet or stack trace add that to commit message.

> Thanks,
> 
> Bart.
> 

--
Himanshu Madhani	 Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 04/11] qla2xxx: consolidate zio threshold setting for both fcp & nvme
  2021-03-23  4:42 ` [PATCH 04/11] qla2xxx: consolidate zio threshold setting for both fcp & nvme Nilesh Javali
@ 2021-03-24 15:55   ` Himanshu Madhani
  0 siblings, 0 replies; 29+ messages in thread
From: Himanshu Madhani @ 2021-03-24 15:55 UTC (permalink / raw)
  To: Nilesh Javali; +Cc: Martin Petersen, linux-scsi, GR-QLogic-Storage-Upstream



> On Mar 22, 2021, at 11:42 PM, Nilesh Javali <njavali@marvell.com> wrote:
> 
> From: Quinn Tran <qutran@marvell.com>
> 
> consolidate zio threshold setting for both fcp & nvme to prevent
> one protocol from clobbering the setting of the other protocol.
> 
> Signed-off-by: Quinn Tran <qutran@marvell.com>
> Signed-off-by: Nilesh Javali <njavali@marvell.com>
> ---
> drivers/scsi/qla2xxx/qla_def.h |  1 -
> drivers/scsi/qla2xxx/qla_os.c  | 34 ++++++++++++++--------------------
> 2 files changed, 14 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
> index 49b42b430df4..3d09f31895e7 100644
> --- a/drivers/scsi/qla2xxx/qla_def.h
> +++ b/drivers/scsi/qla2xxx/qla_def.h
> @@ -4727,7 +4727,6 @@ typedef struct scsi_qla_host {
> #define FX00_CRITEMP_RECOVERY	25
> #define FX00_HOST_INFO_RESEND	26
> #define QPAIR_ONLINE_CHECK_NEEDED	27
> -#define SET_NVME_ZIO_THRESHOLD_NEEDED	28
> #define DETECT_SFP_CHANGE	29
> #define N2N_LOGIN_NEEDED	30
> #define IOCB_WORK_ACTIVE	31
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index 074392560f3d..6563d69706ba 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -6969,28 +6969,23 @@ qla2x00_do_dpc(void *data)
> 			mutex_unlock(&ha->mq_lock);
> 		}
> 
> -		if (test_and_clear_bit(SET_NVME_ZIO_THRESHOLD_NEEDED,
> -		    &base_vha->dpc_flags)) {
> +		if (test_and_clear_bit(SET_ZIO_THRESHOLD_NEEDED,
> +				       &base_vha->dpc_flags)) {
> +			u16 threshold = ha->nvme_last_rptd_aen + ha->last_zio_threshold;
> +
> +			if (threshold > ha->orig_fw_xcb_count)
> +				threshold = ha->orig_fw_xcb_count;
> +
> 			ql_log(ql_log_info, base_vha, 0xffffff,
> -				"nvme: SET ZIO Activity exchange threshold to %d.\n",
> -						ha->nvme_last_rptd_aen);
> -			if (qla27xx_set_zio_threshold(base_vha,
> -			    ha->nvme_last_rptd_aen)) {
> +			       "SET ZIO Activity exchange threshold to %d.\n",
> +			       threshold);
> +			if (qla27xx_set_zio_threshold(base_vha, threshold)) {
> 				ql_log(ql_log_info, base_vha, 0xffffff,
> -				    "nvme: Unable to SET ZIO Activity exchange threshold to %d.\n",
> -				    ha->nvme_last_rptd_aen);
> +				       "Unable to SET ZIO Activity exchange threshold to %d.\n",
> +				       threshold);
> 			}
> 		}
> 
> -		if (test_and_clear_bit(SET_ZIO_THRESHOLD_NEEDED,
> -		    &base_vha->dpc_flags)) {
> -			ql_log(ql_log_info, base_vha, 0xffffff,
> -			    "SET ZIO Activity exchange threshold to %d.\n",
> -			    ha->last_zio_threshold);
> -			qla27xx_set_zio_threshold(base_vha,
> -			    ha->last_zio_threshold);
> -		}
> -
> 		if (!IS_QLAFX00(ha))
> 			qla2x00_do_dpc_all_vps(base_vha);
> 
> @@ -7218,14 +7213,13 @@ qla2x00_timer(struct timer_list *t)
> 	index = atomic_read(&ha->nvme_active_aen_cnt);
> 	if (!vha->vp_idx &&
> 	    (index != ha->nvme_last_rptd_aen) &&
> -	    (index >= DEFAULT_ZIO_THRESHOLD) &&
> 	    ha->zio_mode == QLA_ZIO_MODE_6 &&
> 	    !ha->flags.host_shutting_down) {
> +		ha->nvme_last_rptd_aen = atomic_read(&ha->nvme_active_aen_cnt);
> 		ql_log(ql_log_info, vha, 0x3002,
> 		    "nvme: Sched: Set ZIO exchange threshold to %d.\n",
> 		    ha->nvme_last_rptd_aen);
> -		ha->nvme_last_rptd_aen = atomic_read(&ha->nvme_active_aen_cnt);
> -		set_bit(SET_NVME_ZIO_THRESHOLD_NEEDED, &vha->dpc_flags);
> +		set_bit(SET_ZIO_THRESHOLD_NEEDED, &vha->dpc_flags);
> 		start_dpc++;
> 	}
> 
> -- 
> 2.19.0.rc0
> 

Looks good.

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>

--
Himanshu Madhani	 Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 05/11] qla2xxx: Fix use after free in bsg
  2021-03-23  4:42 ` [PATCH 05/11] qla2xxx: Fix use after free in bsg Nilesh Javali
@ 2021-03-24 15:57   ` Himanshu Madhani
  0 siblings, 0 replies; 29+ messages in thread
From: Himanshu Madhani @ 2021-03-24 15:57 UTC (permalink / raw)
  To: Nilesh Javali; +Cc: Martin Petersen, linux-scsi, GR-QLogic-Storage-Upstream



> On Mar 22, 2021, at 11:42 PM, Nilesh Javali <njavali@marvell.com> wrote:
> 
> From: Quinn Tran <qutran@marvell.com>
> 
> On bsg command completion, bsg_job_done was called while
> qla driver continue to access the bsg_job buffer. The bsg_job_done
> can free up resources and reuse by other task, qla continue access
> of the same resource can read garbage data.
> 
> localhost kernel: BUG: KASAN: use-after-free in sg_next+0x64/0x80
> localhost kernel: Read of size 8 at addr ffff8883228a3330 by task swapper/26/0
> localhost kernel:
> localhost kernel: CPU: 26 PID: 0 Comm: swapper/26 Kdump:
> loaded Tainted: G          OE    --------- -  - 4.18.0-193.el8.x86_64+debug #1
> localhost kernel: Hardware name: HP ProLiant DL360
> Gen9/ProLiant DL360 Gen9, BIOS P89 08/12/2016
> localhost kernel: Call Trace:
> localhost kernel: <IRQ>
> localhost kernel: dump_stack+0x9a/0xf0
> localhost kernel: print_address_description.cold.3+0x9/0x23b
> localhost kernel: kasan_report.cold.4+0x65/0x95
> localhost kernel: debug_dma_unmap_sg.part.12+0x10d/0x2d0
> localhost kernel: qla2x00_bsg_sp_free+0xaf6/0x1010 [qla2xxx]
> 
> Signed-off-by: Quinn Tran <qutran@marvell.com>
> Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
> Signed-off-by: Nilesh Javali <njavali@marvell.com>
> ---
> drivers/scsi/qla2xxx/qla_bsg.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_bsg.c b/drivers/scsi/qla2xxx/qla_bsg.c
> index bee8cf9f8123..d021e51344f5 100644
> --- a/drivers/scsi/qla2xxx/qla_bsg.c
> +++ b/drivers/scsi/qla2xxx/qla_bsg.c
> @@ -25,10 +25,11 @@ void qla2x00_bsg_job_done(srb_t *sp, int res)
> 	struct bsg_job *bsg_job = sp->u.bsg_job;
> 	struct fc_bsg_reply *bsg_reply = bsg_job->reply;
> 
> +	sp->free(sp);
> +
> 	bsg_reply->result = res;
> 	bsg_job_done(bsg_job, bsg_reply->result,
> 		       bsg_reply->reply_payload_rcv_len);
> -	sp->free(sp);
> }
> 
> void qla2x00_bsg_sp_free(srb_t *sp)
> -- 
> 2.19.0.rc0
> 

Looks good. 

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>

--
Himanshu Madhani	 Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 06/11] qla2xxx: Fix crash in qla2xxx_mqueuecommand
  2021-03-23  4:42 ` [PATCH 06/11] qla2xxx: Fix crash in qla2xxx_mqueuecommand Nilesh Javali
@ 2021-03-24 15:57   ` Himanshu Madhani
  0 siblings, 0 replies; 29+ messages in thread
From: Himanshu Madhani @ 2021-03-24 15:57 UTC (permalink / raw)
  To: Nilesh Javali; +Cc: Martin Petersen, linux-scsi, GR-QLogic-Storage-Upstream



> On Mar 22, 2021, at 11:42 PM, Nilesh Javali <njavali@marvell.com> wrote:
> 
> From: Arun Easi <aeasi@marvell.com>
> 
>    RIP: 0010:kmem_cache_free+0xfa/0x1b0
>    Call Trace:
>       qla2xxx_mqueuecommand+0x2b5/0x2c0 [qla2xxx]
>       scsi_queue_rq+0x5e2/0xa40
>       __blk_mq_try_issue_directly+0x128/0x1d0
>       blk_mq_request_issue_directly+0x4e/0xb0
> 
> Fix incorrect call to free srb in qla2xxx_mqueuecommand, as
> srb is now allocated by upper layers. This fixes smatch warning of
> srb unintended free.
> 
> Fixes: af2a0c51b120 ("scsi: qla2xxx: Fix SRB leak on switch command timeout")
> Cc: stable@vger.kernel.org # 5.5
> Reported-by: Laurence Oberman <loberman@redhat.com>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Arun Easi <aeasi@marvell.com>
> Signed-off-by: Nilesh Javali <njavali@marvell.com>
> ---
> drivers/scsi/qla2xxx/qla_os.c | 7 -------
> 1 file changed, 7 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index 6563d69706ba..6a57399b515f 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -1013,8 +1013,6 @@ qla2xxx_mqueuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd,
> 	if (rval != QLA_SUCCESS) {
> 		ql_dbg(ql_dbg_io + ql_dbg_verbose, vha, 0x3078,
> 		    "Start scsi failed rval=%d for cmd=%p.\n", rval, cmd);
> -		if (rval == QLA_INTERFACE_ERROR)
> -			goto qc24_free_sp_fail_command;
> 		goto qc24_host_busy_free_sp;
> 	}
> 
> @@ -1026,11 +1024,6 @@ qla2xxx_mqueuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd,
> qc24_target_busy:
> 	return SCSI_MLQUEUE_TARGET_BUSY;
> 
> -qc24_free_sp_fail_command:
> -	sp->free(sp);
> -	CMD_SP(cmd) = NULL;
> -	qla2xxx_rel_qpair_sp(sp->qpair, sp);
> -
> qc24_fail_command:
> 	cmd->scsi_done(cmd);
> 
> -- 
> 2.19.0.rc0
> 

Looks good. 

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>

--
Himanshu Madhani	 Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 07/11] qla2xxx: fix RISC RESET completion polling
  2021-03-23  4:42 ` [PATCH 07/11] qla2xxx: fix RISC RESET completion polling Nilesh Javali
@ 2021-03-24 16:03   ` Himanshu Madhani
  0 siblings, 0 replies; 29+ messages in thread
From: Himanshu Madhani @ 2021-03-24 16:03 UTC (permalink / raw)
  To: Nilesh Javali; +Cc: Martin Petersen, linux-scsi, GR-QLogic-Storage-Upstream



> On Mar 22, 2021, at 11:42 PM, Nilesh Javali <njavali@marvell.com> wrote:
> 
> From: Quinn Tran <qutran@marvell.com>
> 
> After risc reset, the poll time for risc reset completion is
> too short. Fix the completion polling time.
> 
> Signed-off-by: Quinn Tran <qutran@marvell.com>
> Signed-off-by: Nilesh Javali <njavali@marvell.com>
> ---
> drivers/scsi/qla2xxx/qla_init.c | 65 ++++++++++++++++++++++++++++++---
> 1 file changed, 59 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
> index f6dc8166e7ba..19681d3c5b7a 100644
> --- a/drivers/scsi/qla2xxx/qla_init.c
> +++ b/drivers/scsi/qla2xxx/qla_init.c
> @@ -2767,6 +2767,49 @@ qla81xx_reset_mpi(scsi_qla_host_t *vha)
> 	return qla81xx_write_mpi_register(vha, mb);
> }
> 
> +static int
> +qla_chk_risc_recovery(scsi_qla_host_t *vha)
> +{
> +	struct qla_hw_data *ha = vha->hw;
> +	struct device_reg_24xx __iomem *reg = &ha->iobase->isp24;
> +	__le16 __iomem *mbptr = &reg->mailbox0;
> +	int i;
> +	u16 mb[32];
> +	int rc = QLA_SUCCESS;
> +
> +	if (!IS_QLA27XX(ha) && !IS_QLA28XX(ha))
> +		return rc;
> +
> +	/* this check is only valid after RISC reset */
> +	mb[0] = rd_reg_word(mbptr);
> +	mbptr++;
> +	if (mb[0] == 0xf) {
> +		rc = QLA_FUNCTION_FAILED;
> +
> +		for (i = 1; i < 32; i++) {
> +			mb[i] = rd_reg_word(mbptr);
> +			mbptr++;
> +		}
> +
> +		ql_log(ql_log_warn, vha, 0x1015,
> +		       "RISC reset failed. mb[0-7] %04xh %04xh %04xh %04xh %04xh %04xh %04xh %04xh\n",
> +		       mb[0], mb[1], mb[2], mb[3], mb[4], mb[5], mb[6], mb[7]);
> +		ql_log(ql_log_warn, vha, 0x1015,
> +		       "RISC reset failed. mb[8-15] %04xh %04xh %04xh %04xh %04xh %04xh %04xh %04xh\n",
> +		       mb[8], mb[9], mb[10], mb[11], mb[12], mb[13], mb[14],
> +		       mb[15]);
> +		ql_log(ql_log_warn, vha, 0x1015,
> +		       "RISC reset failed. mb[16-23] %04xh %04xh %04xh %04xh %04xh %04xh %04xh %04xh\n",
> +		       mb[16], mb[17], mb[18], mb[19], mb[20], mb[21], mb[22],
> +		       mb[23]);
> +		ql_log(ql_log_warn, vha, 0x1015,
> +		       "RISC reset failed. mb[24-31] %04xh %04xh %04xh %04xh %04xh %04xh %04xh %04xh\n",
> +		       mb[24], mb[25], mb[26], mb[27], mb[28], mb[29], mb[30],
> +		       mb[31]);
> +	}
> +	return rc;
> +}
> +
> /**
>  * qla24xx_reset_risc() - Perform full reset of ISP24xx RISC.
>  * @vha: HA context
> @@ -2783,6 +2826,7 @@ qla24xx_reset_risc(scsi_qla_host_t *vha)
> 	uint16_t wd;
> 	static int abts_cnt; /* ISP abort retry counts */
> 	int rval = QLA_SUCCESS;
> +	int print = 1;
> 
> 	spin_lock_irqsave(&ha->hardware_lock, flags);
> 
> @@ -2871,17 +2915,26 @@ qla24xx_reset_risc(scsi_qla_host_t *vha)
> 	rd_reg_dword(&reg->hccr);
> 
> 	wrt_reg_dword(&reg->hccr, HCCRX_CLR_RISC_RESET);
> +	mdelay(10);
> 	rd_reg_dword(&reg->hccr);
> 
> -	rd_reg_word(&reg->mailbox0);
> -	for (cnt = 60; rd_reg_word(&reg->mailbox0) != 0 &&
> -	    rval == QLA_SUCCESS; cnt--) {
> +	wd = rd_reg_word(&reg->mailbox0);
> +	for (cnt = 300; wd != 0 && rval == QLA_SUCCESS; cnt--) {
> 		barrier();
> -		if (cnt)
> -			udelay(5);
> -		else
> +		if (cnt) {
> +			mdelay(1);
> +			if (print && qla_chk_risc_recovery(vha))
> +				print = 0;
> +
> +			wd = rd_reg_word(&reg->mailbox0);
> +		} else {
> 			rval = QLA_FUNCTION_TIMEOUT;
> +
> +			ql_log(ql_log_warn, vha, 0x015e,
> +			       "RISC reset timeout\n");
> +		}
> 	}
> +
> 	if (rval == QLA_SUCCESS)
> 		set_bit(RISC_RDY_AFT_RESET, &ha->fw_dump_cap_flags);
> 
> -- 
> 2.19.0.rc0
> 

Looks Good. 

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>

--
Himanshu Madhani	 Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 08/11] qla2xxx: fix crash in PCIe error handling
  2021-03-23  4:42 ` [PATCH 08/11] qla2xxx: fix crash in PCIe error handling Nilesh Javali
@ 2021-03-24 16:21   ` Himanshu Madhani
  0 siblings, 0 replies; 29+ messages in thread
From: Himanshu Madhani @ 2021-03-24 16:21 UTC (permalink / raw)
  To: Nilesh Javali; +Cc: Martin Petersen, linux-scsi, GR-QLogic-Storage-Upstream



> On Mar 22, 2021, at 11:42 PM, Nilesh Javali <njavali@marvell.com> wrote:
> 
> From: Quinn Tran <qutran@marvell.com>
> 
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: qla2x00_abort_isp+0x21/0x6b0 [qla2xxx] PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> CPU: 0 PID: 1715 Comm: kworker/0:2
> Tainted: GOE 4.12.14-122.37-default #1 SLE12-SP5
> Hardware name: HPE Superdome Flex/Superdome Flex, BIOS
> Bundle:3.30.100 SFW:IP147.007.004.017.000.2009211957 09/21/2020
> Workqueue: events aer_recover_work_func
> task: ffff9e399c14ca80 task.stack: ffffc1c58e4ac000
> RIP: 0010:qla2x00_abort_isp+0x21/0x6b0 [qla2xxx]
> RSP: 0018:ffffc1c58e4afd50 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: ffff9e419cdef480 RCX: 0000000000000000
> RDX: ffff9e399c14ca80 RSI: 0000000000000246 RDI: ffff9e419bbc27b8
> RBP: ffff9e419bbc27b8 R08: 0000000000000004 R09: 00000000a0440000
> R10: 0000000000000000 R11: ffff9e399416d1a0 R12: ffff9e419cdef000
> R13: ffff9e3a7cfae800 R14: ffff9e3a7cfae800 R15: 00000000000000c0
> FS:  0000000000000000(0000) GS:ffff9e39a0000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 00000006cd00a005 CR4: 00000000007606f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
>  qla2xxx_pci_slot_reset+0x141/0x160 [qla2xxx]
>  report_slot_reset+0x41/0x80
>  ? merge_result.part.4+0x30/0x30
>  pci_walk_bus+0x70/0x90
>  pcie_do_recovery+0x1db/0x2e0
>  aer_recover_work_func+0xc2/0xf0
>  process_one_work+0x14c/0x390
> 
> Disable board_disable logic where driver resources are freed
> while OS is in the process of recovering the adapter.
> 
> Signed-off-by: Quinn Tran <qutran@marvell.com>
> Signed-off-by: Nilesh Javali <njavali@marvell.com>
> ---
> drivers/scsi/qla2xxx/qla_dbg.c    |  32 ++++++
> drivers/scsi/qla2xxx/qla_def.h    |  11 ++
> drivers/scsi/qla2xxx/qla_gbl.h    |   3 +
> drivers/scsi/qla2xxx/qla_init.c   |  40 ++++---
> drivers/scsi/qla2xxx/qla_inline.h |  29 +++++
> drivers/scsi/qla2xxx/qla_iocb.c   |  60 +++++++++--
> drivers/scsi/qla2xxx/qla_isr.c    |   9 +-
> drivers/scsi/qla2xxx/qla_mbx.c    |   3 +-
> drivers/scsi/qla2xxx/qla_nvme.c   |  10 +-
> drivers/scsi/qla2xxx/qla_os.c     | 171 ++++++++++++++++++------------
> 10 files changed, 264 insertions(+), 104 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_dbg.c b/drivers/scsi/qla2xxx/qla_dbg.c
> index 144a893e7335..059f6f9b8698 100644
> --- a/drivers/scsi/qla2xxx/qla_dbg.c
> +++ b/drivers/scsi/qla2xxx/qla_dbg.c
> @@ -113,8 +113,18 @@ qla27xx_dump_mpi_ram(struct qla_hw_data *ha, uint32_t addr, uint32_t *ram,
> 	uint32_t stat;
> 	ulong i, j, timer = 6000000;
> 	int rval = QLA_FUNCTION_FAILED;
> +	scsi_qla_host_t *vha = pci_get_drvdata(ha->pdev);
> 
> 	clear_bit(MBX_INTERRUPT, &ha->mbx_cmd_flags);
> +
> +	stat = rd_reg_dword(&reg->host_status);
> +	if (stat == 0xffffffff) {
> +		ql_log(ql_log_info, vha, 0x8041,
> +		       "dump mpi ram detect PCI disconnect, exiting.\n");
> +		qla_schedule_eeh_work(vha);
> +		return rval;
> +	}
> +

I see this block is repeated multiple times in code. Why not make a little helper to reduce code duplication. 

> 	for (i = 0; i < ram_dwords; i += dwords, addr += dwords) {
> 		if (i + dwords > ram_dwords)
> 			dwords = ram_dwords - i;
> @@ -139,6 +149,13 @@ qla27xx_dump_mpi_ram(struct qla_hw_data *ha, uint32_t addr, uint32_t *ram,
> 			udelay(5);
> 
> 			stat = rd_reg_dword(&reg->host_status);
> +			if (stat == 0xffffffff) {
> +				ql_log(ql_log_info, vha, 0x8041,
> +				       "dump mpi ram detect PCI disconnect, exiting.\n");
> +				qla_schedule_eeh_work(vha);
> +				return rval;
> +			}
> +
> 			/* Check for pending interrupts. */
> 			if (!(stat & HSRX_RISC_INT))
> 				continue;
> @@ -192,9 +209,18 @@ qla24xx_dump_ram(struct qla_hw_data *ha, uint32_t addr, __be32 *ram,
> 	uint32_t dwords = qla2x00_gid_list_size(ha) / 4;
> 	uint32_t stat;
> 	ulong i, j, timer = 6000000;
> +	scsi_qla_host_t *vha = pci_get_drvdata(ha->pdev);
> 
> 	clear_bit(MBX_INTERRUPT, &ha->mbx_cmd_flags);
> 
> +	stat = rd_reg_dword(&reg->host_status);
> +	if (stat == 0xffffffff) {
> +		ql_log(ql_log_info, vha, 0x8041,
> +		       "dump ram detect PCI disconnect, exiting.\n");
> +		qla_schedule_eeh_work(vha);
> +		return rval;
> +	}
> +
> 	for (i = 0; i < ram_dwords; i += dwords, addr += dwords) {
> 		if (i + dwords > ram_dwords)
> 			dwords = ram_dwords - i;
> @@ -217,6 +243,12 @@ qla24xx_dump_ram(struct qla_hw_data *ha, uint32_t addr, __be32 *ram,
> 		while (timer--) {
> 			udelay(5);
> 			stat = rd_reg_dword(&reg->host_status);
> +			if (stat == 0xffffffff) {
> +				ql_log(ql_log_info, vha, 0x8041,
> +				       "dump ram detect PCI disconnect, exiting.\n");
> +				qla_schedule_eeh_work(vha);
> +				return rval;
> +			}
> 
> 			/* Check for pending interrupts. */
> 			if (!(stat & HSRX_RISC_INT))
> diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
> index 3d09f31895e7..b6b4d4a5b2e8 100644
> --- a/drivers/scsi/qla2xxx/qla_def.h
> +++ b/drivers/scsi/qla2xxx/qla_def.h
> @@ -396,6 +396,7 @@ typedef union {
> 	} b;
> } port_id_t;
> #define INVALID_PORT_ID	0xFFFFFF
> +#define ISP_REG16_DISCONNECT 0xFFFF
> 
> static inline le_id_t be_id_to_le(be_id_t id)
> {
> @@ -3857,6 +3858,13 @@ struct qla_hw_data_stat {
> 	u32 num_mpi_reset;
> };
> 
> +/* refer to pcie_do_recovery reference */
> +typedef enum {
> +	QLA_PCI_RESUME,
> +	QLA_PCI_ERR_DETECTED,
> +	QLA_PCI_MMIO_ENABLED,
> +	QLA_PCI_SLOT_RESET,
> +} pci_error_state_t;
> /*
>  * Qlogic host adapter specific data structure.
> */
> @@ -3928,6 +3936,7 @@ struct qla_hw_data {
> 		uint32_t	max_req_queue_warned:1;
> 		uint32_t	plogi_template_valid:1;
> 		uint32_t	port_isolated:1;
> +		uint32_t	eeh_work_pending:1;
> 	} flags;
> 
> 	uint16_t max_exchg;
> @@ -4607,6 +4616,7 @@ struct qla_hw_data {
> #define DEFAULT_ZIO_THRESHOLD 5
> 
> 	struct qla_hw_data_stat stat;
> +	pci_error_state_t pci_error_state;
> };
> 
> struct active_regions {
> @@ -4727,6 +4737,7 @@ typedef struct scsi_qla_host {
> #define FX00_CRITEMP_RECOVERY	25
> #define FX00_HOST_INFO_RESEND	26
> #define QPAIR_ONLINE_CHECK_NEEDED	27
> +#define DO_EEH_RECOVERY		28
> #define DETECT_SFP_CHANGE	29
> #define N2N_LOGIN_NEEDED	30
> #define IOCB_WORK_ACTIVE	31
> diff --git a/drivers/scsi/qla2xxx/qla_gbl.h b/drivers/scsi/qla2xxx/qla_gbl.h
> index 6486f97d649e..fae5cae6f0a8 100644
> --- a/drivers/scsi/qla2xxx/qla_gbl.h
> +++ b/drivers/scsi/qla2xxx/qla_gbl.h
> @@ -224,6 +224,7 @@ extern int qla2x00_post_uevent_work(struct scsi_qla_host *, u32);
> 
> extern int qla2x00_post_uevent_work(struct scsi_qla_host *, u32);
> extern void qla2x00_disable_board_on_pci_error(struct work_struct *);
> +extern void qla_eeh_work(struct work_struct *);
> extern void qla2x00_sp_compl(srb_t *sp, int);
> extern void qla2xxx_qpair_sp_free_dma(srb_t *sp);
> extern void qla2xxx_qpair_sp_compl(srb_t *sp, int);
> @@ -235,6 +236,8 @@ int qla24xx_post_relogin_work(struct scsi_qla_host *vha);
> void qla2x00_wait_for_sess_deletion(scsi_qla_host_t *);
> void qla24xx_process_purex_rdp(struct scsi_qla_host *vha,
> 			       struct purex_item *pkt);
> +void qla_pci_set_eeh_busy(struct scsi_qla_host *);
> +void qla_schedule_eeh_work(struct scsi_qla_host *);
> 
> /*
>  * Global Functions in qla_mid.c source file.
> diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
> index 19681d3c5b7a..cd051eee8cd1 100644
> --- a/drivers/scsi/qla2xxx/qla_init.c
> +++ b/drivers/scsi/qla2xxx/qla_init.c
> @@ -6932,22 +6932,18 @@ qla2x00_abort_isp_cleanup(scsi_qla_host_t *vha)
> 	}
> 	spin_unlock_irqrestore(&ha->vport_slock, flags);
> 
> -	if (!ha->flags.eeh_busy) {
> -		/* Make sure for ISP 82XX IO DMA is complete */
> -		if (IS_P3P_TYPE(ha)) {
> -			qla82xx_chip_reset_cleanup(vha);
> -			ql_log(ql_log_info, vha, 0x00b4,
> -			    "Done chip reset cleanup.\n");
> -
> -			/* Done waiting for pending commands.
> -			 * Reset the online flag.
> -			 */
> -			vha->flags.online = 0;
> -		}
> +	/* Make sure for ISP 82XX IO DMA is complete */
> +	if (IS_P3P_TYPE(ha)) {
> +		qla82xx_chip_reset_cleanup(vha);
> +		ql_log(ql_log_info, vha, 0x00b4,
> +		       "Done chip reset cleanup.\n");
> 
> -		/* Requeue all commands in outstanding command list. */
> -		qla2x00_abort_all_cmds(vha, DID_RESET << 16);
> +		/* Done waiting for pending commands. Reset online flag */
> +		vha->flags.online = 0;
> 	}
> +
> +	/* Requeue all commands in outstanding command list. */
> +	qla2x00_abort_all_cmds(vha, DID_RESET << 16);
> 	/* memory barrier */
> 	wmb();
> }
> @@ -6978,6 +6974,12 @@ qla2x00_abort_isp(scsi_qla_host_t *vha)
> 		if (vha->hw->flags.port_isolated)
> 			return status;
> 
> +		if (qla2x00_isp_reg_stat(ha)) {
> +			ql_log(ql_log_info, vha, 0x803f,
> +			       "PCI/Register disconnect 1, exiting.\n");
> +			return status;
> +		}
> +

I would like to see more meaningful disconnect message. Please elaborate log message, they are very crucial in debugging. 

> 		if (test_and_clear_bit(ISP_ABORT_TO_ROM, &vha->dpc_flags)) {
> 			ha->flags.chip_reset_done = 1;
> 			vha->flags.online = 1;
> @@ -7017,8 +7019,18 @@ qla2x00_abort_isp(scsi_qla_host_t *vha)
> 
> 		ha->isp_ops->get_flash_version(vha, req->ring);
> 
> +		if (qla2x00_isp_reg_stat(ha)) {
> +			ql_log(ql_log_info, vha, 0x803f,
> +			       "PCI/Register disconnect 2, exiting.\n");
> +			return status;
> +		}

Same comment as earlier log (also use unique message id)

> 		ha->isp_ops->nvram_config(vha);
> 
> +		if (qla2x00_isp_reg_stat(ha)) {
> +			ql_log(ql_log_info, vha, 0x803f,
> +			       "PCI/Register disconnect 3, exiting.\n");
> +			return status;
> +		}

Same here as previous comment. (Please use unique message id)

> 		if (!qla2x00_restart_isp(vha)) {
> 			clear_bit(RESET_MARKER_NEEDED, &vha->dpc_flags);
> 
> diff --git a/drivers/scsi/qla2xxx/qla_inline.h b/drivers/scsi/qla2xxx/qla_inline.h
> index e80e41b6c9e1..12eb74ae1859 100644
> --- a/drivers/scsi/qla2xxx/qla_inline.h
> +++ b/drivers/scsi/qla2xxx/qla_inline.h
> @@ -432,3 +432,32 @@ qla_put_iocbs(struct qla_qpair *qp, struct iocb_resource *iores)
> 	}
> 	iores->res_type = RESOURCE_NONE;
> }
> +
> +#define ISP_REG_DISCONNECT 0xffffffffU
> +/**************************************************************************
> + * qla2x00_isp_reg_stat
> + *
> + * Description:
> + *        Read the host status register of ISP before aborting the command.
> + *
> + * Input:
> + *       ha = pointer to host adapter structure.
> + *
> + *
> + * Returns:
> + *       Either true or false.
> + *
> + * Note: Return true if there is register disconnect.
> + **************************************************************************/
> +static inline
> +uint32_t qla2x00_isp_reg_stat(struct qla_hw_data *ha)
> +{
> +	struct device_reg_24xx __iomem *reg = &ha->iobase->isp24;
> +	struct device_reg_82xx __iomem *reg82 = &ha->iobase->isp82;
> +
> +	if (IS_P3P_TYPE(ha))
> +		return ((rd_reg_dword(&reg82->host_int)) == ISP_REG_DISCONNECT);
> +	else
> +		return ((rd_reg_dword(&reg->host_status)) ==
> +			ISP_REG_DISCONNECT);
> +}
> diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
> index f26a7a14fce9..a86a856215c5 100644
> --- a/drivers/scsi/qla2xxx/qla_iocb.c
> +++ b/drivers/scsi/qla2xxx/qla_iocb.c
> @@ -1645,8 +1645,14 @@ qla24xx_start_scsi(srb_t *sp)
> 		goto queuing_error;
> 
> 	if (req->cnt < (req_cnt + 2)) {
> -		cnt = IS_SHADOW_REG_CAPABLE(ha) ? *req->out_ptr :
> -		    rd_reg_dword_relaxed(req->req_q_out);
> +		if (IS_SHADOW_REG_CAPABLE(ha)) {
> +			cnt = *req->out_ptr;
> +		} else {
> +			cnt = rd_reg_dword_relaxed(req->req_q_out);
> +			if (qla2x00_check_reg16_for_disconnect(vha, cnt))
> +				goto queuing_error;
> +		}
> +
> 		if (req->ring_index < cnt)
> 			req->cnt = cnt - req->ring_index;
> 		else
> @@ -1842,8 +1848,13 @@ qla24xx_dif_start_scsi(srb_t *sp)
> 		goto queuing_error;
> 
> 	if (req->cnt < (req_cnt + 2)) {
> -		cnt = IS_SHADOW_REG_CAPABLE(ha) ? *req->out_ptr :
> -		    rd_reg_dword_relaxed(req->req_q_out);
> +		if (IS_SHADOW_REG_CAPABLE(ha)) {
> +			cnt = *req->out_ptr;
> +		} else {
> +			cnt = rd_reg_dword_relaxed(req->req_q_out);
> +			if (qla2x00_check_reg16_for_disconnect(vha, cnt))
> +				goto queuing_error;
> +		}
> 		if (req->ring_index < cnt)
> 			req->cnt = cnt - req->ring_index;
> 		else
> @@ -1922,6 +1933,7 @@ qla24xx_dif_start_scsi(srb_t *sp)
> 
> 	qla_put_iocbs(sp->qpair, &sp->iores);
> 	spin_unlock_irqrestore(&ha->hardware_lock, flags);
> +
> 	return QLA_FUNCTION_FAILED;
> }
> 
> @@ -1991,8 +2003,14 @@ qla2xxx_start_scsi_mq(srb_t *sp)
> 		goto queuing_error;
> 
> 	if (req->cnt < (req_cnt + 2)) {
> -		cnt = IS_SHADOW_REG_CAPABLE(ha) ? *req->out_ptr :
> -		    rd_reg_dword_relaxed(req->req_q_out);
> +		if (IS_SHADOW_REG_CAPABLE(ha)) {
> +			cnt = *req->out_ptr;
> +		} else {
> +			cnt = rd_reg_dword_relaxed(req->req_q_out);
> +			if (qla2x00_check_reg16_for_disconnect(vha, cnt))
> +				goto queuing_error;
> +		}
> +
> 		if (req->ring_index < cnt)
> 			req->cnt = cnt - req->ring_index;
> 		else
> @@ -2203,8 +2221,14 @@ qla2xxx_dif_start_scsi_mq(srb_t *sp)
> 		goto queuing_error;
> 
> 	if (req->cnt < (req_cnt + 2)) {
> -		cnt = IS_SHADOW_REG_CAPABLE(ha) ? *req->out_ptr :
> -		    rd_reg_dword_relaxed(req->req_q_out);
> +		if (IS_SHADOW_REG_CAPABLE(ha)) {
> +			cnt = *req->out_ptr;
> +		} else {
> +			cnt = rd_reg_dword_relaxed(req->req_q_out);
> +			if (qla2x00_check_reg16_for_disconnect(vha, cnt))
> +				goto queuing_error;
> +		}
> +
> 		if (req->ring_index < cnt)
> 			req->cnt = cnt - req->ring_index;
> 		else
> @@ -2281,6 +2305,7 @@ qla2xxx_dif_start_scsi_mq(srb_t *sp)
> 
> 	qla_put_iocbs(sp->qpair, &sp->iores);
> 	spin_unlock_irqrestore(&qpair->qp_lock, flags);
> +
> 	return QLA_FUNCTION_FAILED;
> }
> 
> @@ -2325,6 +2350,11 @@ __qla2x00_alloc_iocbs(struct qla_qpair *qpair, srb_t *sp)
> 			cnt = qla2x00_debounce_register(
> 			    ISP_REQ_Q_OUT(ha, &reg->isp));
> 
> +		if (!qpair->use_shadow_reg && cnt == ISP_REG16_DISCONNECT) {
> +			qla_schedule_eeh_work(vha);
> +			return NULL;
> +		}
> +
> 		if  (req->ring_index < cnt)
> 			req->cnt = cnt - req->ring_index;
> 		else
> @@ -3739,6 +3769,9 @@ qla2x00_start_sp(srb_t *sp)
> 	void *pkt;
> 	unsigned long flags;
> 
> +	if (vha->hw->flags.eeh_busy)
> +		return -EIO;
> +
> 	spin_lock_irqsave(qp->qp_lock_ptr, flags);
> 	pkt = __qla2x00_alloc_iocbs(sp->qpair, sp);
> 	if (!pkt) {
> @@ -3956,8 +3989,14 @@ qla2x00_start_bidir(srb_t *sp, struct scsi_qla_host *vha, uint32_t tot_dsds)
> 
> 	/* Check for room on request queue. */
> 	if (req->cnt < req_cnt + 2) {
> -		cnt = IS_SHADOW_REG_CAPABLE(ha) ? *req->out_ptr :
> -		    rd_reg_dword_relaxed(req->req_q_out);
> +		if (IS_SHADOW_REG_CAPABLE(ha)) {
> +			cnt = *req->out_ptr;
> +		} else {
> +			cnt = rd_reg_dword_relaxed(req->req_q_out);
> +			if (qla2x00_check_reg16_for_disconnect(vha, cnt))
> +				goto queuing_error;
> +		}
> +
> 		if  (req->ring_index < cnt)
> 			req->cnt = cnt - req->ring_index;
> 		else
> @@ -3996,5 +4035,6 @@ qla2x00_start_bidir(srb_t *sp, struct scsi_qla_host *vha, uint32_t tot_dsds)
> 	qla2x00_start_iocbs(vha, req);
> queuing_error:
> 	spin_unlock_irqrestore(&ha->hardware_lock, flags);
> +
> 	return rval;
> }
> diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
> index 5e188375c871..f21007c8ca51 100644
> --- a/drivers/scsi/qla2xxx/qla_isr.c
> +++ b/drivers/scsi/qla2xxx/qla_isr.c
> @@ -270,12 +270,7 @@ qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *vha, uint32_t reg)
> 		if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags) &&
> 		    !test_bit(PFLG_DRIVER_REMOVING, &vha->pci_flags) &&
> 		    !test_bit(PFLG_DRIVER_PROBING, &vha->pci_flags)) {
> -			/*
> -			 * Schedule this (only once) on the default system
> -			 * workqueue so that all the adapter workqueues and the
> -			 * DPC thread can be shutdown cleanly.
> -			 */
> -			schedule_work(&vha->hw->board_disable);
> +			qla_schedule_eeh_work(vha);
> 		}
> 		return true;
> 	} else
> @@ -1657,8 +1652,6 @@ qla2x00_async_event(scsi_qla_host_t *vha, struct rsp_que *rsp, uint16_t *mb)
> 	case MBA_TEMPERATURE_ALERT:
> 		ql_dbg(ql_dbg_async, vha, 0x505e,
> 		    "TEMPERATURE ALERT: %04x %04x %04x\n", mb[1], mb[2], mb[3]);
> -		if (mb[1] == 0x12)
> -			schedule_work(&ha->board_disable);
> 		break;
> 
> 	case MBA_TRANS_INSERT:
> diff --git a/drivers/scsi/qla2xxx/qla_mbx.c b/drivers/scsi/qla2xxx/qla_mbx.c
> index 06c99963b2c9..0149f84cdd8e 100644
> --- a/drivers/scsi/qla2xxx/qla_mbx.c
> +++ b/drivers/scsi/qla2xxx/qla_mbx.c
> @@ -161,7 +161,8 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
> 	/* check if ISP abort is active and return cmd with timeout */
> 	if ((test_bit(ABORT_ISP_ACTIVE, &base_vha->dpc_flags) ||
> 	    test_bit(ISP_ABORT_RETRY, &base_vha->dpc_flags) ||
> -	    test_bit(ISP_ABORT_NEEDED, &base_vha->dpc_flags)) &&
> +	    test_bit(ISP_ABORT_NEEDED, &base_vha->dpc_flags) ||
> +	    ha->flags.eeh_busy) &&
> 	    !is_rom_cmd(mcp->mb[0])) {
> 		ql_log(ql_log_info, vha, 0x1005,
> 		    "Cmd 0x%x aborted with timeout since ISP Abort is pending\n",
> diff --git a/drivers/scsi/qla2xxx/qla_nvme.c b/drivers/scsi/qla2xxx/qla_nvme.c
> index 0237588f48b0..0cacb667a88b 100644
> --- a/drivers/scsi/qla2xxx/qla_nvme.c
> +++ b/drivers/scsi/qla2xxx/qla_nvme.c
> @@ -398,8 +398,13 @@ static inline int qla2x00_start_nvme_mq(srb_t *sp)
> 	}
> 	req_cnt = qla24xx_calc_iocbs(vha, tot_dsds);
> 	if (req->cnt < (req_cnt + 2)) {
> -		cnt = IS_SHADOW_REG_CAPABLE(ha) ? *req->out_ptr :
> -		    rd_reg_dword_relaxed(req->req_q_out);
> +		if (IS_SHADOW_REG_CAPABLE(ha)) {
> +			cnt = *req->out_ptr;
> +		} else {
> +			cnt = rd_reg_dword_relaxed(req->req_q_out);
> +			if (qla2x00_check_reg16_for_disconnect(vha, cnt))
> +				goto queuing_error;
> +		}
> 
> 		if (req->ring_index < cnt)
> 			req->cnt = cnt - req->ring_index;
> @@ -536,6 +541,7 @@ static inline int qla2x00_start_nvme_mq(srb_t *sp)
> 
> queuing_error:
> 	spin_unlock_irqrestore(&qpair->qp_lock, flags);
> +
> 	return rval;
> }
> 
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index 6a57399b515f..135cadecdaa4 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -971,6 +971,13 @@ qla2xxx_mqueuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd,
> 		goto qc24_fail_command;
> 	}
> 
> +	if (!qpair->online) {
> +		ql_dbg(ql_dbg_io, vha, 0x3077,
> +		       "qpair not online. eeh_busy=%d.\n", ha->flags.eeh_busy);
> +		cmd->result = DID_NO_CONNECT << 16;
> +		goto qc24_fail_command;
> +	}
> +
> 	if (!fcport || fcport->deleted) {
> 		cmd->result = DID_IMM_RETRY << 16;
> 		goto qc24_fail_command;
> @@ -1200,35 +1207,6 @@ qla2x00_wait_for_chip_reset(scsi_qla_host_t *vha)
> 	return return_status;
> }
> 
> -#define ISP_REG_DISCONNECT 0xffffffffU
> -/**************************************************************************
> -* qla2x00_isp_reg_stat
> -*
> -* Description:
> -*	Read the host status register of ISP before aborting the command.
> -*
> -* Input:
> -*	ha = pointer to host adapter structure.
> -*
> -*
> -* Returns:
> -*	Either true or false.
> -*
> -* Note:	Return true if there is register disconnect.
> -**************************************************************************/
> -static inline
> -uint32_t qla2x00_isp_reg_stat(struct qla_hw_data *ha)
> -{
> -	struct device_reg_24xx __iomem *reg = &ha->iobase->isp24;
> -	struct device_reg_82xx __iomem *reg82 = &ha->iobase->isp82;
> -
> -	if (IS_P3P_TYPE(ha))
> -		return ((rd_reg_dword(&reg82->host_int)) == ISP_REG_DISCONNECT);
> -	else
> -		return ((rd_reg_dword(&reg->host_status)) ==
> -			ISP_REG_DISCONNECT);
> -}
> -
> /**************************************************************************
> * qla2xxx_eh_abort
> *
> @@ -1262,6 +1240,7 @@ qla2xxx_eh_abort(struct scsi_cmnd *cmd)
> 	if (qla2x00_isp_reg_stat(ha)) {
> 		ql_log(ql_log_info, vha, 0x8042,
> 		    "PCI/Register disconnect, exiting.\n");
> +		qla_pci_set_eeh_busy(vha);
> 		return FAILED;
> 	}
> 
> @@ -1455,6 +1434,7 @@ qla2xxx_eh_device_reset(struct scsi_cmnd *cmd)
> 	if (qla2x00_isp_reg_stat(ha)) {
> 		ql_log(ql_log_info, vha, 0x803e,
> 		    "PCI/Register disconnect, exiting.\n");
> +		qla_pci_set_eeh_busy(vha);
> 		return FAILED;
> 	}
> 
> @@ -1471,6 +1451,7 @@ qla2xxx_eh_target_reset(struct scsi_cmnd *cmd)
> 	if (qla2x00_isp_reg_stat(ha)) {
> 		ql_log(ql_log_info, vha, 0x803f,
> 		    "PCI/Register disconnect, exiting.\n");
> +		qla_pci_set_eeh_busy(vha);
> 		return FAILED;
> 	}
> 
> @@ -1506,6 +1487,7 @@ qla2xxx_eh_bus_reset(struct scsi_cmnd *cmd)
> 	if (qla2x00_isp_reg_stat(ha)) {
> 		ql_log(ql_log_info, vha, 0x8040,
> 		    "PCI/Register disconnect, exiting.\n");
> +		qla_pci_set_eeh_busy(vha);
> 		return FAILED;
> 	}
> 
> @@ -1583,7 +1565,7 @@ qla2xxx_eh_host_reset(struct scsi_cmnd *cmd)
> 	if (qla2x00_isp_reg_stat(ha)) {
> 		ql_log(ql_log_info, vha, 0x8041,
> 		    "PCI/Register disconnect, exiting.\n");
> -		schedule_work(&ha->board_disable);
> +		qla_pci_set_eeh_busy(vha);
> 		return SUCCESS;
> 	}
> 
> @@ -6669,6 +6651,9 @@ qla2x00_do_dpc(void *data)
> 
> 		schedule();
> 
> +		if (test_and_clear_bit(DO_EEH_RECOVERY, &base_vha->dpc_flags))
> +			qla_pci_set_eeh_busy(base_vha);
> +
> 		if (!base_vha->flags.init_done || ha->flags.mbox_busy)
> 			goto end_loop;
> 
> @@ -7385,6 +7370,8 @@ static void qla_pci_error_cleanup(scsi_qla_host_t *vha)
> 	int i;
> 	unsigned long flags;
> 
> +	ql_dbg(ql_dbg_aer, vha, 0x9000,
> +	       "%s\n", __func__);
> 	ha->chip_reset++;
> 
> 	ha->base_qpair->chip_reset = ha->chip_reset;
> @@ -7394,28 +7381,15 @@ static void qla_pci_error_cleanup(scsi_qla_host_t *vha)
> 			    ha->base_qpair->chip_reset;
> 	}
> 
> -	/* purge MBox commands */
> -	if (atomic_read(&ha->num_pend_mbx_stage3)) {
> -		clear_bit(MBX_INTR_WAIT, &ha->mbx_cmd_flags);
> -		complete(&ha->mbx_intr_comp);
> -	}
> -
> -	i = 0;
> -
> -	while (atomic_read(&ha->num_pend_mbx_stage3) ||
> -	    atomic_read(&ha->num_pend_mbx_stage2) ||
> -	    atomic_read(&ha->num_pend_mbx_stage1)) {
> -		msleep(20);
> -		i++;
> -		if (i > 50)
> -			break;
> -	}
> -
> -	ha->flags.purge_mbox = 0;
> +	/* purge mailbox might take a while. Slot Reset/chip reset
> +	 * will take care of the purge
> +	 */
> 

Please fix comment style for multi line comment. 

> 	mutex_lock(&ha->mq_lock);
> +	ha->base_qpair->online = 0;
> 	list_for_each_entry(qpair, &base_vha->qp_list, qp_list_elem)
> 		qpair->online = 0;
> +	wmb();
> 	mutex_unlock(&ha->mq_lock);
> 
> 	qla2x00_mark_all_devices_lost(vha);
> @@ -7452,14 +7426,17 @@ qla2xxx_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
> {
> 	scsi_qla_host_t *vha = pci_get_drvdata(pdev);
> 	struct qla_hw_data *ha = vha->hw;
> +	pci_ers_result_t ret = PCI_ERS_RESULT_NEED_RESET;
> 
> -	ql_dbg(ql_dbg_aer, vha, 0x9000,
> -	    "PCI error detected, state %x.\n", state);
> +	ql_log(ql_log_warn, vha, 0x9000,
> +	       "PCI error detected, state %x.\n", state);
> +	ha->pci_error_state = QLA_PCI_ERR_DETECTED;
> 
> 	if (!atomic_read(&pdev->enable_cnt)) {
> 		ql_log(ql_log_info, vha, 0xffff,
> 			"PCI device is disabled,state %x\n", state);
> -		return PCI_ERS_RESULT_NEED_RESET;
> +		ret = PCI_ERS_RESULT_NEED_RESET;
> +		goto out;
> 	}
> 
> 	switch (state) {
> @@ -7469,11 +7446,12 @@ qla2xxx_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
> 			set_bit(QPAIR_ONLINE_CHECK_NEEDED, &vha->dpc_flags);
> 			qla2xxx_wake_dpc(vha);
> 		}
> -		return PCI_ERS_RESULT_CAN_RECOVER;
> +		ret = PCI_ERS_RESULT_CAN_RECOVER;
> +		break;
> 	case pci_channel_io_frozen:
> -		ha->flags.eeh_busy = 1;
> -		qla_pci_error_cleanup(vha);
> -		return PCI_ERS_RESULT_NEED_RESET;
> +		qla_pci_set_eeh_busy(vha);
> +		ret = PCI_ERS_RESULT_NEED_RESET;
> +		break;
> 	case pci_channel_io_perm_failure:
> 		ha->flags.pci_channel_io_perm_failure = 1;
> 		qla2x00_abort_all_cmds(vha, DID_NO_CONNECT << 16);
> @@ -7481,9 +7459,12 @@ qla2xxx_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
> 			set_bit(QPAIR_ONLINE_CHECK_NEEDED, &vha->dpc_flags);
> 			qla2xxx_wake_dpc(vha);
> 		}
> -		return PCI_ERS_RESULT_DISCONNECT;
> +		ret = PCI_ERS_RESULT_DISCONNECT;
> 	}
> -	return PCI_ERS_RESULT_NEED_RESET;
> +out:
> +	ql_dbg(ql_dbg_aer, vha, 0x600d,
> +	       "PCI error detected returning [%x].\n", ret);
> +	return ret;
> }
> 
> static pci_ers_result_t
> @@ -7497,6 +7478,10 @@ qla2xxx_pci_mmio_enabled(struct pci_dev *pdev)
> 	struct device_reg_2xxx __iomem *reg = &ha->iobase->isp;
> 	struct device_reg_24xx __iomem *reg24 = &ha->iobase->isp24;
> 
> +	ql_log(ql_log_warn, base_vha, 0x9000,
> +	       "mmio enabled\n");
> +
> +	ha->pci_error_state = QLA_PCI_MMIO_ENABLED;
> 	if (IS_QLA82XX(ha))
> 		return PCI_ERS_RESULT_RECOVERED;
> 
> @@ -7520,10 +7505,11 @@ qla2xxx_pci_mmio_enabled(struct pci_dev *pdev)
> 		ql_log(ql_log_info, base_vha, 0x9003,
> 		    "RISC paused -- mmio_enabled, Dumping firmware.\n");
> 		qla2xxx_dump_fw(base_vha);
> -
> -		return PCI_ERS_RESULT_NEED_RESET;
> -	} else
> -		return PCI_ERS_RESULT_RECOVERED;
> +	}
> +	/* set PCI_ERS_RESULT_NEED_RESET to trigger call to qla2xxx_pci_slot_reset */
> +	ql_dbg(ql_dbg_aer, base_vha, 0x600d,
> +	       "mmio enabled returning.\n");
> +	return PCI_ERS_RESULT_NEED_RESET;
> }
> 
> static pci_ers_result_t
> @@ -7535,9 +7521,10 @@ qla2xxx_pci_slot_reset(struct pci_dev *pdev)
> 	int rc;
> 	struct qla_qpair *qpair = NULL;
> 
> -	ql_dbg(ql_dbg_aer, base_vha, 0x9004,
> -	    "Slot Reset.\n");
> +	ql_log(ql_log_warn, base_vha, 0x9004,
> +	       "Slot Reset.\n");
> 
> +	ha->pci_error_state = QLA_PCI_SLOT_RESET;
> 	/* Workaround: qla2xxx driver which access hardware earlier
> 	 * needs error state to be pci_channel_io_online.
> 	 * Otherwise mailbox command timesout.
> @@ -7571,16 +7558,24 @@ qla2xxx_pci_slot_reset(struct pci_dev *pdev)
> 		qpair->online = 1;
> 	mutex_unlock(&ha->mq_lock);
> 
> +	ha->flags.eeh_busy = 0;
> 	base_vha->flags.online = 1;
> 	set_bit(ABORT_ISP_ACTIVE, &base_vha->dpc_flags);
> -	if (ha->isp_ops->abort_isp(base_vha) == QLA_SUCCESS)
> -		ret =  PCI_ERS_RESULT_RECOVERED;
> +	ha->isp_ops->abort_isp(base_vha);
> 	clear_bit(ABORT_ISP_ACTIVE, &base_vha->dpc_flags);
> 
> +	if (qla2x00_isp_reg_stat(ha)) {
> +		ha->flags.eeh_busy = 1;
> +		qla_pci_error_cleanup(base_vha);
> +		ql_log(ql_log_warn, base_vha, 0x9005,
> +		       "Device unable to recover from PCI error.\n");
> +	} else {
> +		ret =  PCI_ERS_RESULT_RECOVERED;
> +	}
> 
> exit_slot_reset:
> 	ql_dbg(ql_dbg_aer, base_vha, 0x900e,
> -	    "slot_reset return %x.\n", ret);
> +	    "Slot Reset returning %x.\n", ret);
> 
> 	return ret;
> }
> @@ -7592,16 +7587,54 @@ qla2xxx_pci_resume(struct pci_dev *pdev)
> 	struct qla_hw_data *ha = base_vha->hw;
> 	int ret;
> 
> -	ql_dbg(ql_dbg_aer, base_vha, 0x900f,
> -	    "pci_resume.\n");
> +	ql_log(ql_log_warn, base_vha, 0x900f,
> +	       "Pci Resume.\n");
> 
> -	ha->flags.eeh_busy = 0;
> 
> 	ret = qla2x00_wait_for_hba_online(base_vha);
> 	if (ret != QLA_SUCCESS) {
> 		ql_log(ql_log_fatal, base_vha, 0x9002,
> 		    "The device failed to resume I/O from slot/link_reset.\n");
> 	}
> +	ha->pci_error_state = QLA_PCI_RESUME;
> +	ql_dbg(ql_dbg_aer, base_vha, 0x600d,
> +	       "Pci Resume returning.\n");
> +}
> +
> +void qla_pci_set_eeh_busy(struct scsi_qla_host *vha)
> +{
> +	struct qla_hw_data *ha = vha->hw;
> +	struct scsi_qla_host *base_vha = pci_get_drvdata(ha->pdev);
> +	bool do_cleanup = false;
> +	unsigned long flags;
> +
> +	if (ha->flags.eeh_busy)
> +		return;
> +
> +	spin_lock_irqsave(&base_vha->work_lock, flags);
> +	if (!ha->flags.eeh_busy) {
> +		ha->flags.eeh_busy = 1;
> +		do_cleanup = true;
> +	}
> +	spin_unlock_irqrestore(&base_vha->work_lock, flags);
> +
> +	if (do_cleanup)
> +		qla_pci_error_cleanup(base_vha);
> +}
> +
> +/* this routine will schedule a task to pause IO from interrupt context
> + * if caller sees a PCIE error event (register read = 0xf's)
> + */

Please fix comment formatting for multiple line.

> +void qla_schedule_eeh_work(struct scsi_qla_host *vha)
> +{
> +	struct qla_hw_data *ha = vha->hw;
> +	struct scsi_qla_host *base_vha = pci_get_drvdata(ha->pdev);
> +
> +	if (ha->flags.eeh_busy)
> +		return;
> +
> +	set_bit(DO_EEH_RECOVERY, &base_vha->dpc_flags);
> +	qla2xxx_wake_dpc(base_vha);
> }
> 
> static void
> -- 
> 2.19.0.rc0
> 

--
Himanshu Madhani	 Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 09/11] qla2xxx: fix mailbox recovery during PCIe error
  2021-03-23  4:42 ` [PATCH 09/11] qla2xxx: fix mailbox recovery during PCIe error Nilesh Javali
@ 2021-03-24 16:23   ` Himanshu Madhani
  0 siblings, 0 replies; 29+ messages in thread
From: Himanshu Madhani @ 2021-03-24 16:23 UTC (permalink / raw)
  To: Nilesh Javali; +Cc: Martin Petersen, linux-scsi, GR-QLogic-Storage-Upstream



> On Mar 22, 2021, at 11:42 PM, Nilesh Javali <njavali@marvell.com> wrote:
> 
> From: Quinn Tran <qutran@marvell.com>
> 
> For the mailbox thread that encounter PCIe error, pause that
> thread until PCIe link reset/recovery completed to prevent
> the thread from possibly unmapping any type of DMA resource that might
> be in progress at the same time.
> 
> Signed-off-by: Quinn Tran <qutran@marvell.com>
> Signed-off-by: Nilesh Javali <njavali@marvell.com>
> ---
> drivers/scsi/qla2xxx/qla_mbx.c | 38 ++++++++++++++++++++++++++--------
> 1 file changed, 29 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_mbx.c b/drivers/scsi/qla2xxx/qla_mbx.c
> index 0149f84cdd8e..3bc6020cfb8d 100644
> --- a/drivers/scsi/qla2xxx/qla_mbx.c
> +++ b/drivers/scsi/qla2xxx/qla_mbx.c
> @@ -102,7 +102,7 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
> 	int		rval, i;
> 	unsigned long    flags = 0;
> 	device_reg_t *reg;
> -	uint8_t		abort_active;
> +	uint8_t		abort_active, eeh_delay;
> 	uint8_t		io_lock_on;
> 	uint16_t	command = 0;
> 	uint16_t	*iptr;
> @@ -136,7 +136,7 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
> 		    "PCI error, exiting.\n");
> 		return QLA_FUNCTION_TIMEOUT;
> 	}
> -
> +	eeh_delay = 0;
> 	reg = ha->iobase;
> 	io_lock_on = base_vha->flags.init_done;
> 
> @@ -159,11 +159,10 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
> 	}
> 
> 	/* check if ISP abort is active and return cmd with timeout */
> -	if ((test_bit(ABORT_ISP_ACTIVE, &base_vha->dpc_flags) ||
> -	    test_bit(ISP_ABORT_RETRY, &base_vha->dpc_flags) ||
> -	    test_bit(ISP_ABORT_NEEDED, &base_vha->dpc_flags) ||
> -	    ha->flags.eeh_busy) &&
> -	    !is_rom_cmd(mcp->mb[0])) {
> +	if (((test_bit(ABORT_ISP_ACTIVE, &base_vha->dpc_flags) ||
> +	      test_bit(ISP_ABORT_RETRY, &base_vha->dpc_flags) ||
> +	      test_bit(ISP_ABORT_NEEDED, &base_vha->dpc_flags)) &&
> +	      !is_rom_cmd(mcp->mb[0])) || ha->flags.eeh_busy) {
> 		ql_log(ql_log_info, vha, 0x1005,
> 		    "Cmd 0x%x aborted with timeout since ISP Abort is pending\n",
> 		    mcp->mb[0]);
> @@ -186,7 +185,11 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
> 		return QLA_FUNCTION_TIMEOUT;
> 	}
> 	atomic_dec(&ha->num_pend_mbx_stage1);
> -	if (ha->flags.purge_mbox || chip_reset != ha->chip_reset) {
> +	if (ha->flags.purge_mbox || chip_reset != ha->chip_reset ||
> +	    ha->flags.eeh_busy) {
> +		ql_log(ql_log_warn, vha, 0xd035,
> +		       "Error detected: purge[%d] eeh[%d] cmd=0x%x, Exiting.\n",
> +		       ha->flags.purge_mbox, ha->flags.eeh_busy, mcp->mb[0]);
> 		rval = QLA_ABORTED;
> 		goto premature_exit;
> 	}
> @@ -266,6 +269,8 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
> 		if (!wait_for_completion_timeout(&ha->mbx_intr_comp,
> 		    mcp->tov * HZ)) {
> 			if (chip_reset != ha->chip_reset) {
> +				eeh_delay = ha->flags.eeh_busy ? 1 : 0;
> +
> 				spin_lock_irqsave(&ha->hardware_lock, flags);
> 				ha->flags.mbox_busy = 0;
> 				spin_unlock_irqrestore(&ha->hardware_lock,
> @@ -283,6 +288,8 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
> 
> 		} else if (ha->flags.purge_mbox ||
> 		    chip_reset != ha->chip_reset) {
> +			eeh_delay = ha->flags.eeh_busy ? 1 : 0;
> +
> 			spin_lock_irqsave(&ha->hardware_lock, flags);
> 			ha->flags.mbox_busy = 0;
> 			spin_unlock_irqrestore(&ha->hardware_lock, flags);
> @@ -324,6 +331,8 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
> 		while (!ha->flags.mbox_int) {
> 			if (ha->flags.purge_mbox ||
> 			    chip_reset != ha->chip_reset) {
> +				eeh_delay = ha->flags.eeh_busy ? 1 : 0;
> +
> 				spin_lock_irqsave(&ha->hardware_lock, flags);
> 				ha->flags.mbox_busy = 0;
> 				spin_unlock_irqrestore(&ha->hardware_lock,
> @@ -532,7 +541,8 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
> 				clear_bit(ISP_ABORT_NEEDED, &vha->dpc_flags);
> 				/* Allow next mbx cmd to come in. */
> 				complete(&ha->mbx_cmd_comp);
> -				if (ha->isp_ops->abort_isp(vha)) {
> +				if (ha->isp_ops->abort_isp(vha) &&
> +				    !ha->flags.eeh_busy) {
> 					/* Failed. retry later. */
> 					set_bit(ISP_ABORT_NEEDED,
> 					    &vha->dpc_flags);
> @@ -585,6 +595,16 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
> 		ql_dbg(ql_dbg_mbx, base_vha, 0x1021, "Done %s.\n", __func__);
> 	}
> 
> +	i = 500;
> +	while (i && eeh_delay && (ha->pci_error_state < QLA_PCI_SLOT_RESET)) {
> +		/* The caller of this mailbox encounter pci error.
> +		 * Hold the thread until PCIE link reset complete to make
> +		 * sure caller does not unmap dma while recovery is
> +		 * in progress.
> +		 */

Small nit…. Fix comment formatting for multi line.

> +		msleep(1);
> +		i--;
> +	}
> 	return rval;
> }
> 
> -- 
> 2.19.0.rc0
> 

Code itself looks good. After fixing comment you can add

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>

--
Himanshu Madhani	 Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 10/11] qla2xxx: include AER debug mask to default
  2021-03-23  4:42 ` [PATCH 10/11] qla2xxx: include AER debug mask to default Nilesh Javali
@ 2021-03-24 16:24   ` Himanshu Madhani
  0 siblings, 0 replies; 29+ messages in thread
From: Himanshu Madhani @ 2021-03-24 16:24 UTC (permalink / raw)
  To: Nilesh Javali; +Cc: Martin Petersen, linux-scsi, GR-QLogic-Storage-Upstream



> On Mar 22, 2021, at 11:42 PM, Nilesh Javali <njavali@marvell.com> wrote:
> 
> From: Quinn Tran <qutran@marvell.com>
> 
> Use PCIe AER debug mask as default.
> 
> Signed-off-by: Quinn Tran <qutran@marvell.com>
> Signed-off-by: Nilesh Javali <njavali@marvell.com>
> ---
> drivers/scsi/qla2xxx/qla_dbg.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_dbg.h b/drivers/scsi/qla2xxx/qla_dbg.h
> index 2e59e75c62b5..9eb708e5e22e 100644
> --- a/drivers/scsi/qla2xxx/qla_dbg.h
> +++ b/drivers/scsi/qla2xxx/qla_dbg.h
> @@ -308,7 +308,7 @@ struct qla2xxx_fw_dump {
> };
> 
> #define QL_MSGHDR "qla2xxx"
> -#define QL_DBG_DEFAULT1_MASK    0x1e400000
> +#define QL_DBG_DEFAULT1_MASK    0x1e600000
> 
> #define ql_log_fatal		0 /* display fatal errors */
> #define ql_log_warn		1 /* display critical errors */
> -- 
> 2.19.0.rc0
> 

Looks Good. 

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>

--
Himanshu Madhani	 Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 11/11] qla2xxx: Update version to 10.02.00.106-k
  2021-03-23  4:42 ` [PATCH 11/11] qla2xxx: Update version to 10.02.00.106-k Nilesh Javali
@ 2021-03-24 16:24   ` Himanshu Madhani
  0 siblings, 0 replies; 29+ messages in thread
From: Himanshu Madhani @ 2021-03-24 16:24 UTC (permalink / raw)
  To: Nilesh Javali; +Cc: Martin Petersen, linux-scsi, GR-QLogic-Storage-Upstream



> On Mar 22, 2021, at 11:42 PM, Nilesh Javali <njavali@marvell.com> wrote:
> 
> Signed-off-by: Nilesh Javali <njavali@marvell.com>
> ---
> drivers/scsi/qla2xxx/qla_version.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_version.h b/drivers/scsi/qla2xxx/qla_version.h
> index 72c648442e8d..da11829fa12d 100644
> --- a/drivers/scsi/qla2xxx/qla_version.h
> +++ b/drivers/scsi/qla2xxx/qla_version.h
> @@ -6,9 +6,9 @@
> /*
>  * Driver version
>  */
> -#define QLA2XXX_VERSION      "10.02.00.105-k"
> +#define QLA2XXX_VERSION      "10.02.00.106-k"
> 
> #define QLA_DRIVER_MAJOR_VER	10
> #define QLA_DRIVER_MINOR_VER	2
> #define QLA_DRIVER_PATCH_VER	0
> -#define QLA_DRIVER_BETA_VER	105
> +#define QLA_DRIVER_BETA_VER	106
> -- 
> 2.19.0.rc0
> 

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>

--
Himanshu Madhani	 Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/11] qla2xxx driver bug fixes
  2021-03-23  4:42 [PATCH 00/11] qla2xxx driver bug fixes Nilesh Javali
                   ` (10 preceding siblings ...)
  2021-03-23  4:42 ` [PATCH 11/11] qla2xxx: Update version to 10.02.00.106-k Nilesh Javali
@ 2021-03-24 20:55 ` Laurence Oberman
  11 siblings, 0 replies; 29+ messages in thread
From: Laurence Oberman @ 2021-03-24 20:55 UTC (permalink / raw)
  To: Nilesh Javali, martin.petersen; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

On Mon, 2021-03-22 at 21:42 -0700, Nilesh Javali wrote:
> Martin,
> 
> Please apply the qla2xxx driver bug fixes to the scsi tree at your
> earliest convenience.
> 
> Thanks,
> Nilesh
> 
> Arun Easi (3):
>   qla2xxx: Fix IOPS drop seen in some adapters
>   qla2xxx: Add H:C:T info in the log message for fc ports
>   qla2xxx: Fix crash in qla2xxx_mqueuecommand
> 
> Nilesh Javali (1):
>   qla2xxx: Update version to 10.02.00.106-k
> 
> Quinn Tran (7):
>   qla2xxx: fix stuck session
>   qla2xxx: consolidate zio threshold setting for both fcp & nvme
>   qla2xxx: Fix use after free in bsg
>   qla2xxx: fix RISC RESET completion polling
>   qla2xxx: fix crash in PCIe error handling
>   qla2xxx: fix mailbox recovery during PCIe error
>   qla2xxx: include AER debug mask to default
> 
>  drivers/scsi/qla2xxx/qla_bsg.c     |   3 +-
>  drivers/scsi/qla2xxx/qla_dbg.c     |  32 +++++
>  drivers/scsi/qla2xxx/qla_dbg.h     |   2 +-
>  drivers/scsi/qla2xxx/qla_def.h     |  12 +-
>  drivers/scsi/qla2xxx/qla_gbl.h     |   3 +
>  drivers/scsi/qla2xxx/qla_init.c    | 115 ++++++++++++----
>  drivers/scsi/qla2xxx/qla_inline.h  |  29 ++++
>  drivers/scsi/qla2xxx/qla_iocb.c    |  79 +++++++++--
>  drivers/scsi/qla2xxx/qla_isr.c     |   9 +-
>  drivers/scsi/qla2xxx/qla_mbx.c     |  37 +++--
>  drivers/scsi/qla2xxx/qla_nvme.c    |  10 +-
>  drivers/scsi/qla2xxx/qla_os.c      | 212 ++++++++++++++++-----------
> --
>  drivers/scsi/qla2xxx/qla_target.c  |   2 +-
>  drivers/scsi/qla2xxx/qla_version.h |   4 +-
>  14 files changed, 395 insertions(+), 154 deletions(-)
> 
> 
> base-commit: f749d8b7a9896bc6e5ffe104cc64345037e0b152

These were tested in the Red Hat Lab using aer error injection for a
very specific customer issue and passed our testing too.

qla2xxx: fix stuck session
qla2xxx: fix RISC RESET completion polling
qla2xxx: fix crash in PCIe error handling
qla2xxx: fix mailbox recovery during PCIe error
qla2xxx: include AER debug mask to default

Tested-by: Laurence Oberman <loberman@redhat.com>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/11] qla2xxx: Fix IOPS drop seen in some adapters
  2021-03-24 15:46   ` Himanshu Madhani
@ 2021-03-26  0:53     ` Arun Easi
  2021-03-26 13:38       ` Himanshu Madhani
  0 siblings, 1 reply; 29+ messages in thread
From: Arun Easi @ 2021-03-26  0:53 UTC (permalink / raw)
  To: Himanshu Madhani
  Cc: Nilesh Javali, Martin Petersen, linux-scsi, GR-QLogic-Storage-Upstream

Himanshu,

Thanks for the review. Comments inline..

On Wed, 24 Mar 2021, 8:46am, Himanshu Madhani wrote:

> 
> 
> > On Mar 22, 2021, at 11:42 PM, Nilesh Javali <njavali@marvell.com> wrote:
> > 
> > From: Arun Easi <aeasi@marvell.com>
> > 
> > Removing the response queue processing in the send path is showing IOPS
> > drop. Add back the process_response_queue() call in the send path.
> > 
> 
> Can you share some details of what effect this change made into IOPS? 
> 

I do not remember off the top of my head, I think a few K. I dont have the 
perf setup anymore. Would you still prefer a re-spin of this patch?

Regards,
-Arun

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/11] qla2xxx: Fix IOPS drop seen in some adapters
  2021-03-26  0:53     ` Arun Easi
@ 2021-03-26 13:38       ` Himanshu Madhani
  0 siblings, 0 replies; 29+ messages in thread
From: Himanshu Madhani @ 2021-03-26 13:38 UTC (permalink / raw)
  To: Arun Easi
  Cc: Nilesh Javali, Martin Petersen, linux-scsi, GR-QLogic-Storage-Upstream



> On Mar 25, 2021, at 7:53 PM, Arun Easi <aeasi@marvell.com> wrote:
> 
> Himanshu,
> 
> Thanks for the review. Comments inline..
> 
> On Wed, 24 Mar 2021, 8:46am, Himanshu Madhani wrote:
> 
>> 
>> 
>>> On Mar 22, 2021, at 11:42 PM, Nilesh Javali <njavali@marvell.com> wrote:
>>> 
>>> From: Arun Easi <aeasi@marvell.com>
>>> 
>>> Removing the response queue processing in the send path is showing IOPS
>>> drop. Add back the process_response_queue() call in the send path.
>>> 
>> 
>> Can you share some details of what effect this change made into IOPS? 
>> 
> 
> I do not remember off the top of my head, I think a few K. I dont have the 
> perf setup anymore. Would you still prefer a re-spin of this patch?
> 

I do remember this code path and past effort to improve IOPS with these changes. 

I am okay with this change itself. No need to respin. 

> Regards,
> -Arun

--
Himanshu Madhani	 Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 03/11] qla2xxx: fix stuck session
  2021-03-24 15:53     ` Himanshu Madhani
@ 2021-03-26 17:46       ` Quinn Tran
  2021-03-26 17:53         ` Himanshu Madhani
  0 siblings, 1 reply; 29+ messages in thread
From: Quinn Tran @ 2021-03-26 17:46 UTC (permalink / raw)
  To: Himanshu Madhani, Nilesh Javali
  Cc: Martin Petersen, linux-scsi, Bart Van Assche, GR-QLogic-Storage-Upstream



> On Mar 23, 2021, at 11:31 AM, Bart Van Assche <bvanassche@acm.org> wrote:
> 
> On 3/22/21 9:42 PM, Nilesh Javali wrote:
>> diff --git a/drivers/scsi/qla2xxx/qla_target.c 
>> b/drivers/scsi/qla2xxx/qla_target.c
>> index c48daf52725d..fa8c4dae8dce 100644
>> --- a/drivers/scsi/qla2xxx/qla_target.c
>> +++ b/drivers/scsi/qla2xxx/qla_target.c
>> @@ -1029,7 +1029,7 @@ void qlt_free_session_done(struct work_struct *work)
>> 			}
>> 			msleep(100);
>> 			cnt++;
>> -			if (cnt > 200)
>> +			if (cnt > 230)
>> 				break;
>> 		}
> 
> One magic constant is changed into another magic constant and that is 
> sufficient to fix a bug? Please add a comment that explains the 
> meaning of that constant.
> 

Agree with Bart here. 

How did you come up with this new count value?  Some more details (either in commit message or actual comment in code) would definitely help understand. If you have some log message snippet or stack trace add that to commit message.

QT:  FW timeout is 20seconds (cnt=200).  Driver time out is set at 22 seconds (220) to monitor the logout (2 seconds pad for worst case).   Changing from 200 -> 230 to allow the logout thread to finish its sequence before advancing the state.   Previous setting at 200/20s create a race condition where driver was allow to advance to relogin, while the logout completion straddles behind and modifies fields that interfere with the relogin.  This led to session being stuck.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 03/11] qla2xxx: fix stuck session
  2021-03-26 17:46       ` Quinn Tran
@ 2021-03-26 17:53         ` Himanshu Madhani
  0 siblings, 0 replies; 29+ messages in thread
From: Himanshu Madhani @ 2021-03-26 17:53 UTC (permalink / raw)
  To: Quinn Tran
  Cc: Nilesh Javali, Martin Petersen, linux-scsi, Bart Van Assche,
	GR-QLogic-Storage-Upstream



> On Mar 26, 2021, at 12:46 PM, Quinn Tran <qutran@marvell.com> wrote:
> 
> 
> 
>> On Mar 23, 2021, at 11:31 AM, Bart Van Assche <bvanassche@acm.org> wrote:
>> 
>> On 3/22/21 9:42 PM, Nilesh Javali wrote:
>>> diff --git a/drivers/scsi/qla2xxx/qla_target.c 
>>> b/drivers/scsi/qla2xxx/qla_target.c
>>> index c48daf52725d..fa8c4dae8dce 100644
>>> --- a/drivers/scsi/qla2xxx/qla_target.c
>>> +++ b/drivers/scsi/qla2xxx/qla_target.c
>>> @@ -1029,7 +1029,7 @@ void qlt_free_session_done(struct work_struct *work)
>>> 			}
>>> 			msleep(100);
>>> 			cnt++;
>>> -			if (cnt > 200)
>>> +			if (cnt > 230)
>>> 				break;
>>> 		}
>> 
>> One magic constant is changed into another magic constant and that is 
>> sufficient to fix a bug? Please add a comment that explains the 
>> meaning of that constant.
>> 
> 
> Agree with Bart here. 
> 
> How did you come up with this new count value?  Some more details (either in commit message or actual comment in code) would definitely help understand. If you have some log message snippet or stack trace add that to commit message.
> 
> QT:  FW timeout is 20seconds (cnt=200).  Driver time out is set at 22 seconds (220) to monitor the logout (2 seconds pad for worst case).   Changing from 200 -> 230 to allow the logout thread to finish its sequence before advancing the state.   Previous setting at 200/20s create a race condition where driver was allow to advance to relogin, while the logout completion straddles behind and modifies fields that interfere with the relogin.  This led to session being stuck.
> 

Would you add this as a comment above the if() statement for the future would be nice.

You can add my R-B to the patch  
> 

--
Himanshu Madhani	 Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2021-03-26 17:54 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-23  4:42 [PATCH 00/11] qla2xxx driver bug fixes Nilesh Javali
2021-03-23  4:42 ` [PATCH 01/11] qla2xxx: Fix IOPS drop seen in some adapters Nilesh Javali
2021-03-24 15:46   ` Himanshu Madhani
2021-03-26  0:53     ` Arun Easi
2021-03-26 13:38       ` Himanshu Madhani
2021-03-23  4:42 ` [PATCH 02/11] qla2xxx: Add H:C:T info in the log message for fc ports Nilesh Javali
2021-03-24 15:48   ` Himanshu Madhani
2021-03-23  4:42 ` [PATCH 03/11] qla2xxx: fix stuck session Nilesh Javali
2021-03-23 16:31   ` Bart Van Assche
2021-03-24 15:53     ` Himanshu Madhani
2021-03-26 17:46       ` Quinn Tran
2021-03-26 17:53         ` Himanshu Madhani
2021-03-23  4:42 ` [PATCH 04/11] qla2xxx: consolidate zio threshold setting for both fcp & nvme Nilesh Javali
2021-03-24 15:55   ` Himanshu Madhani
2021-03-23  4:42 ` [PATCH 05/11] qla2xxx: Fix use after free in bsg Nilesh Javali
2021-03-24 15:57   ` Himanshu Madhani
2021-03-23  4:42 ` [PATCH 06/11] qla2xxx: Fix crash in qla2xxx_mqueuecommand Nilesh Javali
2021-03-24 15:57   ` Himanshu Madhani
2021-03-23  4:42 ` [PATCH 07/11] qla2xxx: fix RISC RESET completion polling Nilesh Javali
2021-03-24 16:03   ` Himanshu Madhani
2021-03-23  4:42 ` [PATCH 08/11] qla2xxx: fix crash in PCIe error handling Nilesh Javali
2021-03-24 16:21   ` Himanshu Madhani
2021-03-23  4:42 ` [PATCH 09/11] qla2xxx: fix mailbox recovery during PCIe error Nilesh Javali
2021-03-24 16:23   ` Himanshu Madhani
2021-03-23  4:42 ` [PATCH 10/11] qla2xxx: include AER debug mask to default Nilesh Javali
2021-03-24 16:24   ` Himanshu Madhani
2021-03-23  4:42 ` [PATCH 11/11] qla2xxx: Update version to 10.02.00.106-k Nilesh Javali
2021-03-24 16:24   ` Himanshu Madhani
2021-03-24 20:55 ` [PATCH 00/11] qla2xxx driver bug fixes Laurence Oberman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.