All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/5] scsi: qla2xxx: fixes for driver unloading
@ 2020-03-27 16:47 mwilck
  2020-03-27 16:47 ` [PATCH v3 1/5] scsi: qla2xxx: set UNLOADING before waiting for session deletion mwilck
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: mwilck @ 2020-03-27 16:47 UTC (permalink / raw)
  To: Martin K. Petersen, Arun Easi, Quinn Tran, Himanshu Madhani
  Cc: Roman Bolshakov, Daniel Wagner, Bart Van Assche, James Bottomley,
	Hannes Reinecke, linux-scsi, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

Hello Martin, Arun, Himanshu, all,

here is v3 of the little series I first submitted on Nov 29, 2019.
It's pretty much a complete rewrite, except for 1/5 which was 3/3 in v2.

Reviews welcome.
Martin

Changes since v2:
 - Removed "scsi: qla2xxx: avoid sending mailbox commands if firmware is
   stopped", because the first hunk is obsoleted by the (new) 1/3, and Arun
   suggested to use a different approach (which is now in 4/3) for the second
   hunk.
 - Removed "scsi: qla2xxx: don't shut down firmware before closing sessions"
   (nak'd by Arun).
 - Former 3/3 is now 1/5
 - Added "scsi: qla2xxx: check UNLOADING before posting async work". This one
   is key for avoiding lags when qla2xxx is unloaded.
 - Added revert of "scsi: qla2xxx: Fix unbound sleep in fcport delete path.",
   as I believe it's now obsolete.
   If we ever encounter unbound sleep there again, we should rather figure
   out the reason than simply abort waiting.
 - Added patch 4 and 5, a new attempt at avoiding mailbox and HW request queue
   access at low level. 4/5 was motivated by Arun's comments on my v2 series.
   5/5 is obviously similar in spirit to 77ddb94a4853 ("scsi: qla2xxx: Only
   allow operational MBX to proceed during RESET."), but I found that the
   rom_cmds list contains commands that would hang when the FW is stopped,
   so I created a new list. Perhaps some day the two can be consolidated.

Changes since v1:
 - Added patch 3 to set the UNLOADING flag before waiting for sessions
   to end (Roman)
 - Use test_and_set_bit() for UNLOADING (Hannes)

Martin Wilck (5):
  scsi: qla2xxx: set UNLOADING before waiting for session deletion
  scsi: qla2xxx: check UNLOADING before posting async work
  Revert "scsi: qla2xxx: Fix unbound sleep in fcport delete path."
  scsi: qla2xxx: avoid sending iocbs when firmware is stopped
  scsi: qla2xxx: only send certain mailbox commands to stopped firmware

 drivers/scsi/qla2xxx/qla_inline.h |  3 ++
 drivers/scsi/qla2xxx/qla_iocb.c   | 23 ++++++++++++++++
 drivers/scsi/qla2xxx/qla_mbx.c    | 46 +++++++++++++++++++++++++++++++
 drivers/scsi/qla2xxx/qla_nvme.c   |  3 ++
 drivers/scsi/qla2xxx/qla_os.c     | 35 ++++++++++++-----------
 drivers/scsi/qla2xxx/qla_target.c |  4 ---
 6 files changed, 92 insertions(+), 22 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 1/5] scsi: qla2xxx: set UNLOADING before waiting for session deletion
  2020-03-27 16:47 [PATCH v3 0/5] scsi: qla2xxx: fixes for driver unloading mwilck
@ 2020-03-27 16:47 ` mwilck
  2020-04-01 17:02   ` Arun Easi
  2020-03-27 16:47 ` [PATCH v3 2/5] scsi: qla2xxx: check UNLOADING before posting async work mwilck
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: mwilck @ 2020-03-27 16:47 UTC (permalink / raw)
  To: Martin K. Petersen, Arun Easi, Quinn Tran, Himanshu Madhani
  Cc: Roman Bolshakov, Daniel Wagner, Bart Van Assche, James Bottomley,
	Hannes Reinecke, linux-scsi, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

The purpose of the UNLOADING flag is to avoid port login procedures
to continue when a controller is in the process of shutting down.
It makes sense to set this flag before starting session teardown.

Furthermore, use atomic test_and_set_bit() to avoid the shutdown
being run multiple times in parallel. In qla2x00_disable_board_on_pci_error(),
the test for UNLOADING is postponed until after the check for an already
disabled PCI board.

Fixes: 45235022da99 ("scsi: qla2xxx: Fix driver unload by shutting down chip")
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 drivers/scsi/qla2xxx/qla_os.c | 32 ++++++++++++++------------------
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index d9072ea..ce0dabb 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -3732,6 +3732,13 @@ qla2x00_remove_one(struct pci_dev *pdev)
 	}
 	qla2x00_wait_for_hba_ready(base_vha);
 
+	/*
+	 * if UNLOADING flag is already set, then continue unload,
+	 * where it was set first.
+	 */
+	if (test_and_set_bit(UNLOADING, &base_vha->dpc_flags))
+		return;
+
 	if (IS_QLA25XX(ha) || IS_QLA2031(ha) || IS_QLA27XX(ha) ||
 	    IS_QLA28XX(ha)) {
 		if (ha->flags.fw_started)
@@ -3750,15 +3757,6 @@ qla2x00_remove_one(struct pci_dev *pdev)
 
 	qla2x00_wait_for_sess_deletion(base_vha);
 
-	/*
-	 * if UNLOAD flag is already set, then continue unload,
-	 * where it was set first.
-	 */
-	if (test_bit(UNLOADING, &base_vha->dpc_flags))
-		return;
-
-	set_bit(UNLOADING, &base_vha->dpc_flags);
-
 	qla_nvme_delete(base_vha);
 
 	dma_free_coherent(&ha->pdev->dev,
@@ -6628,13 +6626,6 @@ qla2x00_disable_board_on_pci_error(struct work_struct *work)
 	struct pci_dev *pdev = ha->pdev;
 	scsi_qla_host_t *base_vha = pci_get_drvdata(ha->pdev);
 
-	/*
-	 * if UNLOAD flag is already set, then continue unload,
-	 * where it was set first.
-	 */
-	if (test_bit(UNLOADING, &base_vha->dpc_flags))
-		return;
-
 	ql_log(ql_log_warn, base_vha, 0x015b,
 	    "Disabling adapter.\n");
 
@@ -6645,9 +6636,14 @@ qla2x00_disable_board_on_pci_error(struct work_struct *work)
 		return;
 	}
 
-	qla2x00_wait_for_sess_deletion(base_vha);
+	/*
+	 * if UNLOADING flag is already set, then continue unload,
+	 * where it was set first.
+	 */
+	if (test_and_set_bit(UNLOADING, &base_vha->dpc_flags))
+		return;
 
-	set_bit(UNLOADING, &base_vha->dpc_flags);
+	qla2x00_wait_for_sess_deletion(base_vha);
 
 	qla2x00_delete_all_vps(ha, base_vha);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 2/5] scsi: qla2xxx: check UNLOADING before posting async work
  2020-03-27 16:47 [PATCH v3 0/5] scsi: qla2xxx: fixes for driver unloading mwilck
  2020-03-27 16:47 ` [PATCH v3 1/5] scsi: qla2xxx: set UNLOADING before waiting for session deletion mwilck
@ 2020-03-27 16:47 ` mwilck
  2020-04-01 17:04   ` Arun Easi
  2020-03-27 16:47 ` [PATCH v3 3/5] Revert "scsi: qla2xxx: Fix unbound sleep in fcport delete path." mwilck
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: mwilck @ 2020-03-27 16:47 UTC (permalink / raw)
  To: Martin K. Petersen, Arun Easi, Quinn Tran, Himanshu Madhani
  Cc: Roman Bolshakov, Daniel Wagner, Bart Van Assche, James Bottomley,
	Hannes Reinecke, linux-scsi, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

qlt_free_session_done() tries to post async PRLO / LOGO, and
waits for the completion of these async commands. If UNLOADING
is set, this is doomed to timeout, because the async logout
command will never complete.

The only way to avoid waiting pointlessly is to fail posting
these commands in the first place if the driver is in UNLOADING state.
In general, posting any command should be avoided when the driver
is UNLOADING.

With this patch, "rmmod qla2xxx" completes without noticeable delay.

Fixes: 45235022da99 ("scsi: qla2xxx: Fix driver unload by shutting down chip")
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 drivers/scsi/qla2xxx/qla_os.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index ce0dabb..eb25cf5 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -4933,6 +4933,9 @@ int qla2x00_post_async_##name##_work(		\
 {						\
 	struct qla_work_evt *e;			\
 						\
+	if (test_bit(UNLOADING, &vha->dpc_flags)) \
+		return QLA_FUNCTION_FAILED;	\
+						\
 	e = qla2x00_alloc_work(vha, type);	\
 	if (!e)					\
 		return QLA_FUNCTION_FAILED;	\
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 3/5] Revert "scsi: qla2xxx: Fix unbound sleep in fcport delete path."
  2020-03-27 16:47 [PATCH v3 0/5] scsi: qla2xxx: fixes for driver unloading mwilck
  2020-03-27 16:47 ` [PATCH v3 1/5] scsi: qla2xxx: set UNLOADING before waiting for session deletion mwilck
  2020-03-27 16:47 ` [PATCH v3 2/5] scsi: qla2xxx: check UNLOADING before posting async work mwilck
@ 2020-03-27 16:47 ` mwilck
  2020-04-01 17:12   ` [EXT] " Arun Easi
  2020-03-27 16:47 ` [PATCH v3 4/5] scsi: qla2xxx: avoid sending iocbs when firmware is stopped mwilck
  2020-03-27 16:47 ` [PATCH v3 5/5] scsi: qla2xxx: only send certain mailbox commands to stopped firmware mwilck
  4 siblings, 1 reply; 11+ messages in thread
From: mwilck @ 2020-03-27 16:47 UTC (permalink / raw)
  To: Martin K. Petersen, Arun Easi, Quinn Tran, Himanshu Madhani
  Cc: Roman Bolshakov, Daniel Wagner, Bart Van Assche, James Bottomley,
	Hannes Reinecke, linux-scsi, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

This reverts commit c3b6a1d397420a0fdd97af2f06abfb78adc370df.
Aborting the sleep was risky, because after return from
qlt_free_session_done() the driver starts freeing resources,
which is dangerous while we know that there's pending IO.

The previous patch "scsi: qla2xxx: check UNLOADING before posting async
work" avoids sending this IO in the first place, and thus obsoletes
the dangerous timeout.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 drivers/scsi/qla2xxx/qla_target.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 622e733..eec1338 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -1019,7 +1019,6 @@ void qlt_free_session_done(struct work_struct *work)
 
 	if (logout_started) {
 		bool traced = false;
-		u16 cnt = 0;
 
 		while (!READ_ONCE(sess->logout_completed)) {
 			if (!traced) {
@@ -1029,9 +1028,6 @@ void qlt_free_session_done(struct work_struct *work)
 				traced = true;
 			}
 			msleep(100);
-			cnt++;
-			if (cnt > 200)
-				break;
 		}
 
 		ql_dbg(ql_dbg_disc, vha, 0xf087,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 4/5] scsi: qla2xxx: avoid sending iocbs when firmware is stopped
  2020-03-27 16:47 [PATCH v3 0/5] scsi: qla2xxx: fixes for driver unloading mwilck
                   ` (2 preceding siblings ...)
  2020-03-27 16:47 ` [PATCH v3 3/5] Revert "scsi: qla2xxx: Fix unbound sleep in fcport delete path." mwilck
@ 2020-03-27 16:47 ` mwilck
  2020-03-27 16:47 ` [PATCH v3 5/5] scsi: qla2xxx: only send certain mailbox commands to stopped firmware mwilck
  4 siblings, 0 replies; 11+ messages in thread
From: mwilck @ 2020-03-27 16:47 UTC (permalink / raw)
  To: Martin K. Petersen, Arun Easi, Quinn Tran, Himanshu Madhani
  Cc: Roman Bolshakov, Daniel Wagner, Bart Van Assche, James Bottomley,
	Hannes Reinecke, linux-scsi, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

Since commit 45235022da99 ("scsi: qla2xxx: Fix driver unload by shutting down chip"),
it is possible that FC commands are scheduled after the adapter firmware
has been shut down. IO sent to the firmware in this situation hangs.
Avoid starting iocbs in this situation.

Fixes: 45235022da99 ("scsi: qla2xxx: Fix driver unload by shutting down chip")
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 drivers/scsi/qla2xxx/qla_inline.h |  3 +++
 drivers/scsi/qla2xxx/qla_iocb.c   | 23 +++++++++++++++++++++++
 drivers/scsi/qla2xxx/qla_nvme.c   |  3 +++
 3 files changed, 29 insertions(+)

diff --git a/drivers/scsi/qla2xxx/qla_inline.h b/drivers/scsi/qla2xxx/qla_inline.h
index 364b3db..f4352f1 100644
--- a/drivers/scsi/qla2xxx/qla_inline.h
+++ b/drivers/scsi/qla2xxx/qla_inline.h
@@ -322,6 +322,9 @@ qla_83xx_start_iocbs(struct qla_qpair *qpair)
 {
 	struct req_que *req = qpair->req;
 
+	if (!qpair->vha->hw->flags.fw_started)
+		return;
+
 	req->ring_index++;
 	if (req->ring_index == req->length) {
 		req->ring_index = 0;
diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
index 182bd68c..587ba35 100644
--- a/drivers/scsi/qla2xxx/qla_iocb.c
+++ b/drivers/scsi/qla2xxx/qla_iocb.c
@@ -459,6 +459,9 @@ qla2x00_start_iocbs(struct scsi_qla_host *vha, struct req_que *req)
 	struct qla_hw_data *ha = vha->hw;
 	device_reg_t *reg = ISP_QUE_REG(ha, req->id);
 
+	if (!ha->flags.fw_started)
+		return;
+
 	if (IS_P3P_TYPE(ha)) {
 		qla82xx_start_iocbs(vha);
 	} else {
@@ -1603,6 +1606,9 @@ qla24xx_start_scsi(srb_t *sp)
 	struct scsi_qla_host *vha = sp->vha;
 	struct qla_hw_data *ha = vha->hw;
 
+	if (!ha->flags.fw_started)
+		return QLA_FUNCTION_FAILED;
+
 	/* Setup device pointers. */
 	req = vha->req;
 
@@ -1740,6 +1746,9 @@ qla24xx_dif_start_scsi(srb_t *sp)
 
 #define QDSS_GOT_Q_SPACE	BIT_0
 
+	if (!ha->flags.fw_started)
+		return QLA_FUNCTION_FAILED;
+
 	/* Only process protection or >16 cdb in this routine */
 	if (scsi_get_prot_op(cmd) == SCSI_PROT_NORMAL) {
 		if (cmd->cmd_len <= 16)
@@ -1921,6 +1930,9 @@ qla2xxx_start_scsi_mq(srb_t *sp)
 	struct qla_hw_data *ha = vha->hw;
 	struct qla_qpair *qpair = sp->qpair;
 
+	if (!ha->flags.fw_started)
+		return QLA_FUNCTION_FAILED;
+
 	/* Acquire qpair specific lock */
 	spin_lock_irqsave(&qpair->qp_lock, flags);
 
@@ -2062,6 +2074,11 @@ qla2xxx_dif_start_scsi_mq(srb_t *sp)
 
 #define QDSS_GOT_Q_SPACE	BIT_0
 
+	if (!ha->flags.fw_started) {
+		cmd->result = DID_NO_CONNECT << 16;
+		return QLA_FUNCTION_FAILED;
+	}
+
 	/* Check for host side state */
 	if (!qpair->online) {
 		cmd->result = DID_NO_CONNECT << 16;
@@ -3224,6 +3241,9 @@ qla82xx_start_scsi(srb_t *sp)
 	struct req_que *req = NULL;
 	struct rsp_que *rsp = NULL;
 
+	if (!ha->flags.fw_started)
+		return QLA_FUNCTION_FAILED;
+
 	/* Setup device pointers. */
 	reg = &ha->iobase->isp82;
 	cmd = GET_CMD_SP(sp);
@@ -3676,6 +3696,9 @@ qla2x00_start_sp(srb_t *sp)
 	void *pkt;
 	unsigned long flags;
 
+	if (!ha->flags.fw_started)
+		return QLA_FUNCTION_FAILED;
+
 	spin_lock_irqsave(qp->qp_lock_ptr, flags);
 	pkt = __qla2x00_alloc_iocbs(sp->qpair, sp);
 	if (!pkt) {
diff --git a/drivers/scsi/qla2xxx/qla_nvme.c b/drivers/scsi/qla2xxx/qla_nvme.c
index 84e2a980..4f36e73 100644
--- a/drivers/scsi/qla2xxx/qla_nvme.c
+++ b/drivers/scsi/qla2xxx/qla_nvme.c
@@ -369,6 +369,9 @@ static inline int qla2x00_start_nvme_mq(srb_t *sp)
 	struct nvmefc_fcp_req *fd = nvme->u.nvme.desc;
 	uint32_t        rval = QLA_SUCCESS;
 
+	if (!ha->flags.fw_started)
+		return QLA_FUNCTION_FAILED;
+
 	/* Setup qpair pointers */
 	req = qpair->req;
 	tot_dsds = fd->sg_cnt;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 5/5] scsi: qla2xxx: only send certain mailbox commands to stopped firmware
  2020-03-27 16:47 [PATCH v3 0/5] scsi: qla2xxx: fixes for driver unloading mwilck
                   ` (3 preceding siblings ...)
  2020-03-27 16:47 ` [PATCH v3 4/5] scsi: qla2xxx: avoid sending iocbs when firmware is stopped mwilck
@ 2020-03-27 16:47 ` mwilck
  2020-03-30 20:15   ` Martin Wilck
  4 siblings, 1 reply; 11+ messages in thread
From: mwilck @ 2020-03-27 16:47 UTC (permalink / raw)
  To: Martin K. Petersen, Arun Easi, Quinn Tran, Himanshu Madhani
  Cc: Roman Bolshakov, Daniel Wagner, Bart Van Assche, James Bottomley,
	Hannes Reinecke, linux-scsi, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

Since commit 45235022da99 ("scsi: qla2xxx: Fix driver unload by shutting
down chip"), it is possible that FC commands are scheduled after the
adapter firmware has been shut down. IO sent to the firmware in this
situation may hang. Only certain mailbox commands should be sent in
this situation.

This patch identifies the mailbox commands sent during adapter
initialization (before QLA_FW_STARTED() is called) and allows only
these to be sent to the firmware in stopped state.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 drivers/scsi/qla2xxx/qla_mbx.c | 46 ++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/drivers/scsi/qla2xxx/qla_mbx.c b/drivers/scsi/qla2xxx/qla_mbx.c
index 9fd83d1..4a9a583 100644
--- a/drivers/scsi/qla2xxx/qla_mbx.c
+++ b/drivers/scsi/qla2xxx/qla_mbx.c
@@ -77,6 +77,45 @@ static int is_rom_cmd(uint16_t cmd)
 	return 0;
 }
 
+/*
+ * Mailbox commands that must (and can) be sent to the Firmware
+ * even if it isn't running. IOW, commands that are sent before
+ * QLA_FW_STARTED() is called.
+ */
+static uint16_t fw_stopped_cmds[] = {
+	MBC_EXECUTE_FIRMWARE,
+	MBC_READ_RAM_WORD,
+	MBC_MAILBOX_REGISTER_TEST,
+	MBC_VERIFY_CHECKSUM,
+	MBC_GET_FIRMWARE_VERSION,
+	MBC_LOAD_RISC_RAM,
+	MBC_LOAD_RISC_RAM_EXTENDED,
+	MBC_WRITE_RAM_WORD_EXTENDED,
+	MBC_READ_RAM_EXTENDED,
+	MBC_GET_ADAPTER_LOOP_ID,
+	MBC_GET_SET_ZIO_THRESHOLD,
+	MBC_GET_FIRMWARE_OPTION,
+	MBC_GET_MEM_OFFLOAD_CNTRL_STAT,
+	MBC_SET_FIRMWARE_OPTION,
+	MBC_GET_RESOURCE_COUNTS,
+	MBC_INITIALIZE_FIRMWARE,
+	MBC_TRACE_CONTROL,
+	MBC_READ_SFP,
+	MBC_MID_INITIALIZE_FIRMWARE,
+	MBC_FLASH_ACCESS_CTRL,
+};
+
+static bool must_send_if_fw_stopped(uint16_t cmd)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(fw_stopped_cmds); i++) {
+		if (fw_stopped_cmds[i] == cmd)
+			return true;
+	}
+	return false;
+}
+
 /*
  * qla2x00_mailbox_command
  *	Issue mailbox command and waits for completion.
@@ -169,6 +208,13 @@ qla2x00_mailbox_command(scsi_qla_host_t *vha, mbx_cmd_t *mcp)
 		return QLA_FUNCTION_TIMEOUT;
 	}
 
+	if (!ha->flags.fw_started && !must_send_if_fw_stopped(mcp->mb[0])) {
+		ql_log(ql_log_info, vha, 0x1006,
+		       "Cmd 0x%x skipped with timeout since FW is stopped\n",
+		       mcp->mb[0]);
+		return QLA_FUNCTION_TIMEOUT;
+	}
+
 	atomic_inc(&ha->num_pend_mbx_stage1);
 	/*
 	 * Wait for active mailbox commands to finish by waiting at most tov
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 5/5] scsi: qla2xxx: only send certain mailbox commands to stopped firmware
  2020-03-27 16:47 ` [PATCH v3 5/5] scsi: qla2xxx: only send certain mailbox commands to stopped firmware mwilck
@ 2020-03-30 20:15   ` Martin Wilck
  0 siblings, 0 replies; 11+ messages in thread
From: Martin Wilck @ 2020-03-30 20:15 UTC (permalink / raw)
  To: Martin K. Petersen, Arun Easi, Quinn Tran, Himanshu Madhani
  Cc: Roman Bolshakov, Daniel Wagner, Bart Van Assche, James Bottomley,
	Hannes Reinecke, linux-scsi

On Fri, 2020-03-27 at 17:47 +0100, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> Since commit 45235022da99 ("scsi: qla2xxx: Fix driver unload by
> shutting
> down chip"), it is possible that FC commands are scheduled after the
> adapter firmware has been shut down. IO sent to the firmware in this
> situation may hang. Only certain mailbox commands should be sent in
> this situation.
> 
> This patch identifies the mailbox commands sent during adapter
> initialization (before QLA_FW_STARTED() is called) and allows only
> these to be sent to the firmware in stopped state.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  drivers/scsi/qla2xxx/qla_mbx.c | 46
> ++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)

Testing with different HW revealed that the list of allowed commands
needs to be extended. Forget this patch for now, please.

Regards
Martin



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/5] scsi: qla2xxx: set UNLOADING before waiting for session deletion
  2020-03-27 16:47 ` [PATCH v3 1/5] scsi: qla2xxx: set UNLOADING before waiting for session deletion mwilck
@ 2020-04-01 17:02   ` Arun Easi
  0 siblings, 0 replies; 11+ messages in thread
From: Arun Easi @ 2020-04-01 17:02 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Martin K. Petersen, Quinn Tran, Himanshu Madhani,
	Roman Bolshakov, Daniel Wagner, Bart Van Assche, James Bottomley,
	Hannes Reinecke, linux-scsi

On Fri, 27 Mar 2020, 9:47am, mwilck@suse.com wrote:

> From: Martin Wilck <mwilck@suse.com>
> 
> The purpose of the UNLOADING flag is to avoid port login procedures
> to continue when a controller is in the process of shutting down.
> It makes sense to set this flag before starting session teardown.
> 
> Furthermore, use atomic test_and_set_bit() to avoid the shutdown
> being run multiple times in parallel. In qla2x00_disable_board_on_pci_error(),
> the test for UNLOADING is postponed until after the check for an already
> disabled PCI board.
> 
> Fixes: 45235022da99 ("scsi: qla2xxx: Fix driver unload by shutting down chip")
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  drivers/scsi/qla2xxx/qla_os.c | 32 ++++++++++++++------------------
>  1 file changed, 14 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index d9072ea..ce0dabb 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -3732,6 +3732,13 @@ qla2x00_remove_one(struct pci_dev *pdev)
>  	}
>  	qla2x00_wait_for_hba_ready(base_vha);
>  
> +	/*
> +	 * if UNLOADING flag is already set, then continue unload,
> +	 * where it was set first.
> +	 */
> +	if (test_and_set_bit(UNLOADING, &base_vha->dpc_flags))
> +		return;
> +
>  	if (IS_QLA25XX(ha) || IS_QLA2031(ha) || IS_QLA27XX(ha) ||
>  	    IS_QLA28XX(ha)) {
>  		if (ha->flags.fw_started)
> @@ -3750,15 +3757,6 @@ qla2x00_remove_one(struct pci_dev *pdev)
>  
>  	qla2x00_wait_for_sess_deletion(base_vha);
>  
> -	/*
> -	 * if UNLOAD flag is already set, then continue unload,
> -	 * where it was set first.
> -	 */
> -	if (test_bit(UNLOADING, &base_vha->dpc_flags))
> -		return;
> -
> -	set_bit(UNLOADING, &base_vha->dpc_flags);
> -
>  	qla_nvme_delete(base_vha);
>  
>  	dma_free_coherent(&ha->pdev->dev,
> @@ -6628,13 +6626,6 @@ qla2x00_disable_board_on_pci_error(struct work_struct *work)
>  	struct pci_dev *pdev = ha->pdev;
>  	scsi_qla_host_t *base_vha = pci_get_drvdata(ha->pdev);
>  
> -	/*
> -	 * if UNLOAD flag is already set, then continue unload,
> -	 * where it was set first.
> -	 */
> -	if (test_bit(UNLOADING, &base_vha->dpc_flags))
> -		return;
> -
>  	ql_log(ql_log_warn, base_vha, 0x015b,
>  	    "Disabling adapter.\n");
>  
> @@ -6645,9 +6636,14 @@ qla2x00_disable_board_on_pci_error(struct work_struct *work)
>  		return;
>  	}
>  
> -	qla2x00_wait_for_sess_deletion(base_vha);
> +	/*
> +	 * if UNLOADING flag is already set, then continue unload,
> +	 * where it was set first.
> +	 */
> +	if (test_and_set_bit(UNLOADING, &base_vha->dpc_flags))
> +		return;
>  
> -	set_bit(UNLOADING, &base_vha->dpc_flags);
> +	qla2x00_wait_for_sess_deletion(base_vha);
>  
>  	qla2x00_delete_all_vps(ha, base_vha);
>  
> 

Reviewed-by: Arun Easi <aeasi@marvell.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/5] scsi: qla2xxx: check UNLOADING before posting async work
  2020-03-27 16:47 ` [PATCH v3 2/5] scsi: qla2xxx: check UNLOADING before posting async work mwilck
@ 2020-04-01 17:04   ` Arun Easi
  0 siblings, 0 replies; 11+ messages in thread
From: Arun Easi @ 2020-04-01 17:04 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Martin K. Petersen, Quinn Tran, Roman Bolshakov, Daniel Wagner,
	Bart Van Assche, James Bottomley, Hannes Reinecke, linux-scsi

On Fri, 27 Mar 2020, 9:47am, mwilck@suse.com wrote:

> From: Martin Wilck <mwilck@suse.com>
> 
> qlt_free_session_done() tries to post async PRLO / LOGO, and
> waits for the completion of these async commands. If UNLOADING
> is set, this is doomed to timeout, because the async logout
> command will never complete.
> 
> The only way to avoid waiting pointlessly is to fail posting
> these commands in the first place if the driver is in UNLOADING state.
> In general, posting any command should be avoided when the driver
> is UNLOADING.
> 
> With this patch, "rmmod qla2xxx" completes without noticeable delay.
> 
> Fixes: 45235022da99 ("scsi: qla2xxx: Fix driver unload by shutting down chip")
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  drivers/scsi/qla2xxx/qla_os.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index ce0dabb..eb25cf5 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -4933,6 +4933,9 @@ int qla2x00_post_async_##name##_work(		\
>  {						\
>  	struct qla_work_evt *e;			\
>  						\
> +	if (test_bit(UNLOADING, &vha->dpc_flags)) \
> +		return QLA_FUNCTION_FAILED;	\
> +						\

Martin,

Could you move this check to qla2x00_alloc_work() so that other callers of 
qla2x00_alloc_work() can also benefit.

Regards,
-Arun

>  	e = qla2x00_alloc_work(vha, type);	\
>  	if (!e)					\
>  		return QLA_FUNCTION_FAILED;	\
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [EXT] [PATCH v3 3/5] Revert "scsi: qla2xxx: Fix unbound sleep in fcport delete path."
  2020-03-27 16:47 ` [PATCH v3 3/5] Revert "scsi: qla2xxx: Fix unbound sleep in fcport delete path." mwilck
@ 2020-04-01 17:12   ` Arun Easi
  2020-04-02  9:39     ` Martin Wilck
  0 siblings, 1 reply; 11+ messages in thread
From: Arun Easi @ 2020-04-01 17:12 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Martin K. Petersen, Quinn Tran, Roman Bolshakov, Daniel Wagner,
	Bart Van Assche, James Bottomley, Hannes Reinecke, linux-scsi

On Fri, 27 Mar 2020, 9:47am, mwilck@suse.com wrote:

> External Email
> 
> ----------------------------------------------------------------------
> From: Martin Wilck <mwilck@suse.com>
> 
> This reverts commit c3b6a1d397420a0fdd97af2f06abfb78adc370df.
> Aborting the sleep was risky, because after return from
> qlt_free_session_done() the driver starts freeing resources,
> which is dangerous while we know that there's pending IO.
> 
> The previous patch "scsi: qla2xxx: check UNLOADING before posting async
> work" avoids sending this IO in the first place, and thus obsoletes
> the dangerous timeout.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  drivers/scsi/qla2xxx/qla_target.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
> index 622e733..eec1338 100644
> --- a/drivers/scsi/qla2xxx/qla_target.c
> +++ b/drivers/scsi/qla2xxx/qla_target.c
> @@ -1019,7 +1019,6 @@ void qlt_free_session_done(struct work_struct *work)
>  
>  	if (logout_started) {
>  		bool traced = false;
> -		u16 cnt = 0;
>  
>  		while (!READ_ONCE(sess->logout_completed)) {
>  			if (!traced) {
> @@ -1029,9 +1028,6 @@ void qlt_free_session_done(struct work_struct *work)
>  				traced = true;
>  			}
>  			msleep(100);
> -			cnt++;
> -			if (cnt > 200)
> -				break;

By taking this code out, it would leave a stuck FC target delete thread 
and thus preventing the module unload itself, in case of a bug in this 
logic (which was seen in some instances).

How about increasing the wait time to say 25 seconds (typical worst case 
is 2 * RA_TOV = 20 seconds) and then alerting user with a "WARN", but 
still break out?

Regards,
-Arun

>  		}
>  
>  		ql_dbg(ql_dbg_disc, vha, 0xf087,
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [EXT] [PATCH v3 3/5] Revert "scsi: qla2xxx: Fix unbound sleep in fcport delete path."
  2020-04-01 17:12   ` [EXT] " Arun Easi
@ 2020-04-02  9:39     ` Martin Wilck
  0 siblings, 0 replies; 11+ messages in thread
From: Martin Wilck @ 2020-04-02  9:39 UTC (permalink / raw)
  To: Arun Easi
  Cc: Martin K. Petersen, Quinn Tran, Roman Bolshakov, Daniel Wagner,
	Bart Van Assche, James Bottomley, Hannes Reinecke, linux-scsi

Hello Arun,

On Wed, 2020-04-01 at 10:12 -0700, Arun Easi wrote:
> On Fri, 27 Mar 2020, 9:47am, mwilck@suse.com wrote:
> 
> > External Email
> > 
> > -----------------------------------------------------------------
> > -----
> > From: Martin Wilck <mwilck@suse.com>
> > 
> > This reverts commit c3b6a1d397420a0fdd97af2f06abfb78adc370df.
> > Aborting the sleep was risky, because after return from
> > qlt_free_session_done() the driver starts freeing resources,
> > which is dangerous while we know that there's pending IO.
> > 
> > The previous patch "scsi: qla2xxx: check UNLOADING before posting
> > async
> > work" avoids sending this IO in the first place, and thus obsoletes
> > the dangerous timeout.
> > 
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > ---
> >  drivers/scsi/qla2xxx/qla_target.c | 4 ----
> >  1 file changed, 4 deletions(-)
> > 
> > diff --git a/drivers/scsi/qla2xxx/qla_target.c
> > b/drivers/scsi/qla2xxx/qla_target.c
> > index 622e733..eec1338 100644
> > --- a/drivers/scsi/qla2xxx/qla_target.c
> > +++ b/drivers/scsi/qla2xxx/qla_target.c
> > @@ -1019,7 +1019,6 @@ void qlt_free_session_done(struct work_struct
> > *work)
> >  
> >  	if (logout_started) {
> >  		bool traced = false;
> > -		u16 cnt = 0;
> >  
> >  		while (!READ_ONCE(sess->logout_completed)) {
> >  			if (!traced) {
> > @@ -1029,9 +1028,6 @@ void qlt_free_session_done(struct work_struct
> > *work)
> >  				traced = true;
> >  			}
> >  			msleep(100);
> > -			cnt++;
> > -			if (cnt > 200)
> > -				break;
> 
> By taking this code out, it would leave a stuck FC target delete
> thread 
> and thus preventing the module unload itself, in case of a bug in
> this 
> logic (which was seen in some instances).
> 
> How about increasing the wait time to say 25 seconds (typical worst
> case 
> is 2 * RA_TOV = 20 seconds) and then alerting user with a "WARN",
> but 
> still break out?
> 

I've seen the kernel crash because of this breakout. A crash happens if
another code path was blocked because of either the stopped FW or
anything else, and would time out after this code. 

The actual crash I've seen is already been fixed by patch 1 of this
series, because qla2x00_terminate_rport_io() checks UNLOADING before
sending IO. So if it was just about that, we could keep the breakout.

I suspect that the "unbound sleep" occurred only because of the 
hanging access to stopped FW (note that c3b6a1d39742 was made just ~2
months after 45235022da99 "scsi: qla2xxx: Fix driver unload by shutting
down chip"). If that's true, the hang wouldn't occur any more with
patch 1-2 of my series applied. If it's wrong and the hang does occur
again, IMO we should find out what is hanging, and fix it, rather than
blindly quit this wait loop and free resources.

But if you want to keep it in, I'm not going to insist; I'm more
interested in getting the first two patches merged.

Regards,
Martin

(Btw, I suppose you'll admit the combination of msleep() and counting
isn't the state of the art of implementing a timeout in the kernel).




> Regards,
> -Arun
> 
> >  		}
> >  
> >  		ql_dbg(ql_dbg_disc, vha, 0xf087,
> > 



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-04-02  9:39 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-27 16:47 [PATCH v3 0/5] scsi: qla2xxx: fixes for driver unloading mwilck
2020-03-27 16:47 ` [PATCH v3 1/5] scsi: qla2xxx: set UNLOADING before waiting for session deletion mwilck
2020-04-01 17:02   ` Arun Easi
2020-03-27 16:47 ` [PATCH v3 2/5] scsi: qla2xxx: check UNLOADING before posting async work mwilck
2020-04-01 17:04   ` Arun Easi
2020-03-27 16:47 ` [PATCH v3 3/5] Revert "scsi: qla2xxx: Fix unbound sleep in fcport delete path." mwilck
2020-04-01 17:12   ` [EXT] " Arun Easi
2020-04-02  9:39     ` Martin Wilck
2020-03-27 16:47 ` [PATCH v3 4/5] scsi: qla2xxx: avoid sending iocbs when firmware is stopped mwilck
2020-03-27 16:47 ` [PATCH v3 5/5] scsi: qla2xxx: only send certain mailbox commands to stopped firmware mwilck
2020-03-30 20:15   ` Martin Wilck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.