linux-scsi.vger.kernel.org archive mirror
* [PATCH 0/8] qedi: Misc bug fixes and enhancements
@ 2020-09-08  5:24 Manish Rangankar
  2020-09-08  5:24 ` [PATCH 1/8] qedi: Use qed count from set_fp_int in msix allocation Manish Rangankar
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Manish Rangankar @ 2020-09-08  5:24 UTC (permalink / raw)
  To: martin.petersen, lduncan, cleech; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

Hi Martin,

Please apply the qedi miscellaneous bug fixes and enhancement patches
to the scsi tree at your convenience.

Thanks,
Manish

Manish Rangankar (5):
  qedi: Use qed count from set_fp_int in msix allocation.
  qedi: Skip f/w connection termination for pci shutdown handler.
  qedi: Use snprintf instead of sprintf
  qedi: Add firmware error recovery invocation support.
  qedi: Add support for handling the pcie errors.

Nilesh Javali (3):
  qedi: Fix list_del corruption while removing active IO
  qedi: Protect active command list to avoid list corruption
  qedi: Mark all connections for recovery on link down event

 drivers/scsi/qedi/qedi.h       |   5 ++
 drivers/scsi/qedi/qedi_fw.c    |  30 ++++++++--
 drivers/scsi/qedi/qedi_iscsi.c |   7 +++
 drivers/scsi/qedi/qedi_main.c  | 106 +++++++++++++++++++++++++++++++--
 4 files changed, 138 insertions(+), 10 deletions(-)

-- 
2.25.0



* [PATCH 1/8] qedi: Use qed count from set_fp_int in msix allocation.
  2020-09-08  5:24 [PATCH 0/8] qedi: Misc bug fixes and enhancements Manish Rangankar
@ 2020-09-08  5:24 ` Manish Rangankar
  2020-09-08  5:24 ` [PATCH 2/8] qedi: Skip f/w connection termination for pci shutdown handler Manish Rangankar
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Manish Rangankar @ 2020-09-08  5:24 UTC (permalink / raw)
  To: martin.petersen, lduncan, cleech; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

To avoid unnecessary vector allocation when the number of fast-path
queues is less than the number of available MSI-X vectors, use the
count returned by qed's set_fp_int.
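
For illustration only (not part of the patch), the idea can be sketched
in user-space C. model_set_fp_int() is a hypothetical stand-in for the
qed callback; the assumption here is that it grants no more vectors than
are available and returns the granted count, or a negative value on
error. The caller stores that count and uses it as the bound when
requesting IRQs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for qed's set_fp_int (assumed behaviour):
 * grant at most `available` vectors and return the number granted,
 * or a negative value on error.
 */
static int model_set_fp_int(int requested, int available)
{
	if (requested <= 0 || available <= 0)
		return -1;
	return requested < available ? requested : available;
}

/*
 * Mirrors the shape of the setup path: ask for one vector per
 * fast-path queue and remember how many were actually granted.
 */
static int model_setup_int(int num_queues, int available, int *msix_count)
{
	int rc;

	assert(msix_count != NULL);

	rc = model_set_fp_int(num_queues, available);
	if (rc < 0)
		return rc;

	/* Later IRQ requests loop up to this count, not the raw maximum. */
	*msix_count = rc;
	return 0;
}
```

With 4 queues and 16 available vectors the sketch stores 4, so only 4
IRQs would be requested instead of 16.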

Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
---
 drivers/scsi/qedi/qedi.h      | 1 +
 drivers/scsi/qedi/qedi_main.c | 9 +++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/qedi/qedi.h b/drivers/scsi/qedi/qedi.h
index 9498279ae80d..9c19ec9dc682 100644
--- a/drivers/scsi/qedi/qedi.h
+++ b/drivers/scsi/qedi/qedi.h
@@ -305,6 +305,7 @@ struct qedi_ctx {
 	u32 max_sqes;
 	u8 num_queues;
 	u32 max_active_conns;
+	s32 msix_count;
 
 	struct iscsi_cid_queue cid_que;
 	struct qedi_endpoint **ep_tbl;
diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index 6f038ae5efca..e1ec22d7d699 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -1357,7 +1357,7 @@ static int qedi_request_msix_irq(struct qedi_ctx *qedi)
 	u16 idx;
 
 	cpu = cpumask_first(cpu_online_mask);
-	for (i = 0; i < qedi->int_info.msix_cnt; i++) {
+	for (i = 0; i < qedi->msix_count; i++) {
 		idx = i * qedi->dev_info.common.num_hwfns +
 			  qedi_ops->common->get_affin_hwfn_idx(qedi->cdev);
 
@@ -1387,7 +1387,12 @@ static int qedi_setup_int(struct qedi_ctx *qedi)
 {
 	int rc = 0;
 
-	rc = qedi_ops->common->set_fp_int(qedi->cdev, num_online_cpus());
+	rc = qedi_ops->common->set_fp_int(qedi->cdev, qedi->num_queues);
+	if (rc < 0)
+		goto exit_setup_int;
+
+	qedi->msix_count = rc;
+
 	rc = qedi_ops->common->get_fp_int(qedi->cdev, &qedi->int_info);
 	if (rc)
 		goto exit_setup_int;
-- 
2.25.0



* [PATCH 2/8] qedi: Skip f/w connection termination for pci shutdown handler.
  2020-09-08  5:24 [PATCH 0/8] qedi: Misc bug fixes and enhancements Manish Rangankar
  2020-09-08  5:24 ` [PATCH 1/8] qedi: Use qed count from set_fp_int in msix allocation Manish Rangankar
@ 2020-09-08  5:24 ` Manish Rangankar
  2020-09-08  5:24 ` [PATCH 3/8] qedi: Fix list_del corruption while removing active IO Manish Rangankar
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Manish Rangankar @ 2020-09-08  5:24 UTC (permalink / raw)
  To: martin.petersen, lduncan, cleech; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

In a boot-from-SAN scenario, when the qedi PCI shutdown handler is
called with active iSCSI sessions, the target sometimes takes too
long to respond to the f/w connection termination request. Skip
sending the termination ramrod in that case and proceed with the
unload path.

Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
---
 drivers/scsi/qedi/qedi_iscsi.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index c14ac7882afa..f815845fc568 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -1069,6 +1069,10 @@ static void qedi_ep_disconnect(struct iscsi_endpoint *ep)
 		wait_delay += qedi->pf_params.iscsi_pf_params.two_msl_timer;
 
 	qedi_ep->state = EP_STATE_DISCONN_START;
+
+	if (test_bit(QEDI_IN_SHUTDOWN, &qedi->flags))
+		goto ep_release_conn;
+
 	ret = qedi_ops->destroy_conn(qedi->cdev, qedi_ep->handle, abrt_conn);
 	if (ret) {
 		QEDI_WARN(&qedi->dbg_ctx,
-- 
2.25.0



* [PATCH 3/8] qedi: Fix list_del corruption while removing active IO
  2020-09-08  5:24 [PATCH 0/8] qedi: Misc bug fixes and enhancements Manish Rangankar
  2020-09-08  5:24 ` [PATCH 1/8] qedi: Use qed count from set_fp_int in msix allocation Manish Rangankar
  2020-09-08  5:24 ` [PATCH 2/8] qedi: Skip f/w connection termination for pci shutdown handler Manish Rangankar
@ 2020-09-08  5:24 ` Manish Rangankar
  2020-09-08  5:24 ` [PATCH 4/8] qedi: Protect active command list to avoid list corruption Manish Rangankar
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Manish Rangankar @ 2020-09-08  5:24 UTC (permalink / raw)
  To: martin.petersen, lduncan, cleech; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

From: Nilesh Javali <njavali@marvell.com>

While aborting an IO, the FW cleanup task timed out and the driver
deleted the IO from the active command list. Some time later, the FW
sent the cleanup task response and the driver deleted the IO from the
active command list again, causing the FW to send an IO completion for
a non-existent IO and list_del corruption of the active command list.

Check that the IO is still present before deleting it from the active
command list. This ensures the FW sends a valid IO completion and
protects against list_del corruption.
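
As a rough user-space sketch of the pattern (not the driver code), the
removal can be made idempotent by pairing the list node with an
"in list" flag. The struct and function names below are hypothetical,
and the locking that the driver does around the list is left out to
keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list node, in the spirit of list_head. */
struct node {
	struct node *prev, *next;
};

struct cmd {
	struct node link;
	bool in_list;		/* true only while linked on the active list */
};

struct conn {
	struct node active;	/* list head */
	int active_count;
};

static void conn_init(struct conn *c)
{
	c->active.prev = c->active.next = &c->active;
	c->active_count = 0;
}

static void conn_add(struct conn *c, struct cmd *cmd)
{
	struct node *n = &cmd->link;

	/* Append at the tail of the circular list. */
	n->next = &c->active;
	n->prev = c->active.prev;
	c->active.prev->next = n;
	c->active.prev = n;

	cmd->in_list = true;
	c->active_count++;
}

/* Returns true if the command was removed, false if it was already gone. */
static bool conn_del(struct conn *c, struct cmd *cmd)
{
	assert(c != NULL && cmd != NULL);

	if (!cmd->in_list)
		return false;	/* second caller: nothing to unlink or count */

	cmd->in_list = false;
	cmd->link.prev->next = cmd->link.next;
	cmd->link.next->prev = cmd->link.prev;
	cmd->link.prev = cmd->link.next = &cmd->link;
	c->active_count--;
	return true;
}
```

A timed-out cleanup followed by a late response then maps to two
conn_del() calls on the same command: the first unlinks it, the second
is a no-op and the count is decremented only once.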

Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
---
 drivers/scsi/qedi/qedi_fw.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index 6ed74583b1b9..34c477bda8a4 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -816,8 +816,11 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
 			qedi_clear_task_idx(qedi_conn->qedi, rtid);
 
 			spin_lock(&qedi_conn->list_lock);
-			list_del_init(&dbg_cmd->io_cmd);
-			qedi_conn->active_cmd_count--;
+			if (likely(dbg_cmd->io_cmd_in_list)) {
+				dbg_cmd->io_cmd_in_list = false;
+				list_del_init(&dbg_cmd->io_cmd);
+				qedi_conn->active_cmd_count--;
+			}
 			spin_unlock(&qedi_conn->list_lock);
 			qedi_cmd->state = CLEANUP_RECV;
 			wake_up_interruptible(&qedi_conn->wait_queue);
@@ -1235,6 +1238,7 @@ int qedi_cleanup_all_io(struct qedi_ctx *qedi, struct qedi_conn *qedi_conn,
 		qedi_conn->cmd_cleanup_req++;
 		qedi_iscsi_cleanup_task(ctask, true);
 
+		cmd->io_cmd_in_list = false;
 		list_del_init(&cmd->io_cmd);
 		qedi_conn->active_cmd_count--;
 		QEDI_WARN(&qedi->dbg_ctx,
@@ -1446,8 +1450,11 @@ static void qedi_tmf_work(struct work_struct *work)
 	spin_unlock_bh(&qedi_conn->tmf_work_lock);
 
 	spin_lock(&qedi_conn->list_lock);
-	list_del_init(&cmd->io_cmd);
-	qedi_conn->active_cmd_count--;
+	if (likely(cmd->io_cmd_in_list)) {
+		cmd->io_cmd_in_list = false;
+		list_del_init(&cmd->io_cmd);
+		qedi_conn->active_cmd_count--;
+	}
 	spin_unlock(&qedi_conn->list_lock);
 
 	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
-- 
2.25.0



* [PATCH 4/8] qedi: Protect active command list to avoid list corruption
  2020-09-08  5:24 [PATCH 0/8] qedi: Misc bug fixes and enhancements Manish Rangankar
                   ` (2 preceding siblings ...)
  2020-09-08  5:24 ` [PATCH 3/8] qedi: Fix list_del corruption while removing active IO Manish Rangankar
@ 2020-09-08  5:24 ` Manish Rangankar
  2020-09-08  5:24 ` [PATCH 5/8] qedi: Use snprintf instead of sprintf Manish Rangankar
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Manish Rangankar @ 2020-09-08  5:24 UTC (permalink / raw)
  To: martin.petersen, lduncan, cleech; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

From: Nilesh Javali <njavali@marvell.com>

Protect the active command list for non-I/O commands such as login
response, logout response and text response, and for recovery cleanup
of the active list, to avoid list corruption.

Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
---
 drivers/scsi/qedi/qedi_fw.c    | 8 ++++++++
 drivers/scsi/qedi/qedi_iscsi.c | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index 34c477bda8a4..f158fde0a43c 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -59,6 +59,7 @@ static void qedi_process_logout_resp(struct qedi_ctx *qedi,
 		  "Freeing tid=0x%x for cid=0x%x\n",
 		  cmd->task_id, qedi_conn->iscsi_conn_id);
 
+	spin_lock(&qedi_conn->list_lock);
 	if (likely(cmd->io_cmd_in_list)) {
 		cmd->io_cmd_in_list = false;
 		list_del_init(&cmd->io_cmd);
@@ -69,6 +70,7 @@ static void qedi_process_logout_resp(struct qedi_ctx *qedi,
 			  cmd->task_id, qedi_conn->iscsi_conn_id,
 			  &cmd->io_cmd);
 	}
+	spin_unlock(&qedi_conn->list_lock);
 
 	cmd->state = RESPONSE_RECEIVED;
 	qedi_clear_task_idx(qedi, cmd->task_id);
@@ -122,6 +124,7 @@ static void qedi_process_text_resp(struct qedi_ctx *qedi,
 		  "Freeing tid=0x%x for cid=0x%x\n",
 		  cmd->task_id, qedi_conn->iscsi_conn_id);
 
+	spin_lock(&qedi_conn->list_lock);
 	if (likely(cmd->io_cmd_in_list)) {
 		cmd->io_cmd_in_list = false;
 		list_del_init(&cmd->io_cmd);
@@ -132,6 +135,7 @@ static void qedi_process_text_resp(struct qedi_ctx *qedi,
 			  cmd->task_id, qedi_conn->iscsi_conn_id,
 			  &cmd->io_cmd);
 	}
+	spin_unlock(&qedi_conn->list_lock);
 
 	cmd->state = RESPONSE_RECEIVED;
 	qedi_clear_task_idx(qedi, cmd->task_id);
@@ -222,11 +226,13 @@ static void qedi_process_tmf_resp(struct qedi_ctx *qedi,
 
 	tmf_hdr = (struct iscsi_tm *)qedi_cmd->task->hdr;
 
+	spin_lock(&qedi_conn->list_lock);
 	if (likely(qedi_cmd->io_cmd_in_list)) {
 		qedi_cmd->io_cmd_in_list = false;
 		list_del_init(&qedi_cmd->io_cmd);
 		qedi_conn->active_cmd_count--;
 	}
+	spin_unlock(&qedi_conn->list_lock);
 
 	if (((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
 	      ISCSI_TM_FUNC_LOGICAL_UNIT_RESET) ||
@@ -288,11 +294,13 @@ static void qedi_process_login_resp(struct qedi_ctx *qedi,
 		  ISCSI_LOGIN_RESPONSE_HDR_DATA_SEG_LEN_MASK;
 	qedi_conn->gen_pdu.resp_wr_ptr = qedi_conn->gen_pdu.resp_buf + pld_len;
 
+	spin_lock(&qedi_conn->list_lock);
 	if (likely(cmd->io_cmd_in_list)) {
 		cmd->io_cmd_in_list = false;
 		list_del_init(&cmd->io_cmd);
 		qedi_conn->active_cmd_count--;
 	}
+	spin_unlock(&qedi_conn->list_lock);
 
 	memset(task_ctx, '\0', sizeof(*task_ctx));
 
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index f815845fc568..ae86a40ca040 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -975,11 +975,13 @@ static void qedi_cleanup_active_cmd_list(struct qedi_conn *qedi_conn)
 {
 	struct qedi_cmd *cmd, *cmd_tmp;
 
+	spin_lock(&qedi_conn->list_lock);
 	list_for_each_entry_safe(cmd, cmd_tmp, &qedi_conn->active_cmd_list,
 				 io_cmd) {
 		list_del_init(&cmd->io_cmd);
 		qedi_conn->active_cmd_count--;
 	}
+	spin_unlock(&qedi_conn->list_lock);
 }
 
 static void qedi_ep_disconnect(struct iscsi_endpoint *ep)
-- 
2.25.0



* [PATCH 5/8] qedi: Use snprintf instead of sprintf
  2020-09-08  5:24 [PATCH 0/8] qedi: Misc bug fixes and enhancements Manish Rangankar
                   ` (3 preceding siblings ...)
  2020-09-08  5:24 ` [PATCH 4/8] qedi: Protect active command list to avoid list corruption Manish Rangankar
@ 2020-09-08  5:24 ` Manish Rangankar
  2020-09-08  5:24 ` [PATCH 6/8] qedi: Mark all connections for recovery on link down event Manish Rangankar
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Manish Rangankar @ 2020-09-08  5:24 UTC (permalink / raw)
  To: martin.petersen, lduncan, cleech; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

Use snprintf to limit the number of bytes written to the buffer.
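
A small user-space illustration of the difference, not taken from the
driver: snprintf() never writes more than the given size (including the
terminating NUL) and returns the length the full string would have had,
so truncation can be detected. format_host_name() is a hypothetical
helper name, and it uses %u where the patch uses %d.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "host_<n>" into buf.
 * Returns 1 if the whole string fit, 0 if it was truncated.
 */
static int format_host_name(char *buf, size_t len, unsigned int host_no)
{
	int n;

	assert(buf != NULL && len > 0);

	/* At most len bytes are written, always NUL-terminated. */
	n = snprintf(buf, len, "host_%u", host_no);

	/* n is the untruncated length; n >= len means output was cut. */
	return n >= 0 && (size_t)n < len;
}
```

With an 8-byte buffer and host number 12345, the buffer ends up holding
the 7 characters "host_12" plus the NUL, and the helper reports
truncation rather than overrunning the buffer as sprintf() could.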

Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
---
 drivers/scsi/qedi/qedi_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index e1ec22d7d699..2db99613b8a9 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -2538,7 +2538,7 @@ static int __qedi_probe(struct pci_dev *pdev, int mode)
 	QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_DISC, "MAC address is %pM.\n",
 		  qedi->mac);
 
-	sprintf(host_buf, "host_%d", qedi->shost->host_no);
+	snprintf(host_buf, sizeof(host_buf), "host_%d", qedi->shost->host_no);
 	qedi_ops->common->set_name(qedi->cdev, host_buf);
 
 	qedi_ops->register_ops(qedi->cdev, &qedi_cb_ops, qedi);
-- 
2.25.0



* [PATCH 6/8] qedi: Mark all connections for recovery on link down event
  2020-09-08  5:24 [PATCH 0/8] qedi: Misc bug fixes and enhancements Manish Rangankar
                   ` (4 preceding siblings ...)
  2020-09-08  5:24 ` [PATCH 5/8] qedi: Use snprintf instead of sprintf Manish Rangankar
@ 2020-09-08  5:24 ` Manish Rangankar
  2020-09-08  5:24 ` [PATCH 7/8] qedi: Add firmware error recovery invocation support Manish Rangankar
  2020-09-08  5:24 ` [PATCH 8/8] qedi: Add support for handling the pcie errors Manish Rangankar
  7 siblings, 0 replies; 9+ messages in thread
From: Manish Rangankar @ 2020-09-08  5:24 UTC (permalink / raw)
  To: martin.petersen, lduncan, cleech; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

From: Nilesh Javali <njavali@marvell.com>

For short cable pulls, the in-flight I/O to the FW is never cleaned
up, resulting in stale I/O completions that cause list_del corruption
and a soft lockup of the system. So, on a link down event, mark all
the connections for recovery, causing immediate cleanup of all the
in-flight I/O.
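
The control flow can be sketched in user-space C as below. This is only
a model under simplifying assumptions: the types and helpers are
hypothetical, a session is reduced to a single flag, and "recovery" is
just setting that flag rather than tearing down and re-establishing a
connection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct session {
	bool needs_recovery;
};

struct host {
	struct session *sessions;
	size_t nr_sessions;
	bool link_up;
};

/* Call fn on every session of the host, in the style of a for-each helper. */
static void for_each_session(struct host *h, void (*fn)(struct session *))
{
	size_t i;

	for (i = 0; i < h->nr_sessions; i++)
		fn(&h->sessions[i]);
}

static void mark_recovery(struct session *s)
{
	s->needs_recovery = true;
}

/* Link state callback: on link down, flag every session for recovery. */
static void link_update(struct host *h, bool up)
{
	assert(h != NULL);

	h->link_up = up;
	if (!up)
		for_each_session(h, mark_recovery);
}
```

A link up event leaves the sessions untouched, while a link down event
flags all of them at once, so no in-flight I/O is left waiting for a
completion that may arrive stale.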

Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
---
 drivers/scsi/qedi/qedi_main.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index 2db99613b8a9..4f43e2a24b50 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -1127,6 +1127,15 @@ static void qedi_schedule_recovery_handler(void *dev)
 	schedule_delayed_work(&qedi->recovery_work, 0);
 }
 
+static void qedi_set_conn_recovery(struct iscsi_cls_session *cls_session)
+{
+	struct iscsi_session *session = cls_session->dd_data;
+	struct iscsi_conn *conn = session->leadconn;
+	struct qedi_conn *qedi_conn = conn->dd_data;
+
+	qedi_start_conn_recovery(qedi_conn->qedi, qedi_conn);
+}
+
 static void qedi_link_update(void *dev, struct qed_link_output *link)
 {
 	struct qedi_ctx *qedi = (struct qedi_ctx *)dev;
@@ -1138,6 +1147,7 @@ static void qedi_link_update(void *dev, struct qed_link_output *link)
 		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_INFO,
 			  "Link Down event.\n");
 		atomic_set(&qedi->link_state, QEDI_LINK_DOWN);
+		iscsi_host_for_each_session(qedi->shost, qedi_set_conn_recovery);
 	}
 }
 
-- 
2.25.0



* [PATCH 7/8] qedi: Add firmware error recovery invocation support.
  2020-09-08  5:24 [PATCH 0/8] qedi: Misc bug fixes and enhancements Manish Rangankar
                   ` (5 preceding siblings ...)
  2020-09-08  5:24 ` [PATCH 6/8] qedi: Mark all connections for recovery on link down event Manish Rangankar
@ 2020-09-08  5:24 ` Manish Rangankar
  2020-09-08  5:24 ` [PATCH 8/8] qedi: Add support for handling the pcie errors Manish Rangankar
  7 siblings, 0 replies; 9+ messages in thread
From: Manish Rangankar @ 2020-09-08  5:24 UTC (permalink / raw)
  To: martin.petersen, lduncan, cleech; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

Add support to initiate MFW process recovery for all the devices
if the storage function receives the event first.

Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
---
 drivers/scsi/qedi/qedi.h       |  4 +++
 drivers/scsi/qedi/qedi_fw.c    |  7 ++--
 drivers/scsi/qedi/qedi_iscsi.c |  3 +-
 drivers/scsi/qedi/qedi_main.c  | 61 +++++++++++++++++++++++++++++++++-
 4 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/qedi/qedi.h b/drivers/scsi/qedi/qedi.h
index 9c19ec9dc682..7e59d50f2fab 100644
--- a/drivers/scsi/qedi/qedi.h
+++ b/drivers/scsi/qedi/qedi.h
@@ -274,6 +274,10 @@ struct qedi_ctx {
 	spinlock_t ll2_lock;	/* Light L2 lock */
 	spinlock_t hba_lock;	/* per port lock */
 	struct task_struct *ll2_recv_thread;
+	unsigned long qedi_err_flags;
+#define QEDI_ERR_ATTN_CLR_EN	0
+#define QEDI_ERR_IS_RECOVERABLE	2
+#define QEDI_ERR_OVERRIDE_EN	31
 	unsigned long flags;
 #define UIO_DEV_OPENED		1
 #define QEDI_IOTHREAD_WAKE	2
diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index f158fde0a43c..440ddd2309f1 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -1267,7 +1267,8 @@ int qedi_cleanup_all_io(struct qedi_ctx *qedi, struct qedi_conn *qedi_conn,
 	rval  = wait_event_interruptible_timeout(qedi_conn->wait_queue,
 						 ((qedi_conn->cmd_cleanup_req ==
 						 qedi_conn->cmd_cleanup_cmpl) ||
-						 qedi_conn->ep),
+						 test_bit(QEDI_IN_RECOVERY,
+							  &qedi->flags)),
 						 5 * HZ);
 	if (rval) {
 		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
@@ -1292,7 +1293,9 @@ int qedi_cleanup_all_io(struct qedi_ctx *qedi, struct qedi_conn *qedi_conn,
 	/* Enable IOs for all other sessions except current.*/
 	if (!wait_event_interruptible_timeout(qedi_conn->wait_queue,
 					      (qedi_conn->cmd_cleanup_req ==
-					       qedi_conn->cmd_cleanup_cmpl),
+					       qedi_conn->cmd_cleanup_cmpl) ||
+					       test_bit(QEDI_IN_RECOVERY,
+							&qedi->flags),
 					      5 * HZ)) {
 		iscsi_host_for_each_session(qedi->shost,
 					    qedi_mark_device_available);
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index ae86a40ca040..08c05403cd72 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -1072,7 +1072,8 @@ static void qedi_ep_disconnect(struct iscsi_endpoint *ep)
 
 	qedi_ep->state = EP_STATE_DISCONN_START;
 
-	if (test_bit(QEDI_IN_SHUTDOWN, &qedi->flags))
+	if (test_bit(QEDI_IN_SHUTDOWN, &qedi->flags) ||
+	    test_bit(QEDI_IN_RECOVERY, &qedi->flags))
 		goto ep_release_conn;
 
 	ret = qedi_ops->destroy_conn(qedi->cdev, qedi_ep->handle, abrt_conn);
diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index 4f43e2a24b50..28b5d6670580 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -50,6 +50,10 @@ module_param(qedi_ll2_buf_size, uint, 0644);
 MODULE_PARM_DESC(qedi_ll2_buf_size,
 		 "parameter to set ping packet size, default - 0x400, Jumbo packets - 0x2400.");
 
+static uint qedi_flags_override;
+module_param(qedi_flags_override, uint, 0644);
+MODULE_PARM_DESC(qedi_flags_override, "Disable/Enable MFW error flags bits action.");
+
 const struct qed_iscsi_ops *qedi_ops;
 static struct scsi_transport_template *qedi_scsi_transport;
 static struct pci_driver qedi_pci_driver;
@@ -1113,6 +1117,39 @@ static void qedi_get_protocol_tlv_data(void *dev, void *data)
 	return;
 }
 
+void qedi_schedule_hw_err_handler(void *dev,
+				  enum qed_hw_err_type err_type)
+{
+	struct qedi_ctx *qedi = (struct qedi_ctx *)dev;
+	unsigned long override_flags = qedi_flags_override;
+
+	if (override_flags && test_bit(QEDI_ERR_OVERRIDE_EN, &override_flags))
+		qedi->qedi_err_flags = qedi_flags_override;
+
+	QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_INFO,
+		  "HW error handler scheduled, err=%d err_flags=0x%x\n",
+		  err_type, qedi->qedi_err_flags);
+
+	switch (err_type) {
+	case QED_HW_ERR_MFW_RESP_FAIL:
+	case QED_HW_ERR_HW_ATTN:
+	case QED_HW_ERR_DMAE_FAIL:
+	case QED_HW_ERR_RAMROD_FAIL:
+	case QED_HW_ERR_FW_ASSERT:
+		/* Prevent HW attentions from being reasserted */
+		if (test_bit(QEDI_ERR_ATTN_CLR_EN, &qedi->qedi_err_flags))
+			qedi_ops->common->attn_clr_enable(qedi->cdev, true);
+
+		if (err_type == QED_HW_ERR_RAMROD_FAIL &&
+		    test_bit(QEDI_ERR_IS_RECOVERABLE, &qedi->qedi_err_flags))
+			qedi_ops->common->recovery_process(qedi->cdev);
+
+		break;
+	default:
+		break;
+	}
+}
+
 static void qedi_schedule_recovery_handler(void *dev)
 {
 	struct qedi_ctx *qedi = dev;
@@ -1155,6 +1192,7 @@ static struct qed_iscsi_cb_ops qedi_cb_ops = {
 	{
 		.link_update =		qedi_link_update,
 		.schedule_recovery_handler = qedi_schedule_recovery_handler,
+		.schedule_hw_err_handler = qedi_schedule_hw_err_handler,
 		.get_protocol_tlv_data = qedi_get_protocol_tlv_data,
 		.get_generic_tlv_data = qedi_get_generic_tlv_data,
 	}
@@ -2355,6 +2393,7 @@ static void __qedi_remove(struct pci_dev *pdev, int mode)
 {
 	struct qedi_ctx *qedi = pci_get_drvdata(pdev);
 	int rval;
+	u16 retry = 10;
 
 	if (mode == QEDI_MODE_SHUTDOWN)
 		iscsi_host_for_each_session(qedi->shost,
@@ -2383,7 +2422,13 @@ static void __qedi_remove(struct pci_dev *pdev, int mode)
 	qedi_sync_free_irqs(qedi);
 
 	if (!test_bit(QEDI_IN_OFFLINE, &qedi->flags)) {
-		qedi_ops->stop(qedi->cdev);
+		while (retry--) {
+			rval = qedi_ops->stop(qedi->cdev);
+			if (rval < 0)
+				msleep(1000);
+			else
+				break;
+		}
 		qedi_ops->ll2->stop(qedi->cdev);
 	}
 
@@ -2442,6 +2487,7 @@ static int __qedi_probe(struct pci_dev *pdev, int mode)
 	struct qed_probe_params qed_params;
 	void *task_start, *task_end;
 	int rc;
+	u16 retry = 10;
 
 	if (mode != QEDI_MODE_RECOVERY) {
 		qedi = qedi_host_alloc(pdev);
@@ -2453,6 +2499,10 @@ static int __qedi_probe(struct pci_dev *pdev, int mode)
 		qedi = pci_get_drvdata(pdev);
 	}
 
+retry_probe:
+	if (mode == QEDI_MODE_RECOVERY)
+		msleep(2000);
+
 	memset(&qed_params, 0, sizeof(qed_params));
 	qed_params.protocol = QED_PROTOCOL_ISCSI;
 	qed_params.dp_module = qedi_qed_debug;
@@ -2460,11 +2510,20 @@ static int __qedi_probe(struct pci_dev *pdev, int mode)
 	qed_params.is_vf = is_vf;
 	qedi->cdev = qedi_ops->common->probe(pdev, &qed_params);
 	if (!qedi->cdev) {
+		if (mode == QEDI_MODE_RECOVERY && retry) {
+			QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_INFO,
+				  "Retry %d initialize hardware\n", retry);
+			retry--;
+			goto retry_probe;
+		}
+
 		rc = -ENODEV;
 		QEDI_ERR(&qedi->dbg_ctx, "Cannot initialize hardware\n");
 		goto free_host;
 	}
 
+	set_bit(QEDI_ERR_ATTN_CLR_EN, &qedi->qedi_err_flags);
+	set_bit(QEDI_ERR_IS_RECOVERABLE, &qedi->qedi_err_flags);
 	atomic_set(&qedi->link_state, QEDI_LINK_DOWN);
 
 	rc = qedi_ops->fill_dev_info(qedi->cdev, &qedi->dev_info);
-- 
2.25.0



* [PATCH 8/8] qedi: Add support for handling the pcie errors.
  2020-09-08  5:24 [PATCH 0/8] qedi: Misc bug fixes and enhancements Manish Rangankar
                   ` (6 preceding siblings ...)
  2020-09-08  5:24 ` [PATCH 7/8] qedi: Add firmware error recovery invocation support Manish Rangankar
@ 2020-09-08  5:24 ` Manish Rangankar
  7 siblings, 0 replies; 9+ messages in thread
From: Manish Rangankar @ 2020-09-08  5:24 UTC (permalink / raw)
  To: martin.petersen, lduncan, cleech; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

Error recovery is handled by the management firmware (MFW) with the
help of the qed/qedi drivers. Upon detecting an error, the driver
informs the MFW about the event, which in turn starts a recovery
process. The MFW sends an ERROR_RECOVERY notification to the driver,
which performs the required cleanup/recovery on the driver side.
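
The "only one recovery at a time" guard in the error handler can be
modelled in user-space C11 as follows. This is a sketch, not the driver
code: atomic_fetch_or() plays the role of an atomic test-and-set on a
flag bit, and the names IN_RECOVERY_BIT, dev_ctx, error_detected() and
recovery_done() are hypothetical. In particular, clearing the flag in
recovery_done() is an assumption about what happens once recovery
completes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define IN_RECOVERY_BIT 0x1u	/* hypothetical flag bit */

struct dev_ctx {
	atomic_uint flags;
	int recoveries_started;	/* stands in for kicking off recovery */
};

/*
 * Returns true if this caller started recovery,
 * false if a recovery was already in progress.
 */
static bool error_detected(struct dev_ctx *ctx)
{
	unsigned int old;

	assert(ctx != NULL);

	/* Atomically set the bit and learn whether it was already set. */
	old = atomic_fetch_or(&ctx->flags, IN_RECOVERY_BIT);
	if (old & IN_RECOVERY_BIT)
		return false;

	ctx->recoveries_started++;
	return true;
}

/* Assumed completion path: allow a future error to trigger recovery again. */
static void recovery_done(struct dev_ctx *ctx)
{
	atomic_fetch_and(&ctx->flags, ~IN_RECOVERY_BIT);
}
```

Two back-to-back error reports start recovery only once; after
recovery_done() a new error starts it again.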

Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
---
 drivers/scsi/qedi/qedi_main.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index 28b5d6670580..24564fff3e93 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -2389,6 +2389,25 @@ static int qedi_setup_boot_info(struct qedi_ctx *qedi)
 	return -ENOMEM;
 }
 
+static pci_ers_result_t qedi_io_error_detected(struct pci_dev *pdev,
+					       pci_channel_state_t state)
+{
+	struct qedi_ctx *qedi = pci_get_drvdata(pdev);
+
+	QEDI_ERR(&qedi->dbg_ctx, "%s: PCI error detected [%d]\n",
+		 __func__, state);
+
+	if (test_and_set_bit(QEDI_IN_RECOVERY, &qedi->flags)) {
+		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_INFO,
+			  "Recovery already in progress.\n");
+		return PCI_ERS_RESULT_NONE;
+	}
+
+	qedi_ops->common->recovery_process(qedi->cdev);
+
+	return PCI_ERS_RESULT_CAN_RECOVER;
+}
+
 static void __qedi_remove(struct pci_dev *pdev, int mode)
 {
 	struct qedi_ctx *qedi = pci_get_drvdata(pdev);
@@ -2818,12 +2837,17 @@ MODULE_DEVICE_TABLE(pci, qedi_pci_tbl);
 
 static enum cpuhp_state qedi_cpuhp_state;
 
+static struct pci_error_handlers qedi_err_handler = {
+	.error_detected = qedi_io_error_detected,
+};
+
 static struct pci_driver qedi_pci_driver = {
 	.name = QEDI_MODULE_NAME,
 	.id_table = qedi_pci_tbl,
 	.probe = qedi_probe,
 	.remove = qedi_remove,
 	.shutdown = qedi_shutdown,
+	.err_handler = &qedi_err_handler,
 };
 
 static int __init qedi_init(void)
-- 
2.25.0


