linux-scsi.vger.kernel.org archive mirror
* [PATCH v2 0/8] qedi: Misc bug fixes and enhancements
@ 2020-09-08  9:56 Manish Rangankar
  2020-09-08  9:56 ` [PATCH v2 1/8] qedi: Use qed count from set_fp_int in msix allocation Manish Rangankar
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: Manish Rangankar @ 2020-09-08  9:56 UTC (permalink / raw)
  To: martin.petersen, lduncan, cleech; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

Hi Martin,

Please apply the qedi miscellaneous bug fixes and enhancement patches
to the scsi tree at your convenience.

v1->v2:
Fix warning reported by kernel test robot

Thanks,
Manish

Manish Rangankar (5):
  qedi: Use qed count from set_fp_int in msix allocation.
  qedi: Skip f/w connection termination for pci shutdown handler.
  qedi: Use snprintf instead of sprintf
  qedi: Add firmware error recovery invocation support.
  qedi: Add support for handling the pcie errors.

Nilesh Javali (3):
  qedi: Fix list_del corruption while removing active IO
  qedi: Protect active command list to avoid list corruption
  qedi: Mark all connections for recovery on link down event

 drivers/scsi/qedi/qedi.h       |   5 ++
 drivers/scsi/qedi/qedi_fw.c    |  30 +++++++--
 drivers/scsi/qedi/qedi_iscsi.c |   7 +++
 drivers/scsi/qedi/qedi_main.c  | 108 +++++++++++++++++++++++++++++++--
 4 files changed, 140 insertions(+), 10 deletions(-)

-- 
2.25.0



* [PATCH v2 1/8] qedi: Use qed count from set_fp_int in msix allocation.
  2020-09-08  9:56 [PATCH v2 0/8] qedi: Misc bug fixes and enhancements Manish Rangankar
@ 2020-09-08  9:56 ` Manish Rangankar
  2020-09-08  9:56 ` [PATCH v2 2/8] qedi: Skip f/w connection termination for pci shutdown handler Manish Rangankar
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Manish Rangankar @ 2020-09-08  9:56 UTC (permalink / raw)
  To: martin.petersen, lduncan, cleech; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

To avoid unnecessary vector allocation when the number of fast-path
queues is less than the number of available MSI-X vectors, use the
count returned by the qed module's set_fp_int.

Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
---
 drivers/scsi/qedi/qedi.h      | 1 +
 drivers/scsi/qedi/qedi_main.c | 9 +++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/qedi/qedi.h b/drivers/scsi/qedi/qedi.h
index 9498279ae80d..9c19ec9dc682 100644
--- a/drivers/scsi/qedi/qedi.h
+++ b/drivers/scsi/qedi/qedi.h
@@ -305,6 +305,7 @@ struct qedi_ctx {
 	u32 max_sqes;
 	u8 num_queues;
 	u32 max_active_conns;
+	s32 msix_count;
 
 	struct iscsi_cid_queue cid_que;
 	struct qedi_endpoint **ep_tbl;
diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index 6f038ae5efca..e1ec22d7d699 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -1357,7 +1357,7 @@ static int qedi_request_msix_irq(struct qedi_ctx *qedi)
 	u16 idx;
 
 	cpu = cpumask_first(cpu_online_mask);
-	for (i = 0; i < qedi->int_info.msix_cnt; i++) {
+	for (i = 0; i < qedi->msix_count; i++) {
 		idx = i * qedi->dev_info.common.num_hwfns +
 			  qedi_ops->common->get_affin_hwfn_idx(qedi->cdev);
 
@@ -1387,7 +1387,12 @@ static int qedi_setup_int(struct qedi_ctx *qedi)
 {
 	int rc = 0;
 
-	rc = qedi_ops->common->set_fp_int(qedi->cdev, num_online_cpus());
+	rc = qedi_ops->common->set_fp_int(qedi->cdev, qedi->num_queues);
+	if (rc < 0)
+		goto exit_setup_int;
+
+	qedi->msix_count = rc;
+
 	rc = qedi_ops->common->get_fp_int(qedi->cdev, &qedi->int_info);
 	if (rc)
 		goto exit_setup_int;
-- 
2.25.0
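
The vector-count handling in the patch above can be sketched in plain
userspace C. This is only an illustration under stated assumptions:
demo_set_fp_int(), struct demo_ctx and the other demo_* names are
hypothetical stand-ins, not the qed/qedi API. The point is that the loop
requesting IRQs is bounded by the count the lower layer actually granted,
not by the number of vectors that happen to be available.

```c
/* Hypothetical stand-in for the lower layer's set_fp_int(): grants at
 * most `available` vectors and returns the granted count, or a negative
 * value on failure. Not the real qed implementation.
 */
static int demo_set_fp_int(int requested, int available)
{
	if (requested <= 0)
		return -1;
	return requested < available ? requested : available;
}

struct demo_ctx {
	int num_queues;	/* fast-path queues the driver wants */
	int msix_count;	/* vectors actually granted */
};

/* Request one vector per queue and remember how many were granted. */
static int demo_setup_int(struct demo_ctx *ctx, int available)
{
	int rc = demo_set_fp_int(ctx->num_queues, available);

	if (rc < 0)
		return rc;

	ctx->msix_count = rc;
	return 0;
}

/* Count how many IRQs would be requested: one per granted vector. */
static int demo_request_irqs(const struct demo_ctx *ctx)
{
	int i, requested = 0;

	for (i = 0; i < ctx->msix_count; i++)
		requested++;

	return requested;
}
```

With 4 queues and 16 available vectors, this sketch requests 4 IRQs
rather than 16.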



* [PATCH v2 2/8] qedi: Skip f/w connection termination for pci shutdown handler.
  2020-09-08  9:56 [PATCH v2 0/8] qedi: Misc bug fixes and enhancements Manish Rangankar
  2020-09-08  9:56 ` [PATCH v2 1/8] qedi: Use qed count from set_fp_int in msix allocation Manish Rangankar
@ 2020-09-08  9:56 ` Manish Rangankar
  2020-09-08  9:56 ` [PATCH v2 3/8] qedi: Fix list_del corruption while removing active IO Manish Rangankar
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Manish Rangankar @ 2020-09-08  9:56 UTC (permalink / raw)
  To: martin.petersen, lduncan, cleech; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

In a boot-from-SAN scenario, when the qedi PCI shutdown handler is
called with active iSCSI sessions, the target sometimes takes too
long to respond to the f/w connection termination request.
During shutdown, skip sending the termination ramrod and proceed
with the unload path.

Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
---
 drivers/scsi/qedi/qedi_iscsi.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index c14ac7882afa..f815845fc568 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -1069,6 +1069,10 @@ static void qedi_ep_disconnect(struct iscsi_endpoint *ep)
 		wait_delay += qedi->pf_params.iscsi_pf_params.two_msl_timer;
 
 	qedi_ep->state = EP_STATE_DISCONN_START;
+
+	if (test_bit(QEDI_IN_SHUTDOWN, &qedi->flags))
+		goto ep_release_conn;
+
 	ret = qedi_ops->destroy_conn(qedi->cdev, qedi_ep->handle, abrt_conn);
 	if (ret) {
 		QEDI_WARN(&qedi->dbg_ctx,
-- 
2.25.0



* [PATCH v2 3/8] qedi: Fix list_del corruption while removing active IO
  2020-09-08  9:56 [PATCH v2 0/8] qedi: Misc bug fixes and enhancements Manish Rangankar
  2020-09-08  9:56 ` [PATCH v2 1/8] qedi: Use qed count from set_fp_int in msix allocation Manish Rangankar
  2020-09-08  9:56 ` [PATCH v2 2/8] qedi: Skip f/w connection termination for pci shutdown handler Manish Rangankar
@ 2020-09-08  9:56 ` Manish Rangankar
  2020-09-08  9:56 ` [PATCH v2 4/8] qedi: Protect active command list to avoid list corruption Manish Rangankar
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Manish Rangankar @ 2020-09-08  9:56 UTC (permalink / raw)
  To: martin.petersen, lduncan, cleech; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

From: Nilesh Javali <njavali@marvell.com>

While aborting an I/O, the FW cleanup task timed out and the driver
deleted the I/O from the active command list. Some time later, the FW
sent the cleanup task response and the driver deleted the I/O from
the active command list a second time, causing the FW to send an I/O
completion for a non-existent I/O and list_del corruption of the
active command list.

Check that the I/O is still on the active command list before
deleting it, so that the FW sends a valid I/O completion and the
list_del corruption is avoided.

Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
---
 drivers/scsi/qedi/qedi_fw.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index 6ed74583b1b9..34c477bda8a4 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -816,8 +816,11 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
 			qedi_clear_task_idx(qedi_conn->qedi, rtid);
 
 			spin_lock(&qedi_conn->list_lock);
-			list_del_init(&dbg_cmd->io_cmd);
-			qedi_conn->active_cmd_count--;
+			if (likely(dbg_cmd->io_cmd_in_list)) {
+				dbg_cmd->io_cmd_in_list = false;
+				list_del_init(&dbg_cmd->io_cmd);
+				qedi_conn->active_cmd_count--;
+			}
 			spin_unlock(&qedi_conn->list_lock);
 			qedi_cmd->state = CLEANUP_RECV;
 			wake_up_interruptible(&qedi_conn->wait_queue);
@@ -1235,6 +1238,7 @@ int qedi_cleanup_all_io(struct qedi_ctx *qedi, struct qedi_conn *qedi_conn,
 		qedi_conn->cmd_cleanup_req++;
 		qedi_iscsi_cleanup_task(ctask, true);
 
+		cmd->io_cmd_in_list = false;
 		list_del_init(&cmd->io_cmd);
 		qedi_conn->active_cmd_count--;
 		QEDI_WARN(&qedi->dbg_ctx,
@@ -1446,8 +1450,11 @@ static void qedi_tmf_work(struct work_struct *work)
 	spin_unlock_bh(&qedi_conn->tmf_work_lock);
 
 	spin_lock(&qedi_conn->list_lock);
-	list_del_init(&cmd->io_cmd);
-	qedi_conn->active_cmd_count--;
+	if (likely(cmd->io_cmd_in_list)) {
+		cmd->io_cmd_in_list = false;
+		list_del_init(&cmd->io_cmd);
+		qedi_conn->active_cmd_count--;
+	}
 	spin_unlock(&qedi_conn->list_lock);
 
 	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
-- 
2.25.0
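
The guarded removal used in the patch above can be illustrated with a
small userspace sketch. Everything here is hypothetical (struct
demo_cmd, demo_cmd_del() and friends are not qedi code), and locking is
left out on purpose; in the driver the check and the removal happen
under the connection's list_lock. The idea shown is that an "in list"
flag makes the delete idempotent, so a late second caller neither
touches the list nor decrements the counter again.

```c
#include <stdbool.h>

struct demo_node {
	struct demo_node *prev, *next;
};

struct demo_cmd {
	struct demo_node link;
	bool in_list;		/* true only while linked on the active list */
};

struct demo_conn {
	struct demo_node active;	/* circular list head */
	int active_count;
};

static void demo_conn_init(struct demo_conn *conn)
{
	conn->active.prev = conn->active.next = &conn->active;
	conn->active_count = 0;
}

/* Append a command to the tail of the active list. */
static void demo_cmd_add(struct demo_conn *conn, struct demo_cmd *cmd)
{
	cmd->link.next = &conn->active;
	cmd->link.prev = conn->active.prev;
	conn->active.prev->next = &cmd->link;
	conn->active.prev = &cmd->link;
	cmd->in_list = true;
	conn->active_count++;
}

/* Remove a command only if it is still linked. Returns true if this
 * call did the removal, false if the command was already gone.
 */
static bool demo_cmd_del(struct demo_conn *conn, struct demo_cmd *cmd)
{
	if (!cmd->in_list)
		return false;

	cmd->in_list = false;
	cmd->link.prev->next = cmd->link.next;
	cmd->link.next->prev = cmd->link.prev;
	cmd->link.prev = cmd->link.next = &cmd->link;
	conn->active_count--;
	return true;
}
```

Calling demo_cmd_del() twice on the same command removes it once; the
second call is a no-op and the count stays consistent.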



* [PATCH v2 4/8] qedi: Protect active command list to avoid list corruption
  2020-09-08  9:56 [PATCH v2 0/8] qedi: Misc bug fixes and enhancements Manish Rangankar
                   ` (2 preceding siblings ...)
  2020-09-08  9:56 ` [PATCH v2 3/8] qedi: Fix list_del corruption while removing active IO Manish Rangankar
@ 2020-09-08  9:56 ` Manish Rangankar
  2020-09-08  9:56 ` [PATCH v2 5/8] qedi: Use snprintf instead of sprintf Manish Rangankar
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Manish Rangankar @ 2020-09-08  9:56 UTC (permalink / raw)
  To: martin.petersen, lduncan, cleech; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

From: Nilesh Javali <njavali@marvell.com>

Take the list lock around active command list updates for non-I/O
commands such as login response, logout response and text response,
and during recovery cleanup of the active list, to avoid list
corruption.

Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
---
 drivers/scsi/qedi/qedi_fw.c    | 8 ++++++++
 drivers/scsi/qedi/qedi_iscsi.c | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index 34c477bda8a4..f158fde0a43c 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -59,6 +59,7 @@ static void qedi_process_logout_resp(struct qedi_ctx *qedi,
 		  "Freeing tid=0x%x for cid=0x%x\n",
 		  cmd->task_id, qedi_conn->iscsi_conn_id);
 
+	spin_lock(&qedi_conn->list_lock);
 	if (likely(cmd->io_cmd_in_list)) {
 		cmd->io_cmd_in_list = false;
 		list_del_init(&cmd->io_cmd);
@@ -69,6 +70,7 @@ static void qedi_process_logout_resp(struct qedi_ctx *qedi,
 			  cmd->task_id, qedi_conn->iscsi_conn_id,
 			  &cmd->io_cmd);
 	}
+	spin_unlock(&qedi_conn->list_lock);
 
 	cmd->state = RESPONSE_RECEIVED;
 	qedi_clear_task_idx(qedi, cmd->task_id);
@@ -122,6 +124,7 @@ static void qedi_process_text_resp(struct qedi_ctx *qedi,
 		  "Freeing tid=0x%x for cid=0x%x\n",
 		  cmd->task_id, qedi_conn->iscsi_conn_id);
 
+	spin_lock(&qedi_conn->list_lock);
 	if (likely(cmd->io_cmd_in_list)) {
 		cmd->io_cmd_in_list = false;
 		list_del_init(&cmd->io_cmd);
@@ -132,6 +135,7 @@ static void qedi_process_text_resp(struct qedi_ctx *qedi,
 			  cmd->task_id, qedi_conn->iscsi_conn_id,
 			  &cmd->io_cmd);
 	}
+	spin_unlock(&qedi_conn->list_lock);
 
 	cmd->state = RESPONSE_RECEIVED;
 	qedi_clear_task_idx(qedi, cmd->task_id);
@@ -222,11 +226,13 @@ static void qedi_process_tmf_resp(struct qedi_ctx *qedi,
 
 	tmf_hdr = (struct iscsi_tm *)qedi_cmd->task->hdr;
 
+	spin_lock(&qedi_conn->list_lock);
 	if (likely(qedi_cmd->io_cmd_in_list)) {
 		qedi_cmd->io_cmd_in_list = false;
 		list_del_init(&qedi_cmd->io_cmd);
 		qedi_conn->active_cmd_count--;
 	}
+	spin_unlock(&qedi_conn->list_lock);
 
 	if (((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
 	      ISCSI_TM_FUNC_LOGICAL_UNIT_RESET) ||
@@ -288,11 +294,13 @@ static void qedi_process_login_resp(struct qedi_ctx *qedi,
 		  ISCSI_LOGIN_RESPONSE_HDR_DATA_SEG_LEN_MASK;
 	qedi_conn->gen_pdu.resp_wr_ptr = qedi_conn->gen_pdu.resp_buf + pld_len;
 
+	spin_lock(&qedi_conn->list_lock);
 	if (likely(cmd->io_cmd_in_list)) {
 		cmd->io_cmd_in_list = false;
 		list_del_init(&cmd->io_cmd);
 		qedi_conn->active_cmd_count--;
 	}
+	spin_unlock(&qedi_conn->list_lock);
 
 	memset(task_ctx, '\0', sizeof(*task_ctx));
 
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index f815845fc568..ae86a40ca040 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -975,11 +975,13 @@ static void qedi_cleanup_active_cmd_list(struct qedi_conn *qedi_conn)
 {
 	struct qedi_cmd *cmd, *cmd_tmp;
 
+	spin_lock(&qedi_conn->list_lock);
 	list_for_each_entry_safe(cmd, cmd_tmp, &qedi_conn->active_cmd_list,
 				 io_cmd) {
 		list_del_init(&cmd->io_cmd);
 		qedi_conn->active_cmd_count--;
 	}
+	spin_unlock(&qedi_conn->list_lock);
 }
 
 static void qedi_ep_disconnect(struct iscsi_endpoint *ep)
-- 
2.25.0



* [PATCH v2 5/8] qedi: Use snprintf instead of sprintf
  2020-09-08  9:56 [PATCH v2 0/8] qedi: Misc bug fixes and enhancements Manish Rangankar
                   ` (3 preceding siblings ...)
  2020-09-08  9:56 ` [PATCH v2 4/8] qedi: Protect active command list to avoid list corruption Manish Rangankar
@ 2020-09-08  9:56 ` Manish Rangankar
  2020-09-08  9:56 ` [PATCH v2 6/8] qedi: Mark all connections for recovery on link down event Manish Rangankar
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Manish Rangankar @ 2020-09-08  9:56 UTC (permalink / raw)
  To: martin.petersen, lduncan, cleech; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

Use snprintf to limit the number of bytes written to the buffer.

Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
---
 drivers/scsi/qedi/qedi_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index e1ec22d7d699..2db99613b8a9 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -2538,7 +2538,7 @@ static int __qedi_probe(struct pci_dev *pdev, int mode)
 	QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_DISC, "MAC address is %pM.\n",
 		  qedi->mac);
 
-	sprintf(host_buf, "host_%d", qedi->shost->host_no);
+	snprintf(host_buf, sizeof(host_buf), "host_%d", qedi->shost->host_no);
 	qedi_ops->common->set_name(qedi->cdev, host_buf);
 
 	qedi_ops->register_ops(qedi->cdev, &qedi_cb_ops, qedi);
-- 
2.25.0
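
For readers less familiar with the difference, here is a minimal
userspace sketch of bounded formatting. demo_format_host() is a
hypothetical helper, not part of the driver, and it uses %u for an
unsigned host number, which is an assumption of this sketch. snprintf()
never writes more than the given size (including the terminating NUL)
and returns the length the full string would have had, so truncation
can be detected.

```c
#include <stdio.h>
#include <string.h>

/* Format "host_<n>" into buf without overrunning it.
 * Returns 0 if the whole string fit, -1 on error or truncation.
 */
static int demo_format_host(char *buf, size_t len, unsigned int host_no)
{
	int n = snprintf(buf, len, "host_%u", host_no);

	if (n < 0 || (size_t)n >= len)
		return -1;

	return 0;
}
```

With an 8-byte buffer and host number 123456, the buffer ends up
holding the truncated, NUL-terminated string "host_12" and the helper
reports the truncation instead of writing past the end.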



* [PATCH v2 6/8] qedi: Mark all connections for recovery on link down event
  2020-09-08  9:56 [PATCH v2 0/8] qedi: Misc bug fixes and enhancements Manish Rangankar
                   ` (4 preceding siblings ...)
  2020-09-08  9:56 ` [PATCH v2 5/8] qedi: Use snprintf instead of sprintf Manish Rangankar
@ 2020-09-08  9:56 ` Manish Rangankar
  2020-09-08  9:56 ` [PATCH v2 7/8] qedi: Add firmware error recovery invocation support Manish Rangankar
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Manish Rangankar @ 2020-09-08  9:56 UTC (permalink / raw)
  To: martin.petersen, lduncan, cleech; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

From: Nilesh Javali <njavali@marvell.com>

For short cable pulls, the in-flight I/O to the FW is never cleaned
up, so a stale I/O completion later causes list_del corruption and a
soft lockup of the system. On a link down event, mark all connections
for recovery so that all in-flight I/O is cleaned up immediately.

Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
---
 drivers/scsi/qedi/qedi_main.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index 2db99613b8a9..4f43e2a24b50 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -1127,6 +1127,15 @@ static void qedi_schedule_recovery_handler(void *dev)
 	schedule_delayed_work(&qedi->recovery_work, 0);
 }
 
+static void qedi_set_conn_recovery(struct iscsi_cls_session *cls_session)
+{
+	struct iscsi_session *session = cls_session->dd_data;
+	struct iscsi_conn *conn = session->leadconn;
+	struct qedi_conn *qedi_conn = conn->dd_data;
+
+	qedi_start_conn_recovery(qedi_conn->qedi, qedi_conn);
+}
+
 static void qedi_link_update(void *dev, struct qed_link_output *link)
 {
 	struct qedi_ctx *qedi = (struct qedi_ctx *)dev;
@@ -1138,6 +1147,7 @@ static void qedi_link_update(void *dev, struct qed_link_output *link)
 		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_INFO,
 			  "Link Down event.\n");
 		atomic_set(&qedi->link_state, QEDI_LINK_DOWN);
+		iscsi_host_for_each_session(qedi->shost, qedi_set_conn_recovery);
 	}
 }
 
-- 
2.25.0
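
The shape of the change above, a per-session callback applied to every
session when the link goes down, can be sketched in userspace C as
follows. All names are hypothetical; the array-based iterator only
stands in for the idea behind iscsi_host_for_each_session() and does
not reproduce its signature or behaviour.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_session {
	bool needs_recovery;
};

typedef void (*demo_session_fn)(struct demo_session *session);

/* Apply fn to every session in the array. */
static void demo_for_each_session(struct demo_session *sessions, size_t n,
				  demo_session_fn fn)
{
	size_t i;

	for (i = 0; i < n; i++)
		fn(&sessions[i]);
}

/* Per-session callback: flag the session for recovery. */
static void demo_mark_recovery(struct demo_session *session)
{
	session->needs_recovery = true;
}

/* On link down, mark every session; on link up, leave them alone. */
static void demo_link_update(struct demo_session *sessions, size_t n,
			     bool link_up)
{
	if (!link_up)
		demo_for_each_session(sessions, n, demo_mark_recovery);
}
```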



* [PATCH v2 7/8] qedi: Add firmware error recovery invocation support.
  2020-09-08  9:56 [PATCH v2 0/8] qedi: Misc bug fixes and enhancements Manish Rangankar
                   ` (5 preceding siblings ...)
  2020-09-08  9:56 ` [PATCH v2 6/8] qedi: Mark all connections for recovery on link down event Manish Rangankar
@ 2020-09-08  9:56 ` Manish Rangankar
  2020-09-08  9:56 ` [PATCH v2 8/8] qedi: Add support for handling the pcie errors Manish Rangankar
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Manish Rangankar @ 2020-09-08  9:56 UTC (permalink / raw)
  To: martin.petersen, lduncan, cleech; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

Add support to initiate the MFW recovery process for all the devices
if the storage function receives the event first.

Also fix the following kernel test robot warning:

>> drivers/scsi/qedi/qedi_main.c:1119:6: warning: no previous prototype
>> for 'qedi_schedule_hw_err_handler' [-Wmissing-prototypes]

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
---
 drivers/scsi/qedi/qedi.h       |  4 +++
 drivers/scsi/qedi/qedi_fw.c    |  7 ++--
 drivers/scsi/qedi/qedi_iscsi.c |  3 +-
 drivers/scsi/qedi/qedi_main.c  | 63 +++++++++++++++++++++++++++++++++-
 4 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/qedi/qedi.h b/drivers/scsi/qedi/qedi.h
index 9c19ec9dc682..7e59d50f2fab 100644
--- a/drivers/scsi/qedi/qedi.h
+++ b/drivers/scsi/qedi/qedi.h
@@ -274,6 +274,10 @@ struct qedi_ctx {
 	spinlock_t ll2_lock;	/* Light L2 lock */
 	spinlock_t hba_lock;	/* per port lock */
 	struct task_struct *ll2_recv_thread;
+	unsigned long qedi_err_flags;
+#define QEDI_ERR_ATTN_CLR_EN	0
+#define QEDI_ERR_IS_RECOVERABLE	2
+#define QEDI_ERR_OVERRIDE_EN	31
 	unsigned long flags;
 #define UIO_DEV_OPENED		1
 #define QEDI_IOTHREAD_WAKE	2
diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index f158fde0a43c..440ddd2309f1 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -1267,7 +1267,8 @@ int qedi_cleanup_all_io(struct qedi_ctx *qedi, struct qedi_conn *qedi_conn,
 	rval  = wait_event_interruptible_timeout(qedi_conn->wait_queue,
 						 ((qedi_conn->cmd_cleanup_req ==
 						 qedi_conn->cmd_cleanup_cmpl) ||
-						 qedi_conn->ep),
+						 test_bit(QEDI_IN_RECOVERY,
+							  &qedi->flags)),
 						 5 * HZ);
 	if (rval) {
 		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
@@ -1292,7 +1293,9 @@ int qedi_cleanup_all_io(struct qedi_ctx *qedi, struct qedi_conn *qedi_conn,
 	/* Enable IOs for all other sessions except current.*/
 	if (!wait_event_interruptible_timeout(qedi_conn->wait_queue,
 					      (qedi_conn->cmd_cleanup_req ==
-					       qedi_conn->cmd_cleanup_cmpl),
+					       qedi_conn->cmd_cleanup_cmpl) ||
+					       test_bit(QEDI_IN_RECOVERY,
+							&qedi->flags),
 					      5 * HZ)) {
 		iscsi_host_for_each_session(qedi->shost,
 					    qedi_mark_device_available);
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index ae86a40ca040..08c05403cd72 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -1072,7 +1072,8 @@ static void qedi_ep_disconnect(struct iscsi_endpoint *ep)
 
 	qedi_ep->state = EP_STATE_DISCONN_START;
 
-	if (test_bit(QEDI_IN_SHUTDOWN, &qedi->flags))
+	if (test_bit(QEDI_IN_SHUTDOWN, &qedi->flags) ||
+	    test_bit(QEDI_IN_RECOVERY, &qedi->flags))
 		goto ep_release_conn;
 
 	ret = qedi_ops->destroy_conn(qedi->cdev, qedi_ep->handle, abrt_conn);
diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index 4f43e2a24b50..1d8c78c41405 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -50,6 +50,10 @@ module_param(qedi_ll2_buf_size, uint, 0644);
 MODULE_PARM_DESC(qedi_ll2_buf_size,
 		 "parameter to set ping packet size, default - 0x400, Jumbo packets - 0x2400.");
 
+static uint qedi_flags_override;
+module_param(qedi_flags_override, uint, 0644);
+MODULE_PARM_DESC(qedi_flags_override, "Disable/Enable MFW error flags bits action.");
+
 const struct qed_iscsi_ops *qedi_ops;
 static struct scsi_transport_template *qedi_scsi_transport;
 static struct pci_driver qedi_pci_driver;
@@ -63,6 +67,8 @@ static void qedi_reset_uio_rings(struct qedi_uio_dev *udev);
 static void qedi_ll2_free_skbs(struct qedi_ctx *qedi);
 static struct nvm_iscsi_block *qedi_get_nvram_block(struct qedi_ctx *qedi);
 static void qedi_recovery_handler(struct work_struct *work);
+static void qedi_schedule_hw_err_handler(void *dev,
+					 enum qed_hw_err_type err_type);
 
 static int qedi_iscsi_event_cb(void *context, u8 fw_event_code, void *fw_handle)
 {
@@ -1113,6 +1119,39 @@ static void qedi_get_protocol_tlv_data(void *dev, void *data)
 	return;
 }
 
+void qedi_schedule_hw_err_handler(void *dev,
+				  enum qed_hw_err_type err_type)
+{
+	struct qedi_ctx *qedi = (struct qedi_ctx *)dev;
+	unsigned long override_flags = qedi_flags_override;
+
+	if (override_flags && test_bit(QEDI_ERR_OVERRIDE_EN, &override_flags))
+		qedi->qedi_err_flags = qedi_flags_override;
+
+	QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_INFO,
+		  "HW error handler scheduled, err=%d err_flags=0x%x\n",
+		  err_type, qedi->qedi_err_flags);
+
+	switch (err_type) {
+	case QED_HW_ERR_MFW_RESP_FAIL:
+	case QED_HW_ERR_HW_ATTN:
+	case QED_HW_ERR_DMAE_FAIL:
+	case QED_HW_ERR_RAMROD_FAIL:
+	case QED_HW_ERR_FW_ASSERT:
+		/* Prevent HW attentions from being reasserted */
+		if (test_bit(QEDI_ERR_ATTN_CLR_EN, &qedi->qedi_err_flags))
+			qedi_ops->common->attn_clr_enable(qedi->cdev, true);
+
+		if (err_type == QED_HW_ERR_RAMROD_FAIL &&
+		    test_bit(QEDI_ERR_IS_RECOVERABLE, &qedi->qedi_err_flags))
+			qedi_ops->common->recovery_process(qedi->cdev);
+
+		break;
+	default:
+		break;
+	}
+}
+
 static void qedi_schedule_recovery_handler(void *dev)
 {
 	struct qedi_ctx *qedi = dev;
@@ -1155,6 +1194,7 @@ static struct qed_iscsi_cb_ops qedi_cb_ops = {
 	{
 		.link_update =		qedi_link_update,
 		.schedule_recovery_handler = qedi_schedule_recovery_handler,
+		.schedule_hw_err_handler = qedi_schedule_hw_err_handler,
 		.get_protocol_tlv_data = qedi_get_protocol_tlv_data,
 		.get_generic_tlv_data = qedi_get_generic_tlv_data,
 	}
@@ -2355,6 +2395,7 @@ static void __qedi_remove(struct pci_dev *pdev, int mode)
 {
 	struct qedi_ctx *qedi = pci_get_drvdata(pdev);
 	int rval;
+	u16 retry = 10;
 
 	if (mode == QEDI_MODE_SHUTDOWN)
 		iscsi_host_for_each_session(qedi->shost,
@@ -2383,7 +2424,13 @@ static void __qedi_remove(struct pci_dev *pdev, int mode)
 	qedi_sync_free_irqs(qedi);
 
 	if (!test_bit(QEDI_IN_OFFLINE, &qedi->flags)) {
-		qedi_ops->stop(qedi->cdev);
+		while (retry--) {
+			rval = qedi_ops->stop(qedi->cdev);
+			if (rval < 0)
+				msleep(1000);
+			else
+				break;
+		}
 		qedi_ops->ll2->stop(qedi->cdev);
 	}
 
@@ -2442,6 +2489,7 @@ static int __qedi_probe(struct pci_dev *pdev, int mode)
 	struct qed_probe_params qed_params;
 	void *task_start, *task_end;
 	int rc;
+	u16 retry = 10;
 
 	if (mode != QEDI_MODE_RECOVERY) {
 		qedi = qedi_host_alloc(pdev);
@@ -2453,6 +2501,10 @@ static int __qedi_probe(struct pci_dev *pdev, int mode)
 		qedi = pci_get_drvdata(pdev);
 	}
 
+retry_probe:
+	if (mode == QEDI_MODE_RECOVERY)
+		msleep(2000);
+
 	memset(&qed_params, 0, sizeof(qed_params));
 	qed_params.protocol = QED_PROTOCOL_ISCSI;
 	qed_params.dp_module = qedi_qed_debug;
@@ -2460,11 +2512,20 @@ static int __qedi_probe(struct pci_dev *pdev, int mode)
 	qed_params.is_vf = is_vf;
 	qedi->cdev = qedi_ops->common->probe(pdev, &qed_params);
 	if (!qedi->cdev) {
+		if (mode == QEDI_MODE_RECOVERY && retry) {
+			QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_INFO,
+				  "Retry %d initialize hardware\n", retry);
+			retry--;
+			goto retry_probe;
+		}
+
 		rc = -ENODEV;
 		QEDI_ERR(&qedi->dbg_ctx, "Cannot initialize hardware\n");
 		goto free_host;
 	}
 
+	set_bit(QEDI_ERR_ATTN_CLR_EN, &qedi->qedi_err_flags);
+	set_bit(QEDI_ERR_IS_RECOVERABLE, &qedi->qedi_err_flags);
 	atomic_set(&qedi->link_state, QEDI_LINK_DOWN);
 
 	rc = qedi_ops->fill_dev_info(qedi->cdev, &qedi->dev_info);
-- 
2.25.0
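
The interaction between the default error flags and the
qedi_flags_override module parameter in the patch above can be modelled
in a few lines of userspace C. This is a sketch under assumptions: the
DEMO_* bit numbers mirror the QEDI_ERR_* defines from the diff, while
demo_test_bit() and demo_effective_flags() are hypothetical helpers
that do not exist in the driver.

```c
#include <stdbool.h>

#define DEMO_ERR_ATTN_CLR_EN	0
#define DEMO_ERR_IS_RECOVERABLE	2
#define DEMO_ERR_OVERRIDE_EN	31

/* Non-atomic stand-in for the kernel's test_bit(). */
static bool demo_test_bit(unsigned int nr, unsigned long flags)
{
	return (flags >> nr) & 1UL;
}

/* The override replaces the defaults only when its "override enable"
 * bit (bit 31) is set; otherwise the defaults stay in effect.
 */
static unsigned long demo_effective_flags(unsigned long defaults,
					  unsigned int override)
{
	unsigned long ov = override;

	if (ov && demo_test_bit(DEMO_ERR_OVERRIDE_EN, ov))
		return ov;

	return defaults;
}
```

Under this model, an override of 0x80000000 leaves both the attention
clear and the recovery bits unset, while 0x80000001 keeps only the
attention clear action. An override without bit 31 set is ignored.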



* [PATCH v2 8/8] qedi: Add support for handling the pcie errors.
  2020-09-08  9:56 [PATCH v2 0/8] qedi: Misc bug fixes and enhancements Manish Rangankar
                   ` (6 preceding siblings ...)
  2020-09-08  9:56 ` [PATCH v2 7/8] qedi: Add firmware error recovery invocation support Manish Rangankar
@ 2020-09-08  9:56 ` Manish Rangankar
  2020-09-09  2:42 ` [PATCH v2 0/8] qedi: Misc bug fixes and enhancements Martin K. Petersen
  2020-09-15 20:16 ` Martin K. Petersen
  9 siblings, 0 replies; 11+ messages in thread
From: Manish Rangankar @ 2020-09-08  9:56 UTC (permalink / raw)
  To: martin.petersen, lduncan, cleech; +Cc: linux-scsi, GR-QLogic-Storage-Upstream

Error recovery is handled by the management firmware (MFW) with the
help of the qed/qedi drivers. Upon detecting an error, the driver
informs the MFW, which in turn starts a recovery process. The MFW then
sends an ERROR_RECOVERY notification to the driver, which performs the
required cleanup/recovery on the driver side.

Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
---
 drivers/scsi/qedi/qedi_main.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index 1d8c78c41405..10a7ee055552 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -2391,6 +2391,25 @@ static int qedi_setup_boot_info(struct qedi_ctx *qedi)
 	return -ENOMEM;
 }
 
+static pci_ers_result_t qedi_io_error_detected(struct pci_dev *pdev,
+					       pci_channel_state_t state)
+{
+	struct qedi_ctx *qedi = pci_get_drvdata(pdev);
+
+	QEDI_ERR(&qedi->dbg_ctx, "%s: PCI error detected [%d]\n",
+		 __func__, state);
+
+	if (test_and_set_bit(QEDI_IN_RECOVERY, &qedi->flags)) {
+		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_INFO,
+			  "Recovery already in progress.\n");
+		return PCI_ERS_RESULT_NONE;
+	}
+
+	qedi_ops->common->recovery_process(qedi->cdev);
+
+	return PCI_ERS_RESULT_CAN_RECOVER;
+}
+
 static void __qedi_remove(struct pci_dev *pdev, int mode)
 {
 	struct qedi_ctx *qedi = pci_get_drvdata(pdev);
@@ -2820,12 +2839,17 @@ MODULE_DEVICE_TABLE(pci, qedi_pci_tbl);
 
 static enum cpuhp_state qedi_cpuhp_state;
 
+static struct pci_error_handlers qedi_err_handler = {
+	.error_detected = qedi_io_error_detected,
+};
+
 static struct pci_driver qedi_pci_driver = {
 	.name = QEDI_MODULE_NAME,
 	.id_table = qedi_pci_tbl,
 	.probe = qedi_probe,
 	.remove = qedi_remove,
 	.shutdown = qedi_shutdown,
+	.err_handler = &qedi_err_handler,
 };
 
 static int __init qedi_init(void)
-- 
2.25.0



* Re: [PATCH v2 0/8] qedi: Misc bug fixes and enhancements
  2020-09-08  9:56 [PATCH v2 0/8] qedi: Misc bug fixes and enhancements Manish Rangankar
                   ` (7 preceding siblings ...)
  2020-09-08  9:56 ` [PATCH v2 8/8] qedi: Add support for handling the pcie errors Manish Rangankar
@ 2020-09-09  2:42 ` Martin K. Petersen
  2020-09-15 20:16 ` Martin K. Petersen
  9 siblings, 0 replies; 11+ messages in thread
From: Martin K. Petersen @ 2020-09-09  2:42 UTC (permalink / raw)
  To: Manish Rangankar
  Cc: martin.petersen, lduncan, cleech, linux-scsi, GR-QLogic-Storage-Upstream


Manish,

> Please apply the qedi miscellaneous bug fixes and enhancement patches
> to the scsi tree at your convenience.

Applied to the 5.10 SCSI staging tree. Thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [PATCH v2 0/8] qedi: Misc bug fixes and enhancements
  2020-09-08  9:56 [PATCH v2 0/8] qedi: Misc bug fixes and enhancements Manish Rangankar
                   ` (8 preceding siblings ...)
  2020-09-09  2:42 ` [PATCH v2 0/8] qedi: Misc bug fixes and enhancements Martin K. Petersen
@ 2020-09-15 20:16 ` Martin K. Petersen
  9 siblings, 0 replies; 11+ messages in thread
From: Martin K. Petersen @ 2020-09-15 20:16 UTC (permalink / raw)
  To: lduncan, Manish Rangankar, cleech
  Cc: Martin K . Petersen, GR-QLogic-Storage-Upstream, linux-scsi

On Tue, 8 Sep 2020 02:56:49 -0700, Manish Rangankar wrote:

> Please apply the qedi miscellaneous bug fixes and enhancement patches
> to the scsi tree at your convenience.
> 
> v1->v2:
> Fix warning reported by kernel test robot
> 
> Thanks,
> Manish
> 
> [...]

Applied to 5.10/scsi-queue, thanks!

[1/8] scsi: qedi: Use qed count from set_fp_int in msix allocation
      https://git.kernel.org/mkp/scsi/c/3f8ad0072bf7
[2/8] scsi: qedi: Skip firmware connection termination for PCI shutdown handler
      https://git.kernel.org/mkp/scsi/c/5c35e4646566
[3/8] scsi: qedi: Fix list_del corruption while removing active I/O
      https://git.kernel.org/mkp/scsi/c/28b35d17f9f8
[4/8] scsi: qedi: Protect active command list to avoid list corruption
      https://git.kernel.org/mkp/scsi/c/c0650e28448d
[5/8] scsi: qedi: Use snprintf instead of sprintf
      https://git.kernel.org/mkp/scsi/c/5a2e69af16ce
[6/8] scsi: qedi: Mark all connections for recovery on link down event
      https://git.kernel.org/mkp/scsi/c/4118879be375
[7/8] scsi: qedi: Add firmware error recovery invocation support
      https://git.kernel.org/mkp/scsi/c/f4ba4e55db6d
[8/8] scsi: qedi: Add support for handling PCIe errors
      https://git.kernel.org/mkp/scsi/c/96a766a789eb

-- 
Martin K. Petersen	Oracle Linux Engineering

