* [RFC PATCH 0/13]: qedi tmf fixes
@ 2021-04-10 18:40 Mike Christie
  2021-04-10 18:40 ` [PATCH 01/13] scsi: iscsi: revert Perform connection failure entirely in kernel space Mike Christie
                   ` (13 more replies)
  0 siblings, 14 replies; 26+ messages in thread
From: Mike Christie @ 2021-04-10 18:40 UTC (permalink / raw)
  To: lduncan, martin.petersen, mrangankar, svernekar, linux-scsi, jejb

The following patches, made over Linus's tree, fix a bunch of issues
in qedi's tmf handling.

Manish, I did some tests here. Please test it out and review.

The first patch may be just a temporary patch, but assume those issues
will eventually be fixed one way or another. I reverted it here to make
it easier to test qedi without worrying about whether we are hitting
issues in qedi or in that patch.




^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 01/13] scsi: iscsi: revert Perform connection failure entirely in kernel space
  2021-04-10 18:40 [RFC PATCH 0/13]: qedi tmf fixes Mike Christie
@ 2021-04-10 18:40 ` Mike Christie
  2021-04-10 18:40 ` [PATCH 02/13] scsi: qedi: fix null ref during abort handling Mike Christie
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Mike Christie @ 2021-04-10 18:40 UTC (permalink / raw)
  To: lduncan, martin.petersen, mrangankar, svernekar, linux-scsi, jejb
  Cc: Mike Christie

The patch 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely
in kernel space") has the following regressions or bugs:

1. It can return cmds to upper layers like dm-multipath, which can
retry them. After the retries succeed, the fs/app can send new IO to the
same sectors, but we have left the original cmds running in the FW or in
the net layer. We need to be calling ep_disconnect.

And for iscsi_tcp, which does not have an ep_disconnect, we need one to
close the socket so IO is not still retransmitting in the net layer
while the upper layers are retrying the cmd and sending new IO.

2. The drivers that implement ep_disconnect expect that it's called
before conn_stop. Besides crashes, if the cleanup_task callout is called
before ep_disconnect, it might free up driver/card resources for
session1 which could then be allocated for session2. But because the
driver's ep_disconnect was not called, it has not cleaned up the
firmware, so the card is still using those resources for the original
cmd.

3. The system shutdown case does not work for the eh path. Passing
STOP_CONN_TERM to stop_conn will never block the session and start the
recovery timer, because for that flag userspace performs the unbind
and destroy events, which remove the devices and wake up and kill
the eh. We should be using STOP_CONN_RECOVER.

4. The stop_conn_work_fn can run after userspace has done its
recovery and we are happily using the session. We will then end up
with various bugs depending on what is going on at the time.

When we add ep_disconnect we need to make sure these operations
execute only once and in order.

5. Returning -EAGAIN in iscsi_if_destroy_conn, when we haven't yet run
the in-kernel stop_conn function, breaks userspace. We should have
been doing this cleanup for the caller instead.
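As a side note, the "execute only once and in order" requirement from
issues 2 and 4 can be sketched as a tiny userspace state machine. This
is only a model of the intended sequencing, with made-up names
(toy_ep_disconnect etc.), not the transport class's actual API:

```c
#include <assert.h>
#include <stdbool.h>

/* ep_disconnect must run before stop_conn, and each must run once. */
enum toy_conn_state {
	TOY_BOUND,		/* conn bound, ep still up */
	TOY_EP_DISCONNECTED,	/* ep_disconnect has run */
	TOY_STOPPED,		/* stop_conn has run */
};

struct toy_conn {
	enum toy_conn_state state;
};

/* Returns false if called out of order or a second time. */
static bool toy_ep_disconnect(struct toy_conn *c)
{
	if (c->state != TOY_BOUND)
		return false;
	c->state = TOY_EP_DISCONNECTED;
	return true;
}

/* Only legal after ep_disconnect, and only once. */
static bool toy_stop_conn(struct toy_conn *c)
{
	if (c->state != TOY_EP_DISCONNECTED)
		return false;
	c->state = TOY_STOPPED;
	return true;
}
```

In the real code the equivalent ordering would have to be enforced under
the connection's locks, since the callers race with each other.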

This patch fixes the above issues by just reverting the patch,
because I'm worried about the possible data corruption versus how long
it will take to fix all the bugs. If reverting the patch is ok, then we
should look into doing this properly and maybe redoing the interface,
because there seem to be several issues with it.

Fixes: 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space")
Signed-off-by: Mike Christie <michael.christie@oracle.com>

---
 drivers/scsi/scsi_transport_iscsi.c | 55 -----------------------------
 include/scsi/scsi_transport_iscsi.h |  1 -
 2 files changed, 56 deletions(-)

diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index f4bf62b007a0..2821cd3ddbde 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -86,12 +86,6 @@ struct iscsi_internal {
 	struct transport_container session_cont;
 };
 
-/* Worker to perform connection failure on unresponsive connections
- * completely in kernel space.
- */
-static void stop_conn_work_fn(struct work_struct *work);
-static DECLARE_WORK(stop_conn_work, stop_conn_work_fn);
-
 static atomic_t iscsi_session_nr; /* sysfs session id for next new session */
 static struct workqueue_struct *iscsi_eh_timer_workq;
 
@@ -1629,7 +1623,6 @@ static DEFINE_MUTEX(conn_mutex);
 static LIST_HEAD(sesslist);
 static DEFINE_SPINLOCK(sesslock);
 static LIST_HEAD(connlist);
-static LIST_HEAD(connlist_err);
 static DEFINE_SPINLOCK(connlock);
 
 static uint32_t iscsi_conn_get_sid(struct iscsi_cls_conn *conn)
@@ -2281,7 +2274,6 @@ iscsi_create_conn(struct iscsi_cls_session *session, int dd_size, uint32_t cid)
 
 	mutex_init(&conn->ep_mutex);
 	INIT_LIST_HEAD(&conn->conn_list);
-	INIT_LIST_HEAD(&conn->conn_list_err);
 	conn->transport = transport;
 	conn->cid = cid;
 	conn->state = ISCSI_CONN_DOWN;
@@ -2338,7 +2330,6 @@ int iscsi_destroy_conn(struct iscsi_cls_conn *conn)
 
 	spin_lock_irqsave(&connlock, flags);
 	list_del(&conn->conn_list);
-	list_del(&conn->conn_list_err);
 	spin_unlock_irqrestore(&connlock, flags);
 
 	transport_unregister_device(&conn->dev);
@@ -2480,38 +2471,6 @@ static void iscsi_if_stop_conn(struct iscsi_cls_conn *conn, int flag)
 
 }
 
-static void stop_conn_work_fn(struct work_struct *work)
-{
-	struct iscsi_cls_conn *conn, *tmp;
-	unsigned long flags;
-	LIST_HEAD(recovery_list);
-
-	spin_lock_irqsave(&connlock, flags);
-	if (list_empty(&connlist_err)) {
-		spin_unlock_irqrestore(&connlock, flags);
-		return;
-	}
-	list_splice_init(&connlist_err, &recovery_list);
-	spin_unlock_irqrestore(&connlock, flags);
-
-	list_for_each_entry_safe(conn, tmp, &recovery_list, conn_list_err) {
-		uint32_t sid = iscsi_conn_get_sid(conn);
-		struct iscsi_cls_session *session;
-
-		session = iscsi_session_lookup(sid);
-		if (session) {
-			if (system_state != SYSTEM_RUNNING) {
-				session->recovery_tmo = 0;
-				iscsi_if_stop_conn(conn, STOP_CONN_TERM);
-			} else {
-				iscsi_if_stop_conn(conn, STOP_CONN_RECOVER);
-			}
-		}
-
-		list_del_init(&conn->conn_list_err);
-	}
-}
-
 void iscsi_conn_error_event(struct iscsi_cls_conn *conn, enum iscsi_err error)
 {
 	struct nlmsghdr	*nlh;
@@ -2519,12 +2478,6 @@ void iscsi_conn_error_event(struct iscsi_cls_conn *conn, enum iscsi_err error)
 	struct iscsi_uevent *ev;
 	struct iscsi_internal *priv;
 	int len = nlmsg_total_size(sizeof(*ev));
-	unsigned long flags;
-
-	spin_lock_irqsave(&connlock, flags);
-	list_add(&conn->conn_list_err, &connlist_err);
-	spin_unlock_irqrestore(&connlock, flags);
-	queue_work(system_unbound_wq, &stop_conn_work);
 
 	priv = iscsi_if_transport_lookup(conn->transport);
 	if (!priv)
@@ -2854,19 +2807,11 @@ static int
 iscsi_if_destroy_conn(struct iscsi_transport *transport, struct iscsi_uevent *ev)
 {
 	struct iscsi_cls_conn *conn;
-	unsigned long flags;
 
 	conn = iscsi_conn_lookup(ev->u.d_conn.sid, ev->u.d_conn.cid);
 	if (!conn)
 		return -EINVAL;
 
-	spin_lock_irqsave(&connlock, flags);
-	if (!list_empty(&conn->conn_list_err)) {
-		spin_unlock_irqrestore(&connlock, flags);
-		return -EAGAIN;
-	}
-	spin_unlock_irqrestore(&connlock, flags);
-
 	ISCSI_DBG_TRANS_CONN(conn, "Destroying transport conn\n");
 
 	mutex_lock(&conn_mutex);
diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h
index fc5a39839b4b..36e2c1dcb059 100644
--- a/include/scsi/scsi_transport_iscsi.h
+++ b/include/scsi/scsi_transport_iscsi.h
@@ -198,7 +198,6 @@ enum iscsi_connection_state {
 
 struct iscsi_cls_conn {
 	struct list_head conn_list;	/* item in connlist */
-	struct list_head conn_list_err;	/* item in connlist_err */
 	void *dd_data;			/* LLD private data */
 	struct iscsi_transport *transport;
 	uint32_t cid;			/* connection id */
-- 
2.25.1



* [PATCH 02/13] scsi: qedi: fix null ref during abort handling
  2021-04-10 18:40 [RFC PATCH 0/13]: qedi tmf fixes Mike Christie
  2021-04-10 18:40 ` [PATCH 01/13] scsi: iscsi: revert Perform connection failure entirely in kernel space Mike Christie
@ 2021-04-10 18:40 ` Mike Christie
  2021-04-12 10:25   ` [EXT] " Manish Rangankar
  2021-04-10 18:40 ` [PATCH 03/13] scsi: iscsi: add task completion helper Mike Christie
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Mike Christie @ 2021-04-10 18:40 UTC (permalink / raw)
  To: lduncan, martin.petersen, mrangankar, svernekar, linux-scsi, jejb
  Cc: Mike Christie

If qedi_process_cmd_cleanup_resp finds the cmd, it frees the work and
sets list_tmf_work to NULL, so qedi_tmf_work should check that
list_tmf_work is non-NULL when it wants to force cleanup.
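The ownership rule the one-character fix restores can be shown in a
standalone userspace model. The toy_* names below mirror the driver's
structures in spirit only; the point is that whichever side still sees
a non-NULL list_tmf_work owns the free:

```c
#include <assert.h>
#include <stdlib.h>

struct toy_work {
	int state;
};

struct toy_cmd {
	struct toy_work *list_tmf_work;
};

/* Completion path: if it finds the work, it frees it and NULLs the
 * pointer, transferring ownership away from the timeout path. */
static void toy_cleanup_resp(struct toy_cmd *cmd)
{
	if (cmd->list_tmf_work) {
		free(cmd->list_tmf_work);
		cmd->list_tmf_work = NULL;
	}
}

/* Timeout path: with the fix, free only while the pointer is still
 * non-NULL, i.e. the completion path did not get there first.
 * Returns 1 if this path did the cleanup, 0 if it was already done. */
static int toy_tmf_timeout(struct toy_cmd *cmd)
{
	if (cmd->list_tmf_work) {	/* the buggy code tested "!..." */
		free(cmd->list_tmf_work);
		cmd->list_tmf_work = NULL;
		return 1;
	}
	return 0;
}
```

With the inverted test, the timeout path would free only when the
pointer was already NULL: a NULL deref on list_del_init plus a leak in
the normal case.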

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/qedi/qedi_fw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index 440ddd2309f1..cf57b4e49700 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -1453,7 +1453,7 @@ static void qedi_tmf_work(struct work_struct *work)
 
 ldel_exit:
 	spin_lock_bh(&qedi_conn->tmf_work_lock);
-	if (!qedi_cmd->list_tmf_work) {
+	if (qedi_cmd->list_tmf_work) {
 		list_del_init(&list_work->list);
 		qedi_cmd->list_tmf_work = NULL;
 		kfree(list_work);
-- 
2.25.1



* [PATCH 03/13] scsi: iscsi: add task completion helper
  2021-04-10 18:40 [RFC PATCH 0/13]: qedi tmf fixes Mike Christie
  2021-04-10 18:40 ` [PATCH 01/13] scsi: iscsi: revert Perform connection failure entirely in kernel space Mike Christie
  2021-04-10 18:40 ` [PATCH 02/13] scsi: qedi: fix null ref during abort handling Mike Christie
@ 2021-04-10 18:40 ` Mike Christie
  2021-04-10 18:40 ` [PATCH 04/13] scsi: qedi: fix abort vs cmd re-use race Mike Christie
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Mike Christie @ 2021-04-10 18:40 UTC (permalink / raw)
  To: lduncan, martin.petersen, mrangankar, svernekar, linux-scsi, jejb
  Cc: Mike Christie

This adds a helper to detect whether a cmd has completed but has not
yet been freed.
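The helper's logic can be modeled in userspace like this. The enum
values here are illustrative stand-ins for libiscsi's task states, not
the real numeric values:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the libiscsi task states the helper checks. */
enum {
	ISCSI_TASK_FREE,
	ISCSI_TASK_COMPLETED,
	ISCSI_TASK_PENDING,
	ISCSI_TASK_RUNNING,
	ISCSI_TASK_ABRT_TMF,
	ISCSI_TASK_ABRT_SESS_RECOV,
};

struct toy_iscsi_task {
	int state;
};

/* "Completed" here means done but possibly not yet freed: either a
 * normal completion or a completion forced by a TMF abort or
 * session-level recovery. */
static bool toy_task_is_completed(const struct toy_iscsi_task *task)
{
	return task->state == ISCSI_TASK_COMPLETED ||
	       task->state == ISCSI_TASK_ABRT_TMF ||
	       task->state == ISCSI_TASK_ABRT_SESS_RECOV;
}
```

Later patches in the series use this check under the session back_lock
to decide whether a task found by itt lookup is still worth aborting.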

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 include/scsi/libiscsi.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
index 02f966e9358f..8c6d358a8abc 100644
--- a/include/scsi/libiscsi.h
+++ b/include/scsi/libiscsi.h
@@ -145,6 +145,13 @@ static inline void* iscsi_next_hdr(struct iscsi_task *task)
 	return (void*)task->hdr + task->hdr_len;
 }
 
+static inline bool iscsi_task_is_completed(struct iscsi_task *task)
+{
+	return task->state == ISCSI_TASK_COMPLETED ||
+	       task->state == ISCSI_TASK_ABRT_TMF ||
+	       task->state == ISCSI_TASK_ABRT_SESS_RECOV;
+}
+
 /* Connection's states */
 enum {
 	ISCSI_CONN_INITIAL_STAGE,
-- 
2.25.1



* [PATCH 04/13] scsi: qedi: fix abort vs cmd re-use race
  2021-04-10 18:40 [RFC PATCH 0/13]: qedi tmf fixes Mike Christie
                   ` (2 preceding siblings ...)
  2021-04-10 18:40 ` [PATCH 03/13] scsi: iscsi: add task completion helper Mike Christie
@ 2021-04-10 18:40 ` Mike Christie
  2021-04-12 10:24   ` [EXT] " Manish Rangankar
  2021-04-10 18:40 ` [PATCH 05/13] scsi: qedi: fix use after free during abort cleanup Mike Christie
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Mike Christie @ 2021-04-10 18:40 UTC (permalink / raw)
  To: lduncan, martin.petersen, mrangankar, svernekar, linux-scsi, jejb
  Cc: Mike Christie

If the scsi cmd completes after qedi_tmf_work calls iscsi_itt_to_task,
then the qedi_cmd->task_id could be freed and used for another cmd. If
we then call qedi_iscsi_cleanup_task with that task_id, we will be
cleaning up the wrong cmd.

This patch has us wait to release the task_id until the last put has
been done on the iscsi_task. Because libiscsi grabs a ref to the task
when sending the abort, we know that, for the non-timeout case, the
task_id we are referencing is for the cmd that was supposed to be
aborted.

The next patch will fix the case where the abort times out while we are
running qedi_tmf_work.
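The resulting task_id lifetime can be sketched as follows: the id is
cleared to a sentinel at xmit setup and released in exactly one place,
cleanup_task, so a cmd whose id was never assigned (or was already
released) is never double-freed. Names and the bitmap are illustrative,
not the driver's actual allocator:

```c
#include <assert.h>

#define TOY_NO_TID (-1)

/* Stand-in for the hardware tid allocator's in-use bookkeeping. */
static int toy_tid_in_use[4];

static void toy_clear_task_idx(int tid)
{
	toy_tid_in_use[tid] = 0;
}

struct toy_qedi_cmd {
	int task_id;
};

/* Mirrors qedi_task_xmit(): clear first, so cleanup_task can tell
 * that a cmd which failed early never got a tid. */
static void toy_task_xmit_prep(struct toy_qedi_cmd *cmd)
{
	cmd->task_id = TOY_NO_TID;
}

static void toy_assign_tid(struct toy_qedi_cmd *cmd, int tid)
{
	toy_tid_in_use[tid] = 1;
	cmd->task_id = tid;
}

/* Mirrors qedi_cleanup_task(): the single release point, run on the
 * last put; releasing is safe to call on an unassigned cmd. */
static void toy_cleanup_task(struct toy_qedi_cmd *cmd)
{
	if (cmd->task_id != TOY_NO_TID)
		toy_clear_task_idx(cmd->task_id);
	cmd->task_id = TOY_NO_TID;
}
```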

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/qedi/qedi_fw.c    | 13 -------------
 drivers/scsi/qedi/qedi_iscsi.c | 20 +++++++++++++++++---
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index cf57b4e49700..ad4357e4c15d 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -73,7 +73,6 @@ static void qedi_process_logout_resp(struct qedi_ctx *qedi,
 	spin_unlock(&qedi_conn->list_lock);
 
 	cmd->state = RESPONSE_RECEIVED;
-	qedi_clear_task_idx(qedi, cmd->task_id);
 	__iscsi_complete_pdu(conn, (struct iscsi_hdr *)resp_hdr, NULL, 0);
 
 	spin_unlock(&session->back_lock);
@@ -138,7 +137,6 @@ static void qedi_process_text_resp(struct qedi_ctx *qedi,
 	spin_unlock(&qedi_conn->list_lock);
 
 	cmd->state = RESPONSE_RECEIVED;
-	qedi_clear_task_idx(qedi, cmd->task_id);
 
 	__iscsi_complete_pdu(conn, (struct iscsi_hdr *)resp_hdr_ptr,
 			     qedi_conn->gen_pdu.resp_buf,
@@ -164,13 +162,11 @@ static void qedi_tmf_resp_work(struct work_struct *work)
 	iscsi_block_session(session->cls_session);
 	rval = qedi_cleanup_all_io(qedi, qedi_conn, qedi_cmd->task, true);
 	if (rval) {
-		qedi_clear_task_idx(qedi, qedi_cmd->task_id);
 		iscsi_unblock_session(session->cls_session);
 		goto exit_tmf_resp;
 	}
 
 	iscsi_unblock_session(session->cls_session);
-	qedi_clear_task_idx(qedi, qedi_cmd->task_id);
 
 	spin_lock(&session->back_lock);
 	__iscsi_complete_pdu(conn, (struct iscsi_hdr *)resp_hdr_ptr, NULL, 0);
@@ -245,8 +241,6 @@ static void qedi_process_tmf_resp(struct qedi_ctx *qedi,
 		goto unblock_sess;
 	}
 
-	qedi_clear_task_idx(qedi, qedi_cmd->task_id);
-
 	__iscsi_complete_pdu(conn, (struct iscsi_hdr *)resp_hdr_ptr, NULL, 0);
 	kfree(resp_hdr_ptr);
 
@@ -314,7 +308,6 @@ static void qedi_process_login_resp(struct qedi_ctx *qedi,
 		  "Freeing tid=0x%x for cid=0x%x\n",
 		  cmd->task_id, qedi_conn->iscsi_conn_id);
 	cmd->state = RESPONSE_RECEIVED;
-	qedi_clear_task_idx(qedi, cmd->task_id);
 }
 
 static void qedi_get_rq_bdq_buf(struct qedi_ctx *qedi,
@@ -468,7 +461,6 @@ static int qedi_process_nopin_mesg(struct qedi_ctx *qedi,
 		}
 
 		spin_unlock(&qedi_conn->list_lock);
-		qedi_clear_task_idx(qedi, cmd->task_id);
 	}
 
 done:
@@ -673,7 +665,6 @@ static void qedi_scsi_completion(struct qedi_ctx *qedi,
 	if (qedi_io_tracing)
 		qedi_trace_io(qedi, task, cmd->task_id, QEDI_IO_TRACE_RSP);
 
-	qedi_clear_task_idx(qedi, cmd->task_id);
 	__iscsi_complete_pdu(conn, (struct iscsi_hdr *)hdr,
 			     conn->data, datalen);
 error:
@@ -730,7 +721,6 @@ static void qedi_process_nopin_local_cmpl(struct qedi_ctx *qedi,
 		  cqe->itid, cmd->task_id);
 
 	cmd->state = RESPONSE_RECEIVED;
-	qedi_clear_task_idx(qedi, cmd->task_id);
 
 	spin_lock_bh(&session->back_lock);
 	__iscsi_put_task(task);
@@ -821,8 +811,6 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
 			if (qedi_cmd->state == CLEANUP_WAIT_FAILED)
 				qedi_cmd->state = CLEANUP_RECV;
 
-			qedi_clear_task_idx(qedi_conn->qedi, rtid);
-
 			spin_lock(&qedi_conn->list_lock);
 			if (likely(dbg_cmd->io_cmd_in_list)) {
 				dbg_cmd->io_cmd_in_list = false;
@@ -856,7 +844,6 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
 		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_TID,
 			  "Freeing tid=0x%x for cid=0x%x\n",
 			  cqe->itid, qedi_conn->iscsi_conn_id);
-		qedi_clear_task_idx(qedi_conn->qedi, cqe->itid);
 
 	} else {
 		qedi_get_proto_itt(qedi, cqe->itid, &ptmp_itt);
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index 08c05403cd72..d1da34a938da 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -772,7 +772,6 @@ static int qedi_mtask_xmit(struct iscsi_conn *conn, struct iscsi_task *task)
 	}
 
 	cmd->conn = conn->dd_data;
-	cmd->scsi_cmd = NULL;
 	return qedi_iscsi_send_generic_request(task);
 }
 
@@ -783,6 +782,10 @@ static int qedi_task_xmit(struct iscsi_task *task)
 	struct qedi_cmd *cmd = task->dd_data;
 	struct scsi_cmnd *sc = task->sc;
 
+	/* Clear now so in cleanup_task we know it didn't make it */
+	cmd->scsi_cmd = NULL;
+	cmd->task_id = -1;
+
 	if (test_bit(QEDI_IN_SHUTDOWN, &qedi_conn->qedi->flags))
 		return -ENODEV;
 
@@ -1383,13 +1386,24 @@ static umode_t qedi_attr_is_visible(int param_type, int param)
 
 static void qedi_cleanup_task(struct iscsi_task *task)
 {
-	if (!task->sc || task->state == ISCSI_TASK_PENDING) {
+	struct qedi_cmd *cmd;
+
+	if (task->state == ISCSI_TASK_PENDING) {
 		QEDI_INFO(NULL, QEDI_LOG_IO, "Returning ref_cnt=%d\n",
 			  refcount_read(&task->refcount));
 		return;
 	}
 
-	qedi_iscsi_unmap_sg_list(task->dd_data);
+	if (task->sc)
+		qedi_iscsi_unmap_sg_list(task->dd_data);
+
+	cmd = task->dd_data;
+	if (cmd->task_id != -1)
+		qedi_clear_task_idx(iscsi_host_priv(task->conn->session->host),
+				    cmd->task_id);
+
+	cmd->task_id = -1;
+	cmd->scsi_cmd = NULL;
 }
 
 struct iscsi_transport qedi_iscsi_transport = {
-- 
2.25.1



* [PATCH 05/13] scsi: qedi: fix use after free during abort cleanup
  2021-04-10 18:40 [RFC PATCH 0/13]: qedi tmf fixes Mike Christie
                   ` (3 preceding siblings ...)
  2021-04-10 18:40 ` [PATCH 04/13] scsi: qedi: fix abort vs cmd re-use race Mike Christie
@ 2021-04-10 18:40 ` Mike Christie
  2021-04-12 10:26   ` [EXT] " Manish Rangankar
  2021-04-10 18:40 ` [PATCH 06/13] scsi: qedi: fix tmf tid allocation Mike Christie
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Mike Christie @ 2021-04-10 18:40 UTC (permalink / raw)
  To: lduncan, martin.petersen, mrangankar, svernekar, linux-scsi, jejb
  Cc: Mike Christie

This fixes two bugs:

1. The scsi cmd task could be completed, and the abort could time out,
while we are running qedi_tmf_work, so we need to get a ref to the task.

2. If qedi_tmf_work's qedi_wait_for_cleanup_request call times out, we
will also force the cleanup of the qedi_work_map, but
qedi_process_cmd_cleanup_resp could still be accessing the qedi_cmd
for the abort tmf. We can then race: qedi_process_cmd_cleanup_resp is
still accessing the mtask's qedi_cmd while libiscsi has escalated to
session-level cleanup and is cleaning up the mtask under us.
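The refcounting half of the fix follows the usual pattern: take an
extra reference under the lock before dropping it, so a concurrent put
cannot free the object while we still use it. A minimal userspace
model (toy_* names are illustrative, not the libiscsi API):

```c
#include <assert.h>
#include <stdbool.h>

struct toy_task {
	int refcount;
	bool freed;	/* stands in for the real task free */
};

/* Mirrors __iscsi_get_task(): must be called while the task is known
 * to be alive, i.e. under the session back_lock in the real code. */
static void toy_get_task(struct toy_task *t)
{
	t->refcount++;
}

/* Mirrors iscsi_put_task(): last put frees the task. */
static void toy_put_task(struct toy_task *t)
{
	if (--t->refcount == 0)
		t->freed = true;
}
```

Without the extra get in qedi_tmf_work, libiscsi timing out the abort
and dropping its reference would free the task while the work is still
dereferencing it.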

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/qedi/qedi_fw.c    | 110 ++++++++++++++++++---------------
 drivers/scsi/qedi/qedi_iscsi.h |   1 +
 2 files changed, 61 insertions(+), 50 deletions(-)

diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index ad4357e4c15d..c5699421ec37 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -729,7 +729,6 @@ static void qedi_process_nopin_local_cmpl(struct qedi_ctx *qedi,
 
 static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
 					  struct iscsi_cqe_solicited *cqe,
-					  struct iscsi_task *task,
 					  struct iscsi_conn *conn)
 {
 	struct qedi_work_map *work, *work_tmp;
@@ -742,7 +741,7 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
 	u32 iscsi_cid;
 	struct qedi_conn *qedi_conn;
 	struct qedi_cmd *dbg_cmd;
-	struct iscsi_task *mtask;
+	struct iscsi_task *mtask, *task;
 	struct iscsi_tm *tmf_hdr = NULL;
 
 	iscsi_cid = cqe->conn_id;
@@ -768,6 +767,7 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
 			}
 			found = 1;
 			mtask = qedi_cmd->task;
+			task = work->ctask;
 			tmf_hdr = (struct iscsi_tm *)mtask->hdr;
 			rtid = work->rtid;
 
@@ -776,52 +776,47 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
 			qedi_cmd->list_tmf_work = NULL;
 		}
 	}
-	spin_unlock_bh(&qedi_conn->tmf_work_lock);
-
-	if (found) {
-		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
-			  "TMF work, cqe->tid=0x%x, tmf flags=0x%x, cid=0x%x\n",
-			  proto_itt, tmf_hdr->flags, qedi_conn->iscsi_conn_id);
-
-		if ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
-		    ISCSI_TM_FUNC_ABORT_TASK) {
-			spin_lock_bh(&conn->session->back_lock);
 
-			protoitt = build_itt(get_itt(tmf_hdr->rtt),
-					     conn->session->age);
-			task = iscsi_itt_to_task(conn, protoitt);
-
-			spin_unlock_bh(&conn->session->back_lock);
+	if (!found) {
+		spin_unlock_bh(&qedi_conn->tmf_work_lock);
+		goto check_cleanup_reqs;
+	}
 
-			if (!task) {
-				QEDI_NOTICE(&qedi->dbg_ctx,
-					    "IO task completed, tmf rtt=0x%x, cid=0x%x\n",
-					    get_itt(tmf_hdr->rtt),
-					    qedi_conn->iscsi_conn_id);
-				return;
-			}
+	QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
+		  "TMF work, cqe->tid=0x%x, tmf flags=0x%x, cid=0x%x\n",
+		  proto_itt, tmf_hdr->flags, qedi_conn->iscsi_conn_id);
+
+	spin_lock_bh(&conn->session->back_lock);
+	if (iscsi_task_is_completed(task)) {
+		QEDI_NOTICE(&qedi->dbg_ctx,
+			    "IO task completed, tmf rtt=0x%x, cid=0x%x\n",
+			   get_itt(tmf_hdr->rtt), qedi_conn->iscsi_conn_id);
+		goto unlock;
+	}
 
-			dbg_cmd = task->dd_data;
+	dbg_cmd = task->dd_data;
 
-			QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
-				  "Abort tmf rtt=0x%x, i/o itt=0x%x, i/o tid=0x%x, cid=0x%x\n",
-				  get_itt(tmf_hdr->rtt), get_itt(task->itt),
-				  dbg_cmd->task_id, qedi_conn->iscsi_conn_id);
+	QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
+		  "Abort tmf rtt=0x%x, i/o itt=0x%x, i/o tid=0x%x, cid=0x%x\n",
+		  get_itt(tmf_hdr->rtt), get_itt(task->itt), dbg_cmd->task_id,
+		  qedi_conn->iscsi_conn_id);
 
-			if (qedi_cmd->state == CLEANUP_WAIT_FAILED)
-				qedi_cmd->state = CLEANUP_RECV;
+	spin_lock(&qedi_conn->list_lock);
+	if (likely(dbg_cmd->io_cmd_in_list)) {
+		dbg_cmd->io_cmd_in_list = false;
+		list_del_init(&dbg_cmd->io_cmd);
+		qedi_conn->active_cmd_count--;
+	}
+	spin_unlock(&qedi_conn->list_lock);
+	qedi_cmd->state = CLEANUP_RECV;
+unlock:
+	spin_unlock_bh(&conn->session->back_lock);
+	spin_unlock_bh(&qedi_conn->tmf_work_lock);
+	wake_up_interruptible(&qedi_conn->wait_queue);
+	return;
 
-			spin_lock(&qedi_conn->list_lock);
-			if (likely(dbg_cmd->io_cmd_in_list)) {
-				dbg_cmd->io_cmd_in_list = false;
-				list_del_init(&dbg_cmd->io_cmd);
-				qedi_conn->active_cmd_count--;
-			}
-			spin_unlock(&qedi_conn->list_lock);
-			qedi_cmd->state = CLEANUP_RECV;
-			wake_up_interruptible(&qedi_conn->wait_queue);
-		}
-	} else if (qedi_conn->cmd_cleanup_req > 0) {
+check_cleanup_reqs:
+	if (qedi_conn->cmd_cleanup_req > 0) {
 		spin_lock_bh(&conn->session->back_lock);
 		qedi_get_proto_itt(qedi, cqe->itid, &ptmp_itt);
 		protoitt = build_itt(ptmp_itt, conn->session->age);
@@ -844,6 +839,7 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
 		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_TID,
 			  "Freeing tid=0x%x for cid=0x%x\n",
 			  cqe->itid, qedi_conn->iscsi_conn_id);
+		qedi_clear_task_idx(qedi_conn->qedi, cqe->itid);
 
 	} else {
 		qedi_get_proto_itt(qedi, cqe->itid, &ptmp_itt);
@@ -946,8 +942,7 @@ void qedi_fp_process_cqes(struct qedi_work *work)
 		goto exit_fp_process;
 	case ISCSI_CQE_TYPE_TASK_CLEANUP:
 		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM, "CleanUp CqE\n");
-		qedi_process_cmd_cleanup_resp(qedi, &cqe->cqe_solicited, task,
-					      conn);
+		qedi_process_cmd_cleanup_resp(qedi, &cqe->cqe_solicited, conn);
 		goto exit_fp_process;
 	default:
 		QEDI_ERR(&qedi->dbg_ctx, "Error cqe.\n");
@@ -1374,12 +1369,22 @@ static void qedi_tmf_work(struct work_struct *work)
 	tmf_hdr = (struct iscsi_tm *)mtask->hdr;
 	set_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
 
-	ctask = iscsi_itt_to_task(conn, tmf_hdr->rtt);
-	if (!ctask || !ctask->sc) {
+	spin_lock_bh(&conn->session->back_lock);
+	ctask = iscsi_itt_to_ctask(conn, tmf_hdr->rtt);
+	if (!ctask || iscsi_task_is_completed(ctask)) {
+		spin_unlock_bh(&conn->session->back_lock);
 		QEDI_ERR(&qedi->dbg_ctx, "Task already completed\n");
-		goto abort_ret;
+		goto clear_cleanup;
 	}
 
+	/*
+	 * libiscsi gets a ref before sending the abort, but if libiscsi times
+	 * it out then it could release it and the cmd could complete from
+	 * under us.
+	 */
+	__iscsi_get_task(ctask);
+	spin_unlock_bh(&conn->session->back_lock);
+
 	cmd = (struct qedi_cmd *)ctask->dd_data;
 	QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_INFO,
 		  "Abort tmf rtt=0x%x, cmd itt=0x%x, cmd tid=0x%x, cid=0x%x\n",
@@ -1389,19 +1394,20 @@ static void qedi_tmf_work(struct work_struct *work)
 	if (qedi_do_not_recover) {
 		QEDI_ERR(&qedi->dbg_ctx, "DONT SEND CLEANUP/ABORT %d\n",
 			 qedi_do_not_recover);
-		goto abort_ret;
+		goto put_task;
 	}
 
 	list_work = kzalloc(sizeof(*list_work), GFP_ATOMIC);
 	if (!list_work) {
 		QEDI_ERR(&qedi->dbg_ctx, "Memory allocation failed\n");
-		goto abort_ret;
+		goto put_task;
 	}
 
 	qedi_cmd->type = TYPEIO;
 	list_work->qedi_cmd = qedi_cmd;
 	list_work->rtid = cmd->task_id;
 	list_work->state = QEDI_WORK_SCHEDULED;
+	list_work->ctask = ctask;
 	qedi_cmd->list_tmf_work = list_work;
 
 	QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
@@ -1434,7 +1440,9 @@ static void qedi_tmf_work(struct work_struct *work)
 	qedi_cmd->task_id = tid;
 	qedi_send_iscsi_tmf(qedi_conn, qedi_cmd->task);
 
-abort_ret:
+put_task:
+	iscsi_put_task(ctask);
+clear_cleanup:
 	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
 	return;
 
@@ -1455,6 +1463,8 @@ static void qedi_tmf_work(struct work_struct *work)
 	}
 	spin_unlock(&qedi_conn->list_lock);
 
+	iscsi_put_task(ctask);
+
 	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
 }
 
diff --git a/drivers/scsi/qedi/qedi_iscsi.h b/drivers/scsi/qedi/qedi_iscsi.h
index 39dc27c85e3c..68ef519f5480 100644
--- a/drivers/scsi/qedi/qedi_iscsi.h
+++ b/drivers/scsi/qedi/qedi_iscsi.h
@@ -212,6 +212,7 @@ struct qedi_cmd {
 struct qedi_work_map {
 	struct list_head list;
 	struct qedi_cmd *qedi_cmd;
+	struct iscsi_task *ctask;
 	int rtid;
 
 	int state;
-- 
2.25.1



* [PATCH 06/13] scsi: qedi: fix tmf tid allocation
  2021-04-10 18:40 [RFC PATCH 0/13]: qedi tmf fixes Mike Christie
                   ` (4 preceding siblings ...)
  2021-04-10 18:40 ` [PATCH 05/13] scsi: qedi: fix use after free during abort cleanup Mike Christie
@ 2021-04-10 18:40 ` Mike Christie
  2021-04-12 10:27   ` [EXT] " Manish Rangankar
  2021-04-10 18:40 ` [PATCH 07/13] scsi: qedi: use GFP_NOIO for tmf allocation Mike Christie
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Mike Christie @ 2021-04-10 18:40 UTC (permalink / raw)
  To: lduncan, martin.petersen, mrangankar, svernekar, linux-scsi, jejb
  Cc: Mike Christie

qedi_iscsi_abort_work and qedi_tmf_work allocate a tid and then call
qedi_send_iscsi_tmf, which also allocates a tid. This removes the tid
allocation from the callers.
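The effect of the change can be shown with a standalone model in which
only the send path takes a tid, so a reset no longer burns two ids per
tmf. The counter and names are illustrative, not the driver's
allocator:

```c
#include <assert.h>

/* Stand-in for the hardware tid allocator: counts ids handed out. */
static int toy_tids_allocated;

static int toy_get_task_idx(void)
{
	return toy_tids_allocated++;
}

struct toy_tmf_cmd {
	int task_id;
};

/* Mirrors send_iscsi_tmf() after the patch: the single place a tid
 * is taken for a tmf. */
static void toy_send_tmf(struct toy_tmf_cmd *cmd)
{
	cmd->task_id = toy_get_task_idx();
}

/* Mirrors the fixed lun/target-reset caller: it just delegates and no
 * longer allocates a tid of its own first. */
static void toy_send_reset(struct toy_tmf_cmd *cmd)
{
	toy_send_tmf(cmd);
}
```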

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/qedi/qedi_fw.c    | 76 ++++++++++------------------------
 drivers/scsi/qedi/qedi_gbl.h   |  3 +-
 drivers/scsi/qedi/qedi_iscsi.c |  2 +-
 3 files changed, 25 insertions(+), 56 deletions(-)

diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index c5699421ec37..542255c94d96 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -14,8 +14,8 @@
 #include "qedi_fw_iscsi.h"
 #include "qedi_fw_scsi.h"
 
-static int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn,
-			       struct iscsi_task *mtask);
+static int send_iscsi_tmf(struct qedi_conn *qedi_conn,
+			  struct iscsi_task *mtask);
 
 void qedi_iscsi_unmap_sg_list(struct qedi_cmd *cmd)
 {
@@ -1350,7 +1350,7 @@ static int qedi_wait_for_cleanup_request(struct qedi_ctx *qedi,
 	return 0;
 }
 
-static void qedi_tmf_work(struct work_struct *work)
+static void qedi_abort_work(struct work_struct *work)
 {
 	struct qedi_cmd *qedi_cmd =
 		container_of(work, struct qedi_cmd, tmf_work);
@@ -1363,7 +1363,6 @@ static void qedi_tmf_work(struct work_struct *work)
 	struct iscsi_task *ctask;
 	struct iscsi_tm *tmf_hdr;
 	s16 rval = 0;
-	s16 tid = 0;
 
 	mtask = qedi_cmd->task;
 	tmf_hdr = (struct iscsi_tm *)mtask->hdr;
@@ -1404,6 +1403,7 @@ static void qedi_tmf_work(struct work_struct *work)
 	}
 
 	qedi_cmd->type = TYPEIO;
+	qedi_cmd->state = CLEANUP_WAIT;
 	list_work->qedi_cmd = qedi_cmd;
 	list_work->rtid = cmd->task_id;
 	list_work->state = QEDI_WORK_SCHEDULED;
@@ -1430,15 +1430,7 @@ static void qedi_tmf_work(struct work_struct *work)
 		goto ldel_exit;
 	}
 
-	tid = qedi_get_task_idx(qedi);
-	if (tid == -1) {
-		QEDI_ERR(&qedi->dbg_ctx, "Invalid tid, cid=0x%x\n",
-			 qedi_conn->iscsi_conn_id);
-		goto ldel_exit;
-	}
-
-	qedi_cmd->task_id = tid;
-	qedi_send_iscsi_tmf(qedi_conn, qedi_cmd->task);
+	send_iscsi_tmf(qedi_conn, qedi_cmd->task);
 
 put_task:
 	iscsi_put_task(ctask);
@@ -1468,8 +1460,7 @@ static void qedi_tmf_work(struct work_struct *work)
 	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
 }
 
-static int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn,
-			       struct iscsi_task *mtask)
+static int send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask)
 {
 	struct iscsi_tmf_request_hdr tmf_pdu_header;
 	struct iscsi_task_params task_params;
@@ -1484,7 +1475,6 @@ static int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn,
 	u32 scsi_lun[2];
 	s16 tid = 0;
 	u16 sq_idx = 0;
-	int rval = 0;
 
 	tmf_hdr = (struct iscsi_tm *)mtask->hdr;
 	qedi_cmd = (struct qedi_cmd *)mtask->dd_data;
@@ -1548,10 +1538,7 @@ static int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn,
 	task_params.sqe = &ep->sq[sq_idx];
 
 	memset(task_params.sqe, 0, sizeof(struct iscsi_wqe));
-	rval = init_initiator_tmf_request_task(&task_params,
-					       &tmf_pdu_header);
-	if (rval)
-		return -1;
+	init_initiator_tmf_request_task(&task_params, &tmf_pdu_header);
 
 	spin_lock(&qedi_conn->list_lock);
 	list_add_tail(&qedi_cmd->io_cmd, &qedi_conn->active_cmd_list);
@@ -1563,47 +1550,30 @@ static int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn,
 	return 0;
 }
 
-int qedi_iscsi_abort_work(struct qedi_conn *qedi_conn,
-			  struct iscsi_task *mtask)
+int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask)
 {
+	struct iscsi_tm *tmf_hdr = (struct iscsi_tm *)mtask->hdr;
+	struct qedi_cmd *qedi_cmd = mtask->dd_data;
 	struct qedi_ctx *qedi = qedi_conn->qedi;
-	struct iscsi_tm *tmf_hdr;
-	struct qedi_cmd *qedi_cmd = (struct qedi_cmd *)mtask->dd_data;
-	s16 tid = 0;
+	int rc = 0;
 
-	tmf_hdr = (struct iscsi_tm *)mtask->hdr;
-	qedi_cmd->task = mtask;
-
-	/* If abort task then schedule the work and return */
-	if ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
-	    ISCSI_TM_FUNC_ABORT_TASK) {
-		qedi_cmd->state = CLEANUP_WAIT;
-		INIT_WORK(&qedi_cmd->tmf_work, qedi_tmf_work);
+	switch (tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) {
+	case ISCSI_TM_FUNC_ABORT_TASK:
+		INIT_WORK(&qedi_cmd->tmf_work, qedi_abort_work);
 		queue_work(qedi->tmf_thread, &qedi_cmd->tmf_work);
-
-	} else if (((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
-		    ISCSI_TM_FUNC_LOGICAL_UNIT_RESET) ||
-		   ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
-		    ISCSI_TM_FUNC_TARGET_WARM_RESET) ||
-		   ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
-		    ISCSI_TM_FUNC_TARGET_COLD_RESET)) {
-		tid = qedi_get_task_idx(qedi);
-		if (tid == -1) {
-			QEDI_ERR(&qedi->dbg_ctx, "Invalid tid, cid=0x%x\n",
-				 qedi_conn->iscsi_conn_id);
-			return -1;
-		}
-		qedi_cmd->task_id = tid;
-
-		qedi_send_iscsi_tmf(qedi_conn, qedi_cmd->task);
-
-	} else {
+		break;
+	case ISCSI_TM_FUNC_LOGICAL_UNIT_RESET:
+	case ISCSI_TM_FUNC_TARGET_WARM_RESET:
+	case ISCSI_TM_FUNC_TARGET_COLD_RESET:
+		rc = send_iscsi_tmf(qedi_conn, mtask);
+		break;
+	default:
 		QEDI_ERR(&qedi->dbg_ctx, "Invalid tmf, cid=0x%x\n",
 			 qedi_conn->iscsi_conn_id);
-		return -1;
+		return -EINVAL;
 	}
 
-	return 0;
+	return rc;
 }
 
 int qedi_send_iscsi_text(struct qedi_conn *qedi_conn,
diff --git a/drivers/scsi/qedi/qedi_gbl.h b/drivers/scsi/qedi/qedi_gbl.h
index 116645c08c71..fb44a282613e 100644
--- a/drivers/scsi/qedi/qedi_gbl.h
+++ b/drivers/scsi/qedi/qedi_gbl.h
@@ -31,8 +31,7 @@ int qedi_send_iscsi_login(struct qedi_conn *qedi_conn,
 			  struct iscsi_task *task);
 int qedi_send_iscsi_logout(struct qedi_conn *qedi_conn,
 			   struct iscsi_task *task);
-int qedi_iscsi_abort_work(struct qedi_conn *qedi_conn,
-			  struct iscsi_task *mtask);
+int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask);
 int qedi_send_iscsi_text(struct qedi_conn *qedi_conn,
 			 struct iscsi_task *task);
 int qedi_send_iscsi_nopout(struct qedi_conn *qedi_conn,
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index d1da34a938da..821225f9beb0 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -742,7 +742,7 @@ static int qedi_iscsi_send_generic_request(struct iscsi_task *task)
 		rc = qedi_send_iscsi_logout(qedi_conn, task);
 		break;
 	case ISCSI_OP_SCSI_TMFUNC:
-		rc = qedi_iscsi_abort_work(qedi_conn, task);
+		rc = qedi_send_iscsi_tmf(qedi_conn, task);
 		break;
 	case ISCSI_OP_TEXT:
 		rc = qedi_send_iscsi_text(qedi_conn, task);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 07/13] scsi: qedi: use GFP_NOIO for tmf allocation
  2021-04-10 18:40 [RFC PATCH 0/13]: qedi tmf fixes Mike Christie
                   ` (5 preceding siblings ...)
  2021-04-10 18:40 ` [PATCH 06/13] scsi: qedi: fix tmf tid allocation Mike Christie
@ 2021-04-10 18:40 ` Mike Christie
  2021-04-12 10:28   ` [EXT] " Manish Rangankar
  2021-04-10 18:40 ` [PATCH 08/13] scsi: iscsi: fix lun/target reset cmd cleanup handling Mike Christie
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Mike Christie @ 2021-04-10 18:40 UTC (permalink / raw)
  To: lduncan, martin.petersen, mrangankar, svernekar, linux-scsi, jejb
  Cc: Mike Christie

We run from a workqueue with no locks held, so use GFP_NOIO instead of
GFP_ATOMIC.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/qedi/qedi_fw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index 542255c94d96..84402e827d42 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -1396,7 +1396,7 @@ static void qedi_abort_work(struct work_struct *work)
 		goto put_task;
 	}
 
-	list_work = kzalloc(sizeof(*list_work), GFP_ATOMIC);
+	list_work = kzalloc(sizeof(*list_work), GFP_NOIO);
 	if (!list_work) {
 		QEDI_ERR(&qedi->dbg_ctx, "Memory allocation failed\n");
 		goto put_task;
-- 
2.25.1



* [PATCH 08/13] scsi: iscsi: fix lun/target reset cmd cleanup handling
  2021-04-10 18:40 [RFC PATCH 0/13]: qedi tmf fixes Mike Christie
                   ` (6 preceding siblings ...)
  2021-04-10 18:40 ` [PATCH 07/13] scsi: qedi: use GFP_NOIO for tmf allocation Mike Christie
@ 2021-04-10 18:40 ` Mike Christie
  2021-04-10 18:40 ` [PATCH 09/13] scsi: qedi: fix session block/unblock abuse during tmf handling Mike Christie
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Mike Christie @ 2021-04-10 18:40 UTC (permalink / raw)
  To: lduncan, martin.petersen, mrangankar, svernekar, linux-scsi, jejb
  Cc: Mike Christie

If we are handling an sg io based reset, new cmds can be queued to the
driver while we are executing the reset. If the TMF completes ok, then
when we clean up the running tasks the driver and libiscsi might disagree
on which commands were running, and one layer might clean up cmds the
other didn't. This has us block our queuecommand while we process the
reset, so both layers agree on which cmds are running.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/libiscsi.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 04633e5157e9..c68afcca51a0 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1922,12 +1922,11 @@ static void fail_scsi_tasks(struct iscsi_conn *conn, u64 lun, int error)
  * iscsi_suspend_queue - suspend iscsi_queuecommand
  * @conn: iscsi conn to stop queueing IO on
  *
- * This grabs the session frwd_lock to make sure no one is in
- * xmit_task/queuecommand, and then sets suspend to prevent
- * new commands from being queued. This only needs to be called
- * by offload drivers that need to sync a path like ep disconnect
- * with the iscsi_queuecommand/xmit_task. To start IO again libiscsi
- * will call iscsi_start_tx and iscsi_unblock_session when in FFP.
+ * This grabs the session frwd_lock to make sure no one is in queuecommand, and
+ * then sets suspend to prevent new commands from being queued. This only needs
+ * to be called by libiscsi or offload drivers that need to sync a path like
+ * ep disconnect with the iscsi_queuecommand. To start IO again libiscsi will
+ * call iscsi_start_tx and/or iscsi_unblock_session when in FFP.
  */
 void iscsi_suspend_queue(struct iscsi_conn *conn)
 {
@@ -2365,6 +2364,13 @@ int iscsi_eh_device_reset(struct scsi_cmnd *sc)
 	if (conn->tmf_state != TMF_INITIAL)
 		goto unlock;
 	conn->tmf_state = TMF_QUEUED;
+	/*
+	 * For sg based ioctls stop IO so we can sync up with the drivers
+	 * on what commands need to be cleaned up. If this function fails
+	 * we can leave suspended since the session will be dropped or
+	 * has already dropped.
+	 */
+	set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
 
 	hdr = &conn->tmhdr;
 	iscsi_prep_lun_reset_pdu(sc, hdr);
@@ -2384,6 +2390,7 @@ int iscsi_eh_device_reset(struct scsi_cmnd *sc)
 		goto done;
 	default:
 		conn->tmf_state = TMF_INITIAL;
+		iscsi_start_tx(conn);
 		goto unlock;
 	}
 
@@ -2528,6 +2535,13 @@ static int iscsi_eh_target_reset(struct scsi_cmnd *sc)
 	if (conn->tmf_state != TMF_INITIAL)
 		goto unlock;
 	conn->tmf_state = TMF_QUEUED;
+	/*
+	 * For sg based ioctls stop IO so we can sync up with the drivers
+	 * on what commands need to be cleaned up. If this function fails
+	 * we can leave suspended since the session will be dropped or
+	 * has already dropped.
+	 */
+	set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
 
 	hdr = &conn->tmhdr;
 	iscsi_prep_tgt_reset_pdu(sc, hdr);
@@ -2547,6 +2561,7 @@ static int iscsi_eh_target_reset(struct scsi_cmnd *sc)
 		goto done;
 	default:
 		conn->tmf_state = TMF_INITIAL;
+		iscsi_start_tx(conn);
 		goto unlock;
 	}
 
-- 
2.25.1



* [PATCH 09/13] scsi: qedi: fix session block/unblock abuse during tmf handling
  2021-04-10 18:40 [RFC PATCH 0/13]: qedi tmf fixes Mike Christie
                   ` (7 preceding siblings ...)
  2021-04-10 18:40 ` [PATCH 08/13] scsi: iscsi: fix lun/target reset cmd cleanup handling Mike Christie
@ 2021-04-10 18:40 ` Mike Christie
  2021-04-12 10:31   ` [EXT] " Manish Rangankar
  2021-04-10 18:40 ` [PATCH 10/13] scsi: qedi: fix session block/unblock abuse during cleanup Mike Christie
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Mike Christie @ 2021-04-10 18:40 UTC (permalink / raw)
  To: lduncan, martin.petersen, mrangankar, svernekar, linux-scsi, jejb
  Cc: Mike Christie

Drivers shouldn't be calling block/unblock session for tmf handling
because the functions can change the session state out from under
libiscsi. We now block the session for the drivers during tmf processing,
so we can remove these calls.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/qedi/qedi_fw.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index 84402e827d42..f13f8af6d931 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -159,14 +159,9 @@ static void qedi_tmf_resp_work(struct work_struct *work)
 	set_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
 	resp_hdr_ptr =  (struct iscsi_tm_rsp *)qedi_cmd->tmf_resp_buf;
 
-	iscsi_block_session(session->cls_session);
 	rval = qedi_cleanup_all_io(qedi, qedi_conn, qedi_cmd->task, true);
-	if (rval) {
-		iscsi_unblock_session(session->cls_session);
+	if (rval)
 		goto exit_tmf_resp;
-	}
-
-	iscsi_unblock_session(session->cls_session);
 
 	spin_lock(&session->back_lock);
 	__iscsi_complete_pdu(conn, (struct iscsi_hdr *)resp_hdr_ptr, NULL, 0);
-- 
2.25.1



* [PATCH 10/13] scsi: qedi: fix session block/unblock abuse during cleanup
  2021-04-10 18:40 [RFC PATCH 0/13]: qedi tmf fixes Mike Christie
                   ` (8 preceding siblings ...)
  2021-04-10 18:40 ` [PATCH 09/13] scsi: qedi: fix session block/unblock abuse during tmf handling Mike Christie
@ 2021-04-10 18:40 ` Mike Christie
  2021-04-12 10:33   ` [EXT] " Manish Rangankar
  2021-04-10 18:40 ` [PATCH 11/13] scsi: qedi: pass send_iscsi_tmf task to abort Mike Christie
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Mike Christie @ 2021-04-10 18:40 UTC (permalink / raw)
  To: lduncan, martin.petersen, mrangankar, svernekar, linux-scsi, jejb
  Cc: Mike Christie

Drivers shouldn't be calling block/unblock session for cmd cleanup
because the functions can change the session state out from under
libiscsi. This adds a new driver-level bit so the driver can block all IO
to the host while it drains the card.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/qedi/qedi.h       |  1 +
 drivers/scsi/qedi/qedi_iscsi.c | 17 +++++++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/qedi/qedi.h b/drivers/scsi/qedi/qedi.h
index c342defc3f52..ce199a7a16b8 100644
--- a/drivers/scsi/qedi/qedi.h
+++ b/drivers/scsi/qedi/qedi.h
@@ -284,6 +284,7 @@ struct qedi_ctx {
 #define QEDI_IN_RECOVERY	5
 #define QEDI_IN_OFFLINE		6
 #define QEDI_IN_SHUTDOWN	7
+#define QEDI_BLOCK_IO		8
 
 	u8 mac[ETH_ALEN];
 	u32 src_ip[4];
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index 821225f9beb0..536d6653ef8e 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -330,12 +330,22 @@ qedi_conn_create(struct iscsi_cls_session *cls_session, uint32_t cid)
 
 void qedi_mark_device_missing(struct iscsi_cls_session *cls_session)
 {
-	iscsi_block_session(cls_session);
+	struct iscsi_session *session = cls_session->dd_data;
+	struct qedi_conn *qedi_conn = session->leadconn->dd_data;
+
+	spin_lock_bh(&session->frwd_lock);
+	set_bit(QEDI_BLOCK_IO, &qedi_conn->qedi->flags);
+	spin_unlock_bh(&session->frwd_lock);
 }
 
 void qedi_mark_device_available(struct iscsi_cls_session *cls_session)
 {
-	iscsi_unblock_session(cls_session);
+	struct iscsi_session *session = cls_session->dd_data;
+	struct qedi_conn *qedi_conn = session->leadconn->dd_data;
+
+	spin_lock_bh(&session->frwd_lock);
+	clear_bit(QEDI_BLOCK_IO, &qedi_conn->qedi->flags);
+	spin_unlock_bh(&session->frwd_lock);
 }
 
 static int qedi_bind_conn_to_iscsi_cid(struct qedi_ctx *qedi,
@@ -789,6 +799,9 @@ static int qedi_task_xmit(struct iscsi_task *task)
 	if (test_bit(QEDI_IN_SHUTDOWN, &qedi_conn->qedi->flags))
 		return -ENODEV;
 
+	if (test_bit(QEDI_BLOCK_IO, &qedi_conn->qedi->flags))
+		return -EACCES;
+
 	cmd->state = 0;
 	cmd->task = NULL;
 	cmd->use_slowpath = false;
-- 
2.25.1



* [PATCH 11/13] scsi: qedi: pass send_iscsi_tmf task to abort
  2021-04-10 18:40 [RFC PATCH 0/13]: qedi tmf fixes Mike Christie
                   ` (9 preceding siblings ...)
  2021-04-10 18:40 ` [PATCH 10/13] scsi: qedi: fix session block/unblock abuse during cleanup Mike Christie
@ 2021-04-10 18:40 ` Mike Christie
  2021-04-12 10:34   ` [EXT] " Manish Rangankar
  2021-04-10 18:40 ` [PATCH 12/13] scsi: qedi: make sure tmf works are done before disconnect Mike Christie
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Mike Christie @ 2021-04-10 18:40 UTC (permalink / raw)
  To: lduncan, martin.petersen, mrangankar, svernekar, linux-scsi, jejb
  Cc: Mike Christie

qedi_abort_work knows what task to abort so just pass it to send_iscsi_tmf.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/qedi/qedi_fw.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index f13f8af6d931..475cb7823cf1 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -15,7 +15,7 @@
 #include "qedi_fw_scsi.h"
 
 static int send_iscsi_tmf(struct qedi_conn *qedi_conn,
-			  struct iscsi_task *mtask);
+			  struct iscsi_task *mtask, struct iscsi_task *ctask);
 
 void qedi_iscsi_unmap_sg_list(struct qedi_cmd *cmd)
 {
@@ -1425,7 +1425,7 @@ static void qedi_abort_work(struct work_struct *work)
 		goto ldel_exit;
 	}
 
-	send_iscsi_tmf(qedi_conn, qedi_cmd->task);
+	send_iscsi_tmf(qedi_conn, qedi_cmd->task, ctask);
 
 put_task:
 	iscsi_put_task(ctask);
@@ -1455,14 +1455,13 @@ static void qedi_abort_work(struct work_struct *work)
 	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
 }
 
-static int send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask)
+static int send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask,
+			  struct iscsi_task *ctask)
 {
 	struct iscsi_tmf_request_hdr tmf_pdu_header;
 	struct iscsi_task_params task_params;
 	struct qedi_ctx *qedi = qedi_conn->qedi;
 	struct e4_iscsi_task_context *fw_task_ctx;
-	struct iscsi_conn *conn = qedi_conn->cls_conn->dd_data;
-	struct iscsi_task *ctask;
 	struct iscsi_tm *tmf_hdr;
 	struct qedi_cmd *qedi_cmd;
 	struct qedi_cmd *cmd;
@@ -1502,12 +1501,6 @@ static int send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask)
 
 	if ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
 	     ISCSI_TM_FUNC_ABORT_TASK) {
-		ctask = iscsi_itt_to_task(conn, tmf_hdr->rtt);
-		if (!ctask || !ctask->sc) {
-			QEDI_ERR(&qedi->dbg_ctx,
-				 "Could not get reference task\n");
-			return 0;
-		}
 		cmd = (struct qedi_cmd *)ctask->dd_data;
 		tmf_pdu_header.rtt =
 				qedi_set_itt(cmd->task_id,
@@ -1560,7 +1553,7 @@ int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask)
 	case ISCSI_TM_FUNC_LOGICAL_UNIT_RESET:
 	case ISCSI_TM_FUNC_TARGET_WARM_RESET:
 	case ISCSI_TM_FUNC_TARGET_COLD_RESET:
-		rc = send_iscsi_tmf(qedi_conn, mtask);
+		rc = send_iscsi_tmf(qedi_conn, mtask, NULL);
 		break;
 	default:
 		QEDI_ERR(&qedi->dbg_ctx, "Invalid tmf, cid=0x%x\n",
-- 
2.25.1



* [PATCH 12/13] scsi: qedi: make sure tmf works are done before disconnect
  2021-04-10 18:40 [RFC PATCH 0/13]: qedi tmf fixes Mike Christie
                   ` (10 preceding siblings ...)
  2021-04-10 18:40 ` [PATCH 11/13] scsi: qedi: pass send_iscsi_tmf task to abort Mike Christie
@ 2021-04-10 18:40 ` Mike Christie
  2021-04-12 10:36   ` [EXT] " Manish Rangankar
  2021-04-10 18:40 ` [PATCH 13/13] scsi: qedi: always wake up if cmd_cleanup_req is set Mike Christie
  2021-04-12 10:22 ` [EXT] [RFC PATCH 0/13]: qedi tmf fixes Manish Rangankar
  13 siblings, 1 reply; 26+ messages in thread
From: Mike Christie @ 2021-04-10 18:40 UTC (permalink / raw)
  To: lduncan, martin.petersen, mrangankar, svernekar, linux-scsi, jejb
  Cc: Mike Christie

We need to make sure that the abort and reset completion works have
completed before ep_disconnect returns. After ep_disconnect we can't
manipulate cmds because libiscsi will call conn_stop and take ownership.

We are trying to make sure the abort work and reset completion work have
completed before we do the cmd cleanup in ep_disconnect. The problems
are:

1. The work function sets the QEDI_CONN_FW_CLEANUP bit, so if the work
was still pending we would not see the bit set. The bit needs to be set
before the work is queued.

2. If we had multiple works queued then we could break from the loop in
qedi_ep_disconnect early because when abort work 1 completes it could
clear QEDI_CONN_FW_CLEANUP. qedi_ep_disconnect could then see that before
work 2 has run.

3. A tmf reset completion work could run after ep_disconnect starts
cleaning up cmds via qedi_clearsq. ep_disconnect's call to qedi_clearsq
-> qedi_cleanup_all_io might think it's done cleaning up cmds, but the
reset completion work could still be running. We would then return from
ep_disconnect while still doing cleanup.

This replaces the bit with a counter and adds a bool to prevent new works
from starting from the completion path once ep_disconnect has started.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/qedi/qedi_fw.c    | 46 +++++++++++++++++++++-------------
 drivers/scsi/qedi/qedi_iscsi.c | 23 +++++++++++------
 drivers/scsi/qedi/qedi_iscsi.h |  4 +--
 3 files changed, 47 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index 475cb7823cf1..13dd06915d74 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -156,7 +156,6 @@ static void qedi_tmf_resp_work(struct work_struct *work)
 	struct iscsi_tm_rsp *resp_hdr_ptr;
 	int rval = 0;
 
-	set_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
 	resp_hdr_ptr =  (struct iscsi_tm_rsp *)qedi_cmd->tmf_resp_buf;
 
 	rval = qedi_cleanup_all_io(qedi, qedi_conn, qedi_cmd->task, true);
@@ -169,7 +168,10 @@ static void qedi_tmf_resp_work(struct work_struct *work)
 
 exit_tmf_resp:
 	kfree(resp_hdr_ptr);
-	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
+
+	spin_lock(&qedi_conn->tmf_work_lock);
+	qedi_conn->fw_cleanup_works--;
+	spin_unlock(&qedi_conn->tmf_work_lock);
 }
 
 static void qedi_process_tmf_resp(struct qedi_ctx *qedi,
@@ -225,16 +227,25 @@ static void qedi_process_tmf_resp(struct qedi_ctx *qedi,
 	}
 	spin_unlock(&qedi_conn->list_lock);
 
-	if (((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
-	      ISCSI_TM_FUNC_LOGICAL_UNIT_RESET) ||
-	    ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
-	      ISCSI_TM_FUNC_TARGET_WARM_RESET) ||
-	    ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
-	      ISCSI_TM_FUNC_TARGET_COLD_RESET)) {
+	spin_lock(&qedi_conn->tmf_work_lock);
+	switch (tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) {
+	case ISCSI_TM_FUNC_LOGICAL_UNIT_RESET:
+	case ISCSI_TM_FUNC_TARGET_WARM_RESET:
+	case ISCSI_TM_FUNC_TARGET_COLD_RESET:
+		if (qedi_conn->ep_disconnect_starting) {
+			/* Session is down so ep_disconnect will clean up */
+			spin_unlock(&qedi_conn->tmf_work_lock);
+			goto unblock_sess;
+		}
+
+		qedi_conn->fw_cleanup_works++;
+		spin_unlock(&qedi_conn->tmf_work_lock);
+
 		INIT_WORK(&qedi_cmd->tmf_work, qedi_tmf_resp_work);
 		queue_work(qedi->tmf_thread, &qedi_cmd->tmf_work);
 		goto unblock_sess;
 	}
+	spin_unlock(&qedi_conn->tmf_work_lock);
 
 	__iscsi_complete_pdu(conn, (struct iscsi_hdr *)resp_hdr_ptr, NULL, 0);
 	kfree(resp_hdr_ptr);
@@ -1361,7 +1372,6 @@ static void qedi_abort_work(struct work_struct *work)
 
 	mtask = qedi_cmd->task;
 	tmf_hdr = (struct iscsi_tm *)mtask->hdr;
-	set_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
 
 	spin_lock_bh(&conn->session->back_lock);
 	ctask = iscsi_itt_to_ctask(conn, tmf_hdr->rtt);
@@ -1427,11 +1437,7 @@ static void qedi_abort_work(struct work_struct *work)
 
 	send_iscsi_tmf(qedi_conn, qedi_cmd->task, ctask);
 
-put_task:
-	iscsi_put_task(ctask);
-clear_cleanup:
-	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
-	return;
+	goto put_task;
 
 ldel_exit:
 	spin_lock_bh(&qedi_conn->tmf_work_lock);
@@ -1449,10 +1455,12 @@ static void qedi_abort_work(struct work_struct *work)
 		qedi_conn->active_cmd_count--;
 	}
 	spin_unlock(&qedi_conn->list_lock);
-
+put_task:
 	iscsi_put_task(ctask);
-
-	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
+clear_cleanup:
+	spin_lock(&qedi_conn->tmf_work_lock);
+	qedi_conn->fw_cleanup_works--;
+	spin_unlock(&qedi_conn->tmf_work_lock);
 }
 
 static int send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask,
@@ -1547,6 +1555,10 @@ int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask)
 
 	switch (tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) {
 	case ISCSI_TM_FUNC_ABORT_TASK:
+		spin_lock(&qedi_conn->tmf_work_lock);
+		qedi_conn->fw_cleanup_works++;
+		spin_unlock(&qedi_conn->tmf_work_lock);
+
 		INIT_WORK(&qedi_cmd->tmf_work, qedi_abort_work);
 		queue_work(qedi->tmf_thread, &qedi_cmd->tmf_work);
 		break;
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index 536d6653ef8e..8bbdd45ff2a1 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -592,7 +592,11 @@ static int qedi_conn_start(struct iscsi_cls_conn *cls_conn)
 		goto start_err;
 	}
 
-	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
+	spin_lock(&qedi_conn->tmf_work_lock);
+	qedi_conn->fw_cleanup_works = 0;
+	qedi_conn->ep_disconnect_starting = false;
+	spin_unlock(&qedi_conn->tmf_work_lock);
+
 	qedi_conn->abrt_conn = 0;
 
 	rval = iscsi_conn_start(cls_conn);
@@ -1009,7 +1013,6 @@ static void qedi_ep_disconnect(struct iscsi_endpoint *ep)
 	int ret = 0;
 	int wait_delay;
 	int abrt_conn = 0;
-	int count = 10;
 
 	wait_delay = 60 * HZ + DEF_MAX_RT_TIME;
 	qedi_ep = ep->dd_data;
@@ -1027,13 +1030,19 @@ static void qedi_ep_disconnect(struct iscsi_endpoint *ep)
 		iscsi_suspend_queue(conn);
 		abrt_conn = qedi_conn->abrt_conn;
 
-		while (count--)	{
-			if (!test_bit(QEDI_CONN_FW_CLEANUP,
-				      &qedi_conn->flags)) {
-				break;
-			}
+		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_INFO,
+			  "cid=0x%x qedi_ep=%p waiting for %d tmfs\n",
+			  qedi_ep->iscsi_cid, qedi_ep,
+			  qedi_conn->fw_cleanup_works);
+
+		spin_lock(&qedi_conn->tmf_work_lock);
+		qedi_conn->ep_disconnect_starting = true;
+		while (qedi_conn->fw_cleanup_works > 0) {
+			spin_unlock(&qedi_conn->tmf_work_lock);
 			msleep(1000);
+			spin_lock(&qedi_conn->tmf_work_lock);
 		}
+		spin_unlock(&qedi_conn->tmf_work_lock);
 
 		if (test_bit(QEDI_IN_RECOVERY, &qedi->flags)) {
 			if (qedi_do_not_recover) {
diff --git a/drivers/scsi/qedi/qedi_iscsi.h b/drivers/scsi/qedi/qedi_iscsi.h
index 68ef519f5480..758735209e15 100644
--- a/drivers/scsi/qedi/qedi_iscsi.h
+++ b/drivers/scsi/qedi/qedi_iscsi.h
@@ -169,8 +169,8 @@ struct qedi_conn {
 	struct list_head tmf_work_list;
 	wait_queue_head_t wait_queue;
 	spinlock_t tmf_work_lock;	/* tmf work lock */
-	unsigned long flags;
-#define QEDI_CONN_FW_CLEANUP	1
+	bool ep_disconnect_starting;
+	int fw_cleanup_works;
 };
 
 struct qedi_cmd {
-- 
2.25.1



* [PATCH 13/13] scsi: qedi: always wake up if cmd_cleanup_req is set
  2021-04-10 18:40 [RFC PATCH 0/13]: qedi tmf fixes Mike Christie
                   ` (11 preceding siblings ...)
  2021-04-10 18:40 ` [PATCH 12/13] scsi: qedi: make sure tmf works are done before disconnect Mike Christie
@ 2021-04-10 18:40 ` Mike Christie
  2021-04-12 10:35   ` [EXT] " Manish Rangankar
  2021-04-12 10:22 ` [EXT] [RFC PATCH 0/13]: qedi tmf fixes Manish Rangankar
  13 siblings, 1 reply; 26+ messages in thread
From: Mike Christie @ 2021-04-10 18:40 UTC (permalink / raw)
  To: lduncan, martin.petersen, mrangankar, svernekar, linux-scsi, jejb
  Cc: Mike Christie

If we got a response then we should always wake up the conn. For both the
cmd_cleanup_req == 0 and cmd_cleanup_req > 0 cases we shouldn't dig into
iscsi_itt_to_task, because we don't know what the upper layers are doing.

We can also remove the qedi_clear_task_idx call here because once we
signal success libiscsi will loop over the affected commands and end
up calling the cleanup_task callout which will release it.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/qedi/qedi_fw.c | 31 ++++---------------------------
 1 file changed, 4 insertions(+), 27 deletions(-)

diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index 13dd06915d74..13d1250951a6 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -739,7 +739,6 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
 {
 	struct qedi_work_map *work, *work_tmp;
 	u32 proto_itt = cqe->itid;
-	u32 ptmp_itt = 0;
 	itt_t protoitt = 0;
 	int found = 0;
 	struct qedi_cmd *qedi_cmd = NULL;
@@ -823,37 +822,15 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
 
 check_cleanup_reqs:
 	if (qedi_conn->cmd_cleanup_req > 0) {
-		spin_lock_bh(&conn->session->back_lock);
-		qedi_get_proto_itt(qedi, cqe->itid, &ptmp_itt);
-		protoitt = build_itt(ptmp_itt, conn->session->age);
-		task = iscsi_itt_to_task(conn, protoitt);
-		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
-			  "cleanup io itid=0x%x, protoitt=0x%x, cmd_cleanup_cmpl=%d, cid=0x%x\n",
-			  cqe->itid, protoitt, qedi_conn->cmd_cleanup_cmpl,
-			  qedi_conn->iscsi_conn_id);
-
-		spin_unlock_bh(&conn->session->back_lock);
-		if (!task) {
-			QEDI_NOTICE(&qedi->dbg_ctx,
-				    "task is null, itid=0x%x, cid=0x%x\n",
-				    cqe->itid, qedi_conn->iscsi_conn_id);
-			return;
-		}
-		qedi_conn->cmd_cleanup_cmpl++;
-		wake_up(&qedi_conn->wait_queue);
-
 		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_TID,
 			  "Freeing tid=0x%x for cid=0x%x\n",
 			  cqe->itid, qedi_conn->iscsi_conn_id);
-		qedi_clear_task_idx(qedi_conn->qedi, cqe->itid);
-
+		qedi_conn->cmd_cleanup_cmpl++;
+		wake_up(&qedi_conn->wait_queue);
 	} else {
-		qedi_get_proto_itt(qedi, cqe->itid, &ptmp_itt);
-		protoitt = build_itt(ptmp_itt, conn->session->age);
-		task = iscsi_itt_to_task(conn, protoitt);
 		QEDI_ERR(&qedi->dbg_ctx,
-			 "Delayed or untracked cleanup response, itt=0x%x, tid=0x%x, cid=0x%x, task=%p\n",
-			 protoitt, cqe->itid, qedi_conn->iscsi_conn_id, task);
+			 "Delayed or untracked cleanup response, itt=0x%x, tid=0x%x, cid=0x%x\n",
+			 protoitt, cqe->itid, qedi_conn->iscsi_conn_id);
 	}
 }
 
-- 
2.25.1



* RE: [EXT] [RFC PATCH 0/13]: qedi tmf fixes
  2021-04-10 18:40 [RFC PATCH 0/13]: qedi tmf fixes Mike Christie
                   ` (12 preceding siblings ...)
  2021-04-10 18:40 ` [PATCH 13/13] scsi: qedi: always wake up if cmd_cleanup_req is set Mike Christie
@ 2021-04-12 10:22 ` Manish Rangankar
  13 siblings, 0 replies; 26+ messages in thread
From: Manish Rangankar @ 2021-04-12 10:22 UTC (permalink / raw)
  To: Mike Christie, lduncan, martin.petersen, Santosh Vernekar,
	linux-scsi, jejb



> -----Original Message-----
> From: Mike Christie <michael.christie@oracle.com>
> Sent: Sunday, April 11, 2021 12:10 AM
> To: lduncan@suse.com; martin.petersen@oracle.com; Manish Rangankar
> <mrangankar@marvell.com>; Santosh Vernekar <svernekar@marvell.com>;
> linux-scsi@vger.kernel.org; jejb@linux.ibm.com
> Subject: [EXT] [RFC PATCH 0/13]: qedi tmf fixes
> 
> External Email
> 
> ----------------------------------------------------------------------
> The following patches made over Linus's tree fixes a bunch of issues in qedi's tmf
> handling.
> 
> Manish, I did some tests here. Please test it out and review.
> 

Mike, 

Thanks for the fixes. Various recovery tests with and without IOs with the patch set went fine. 


* RE: [EXT] [PATCH 04/13] scsi: qedi: fix abort vs cmd re-use race
  2021-04-10 18:40 ` [PATCH 04/13] scsi: qedi: fix abort vs cmd re-use race Mike Christie
@ 2021-04-12 10:24   ` Manish Rangankar
  0 siblings, 0 replies; 26+ messages in thread
From: Manish Rangankar @ 2021-04-12 10:24 UTC (permalink / raw)
  To: Mike Christie, lduncan, martin.petersen, Santosh Vernekar,
	linux-scsi, jejb



> -----Original Message-----
> From: Mike Christie <michael.christie@oracle.com>
> Sent: Sunday, April 11, 2021 12:10 AM
> To: lduncan@suse.com; martin.petersen@oracle.com; Manish Rangankar
> <mrangankar@marvell.com>; Santosh Vernekar <svernekar@marvell.com>;
> linux-scsi@vger.kernel.org; jejb@linux.ibm.com
> Cc: Mike Christie <michael.christie@oracle.com>
> Subject: [EXT] [PATCH 04/13] scsi: qedi: fix abort vs cmd re-use race
> 
> External Email
> 
> ----------------------------------------------------------------------
> If the scsi cmd completes after qedi_tmf_work calls iscsi_itt_to_task then the
> qedi qedi_cmd->task_id could be freed and used for another cmd. If we then call
> qedi_iscsi_cleanup_task with that task_id we will be cleaning up the wrong cmd.
> 
> This patch has us wait to release the task_id until the last put has been done on
> the iscsi_task. Because libiscsi grabs a ref to the task when sending the abort,
> we know that for the non abort timeout case that the task_id we are
> referencing is for the cmd that was supposed to be aborted.
> 
> The next patch will fix the case where the abort timesout while we are running
> qedi_tmf_work.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/qedi/qedi_fw.c    | 13 -------------
>  drivers/scsi/qedi/qedi_iscsi.c | 20 +++++++++++++++++---
>  2 files changed, 17 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c index
> cf57b4e49700..ad4357e4c15d 100644
> --- a/drivers/scsi/qedi/qedi_fw.c
> +++ b/drivers/scsi/qedi/qedi_fw.c
> @@ -73,7 +73,6 @@ static void qedi_process_logout_resp(struct qedi_ctx
> *qedi,
>  	spin_unlock(&qedi_conn->list_lock);
> 
>  	cmd->state = RESPONSE_RECEIVED;
> -	qedi_clear_task_idx(qedi, cmd->task_id);
>  	__iscsi_complete_pdu(conn, (struct iscsi_hdr *)resp_hdr, NULL, 0);
> 
>  	spin_unlock(&session->back_lock);
> @@ -138,7 +137,6 @@ static void qedi_process_text_resp(struct qedi_ctx
> *qedi,
>  	spin_unlock(&qedi_conn->list_lock);
> 
>  	cmd->state = RESPONSE_RECEIVED;
> -	qedi_clear_task_idx(qedi, cmd->task_id);
> 
>  	__iscsi_complete_pdu(conn, (struct iscsi_hdr *)resp_hdr_ptr,
>  			     qedi_conn->gen_pdu.resp_buf,
> @@ -164,13 +162,11 @@ static void qedi_tmf_resp_work(struct work_struct
> *work)
>  	iscsi_block_session(session->cls_session);
>  	rval = qedi_cleanup_all_io(qedi, qedi_conn, qedi_cmd->task, true);
>  	if (rval) {
> -		qedi_clear_task_idx(qedi, qedi_cmd->task_id);
>  		iscsi_unblock_session(session->cls_session);
>  		goto exit_tmf_resp;
>  	}
> 
>  	iscsi_unblock_session(session->cls_session);
> -	qedi_clear_task_idx(qedi, qedi_cmd->task_id);
> 
>  	spin_lock(&session->back_lock);
>  	__iscsi_complete_pdu(conn, (struct iscsi_hdr *)resp_hdr_ptr, NULL, 0);
> @@ -245,8 +241,6 @@ static void qedi_process_tmf_resp(struct qedi_ctx
> *qedi,
>  		goto unblock_sess;
>  	}
> 
> -	qedi_clear_task_idx(qedi, qedi_cmd->task_id);
> -
>  	__iscsi_complete_pdu(conn, (struct iscsi_hdr *)resp_hdr_ptr, NULL, 0);
>  	kfree(resp_hdr_ptr);
> 
> @@ -314,7 +308,6 @@ static void qedi_process_login_resp(struct qedi_ctx
> *qedi,
>  		  "Freeing tid=0x%x for cid=0x%x\n",
>  		  cmd->task_id, qedi_conn->iscsi_conn_id);
>  	cmd->state = RESPONSE_RECEIVED;
> -	qedi_clear_task_idx(qedi, cmd->task_id);
>  }
> 
>  static void qedi_get_rq_bdq_buf(struct qedi_ctx *qedi, @@ -468,7 +461,6 @@
> static int qedi_process_nopin_mesg(struct qedi_ctx *qedi,
>  		}
> 
>  		spin_unlock(&qedi_conn->list_lock);
> -		qedi_clear_task_idx(qedi, cmd->task_id);
>  	}
> 
>  done:
> @@ -673,7 +665,6 @@ static void qedi_scsi_completion(struct qedi_ctx *qedi,
>  	if (qedi_io_tracing)
>  		qedi_trace_io(qedi, task, cmd->task_id, QEDI_IO_TRACE_RSP);
> 
> -	qedi_clear_task_idx(qedi, cmd->task_id);
>  	__iscsi_complete_pdu(conn, (struct iscsi_hdr *)hdr,
>  			     conn->data, datalen);
>  error:
> @@ -730,7 +721,6 @@ static void qedi_process_nopin_local_cmpl(struct
> qedi_ctx *qedi,
>  		  cqe->itid, cmd->task_id);
> 
>  	cmd->state = RESPONSE_RECEIVED;
> -	qedi_clear_task_idx(qedi, cmd->task_id);
> 
>  	spin_lock_bh(&session->back_lock);
>  	__iscsi_put_task(task);
> @@ -821,8 +811,6 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
>  			if (qedi_cmd->state == CLEANUP_WAIT_FAILED)
>  				qedi_cmd->state = CLEANUP_RECV;
> 
> -			qedi_clear_task_idx(qedi_conn->qedi, rtid);
> -
>  			spin_lock(&qedi_conn->list_lock);
>  			if (likely(dbg_cmd->io_cmd_in_list)) {
>  				dbg_cmd->io_cmd_in_list = false;
> @@ -856,7 +844,6 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
>  		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_TID,
>  			  "Freeing tid=0x%x for cid=0x%x\n",
>  			  cqe->itid, qedi_conn->iscsi_conn_id);
> -		qedi_clear_task_idx(qedi_conn->qedi, cqe->itid);
> 
>  	} else {
>  		qedi_get_proto_itt(qedi, cqe->itid, &ptmp_itt);
> 
> diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
> index 08c05403cd72..d1da34a938da 100644
> --- a/drivers/scsi/qedi/qedi_iscsi.c
> +++ b/drivers/scsi/qedi/qedi_iscsi.c
> @@ -772,7 +772,6 @@ static int qedi_mtask_xmit(struct iscsi_conn *conn, struct iscsi_task *task)
>  	}
> 
>  	cmd->conn = conn->dd_data;
> -	cmd->scsi_cmd = NULL;
>  	return qedi_iscsi_send_generic_request(task);
>  }
> 
> @@ -783,6 +782,10 @@ static int qedi_task_xmit(struct iscsi_task *task)
>  	struct qedi_cmd *cmd = task->dd_data;
>  	struct scsi_cmnd *sc = task->sc;
> 
> +	/* Clear now so in cleanup_task we know it didn't make it */
> +	cmd->scsi_cmd = NULL;
> +	cmd->task_id = -1;
> +
>  	if (test_bit(QEDI_IN_SHUTDOWN, &qedi_conn->qedi->flags))
>  		return -ENODEV;
> 
> @@ -1383,13 +1386,24 @@ static umode_t qedi_attr_is_visible(int param_type, int param)
> 
>  static void qedi_cleanup_task(struct iscsi_task *task)
>  {
> -	if (!task->sc || task->state == ISCSI_TASK_PENDING) {
> +	struct qedi_cmd *cmd;
> +
> +	if (task->state == ISCSI_TASK_PENDING) {
>  		QEDI_INFO(NULL, QEDI_LOG_IO, "Returning ref_cnt=%d\n",
>  			  refcount_read(&task->refcount));
>  		return;
>  	}
> 
> -	qedi_iscsi_unmap_sg_list(task->dd_data);
> +	if (task->sc)
> +		qedi_iscsi_unmap_sg_list(task->dd_data);
> +
> +	cmd = task->dd_data;
> +	if (cmd->task_id != -1)
> +		qedi_clear_task_idx(iscsi_host_priv(task->conn->session->host),
> +				    cmd->task_id);
> +
> +	cmd->task_id = -1;
> +	cmd->scsi_cmd = NULL;
>  }
> 
>  struct iscsi_transport qedi_iscsi_transport = {
> --
> 2.25.1

Reviewed-by: Manish Rangankar <mrangankar@marvell.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [EXT] [PATCH 02/13] scsi: qedi: fix null ref during abort handling
  2021-04-10 18:40 ` [PATCH 02/13] scsi: qedi: fix null ref during abort handling Mike Christie
@ 2021-04-12 10:25   ` Manish Rangankar
  0 siblings, 0 replies; 26+ messages in thread
From: Manish Rangankar @ 2021-04-12 10:25 UTC (permalink / raw)
  To: Mike Christie, lduncan, martin.petersen, Santosh Vernekar,
	linux-scsi, jejb


> -----Original Message-----
> From: Mike Christie <michael.christie@oracle.com>
> Sent: Sunday, April 11, 2021 12:10 AM
> To: lduncan@suse.com; martin.petersen@oracle.com; Manish Rangankar
> <mrangankar@marvell.com>; Santosh Vernekar <svernekar@marvell.com>;
> linux-scsi@vger.kernel.org; jejb@linux.ibm.com
> Cc: Mike Christie <michael.christie@oracle.com>
> Subject: [EXT] [PATCH 02/13] scsi: qedi: fix null ref during abort handling
> 
> External Email
> 
> ----------------------------------------------------------------------
> If qedi_process_cmd_cleanup_resp finds the cmd, it frees the work and sets
> list_tmf_work to NULL, so qedi_tmf_work should check that list_tmf_work is
> non-NULL when it wants to force the cleanup.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/qedi/qedi_fw.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
> index 440ddd2309f1..cf57b4e49700 100644
> --- a/drivers/scsi/qedi/qedi_fw.c
> +++ b/drivers/scsi/qedi/qedi_fw.c
> @@ -1453,7 +1453,7 @@ static void qedi_tmf_work(struct work_struct *work)
> 
>  ldel_exit:
>  	spin_lock_bh(&qedi_conn->tmf_work_lock);
> -	if (!qedi_cmd->list_tmf_work) {
> +	if (qedi_cmd->list_tmf_work) {
>  		list_del_init(&list_work->list);
>  		qedi_cmd->list_tmf_work = NULL;
>  		kfree(list_work);
> --
> 2.25.1

Thanks,
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [EXT] [PATCH 05/13] scsi: qedi: fix use after free during abort cleanup
  2021-04-10 18:40 ` [PATCH 05/13] scsi: qedi: fix use after free during abort cleanup Mike Christie
@ 2021-04-12 10:26   ` Manish Rangankar
  0 siblings, 0 replies; 26+ messages in thread
From: Manish Rangankar @ 2021-04-12 10:26 UTC (permalink / raw)
  To: Mike Christie, lduncan, martin.petersen, Santosh Vernekar,
	linux-scsi, jejb


> -----Original Message-----
> From: Mike Christie <michael.christie@oracle.com>
> Sent: Sunday, April 11, 2021 12:10 AM
> To: lduncan@suse.com; martin.petersen@oracle.com; Manish Rangankar
> <mrangankar@marvell.com>; Santosh Vernekar <svernekar@marvell.com>;
> linux-scsi@vger.kernel.org; jejb@linux.ibm.com
> Cc: Mike Christie <michael.christie@oracle.com>
> Subject: [EXT] [PATCH 05/13] scsi: qedi: fix use after free during abort cleanup
> 
> This fixes two bugs:
> 
> 1. The SCSI cmd task could be completed and the abort could time out while
> we are running qedi_tmf_work, so we need to take a reference on the task.
> 
> 2. If qedi_tmf_work's qedi_wait_for_cleanup_request call times out, we also
> force the cleanup of the qedi_work_map, but qedi_process_cmd_cleanup_resp
> could still be accessing the qedi_cmd for the abort TMF. We can then hit a
> race where qedi_process_cmd_cleanup_resp is still accessing the mtask's
> qedi_cmd while libiscsi has escalated to session-level cleanup and is
> tearing down the mtask under us.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/qedi/qedi_fw.c    | 110 ++++++++++++++++++---------------
>  drivers/scsi/qedi/qedi_iscsi.h |   1 +
>  2 files changed, 61 insertions(+), 50 deletions(-)
> 
> diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
> index ad4357e4c15d..c5699421ec37 100644
> --- a/drivers/scsi/qedi/qedi_fw.c
> +++ b/drivers/scsi/qedi/qedi_fw.c
> @@ -729,7 +729,6 @@ static void qedi_process_nopin_local_cmpl(struct qedi_ctx *qedi,
> 
>  static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
>  					  struct iscsi_cqe_solicited *cqe,
> -					  struct iscsi_task *task,
>  					  struct iscsi_conn *conn)
>  {
>  	struct qedi_work_map *work, *work_tmp;
> @@ -742,7 +741,7 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
>  	u32 iscsi_cid;
>  	struct qedi_conn *qedi_conn;
>  	struct qedi_cmd *dbg_cmd;
> -	struct iscsi_task *mtask;
> +	struct iscsi_task *mtask, *task;
>  	struct iscsi_tm *tmf_hdr = NULL;
> 
>  	iscsi_cid = cqe->conn_id;
> @@ -768,6 +767,7 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
>  			}
>  			found = 1;
>  			mtask = qedi_cmd->task;
> +			task = work->ctask;
>  			tmf_hdr = (struct iscsi_tm *)mtask->hdr;
>  			rtid = work->rtid;
> 
> @@ -776,52 +776,47 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
>  			qedi_cmd->list_tmf_work = NULL;
>  		}
>  	}
> -	spin_unlock_bh(&qedi_conn->tmf_work_lock);
> -
> -	if (found) {
> -		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
> -			  "TMF work, cqe->tid=0x%x, tmf flags=0x%x, cid=0x%x\n",
> -			  proto_itt, tmf_hdr->flags, qedi_conn->iscsi_conn_id);
> -
> -		if ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
> -		    ISCSI_TM_FUNC_ABORT_TASK) {
> -			spin_lock_bh(&conn->session->back_lock);
> 
> -			protoitt = build_itt(get_itt(tmf_hdr->rtt),
> -					     conn->session->age);
> -			task = iscsi_itt_to_task(conn, protoitt);
> -
> -			spin_unlock_bh(&conn->session->back_lock);
> +	if (!found) {
> +		spin_unlock_bh(&qedi_conn->tmf_work_lock);
> +		goto check_cleanup_reqs;
> +	}
> 
> -			if (!task) {
> -				QEDI_NOTICE(&qedi->dbg_ctx,
> -					    "IO task completed, tmf rtt=0x%x, cid=0x%x\n",
> -					    get_itt(tmf_hdr->rtt),
> -					    qedi_conn->iscsi_conn_id);
> -				return;
> -			}
> +	QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
> +		  "TMF work, cqe->tid=0x%x, tmf flags=0x%x, cid=0x%x\n",
> +		  proto_itt, tmf_hdr->flags, qedi_conn->iscsi_conn_id);
> +
> +	spin_lock_bh(&conn->session->back_lock);
> +	if (iscsi_task_is_completed(task)) {
> +		QEDI_NOTICE(&qedi->dbg_ctx,
> +			    "IO task completed, tmf rtt=0x%x, cid=0x%x\n",
> +			   get_itt(tmf_hdr->rtt), qedi_conn->iscsi_conn_id);
> +		goto unlock;
> +	}
> 
> -			dbg_cmd = task->dd_data;
> +	dbg_cmd = task->dd_data;
> 
> -			QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
> -				  "Abort tmf rtt=0x%x, i/o itt=0x%x, i/o tid=0x%x, cid=0x%x\n",
> -				  get_itt(tmf_hdr->rtt), get_itt(task->itt),
> -				  dbg_cmd->task_id, qedi_conn->iscsi_conn_id);
> +	QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
> +		  "Abort tmf rtt=0x%x, i/o itt=0x%x, i/o tid=0x%x, cid=0x%x\n",
> +		  get_itt(tmf_hdr->rtt), get_itt(task->itt), dbg_cmd->task_id,
> +		  qedi_conn->iscsi_conn_id);
> 
> -			if (qedi_cmd->state == CLEANUP_WAIT_FAILED)
> -				qedi_cmd->state = CLEANUP_RECV;
> +	spin_lock(&qedi_conn->list_lock);
> +	if (likely(dbg_cmd->io_cmd_in_list)) {
> +		dbg_cmd->io_cmd_in_list = false;
> +		list_del_init(&dbg_cmd->io_cmd);
> +		qedi_conn->active_cmd_count--;
> +	}
> +	spin_unlock(&qedi_conn->list_lock);
> +	qedi_cmd->state = CLEANUP_RECV;
> +unlock:
> +	spin_unlock_bh(&conn->session->back_lock);
> +	spin_unlock_bh(&qedi_conn->tmf_work_lock);
> +	wake_up_interruptible(&qedi_conn->wait_queue);
> +	return;
> 
> -			spin_lock(&qedi_conn->list_lock);
> -			if (likely(dbg_cmd->io_cmd_in_list)) {
> -				dbg_cmd->io_cmd_in_list = false;
> -				list_del_init(&dbg_cmd->io_cmd);
> -				qedi_conn->active_cmd_count--;
> -			}
> -			spin_unlock(&qedi_conn->list_lock);
> -			qedi_cmd->state = CLEANUP_RECV;
> -			wake_up_interruptible(&qedi_conn->wait_queue);
> -		}
> -	} else if (qedi_conn->cmd_cleanup_req > 0) {
> +check_cleanup_reqs:
> +	if (qedi_conn->cmd_cleanup_req > 0) {
>  		spin_lock_bh(&conn->session->back_lock);
>  		qedi_get_proto_itt(qedi, cqe->itid, &ptmp_itt);
>  		protoitt = build_itt(ptmp_itt, conn->session->age);
> @@ -844,6 +839,7 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
>  		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_TID,
>  			  "Freeing tid=0x%x for cid=0x%x\n",
>  			  cqe->itid, qedi_conn->iscsi_conn_id);
> +		qedi_clear_task_idx(qedi_conn->qedi, cqe->itid);
> 
>  	} else {
>  		qedi_get_proto_itt(qedi, cqe->itid, &ptmp_itt);
> @@ -946,8 +942,7 @@ void qedi_fp_process_cqes(struct qedi_work *work)
>  		goto exit_fp_process;
>  	case ISCSI_CQE_TYPE_TASK_CLEANUP:
>  		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM, "CleanUp CqE\n");
> -		qedi_process_cmd_cleanup_resp(qedi, &cqe->cqe_solicited, task,
> -					      conn);
> +		qedi_process_cmd_cleanup_resp(qedi, &cqe->cqe_solicited, conn);
>  		goto exit_fp_process;
>  	default:
>  		QEDI_ERR(&qedi->dbg_ctx, "Error cqe.\n");
> @@ -1374,12 +1369,22 @@ static void qedi_tmf_work(struct work_struct *work)
>  	tmf_hdr = (struct iscsi_tm *)mtask->hdr;
>  	set_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
> 
> -	ctask = iscsi_itt_to_task(conn, tmf_hdr->rtt);
> -	if (!ctask || !ctask->sc) {
> +	spin_lock_bh(&conn->session->back_lock);
> +	ctask = iscsi_itt_to_ctask(conn, tmf_hdr->rtt);
> +	if (!ctask || iscsi_task_is_completed(ctask)) {
> +		spin_unlock_bh(&conn->session->back_lock);
>  		QEDI_ERR(&qedi->dbg_ctx, "Task already completed\n");
> -		goto abort_ret;
> +		goto clear_cleanup;
>  	}
> 
> +	/*
> +	 * libiscsi gets a ref before sending the abort, but if libiscsi times
> +	 * it out then it could release it and the cmd could complete from
> +	 * under us.
> +	 */
> +	__iscsi_get_task(ctask);
> +	spin_unlock_bh(&conn->session->back_lock);
> +
>  	cmd = (struct qedi_cmd *)ctask->dd_data;
>  	QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_INFO,
>  		  "Abort tmf rtt=0x%x, cmd itt=0x%x, cmd tid=0x%x, cid=0x%x\n",
> @@ -1389,19 +1394,20 @@ static void qedi_tmf_work(struct work_struct *work)
>  	if (qedi_do_not_recover) {
>  		QEDI_ERR(&qedi->dbg_ctx, "DONT SEND CLEANUP/ABORT %d\n",
>  			 qedi_do_not_recover);
> -		goto abort_ret;
> +		goto put_task;
>  	}
> 
>  	list_work = kzalloc(sizeof(*list_work), GFP_ATOMIC);
>  	if (!list_work) {
>  		QEDI_ERR(&qedi->dbg_ctx, "Memory allocation failed\n");
> -		goto abort_ret;
> +		goto put_task;
>  	}
> 
>  	qedi_cmd->type = TYPEIO;
>  	list_work->qedi_cmd = qedi_cmd;
>  	list_work->rtid = cmd->task_id;
>  	list_work->state = QEDI_WORK_SCHEDULED;
> +	list_work->ctask = ctask;
>  	qedi_cmd->list_tmf_work = list_work;
> 
>  	QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
> @@ -1434,7 +1440,9 @@ static void qedi_tmf_work(struct work_struct *work)
>  	qedi_cmd->task_id = tid;
>  	qedi_send_iscsi_tmf(qedi_conn, qedi_cmd->task);
> 
> -abort_ret:
> +put_task:
> +	iscsi_put_task(ctask);
> +clear_cleanup:
>  	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
>  	return;
> 
> @@ -1455,6 +1463,8 @@ static void qedi_tmf_work(struct work_struct *work)
>  	}
>  	spin_unlock(&qedi_conn->list_lock);
> 
> +	iscsi_put_task(ctask);
> +
>  	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
>  }
> 
> diff --git a/drivers/scsi/qedi/qedi_iscsi.h b/drivers/scsi/qedi/qedi_iscsi.h
> index 39dc27c85e3c..68ef519f5480 100644
> --- a/drivers/scsi/qedi/qedi_iscsi.h
> +++ b/drivers/scsi/qedi/qedi_iscsi.h
> @@ -212,6 +212,7 @@ struct qedi_cmd {
>  struct qedi_work_map {
>  	struct list_head list;
>  	struct qedi_cmd *qedi_cmd;
> +	struct iscsi_task *ctask;
>  	int rtid;
> 
>  	int state;
> --
> 2.25.1

Thanks,
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [EXT] [PATCH 06/13] scsi: qedi: fix tmf tid allocation
  2021-04-10 18:40 ` [PATCH 06/13] scsi: qedi: fix tmf tid allocation Mike Christie
@ 2021-04-12 10:27   ` Manish Rangankar
  0 siblings, 0 replies; 26+ messages in thread
From: Manish Rangankar @ 2021-04-12 10:27 UTC (permalink / raw)
  To: Mike Christie, lduncan, martin.petersen, Santosh Vernekar,
	linux-scsi, jejb

> -----Original Message-----
> From: Mike Christie <michael.christie@oracle.com>
> Sent: Sunday, April 11, 2021 12:10 AM
> To: lduncan@suse.com; martin.petersen@oracle.com; Manish Rangankar
> <mrangankar@marvell.com>; Santosh Vernekar <svernekar@marvell.com>;
> linux-scsi@vger.kernel.org; jejb@linux.ibm.com
> Cc: Mike Christie <michael.christie@oracle.com>
> Subject: [EXT] [PATCH 06/13] scsi: qedi: fix tmf tid allocation
> 
> qedi_iscsi_abort_work and qedi_tmf_work allocate a tid and then call
> qedi_send_iscsi_tmf, which also allocates a tid. This removes the tid
> allocation from the callers.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/qedi/qedi_fw.c    | 76 ++++++++++------------------------
>  drivers/scsi/qedi/qedi_gbl.h   |  3 +-
>  drivers/scsi/qedi/qedi_iscsi.c |  2 +-
>  3 files changed, 25 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
> index c5699421ec37..542255c94d96 100644
> --- a/drivers/scsi/qedi/qedi_fw.c
> +++ b/drivers/scsi/qedi/qedi_fw.c
> @@ -14,8 +14,8 @@
>  #include "qedi_fw_iscsi.h"
>  #include "qedi_fw_scsi.h"
> 
> -static int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn,
> -			       struct iscsi_task *mtask);
> +static int send_iscsi_tmf(struct qedi_conn *qedi_conn,
> +			  struct iscsi_task *mtask);
> 
>  void qedi_iscsi_unmap_sg_list(struct qedi_cmd *cmd)
>  {
> @@ -1350,7 +1350,7 @@ static int qedi_wait_for_cleanup_request(struct qedi_ctx *qedi,
>  	return 0;
>  }
> 
> -static void qedi_tmf_work(struct work_struct *work)
> +static void qedi_abort_work(struct work_struct *work)
>  {
>  	struct qedi_cmd *qedi_cmd =
>  		container_of(work, struct qedi_cmd, tmf_work);
> @@ -1363,7 +1363,6 @@ static void qedi_tmf_work(struct work_struct *work)
>  	struct iscsi_task *ctask;
>  	struct iscsi_tm *tmf_hdr;
>  	s16 rval = 0;
> -	s16 tid = 0;
> 
>  	mtask = qedi_cmd->task;
>  	tmf_hdr = (struct iscsi_tm *)mtask->hdr;
> @@ -1404,6 +1403,7 @@ static void qedi_tmf_work(struct work_struct *work)
>  	}
> 
>  	qedi_cmd->type = TYPEIO;
> +	qedi_cmd->state = CLEANUP_WAIT;
>  	list_work->qedi_cmd = qedi_cmd;
>  	list_work->rtid = cmd->task_id;
>  	list_work->state = QEDI_WORK_SCHEDULED;
> @@ -1430,15 +1430,7 @@ static void qedi_tmf_work(struct work_struct *work)
>  		goto ldel_exit;
>  	}
> 
> -	tid = qedi_get_task_idx(qedi);
> -	if (tid == -1) {
> -		QEDI_ERR(&qedi->dbg_ctx, "Invalid tid, cid=0x%x\n",
> -			 qedi_conn->iscsi_conn_id);
> -		goto ldel_exit;
> -	}
> -
> -	qedi_cmd->task_id = tid;
> -	qedi_send_iscsi_tmf(qedi_conn, qedi_cmd->task);
> +	send_iscsi_tmf(qedi_conn, qedi_cmd->task);
> 
>  put_task:
>  	iscsi_put_task(ctask);
> @@ -1468,8 +1460,7 @@ static void qedi_tmf_work(struct work_struct *work)
>  	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
>  }
> 
> -static int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn,
> -			       struct iscsi_task *mtask)
> +static int send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask)
>  {
>  	struct iscsi_tmf_request_hdr tmf_pdu_header;
>  	struct iscsi_task_params task_params;
> @@ -1484,7 +1475,6 @@ static int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn,
>  	u32 scsi_lun[2];
>  	s16 tid = 0;
>  	u16 sq_idx = 0;
> -	int rval = 0;
> 
>  	tmf_hdr = (struct iscsi_tm *)mtask->hdr;
>  	qedi_cmd = (struct qedi_cmd *)mtask->dd_data;
> @@ -1548,10 +1538,7 @@ static int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn,
>  	task_params.sqe = &ep->sq[sq_idx];
> 
>  	memset(task_params.sqe, 0, sizeof(struct iscsi_wqe));
> -	rval = init_initiator_tmf_request_task(&task_params,
> -					       &tmf_pdu_header);
> -	if (rval)
> -		return -1;
> +	init_initiator_tmf_request_task(&task_params, &tmf_pdu_header);
> 
>  	spin_lock(&qedi_conn->list_lock);
>  	list_add_tail(&qedi_cmd->io_cmd, &qedi_conn->active_cmd_list);
> @@ -1563,47 +1550,30 @@ static int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn,
>  	return 0;
>  }
> 
> -int qedi_iscsi_abort_work(struct qedi_conn *qedi_conn,
> -			  struct iscsi_task *mtask)
> +int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask)
>  {
> +	struct iscsi_tm *tmf_hdr = (struct iscsi_tm *)mtask->hdr;
> +	struct qedi_cmd *qedi_cmd = mtask->dd_data;
>  	struct qedi_ctx *qedi = qedi_conn->qedi;
> -	struct iscsi_tm *tmf_hdr;
> -	struct qedi_cmd *qedi_cmd = (struct qedi_cmd *)mtask->dd_data;
> -	s16 tid = 0;
> +	int rc = 0;
> 
> -	tmf_hdr = (struct iscsi_tm *)mtask->hdr;
> -	qedi_cmd->task = mtask;
> -
> -	/* If abort task then schedule the work and return */
> -	if ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
> -	    ISCSI_TM_FUNC_ABORT_TASK) {
> -		qedi_cmd->state = CLEANUP_WAIT;
> -		INIT_WORK(&qedi_cmd->tmf_work, qedi_tmf_work);
> +	switch (tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) {
> +	case ISCSI_TM_FUNC_ABORT_TASK:
> +		INIT_WORK(&qedi_cmd->tmf_work, qedi_abort_work);
>  		queue_work(qedi->tmf_thread, &qedi_cmd->tmf_work);
> -
> -	} else if (((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
> -		    ISCSI_TM_FUNC_LOGICAL_UNIT_RESET) ||
> -		   ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
> -		    ISCSI_TM_FUNC_TARGET_WARM_RESET) ||
> -		   ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
> -		    ISCSI_TM_FUNC_TARGET_COLD_RESET)) {
> -		tid = qedi_get_task_idx(qedi);
> -		if (tid == -1) {
> -			QEDI_ERR(&qedi->dbg_ctx, "Invalid tid, cid=0x%x\n",
> -				 qedi_conn->iscsi_conn_id);
> -			return -1;
> -		}
> -		qedi_cmd->task_id = tid;
> -
> -		qedi_send_iscsi_tmf(qedi_conn, qedi_cmd->task);
> -
> -	} else {
> +		break;
> +	case ISCSI_TM_FUNC_LOGICAL_UNIT_RESET:
> +	case ISCSI_TM_FUNC_TARGET_WARM_RESET:
> +	case ISCSI_TM_FUNC_TARGET_COLD_RESET:
> +		rc = send_iscsi_tmf(qedi_conn, mtask);
> +		break;
> +	default:
>  		QEDI_ERR(&qedi->dbg_ctx, "Invalid tmf, cid=0x%x\n",
>  			 qedi_conn->iscsi_conn_id);
> -		return -1;
> +		return -EINVAL;
>  	}
> 
> -	return 0;
> +	return rc;
>  }
> 
>  int qedi_send_iscsi_text(struct qedi_conn *qedi_conn,
> 
> diff --git a/drivers/scsi/qedi/qedi_gbl.h b/drivers/scsi/qedi/qedi_gbl.h
> index 116645c08c71..fb44a282613e 100644
> --- a/drivers/scsi/qedi/qedi_gbl.h
> +++ b/drivers/scsi/qedi/qedi_gbl.h
> @@ -31,8 +31,7 @@ int qedi_send_iscsi_login(struct qedi_conn *qedi_conn,
>  			  struct iscsi_task *task);
>  int qedi_send_iscsi_logout(struct qedi_conn *qedi_conn,
>  			   struct iscsi_task *task);
> -int qedi_iscsi_abort_work(struct qedi_conn *qedi_conn,
> -			  struct iscsi_task *mtask);
> +int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask);
>  int qedi_send_iscsi_text(struct qedi_conn *qedi_conn,
>  			 struct iscsi_task *task);
>  int qedi_send_iscsi_nopout(struct qedi_conn *qedi_conn,
> 
> diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
> index d1da34a938da..821225f9beb0 100644
> --- a/drivers/scsi/qedi/qedi_iscsi.c
> +++ b/drivers/scsi/qedi/qedi_iscsi.c
> @@ -742,7 +742,7 @@ static int qedi_iscsi_send_generic_request(struct iscsi_task *task)
>  		rc = qedi_send_iscsi_logout(qedi_conn, task);
>  		break;
>  	case ISCSI_OP_SCSI_TMFUNC:
> -		rc = qedi_iscsi_abort_work(qedi_conn, task);
> +		rc = qedi_send_iscsi_tmf(qedi_conn, task);
>  		break;
>  	case ISCSI_OP_TEXT:
>  		rc = qedi_send_iscsi_text(qedi_conn, task);
> --
> 2.25.1

Thanks,
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [EXT] [PATCH 07/13] scsi: qedi: use GFP_NOIO for tmf allocation
  2021-04-10 18:40 ` [PATCH 07/13] scsi: qedi: use GFP_NOIO for tmf allocation Mike Christie
@ 2021-04-12 10:28   ` Manish Rangankar
  0 siblings, 0 replies; 26+ messages in thread
From: Manish Rangankar @ 2021-04-12 10:28 UTC (permalink / raw)
  To: Mike Christie, lduncan, martin.petersen, Santosh Vernekar,
	linux-scsi, jejb

> -----Original Message-----
> From: Mike Christie <michael.christie@oracle.com>
> Sent: Sunday, April 11, 2021 12:10 AM
> To: lduncan@suse.com; martin.petersen@oracle.com; Manish Rangankar
> <mrangankar@marvell.com>; Santosh Vernekar <svernekar@marvell.com>;
> linux-scsi@vger.kernel.org; jejb@linux.ibm.com
> Cc: Mike Christie <michael.christie@oracle.com>
> Subject: [EXT] [PATCH 07/13] scsi: qedi: use GFP_NOIO for tmf allocation
> 
> We run from a workqueue with no locks held, so use GFP_NOIO.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/qedi/qedi_fw.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
> index 542255c94d96..84402e827d42 100644
> --- a/drivers/scsi/qedi/qedi_fw.c
> +++ b/drivers/scsi/qedi/qedi_fw.c
> @@ -1396,7 +1396,7 @@ static void qedi_abort_work(struct work_struct *work)
>  		goto put_task;
>  	}
> 
> -	list_work = kzalloc(sizeof(*list_work), GFP_ATOMIC);
> +	list_work = kzalloc(sizeof(*list_work), GFP_NOIO);
>  	if (!list_work) {
>  		QEDI_ERR(&qedi->dbg_ctx, "Memory allocation failed\n");
>  		goto put_task;
> --
> 2.25.1

Thanks,
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [EXT] [PATCH 09/13] scsi: qedi: fix session block/unblock abuse during tmf handling
  2021-04-10 18:40 ` [PATCH 09/13] scsi: qedi: fix session block/unblock abuse during tmf handling Mike Christie
@ 2021-04-12 10:31   ` Manish Rangankar
  0 siblings, 0 replies; 26+ messages in thread
From: Manish Rangankar @ 2021-04-12 10:31 UTC (permalink / raw)
  To: Mike Christie, lduncan, martin.petersen, Santosh Vernekar,
	linux-scsi, jejb

> -----Original Message-----
> From: Mike Christie <michael.christie@oracle.com>
> Sent: Sunday, April 11, 2021 12:10 AM
> To: lduncan@suse.com; martin.petersen@oracle.com; Manish Rangankar
> <mrangankar@marvell.com>; Santosh Vernekar <svernekar@marvell.com>;
> linux-scsi@vger.kernel.org; jejb@linux.ibm.com
> Cc: Mike Christie <michael.christie@oracle.com>
> Subject: [EXT] [PATCH 09/13] scsi: qedi: fix session block/unblock abuse during
> tmf handling
> 
> Drivers shouldn't be calling block/unblock session for tmf handling because the
> functions can change the session state from under libiscsi.
> We now block the session for the drivers during tmf processing, so we can
> remove these calls.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/qedi/qedi_fw.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
> index 84402e827d42..f13f8af6d931 100644
> --- a/drivers/scsi/qedi/qedi_fw.c
> +++ b/drivers/scsi/qedi/qedi_fw.c
> @@ -159,14 +159,9 @@ static void qedi_tmf_resp_work(struct work_struct *work)
>  	set_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
>  	resp_hdr_ptr =  (struct iscsi_tm_rsp *)qedi_cmd->tmf_resp_buf;
> 
> -	iscsi_block_session(session->cls_session);
>  	rval = qedi_cleanup_all_io(qedi, qedi_conn, qedi_cmd->task, true);
> -	if (rval) {
> -		iscsi_unblock_session(session->cls_session);
> +	if (rval)
>  		goto exit_tmf_resp;
> -	}
> -
> -	iscsi_unblock_session(session->cls_session);
> 
>  	spin_lock(&session->back_lock);
>  	__iscsi_complete_pdu(conn, (struct iscsi_hdr *)resp_hdr_ptr, NULL, 0);
> --
> 2.25.1

Thanks,
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [EXT] [PATCH 10/13] scsi: qedi: fix session block/unblock abuse during cleanup
  2021-04-10 18:40 ` [PATCH 10/13] scsi: qedi: fix session block/unblock abuse during cleanup Mike Christie
@ 2021-04-12 10:33   ` Manish Rangankar
  0 siblings, 0 replies; 26+ messages in thread
From: Manish Rangankar @ 2021-04-12 10:33 UTC (permalink / raw)
  To: Mike Christie, lduncan, martin.petersen, Santosh Vernekar,
	linux-scsi, jejb

> -----Original Message-----
> From: Mike Christie <michael.christie@oracle.com>
> Sent: Sunday, April 11, 2021 12:10 AM
> To: lduncan@suse.com; martin.petersen@oracle.com; Manish Rangankar
> <mrangankar@marvell.com>; Santosh Vernekar <svernekar@marvell.com>;
> linux-scsi@vger.kernel.org; jejb@linux.ibm.com
> Cc: Mike Christie <michael.christie@oracle.com>
> Subject: [EXT] [PATCH 10/13] scsi: qedi: fix session block/unblock abuse during
> cleanup
> 
> Drivers shouldn't be calling block/unblock session for cmd cleanup because the
> functions can change the session state from under libiscsi.
> This adds a new driver-level bit so the driver can block all I/O on the
> host while it drains the card.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/qedi/qedi.h       |  1 +
>  drivers/scsi/qedi/qedi_iscsi.c | 17 +++++++++++++++--
>  2 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/qedi/qedi.h b/drivers/scsi/qedi/qedi.h
> index c342defc3f52..ce199a7a16b8 100644
> --- a/drivers/scsi/qedi/qedi.h
> +++ b/drivers/scsi/qedi/qedi.h
> @@ -284,6 +284,7 @@ struct qedi_ctx {
>  #define QEDI_IN_RECOVERY	5
>  #define QEDI_IN_OFFLINE		6
>  #define QEDI_IN_SHUTDOWN	7
> +#define QEDI_BLOCK_IO		8
> 
>  	u8 mac[ETH_ALEN];
>  	u32 src_ip[4];
> diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
> index 821225f9beb0..536d6653ef8e 100644
> --- a/drivers/scsi/qedi/qedi_iscsi.c
> +++ b/drivers/scsi/qedi/qedi_iscsi.c
> @@ -330,12 +330,22 @@ qedi_conn_create(struct iscsi_cls_session *cls_session, uint32_t cid)
> 
>  void qedi_mark_device_missing(struct iscsi_cls_session *cls_session)
>  {
> -	iscsi_block_session(cls_session);
> +	struct iscsi_session *session = cls_session->dd_data;
> +	struct qedi_conn *qedi_conn = session->leadconn->dd_data;
> +
> +	spin_lock_bh(&session->frwd_lock);
> +	set_bit(QEDI_BLOCK_IO, &qedi_conn->qedi->flags);
> +	spin_unlock_bh(&session->frwd_lock);
>  }
> 
>  void qedi_mark_device_available(struct iscsi_cls_session *cls_session)
>  {
> -	iscsi_unblock_session(cls_session);
> +	struct iscsi_session *session = cls_session->dd_data;
> +	struct qedi_conn *qedi_conn = session->leadconn->dd_data;
> +
> +	spin_lock_bh(&session->frwd_lock);
> +	clear_bit(QEDI_BLOCK_IO, &qedi_conn->qedi->flags);
> +	spin_unlock_bh(&session->frwd_lock);
>  }
> 
>  static int qedi_bind_conn_to_iscsi_cid(struct qedi_ctx *qedi,
> @@ -789,6 +799,9 @@ static int qedi_task_xmit(struct iscsi_task *task)
>  	if (test_bit(QEDI_IN_SHUTDOWN, &qedi_conn->qedi->flags))
>  		return -ENODEV;
> 
> +	if (test_bit(QEDI_BLOCK_IO, &qedi_conn->qedi->flags))
> +		return -EACCES;
> +
>  	cmd->state = 0;
>  	cmd->task = NULL;
>  	cmd->use_slowpath = false;
> --
> 2.25.1

Thanks,
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [EXT] [PATCH 11/13] scsi: qedi: pass send_iscsi_tmf task to abort
  2021-04-10 18:40 ` [PATCH 11/13] scsi: qedi: pass send_iscsi_tmf task to abort Mike Christie
@ 2021-04-12 10:34   ` Manish Rangankar
  0 siblings, 0 replies; 26+ messages in thread
From: Manish Rangankar @ 2021-04-12 10:34 UTC (permalink / raw)
  To: Mike Christie, lduncan, martin.petersen, Santosh Vernekar,
	linux-scsi, jejb

> -----Original Message-----
> From: Mike Christie <michael.christie@oracle.com>
> Sent: Sunday, April 11, 2021 12:10 AM
> To: lduncan@suse.com; martin.petersen@oracle.com; Manish Rangankar
> <mrangankar@marvell.com>; Santosh Vernekar <svernekar@marvell.com>;
> linux-scsi@vger.kernel.org; jejb@linux.ibm.com
> Cc: Mike Christie <michael.christie@oracle.com>
> Subject: [EXT] [PATCH 11/13] scsi: qedi: pass send_iscsi_tmf task to abort
> 
> qedi_abort_work knows which task to abort, so just pass it to send_iscsi_tmf.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/qedi/qedi_fw.c | 17 +++++------------
>  1 file changed, 5 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
> index f13f8af6d931..475cb7823cf1 100644
> --- a/drivers/scsi/qedi/qedi_fw.c
> +++ b/drivers/scsi/qedi/qedi_fw.c
> @@ -15,7 +15,7 @@
>  #include "qedi_fw_scsi.h"
> 
>  static int send_iscsi_tmf(struct qedi_conn *qedi_conn,
> -			  struct iscsi_task *mtask);
> +			  struct iscsi_task *mtask, struct iscsi_task *ctask);
> 
>  void qedi_iscsi_unmap_sg_list(struct qedi_cmd *cmd)
>  {
> @@ -1425,7 +1425,7 @@ static void qedi_abort_work(struct work_struct *work)
>  		goto ldel_exit;
>  	}
> 
> -	send_iscsi_tmf(qedi_conn, qedi_cmd->task);
> +	send_iscsi_tmf(qedi_conn, qedi_cmd->task, ctask);
> 
>  put_task:
>  	iscsi_put_task(ctask);
> @@ -1455,14 +1455,13 @@ static void qedi_abort_work(struct work_struct *work)
>  	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
>  }
> 
> -static int send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask)
> +static int send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask,
> +			  struct iscsi_task *ctask)
>  {
>  	struct iscsi_tmf_request_hdr tmf_pdu_header;
>  	struct iscsi_task_params task_params;
>  	struct qedi_ctx *qedi = qedi_conn->qedi;
>  	struct e4_iscsi_task_context *fw_task_ctx;
> -	struct iscsi_conn *conn = qedi_conn->cls_conn->dd_data;
> -	struct iscsi_task *ctask;
>  	struct iscsi_tm *tmf_hdr;
>  	struct qedi_cmd *qedi_cmd;
>  	struct qedi_cmd *cmd;
> @@ -1502,12 +1501,6 @@ static int send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask)
> 
>  	if ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
>  	     ISCSI_TM_FUNC_ABORT_TASK) {
> -		ctask = iscsi_itt_to_task(conn, tmf_hdr->rtt);
> -		if (!ctask || !ctask->sc) {
> -			QEDI_ERR(&qedi->dbg_ctx,
> -				 "Could not get reference task\n");
> -			return 0;
> -		}
>  		cmd = (struct qedi_cmd *)ctask->dd_data;
>  		tmf_pdu_header.rtt =
>  				qedi_set_itt(cmd->task_id,
> @@ -1560,7 +1553,7 @@ int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask)
>  	case ISCSI_TM_FUNC_LOGICAL_UNIT_RESET:
>  	case ISCSI_TM_FUNC_TARGET_WARM_RESET:
>  	case ISCSI_TM_FUNC_TARGET_COLD_RESET:
> -		rc = send_iscsi_tmf(qedi_conn, mtask);
> +		rc = send_iscsi_tmf(qedi_conn, mtask, NULL);
>  		break;
>  	default:
>  		QEDI_ERR(&qedi->dbg_ctx, "Invalid tmf, cid=0x%x\n",
> --
> 2.25.1

Thanks,
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [EXT] [PATCH 13/13] scsi: qedi: always wake up if cmd_cleanup_req is set
  2021-04-10 18:40 ` [PATCH 13/13] scsi: qedi: always wake up if cmd_cleanup_req is set Mike Christie
@ 2021-04-12 10:35   ` Manish Rangankar
  0 siblings, 0 replies; 26+ messages in thread
From: Manish Rangankar @ 2021-04-12 10:35 UTC (permalink / raw)
  To: Mike Christie, lduncan, martin.petersen, Santosh Vernekar,
	linux-scsi, jejb

> -----Original Message-----
> From: Mike Christie <michael.christie@oracle.com>
> Sent: Sunday, April 11, 2021 12:10 AM
> To: lduncan@suse.com; martin.petersen@oracle.com; Manish Rangankar
> <mrangankar@marvell.com>; Santosh Vernekar <svernekar@marvell.com>;
> linux-scsi@vger.kernel.org; jejb@linux.ibm.com
> Cc: Mike Christie <michael.christie@oracle.com>
> Subject: [EXT] [PATCH 13/13] scsi: qedi: always wake up if cmd_cleanup_req is
> set
> 
> External Email
> 
> ----------------------------------------------------------------------
> If we get a response, we should always wake up the conn. In both the
> cmd_cleanup_req == 0 and cmd_cleanup_req > 0 cases, we shouldn't dig into
> iscsi_itt_to_task because we don't know what the upper layers are doing.
> 
> We can also remove the qedi_clear_task_idx call here because once we signal
> success libiscsi will loop over the affected commands and end up calling the
> cleanup_task callout which will release it.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/qedi/qedi_fw.c | 31 ++++---------------------------
>  1 file changed, 4 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
> index 13dd06915d74..13d1250951a6 100644
> --- a/drivers/scsi/qedi/qedi_fw.c
> +++ b/drivers/scsi/qedi/qedi_fw.c
> @@ -739,7 +739,6 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
>  {
>  	struct qedi_work_map *work, *work_tmp;
>  	u32 proto_itt = cqe->itid;
> -	u32 ptmp_itt = 0;
>  	itt_t protoitt = 0;
>  	int found = 0;
>  	struct qedi_cmd *qedi_cmd = NULL;
> @@ -823,37 +822,15 @@ static void qedi_process_cmd_cleanup_resp(struct qedi_ctx *qedi,
> 
>  check_cleanup_reqs:
>  	if (qedi_conn->cmd_cleanup_req > 0) {
> -		spin_lock_bh(&conn->session->back_lock);
> -		qedi_get_proto_itt(qedi, cqe->itid, &ptmp_itt);
> -		protoitt = build_itt(ptmp_itt, conn->session->age);
> -		task = iscsi_itt_to_task(conn, protoitt);
> -		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
> -			  "cleanup io itid=0x%x, protoitt=0x%x, cmd_cleanup_cmpl=%d, cid=0x%x\n",
> -			  cqe->itid, protoitt, qedi_conn->cmd_cleanup_cmpl,
> -			  qedi_conn->iscsi_conn_id);
> -
> -		spin_unlock_bh(&conn->session->back_lock);
> -		if (!task) {
> -			QEDI_NOTICE(&qedi->dbg_ctx,
> -				    "task is null, itid=0x%x, cid=0x%x\n",
> -				    cqe->itid, qedi_conn->iscsi_conn_id);
> -			return;
> -		}
> -		qedi_conn->cmd_cleanup_cmpl++;
> -		wake_up(&qedi_conn->wait_queue);
> -
>  		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_TID,
>  			  "Freeing tid=0x%x for cid=0x%x\n",
>  			  cqe->itid, qedi_conn->iscsi_conn_id);
> -		qedi_clear_task_idx(qedi_conn->qedi, cqe->itid);
> -
> +		qedi_conn->cmd_cleanup_cmpl++;
> +		wake_up(&qedi_conn->wait_queue);
>  	} else {
> -		qedi_get_proto_itt(qedi, cqe->itid, &ptmp_itt);
> -		protoitt = build_itt(ptmp_itt, conn->session->age);
> -		task = iscsi_itt_to_task(conn, protoitt);
>  		QEDI_ERR(&qedi->dbg_ctx,
> -			 "Delayed or untracked cleanup response, itt=0x%x, tid=0x%x, cid=0x%x, task=%p\n",
> -			 protoitt, cqe->itid, qedi_conn->iscsi_conn_id, task);
> +			 "Delayed or untracked cleanup response, itt=0x%x, tid=0x%x, cid=0x%x\n",
> +			 protoitt, cqe->itid, qedi_conn->iscsi_conn_id);
>  	}
>  }
> 
> --
> 2.25.1

Thanks,
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [EXT] [PATCH 12/13] scsi: qedi: make sure tmf works are done before disconnect
  2021-04-10 18:40 ` [PATCH 12/13] scsi: qedi: make sure tmf works are done before disconnect Mike Christie
@ 2021-04-12 10:36   ` Manish Rangankar
  0 siblings, 0 replies; 26+ messages in thread
From: Manish Rangankar @ 2021-04-12 10:36 UTC (permalink / raw)
  To: Mike Christie, lduncan, martin.petersen, Santosh Vernekar,
	linux-scsi, jejb

> -----Original Message-----
> From: Mike Christie <michael.christie@oracle.com>
> Sent: Sunday, April 11, 2021 12:10 AM
> To: lduncan@suse.com; martin.petersen@oracle.com; Manish Rangankar
> <mrangankar@marvell.com>; Santosh Vernekar <svernekar@marvell.com>;
> linux-scsi@vger.kernel.org; jejb@linux.ibm.com
> Cc: Mike Christie <michael.christie@oracle.com>
> Subject: [EXT] [PATCH 12/13] scsi: qedi: make sure tmf works are done before
> disconnect
> 
> External Email
> 
> ----------------------------------------------------------------------
> We need to make sure that abort and reset completion work has completed
> before ep_disconnect returns. After ep_disconnect we can't manipulate cmds
> because libiscsi will call conn_stop and take ownership.
> 
> We are trying to make sure abort work and reset completion work have
> completed before we do the cmd clean up in ep_disconnect. The problem is
> that:
> 
> 1. The work function sets the QEDI_CONN_FW_CLEANUP bit, so if the work was
> still pending we would not see the bit set. The accounting needs to be done
> before the work is queued.
> 
> 2. If we had multiple works queued then we could break from the loop in
> qedi_ep_disconnect early because when abort work 1 completes it could clear
> QEDI_CONN_FW_CLEANUP. qedi_ep_disconnect could then see that before
> work 2 has run.
> 
> 3. A tmf reset completion work could run after ep_disconnect starts cleaning up
> cmds via qedi_clearsq. ep_disconnect's call to qedi_clearsq
> -> qedi_cleanup_all_io might think it's done cleaning up cmds,
> but the reset completion work could still be running. We then return from
> ep_disconnect while still doing cleanup.
> 
> This replaces the bit with a counter and adds a bool to prevent new works from
> starting from the completion path once a ep_disconnect starts.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/qedi/qedi_fw.c    | 46 +++++++++++++++++++++-------------
>  drivers/scsi/qedi/qedi_iscsi.c | 23 +++++++++++------
> drivers/scsi/qedi/qedi_iscsi.h |  4 +--
>  3 files changed, 47 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
> index 475cb7823cf1..13dd06915d74 100644
> --- a/drivers/scsi/qedi/qedi_fw.c
> +++ b/drivers/scsi/qedi/qedi_fw.c
> @@ -156,7 +156,6 @@ static void qedi_tmf_resp_work(struct work_struct *work)
>  	struct iscsi_tm_rsp *resp_hdr_ptr;
>  	int rval = 0;
> 
> -	set_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
>  	resp_hdr_ptr =  (struct iscsi_tm_rsp *)qedi_cmd->tmf_resp_buf;
> 
>  	rval = qedi_cleanup_all_io(qedi, qedi_conn, qedi_cmd->task, true);
> @@ -169,7 +168,10 @@ static void qedi_tmf_resp_work(struct work_struct *work)
> 
>  exit_tmf_resp:
>  	kfree(resp_hdr_ptr);
> -	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
> +
> +	spin_lock(&qedi_conn->tmf_work_lock);
> +	qedi_conn->fw_cleanup_works--;
> +	spin_unlock(&qedi_conn->tmf_work_lock);
>  }
> 
>  static void qedi_process_tmf_resp(struct qedi_ctx *qedi,
> @@ -225,16 +227,25 @@ static void qedi_process_tmf_resp(struct qedi_ctx *qedi,
>  	}
>  	spin_unlock(&qedi_conn->list_lock);
> 
> -	if (((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
> -	      ISCSI_TM_FUNC_LOGICAL_UNIT_RESET) ||
> -	    ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
> -	      ISCSI_TM_FUNC_TARGET_WARM_RESET) ||
> -	    ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
> -	      ISCSI_TM_FUNC_TARGET_COLD_RESET)) {
> +	spin_lock(&qedi_conn->tmf_work_lock);
> +	switch (tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) {
> +	case ISCSI_TM_FUNC_LOGICAL_UNIT_RESET:
> +	case ISCSI_TM_FUNC_TARGET_WARM_RESET:
> +	case ISCSI_TM_FUNC_TARGET_COLD_RESET:
> +		if (qedi_conn->ep_disconnect_starting) {
> +			/* Session is down so ep_disconnect will clean up */
> +			spin_unlock(&qedi_conn->tmf_work_lock);
> +			goto unblock_sess;
> +		}
> +
> +		qedi_conn->fw_cleanup_works++;
> +		spin_unlock(&qedi_conn->tmf_work_lock);
> +
>  		INIT_WORK(&qedi_cmd->tmf_work, qedi_tmf_resp_work);
>  		queue_work(qedi->tmf_thread, &qedi_cmd->tmf_work);
>  		goto unblock_sess;
>  	}
> +	spin_unlock(&qedi_conn->tmf_work_lock);
> 
>  	__iscsi_complete_pdu(conn, (struct iscsi_hdr *)resp_hdr_ptr, NULL, 0);
>  	kfree(resp_hdr_ptr);
> @@ -1361,7 +1372,6 @@ static void qedi_abort_work(struct work_struct *work)
> 
>  	mtask = qedi_cmd->task;
>  	tmf_hdr = (struct iscsi_tm *)mtask->hdr;
> -	set_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
> 
>  	spin_lock_bh(&conn->session->back_lock);
>  	ctask = iscsi_itt_to_ctask(conn, tmf_hdr->rtt);
> @@ -1427,11 +1437,7 @@ static void qedi_abort_work(struct work_struct *work)
> 
>  	send_iscsi_tmf(qedi_conn, qedi_cmd->task, ctask);
> 
> -put_task:
> -	iscsi_put_task(ctask);
> -clear_cleanup:
> -	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
> -	return;
> +	goto put_task;
> 
>  ldel_exit:
>  	spin_lock_bh(&qedi_conn->tmf_work_lock);
> @@ -1449,10 +1455,12 @@ static void qedi_abort_work(struct work_struct *work)
>  		qedi_conn->active_cmd_count--;
>  	}
>  	spin_unlock(&qedi_conn->list_lock);
> -
> +put_task:
>  	iscsi_put_task(ctask);
> -
> -	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
> +clear_cleanup:
> +	spin_lock(&qedi_conn->tmf_work_lock);
> +	qedi_conn->fw_cleanup_works--;
> +	spin_unlock(&qedi_conn->tmf_work_lock);
>  }
> 
>  static int send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask,
> @@ -1547,6 +1555,10 @@ int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask)
> 
>  	switch (tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) {
>  	case ISCSI_TM_FUNC_ABORT_TASK:
> +		spin_lock(&qedi_conn->tmf_work_lock);
> +		qedi_conn->fw_cleanup_works++;
> +		spin_unlock(&qedi_conn->tmf_work_lock);
> +
>  		INIT_WORK(&qedi_cmd->tmf_work, qedi_abort_work);
>  		queue_work(qedi->tmf_thread, &qedi_cmd->tmf_work);
>  		break;
> diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
> index 536d6653ef8e..8bbdd45ff2a1 100644
> --- a/drivers/scsi/qedi/qedi_iscsi.c
> +++ b/drivers/scsi/qedi/qedi_iscsi.c
> @@ -592,7 +592,11 @@ static int qedi_conn_start(struct iscsi_cls_conn *cls_conn)
>  		goto start_err;
>  	}
> 
> -	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
> +	spin_lock(&qedi_conn->tmf_work_lock);
> +	qedi_conn->fw_cleanup_works = 0;
> +	qedi_conn->ep_disconnect_starting = false;
> +	spin_unlock(&qedi_conn->tmf_work_lock);
> +
>  	qedi_conn->abrt_conn = 0;
> 
>  	rval = iscsi_conn_start(cls_conn);
> @@ -1009,7 +1013,6 @@ static void qedi_ep_disconnect(struct iscsi_endpoint *ep)
>  	int ret = 0;
>  	int wait_delay;
>  	int abrt_conn = 0;
> -	int count = 10;
> 
>  	wait_delay = 60 * HZ + DEF_MAX_RT_TIME;
>  	qedi_ep = ep->dd_data;
> @@ -1027,13 +1030,19 @@ static void qedi_ep_disconnect(struct iscsi_endpoint *ep)
>  		iscsi_suspend_queue(conn);
>  		abrt_conn = qedi_conn->abrt_conn;
> 
> -		while (count--)	{
> -			if (!test_bit(QEDI_CONN_FW_CLEANUP,
> -				      &qedi_conn->flags)) {
> -				break;
> -			}
> +		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_INFO,
> +			  "cid=0x%x qedi_ep=%p waiting for %d tmfs\n",
> +			  qedi_ep->iscsi_cid, qedi_ep,
> +			  qedi_conn->fw_cleanup_works);
> +
> +		spin_lock(&qedi_conn->tmf_work_lock);
> +		qedi_conn->ep_disconnect_starting = true;
> +		while (qedi_conn->fw_cleanup_works > 0) {
> +			spin_unlock(&qedi_conn->tmf_work_lock);
>  			msleep(1000);
> +			spin_lock(&qedi_conn->tmf_work_lock);
>  		}
> +		spin_unlock(&qedi_conn->tmf_work_lock);
> 
>  		if (test_bit(QEDI_IN_RECOVERY, &qedi->flags)) {
>  			if (qedi_do_not_recover) {
> diff --git a/drivers/scsi/qedi/qedi_iscsi.h b/drivers/scsi/qedi/qedi_iscsi.h
> index 68ef519f5480..758735209e15 100644
> --- a/drivers/scsi/qedi/qedi_iscsi.h
> +++ b/drivers/scsi/qedi/qedi_iscsi.h
> @@ -169,8 +169,8 @@ struct qedi_conn {
>  	struct list_head tmf_work_list;
>  	wait_queue_head_t wait_queue;
>  	spinlock_t tmf_work_lock;	/* tmf work lock */
> -	unsigned long flags;
> -#define QEDI_CONN_FW_CLEANUP	1
> +	bool ep_disconnect_starting;
> +	int fw_cleanup_works;
>  };
> 
>  struct qedi_cmd {
> --
> 2.25.1

Thanks,
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 12/13] scsi: qedi: make sure tmf works are done before disconnect
  2021-04-13 23:06 [PATCH 00/13] scsi: libicsi and " Mike Christie
@ 2021-04-13 23:06 ` Mike Christie
  0 siblings, 0 replies; 26+ messages in thread
From: Mike Christie @ 2021-04-13 23:06 UTC (permalink / raw)
  To: lduncan, martin.petersen, mrangankar, svernekar, linux-scsi, jejb
  Cc: Mike Christie

We need to make sure that abort and reset completion work has completed
before ep_disconnect returns. After ep_disconnect we can't manipulate
cmds because libiscsi will call conn_stop and take ownership.

We are trying to make sure abort work and reset completion work have
completed before we do the cmd clean up in ep_disconnect. The problem is
that:

1. The work function sets the QEDI_CONN_FW_CLEANUP bit, so if the work
was still pending we would not see the bit set. The accounting needs to
be done before the work is queued.

2. If we had multiple works queued then we could break from the loop in
qedi_ep_disconnect early because when abort work 1 completes it could
clear QEDI_CONN_FW_CLEANUP. qedi_ep_disconnect could then see that before
work 2 has run.

3. A tmf reset completion work could run after ep_disconnect starts
cleaning up cmds via qedi_clearsq. ep_disconnect's call to qedi_clearsq
-> qedi_cleanup_all_io might think it's done cleaning up cmds,
but the reset completion work could still be running. We then return
from ep_disconnect while still doing cleanup.

This replaces the bit with a counter and adds a bool to prevent new
works from starting from the completion path once a ep_disconnect starts.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
---
 drivers/scsi/qedi/qedi_fw.c    | 46 +++++++++++++++++++++-------------
 drivers/scsi/qedi/qedi_iscsi.c | 23 +++++++++++------
 drivers/scsi/qedi/qedi_iscsi.h |  4 +--
 3 files changed, 47 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index 475cb7823cf1..13dd06915d74 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -156,7 +156,6 @@ static void qedi_tmf_resp_work(struct work_struct *work)
 	struct iscsi_tm_rsp *resp_hdr_ptr;
 	int rval = 0;
 
-	set_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
 	resp_hdr_ptr =  (struct iscsi_tm_rsp *)qedi_cmd->tmf_resp_buf;
 
 	rval = qedi_cleanup_all_io(qedi, qedi_conn, qedi_cmd->task, true);
@@ -169,7 +168,10 @@ static void qedi_tmf_resp_work(struct work_struct *work)
 
 exit_tmf_resp:
 	kfree(resp_hdr_ptr);
-	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
+
+	spin_lock(&qedi_conn->tmf_work_lock);
+	qedi_conn->fw_cleanup_works--;
+	spin_unlock(&qedi_conn->tmf_work_lock);
 }
 
 static void qedi_process_tmf_resp(struct qedi_ctx *qedi,
@@ -225,16 +227,25 @@ static void qedi_process_tmf_resp(struct qedi_ctx *qedi,
 	}
 	spin_unlock(&qedi_conn->list_lock);
 
-	if (((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
-	      ISCSI_TM_FUNC_LOGICAL_UNIT_RESET) ||
-	    ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
-	      ISCSI_TM_FUNC_TARGET_WARM_RESET) ||
-	    ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
-	      ISCSI_TM_FUNC_TARGET_COLD_RESET)) {
+	spin_lock(&qedi_conn->tmf_work_lock);
+	switch (tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) {
+	case ISCSI_TM_FUNC_LOGICAL_UNIT_RESET:
+	case ISCSI_TM_FUNC_TARGET_WARM_RESET:
+	case ISCSI_TM_FUNC_TARGET_COLD_RESET:
+		if (qedi_conn->ep_disconnect_starting) {
+			/* Session is down so ep_disconnect will clean up */
+			spin_unlock(&qedi_conn->tmf_work_lock);
+			goto unblock_sess;
+		}
+
+		qedi_conn->fw_cleanup_works++;
+		spin_unlock(&qedi_conn->tmf_work_lock);
+
 		INIT_WORK(&qedi_cmd->tmf_work, qedi_tmf_resp_work);
 		queue_work(qedi->tmf_thread, &qedi_cmd->tmf_work);
 		goto unblock_sess;
 	}
+	spin_unlock(&qedi_conn->tmf_work_lock);
 
 	__iscsi_complete_pdu(conn, (struct iscsi_hdr *)resp_hdr_ptr, NULL, 0);
 	kfree(resp_hdr_ptr);
@@ -1361,7 +1372,6 @@ static void qedi_abort_work(struct work_struct *work)
 
 	mtask = qedi_cmd->task;
 	tmf_hdr = (struct iscsi_tm *)mtask->hdr;
-	set_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
 
 	spin_lock_bh(&conn->session->back_lock);
 	ctask = iscsi_itt_to_ctask(conn, tmf_hdr->rtt);
@@ -1427,11 +1437,7 @@ static void qedi_abort_work(struct work_struct *work)
 
 	send_iscsi_tmf(qedi_conn, qedi_cmd->task, ctask);
 
-put_task:
-	iscsi_put_task(ctask);
-clear_cleanup:
-	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
-	return;
+	goto put_task;
 
 ldel_exit:
 	spin_lock_bh(&qedi_conn->tmf_work_lock);
@@ -1449,10 +1455,12 @@ static void qedi_abort_work(struct work_struct *work)
 		qedi_conn->active_cmd_count--;
 	}
 	spin_unlock(&qedi_conn->list_lock);
-
+put_task:
 	iscsi_put_task(ctask);
-
-	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
+clear_cleanup:
+	spin_lock(&qedi_conn->tmf_work_lock);
+	qedi_conn->fw_cleanup_works--;
+	spin_unlock(&qedi_conn->tmf_work_lock);
 }
 
 static int send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask,
@@ -1547,6 +1555,10 @@ int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn, struct iscsi_task *mtask)
 
 	switch (tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) {
 	case ISCSI_TM_FUNC_ABORT_TASK:
+		spin_lock(&qedi_conn->tmf_work_lock);
+		qedi_conn->fw_cleanup_works++;
+		spin_unlock(&qedi_conn->tmf_work_lock);
+
 		INIT_WORK(&qedi_cmd->tmf_work, qedi_abort_work);
 		queue_work(qedi->tmf_thread, &qedi_cmd->tmf_work);
 		break;
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index 9a2d9a29fc01..17f1cbb376a4 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -592,7 +592,11 @@ static int qedi_conn_start(struct iscsi_cls_conn *cls_conn)
 		goto start_err;
 	}
 
-	clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
+	spin_lock(&qedi_conn->tmf_work_lock);
+	qedi_conn->fw_cleanup_works = 0;
+	qedi_conn->ep_disconnect_starting = false;
+	spin_unlock(&qedi_conn->tmf_work_lock);
+
 	qedi_conn->abrt_conn = 0;
 
 	rval = iscsi_conn_start(cls_conn);
@@ -1009,7 +1013,6 @@ static void qedi_ep_disconnect(struct iscsi_endpoint *ep)
 	int ret = 0;
 	int wait_delay;
 	int abrt_conn = 0;
-	int count = 10;
 
 	wait_delay = 60 * HZ + DEF_MAX_RT_TIME;
 	qedi_ep = ep->dd_data;
@@ -1027,13 +1030,19 @@ static void qedi_ep_disconnect(struct iscsi_endpoint *ep)
 		iscsi_ep_prep_disconnect(conn);
 		abrt_conn = qedi_conn->abrt_conn;
 
-		while (count--)	{
-			if (!test_bit(QEDI_CONN_FW_CLEANUP,
-				      &qedi_conn->flags)) {
-				break;
-			}
+		QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_INFO,
+			  "cid=0x%x qedi_ep=%p waiting for %d tmfs\n",
+			  qedi_ep->iscsi_cid, qedi_ep,
+			  qedi_conn->fw_cleanup_works);
+
+		spin_lock(&qedi_conn->tmf_work_lock);
+		qedi_conn->ep_disconnect_starting = true;
+		while (qedi_conn->fw_cleanup_works > 0) {
+			spin_unlock(&qedi_conn->tmf_work_lock);
 			msleep(1000);
+			spin_lock(&qedi_conn->tmf_work_lock);
 		}
+		spin_unlock(&qedi_conn->tmf_work_lock);
 
 		if (test_bit(QEDI_IN_RECOVERY, &qedi->flags)) {
 			if (qedi_do_not_recover) {
diff --git a/drivers/scsi/qedi/qedi_iscsi.h b/drivers/scsi/qedi/qedi_iscsi.h
index 68ef519f5480..758735209e15 100644
--- a/drivers/scsi/qedi/qedi_iscsi.h
+++ b/drivers/scsi/qedi/qedi_iscsi.h
@@ -169,8 +169,8 @@ struct qedi_conn {
 	struct list_head tmf_work_list;
 	wait_queue_head_t wait_queue;
 	spinlock_t tmf_work_lock;	/* tmf work lock */
-	unsigned long flags;
-#define QEDI_CONN_FW_CLEANUP	1
+	bool ep_disconnect_starting;
+	int fw_cleanup_works;
 };
 
 struct qedi_cmd {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-04-13 23:07 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-10 18:40 [RFC PATCH 0/13]: qedi tmf fixes Mike Christie
2021-04-10 18:40 ` [PATCH 01/13] scsi: iscsi: revert Perform connection failure entirely in kernel space Mike Christie
2021-04-10 18:40 ` [PATCH 02/13] scsi: qedi: fix null ref during abort handling Mike Christie
2021-04-12 10:25   ` [EXT] " Manish Rangankar
2021-04-10 18:40 ` [PATCH 03/13] scsi: iscsi: add task completion helper Mike Christie
2021-04-10 18:40 ` [PATCH 04/13] scsi: qedi: fix abort vs cmd re-use race Mike Christie
2021-04-12 10:24   ` [EXT] " Manish Rangankar
2021-04-10 18:40 ` [PATCH 05/13] scsi: qedi: fix use after free during abort cleanup Mike Christie
2021-04-12 10:26   ` [EXT] " Manish Rangankar
2021-04-10 18:40 ` [PATCH 06/13] scsi: qedi: fix tmf tid allocation Mike Christie
2021-04-12 10:27   ` [EXT] " Manish Rangankar
2021-04-10 18:40 ` [PATCH 07/13] scsi: qedi: use GFP_NOIO for tmf allocation Mike Christie
2021-04-12 10:28   ` [EXT] " Manish Rangankar
2021-04-10 18:40 ` [PATCH 08/13] scsi: iscsi: fix lun/target reset cmd cleanup handling Mike Christie
2021-04-10 18:40 ` [PATCH 09/13] scsi: qedi: fix session block/unblock abuse during tmf handling Mike Christie
2021-04-12 10:31   ` [EXT] " Manish Rangankar
2021-04-10 18:40 ` [PATCH 10/13] scsi: qedi: fix session block/unblock abuse during cleanup Mike Christie
2021-04-12 10:33   ` [EXT] " Manish Rangankar
2021-04-10 18:40 ` [PATCH 11/13] scsi: qedi: pass send_iscsi_tmf task to abort Mike Christie
2021-04-12 10:34   ` [EXT] " Manish Rangankar
2021-04-10 18:40 ` [PATCH 12/13] scsi: qedi: make sure tmf works are done before disconnect Mike Christie
2021-04-12 10:36   ` [EXT] " Manish Rangankar
2021-04-10 18:40 ` [PATCH 13/13] scsi: qedi: always wake up if cmd_cleanup_req is set Mike Christie
2021-04-12 10:35   ` [EXT] " Manish Rangankar
2021-04-12 10:22 ` [EXT] [RFC PATCH 0/13]: qedi tmf fixes Manish Rangankar
2021-04-13 23:06 [PATCH 00/13] scsi: libicsi and " Mike Christie
2021-04-13 23:06 ` [PATCH 12/13] scsi: qedi: make sure tmf works are done before disconnect Mike Christie
