All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/6] iscsi: Fix in kernel conn failure handling
@ 2021-04-24 22:17 Mike Christie
  2021-04-24 22:17 ` [PATCH v3 1/6] scsi: iscsi: force immediate failure during shutdown Mike Christie
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Mike Christie @ 2021-04-24 22:17 UTC (permalink / raw)
  To: khazhy, lduncan, martin.petersen, rbharath, krisman, linux-scsi, jejb

These patches are made over the qedi/libiscsi TMF fixes:

https://lore.kernel.org/linux-scsi/20210424220603.123703-1-michael.christie@oracle.com/T/#t

The patch 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely
in kernel space") has the following regressions or bugs that this patch
set fixes.

1. It can return cmds to upper layers like dm-multipath where that can
retry them. After they are successful the fs/app can send new IO to the
same sectors, but we've left the cmds running in FW or in the net layer.
We need to be calling ep_disconnect.

2. The drivers that implement ep_disconnect expect that it's called
before conn_stop. Besides crashes, if the cleanup_task callout is called
before ep_disconnect it might free up driver/card resources for session1
then they could be allocated for session2. But because the driver's
ep_disconnect is not called it has not cleaned up the firmware so the
card is still using the resources for the original cmd.

3. The system shutdown case does not work for the eh path. Passing
stop_conn STOP_CONN_TERM will never block the session and start the
recovery timer, because for that flag userspace will do the unbind
and destroy events which would remove the devices and wake up and kill
the eh. We should be using STOP_CONN_RECOVER.

4. The stop_conn_work_fn can run after userspace has done it's
recovery and we are happily using the session. We will then end up
with various bugs depending on what is going on at the time.

We may also run stop_conn_work_fn late after userspace has called
stop_conn and ep_disconnect and is now going to call start/bind
conn. If stop_conn_work_fn runs after bind but before start,
we would leave the conn in a unbound but sort of started state where
IO might be allowed even though the drivers have been set in a state
where they no longer expect IO.

5. returning -EAGAIN in iscsi_if_destroy_conn if we haven't yet run
the in kernel stop_conn function is breaking userspace. We should have
been doing this for the caller.


V3:
- Rebase over
https://lore.kernel.org/linux-scsi/20210424220603.123703-1-michael.christie@oracle.com/T/#t
- Add fix for if we are doing offload boot and hit a conn failure at the
same time userspace is syncing up it's state and doing a relogin.
- Drop RFC as google has tested.

V2:
- Handle second part of #4 above and fix missing locking
- Include iscsi_tcp kernel sock shutdown patch



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 1/6] scsi: iscsi: force immediate failure during shutdown
  2021-04-24 22:17 [PATCH v3 0/6] iscsi: Fix in kernel conn failure handling Mike Christie
@ 2021-04-24 22:17 ` Mike Christie
  2021-04-28 19:00   ` Lee Duncan
  2021-04-24 22:17 ` [PATCH v3 2/6] scsi: iscsi: use system_unbound_wq for destroy_work Mike Christie
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Mike Christie @ 2021-04-24 22:17 UTC (permalink / raw)
  To: khazhy, lduncan, martin.petersen, rbharath, krisman, linux-scsi, jejb
  Cc: Mike Christie

If the system is not up, we can just fail immediately since iscsid is not
going to ever answer our netlink events. We are already setting the
recovery_tmo to 0, but by passing stop_conn STOP_CONN_TERM we never will
block the session and start the recovery timer, because for that flag
userspace will do the unbind and destroy events which would remove the
devices and wake up and kill the eh.

Since the conn is dead and the system is going dowm this just has us use
STOP_CONN_RECOVER with recovery_tmo=0 so we fail immediately.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/scsi_transport_iscsi.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index 82491343e94a..0cd9f2090993 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -2513,11 +2513,11 @@ static void stop_conn_work_fn(struct work_struct *work)
 		session = iscsi_session_lookup(sid);
 		if (session) {
 			if (system_state != SYSTEM_RUNNING) {
+				/* Force recovery to fail immediately */
 				session->recovery_tmo = 0;
-				iscsi_if_stop_conn(conn, STOP_CONN_TERM);
-			} else {
-				iscsi_if_stop_conn(conn, STOP_CONN_RECOVER);
 			}
+
+			iscsi_if_stop_conn(conn, STOP_CONN_RECOVER);
 		}
 
 		list_del_init(&conn->conn_list_err);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 2/6] scsi: iscsi: use system_unbound_wq for destroy_work
  2021-04-24 22:17 [PATCH v3 0/6] iscsi: Fix in kernel conn failure handling Mike Christie
  2021-04-24 22:17 ` [PATCH v3 1/6] scsi: iscsi: force immediate failure during shutdown Mike Christie
@ 2021-04-24 22:17 ` Mike Christie
  2021-04-28 19:02   ` Lee Duncan
  2021-04-24 22:17 ` [PATCH v3 3/6] scsi: iscsi: have callers do the put on iscsi_lookup_endpoint Mike Christie
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Mike Christie @ 2021-04-24 22:17 UTC (permalink / raw)
  To: khazhy, lduncan, martin.petersen, rbharath, krisman, linux-scsi, jejb
  Cc: Mike Christie

Use the system_unbound_wq for async session destruction. We don't need a
dedicated workqueue for async session destruction because:

1. perf does not seem to be an issue since we only allow 1 active work.
2. it does not have deps with other system works and we can run them
in parallel with each other.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/scsi_transport_iscsi.c | 15 +--------------
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index 0cd9f2090993..a23fcf871ffd 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -95,8 +95,6 @@ static DECLARE_WORK(stop_conn_work, stop_conn_work_fn);
 static atomic_t iscsi_session_nr; /* sysfs session id for next new session */
 static struct workqueue_struct *iscsi_eh_timer_workq;
 
-static struct workqueue_struct *iscsi_destroy_workq;
-
 static DEFINE_IDA(iscsi_sess_ida);
 /*
  * list of registered transports and lock that must
@@ -3718,7 +3716,7 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group)
 			list_del_init(&session->sess_list);
 			spin_unlock_irqrestore(&sesslock, flags);
 
-			queue_work(iscsi_destroy_workq, &session->destroy_work);
+			queue_work(system_unbound_wq, &session->destroy_work);
 		}
 		break;
 	case ISCSI_UEVENT_UNBIND_SESSION:
@@ -4814,18 +4812,8 @@ static __init int iscsi_transport_init(void)
 		goto release_nls;
 	}
 
-	iscsi_destroy_workq = alloc_workqueue("%s",
-			WQ_SYSFS | __WQ_LEGACY | WQ_MEM_RECLAIM | WQ_UNBOUND,
-			1, "iscsi_destroy");
-	if (!iscsi_destroy_workq) {
-		err = -ENOMEM;
-		goto destroy_wq;
-	}
-
 	return 0;
 
-destroy_wq:
-	destroy_workqueue(iscsi_eh_timer_workq);
 release_nls:
 	netlink_kernel_release(nls);
 unregister_flashnode_bus:
@@ -4847,7 +4835,6 @@ static __init int iscsi_transport_init(void)
 
 static void __exit iscsi_transport_exit(void)
 {
-	destroy_workqueue(iscsi_destroy_workq);
 	destroy_workqueue(iscsi_eh_timer_workq);
 	netlink_kernel_release(nls);
 	bus_unregister(&iscsi_flashnode_bus);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 3/6] scsi: iscsi: have callers do the put on iscsi_lookup_endpoint
  2021-04-24 22:17 [PATCH v3 0/6] iscsi: Fix in kernel conn failure handling Mike Christie
  2021-04-24 22:17 ` [PATCH v3 1/6] scsi: iscsi: force immediate failure during shutdown Mike Christie
  2021-04-24 22:17 ` [PATCH v3 2/6] scsi: iscsi: use system_unbound_wq for destroy_work Mike Christie
@ 2021-04-24 22:17 ` Mike Christie
  2021-05-04 22:13   ` Lee Duncan
  2021-04-24 22:17 ` [PATCH v3 4/6] scsi: iscsi: fix in-kernel conn failure handling Mike Christie
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Mike Christie @ 2021-04-24 22:17 UTC (permalink / raw)
  To: khazhy, lduncan, martin.petersen, rbharath, krisman, linux-scsi, jejb
  Cc: Mike Christie

The next patches allow the kernel to do ep_disconnect. In that case we
will have to get a proper refcount on the ep so one thread does not
delete it from under another.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/infiniband/ulp/iser/iscsi_iser.c |  1 +
 drivers/scsi/be2iscsi/be_iscsi.c         | 19 ++++++++++++------
 drivers/scsi/bnx2i/bnx2i_iscsi.c         | 23 +++++++++++++++-------
 drivers/scsi/cxgbi/libcxgbi.c            | 12 ++++++++----
 drivers/scsi/qedi/qedi_iscsi.c           | 25 +++++++++++++++++-------
 drivers/scsi/qla4xxx/ql4_os.c            |  1 +
 drivers/scsi/scsi_transport_iscsi.c      | 25 ++++++++++++++++--------
 include/scsi/scsi_transport_iscsi.h      |  1 +
 8 files changed, 75 insertions(+), 32 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 6baebcb6d14d..776e46ee95da 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -506,6 +506,7 @@ iscsi_iser_conn_bind(struct iscsi_cls_session *cls_session,
 	iser_conn->iscsi_conn = conn;
 
 out:
+	iscsi_put_endpoint(ep);
 	mutex_unlock(&iser_conn->state_mutex);
 	return error;
 }
diff --git a/drivers/scsi/be2iscsi/be_iscsi.c b/drivers/scsi/be2iscsi/be_iscsi.c
index a03d0ebc2312..0990b132d62b 100644
--- a/drivers/scsi/be2iscsi/be_iscsi.c
+++ b/drivers/scsi/be2iscsi/be_iscsi.c
@@ -182,6 +182,7 @@ int beiscsi_conn_bind(struct iscsi_cls_session *cls_session,
 	struct beiscsi_endpoint *beiscsi_ep;
 	struct iscsi_endpoint *ep;
 	uint16_t cri_index;
+	int rc = 0;
 
 	ep = iscsi_lookup_endpoint(transport_fd);
 	if (!ep)
@@ -189,15 +190,17 @@ int beiscsi_conn_bind(struct iscsi_cls_session *cls_session,
 
 	beiscsi_ep = ep->dd_data;
 
-	if (iscsi_conn_bind(cls_session, cls_conn, is_leading))
-		return -EINVAL;
+	if (iscsi_conn_bind(cls_session, cls_conn, is_leading)) {
+		rc = -EINVAL;
+		goto put_ep;
+	}
 
 	if (beiscsi_ep->phba != phba) {
 		beiscsi_log(phba, KERN_ERR, BEISCSI_LOG_CONFIG,
 			    "BS_%d : beiscsi_ep->hba=%p not equal to phba=%p\n",
 			    beiscsi_ep->phba, phba);
-
-		return -EEXIST;
+		rc = -EEXIST;
+		goto put_ep;
 	}
 	cri_index = BE_GET_CRI_FROM_CID(beiscsi_ep->ep_cid);
 	if (phba->conn_table[cri_index]) {
@@ -209,7 +212,8 @@ int beiscsi_conn_bind(struct iscsi_cls_session *cls_session,
 				      beiscsi_ep->ep_cid,
 				      beiscsi_conn,
 				      phba->conn_table[cri_index]);
-			return -EINVAL;
+			rc = -EINVAL;
+			goto put_ep;
 		}
 	}
 
@@ -226,7 +230,10 @@ int beiscsi_conn_bind(struct iscsi_cls_session *cls_session,
 		    "BS_%d : cid %d phba->conn_table[%u]=%p\n",
 		    beiscsi_ep->ep_cid, cri_index, beiscsi_conn);
 	phba->conn_table[cri_index] = beiscsi_conn;
-	return 0;
+
+put_ep:
+	iscsi_put_endpoint(ep);
+	return rc;
 }
 
 static int beiscsi_iface_create_ipv4(struct beiscsi_hba *phba)
diff --git a/drivers/scsi/bnx2i/bnx2i_iscsi.c b/drivers/scsi/bnx2i/bnx2i_iscsi.c
index 9a4f4776a78a..26cb1c6536ce 100644
--- a/drivers/scsi/bnx2i/bnx2i_iscsi.c
+++ b/drivers/scsi/bnx2i/bnx2i_iscsi.c
@@ -1420,17 +1420,23 @@ static int bnx2i_conn_bind(struct iscsi_cls_session *cls_session,
 	 * Forcefully terminate all in progress connection recovery at the
 	 * earliest, either in bind(), send_pdu(LOGIN), or conn_start()
 	 */
-	if (bnx2i_adapter_ready(hba))
-		return -EIO;
+	if (bnx2i_adapter_ready(hba)) {
+		ret_code = -EIO;
+		goto put_ep;
+	}
 
 	bnx2i_ep = ep->dd_data;
 	if ((bnx2i_ep->state == EP_STATE_TCP_FIN_RCVD) ||
-	    (bnx2i_ep->state == EP_STATE_TCP_RST_RCVD))
+	    (bnx2i_ep->state == EP_STATE_TCP_RST_RCVD)) {
 		/* Peer disconnect via' FIN or RST */
-		return -EINVAL;
+		ret_code = -EINVAL;
+		goto put_ep;
+	}
 
-	if (iscsi_conn_bind(cls_session, cls_conn, is_leading))
-		return -EINVAL;
+	if (iscsi_conn_bind(cls_session, cls_conn, is_leading)) {
+		ret_code = -EINVAL;
+		goto put_ep;
+	}
 
 	if (bnx2i_ep->hba != hba) {
 		/* Error - TCP connection does not belong to this device
@@ -1441,7 +1447,8 @@ static int bnx2i_conn_bind(struct iscsi_cls_session *cls_session,
 		iscsi_conn_printk(KERN_ALERT, cls_conn->dd_data,
 				  "belong to hba (%s)\n",
 				  hba->netdev->name);
-		return -EEXIST;
+		ret_code = -EEXIST;
+		goto put_ep;
 	}
 	bnx2i_ep->conn = bnx2i_conn;
 	bnx2i_conn->ep = bnx2i_ep;
@@ -1458,6 +1465,8 @@ static int bnx2i_conn_bind(struct iscsi_cls_session *cls_session,
 		bnx2i_put_rq_buf(bnx2i_conn, 0);
 
 	bnx2i_arm_cq_event_coalescing(bnx2i_conn->ep, CNIC_ARM_CQE);
+put_ep:
+	iscsi_put_endpoint(ep);
 	return ret_code;
 }
 
diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index 215dd0eb3f48..dbe22a7136f3 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -2690,11 +2690,13 @@ int cxgbi_bind_conn(struct iscsi_cls_session *cls_session,
 	err = csk->cdev->csk_ddp_setup_pgidx(csk, csk->tid,
 					     ppm->tformat.pgsz_idx_dflt);
 	if (err < 0)
-		return err;
+		goto put_ep;
 
 	err = iscsi_conn_bind(cls_session, cls_conn, is_leading);
-	if (err)
-		return -EINVAL;
+	if (err) {
+		err = -EINVAL;
+		goto put_ep;
+	}
 
 	/*  calculate the tag idx bits needed for this conn based on cmds_max */
 	cconn->task_idx_bits = (__ilog2_u32(conn->session->cmds_max - 1)) + 1;
@@ -2715,7 +2717,9 @@ int cxgbi_bind_conn(struct iscsi_cls_session *cls_session,
 	/*  init recv engine */
 	iscsi_tcp_hdr_recv_prep(tcp_conn);
 
-	return 0;
+put_ep:
+	iscsi_put_endpoint(ep);
+	return err;
 }
 EXPORT_SYMBOL_GPL(cxgbi_bind_conn);
 
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index 25ff2bda077b..bf581ecea897 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -387,6 +387,7 @@ static int qedi_conn_bind(struct iscsi_cls_session *cls_session,
 	struct qedi_ctx *qedi = iscsi_host_priv(shost);
 	struct qedi_endpoint *qedi_ep;
 	struct iscsi_endpoint *ep;
+	int rc = 0;
 
 	ep = iscsi_lookup_endpoint(transport_fd);
 	if (!ep)
@@ -394,11 +395,16 @@ static int qedi_conn_bind(struct iscsi_cls_session *cls_session,
 
 	qedi_ep = ep->dd_data;
 	if ((qedi_ep->state == EP_STATE_TCP_FIN_RCVD) ||
-	    (qedi_ep->state == EP_STATE_TCP_RST_RCVD))
-		return -EINVAL;
+	    (qedi_ep->state == EP_STATE_TCP_RST_RCVD)) {
+		rc = -EINVAL;
+		goto put_ep;
+	}
+
+	if (iscsi_conn_bind(cls_session, cls_conn, is_leading)) {
+		rc = -EINVAL;
+		goto put_ep;
+	}
 
-	if (iscsi_conn_bind(cls_session, cls_conn, is_leading))
-		return -EINVAL;
 
 	qedi_ep->conn = qedi_conn;
 	qedi_conn->ep = qedi_ep;
@@ -408,13 +414,18 @@ static int qedi_conn_bind(struct iscsi_cls_session *cls_session,
 	qedi_conn->cmd_cleanup_req = 0;
 	qedi_conn->cmd_cleanup_cmpl = 0;
 
-	if (qedi_bind_conn_to_iscsi_cid(qedi, qedi_conn))
-		return -EINVAL;
+	if (qedi_bind_conn_to_iscsi_cid(qedi, qedi_conn)) {
+		rc = -EINVAL;
+		goto put_ep;
+	}
+
 
 	spin_lock_init(&qedi_conn->tmf_work_lock);
 	INIT_LIST_HEAD(&qedi_conn->tmf_work_list);
 	init_waitqueue_head(&qedi_conn->wait_queue);
-	return 0;
+put_ep:
+	iscsi_put_endpoint(ep);
+	return rc;
 }
 
 static int qedi_iscsi_update_conn(struct qedi_ctx *qedi,
diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c
index ff663cb330c2..ea128da08537 100644
--- a/drivers/scsi/qla4xxx/ql4_os.c
+++ b/drivers/scsi/qla4xxx/ql4_os.c
@@ -3235,6 +3235,7 @@ static int qla4xxx_conn_bind(struct iscsi_cls_session *cls_session,
 	conn = cls_conn->dd_data;
 	qla_conn = conn->dd_data;
 	qla_conn->qla_ep = ep->dd_data;
+	iscsi_put_endpoint(ep);
 	return 0;
 }
 
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index a23fcf871ffd..a61a4f0fa83a 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -266,9 +266,20 @@ void iscsi_destroy_endpoint(struct iscsi_endpoint *ep)
 }
 EXPORT_SYMBOL_GPL(iscsi_destroy_endpoint);
 
+void iscsi_put_endpoint(struct iscsi_endpoint *ep)
+{
+	put_device(&ep->dev);
+}
+EXPORT_SYMBOL_GPL(iscsi_put_endpoint);
+
+/**
+ * iscsi_lookup_endpoint - get ep from handle
+ * @handle: endpoint handle
+ *
+ * Caller must do a iscsi_put_endpoint.
+ */
 struct iscsi_endpoint *iscsi_lookup_endpoint(u64 handle)
 {
-	struct iscsi_endpoint *ep;
 	struct device *dev;
 
 	dev = class_find_device(&iscsi_endpoint_class, NULL, &handle,
@@ -276,13 +287,7 @@ struct iscsi_endpoint *iscsi_lookup_endpoint(u64 handle)
 	if (!dev)
 		return NULL;
 
-	ep = iscsi_dev_to_endpoint(dev);
-	/*
-	 * we can drop this now because the interface will prevent
-	 * removals and lookups from racing.
-	 */
-	put_device(dev);
-	return ep;
+	return iscsi_dev_to_endpoint(dev);
 }
 EXPORT_SYMBOL_GPL(iscsi_lookup_endpoint);
 
@@ -2984,6 +2989,7 @@ static int iscsi_if_ep_disconnect(struct iscsi_transport *transport,
 	}
 
 	transport->ep_disconnect(ep);
+	iscsi_put_endpoint(ep);
 	return 0;
 }
 
@@ -3009,6 +3015,7 @@ iscsi_if_transport_ep(struct iscsi_transport *transport,
 
 		ev->r.retcode = transport->ep_poll(ep,
 						   ev->u.ep_poll.timeout_ms);
+		iscsi_put_endpoint(ep);
 		break;
 	case ISCSI_UEVENT_TRANSPORT_EP_DISCONNECT:
 		rc = iscsi_if_ep_disconnect(transport,
@@ -3692,6 +3699,7 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group)
 					ev->u.c_bound_session.initial_cmdsn,
 					ev->u.c_bound_session.cmds_max,
 					ev->u.c_bound_session.queue_depth);
+		iscsi_put_endpoint(ep);
 		break;
 	case ISCSI_UEVENT_DESTROY_SESSION:
 		session = iscsi_session_lookup(ev->u.d_session.sid);
@@ -3763,6 +3771,7 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group)
 			mutex_lock(&conn->ep_mutex);
 			conn->ep = ep;
 			mutex_unlock(&conn->ep_mutex);
+			iscsi_put_endpoint(ep);
 		} else
 			iscsi_cls_conn_printk(KERN_ERR, conn,
 					      "Could not set ep conn "
diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h
index 8874016b3c9a..d36a72cf049f 100644
--- a/include/scsi/scsi_transport_iscsi.h
+++ b/include/scsi/scsi_transport_iscsi.h
@@ -442,6 +442,7 @@ extern int iscsi_scan_finished(struct Scsi_Host *shost, unsigned long time);
 extern struct iscsi_endpoint *iscsi_create_endpoint(int dd_size);
 extern void iscsi_destroy_endpoint(struct iscsi_endpoint *ep);
 extern struct iscsi_endpoint *iscsi_lookup_endpoint(u64 handle);
+extern void iscsi_put_endpoint(struct iscsi_endpoint *ep);
 extern int iscsi_block_scsi_eh(struct scsi_cmnd *cmd);
 extern struct iscsi_iface *iscsi_create_iface(struct Scsi_Host *shost,
 					      struct iscsi_transport *t,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 4/6] scsi: iscsi: fix in-kernel conn failure handling
  2021-04-24 22:17 [PATCH v3 0/6] iscsi: Fix in kernel conn failure handling Mike Christie
                   ` (2 preceding siblings ...)
  2021-04-24 22:17 ` [PATCH v3 3/6] scsi: iscsi: have callers do the put on iscsi_lookup_endpoint Mike Christie
@ 2021-04-24 22:17 ` Mike Christie
  2021-05-05 22:56   ` Lee Duncan
  2021-05-12 18:33   ` Lee Duncan
  2021-04-24 22:17 ` [PATCH v3 5/6] scsi: iscsi_tcp: set no linger Mike Christie
  2021-04-24 22:17 ` [PATCH v3 6/6] scsi: iscsi_tcp: start socket shutdown during conn stop Mike Christie
  5 siblings, 2 replies; 17+ messages in thread
From: Mike Christie @ 2021-04-24 22:17 UTC (permalink / raw)
  To: khazhy, lduncan, martin.petersen, rbharath, krisman, linux-scsi, jejb
  Cc: Mike Christie

The commit 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely
in kernel space") has the following regressions/bugs that this patch fixes:

1. It can return cmds to upper layers like dm-multipath where that can
retry them. After they are successful the fs/app can send new IO to the
same sectors, but we've left the cmds running in FW or in the net layer.
We need to be calling ep_disconnect if userspace is not up.

This patch only fixes the issue for offload drivers. iscsi_tcp will be
fixed in separate patch because it doesn't have a ep_disconnect call.

2. The drivers that implement ep_disconnect expect that it's called
before conn_stop. Besides crashes, if the cleanup_task callout is called
before ep_disconnect it might free up driver/card resources for session1
then they could be allocated for session2. But because the driver's
ep_disconnect is not called it has not cleaned up the firmware so the
card is still using the resources for the original cmd.

3. The stop_conn_work_fn can run after userspace has done it's
recovery and we are happily using the session. We will then end up
with various bugs depending on what is going on at the time.

We may also run stop_conn_work_fn late after userspace has called
stop_conn and ep_disconnect and is now going to call start/bind
conn. If stop_conn_work_fn runs after bind but before start,
we would leave the conn in a unbound but sort of started state where
IO might be allowed even though the drivers have been set in a state
where they no longer expect IO.

4. returning -EAGAIN in iscsi_if_destroy_conn if we haven't yet run
the in kernel stop_conn function is breaking userspace. We should have
been doing this for the caller.

Fixes: 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/scsi_transport_iscsi.c | 453 ++++++++++++++++------------
 include/scsi/scsi_transport_iscsi.h |  10 +-
 2 files changed, 271 insertions(+), 192 deletions(-)

diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index a61a4f0fa83a..1453b033b5d8 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -86,15 +86,11 @@ struct iscsi_internal {
 	struct transport_container session_cont;
 };
 
-/* Worker to perform connection failure on unresponsive connections
- * completely in kernel space.
- */
-static void stop_conn_work_fn(struct work_struct *work);
-static DECLARE_WORK(stop_conn_work, stop_conn_work_fn);
-
 static atomic_t iscsi_session_nr; /* sysfs session id for next new session */
 static struct workqueue_struct *iscsi_eh_timer_workq;
 
+static struct workqueue_struct *iscsi_conn_cleanup_workq;
+
 static DEFINE_IDA(iscsi_sess_ida);
 /*
  * list of registered transports and lock that must
@@ -1623,12 +1619,6 @@ static DECLARE_TRANSPORT_CLASS(iscsi_connection_class,
 static struct sock *nls;
 static DEFINE_MUTEX(rx_queue_mutex);
 
-/*
- * conn_mutex protects the {start,bind,stop,destroy}_conn from racing
- * against the kernel stop_connection recovery mechanism
- */
-static DEFINE_MUTEX(conn_mutex);
-
 static LIST_HEAD(sesslist);
 static DEFINE_SPINLOCK(sesslock);
 static LIST_HEAD(connlist);
@@ -2245,6 +2235,106 @@ void iscsi_remove_session(struct iscsi_cls_session *session)
 }
 EXPORT_SYMBOL_GPL(iscsi_remove_session);
 
+static void iscsi_stop_conn(struct iscsi_cls_conn *conn, int flag)
+{
+	ISCSI_DBG_TRANS_CONN(conn, "Stopping conn.\n");
+
+	switch (flag) {
+	case STOP_CONN_RECOVER:
+		conn->state = ISCSI_CONN_FAILED;
+		break;
+	case STOP_CONN_TERM:
+		conn->state = ISCSI_CONN_DOWN;
+		break;
+	default:
+		iscsi_cls_conn_printk(KERN_ERR, conn, "invalid stop flag %d\n",
+				      flag);
+		return;
+	}
+
+	conn->transport->stop_conn(conn, flag);
+	ISCSI_DBG_TRANS_CONN(conn, "Stopping conn done.\n");
+}
+
+static void iscsi_if_stop_conn(struct iscsi_cls_conn *conn, int flag)
+{
+	ISCSI_DBG_TRANS_CONN(conn, "iscsi if conn stop.\n");
+	/*
+	 * If this is a termination we have to call stop_conn with that flag
+	 * so the correct states get set. If we haven't run the work yet try to
+	 * avoid the extra run.
+	 */
+	if (flag == STOP_CONN_TERM) {
+		cancel_work_sync(&conn->cleanup_work);
+		iscsi_stop_conn(conn, flag);
+	} else {
+		/*
+		 * Figure out if it was the kernel or userspace initiating this.
+		 */
+		if (!test_and_set_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) {
+			iscsi_stop_conn(conn, flag);
+		} else {
+			ISCSI_DBG_TRANS_CONN(conn,
+					     "flush kernel conn cleanup.\n");
+			flush_work(&conn->cleanup_work);
+		}
+		/*
+		 * Only clear for recovery to avoid extra cleanup runs during
+		 * termination.
+		 */
+		clear_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags);
+	}
+	ISCSI_DBG_TRANS_CONN(conn, "iscsi if conn stop done.\n");
+}
+
+static void iscsi_ep_disconnect(struct iscsi_cls_conn *conn, bool is_active)
+{
+	struct iscsi_cls_session *session = iscsi_conn_to_session(conn);
+	struct iscsi_endpoint *ep = conn->ep;
+
+	ISCSI_DBG_TRANS_CONN(conn, "disconnect ep.\n");
+	conn->state = ISCSI_CONN_FAILED;
+	/*
+	 * We may not be bound because:
+	 * 1. Some drivers just loop over all sessions/conns and call
+	 * iscsi_conn_error_event when they get a link down event.
+	 *
+	 * 2. iscsi_tcp does not uses eps and binds/unbinds in stop/bind_conn,
+	 * and for old tools in destroy_conn.
+	 */
+	if (!conn->ep || !session->transport->ep_disconnect)
+		return;
+
+	ep = conn->ep;
+	conn->ep = NULL;
+
+	session->transport->unbind_conn(conn, is_active);
+	session->transport->ep_disconnect(ep);
+	ISCSI_DBG_TRANS_CONN(conn, "disconnect ep done.\n");
+}
+
+static void iscsi_cleanup_conn_work_fn(struct work_struct *work)
+{
+	struct iscsi_cls_conn *conn = container_of(work, struct iscsi_cls_conn,
+						   cleanup_work);
+	struct iscsi_cls_session *session = iscsi_conn_to_session(conn);
+
+	mutex_lock(&conn->ep_mutex);
+	iscsi_ep_disconnect(conn, false);
+
+	if (system_state != SYSTEM_RUNNING) {
+		/*
+		 * userspace is not going to be able to reconnect so force
+		 * recovery to fail immediately
+		 */
+		session->recovery_tmo = 0;
+	}
+
+	iscsi_stop_conn(conn, STOP_CONN_RECOVER);
+	mutex_unlock(&conn->ep_mutex);
+	ISCSI_DBG_TRANS_CONN(conn, "cleanup done.\n");
+}
+
 void iscsi_free_session(struct iscsi_cls_session *session)
 {
 	ISCSI_DBG_TRANS_SESSION(session, "Freeing session\n");
@@ -2284,7 +2374,7 @@ iscsi_create_conn(struct iscsi_cls_session *session, int dd_size, uint32_t cid)
 
 	mutex_init(&conn->ep_mutex);
 	INIT_LIST_HEAD(&conn->conn_list);
-	INIT_LIST_HEAD(&conn->conn_list_err);
+	INIT_WORK(&conn->cleanup_work, iscsi_cleanup_conn_work_fn);
 	conn->transport = transport;
 	conn->cid = cid;
 	conn->state = ISCSI_CONN_DOWN;
@@ -2341,7 +2431,6 @@ int iscsi_destroy_conn(struct iscsi_cls_conn *conn)
 
 	spin_lock_irqsave(&connlock, flags);
 	list_del(&conn->conn_list);
-	list_del(&conn->conn_list_err);
 	spin_unlock_irqrestore(&connlock, flags);
 
 	transport_unregister_device(&conn->dev);
@@ -2456,77 +2545,6 @@ int iscsi_offload_mesg(struct Scsi_Host *shost,
 }
 EXPORT_SYMBOL_GPL(iscsi_offload_mesg);
 
-/*
- * This can be called without the rx_queue_mutex, if invoked by the kernel
- * stop work. But, in that case, it is guaranteed not to race with
- * iscsi_destroy by conn_mutex.
- */
-static void iscsi_if_stop_conn(struct iscsi_cls_conn *conn, int flag)
-{
-	/*
-	 * It is important that this path doesn't rely on
-	 * rx_queue_mutex, otherwise, a thread doing allocation on a
-	 * start_session/start_connection could sleep waiting on a
-	 * writeback to a failed iscsi device, that cannot be recovered
-	 * because the lock is held.  If we don't hold it here, the
-	 * kernel stop_conn_work_fn has a chance to stop the broken
-	 * session and resolve the allocation.
-	 *
-	 * Still, the user invoked .stop_conn() needs to be serialized
-	 * with stop_conn_work_fn by a private mutex.  Not pretty, but
-	 * it works.
-	 */
-	mutex_lock(&conn_mutex);
-	switch (flag) {
-	case STOP_CONN_RECOVER:
-		conn->state = ISCSI_CONN_FAILED;
-		break;
-	case STOP_CONN_TERM:
-		conn->state = ISCSI_CONN_DOWN;
-		break;
-	default:
-		iscsi_cls_conn_printk(KERN_ERR, conn,
-				      "invalid stop flag %d\n", flag);
-		goto unlock;
-	}
-
-	conn->transport->stop_conn(conn, flag);
-unlock:
-	mutex_unlock(&conn_mutex);
-}
-
-static void stop_conn_work_fn(struct work_struct *work)
-{
-	struct iscsi_cls_conn *conn, *tmp;
-	unsigned long flags;
-	LIST_HEAD(recovery_list);
-
-	spin_lock_irqsave(&connlock, flags);
-	if (list_empty(&connlist_err)) {
-		spin_unlock_irqrestore(&connlock, flags);
-		return;
-	}
-	list_splice_init(&connlist_err, &recovery_list);
-	spin_unlock_irqrestore(&connlock, flags);
-
-	list_for_each_entry_safe(conn, tmp, &recovery_list, conn_list_err) {
-		uint32_t sid = iscsi_conn_get_sid(conn);
-		struct iscsi_cls_session *session;
-
-		session = iscsi_session_lookup(sid);
-		if (session) {
-			if (system_state != SYSTEM_RUNNING) {
-				/* Force recovery to fail immediately */
-				session->recovery_tmo = 0;
-			}
-
-			iscsi_if_stop_conn(conn, STOP_CONN_RECOVER);
-		}
-
-		list_del_init(&conn->conn_list_err);
-	}
-}
-
 void iscsi_conn_error_event(struct iscsi_cls_conn *conn, enum iscsi_err error)
 {
 	struct nlmsghdr	*nlh;
@@ -2534,12 +2552,9 @@ void iscsi_conn_error_event(struct iscsi_cls_conn *conn, enum iscsi_err error)
 	struct iscsi_uevent *ev;
 	struct iscsi_internal *priv;
 	int len = nlmsg_total_size(sizeof(*ev));
-	unsigned long flags;
 
-	spin_lock_irqsave(&connlock, flags);
-	list_add(&conn->conn_list_err, &connlist_err);
-	spin_unlock_irqrestore(&connlock, flags);
-	queue_work(system_unbound_wq, &stop_conn_work);
+	if (!test_and_set_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags))
+		queue_work(iscsi_conn_cleanup_workq, &conn->cleanup_work);
 
 	priv = iscsi_if_transport_lookup(conn->transport);
 	if (!priv)
@@ -2869,26 +2884,17 @@ static int
 iscsi_if_destroy_conn(struct iscsi_transport *transport, struct iscsi_uevent *ev)
 {
 	struct iscsi_cls_conn *conn;
-	unsigned long flags;
 
 	conn = iscsi_conn_lookup(ev->u.d_conn.sid, ev->u.d_conn.cid);
 	if (!conn)
 		return -EINVAL;
 
-	spin_lock_irqsave(&connlock, flags);
-	if (!list_empty(&conn->conn_list_err)) {
-		spin_unlock_irqrestore(&connlock, flags);
-		return -EAGAIN;
-	}
-	spin_unlock_irqrestore(&connlock, flags);
-
+	ISCSI_DBG_TRANS_CONN(conn, "Flushing cleanup during destruction\n");
+	flush_work(&conn->cleanup_work);
 	ISCSI_DBG_TRANS_CONN(conn, "Destroying transport conn\n");
 
-	mutex_lock(&conn_mutex);
 	if (transport->destroy_conn)
 		transport->destroy_conn(conn);
-	mutex_unlock(&conn_mutex);
-
 	return 0;
 }
 
@@ -2967,7 +2973,7 @@ static int iscsi_if_ep_connect(struct iscsi_transport *transport,
 }
 
 static int iscsi_if_ep_disconnect(struct iscsi_transport *transport,
-				  u64 ep_handle, bool is_active)
+				  u64 ep_handle)
 {
 	struct iscsi_cls_conn *conn;
 	struct iscsi_endpoint *ep;
@@ -2978,17 +2984,30 @@ static int iscsi_if_ep_disconnect(struct iscsi_transport *transport,
 	ep = iscsi_lookup_endpoint(ep_handle);
 	if (!ep)
 		return -EINVAL;
+
 	conn = ep->conn;
-	if (conn) {
-		mutex_lock(&conn->ep_mutex);
-		conn->ep = NULL;
+	if (!conn) {
+		/*
+		 * conn was not even bound yet, so we can't get iscsi conn
+		 * failures yet.
+		 */
+		transport->ep_disconnect(ep);
+		goto put_ep;
+	}
+
+	mutex_lock(&conn->ep_mutex);
+	/* Check if this was a conn error and the kernel took ownership */
+	if (test_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) {
+		ISCSI_DBG_TRANS_CONN(conn, "flush kernel conn cleanup.\n");
 		mutex_unlock(&conn->ep_mutex);
-		conn->state = ISCSI_CONN_FAILED;
 
-		transport->unbind_conn(conn, is_active);
+		flush_work(&conn->cleanup_work);
+		goto put_ep;
 	}
 
-	transport->ep_disconnect(ep);
+	iscsi_ep_disconnect(conn, false);
+	mutex_unlock(&conn->ep_mutex);
+put_ep:
 	iscsi_put_endpoint(ep);
 	return 0;
 }
@@ -3019,8 +3038,7 @@ iscsi_if_transport_ep(struct iscsi_transport *transport,
 		break;
 	case ISCSI_UEVENT_TRANSPORT_EP_DISCONNECT:
 		rc = iscsi_if_ep_disconnect(transport,
-					    ev->u.ep_disconnect.ep_handle,
-					    false);
+					    ev->u.ep_disconnect.ep_handle);
 		break;
 	}
 	return rc;
@@ -3647,18 +3665,134 @@ iscsi_get_host_stats(struct iscsi_transport *transport, struct nlmsghdr *nlh)
 	return err;
 }
 
+static int iscsi_if_transport_conn(struct iscsi_transport *transport,
+				   struct nlmsghdr *nlh)
+{
+	struct iscsi_uevent *ev = nlmsg_data(nlh);
+	struct iscsi_cls_session *session;
+	struct iscsi_cls_conn *conn = NULL;
+	struct iscsi_endpoint *ep;
+	uint32_t pdu_len;
+	int err = 0;
+
+	switch (nlh->nlmsg_type) {
+	case ISCSI_UEVENT_CREATE_CONN:
+		return iscsi_if_create_conn(transport, ev);
+	case ISCSI_UEVENT_DESTROY_CONN:
+		return iscsi_if_destroy_conn(transport, ev);
+	}
+
+	switch (nlh->nlmsg_type) {
+	case ISCSI_UEVENT_STOP_CONN:
+		conn = iscsi_conn_lookup(ev->u.stop_conn.sid, ev->u.stop_conn.cid);
+		break;
+	case ISCSI_UEVENT_START_CONN:
+		conn = iscsi_conn_lookup(ev->u.start_conn.sid, ev->u.start_conn.cid);
+		break;
+	case ISCSI_UEVENT_BIND_CONN:
+		conn = iscsi_conn_lookup(ev->u.b_conn.sid, ev->u.b_conn.cid);
+		break;
+	case ISCSI_UEVENT_SEND_PDU:
+		conn = iscsi_conn_lookup(ev->u.send_pdu.sid, ev->u.send_pdu.cid);
+		break;
+	}
+
+	if (!conn)
+		return -EINVAL;
+
+	if (nlh->nlmsg_type == ISCSI_UEVENT_STOP_CONN) {
+		iscsi_if_stop_conn(conn, ev->u.stop_conn.flag);
+		return 0;
+	}
+
+	/*
+	 * The following cmds need to be run under the ep_mutex so in kernel
+	 * conn cleanup (ep_disconnect + unbind and conn) is not done while
+	 * these are running. They also must not run if we have just run a conn
+	 * cleanup because they would set the state in a way that might allow
+	 * IO or send IO themselves.
+	 */
+	mutex_lock(&conn->ep_mutex);
+	if (test_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) {
+		mutex_unlock(&conn->ep_mutex);
+		ev->r.retcode = -ENOTCONN;
+		return 0;
+	}
+
+	switch (nlh->nlmsg_type) {
+	case ISCSI_UEVENT_BIND_CONN:
+		if (conn->ep) {
+			/*
+			 * For offload boot support where iscsid is restarted
+			 * during the pivot root stage, the ep will be intact
+			 * here when the new iscsid instance starts up and
+			 * reconnects.
+			 */
+			iscsi_ep_disconnect(conn, true);
+		}
+
+		session = iscsi_session_lookup(ev->u.b_conn.sid);
+		if (!session) {
+			err = -EINVAL;
+			break;
+		}
+
+		ev->r.retcode =	transport->bind_conn(session, conn,
+						ev->u.b_conn.transport_eph,
+						ev->u.b_conn.is_leading);
+		if (!ev->r.retcode)
+			conn->state = ISCSI_CONN_BOUND;
+
+		if (ev->r.retcode || !transport->ep_connect)
+			break;
+
+		ep = iscsi_lookup_endpoint(ev->u.b_conn.transport_eph);
+		if (ep) {
+			ep->conn = conn;
+			conn->ep = ep;
+			iscsi_put_endpoint(ep);
+		} else {
+			err = -ENOTCONN;
+			iscsi_cls_conn_printk(KERN_ERR, conn,
+					      "Could not set ep conn binding\n");
+		}
+		break;
+	case ISCSI_UEVENT_START_CONN:
+		ev->r.retcode = transport->start_conn(conn);
+		if (!ev->r.retcode)
+			conn->state = ISCSI_CONN_UP;
+		break;
+	case ISCSI_UEVENT_SEND_PDU:
+		pdu_len = nlh->nlmsg_len - sizeof(*nlh) - sizeof(*ev);
+
+		if ((ev->u.send_pdu.hdr_size > pdu_len) ||
+		    (ev->u.send_pdu.data_size > (pdu_len - ev->u.send_pdu.hdr_size))) {
+			err = -EINVAL;
+			break;
+		}
+
+		ev->r.retcode =	transport->send_pdu(conn,
+				(struct iscsi_hdr *)((char *)ev + sizeof(*ev)),
+				(char *)ev + sizeof(*ev) + ev->u.send_pdu.hdr_size,
+				ev->u.send_pdu.data_size);
+		break;
+	default:
+		err = -ENOSYS;
+	}
+
+	mutex_unlock(&conn->ep_mutex);
+	return err;
+}
 
 static int
 iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group)
 {
 	int err = 0;
 	u32 portid;
-	u32 pdu_len;
 	struct iscsi_uevent *ev = nlmsg_data(nlh);
 	struct iscsi_transport *transport = NULL;
 	struct iscsi_internal *priv;
 	struct iscsi_cls_session *session;
-	struct iscsi_cls_conn *conn;
 	struct iscsi_endpoint *ep = NULL;
 
 	if (!netlink_capable(skb, CAP_SYS_ADMIN))
@@ -3735,90 +3869,16 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group)
 		else
 			err = -EINVAL;
 		break;
-	case ISCSI_UEVENT_CREATE_CONN:
-		err = iscsi_if_create_conn(transport, ev);
-		break;
-	case ISCSI_UEVENT_DESTROY_CONN:
-		err = iscsi_if_destroy_conn(transport, ev);
-		break;
-	case ISCSI_UEVENT_BIND_CONN:
-		session = iscsi_session_lookup(ev->u.b_conn.sid);
-		conn = iscsi_conn_lookup(ev->u.b_conn.sid, ev->u.b_conn.cid);
-
-		if (conn && conn->ep)
-			iscsi_if_ep_disconnect(transport, conn->ep->id, true);
-
-		if (!session || !conn) {
-			err = -EINVAL;
-			break;
-		}
-
-		mutex_lock(&conn_mutex);
-		ev->r.retcode =	transport->bind_conn(session, conn,
-						ev->u.b_conn.transport_eph,
-						ev->u.b_conn.is_leading);
-		if (!ev->r.retcode)
-			conn->state = ISCSI_CONN_BOUND;
-		mutex_unlock(&conn_mutex);
-
-		if (ev->r.retcode || !transport->ep_connect)
-			break;
-
-		ep = iscsi_lookup_endpoint(ev->u.b_conn.transport_eph);
-		if (ep) {
-			ep->conn = conn;
-
-			mutex_lock(&conn->ep_mutex);
-			conn->ep = ep;
-			mutex_unlock(&conn->ep_mutex);
-			iscsi_put_endpoint(ep);
-		} else
-			iscsi_cls_conn_printk(KERN_ERR, conn,
-					      "Could not set ep conn "
-					      "binding\n");
-		break;
 	case ISCSI_UEVENT_SET_PARAM:
 		err = iscsi_set_param(transport, ev);
 		break;
-	case ISCSI_UEVENT_START_CONN:
-		conn = iscsi_conn_lookup(ev->u.start_conn.sid, ev->u.start_conn.cid);
-		if (conn) {
-			mutex_lock(&conn_mutex);
-			ev->r.retcode = transport->start_conn(conn);
-			if (!ev->r.retcode)
-				conn->state = ISCSI_CONN_UP;
-			mutex_unlock(&conn_mutex);
-		}
-		else
-			err = -EINVAL;
-		break;
+	case ISCSI_UEVENT_CREATE_CONN:
+	case ISCSI_UEVENT_DESTROY_CONN:
 	case ISCSI_UEVENT_STOP_CONN:
-		conn = iscsi_conn_lookup(ev->u.stop_conn.sid, ev->u.stop_conn.cid);
-		if (conn)
-			iscsi_if_stop_conn(conn, ev->u.stop_conn.flag);
-		else
-			err = -EINVAL;
-		break;
+	case ISCSI_UEVENT_START_CONN:
+	case ISCSI_UEVENT_BIND_CONN:
 	case ISCSI_UEVENT_SEND_PDU:
-		pdu_len = nlh->nlmsg_len - sizeof(*nlh) - sizeof(*ev);
-
-		if ((ev->u.send_pdu.hdr_size > pdu_len) ||
-		    (ev->u.send_pdu.data_size > (pdu_len - ev->u.send_pdu.hdr_size))) {
-			err = -EINVAL;
-			break;
-		}
-
-		conn = iscsi_conn_lookup(ev->u.send_pdu.sid, ev->u.send_pdu.cid);
-		if (conn) {
-			mutex_lock(&conn_mutex);
-			ev->r.retcode =	transport->send_pdu(conn,
-				(struct iscsi_hdr*)((char*)ev + sizeof(*ev)),
-				(char*)ev + sizeof(*ev) + ev->u.send_pdu.hdr_size,
-				ev->u.send_pdu.data_size);
-			mutex_unlock(&conn_mutex);
-		}
-		else
-			err = -EINVAL;
+		err = iscsi_if_transport_conn(transport, nlh);
 		break;
 	case ISCSI_UEVENT_GET_STATS:
 		err = iscsi_if_get_stats(transport, nlh);
@@ -4821,8 +4881,18 @@ static __init int iscsi_transport_init(void)
 		goto release_nls;
 	}
 
+	iscsi_conn_cleanup_workq = alloc_workqueue("%s",
+			WQ_SYSFS | WQ_MEM_RECLAIM | WQ_UNBOUND, 0,
+			"iscsi_conn_cleanup");
+	if (!iscsi_conn_cleanup_workq) {
+		err = -ENOMEM;
+		goto destroy_wq;
+	}
+
 	return 0;
 
+destroy_wq:
+	destroy_workqueue(iscsi_eh_timer_workq);
 release_nls:
 	netlink_kernel_release(nls);
 unregister_flashnode_bus:
@@ -4844,6 +4914,7 @@ static __init int iscsi_transport_init(void)
 
 static void __exit iscsi_transport_exit(void)
 {
+	destroy_workqueue(iscsi_conn_cleanup_workq);
 	destroy_workqueue(iscsi_eh_timer_workq);
 	netlink_kernel_release(nls);
 	bus_unregister(&iscsi_flashnode_bus);
diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h
index d36a72cf049f..3974329d4d02 100644
--- a/include/scsi/scsi_transport_iscsi.h
+++ b/include/scsi/scsi_transport_iscsi.h
@@ -197,15 +197,23 @@ enum iscsi_connection_state {
 	ISCSI_CONN_BOUND,
 };
 
+#define ISCSI_CLS_CONN_BIT_CLEANUP	1
+
 struct iscsi_cls_conn {
 	struct list_head conn_list;	/* item in connlist */
-	struct list_head conn_list_err;	/* item in connlist_err */
 	void *dd_data;			/* LLD private data */
 	struct iscsi_transport *transport;
 	uint32_t cid;			/* connection id */
+	/*
+	 * This protects the conn startup and binding/unbinding of the ep to
+	 * the conn. Unbinding includes ep_disconnect and stop_conn.
+	 */
 	struct mutex ep_mutex;
 	struct iscsi_endpoint *ep;
 
+	unsigned long flags;
+	struct work_struct cleanup_work;
+
 	struct device dev;		/* sysfs transport/container device */
 	enum iscsi_connection_state state;
 };
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 5/6] scsi: iscsi_tcp: set no linger
  2021-04-24 22:17 [PATCH v3 0/6] iscsi: Fix in kernel conn failure handling Mike Christie
                   ` (3 preceding siblings ...)
  2021-04-24 22:17 ` [PATCH v3 4/6] scsi: iscsi: fix in-kernel conn failure handling Mike Christie
@ 2021-04-24 22:17 ` Mike Christie
  2021-05-06  1:29   ` Lee Duncan
  2021-04-24 22:17 ` [PATCH v3 6/6] scsi: iscsi_tcp: start socket shutdown during conn stop Mike Christie
  5 siblings, 1 reply; 17+ messages in thread
From: Mike Christie @ 2021-04-24 22:17 UTC (permalink / raw)
  To: khazhy, lduncan, martin.petersen, rbharath, krisman, linux-scsi, jejb
  Cc: Mike Christie

Userspace (open-iscsi based tools at least) sets no linger on the socket
to prevent stale data from being sent. However, with the in kernel cleanup
if userspace is not up the sockfd_put will release the socket without
having set that sockopt.

iscsid sets that opt at socket close time, but it seems ok to set this at
setup time in the kernel for all tools. And, if we are only doing the in
kernel cleanup initially because iscsid is down that sockopt gets used.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/iscsi_tcp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index dd33ce0e3737..553e95ad6197 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -689,6 +689,7 @@ iscsi_sw_tcp_conn_bind(struct iscsi_cls_session *cls_session,
 	sk->sk_sndtimeo = 15 * HZ; /* FIXME: make it configurable */
 	sk->sk_allocation = GFP_ATOMIC;
 	sk_set_memalloc(sk);
+	sock_no_linger(sk);
 
 	iscsi_sw_tcp_conn_set_callbacks(conn);
 	tcp_sw_conn->sendpage = tcp_sw_conn->sock->ops->sendpage;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 6/6] scsi: iscsi_tcp: start socket shutdown during conn stop
  2021-04-24 22:17 [PATCH v3 0/6] iscsi: Fix in kernel conn failure handling Mike Christie
                   ` (4 preceding siblings ...)
  2021-04-24 22:17 ` [PATCH v3 5/6] scsi: iscsi_tcp: set no linger Mike Christie
@ 2021-04-24 22:17 ` Mike Christie
  2021-04-28 23:27   ` Khazhy Kumykov
  2021-05-06  1:29   ` Lee Duncan
  5 siblings, 2 replies; 17+ messages in thread
From: Mike Christie @ 2021-04-24 22:17 UTC (permalink / raw)
  To: khazhy, lduncan, martin.petersen, rbharath, krisman, linux-scsi, jejb
  Cc: Mike Christie

Make sure the conn socket shutdown starts before we start the timer
to fail commands to upper layers.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/iscsi_tcp.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 553e95ad6197..1bc37593c88f 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -600,6 +600,12 @@ static void iscsi_sw_tcp_release_conn(struct iscsi_conn *conn)
 	if (!sock)
 		return;
 
+	/*
+	 * Make sure we start socket shutdown now in case userspace is up
+	 * but delayed in releasing the socket.
+	 */
+	kernel_sock_shutdown(sock, SHUT_RDWR);
+
 	sock_hold(sock->sk);
 	iscsi_sw_tcp_conn_restore_callbacks(conn);
 	sock_put(sock->sk);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/6] scsi: iscsi: force immediate failure during shutdown
  2021-04-24 22:17 ` [PATCH v3 1/6] scsi: iscsi: force immediate failure during shutdown Mike Christie
@ 2021-04-28 19:00   ` Lee Duncan
  0 siblings, 0 replies; 17+ messages in thread
From: Lee Duncan @ 2021-04-28 19:00 UTC (permalink / raw)
  To: Mike Christie, khazhy, martin.petersen, rbharath, krisman,
	linux-scsi, jejb

On 4/24/21 3:17 PM, Mike Christie wrote:
> If the system is not up, we can just fail immediately since iscsid is not
> going to ever answer our netlink events. We are already setting the
> recovery_tmo to 0, but by passing stop_conn STOP_CONN_TERM we never will
> block the session and start the recovery timer, because for that flag
> userspace will do the unbind and destroy events which would remove the
> devices and wake up and kill the eh.
> 
> Since the conn is dead and the system is going dowm this just has us use
> STOP_CONN_RECOVER with recovery_tmo=0 so we fail immediately.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/scsi_transport_iscsi.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
> index 82491343e94a..0cd9f2090993 100644
> --- a/drivers/scsi/scsi_transport_iscsi.c
> +++ b/drivers/scsi/scsi_transport_iscsi.c
> @@ -2513,11 +2513,11 @@ static void stop_conn_work_fn(struct work_struct *work)
>  		session = iscsi_session_lookup(sid);
>  		if (session) {
>  			if (system_state != SYSTEM_RUNNING) {
> +				/* Force recovery to fail immediately */
>  				session->recovery_tmo = 0;
> -				iscsi_if_stop_conn(conn, STOP_CONN_TERM);
> -			} else {
> -				iscsi_if_stop_conn(conn, STOP_CONN_RECOVER);
>  			}
> +
> +			iscsi_if_stop_conn(conn, STOP_CONN_RECOVER);
>  		}
>  
>  		list_del_init(&conn->conn_list_err);
> 

Reviewed-by: Lee Duncan <lduncan@suse.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/6] scsi: iscsi: use system_unbound_wq for destroy_work
  2021-04-24 22:17 ` [PATCH v3 2/6] scsi: iscsi: use system_unbound_wq for destroy_work Mike Christie
@ 2021-04-28 19:02   ` Lee Duncan
  0 siblings, 0 replies; 17+ messages in thread
From: Lee Duncan @ 2021-04-28 19:02 UTC (permalink / raw)
  To: Mike Christie, khazhy, martin.petersen, rbharath, krisman,
	linux-scsi, jejb

On 4/24/21 3:17 PM, Mike Christie wrote:
> Use the system_unbound_wq for async session destruction. We don't need a
> dedicated workqueue for async session destruction because:
> 
> 1. perf does not seem to be an issue since we only allow 1 active work.
> 2. it does not have deps with other system works and we can run them
> in parallel with each other.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/scsi_transport_iscsi.c | 15 +--------------
>  1 file changed, 1 insertion(+), 14 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
> index 0cd9f2090993..a23fcf871ffd 100644
> --- a/drivers/scsi/scsi_transport_iscsi.c
> +++ b/drivers/scsi/scsi_transport_iscsi.c
> @@ -95,8 +95,6 @@ static DECLARE_WORK(stop_conn_work, stop_conn_work_fn);
>  static atomic_t iscsi_session_nr; /* sysfs session id for next new session */
>  static struct workqueue_struct *iscsi_eh_timer_workq;
>  
> -static struct workqueue_struct *iscsi_destroy_workq;
> -
>  static DEFINE_IDA(iscsi_sess_ida);
>  /*
>   * list of registered transports and lock that must
> @@ -3718,7 +3716,7 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group)
>  			list_del_init(&session->sess_list);
>  			spin_unlock_irqrestore(&sesslock, flags);
>  
> -			queue_work(iscsi_destroy_workq, &session->destroy_work);
> +			queue_work(system_unbound_wq, &session->destroy_work);
>  		}
>  		break;
>  	case ISCSI_UEVENT_UNBIND_SESSION:
> @@ -4814,18 +4812,8 @@ static __init int iscsi_transport_init(void)
>  		goto release_nls;
>  	}
>  
> -	iscsi_destroy_workq = alloc_workqueue("%s",
> -			WQ_SYSFS | __WQ_LEGACY | WQ_MEM_RECLAIM | WQ_UNBOUND,
> -			1, "iscsi_destroy");
> -	if (!iscsi_destroy_workq) {
> -		err = -ENOMEM;
> -		goto destroy_wq;
> -	}
> -
>  	return 0;
>  
> -destroy_wq:
> -	destroy_workqueue(iscsi_eh_timer_workq);
>  release_nls:
>  	netlink_kernel_release(nls);
>  unregister_flashnode_bus:
> @@ -4847,7 +4835,6 @@ static __init int iscsi_transport_init(void)
>  
>  static void __exit iscsi_transport_exit(void)
>  {
> -	destroy_workqueue(iscsi_destroy_workq);
>  	destroy_workqueue(iscsi_eh_timer_workq);
>  	netlink_kernel_release(nls);
>  	bus_unregister(&iscsi_flashnode_bus);
> 

Reviewed-by: Lee Duncan <lduncan@suse.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 6/6] scsi: iscsi_tcp: start socket shutdown during conn stop
  2021-04-24 22:17 ` [PATCH v3 6/6] scsi: iscsi_tcp: start socket shutdown during conn stop Mike Christie
@ 2021-04-28 23:27   ` Khazhy Kumykov
  2021-05-06  1:29   ` Lee Duncan
  1 sibling, 0 replies; 17+ messages in thread
From: Khazhy Kumykov @ 2021-04-28 23:27 UTC (permalink / raw)
  To: Mike Christie
  Cc: lduncan, Martin K. Petersen, Bharath Ravi,
	Gabriel Krisman Bertazi, linux-scsi, jejb

[-- Attachment #1: Type: text/plain, Size: 1051 bytes --]

On Sat, Apr 24, 2021 at 3:18 PM Mike Christie
<michael.christie@oracle.com> wrote:
>
> Make sure the conn socket shutdown starts before we start the timer
> to fail commands to upper layers.
>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/iscsi_tcp.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
> index 553e95ad6197..1bc37593c88f 100644
> --- a/drivers/scsi/iscsi_tcp.c
> +++ b/drivers/scsi/iscsi_tcp.c
> @@ -600,6 +600,12 @@ static void iscsi_sw_tcp_release_conn(struct iscsi_conn *conn)
>         if (!sock)
>                 return;
>
> +       /*
> +        * Make sure we start socket shutdown now in case userspace is up
> +        * but delayed in releasing the socket.
> +        */
> +       kernel_sock_shutdown(sock, SHUT_RDWR);
> +
>         sock_hold(sock->sk);
>         iscsi_sw_tcp_conn_restore_callbacks(conn);
>         sock_put(sock->sk);
> --
> 2.25.1
>

For the series,
Reviewed-by: Khazhismel Kumykov <khazhy@google.com>

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3996 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 3/6] scsi: iscsi: have callers do the put on iscsi_lookup_endpoint
  2021-04-24 22:17 ` [PATCH v3 3/6] scsi: iscsi: have callers do the put on iscsi_lookup_endpoint Mike Christie
@ 2021-05-04 22:13   ` Lee Duncan
  0 siblings, 0 replies; 17+ messages in thread
From: Lee Duncan @ 2021-05-04 22:13 UTC (permalink / raw)
  To: Mike Christie, khazhy, martin.petersen, rbharath, krisman,
	linux-scsi, jejb

On 4/24/21 3:17 PM, Mike Christie wrote:
> The next patches allow the kernel to do ep_disconnect. In that case we
> will have to get a proper refcount on the ep so one thread does not
> delete it from under another.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/infiniband/ulp/iser/iscsi_iser.c |  1 +
>  drivers/scsi/be2iscsi/be_iscsi.c         | 19 ++++++++++++------
>  drivers/scsi/bnx2i/bnx2i_iscsi.c         | 23 +++++++++++++++-------
>  drivers/scsi/cxgbi/libcxgbi.c            | 12 ++++++++----
>  drivers/scsi/qedi/qedi_iscsi.c           | 25 +++++++++++++++++-------
>  drivers/scsi/qla4xxx/ql4_os.c            |  1 +
>  drivers/scsi/scsi_transport_iscsi.c      | 25 ++++++++++++++++--------
>  include/scsi/scsi_transport_iscsi.h      |  1 +
>  8 files changed, 75 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
> index 6baebcb6d14d..776e46ee95da 100644
> --- a/drivers/infiniband/ulp/iser/iscsi_iser.c
> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
> @@ -506,6 +506,7 @@ iscsi_iser_conn_bind(struct iscsi_cls_session *cls_session,
>  	iser_conn->iscsi_conn = conn;
>  
>  out:
> +	iscsi_put_endpoint(ep);
>  	mutex_unlock(&iser_conn->state_mutex);
>  	return error;
>  }
> diff --git a/drivers/scsi/be2iscsi/be_iscsi.c b/drivers/scsi/be2iscsi/be_iscsi.c
> index a03d0ebc2312..0990b132d62b 100644
> --- a/drivers/scsi/be2iscsi/be_iscsi.c
> +++ b/drivers/scsi/be2iscsi/be_iscsi.c
> @@ -182,6 +182,7 @@ int beiscsi_conn_bind(struct iscsi_cls_session *cls_session,
>  	struct beiscsi_endpoint *beiscsi_ep;
>  	struct iscsi_endpoint *ep;
>  	uint16_t cri_index;
> +	int rc = 0;
>  
>  	ep = iscsi_lookup_endpoint(transport_fd);
>  	if (!ep)
> @@ -189,15 +190,17 @@ int beiscsi_conn_bind(struct iscsi_cls_session *cls_session,
>  
>  	beiscsi_ep = ep->dd_data;
>  
> -	if (iscsi_conn_bind(cls_session, cls_conn, is_leading))
> -		return -EINVAL;
> +	if (iscsi_conn_bind(cls_session, cls_conn, is_leading)) {
> +		rc = -EINVAL;
> +		goto put_ep;
> +	}
>  
>  	if (beiscsi_ep->phba != phba) {
>  		beiscsi_log(phba, KERN_ERR, BEISCSI_LOG_CONFIG,
>  			    "BS_%d : beiscsi_ep->hba=%p not equal to phba=%p\n",
>  			    beiscsi_ep->phba, phba);
> -
> -		return -EEXIST;
> +		rc = -EEXIST;
> +		goto put_ep;
>  	}
>  	cri_index = BE_GET_CRI_FROM_CID(beiscsi_ep->ep_cid);
>  	if (phba->conn_table[cri_index]) {
> @@ -209,7 +212,8 @@ int beiscsi_conn_bind(struct iscsi_cls_session *cls_session,
>  				      beiscsi_ep->ep_cid,
>  				      beiscsi_conn,
>  				      phba->conn_table[cri_index]);
> -			return -EINVAL;
> +			rc = -EINVAL;
> +			goto put_ep;
>  		}
>  	}
>  
> @@ -226,7 +230,10 @@ int beiscsi_conn_bind(struct iscsi_cls_session *cls_session,
>  		    "BS_%d : cid %d phba->conn_table[%u]=%p\n",
>  		    beiscsi_ep->ep_cid, cri_index, beiscsi_conn);
>  	phba->conn_table[cri_index] = beiscsi_conn;
> -	return 0;
> +
> +put_ep:
> +	iscsi_put_endpoint(ep);
> +	return rc;
>  }
>  
>  static int beiscsi_iface_create_ipv4(struct beiscsi_hba *phba)
> diff --git a/drivers/scsi/bnx2i/bnx2i_iscsi.c b/drivers/scsi/bnx2i/bnx2i_iscsi.c
> index 9a4f4776a78a..26cb1c6536ce 100644
> --- a/drivers/scsi/bnx2i/bnx2i_iscsi.c
> +++ b/drivers/scsi/bnx2i/bnx2i_iscsi.c
> @@ -1420,17 +1420,23 @@ static int bnx2i_conn_bind(struct iscsi_cls_session *cls_session,
>  	 * Forcefully terminate all in progress connection recovery at the
>  	 * earliest, either in bind(), send_pdu(LOGIN), or conn_start()
>  	 */
> -	if (bnx2i_adapter_ready(hba))
> -		return -EIO;
> +	if (bnx2i_adapter_ready(hba)) {
> +		ret_code = -EIO;
> +		goto put_ep;
> +	}
>  
>  	bnx2i_ep = ep->dd_data;
>  	if ((bnx2i_ep->state == EP_STATE_TCP_FIN_RCVD) ||
> -	    (bnx2i_ep->state == EP_STATE_TCP_RST_RCVD))
> +	    (bnx2i_ep->state == EP_STATE_TCP_RST_RCVD)) {
>  		/* Peer disconnect via' FIN or RST */
> -		return -EINVAL;
> +		ret_code = -EINVAL;
> +		goto put_ep;
> +	}
>  
> -	if (iscsi_conn_bind(cls_session, cls_conn, is_leading))
> -		return -EINVAL;
> +	if (iscsi_conn_bind(cls_session, cls_conn, is_leading)) {
> +		ret_code = -EINVAL;
> +		goto put_ep;
> +	}
>  
>  	if (bnx2i_ep->hba != hba) {
>  		/* Error - TCP connection does not belong to this device
> @@ -1441,7 +1447,8 @@ static int bnx2i_conn_bind(struct iscsi_cls_session *cls_session,
>  		iscsi_conn_printk(KERN_ALERT, cls_conn->dd_data,
>  				  "belong to hba (%s)\n",
>  				  hba->netdev->name);
> -		return -EEXIST;
> +		ret_code = -EEXIST;
> +		goto put_ep;
>  	}
>  	bnx2i_ep->conn = bnx2i_conn;
>  	bnx2i_conn->ep = bnx2i_ep;
> @@ -1458,6 +1465,8 @@ static int bnx2i_conn_bind(struct iscsi_cls_session *cls_session,
>  		bnx2i_put_rq_buf(bnx2i_conn, 0);
>  
>  	bnx2i_arm_cq_event_coalescing(bnx2i_conn->ep, CNIC_ARM_CQE);
> +put_ep:
> +	iscsi_put_endpoint(ep);
>  	return ret_code;
>  }
>  
> diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
> index 215dd0eb3f48..dbe22a7136f3 100644
> --- a/drivers/scsi/cxgbi/libcxgbi.c
> +++ b/drivers/scsi/cxgbi/libcxgbi.c
> @@ -2690,11 +2690,13 @@ int cxgbi_bind_conn(struct iscsi_cls_session *cls_session,
>  	err = csk->cdev->csk_ddp_setup_pgidx(csk, csk->tid,
>  					     ppm->tformat.pgsz_idx_dflt);
>  	if (err < 0)
> -		return err;
> +		goto put_ep;
>  
>  	err = iscsi_conn_bind(cls_session, cls_conn, is_leading);
> -	if (err)
> -		return -EINVAL;
> +	if (err) {
> +		err = -EINVAL;
> +		goto put_ep;
> +	}
>  
>  	/*  calculate the tag idx bits needed for this conn based on cmds_max */
>  	cconn->task_idx_bits = (__ilog2_u32(conn->session->cmds_max - 1)) + 1;
> @@ -2715,7 +2717,9 @@ int cxgbi_bind_conn(struct iscsi_cls_session *cls_session,
>  	/*  init recv engine */
>  	iscsi_tcp_hdr_recv_prep(tcp_conn);
>  
> -	return 0;
> +put_ep:
> +	iscsi_put_endpoint(ep);
> +	return err;
>  }
>  EXPORT_SYMBOL_GPL(cxgbi_bind_conn);
>  
> diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
> index 25ff2bda077b..bf581ecea897 100644
> --- a/drivers/scsi/qedi/qedi_iscsi.c
> +++ b/drivers/scsi/qedi/qedi_iscsi.c
> @@ -387,6 +387,7 @@ static int qedi_conn_bind(struct iscsi_cls_session *cls_session,
>  	struct qedi_ctx *qedi = iscsi_host_priv(shost);
>  	struct qedi_endpoint *qedi_ep;
>  	struct iscsi_endpoint *ep;
> +	int rc = 0;
>  
>  	ep = iscsi_lookup_endpoint(transport_fd);
>  	if (!ep)
> @@ -394,11 +395,16 @@ static int qedi_conn_bind(struct iscsi_cls_session *cls_session,
>  
>  	qedi_ep = ep->dd_data;
>  	if ((qedi_ep->state == EP_STATE_TCP_FIN_RCVD) ||
> -	    (qedi_ep->state == EP_STATE_TCP_RST_RCVD))
> -		return -EINVAL;
> +	    (qedi_ep->state == EP_STATE_TCP_RST_RCVD)) {
> +		rc = -EINVAL;
> +		goto put_ep;
> +	}
> +
> +	if (iscsi_conn_bind(cls_session, cls_conn, is_leading)) {
> +		rc = -EINVAL;
> +		goto put_ep;
> +	}
>  
> -	if (iscsi_conn_bind(cls_session, cls_conn, is_leading))
> -		return -EINVAL;
>  
>  	qedi_ep->conn = qedi_conn;
>  	qedi_conn->ep = qedi_ep;
> @@ -408,13 +414,18 @@ static int qedi_conn_bind(struct iscsi_cls_session *cls_session,
>  	qedi_conn->cmd_cleanup_req = 0;
>  	qedi_conn->cmd_cleanup_cmpl = 0;
>  
> -	if (qedi_bind_conn_to_iscsi_cid(qedi, qedi_conn))
> -		return -EINVAL;
> +	if (qedi_bind_conn_to_iscsi_cid(qedi, qedi_conn)) {
> +		rc = -EINVAL;
> +		goto put_ep;
> +	}
> +
>  
>  	spin_lock_init(&qedi_conn->tmf_work_lock);
>  	INIT_LIST_HEAD(&qedi_conn->tmf_work_list);
>  	init_waitqueue_head(&qedi_conn->wait_queue);
> -	return 0;
> +put_ep:
> +	iscsi_put_endpoint(ep);
> +	return rc;
>  }
>  
>  static int qedi_iscsi_update_conn(struct qedi_ctx *qedi,
> diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c
> index ff663cb330c2..ea128da08537 100644
> --- a/drivers/scsi/qla4xxx/ql4_os.c
> +++ b/drivers/scsi/qla4xxx/ql4_os.c
> @@ -3235,6 +3235,7 @@ static int qla4xxx_conn_bind(struct iscsi_cls_session *cls_session,
>  	conn = cls_conn->dd_data;
>  	qla_conn = conn->dd_data;
>  	qla_conn->qla_ep = ep->dd_data;
> +	iscsi_put_endpoint(ep);
>  	return 0;
>  }
>  
> diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
> index a23fcf871ffd..a61a4f0fa83a 100644
> --- a/drivers/scsi/scsi_transport_iscsi.c
> +++ b/drivers/scsi/scsi_transport_iscsi.c
> @@ -266,9 +266,20 @@ void iscsi_destroy_endpoint(struct iscsi_endpoint *ep)
>  }
>  EXPORT_SYMBOL_GPL(iscsi_destroy_endpoint);
>  
> +void iscsi_put_endpoint(struct iscsi_endpoint *ep)
> +{
> +	put_device(&ep->dev);
> +}
> +EXPORT_SYMBOL_GPL(iscsi_put_endpoint);
> +
> +/**
> + * iscsi_lookup_endpoint - get ep from handle
> + * @handle: endpoint handle
> + *
> + * Caller must do a iscsi_put_endpoint.
> + */
>  struct iscsi_endpoint *iscsi_lookup_endpoint(u64 handle)
>  {
> -	struct iscsi_endpoint *ep;
>  	struct device *dev;
>  
>  	dev = class_find_device(&iscsi_endpoint_class, NULL, &handle,
> @@ -276,13 +287,7 @@ struct iscsi_endpoint *iscsi_lookup_endpoint(u64 handle)
>  	if (!dev)
>  		return NULL;
>  
> -	ep = iscsi_dev_to_endpoint(dev);
> -	/*
> -	 * we can drop this now because the interface will prevent
> -	 * removals and lookups from racing.
> -	 */
> -	put_device(dev);
> -	return ep;
> +	return iscsi_dev_to_endpoint(dev);
>  }
>  EXPORT_SYMBOL_GPL(iscsi_lookup_endpoint);
>  
> @@ -2984,6 +2989,7 @@ static int iscsi_if_ep_disconnect(struct iscsi_transport *transport,
>  	}
>  
>  	transport->ep_disconnect(ep);
> +	iscsi_put_endpoint(ep);
>  	return 0;
>  }
>  
> @@ -3009,6 +3015,7 @@ iscsi_if_transport_ep(struct iscsi_transport *transport,
>  
>  		ev->r.retcode = transport->ep_poll(ep,
>  						   ev->u.ep_poll.timeout_ms);
> +		iscsi_put_endpoint(ep);
>  		break;
>  	case ISCSI_UEVENT_TRANSPORT_EP_DISCONNECT:
>  		rc = iscsi_if_ep_disconnect(transport,
> @@ -3692,6 +3699,7 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group)
>  					ev->u.c_bound_session.initial_cmdsn,
>  					ev->u.c_bound_session.cmds_max,
>  					ev->u.c_bound_session.queue_depth);
> +		iscsi_put_endpoint(ep);
>  		break;
>  	case ISCSI_UEVENT_DESTROY_SESSION:
>  		session = iscsi_session_lookup(ev->u.d_session.sid);
> @@ -3763,6 +3771,7 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group)
>  			mutex_lock(&conn->ep_mutex);
>  			conn->ep = ep;
>  			mutex_unlock(&conn->ep_mutex);
> +			iscsi_put_endpoint(ep);
>  		} else
>  			iscsi_cls_conn_printk(KERN_ERR, conn,
>  					      "Could not set ep conn "
> diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h
> index 8874016b3c9a..d36a72cf049f 100644
> --- a/include/scsi/scsi_transport_iscsi.h
> +++ b/include/scsi/scsi_transport_iscsi.h
> @@ -442,6 +442,7 @@ extern int iscsi_scan_finished(struct Scsi_Host *shost, unsigned long time);
>  extern struct iscsi_endpoint *iscsi_create_endpoint(int dd_size);
>  extern void iscsi_destroy_endpoint(struct iscsi_endpoint *ep);
>  extern struct iscsi_endpoint *iscsi_lookup_endpoint(u64 handle);
> +extern void iscsi_put_endpoint(struct iscsi_endpoint *ep);
>  extern int iscsi_block_scsi_eh(struct scsi_cmnd *cmd);
>  extern struct iscsi_iface *iscsi_create_iface(struct Scsi_Host *shost,
>  					      struct iscsi_transport *t,
> 

Reviewed-by: Lee Duncan <lduncan@suse.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 4/6] scsi: iscsi: fix in-kernel conn failure handling
  2021-04-24 22:17 ` [PATCH v3 4/6] scsi: iscsi: fix in-kernel conn failure handling Mike Christie
@ 2021-05-05 22:56   ` Lee Duncan
  2021-05-05 23:28     ` Mike Christie
  2021-05-12 18:33   ` Lee Duncan
  1 sibling, 1 reply; 17+ messages in thread
From: Lee Duncan @ 2021-05-05 22:56 UTC (permalink / raw)
  To: Mike Christie, khazhy, martin.petersen, rbharath, krisman,
	linux-scsi, jejb

On 4/24/21 3:17 PM, Mike Christie wrote:
> The commit 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely
> in kernel space") has the following regressions/bugs that this patch fixes:
> 
> 1. It can return cmds to upper layers like dm-multipath where that can
> retry them. After they are successful the fs/app can send new IO to the
> same sectors, but we've left the cmds running in FW or in the net layer.
> We need to be calling ep_disconnect if userspace is not up.
> 
> This patch only fixes the issue for offload drivers. iscsi_tcp will be
> fixed in separate patch because it doesn't have a ep_disconnect call.
> 
> 2. The drivers that implement ep_disconnect expect that it's called
> before conn_stop. Besides crashes, if the cleanup_task callout is called
> before ep_disconnect it might free up driver/card resources for session1
> then they could be allocated for session2. But because the driver's
> ep_disconnect is not called it has not cleaned up the firmware so the
> card is still using the resources for the original cmd.
> 
> 3. The stop_conn_work_fn can run after userspace has done it's
> recovery and we are happily using the session. We will then end up
> with various bugs depending on what is going on at the time.
> 
> We may also run stop_conn_work_fn late after userspace has called
> stop_conn and ep_disconnect and is now going to call start/bind
> conn. If stop_conn_work_fn runs after bind but before start,
> we would leave the conn in a unbound but sort of started state where
> IO might be allowed even though the drivers have been set in a state
> where they no longer expect IO.
> 
> 4. returning -EAGAIN in iscsi_if_destroy_conn if we haven't yet run
> the in kernel stop_conn function is breaking userspace. We should have
> been doing this for the caller.
> 
> Fixes: 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space")
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/scsi_transport_iscsi.c | 453 ++++++++++++++++------------
>  include/scsi/scsi_transport_iscsi.h |  10 +-
>  2 files changed, 271 insertions(+), 192 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
> index a61a4f0fa83a..1453b033b5d8 100644
> --- a/drivers/scsi/scsi_transport_iscsi.c
> +++ b/drivers/scsi/scsi_transport_iscsi.c
> @@ -86,15 +86,11 @@ struct iscsi_internal {
>  	struct transport_container session_cont;
>  };
>  
> -/* Worker to perform connection failure on unresponsive connections
> - * completely in kernel space.
> - */
> -static void stop_conn_work_fn(struct work_struct *work);
> -static DECLARE_WORK(stop_conn_work, stop_conn_work_fn);
> -
>  static atomic_t iscsi_session_nr; /* sysfs session id for next new session */
>  static struct workqueue_struct *iscsi_eh_timer_workq;
>  
> +static struct workqueue_struct *iscsi_conn_cleanup_workq;
> +
>  static DEFINE_IDA(iscsi_sess_ida);
>  /*
>   * list of registered transports and lock that must
> @@ -1623,12 +1619,6 @@ static DECLARE_TRANSPORT_CLASS(iscsi_connection_class,
>  static struct sock *nls;
>  static DEFINE_MUTEX(rx_queue_mutex);
>  
> -/*
> - * conn_mutex protects the {start,bind,stop,destroy}_conn from racing
> - * against the kernel stop_connection recovery mechanism
> - */
> -static DEFINE_MUTEX(conn_mutex);
> -
>  static LIST_HEAD(sesslist);
>  static DEFINE_SPINLOCK(sesslock);
>  static LIST_HEAD(connlist);
> @@ -2245,6 +2235,106 @@ void iscsi_remove_session(struct iscsi_cls_session *session)
>  }
>  EXPORT_SYMBOL_GPL(iscsi_remove_session);
>  
> +static void iscsi_stop_conn(struct iscsi_cls_conn *conn, int flag)
> +{
> +	ISCSI_DBG_TRANS_CONN(conn, "Stopping conn.\n");
> +
> +	switch (flag) {
> +	case STOP_CONN_RECOVER:
> +		conn->state = ISCSI_CONN_FAILED;
> +		break;
> +	case STOP_CONN_TERM:
> +		conn->state = ISCSI_CONN_DOWN;
> +		break;
> +	default:
> +		iscsi_cls_conn_printk(KERN_ERR, conn, "invalid stop flag %d\n",
> +				      flag);
> +		return;
> +	}
> +
> +	conn->transport->stop_conn(conn, flag);
> +	ISCSI_DBG_TRANS_CONN(conn, "Stopping conn done.\n");
> +}
> +
> +static void iscsi_if_stop_conn(struct iscsi_cls_conn *conn, int flag)
> +{
> +	ISCSI_DBG_TRANS_CONN(conn, "iscsi if conn stop.\n");
> +	/*
> +	 * If this is a termination we have to call stop_conn with that flag
> +	 * so the correct states get set. If we haven't run the work yet try to
> +	 * avoid the extra run.
> +	 */
> +	if (flag == STOP_CONN_TERM) {
> +		cancel_work_sync(&conn->cleanup_work);
> +		iscsi_stop_conn(conn, flag);
> +	} else {
> +		/*
> +		 * Figure out if it was the kernel or userspace initiating this.
> +		 */
> +		if (!test_and_set_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) {
> +			iscsi_stop_conn(conn, flag);
> +		} else {
> +			ISCSI_DBG_TRANS_CONN(conn,
> +					     "flush kernel conn cleanup.\n");
> +			flush_work(&conn->cleanup_work);
> +		}
> +		/*
> +		 * Only clear for recovery to avoid extra cleanup runs during
> +		 * termination.
> +		 */
> +		clear_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags);
> +	}
> +	ISCSI_DBG_TRANS_CONN(conn, "iscsi if conn stop done.\n");
> +}
> +
> +static void iscsi_ep_disconnect(struct iscsi_cls_conn *conn, bool is_active)
> +{
> +	struct iscsi_cls_session *session = iscsi_conn_to_session(conn);
> +	struct iscsi_endpoint *ep = conn->ep;

You set ep here, but then you set it again later? Seems redundant.

> +
> +	ISCSI_DBG_TRANS_CONN(conn, "disconnect ep.\n");
> +	conn->state = ISCSI_CONN_FAILED;
> +	/*
> +	 * We may not be bound because:
> +	 * 1. Some drivers just loop over all sessions/conns and call
> +	 * iscsi_conn_error_event when they get a link down event.
> +	 *
> +	 * 2. iscsi_tcp does not uses eps and binds/unbinds in stop/bind_conn,
> +	 * and for old tools in destroy_conn.
> +	 */
> +	if (!conn->ep || !session->transport->ep_disconnect)
> +		return;
> +
> +	ep = conn->ep;
> +	conn->ep = NULL;
> +
> +	session->transport->unbind_conn(conn, is_active);
> +	session->transport->ep_disconnect(ep);
> +	ISCSI_DBG_TRANS_CONN(conn, "disconnect ep done.\n");
> +}
> +
> +static void iscsi_cleanup_conn_work_fn(struct work_struct *work)
> +{
> +	struct iscsi_cls_conn *conn = container_of(work, struct iscsi_cls_conn,
> +						   cleanup_work);
> +	struct iscsi_cls_session *session = iscsi_conn_to_session(conn);
> +
> +	mutex_lock(&conn->ep_mutex);
> +	iscsi_ep_disconnect(conn, false);
> +
> +	if (system_state != SYSTEM_RUNNING) {
> +		/*
> +		 * userspace is not going to be able to reconnect so force
> +		 * recovery to fail immediately
> +		 */
> +		session->recovery_tmo = 0;
> +	}
> +
> +	iscsi_stop_conn(conn, STOP_CONN_RECOVER);
> +	mutex_unlock(&conn->ep_mutex);
> +	ISCSI_DBG_TRANS_CONN(conn, "cleanup done.\n");
> +}
> +
>  void iscsi_free_session(struct iscsi_cls_session *session)
>  {
>  	ISCSI_DBG_TRANS_SESSION(session, "Freeing session\n");
> @@ -2284,7 +2374,7 @@ iscsi_create_conn(struct iscsi_cls_session *session, int dd_size, uint32_t cid)
>  
>  	mutex_init(&conn->ep_mutex);
>  	INIT_LIST_HEAD(&conn->conn_list);
> -	INIT_LIST_HEAD(&conn->conn_list_err);
> +	INIT_WORK(&conn->cleanup_work, iscsi_cleanup_conn_work_fn);
>  	conn->transport = transport;
>  	conn->cid = cid;
>  	conn->state = ISCSI_CONN_DOWN;
> @@ -2341,7 +2431,6 @@ int iscsi_destroy_conn(struct iscsi_cls_conn *conn)
>  
>  	spin_lock_irqsave(&connlock, flags);
>  	list_del(&conn->conn_list);
> -	list_del(&conn->conn_list_err);
>  	spin_unlock_irqrestore(&connlock, flags);
>  
>  	transport_unregister_device(&conn->dev);
> @@ -2456,77 +2545,6 @@ int iscsi_offload_mesg(struct Scsi_Host *shost,
>  }
>  EXPORT_SYMBOL_GPL(iscsi_offload_mesg);
>  
> -/*
> - * This can be called without the rx_queue_mutex, if invoked by the kernel
> - * stop work. But, in that case, it is guaranteed not to race with
> - * iscsi_destroy by conn_mutex.
> - */
> -static void iscsi_if_stop_conn(struct iscsi_cls_conn *conn, int flag)
> -{
> -	/*
> -	 * It is important that this path doesn't rely on
> -	 * rx_queue_mutex, otherwise, a thread doing allocation on a
> -	 * start_session/start_connection could sleep waiting on a
> -	 * writeback to a failed iscsi device, that cannot be recovered
> -	 * because the lock is held.  If we don't hold it here, the
> -	 * kernel stop_conn_work_fn has a chance to stop the broken
> -	 * session and resolve the allocation.
> -	 *
> -	 * Still, the user invoked .stop_conn() needs to be serialized
> -	 * with stop_conn_work_fn by a private mutex.  Not pretty, but
> -	 * it works.
> -	 */
> -	mutex_lock(&conn_mutex);
> -	switch (flag) {
> -	case STOP_CONN_RECOVER:
> -		conn->state = ISCSI_CONN_FAILED;
> -		break;
> -	case STOP_CONN_TERM:
> -		conn->state = ISCSI_CONN_DOWN;
> -		break;
> -	default:
> -		iscsi_cls_conn_printk(KERN_ERR, conn,
> -				      "invalid stop flag %d\n", flag);
> -		goto unlock;
> -	}
> -
> -	conn->transport->stop_conn(conn, flag);
> -unlock:
> -	mutex_unlock(&conn_mutex);
> -}
> -
> -static void stop_conn_work_fn(struct work_struct *work)
> -{
> -	struct iscsi_cls_conn *conn, *tmp;
> -	unsigned long flags;
> -	LIST_HEAD(recovery_list);
> -
> -	spin_lock_irqsave(&connlock, flags);
> -	if (list_empty(&connlist_err)) {
> -		spin_unlock_irqrestore(&connlock, flags);
> -		return;
> -	}
> -	list_splice_init(&connlist_err, &recovery_list);
> -	spin_unlock_irqrestore(&connlock, flags);
> -
> -	list_for_each_entry_safe(conn, tmp, &recovery_list, conn_list_err) {
> -		uint32_t sid = iscsi_conn_get_sid(conn);
> -		struct iscsi_cls_session *session;
> -
> -		session = iscsi_session_lookup(sid);
> -		if (session) {
> -			if (system_state != SYSTEM_RUNNING) {
> -				/* Force recovery to fail immediately */
> -				session->recovery_tmo = 0;
> -			}
> -
> -			iscsi_if_stop_conn(conn, STOP_CONN_RECOVER);
> -		}
> -
> -		list_del_init(&conn->conn_list_err);
> -	}
> -}
> -
>  void iscsi_conn_error_event(struct iscsi_cls_conn *conn, enum iscsi_err error)
>  {
>  	struct nlmsghdr	*nlh;
> @@ -2534,12 +2552,9 @@ void iscsi_conn_error_event(struct iscsi_cls_conn *conn, enum iscsi_err error)
>  	struct iscsi_uevent *ev;
>  	struct iscsi_internal *priv;
>  	int len = nlmsg_total_size(sizeof(*ev));
> -	unsigned long flags;
>  
> -	spin_lock_irqsave(&connlock, flags);
> -	list_add(&conn->conn_list_err, &connlist_err);
> -	spin_unlock_irqrestore(&connlock, flags);
> -	queue_work(system_unbound_wq, &stop_conn_work);
> +	if (!test_and_set_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags))
> +		queue_work(iscsi_conn_cleanup_workq, &conn->cleanup_work);
>  
>  	priv = iscsi_if_transport_lookup(conn->transport);
>  	if (!priv)
> @@ -2869,26 +2884,17 @@ static int
>  iscsi_if_destroy_conn(struct iscsi_transport *transport, struct iscsi_uevent *ev)
>  {
>  	struct iscsi_cls_conn *conn;
> -	unsigned long flags;
>  
>  	conn = iscsi_conn_lookup(ev->u.d_conn.sid, ev->u.d_conn.cid);
>  	if (!conn)
>  		return -EINVAL;
>  
> -	spin_lock_irqsave(&connlock, flags);
> -	if (!list_empty(&conn->conn_list_err)) {
> -		spin_unlock_irqrestore(&connlock, flags);
> -		return -EAGAIN;
> -	}
> -	spin_unlock_irqrestore(&connlock, flags);
> -
> +	ISCSI_DBG_TRANS_CONN(conn, "Flushing cleanup during destruction\n");
> +	flush_work(&conn->cleanup_work);
>  	ISCSI_DBG_TRANS_CONN(conn, "Destroying transport conn\n");
>  
> -	mutex_lock(&conn_mutex);
>  	if (transport->destroy_conn)
>  		transport->destroy_conn(conn);
> -	mutex_unlock(&conn_mutex);
> -
>  	return 0;
>  }
>  
> @@ -2967,7 +2973,7 @@ static int iscsi_if_ep_connect(struct iscsi_transport *transport,
>  }
>  
>  static int iscsi_if_ep_disconnect(struct iscsi_transport *transport,
> -				  u64 ep_handle, bool is_active)
> +				  u64 ep_handle)
>  {
>  	struct iscsi_cls_conn *conn;
>  	struct iscsi_endpoint *ep;
> @@ -2978,17 +2984,30 @@ static int iscsi_if_ep_disconnect(struct iscsi_transport *transport,
>  	ep = iscsi_lookup_endpoint(ep_handle);
>  	if (!ep)
>  		return -EINVAL;
> +
>  	conn = ep->conn;
> -	if (conn) {
> -		mutex_lock(&conn->ep_mutex);
> -		conn->ep = NULL;
> +	if (!conn) {
> +		/*
> +		 * conn was not even bound yet, so we can't get iscsi conn
> +		 * failures yet.
> +		 */
> +		transport->ep_disconnect(ep);
> +		goto put_ep;
> +	}
> +
> +	mutex_lock(&conn->ep_mutex);
> +	/* Check if this was a conn error and the kernel took ownership */
> +	if (test_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) {
> +		ISCSI_DBG_TRANS_CONN(conn, "flush kernel conn cleanup.\n");
>  		mutex_unlock(&conn->ep_mutex);
> -		conn->state = ISCSI_CONN_FAILED;
>  
> -		transport->unbind_conn(conn, is_active);
> +		flush_work(&conn->cleanup_work);
> +		goto put_ep;
>  	}
>  
> -	transport->ep_disconnect(ep);
> +	iscsi_ep_disconnect(conn, false);
> +	mutex_unlock(&conn->ep_mutex);
> +put_ep:
>  	iscsi_put_endpoint(ep);
>  	return 0;
>  }
> @@ -3019,8 +3038,7 @@ iscsi_if_transport_ep(struct iscsi_transport *transport,
>  		break;
>  	case ISCSI_UEVENT_TRANSPORT_EP_DISCONNECT:
>  		rc = iscsi_if_ep_disconnect(transport,
> -					    ev->u.ep_disconnect.ep_handle,
> -					    false);
> +					    ev->u.ep_disconnect.ep_handle);
>  		break;
>  	}
>  	return rc;
> @@ -3647,18 +3665,134 @@ iscsi_get_host_stats(struct iscsi_transport *transport, struct nlmsghdr *nlh)
>  	return err;
>  }
>  
> +static int iscsi_if_transport_conn(struct iscsi_transport *transport,
> +				   struct nlmsghdr *nlh)
> +{
> +	struct iscsi_uevent *ev = nlmsg_data(nlh);
> +	struct iscsi_cls_session *session;
> +	struct iscsi_cls_conn *conn = NULL;
> +	struct iscsi_endpoint *ep;
> +	uint32_t pdu_len;
> +	int err = 0;
> +
> +	switch (nlh->nlmsg_type) {
> +	case ISCSI_UEVENT_CREATE_CONN:
> +		return iscsi_if_create_conn(transport, ev);
> +	case ISCSI_UEVENT_DESTROY_CONN:
> +		return iscsi_if_destroy_conn(transport, ev);
> +	}

Why aren't these two consecutive switch statements just one? I'm
guessing there's a good reason, but I can't see it.

> +
> +	switch (nlh->nlmsg_type) {
> +	case ISCSI_UEVENT_STOP_CONN:
> +		conn = iscsi_conn_lookup(ev->u.stop_conn.sid, ev->u.stop_conn.cid);
> +		break;
> +	case ISCSI_UEVENT_START_CONN:
> +		conn = iscsi_conn_lookup(ev->u.start_conn.sid, ev->u.start_conn.cid);
> +		break;
> +	case ISCSI_UEVENT_BIND_CONN:
> +		conn = iscsi_conn_lookup(ev->u.b_conn.sid, ev->u.b_conn.cid);
> +		break;
> +	case ISCSI_UEVENT_SEND_PDU:
> +		conn = iscsi_conn_lookup(ev->u.send_pdu.sid, ev->u.send_pdu.cid);
> +		break;
> +	}
> +
> +	if (!conn)
> +		return -EINVAL;
> +
> +	if (nlh->nlmsg_type == ISCSI_UEVENT_STOP_CONN) {
> +		iscsi_if_stop_conn(conn, ev->u.stop_conn.flag);
> +		return 0;
> +	}
> +
> +	/*
> +	 * The following cmds need to be run under the ep_mutex so in kernel
> +	 * conn cleanup (ep_disconnect + unbind and conn) is not done while
> +	 * these are running. They also must not run if we have just run a conn
> +	 * cleanup because they would set the state in a way that might allow
> +	 * IO or send IO themselves.
> +	 */
> +	mutex_lock(&conn->ep_mutex);
> +	if (test_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) {
> +		mutex_unlock(&conn->ep_mutex);
> +		ev->r.retcode = -ENOTCONN;
> +		return 0;
> +	}
> +
> +	switch (nlh->nlmsg_type) {
> +	case ISCSI_UEVENT_BIND_CONN:
> +		if (conn->ep) {
> +			/*
> +			 * For offload boot support where iscsid is restarted
> +			 * during the pivot root stage, the ep will be intact
> +			 * here when the new iscsid instance starts up and
> +			 * reconnects.
> +			 */
> +			iscsi_ep_disconnect(conn, true);
> +		}
> +
> +		session = iscsi_session_lookup(ev->u.b_conn.sid);
> +		if (!session) {
> +			err = -EINVAL;
> +			break;
> +		}
> +
> +		ev->r.retcode =	transport->bind_conn(session, conn,
> +						ev->u.b_conn.transport_eph,
> +						ev->u.b_conn.is_leading);
> +		if (!ev->r.retcode)
> +			conn->state = ISCSI_CONN_BOUND;
> +
> +		if (ev->r.retcode || !transport->ep_connect)
> +			break;
> +
> +		ep = iscsi_lookup_endpoint(ev->u.b_conn.transport_eph);
> +		if (ep) {
> +			ep->conn = conn;
> +			conn->ep = ep;
> +			iscsi_put_endpoint(ep);
> +		} else {
> +			err = -ENOTCONN;
> +			iscsi_cls_conn_printk(KERN_ERR, conn,
> +					      "Could not set ep conn binding\n");
> +		}
> +		break;
> +	case ISCSI_UEVENT_START_CONN:
> +		ev->r.retcode = transport->start_conn(conn);
> +		if (!ev->r.retcode)
> +			conn->state = ISCSI_CONN_UP;
> +		break;
> +	case ISCSI_UEVENT_SEND_PDU:
> +		pdu_len = nlh->nlmsg_len - sizeof(*nlh) - sizeof(*ev);
> +
> +		if ((ev->u.send_pdu.hdr_size > pdu_len) ||
> +		    (ev->u.send_pdu.data_size > (pdu_len - ev->u.send_pdu.hdr_size))) {
> +			err = -EINVAL;
> +			break;
> +		}
> +
> +		ev->r.retcode =	transport->send_pdu(conn,
> +				(struct iscsi_hdr *)((char *)ev + sizeof(*ev)),
> +				(char *)ev + sizeof(*ev) + ev->u.send_pdu.hdr_size,
> +				ev->u.send_pdu.data_size);
> +		break;
> +	default:
> +		err = -ENOSYS;
> +	}
> +
> +	mutex_unlock(&conn->ep_mutex);
> +	return err;
> +}
>  
>  static int
>  iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group)
>  {
>  	int err = 0;
>  	u32 portid;
> -	u32 pdu_len;
>  	struct iscsi_uevent *ev = nlmsg_data(nlh);
>  	struct iscsi_transport *transport = NULL;
>  	struct iscsi_internal *priv;
>  	struct iscsi_cls_session *session;
> -	struct iscsi_cls_conn *conn;
>  	struct iscsi_endpoint *ep = NULL;
>  
>  	if (!netlink_capable(skb, CAP_SYS_ADMIN))
> @@ -3735,90 +3869,16 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group)
>  		else
>  			err = -EINVAL;
>  		break;
> -	case ISCSI_UEVENT_CREATE_CONN:
> -		err = iscsi_if_create_conn(transport, ev);
> -		break;
> -	case ISCSI_UEVENT_DESTROY_CONN:
> -		err = iscsi_if_destroy_conn(transport, ev);
> -		break;
> -	case ISCSI_UEVENT_BIND_CONN:
> -		session = iscsi_session_lookup(ev->u.b_conn.sid);
> -		conn = iscsi_conn_lookup(ev->u.b_conn.sid, ev->u.b_conn.cid);
> -
> -		if (conn && conn->ep)
> -			iscsi_if_ep_disconnect(transport, conn->ep->id, true);
> -
> -		if (!session || !conn) {
> -			err = -EINVAL;
> -			break;
> -		}
> -
> -		mutex_lock(&conn_mutex);
> -		ev->r.retcode =	transport->bind_conn(session, conn,
> -						ev->u.b_conn.transport_eph,
> -						ev->u.b_conn.is_leading);
> -		if (!ev->r.retcode)
> -			conn->state = ISCSI_CONN_BOUND;
> -		mutex_unlock(&conn_mutex);
> -
> -		if (ev->r.retcode || !transport->ep_connect)
> -			break;
> -
> -		ep = iscsi_lookup_endpoint(ev->u.b_conn.transport_eph);
> -		if (ep) {
> -			ep->conn = conn;
> -
> -			mutex_lock(&conn->ep_mutex);
> -			conn->ep = ep;
> -			mutex_unlock(&conn->ep_mutex);
> -			iscsi_put_endpoint(ep);
> -		} else
> -			iscsi_cls_conn_printk(KERN_ERR, conn,
> -					      "Could not set ep conn "
> -					      "binding\n");
> -		break;
>  	case ISCSI_UEVENT_SET_PARAM:
>  		err = iscsi_set_param(transport, ev);
>  		break;
> -	case ISCSI_UEVENT_START_CONN:
> -		conn = iscsi_conn_lookup(ev->u.start_conn.sid, ev->u.start_conn.cid);
> -		if (conn) {
> -			mutex_lock(&conn_mutex);
> -			ev->r.retcode = transport->start_conn(conn);
> -			if (!ev->r.retcode)
> -				conn->state = ISCSI_CONN_UP;
> -			mutex_unlock(&conn_mutex);
> -		}
> -		else
> -			err = -EINVAL;
> -		break;
> +	case ISCSI_UEVENT_CREATE_CONN:
> +	case ISCSI_UEVENT_DESTROY_CONN:
>  	case ISCSI_UEVENT_STOP_CONN:
> -		conn = iscsi_conn_lookup(ev->u.stop_conn.sid, ev->u.stop_conn.cid);
> -		if (conn)
> -			iscsi_if_stop_conn(conn, ev->u.stop_conn.flag);
> -		else
> -			err = -EINVAL;
> -		break;
> +	case ISCSI_UEVENT_START_CONN:
> +	case ISCSI_UEVENT_BIND_CONN:
>  	case ISCSI_UEVENT_SEND_PDU:
> -		pdu_len = nlh->nlmsg_len - sizeof(*nlh) - sizeof(*ev);
> -
> -		if ((ev->u.send_pdu.hdr_size > pdu_len) ||
> -		    (ev->u.send_pdu.data_size > (pdu_len - ev->u.send_pdu.hdr_size))) {
> -			err = -EINVAL;
> -			break;
> -		}
> -
> -		conn = iscsi_conn_lookup(ev->u.send_pdu.sid, ev->u.send_pdu.cid);
> -		if (conn) {
> -			mutex_lock(&conn_mutex);
> -			ev->r.retcode =	transport->send_pdu(conn,
> -				(struct iscsi_hdr*)((char*)ev + sizeof(*ev)),
> -				(char*)ev + sizeof(*ev) + ev->u.send_pdu.hdr_size,
> -				ev->u.send_pdu.data_size);
> -			mutex_unlock(&conn_mutex);
> -		}
> -		else
> -			err = -EINVAL;
> +		err = iscsi_if_transport_conn(transport, nlh);
>  		break;
>  	case ISCSI_UEVENT_GET_STATS:
>  		err = iscsi_if_get_stats(transport, nlh);
> @@ -4821,8 +4881,18 @@ static __init int iscsi_transport_init(void)
>  		goto release_nls;
>  	}
>  
> +	iscsi_conn_cleanup_workq = alloc_workqueue("%s",
> +			WQ_SYSFS | WQ_MEM_RECLAIM | WQ_UNBOUND, 0,
> +			"iscsi_conn_cleanup");
> +	if (!iscsi_conn_cleanup_workq) {
> +		err = -ENOMEM;
> +		goto destroy_wq;
> +	}
> +
>  	return 0;
>  
> +destroy_wq:
> +	destroy_workqueue(iscsi_eh_timer_workq);
>  release_nls:
>  	netlink_kernel_release(nls);
>  unregister_flashnode_bus:
> @@ -4844,6 +4914,7 @@ static __init int iscsi_transport_init(void)
>  
>  static void __exit iscsi_transport_exit(void)
>  {
> +	destroy_workqueue(iscsi_conn_cleanup_workq);
>  	destroy_workqueue(iscsi_eh_timer_workq);
>  	netlink_kernel_release(nls);
>  	bus_unregister(&iscsi_flashnode_bus);
> diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h
> index d36a72cf049f..3974329d4d02 100644
> --- a/include/scsi/scsi_transport_iscsi.h
> +++ b/include/scsi/scsi_transport_iscsi.h
> @@ -197,15 +197,23 @@ enum iscsi_connection_state {
>  	ISCSI_CONN_BOUND,
>  };
>  
> +#define ISCSI_CLS_CONN_BIT_CLEANUP	1
> +
>  struct iscsi_cls_conn {
>  	struct list_head conn_list;	/* item in connlist */
> -	struct list_head conn_list_err;	/* item in connlist_err */
>  	void *dd_data;			/* LLD private data */
>  	struct iscsi_transport *transport;
>  	uint32_t cid;			/* connection id */
> +	/*
> +	 * This protects the conn startup and binding/unbinding of the ep to
> +	 * the conn. Unbinding includes ep_disconnect and stop_conn.
> +	 */
>  	struct mutex ep_mutex;
>  	struct iscsi_endpoint *ep;
>  
> +	unsigned long flags;
> +	struct work_struct cleanup_work;
> +
>  	struct device dev;		/* sysfs transport/container device */
>  	enum iscsi_connection_state state;
>  };
> 

-- 
Lee Duncan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 4/6] scsi: iscsi: fix in-kernel conn failure handling
  2021-05-05 22:56   ` Lee Duncan
@ 2021-05-05 23:28     ` Mike Christie
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Christie @ 2021-05-05 23:28 UTC (permalink / raw)
  To: Lee Duncan, khazhy, martin.petersen, rbharath, krisman, linux-scsi, jejb

On 5/5/21 5:56 PM, Lee Duncan wrote:
>> +static void iscsi_ep_disconnect(struct iscsi_cls_conn *conn, bool is_active)
>> +{
>> +	struct iscsi_cls_session *session = iscsi_conn_to_session(conn);
>> +	struct iscsi_endpoint *ep = conn->ep;
> 
> You set ep here, but then you set it again later? Seems redundant.
> 

Yeah, will fix.

>> +
>> +	ISCSI_DBG_TRANS_CONN(conn, "disconnect ep.\n");
>> +	conn->state = ISCSI_CONN_FAILED;
>> +	/*
>> +	 * We may not be bound because:
>> +	 * 1. Some drivers just loop over all sessions/conns and call
>> +	 * iscsi_conn_error_event when they get a link down event.
>> +	 *
>> +	 * 2. iscsi_tcp does not uses eps and binds/unbinds in stop/bind_conn,
>> +	 * and for old tools in destroy_conn.
>> +	 */
>> +	if (!conn->ep || !session->transport->ep_disconnect)
>> +		return;
>> +
>> +	ep = conn->ep;
>> +	conn->ep = NULL;
>> +
>> +	session->transport->unbind_conn(conn, is_active);
>> +	session->transport->ep_disconnect(ep);
>> +	ISCSI_DBG_TRANS_CONN(conn, "disconnect ep done.\n");
>> +}
>> +
>> +static void iscsi_cleanup_conn_work_fn(struct work_struct *work)
>> +{
>> +	struct iscsi_cls_conn *conn = container_of(work, struct iscsi_cls_conn,
>> +						   cleanup_work);
>> +	struct iscsi_cls_session *session = iscsi_conn_to_session(conn);
>> +
>> +	mutex_lock(&conn->ep_mutex);
>> +	iscsi_ep_disconnect(conn, false);
>> +
>> +	if (system_state != SYSTEM_RUNNING) {
>> +		/*
>> +		 * userspace is not going to be able to reconnect so force
>> +		 * recovery to fail immediately
>> +		 */
>> +		session->recovery_tmo = 0;
>> +	}
>> +
>> +	iscsi_stop_conn(conn, STOP_CONN_RECOVER);
>> +	mutex_unlock(&conn->ep_mutex);
>> +	ISCSI_DBG_TRANS_CONN(conn, "cleanup done.\n");
>> +}
>> +
>>  void iscsi_free_session(struct iscsi_cls_session *session)
>>  {
>>  	ISCSI_DBG_TRANS_SESSION(session, "Freeing session\n");
>> @@ -2284,7 +2374,7 @@ iscsi_create_conn(struct iscsi_cls_session *session, int dd_size, uint32_t cid)
>>  
>>  	mutex_init(&conn->ep_mutex);
>>  	INIT_LIST_HEAD(&conn->conn_list);
>> -	INIT_LIST_HEAD(&conn->conn_list_err);
>> +	INIT_WORK(&conn->cleanup_work, iscsi_cleanup_conn_work_fn);
>>  	conn->transport = transport;
>>  	conn->cid = cid;
>>  	conn->state = ISCSI_CONN_DOWN;
>> @@ -2341,7 +2431,6 @@ int iscsi_destroy_conn(struct iscsi_cls_conn *conn)
>>  
>>  	spin_lock_irqsave(&connlock, flags);
>>  	list_del(&conn->conn_list);
>> -	list_del(&conn->conn_list_err);
>>  	spin_unlock_irqrestore(&connlock, flags);
>>  
>>  	transport_unregister_device(&conn->dev);
>> @@ -2456,77 +2545,6 @@ int iscsi_offload_mesg(struct Scsi_Host *shost,
>>  }
>>  EXPORT_SYMBOL_GPL(iscsi_offload_mesg);
>>  
>> -/*
>> - * This can be called without the rx_queue_mutex, if invoked by the kernel
>> - * stop work. But, in that case, it is guaranteed not to race with
>> - * iscsi_destroy by conn_mutex.
>> - */
>> -static void iscsi_if_stop_conn(struct iscsi_cls_conn *conn, int flag)
>> -{
>> -	/*
>> -	 * It is important that this path doesn't rely on
>> -	 * rx_queue_mutex, otherwise, a thread doing allocation on a
>> -	 * start_session/start_connection could sleep waiting on a
>> -	 * writeback to a failed iscsi device, that cannot be recovered
>> -	 * because the lock is held.  If we don't hold it here, the
>> -	 * kernel stop_conn_work_fn has a chance to stop the broken
>> -	 * session and resolve the allocation.
>> -	 *
>> -	 * Still, the user invoked .stop_conn() needs to be serialized
>> -	 * with stop_conn_work_fn by a private mutex.  Not pretty, but
>> -	 * it works.
>> -	 */
>> -	mutex_lock(&conn_mutex);
>> -	switch (flag) {
>> -	case STOP_CONN_RECOVER:
>> -		conn->state = ISCSI_CONN_FAILED;
>> -		break;
>> -	case STOP_CONN_TERM:
>> -		conn->state = ISCSI_CONN_DOWN;
>> -		break;
>> -	default:
>> -		iscsi_cls_conn_printk(KERN_ERR, conn,
>> -				      "invalid stop flag %d\n", flag);
>> -		goto unlock;
>> -	}
>> -
>> -	conn->transport->stop_conn(conn, flag);
>> -unlock:
>> -	mutex_unlock(&conn_mutex);
>> -}
>> -
>> -static void stop_conn_work_fn(struct work_struct *work)
>> -{
>> -	struct iscsi_cls_conn *conn, *tmp;
>> -	unsigned long flags;
>> -	LIST_HEAD(recovery_list);
>> -
>> -	spin_lock_irqsave(&connlock, flags);
>> -	if (list_empty(&connlist_err)) {
>> -		spin_unlock_irqrestore(&connlock, flags);
>> -		return;
>> -	}
>> -	list_splice_init(&connlist_err, &recovery_list);
>> -	spin_unlock_irqrestore(&connlock, flags);
>> -
>> -	list_for_each_entry_safe(conn, tmp, &recovery_list, conn_list_err) {
>> -		uint32_t sid = iscsi_conn_get_sid(conn);
>> -		struct iscsi_cls_session *session;
>> -
>> -		session = iscsi_session_lookup(sid);
>> -		if (session) {
>> -			if (system_state != SYSTEM_RUNNING) {
>> -				/* Force recovery to fail immediately */
>> -				session->recovery_tmo = 0;
>> -			}
>> -
>> -			iscsi_if_stop_conn(conn, STOP_CONN_RECOVER);
>> -		}
>> -
>> -		list_del_init(&conn->conn_list_err);
>> -	}
>> -}
>> -
>>  void iscsi_conn_error_event(struct iscsi_cls_conn *conn, enum iscsi_err error)
>>  {
>>  	struct nlmsghdr	*nlh;
>> @@ -2534,12 +2552,9 @@ void iscsi_conn_error_event(struct iscsi_cls_conn *conn, enum iscsi_err error)
>>  	struct iscsi_uevent *ev;
>>  	struct iscsi_internal *priv;
>>  	int len = nlmsg_total_size(sizeof(*ev));
>> -	unsigned long flags;
>>  
>> -	spin_lock_irqsave(&connlock, flags);
>> -	list_add(&conn->conn_list_err, &connlist_err);
>> -	spin_unlock_irqrestore(&connlock, flags);
>> -	queue_work(system_unbound_wq, &stop_conn_work);
>> +	if (!test_and_set_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags))
>> +		queue_work(iscsi_conn_cleanup_workq, &conn->cleanup_work);
>>  
>>  	priv = iscsi_if_transport_lookup(conn->transport);
>>  	if (!priv)
>> @@ -2869,26 +2884,17 @@ static int
>>  iscsi_if_destroy_conn(struct iscsi_transport *transport, struct iscsi_uevent *ev)
>>  {
>>  	struct iscsi_cls_conn *conn;
>> -	unsigned long flags;
>>  
>>  	conn = iscsi_conn_lookup(ev->u.d_conn.sid, ev->u.d_conn.cid);
>>  	if (!conn)
>>  		return -EINVAL;
>>  
>> -	spin_lock_irqsave(&connlock, flags);
>> -	if (!list_empty(&conn->conn_list_err)) {
>> -		spin_unlock_irqrestore(&connlock, flags);
>> -		return -EAGAIN;
>> -	}
>> -	spin_unlock_irqrestore(&connlock, flags);
>> -
>> +	ISCSI_DBG_TRANS_CONN(conn, "Flushing cleanup during destruction\n");
>> +	flush_work(&conn->cleanup_work);
>>  	ISCSI_DBG_TRANS_CONN(conn, "Destroying transport conn\n");
>>  
>> -	mutex_lock(&conn_mutex);
>>  	if (transport->destroy_conn)
>>  		transport->destroy_conn(conn);
>> -	mutex_unlock(&conn_mutex);
>> -
>>  	return 0;
>>  }
>>  
>> @@ -2967,7 +2973,7 @@ static int iscsi_if_ep_connect(struct iscsi_transport *transport,
>>  }
>>  
>>  static int iscsi_if_ep_disconnect(struct iscsi_transport *transport,
>> -				  u64 ep_handle, bool is_active)
>> +				  u64 ep_handle)
>>  {
>>  	struct iscsi_cls_conn *conn;
>>  	struct iscsi_endpoint *ep;
>> @@ -2978,17 +2984,30 @@ static int iscsi_if_ep_disconnect(struct iscsi_transport *transport,
>>  	ep = iscsi_lookup_endpoint(ep_handle);
>>  	if (!ep)
>>  		return -EINVAL;
>> +
>>  	conn = ep->conn;
>> -	if (conn) {
>> -		mutex_lock(&conn->ep_mutex);
>> -		conn->ep = NULL;
>> +	if (!conn) {
>> +		/*
>> +		 * conn was not even bound yet, so we can't get iscsi conn
>> +		 * failures yet.
>> +		 */
>> +		transport->ep_disconnect(ep);
>> +		goto put_ep;
>> +	}
>> +
>> +	mutex_lock(&conn->ep_mutex);
>> +	/* Check if this was a conn error and the kernel took ownership */
>> +	if (test_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) {
>> +		ISCSI_DBG_TRANS_CONN(conn, "flush kernel conn cleanup.\n");
>>  		mutex_unlock(&conn->ep_mutex);
>> -		conn->state = ISCSI_CONN_FAILED;
>>  
>> -		transport->unbind_conn(conn, is_active);
>> +		flush_work(&conn->cleanup_work);
>> +		goto put_ep;
>>  	}
>>  
>> -	transport->ep_disconnect(ep);
>> +	iscsi_ep_disconnect(conn, false);
>> +	mutex_unlock(&conn->ep_mutex);
>> +put_ep:
>>  	iscsi_put_endpoint(ep);
>>  	return 0;
>>  }
>> @@ -3019,8 +3038,7 @@ iscsi_if_transport_ep(struct iscsi_transport *transport,
>>  		break;
>>  	case ISCSI_UEVENT_TRANSPORT_EP_DISCONNECT:
>>  		rc = iscsi_if_ep_disconnect(transport,
>> -					    ev->u.ep_disconnect.ep_handle,
>> -					    false);
>> +					    ev->u.ep_disconnect.ep_handle);
>>  		break;
>>  	}
>>  	return rc;
>> @@ -3647,18 +3665,134 @@ iscsi_get_host_stats(struct iscsi_transport *transport, struct nlmsghdr *nlh)
>>  	return err;
>>  }
>>  
>> +static int iscsi_if_transport_conn(struct iscsi_transport *transport,
>> +				   struct nlmsghdr *nlh)
>> +{
>> +	struct iscsi_uevent *ev = nlmsg_data(nlh);
>> +	struct iscsi_cls_session *session;
>> +	struct iscsi_cls_conn *conn = NULL;
>> +	struct iscsi_endpoint *ep;
>> +	uint32_t pdu_len;
>> +	int err = 0;
>> +
>> +	switch (nlh->nlmsg_type) {
>> +	case ISCSI_UEVENT_CREATE_CONN:
>> +		return iscsi_if_create_conn(transport, ev);
>> +	case ISCSI_UEVENT_DESTROY_CONN:
>> +		return iscsi_if_destroy_conn(transport, ev);
>> +	}
> 
> Why aren't these two consecutive switch statements just one? I'm
> guessing there's a good reason, but I can't see it.
>
I thought the layout made it easier to see the cmds require
different things wrt the async stop/ep handling.

First switch are cmds we don't need the ep_mutex and to check
ISCSI_CLS_CONN_BIT_CLEANUP. We can just run the helper and return.

After that is the stop conn cmd or cmds that have issues with stop
conn/ep racing with them. In the second switch we lookup the conn
for these types of cmds.

In the last switch we handle the cmd.

I thought the iscsi_if_stop_conn function could lookup it's own conn
and then be called in the first switch since it handles running with
itself in a special way and does not need the common
ISCSI_CLS_CONN_BIT_CLEANUP check.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 5/6] scsi: iscsi_tcp: set no linger
  2021-04-24 22:17 ` [PATCH v3 5/6] scsi: iscsi_tcp: set no linger Mike Christie
@ 2021-05-06  1:29   ` Lee Duncan
  0 siblings, 0 replies; 17+ messages in thread
From: Lee Duncan @ 2021-05-06  1:29 UTC (permalink / raw)
  To: Mike Christie, khazhy, martin.petersen, rbharath, krisman,
	linux-scsi, jejb

On 4/24/21 3:17 PM, Mike Christie wrote:
> Userspace (open-iscsi based tools at least) sets no linger on the socket
> to prevent stale data from being sent. However, with the in kernel cleanup
> if userspace is not up the sockfd_put will release the socket without
> having set that sockopt.
> 
> iscsid sets that opt at socket close time, but it seems ok to set this at
> setup time in the kernel for all tools. And, if we are only doing the in
> kernel cleanup initially because iscsid is down that sockopt gets used.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/iscsi_tcp.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
> index dd33ce0e3737..553e95ad6197 100644
> --- a/drivers/scsi/iscsi_tcp.c
> +++ b/drivers/scsi/iscsi_tcp.c
> @@ -689,6 +689,7 @@ iscsi_sw_tcp_conn_bind(struct iscsi_cls_session *cls_session,
>  	sk->sk_sndtimeo = 15 * HZ; /* FIXME: make it configurable */
>  	sk->sk_allocation = GFP_ATOMIC;
>  	sk_set_memalloc(sk);
> +	sock_no_linger(sk);
>  
>  	iscsi_sw_tcp_conn_set_callbacks(conn);
>  	tcp_sw_conn->sendpage = tcp_sw_conn->sock->ops->sendpage;
> 

Reviewed-by: Lee Duncan <lduncan@suse.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 6/6] scsi: iscsi_tcp: start socket shutdown during conn stop
  2021-04-24 22:17 ` [PATCH v3 6/6] scsi: iscsi_tcp: start socket shutdown during conn stop Mike Christie
  2021-04-28 23:27   ` Khazhy Kumykov
@ 2021-05-06  1:29   ` Lee Duncan
  1 sibling, 0 replies; 17+ messages in thread
From: Lee Duncan @ 2021-05-06  1:29 UTC (permalink / raw)
  To: Mike Christie, khazhy, martin.petersen, rbharath, krisman,
	linux-scsi, jejb

On 4/24/21 3:17 PM, Mike Christie wrote:
> Make sure the conn socket shutdown starts before we start the timer
> to fail commands to upper layers.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/iscsi_tcp.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
> index 553e95ad6197..1bc37593c88f 100644
> --- a/drivers/scsi/iscsi_tcp.c
> +++ b/drivers/scsi/iscsi_tcp.c
> @@ -600,6 +600,12 @@ static void iscsi_sw_tcp_release_conn(struct iscsi_conn *conn)
>  	if (!sock)
>  		return;
>  
> +	/*
> +	 * Make sure we start socket shutdown now in case userspace is up
> +	 * but delayed in releasing the socket.
> +	 */
> +	kernel_sock_shutdown(sock, SHUT_RDWR);
> +
>  	sock_hold(sock->sk);
>  	iscsi_sw_tcp_conn_restore_callbacks(conn);
>  	sock_put(sock->sk);
> 

Reviewed-by: Lee Duncan <lduncan@suse.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 4/6] scsi: iscsi: fix in-kernel conn failure handling
  2021-04-24 22:17 ` [PATCH v3 4/6] scsi: iscsi: fix in-kernel conn failure handling Mike Christie
  2021-05-05 22:56   ` Lee Duncan
@ 2021-05-12 18:33   ` Lee Duncan
  2021-05-12 19:10     ` Mike Christie
  1 sibling, 1 reply; 17+ messages in thread
From: Lee Duncan @ 2021-05-12 18:33 UTC (permalink / raw)
  To: Mike Christie, khazhy, martin.petersen, rbharath, krisman,
	linux-scsi, jejb

On 4/24/21 3:17 PM, Mike Christie wrote:
> The commit 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely
> in kernel space") has the following regressions/bugs that this patch fixes:
> 
> 1. It can return cmds to upper layers like dm-multipath where that can
> retry them. After they are successful the fs/app can send new IO to the
> same sectors, but we've left the cmds running in FW or in the net layer.
> We need to be calling ep_disconnect if userspace is not up.
> 
> This patch only fixes the issue for offload drivers. iscsi_tcp will be
> fixed in separate patch because it doesn't have a ep_disconnect call.
> 
> 2. The drivers that implement ep_disconnect expect that it's called
> before conn_stop. Besides crashes, if the cleanup_task callout is called
> before ep_disconnect it might free up driver/card resources for session1
> then they could be allocated for session2. But because the driver's
> ep_disconnect is not called it has not cleaned up the firmware so the
> card is still using the resources for the original cmd.
> 
> 3. The stop_conn_work_fn can run after userspace has done it's
> recovery and we are happily using the session. We will then end up
> with various bugs depending on what is going on at the time.
> 
> We may also run stop_conn_work_fn late after userspace has called
> stop_conn and ep_disconnect and is now going to call start/bind
> conn. If stop_conn_work_fn runs after bind but before start,
> we would leave the conn in a unbound but sort of started state where
> IO might be allowed even though the drivers have been set in a state
> where they no longer expect IO.
> 
> 4. returning -EAGAIN in iscsi_if_destroy_conn if we haven't yet run
> the in kernel stop_conn function is breaking userspace. We should have
> been doing this for the caller.
> 
> Fixes: 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space")
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/scsi/scsi_transport_iscsi.c | 453 ++++++++++++++++------------
>  include/scsi/scsi_transport_iscsi.h |  10 +-
>  2 files changed, 271 insertions(+), 192 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
> index a61a4f0fa83a..1453b033b5d8 100644
> --- a/drivers/scsi/scsi_transport_iscsi.c
> +++ b/drivers/scsi/scsi_transport_iscsi.c
> @@ -86,15 +86,11 @@ struct iscsi_internal {
>  	struct transport_container session_cont;
>  };
>  
> -/* Worker to perform connection failure on unresponsive connections
> - * completely in kernel space.
> - */
> -static void stop_conn_work_fn(struct work_struct *work);
> -static DECLARE_WORK(stop_conn_work, stop_conn_work_fn);
> -
>  static atomic_t iscsi_session_nr; /* sysfs session id for next new session */
>  static struct workqueue_struct *iscsi_eh_timer_workq;
>  
> +static struct workqueue_struct *iscsi_conn_cleanup_workq;
> +
>  static DEFINE_IDA(iscsi_sess_ida);
>  /*
>   * list of registered transports and lock that must
> @@ -1623,12 +1619,6 @@ static DECLARE_TRANSPORT_CLASS(iscsi_connection_class,
>  static struct sock *nls;
>  static DEFINE_MUTEX(rx_queue_mutex);
>  
> -/*
> - * conn_mutex protects the {start,bind,stop,destroy}_conn from racing
> - * against the kernel stop_connection recovery mechanism
> - */
> -static DEFINE_MUTEX(conn_mutex);
> -
>  static LIST_HEAD(sesslist);
>  static DEFINE_SPINLOCK(sesslock);
>  static LIST_HEAD(connlist);
> @@ -2245,6 +2235,106 @@ void iscsi_remove_session(struct iscsi_cls_session *session)
>  }
>  EXPORT_SYMBOL_GPL(iscsi_remove_session);
>  
> +static void iscsi_stop_conn(struct iscsi_cls_conn *conn, int flag)
> +{
> +	ISCSI_DBG_TRANS_CONN(conn, "Stopping conn.\n");
> +
> +	switch (flag) {
> +	case STOP_CONN_RECOVER:
> +		conn->state = ISCSI_CONN_FAILED;
> +		break;
> +	case STOP_CONN_TERM:
> +		conn->state = ISCSI_CONN_DOWN;
> +		break;
> +	default:
> +		iscsi_cls_conn_printk(KERN_ERR, conn, "invalid stop flag %d\n",
> +				      flag);
> +		return;
> +	}
> +
> +	conn->transport->stop_conn(conn, flag);
> +	ISCSI_DBG_TRANS_CONN(conn, "Stopping conn done.\n");
> +}
> +
> +static void iscsi_if_stop_conn(struct iscsi_cls_conn *conn, int flag)
> +{
> +	ISCSI_DBG_TRANS_CONN(conn, "iscsi if conn stop.\n");
> +	/*
> +	 * If this is a termination we have to call stop_conn with that flag
> +	 * so the correct states get set. If we haven't run the work yet try to
> +	 * avoid the extra run.
> +	 */
> +	if (flag == STOP_CONN_TERM) {
> +		cancel_work_sync(&conn->cleanup_work);
> +		iscsi_stop_conn(conn, flag);
> +	} else {
> +		/*
> +		 * Figure out if it was the kernel or userspace initiating this.
> +		 */
> +		if (!test_and_set_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) {
> +			iscsi_stop_conn(conn, flag);
> +		} else {
> +			ISCSI_DBG_TRANS_CONN(conn,
> +					     "flush kernel conn cleanup.\n");
> +			flush_work(&conn->cleanup_work);
> +		}
> +		/*
> +		 * Only clear for recovery to avoid extra cleanup runs during
> +		 * termination.
> +		 */
> +		clear_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags);
> +	}
> +	ISCSI_DBG_TRANS_CONN(conn, "iscsi if conn stop done.\n");
> +}
> +
> +static void iscsi_ep_disconnect(struct iscsi_cls_conn *conn, bool is_active)
> +{
> +	struct iscsi_cls_session *session = iscsi_conn_to_session(conn);
> +	struct iscsi_endpoint *ep = conn->ep;
> +
> +	ISCSI_DBG_TRANS_CONN(conn, "disconnect ep.\n");
> +	conn->state = ISCSI_CONN_FAILED;
> +	/*
> +	 * We may not be bound because:
> +	 * 1. Some drivers just loop over all sessions/conns and call
> +	 * iscsi_conn_error_event when they get a link down event.
> +	 *
> +	 * 2. iscsi_tcp does not uses eps and binds/unbinds in stop/bind_conn,
> +	 * and for old tools in destroy_conn.
> +	 */
> +	if (!conn->ep || !session->transport->ep_disconnect)
> +		return;
> +
> +	ep = conn->ep;
> +	conn->ep = NULL;
> +
> +	session->transport->unbind_conn(conn, is_active);
> +	session->transport->ep_disconnect(ep);
> +	ISCSI_DBG_TRANS_CONN(conn, "disconnect ep done.\n");
> +}
> +
> +static void iscsi_cleanup_conn_work_fn(struct work_struct *work)
> +{
> +	struct iscsi_cls_conn *conn = container_of(work, struct iscsi_cls_conn,
> +						   cleanup_work);
> +	struct iscsi_cls_session *session = iscsi_conn_to_session(conn);
> +
> +	mutex_lock(&conn->ep_mutex);
> +	iscsi_ep_disconnect(conn, false);
> +
> +	if (system_state != SYSTEM_RUNNING) {
> +		/*
> +		 * userspace is not going to be able to reconnect so force
> +		 * recovery to fail immediately
> +		 */
> +		session->recovery_tmo = 0;
> +	}
> +
> +	iscsi_stop_conn(conn, STOP_CONN_RECOVER);
> +	mutex_unlock(&conn->ep_mutex);
> +	ISCSI_DBG_TRANS_CONN(conn, "cleanup done.\n");
> +}
> +
>  void iscsi_free_session(struct iscsi_cls_session *session)
>  {
>  	ISCSI_DBG_TRANS_SESSION(session, "Freeing session\n");
> @@ -2284,7 +2374,7 @@ iscsi_create_conn(struct iscsi_cls_session *session, int dd_size, uint32_t cid)
>  
>  	mutex_init(&conn->ep_mutex);
>  	INIT_LIST_HEAD(&conn->conn_list);
> -	INIT_LIST_HEAD(&conn->conn_list_err);
> +	INIT_WORK(&conn->cleanup_work, iscsi_cleanup_conn_work_fn);
>  	conn->transport = transport;
>  	conn->cid = cid;
>  	conn->state = ISCSI_CONN_DOWN;
> @@ -2341,7 +2431,6 @@ int iscsi_destroy_conn(struct iscsi_cls_conn *conn)
>  
>  	spin_lock_irqsave(&connlock, flags);
>  	list_del(&conn->conn_list);
> -	list_del(&conn->conn_list_err);
>  	spin_unlock_irqrestore(&connlock, flags);
>  
>  	transport_unregister_device(&conn->dev);
> @@ -2456,77 +2545,6 @@ int iscsi_offload_mesg(struct Scsi_Host *shost,
>  }
>  EXPORT_SYMBOL_GPL(iscsi_offload_mesg);
>  
> -/*
> - * This can be called without the rx_queue_mutex, if invoked by the kernel
> - * stop work. But, in that case, it is guaranteed not to race with
> - * iscsi_destroy by conn_mutex.
> - */
> -static void iscsi_if_stop_conn(struct iscsi_cls_conn *conn, int flag)
> -{
> -	/*
> -	 * It is important that this path doesn't rely on
> -	 * rx_queue_mutex, otherwise, a thread doing allocation on a
> -	 * start_session/start_connection could sleep waiting on a
> -	 * writeback to a failed iscsi device, that cannot be recovered
> -	 * because the lock is held.  If we don't hold it here, the
> -	 * kernel stop_conn_work_fn has a chance to stop the broken
> -	 * session and resolve the allocation.
> -	 *
> -	 * Still, the user invoked .stop_conn() needs to be serialized
> -	 * with stop_conn_work_fn by a private mutex.  Not pretty, but
> -	 * it works.
> -	 */
> -	mutex_lock(&conn_mutex);
> -	switch (flag) {
> -	case STOP_CONN_RECOVER:
> -		conn->state = ISCSI_CONN_FAILED;
> -		break;
> -	case STOP_CONN_TERM:
> -		conn->state = ISCSI_CONN_DOWN;
> -		break;
> -	default:
> -		iscsi_cls_conn_printk(KERN_ERR, conn,
> -				      "invalid stop flag %d\n", flag);
> -		goto unlock;
> -	}
> -
> -	conn->transport->stop_conn(conn, flag);
> -unlock:
> -	mutex_unlock(&conn_mutex);
> -}
> -
> -static void stop_conn_work_fn(struct work_struct *work)
> -{
> -	struct iscsi_cls_conn *conn, *tmp;
> -	unsigned long flags;
> -	LIST_HEAD(recovery_list);
> -
> -	spin_lock_irqsave(&connlock, flags);
> -	if (list_empty(&connlist_err)) {
> -		spin_unlock_irqrestore(&connlock, flags);
> -		return;
> -	}
> -	list_splice_init(&connlist_err, &recovery_list);
> -	spin_unlock_irqrestore(&connlock, flags);
> -
> -	list_for_each_entry_safe(conn, tmp, &recovery_list, conn_list_err) {
> -		uint32_t sid = iscsi_conn_get_sid(conn);
> -		struct iscsi_cls_session *session;
> -
> -		session = iscsi_session_lookup(sid);
> -		if (session) {
> -			if (system_state != SYSTEM_RUNNING) {
> -				/* Force recovery to fail immediately */
> -				session->recovery_tmo = 0;
> -			}
> -
> -			iscsi_if_stop_conn(conn, STOP_CONN_RECOVER);
> -		}
> -
> -		list_del_init(&conn->conn_list_err);
> -	}
> -}
> -
>  void iscsi_conn_error_event(struct iscsi_cls_conn *conn, enum iscsi_err error)
>  {
>  	struct nlmsghdr	*nlh;
> @@ -2534,12 +2552,9 @@ void iscsi_conn_error_event(struct iscsi_cls_conn *conn, enum iscsi_err error)
>  	struct iscsi_uevent *ev;
>  	struct iscsi_internal *priv;
>  	int len = nlmsg_total_size(sizeof(*ev));
> -	unsigned long flags;
>  
> -	spin_lock_irqsave(&connlock, flags);
> -	list_add(&conn->conn_list_err, &connlist_err);
> -	spin_unlock_irqrestore(&connlock, flags);
> -	queue_work(system_unbound_wq, &stop_conn_work);
> +	if (!test_and_set_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags))
> +		queue_work(iscsi_conn_cleanup_workq, &conn->cleanup_work);
>  
>  	priv = iscsi_if_transport_lookup(conn->transport);
>  	if (!priv)
> @@ -2869,26 +2884,17 @@ static int
>  iscsi_if_destroy_conn(struct iscsi_transport *transport, struct iscsi_uevent *ev)
>  {
>  	struct iscsi_cls_conn *conn;
> -	unsigned long flags;
>  
>  	conn = iscsi_conn_lookup(ev->u.d_conn.sid, ev->u.d_conn.cid);
>  	if (!conn)
>  		return -EINVAL;
>  
> -	spin_lock_irqsave(&connlock, flags);
> -	if (!list_empty(&conn->conn_list_err)) {
> -		spin_unlock_irqrestore(&connlock, flags);
> -		return -EAGAIN;
> -	}
> -	spin_unlock_irqrestore(&connlock, flags);
> -
> +	ISCSI_DBG_TRANS_CONN(conn, "Flushing cleanup during destruction\n");
> +	flush_work(&conn->cleanup_work);
>  	ISCSI_DBG_TRANS_CONN(conn, "Destroying transport conn\n");
>  
> -	mutex_lock(&conn_mutex);
>  	if (transport->destroy_conn)
>  		transport->destroy_conn(conn);
> -	mutex_unlock(&conn_mutex);
> -
>  	return 0;
>  }
>  
> @@ -2967,7 +2973,7 @@ static int iscsi_if_ep_connect(struct iscsi_transport *transport,
>  }
>  
>  static int iscsi_if_ep_disconnect(struct iscsi_transport *transport,
> -				  u64 ep_handle, bool is_active)
> +				  u64 ep_handle)
>  {
>  	struct iscsi_cls_conn *conn;
>  	struct iscsi_endpoint *ep;
> @@ -2978,17 +2984,30 @@ static int iscsi_if_ep_disconnect(struct iscsi_transport *transport,
>  	ep = iscsi_lookup_endpoint(ep_handle);
>  	if (!ep)
>  		return -EINVAL;
> +
>  	conn = ep->conn;
> -	if (conn) {
> -		mutex_lock(&conn->ep_mutex);
> -		conn->ep = NULL;
> +	if (!conn) {
> +		/*
> +		 * conn was not even bound yet, so we can't get iscsi conn
> +		 * failures yet.
> +		 */
> +		transport->ep_disconnect(ep);
> +		goto put_ep;
> +	}
> +
> +	mutex_lock(&conn->ep_mutex);
> +	/* Check if this was a conn error and the kernel took ownership */
> +	if (test_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) {
> +		ISCSI_DBG_TRANS_CONN(conn, "flush kernel conn cleanup.\n");
>  		mutex_unlock(&conn->ep_mutex);
> -		conn->state = ISCSI_CONN_FAILED;
>  
> -		transport->unbind_conn(conn, is_active);
> +		flush_work(&conn->cleanup_work);
> +		goto put_ep;
>  	}
>  
> -	transport->ep_disconnect(ep);
> +	iscsi_ep_disconnect(conn, false);
> +	mutex_unlock(&conn->ep_mutex);
> +put_ep:
>  	iscsi_put_endpoint(ep);
>  	return 0;
>  }
> @@ -3019,8 +3038,7 @@ iscsi_if_transport_ep(struct iscsi_transport *transport,
>  		break;
>  	case ISCSI_UEVENT_TRANSPORT_EP_DISCONNECT:
>  		rc = iscsi_if_ep_disconnect(transport,
> -					    ev->u.ep_disconnect.ep_handle,
> -					    false);
> +					    ev->u.ep_disconnect.ep_handle);
>  		break;
>  	}
>  	return rc;
> @@ -3647,18 +3665,134 @@ iscsi_get_host_stats(struct iscsi_transport *transport, struct nlmsghdr *nlh)
>  	return err;
>  }
>  
> +static int iscsi_if_transport_conn(struct iscsi_transport *transport,
> +				   struct nlmsghdr *nlh)
> +{
> +	struct iscsi_uevent *ev = nlmsg_data(nlh);
> +	struct iscsi_cls_session *session;
> +	struct iscsi_cls_conn *conn = NULL;
> +	struct iscsi_endpoint *ep;
> +	uint32_t pdu_len;
> +	int err = 0;
> +
> +	switch (nlh->nlmsg_type) {
> +	case ISCSI_UEVENT_CREATE_CONN:
> +		return iscsi_if_create_conn(transport, ev);
> +	case ISCSI_UEVENT_DESTROY_CONN:
> +		return iscsi_if_destroy_conn(transport, ev);
> +	}
> +
> +	switch (nlh->nlmsg_type) {
> +	case ISCSI_UEVENT_STOP_CONN:
> +		conn = iscsi_conn_lookup(ev->u.stop_conn.sid, ev->u.stop_conn.cid);
> +		break;
> +	case ISCSI_UEVENT_START_CONN:
> +		conn = iscsi_conn_lookup(ev->u.start_conn.sid, ev->u.start_conn.cid);
> +		break;
> +	case ISCSI_UEVENT_BIND_CONN:
> +		conn = iscsi_conn_lookup(ev->u.b_conn.sid, ev->u.b_conn.cid);
> +		break;
> +	case ISCSI_UEVENT_SEND_PDU:
> +		conn = iscsi_conn_lookup(ev->u.send_pdu.sid, ev->u.send_pdu.cid);
> +		break;
> +	}
> +
> +	if (!conn)
> +		return -EINVAL;
> +
> +	if (nlh->nlmsg_type == ISCSI_UEVENT_STOP_CONN) {
> +		iscsi_if_stop_conn(conn, ev->u.stop_conn.flag);
> +		return 0;
> +	}
> +
> +	/*
> +	 * The following cmds need to be run under the ep_mutex so in kernel
> +	 * conn cleanup (ep_disconnect + unbind and conn) is not done while
> +	 * these are running. They also must not run if we have just run a conn
> +	 * cleanup because they would set the state in a way that might allow
> +	 * IO or send IO themselves.
> +	 */
> +	mutex_lock(&conn->ep_mutex);
> +	if (test_bit(ISCSI_CLS_CONN_BIT_CLEANUP, &conn->flags)) {
> +		mutex_unlock(&conn->ep_mutex);
> +		ev->r.retcode = -ENOTCONN;
> +		return 0;
> +	}
> +
> +	switch (nlh->nlmsg_type) {
> +	case ISCSI_UEVENT_BIND_CONN:
> +		if (conn->ep) {
> +			/*
> +			 * For offload boot support where iscsid is restarted
> +			 * during the pivot root stage, the ep will be intact
> +			 * here when the new iscsid instance starts up and
> +			 * reconnects.
> +			 */
> +			iscsi_ep_disconnect(conn, true);
> +		}
> +
> +		session = iscsi_session_lookup(ev->u.b_conn.sid);
> +		if (!session) {
> +			err = -EINVAL;
> +			break;
> +		}
> +
> +		ev->r.retcode =	transport->bind_conn(session, conn,
> +						ev->u.b_conn.transport_eph,
> +						ev->u.b_conn.is_leading);
> +		if (!ev->r.retcode)
> +			conn->state = ISCSI_CONN_BOUND;
> +
> +		if (ev->r.retcode || !transport->ep_connect)
> +			break;
> +
> +		ep = iscsi_lookup_endpoint(ev->u.b_conn.transport_eph);
> +		if (ep) {
> +			ep->conn = conn;
> +			conn->ep = ep;
> +			iscsi_put_endpoint(ep);
> +		} else {
> +			err = -ENOTCONN;
> +			iscsi_cls_conn_printk(KERN_ERR, conn,
> +					      "Could not set ep conn binding\n");
> +		}
> +		break;
> +	case ISCSI_UEVENT_START_CONN:
> +		ev->r.retcode = transport->start_conn(conn);
> +		if (!ev->r.retcode)
> +			conn->state = ISCSI_CONN_UP;
> +		break;
> +	case ISCSI_UEVENT_SEND_PDU:
> +		pdu_len = nlh->nlmsg_len - sizeof(*nlh) - sizeof(*ev);
> +
> +		if ((ev->u.send_pdu.hdr_size > pdu_len) ||
> +		    (ev->u.send_pdu.data_size > (pdu_len - ev->u.send_pdu.hdr_size))) {
> +			err = -EINVAL;
> +			break;
> +		}
> +
> +		ev->r.retcode =	transport->send_pdu(conn,
> +				(struct iscsi_hdr *)((char *)ev + sizeof(*ev)),
> +				(char *)ev + sizeof(*ev) + ev->u.send_pdu.hdr_size,
> +				ev->u.send_pdu.data_size);
> +		break;
> +	default:
> +		err = -ENOSYS;
> +	}
> +
> +	mutex_unlock(&conn->ep_mutex);
> +	return err;
> +}
>  
>  static int
>  iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group)
>  {
>  	int err = 0;
>  	u32 portid;
> -	u32 pdu_len;
>  	struct iscsi_uevent *ev = nlmsg_data(nlh);
>  	struct iscsi_transport *transport = NULL;
>  	struct iscsi_internal *priv;
>  	struct iscsi_cls_session *session;
> -	struct iscsi_cls_conn *conn;
>  	struct iscsi_endpoint *ep = NULL;
>  
>  	if (!netlink_capable(skb, CAP_SYS_ADMIN))
> @@ -3735,90 +3869,16 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group)
>  		else
>  			err = -EINVAL;
>  		break;
> -	case ISCSI_UEVENT_CREATE_CONN:
> -		err = iscsi_if_create_conn(transport, ev);
> -		break;
> -	case ISCSI_UEVENT_DESTROY_CONN:
> -		err = iscsi_if_destroy_conn(transport, ev);
> -		break;
> -	case ISCSI_UEVENT_BIND_CONN:
> -		session = iscsi_session_lookup(ev->u.b_conn.sid);
> -		conn = iscsi_conn_lookup(ev->u.b_conn.sid, ev->u.b_conn.cid);
> -
> -		if (conn && conn->ep)
> -			iscsi_if_ep_disconnect(transport, conn->ep->id, true);
> -
> -		if (!session || !conn) {
> -			err = -EINVAL;
> -			break;
> -		}
> -
> -		mutex_lock(&conn_mutex);
> -		ev->r.retcode =	transport->bind_conn(session, conn,
> -						ev->u.b_conn.transport_eph,
> -						ev->u.b_conn.is_leading);
> -		if (!ev->r.retcode)
> -			conn->state = ISCSI_CONN_BOUND;
> -		mutex_unlock(&conn_mutex);
> -
> -		if (ev->r.retcode || !transport->ep_connect)
> -			break;
> -
> -		ep = iscsi_lookup_endpoint(ev->u.b_conn.transport_eph);
> -		if (ep) {
> -			ep->conn = conn;
> -
> -			mutex_lock(&conn->ep_mutex);
> -			conn->ep = ep;
> -			mutex_unlock(&conn->ep_mutex);
> -			iscsi_put_endpoint(ep);
> -		} else
> -			iscsi_cls_conn_printk(KERN_ERR, conn,
> -					      "Could not set ep conn "
> -					      "binding\n");
> -		break;
>  	case ISCSI_UEVENT_SET_PARAM:
>  		err = iscsi_set_param(transport, ev);
>  		break;
> -	case ISCSI_UEVENT_START_CONN:
> -		conn = iscsi_conn_lookup(ev->u.start_conn.sid, ev->u.start_conn.cid);
> -		if (conn) {
> -			mutex_lock(&conn_mutex);
> -			ev->r.retcode = transport->start_conn(conn);
> -			if (!ev->r.retcode)
> -				conn->state = ISCSI_CONN_UP;
> -			mutex_unlock(&conn_mutex);
> -		}
> -		else
> -			err = -EINVAL;
> -		break;
> +	case ISCSI_UEVENT_CREATE_CONN:
> +	case ISCSI_UEVENT_DESTROY_CONN:
>  	case ISCSI_UEVENT_STOP_CONN:
> -		conn = iscsi_conn_lookup(ev->u.stop_conn.sid, ev->u.stop_conn.cid);
> -		if (conn)
> -			iscsi_if_stop_conn(conn, ev->u.stop_conn.flag);
> -		else
> -			err = -EINVAL;
> -		break;
> +	case ISCSI_UEVENT_START_CONN:
> +	case ISCSI_UEVENT_BIND_CONN:
>  	case ISCSI_UEVENT_SEND_PDU:
> -		pdu_len = nlh->nlmsg_len - sizeof(*nlh) - sizeof(*ev);
> -
> -		if ((ev->u.send_pdu.hdr_size > pdu_len) ||
> -		    (ev->u.send_pdu.data_size > (pdu_len - ev->u.send_pdu.hdr_size))) {
> -			err = -EINVAL;
> -			break;
> -		}
> -
> -		conn = iscsi_conn_lookup(ev->u.send_pdu.sid, ev->u.send_pdu.cid);
> -		if (conn) {
> -			mutex_lock(&conn_mutex);
> -			ev->r.retcode =	transport->send_pdu(conn,
> -				(struct iscsi_hdr*)((char*)ev + sizeof(*ev)),
> -				(char*)ev + sizeof(*ev) + ev->u.send_pdu.hdr_size,
> -				ev->u.send_pdu.data_size);
> -			mutex_unlock(&conn_mutex);
> -		}
> -		else
> -			err = -EINVAL;
> +		err = iscsi_if_transport_conn(transport, nlh);
>  		break;
>  	case ISCSI_UEVENT_GET_STATS:
>  		err = iscsi_if_get_stats(transport, nlh);
> @@ -4821,8 +4881,18 @@ static __init int iscsi_transport_init(void)
>  		goto release_nls;
>  	}
>  
> +	iscsi_conn_cleanup_workq = alloc_workqueue("%s",
> +			WQ_SYSFS | WQ_MEM_RECLAIM | WQ_UNBOUND, 0,
> +			"iscsi_conn_cleanup");
> +	if (!iscsi_conn_cleanup_workq) {
> +		err = -ENOMEM;
> +		goto destroy_wq;
> +	}
> +
>  	return 0;
>  
> +destroy_wq:
> +	destroy_workqueue(iscsi_eh_timer_workq);
>  release_nls:
>  	netlink_kernel_release(nls);
>  unregister_flashnode_bus:
> @@ -4844,6 +4914,7 @@ static __init int iscsi_transport_init(void)
>  
>  static void __exit iscsi_transport_exit(void)
>  {
> +	destroy_workqueue(iscsi_conn_cleanup_workq);
>  	destroy_workqueue(iscsi_eh_timer_workq);
>  	netlink_kernel_release(nls);
>  	bus_unregister(&iscsi_flashnode_bus);
> diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h
> index d36a72cf049f..3974329d4d02 100644
> --- a/include/scsi/scsi_transport_iscsi.h
> +++ b/include/scsi/scsi_transport_iscsi.h
> @@ -197,15 +197,23 @@ enum iscsi_connection_state {
>  	ISCSI_CONN_BOUND,
>  };
>  
> +#define ISCSI_CLS_CONN_BIT_CLEANUP	1
> +
>  struct iscsi_cls_conn {
>  	struct list_head conn_list;	/* item in connlist */
> -	struct list_head conn_list_err;	/* item in connlist_err */
>  	void *dd_data;			/* LLD private data */
>  	struct iscsi_transport *transport;
>  	uint32_t cid;			/* connection id */
> +	/*
> +	 * This protects the conn startup and binding/unbinding of the ep to
> +	 * the conn. Unbinding includes ep_disconnect and stop_conn.
> +	 */
>  	struct mutex ep_mutex;
>  	struct iscsi_endpoint *ep;
>  
> +	unsigned long flags;
> +	struct work_struct cleanup_work;
> +
>  	struct device dev;		/* sysfs transport/container device */
>  	enum iscsi_connection_state state;
>  };
> 

I can't remember if I replied to this already (what memory?), so just in
case:

Reviewed-by: Lee Duncan <lduncan@suse.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 4/6] scsi: iscsi: fix in-kernel conn failure handling
  2021-05-12 18:33   ` Lee Duncan
@ 2021-05-12 19:10     ` Mike Christie
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Christie @ 2021-05-12 19:10 UTC (permalink / raw)
  To: Lee Duncan, khazhy, martin.petersen, rbharath, krisman, linux-scsi, jejb

On 5/12/21 1:33 PM, Lee Duncan wrote:
> I can't remember if I replied to this already (what memory?), so just in
> case:
> 
You had 2 review comments I'm handling:

1. I was setting ep multiple times in one function, so I'm cleaning that up.

2. I re-arranged the code in iscsi_if_transport_conn so it's clear why I
separated the 2 types of functions (ones that need the ep_mutex and a common
check for if a conn stop is running and ones that don't).

There is also a new one:

3. While re-testing for #2, I found another issue. Most of the bugs I was
handling were what happens if the in-kernel conn stop runs late due to iscsid
being overloaded. Because we do the conn stop in the kernel we can of course
now run a lot a faster. So we can end up cleaning up from the issue, but then
the kernel fires another event for the same root issue. Because we already
cleaned up from the in-kernel cleanup, we think it's a new issue instead
of just N events for the same root issue.

I'm testing a fix for that still.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-05-12 21:21 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-24 22:17 [PATCH v3 0/6] iscsi: Fix in kernel conn failure handling Mike Christie
2021-04-24 22:17 ` [PATCH v3 1/6] scsi: iscsi: force immediate failure during shutdown Mike Christie
2021-04-28 19:00   ` Lee Duncan
2021-04-24 22:17 ` [PATCH v3 2/6] scsi: iscsi: use system_unbound_wq for destroy_work Mike Christie
2021-04-28 19:02   ` Lee Duncan
2021-04-24 22:17 ` [PATCH v3 3/6] scsi: iscsi: have callers do the put on iscsi_lookup_endpoint Mike Christie
2021-05-04 22:13   ` Lee Duncan
2021-04-24 22:17 ` [PATCH v3 4/6] scsi: iscsi: fix in-kernel conn failure handling Mike Christie
2021-05-05 22:56   ` Lee Duncan
2021-05-05 23:28     ` Mike Christie
2021-05-12 18:33   ` Lee Duncan
2021-05-12 19:10     ` Mike Christie
2021-04-24 22:17 ` [PATCH v3 5/6] scsi: iscsi_tcp: set no linger Mike Christie
2021-05-06  1:29   ` Lee Duncan
2021-04-24 22:17 ` [PATCH v3 6/6] scsi: iscsi_tcp: start socket shutdown during conn stop Mike Christie
2021-04-28 23:27   ` Khazhy Kumykov
2021-05-06  1:29   ` Lee Duncan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.