* [PATCH v2 0/7] vhost-scsi: Fix crashes and management op hangs
@ 2023-03-21  2:06 Mike Christie
  2023-03-21  2:06 ` [PATCH v2 1/7] vhost-scsi: Fix vhost_scsi struct use after free Mike Christie
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Mike Christie @ 2023-03-21  2:06 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, jasowang, mst, sgarzare,
	virtualization

The following patches were made over Linus tree. The patches fix 3
issues:

1. If a user performs LIO LUN unmapping before the endpoint has been
cleared, then we can end up trying to free a bogus tmf struct if the
TMF is still executing when we do the unmap.
2. If vhost_scsi_setup_vq_cmds fails we can leave the tpg->vhost_scsi
pointer set and we can end up trying to access a freed struct.
3. Management operations like LUN mapping/unmapping and device addition
hang for 30 seconds or up to N minutes depending on the device.

The problem is that we use a global mutex to protect the list of tpgs,
to protect access to each tpg, and to make sure they are flushed, and we
hold that mutex during a lot of management operations. So if you are
just trying to add another device, you can end up waiting on an unrelated
device if we are in the middle of clearing its endpoint and it is
waiting on hung IO.
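
Roughly, the contention looks like the sketch below. This is simplified
pseudo-driver code, not the real functions; the *_sketch names are
illustrative, but vhost_scsi_mutex and vhost_scsi_flush() are the real
driver symbols involved.

/* Device A: a flakey device having its endpoint cleared. */
static void clear_endpoint_sketch(struct vhost_scsi *vs_a)
{
	mutex_lock(&vhost_scsi_mutex);
	/* stop the vqs, then wait for outstanding IO ... */
	vhost_scsi_flush(vs_a);		/* can block for minutes on hung IO */
	/* release the tpgs ... */
	mutex_unlock(&vhost_scsi_mutex);
}

/* Device B: an unrelated device a user is just trying to set up. */
static void set_endpoint_sketch(struct vhost_scsi *vs_b)
{
	mutex_lock(&vhost_scsi_mutex);	/* stuck until A's flush completes */
	/* match tpgs and set up the backend ... */
	mutex_unlock(&vhost_scsi_mutex);
}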

This patchset fixes up the ordering of how we flush IO and release
refcounts, and reduces how often the global mutex is used, so we don't
need to always hold it.

v2:
1. Added fix for possible use after free and merge with a locking
cleanup patch.
2. Added fix for LIO LUN unmap during TMF execution bug.
3. Fixed bug where we needed to hold the tpg mutex instead of the
vhost_scsi_mutex when calling vhost_scsi_do_plug.




* [PATCH v2 1/7] vhost-scsi: Fix vhost_scsi struct use after free
  2023-03-21  2:06 [PATCH v2 0/7] vhost-scsi: Fix crashes and management op hangs Mike Christie
@ 2023-03-21  2:06 ` Mike Christie
  2023-03-21  2:06 ` [PATCH v2 2/7] vhost-scsi: Fix crash during LUN unmapping Mike Christie
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Mike Christie @ 2023-03-21  2:06 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, jasowang, mst, sgarzare,
	virtualization
  Cc: Mike Christie

If vhost_scsi_setup_vq_cmds fails we leave the tpg->vhost_scsi pointer
set. If the device is freed and then the user unmaps the LUN, the call to
vhost_scsi_port_unlink -> vhost_scsi_hotunplug will see that
tpg->vhost_scsi is still set and try to use it.

This has us clear the vhost_scsi pointer in the failure path. It also
has us take tv_tpg_mutex in this failure path, because tv_tpg_vhost_count
is accessed under this mutex in vhost_scsi_drop_nexus and in the future
we will want to serialize access to tpg->vhost_scsi with that mutex
instead of the vhost_scsi_mutex.
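
For reference, a minimal sketch of the consumer side that trips over the
stale pointer, based on the vhost_scsi_do_plug code shown later in this
series; hotunplug_path_sketch is an illustrative name, not a real
function:

/* Reached via vhost_scsi_port_unlink -> vhost_scsi_hotunplug. */
static void hotunplug_path_sketch(struct vhost_scsi_tpg *tpg,
				  struct se_lun *lun)
{
	struct vhost_scsi *vs = tpg->vhost_scsi;

	/*
	 * If vhost_scsi_setup_vq_cmds failed and the vhost_scsi device
	 * was later freed, vs dangles unless the failure path cleared
	 * it. With this patch the pointer is NULL and we just return.
	 */
	if (!vs)
		return;

	/* ... queue the hotunplug event against vs ... */
}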

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index b244e7c0f514..5875241e1654 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1658,7 +1658,10 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
 	for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) {
 		tpg = vs_tpg[i];
 		if (tpg) {
+			mutex_lock(&tpg->tv_tpg_mutex);
+			tpg->vhost_scsi = NULL;
 			tpg->tv_tpg_vhost_count--;
+			mutex_unlock(&tpg->tv_tpg_mutex);
 			target_undepend_item(&tpg->se_tpg.tpg_group.cg_item);
 		}
 	}
-- 
2.25.1



* [PATCH v2 2/7] vhost-scsi: Fix crash during LUN unmapping
  2023-03-21  2:06 [PATCH v2 0/7] vhost-scsi: Fix crashes and management op hangs Mike Christie
  2023-03-21  2:06 ` [PATCH v2 1/7] vhost-scsi: Fix vhost_scsi struct use after free Mike Christie
@ 2023-03-21  2:06 ` Mike Christie
  2023-03-21  2:06 ` [PATCH v2 3/7] vhost-scsi: Delay releasing our refcount on the tpg Mike Christie
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Mike Christie @ 2023-03-21  2:06 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, jasowang, mst, sgarzare,
	virtualization
  Cc: Mike Christie

We normally clear the endpoint and then unmap LUNs, so the devices are
fully shut down by the time the LUN is unmapped, but it's legal to unmap
before clearing. If the user does that while TMFs are running then we can
end up crashing.

vhost_scsi_port_unlink assumes that the LUN's tmf struct will always be on
the tmf_queue list. However, if a TMF is running then it will have been
removed from the list while it executes. If we do a LUN unmap at that
time, we still assume the entry is on the list, so we access it and free
it.
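
For reference, the sketch below is essentially the hunk removed further
down, with the failure mode annotated. list_first_entry() on an empty
list hands back the list head itself cast to the entry type, so the code
ends up deleting and freeing a bogus struct:

	mutex_lock(&tpg->tv_tpg_mutex);
	tpg->tv_tpg_port_count--;
	/*
	 * A running TMF was already list_del_init()'ed off tmf_queue, so
	 * the list is empty here and tmf points at the list head itself,
	 * not at a real vhost_scsi_tmf.
	 */
	tmf = list_first_entry(&tpg->tmf_queue, struct vhost_scsi_tmf,
			       queue_entry);
	list_del(&tmf->queue_entry);
	kfree(tmf);		/* frees a bogus entry -> crash/corruption */
	mutex_unlock(&tpg->tv_tpg_mutex);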

This fixes the bug by allocating the vhost_scsi_tmf struct only when it's
needed, as is done with the se_tmr struct that's needed when we submit
the TMF. In this path perf is not an issue and we can use GFP_KERNEL
since it won't swing directly back on us, so we don't need to preallocate
the struct.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 36 ++++--------------------------------
 1 file changed, 4 insertions(+), 32 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 5875241e1654..32d0be968103 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -125,7 +125,6 @@ struct vhost_scsi_tpg {
 	struct se_portal_group se_tpg;
 	/* Pointer back to vhost_scsi, protected by tv_tpg_mutex */
 	struct vhost_scsi *vhost_scsi;
-	struct list_head tmf_queue;
 };
 
 struct vhost_scsi_tport {
@@ -206,10 +205,8 @@ struct vhost_scsi {
 
 struct vhost_scsi_tmf {
 	struct vhost_work vwork;
-	struct vhost_scsi_tpg *tpg;
 	struct vhost_scsi *vhost;
 	struct vhost_scsi_virtqueue *svq;
-	struct list_head queue_entry;
 
 	struct se_cmd se_cmd;
 	u8 scsi_resp;
@@ -352,12 +349,9 @@ static void vhost_scsi_release_cmd_res(struct se_cmd *se_cmd)
 
 static void vhost_scsi_release_tmf_res(struct vhost_scsi_tmf *tmf)
 {
-	struct vhost_scsi_tpg *tpg = tmf->tpg;
 	struct vhost_scsi_inflight *inflight = tmf->inflight;
 
-	mutex_lock(&tpg->tv_tpg_mutex);
-	list_add_tail(&tpg->tmf_queue, &tmf->queue_entry);
-	mutex_unlock(&tpg->tv_tpg_mutex);
+	kfree(tmf);
 	vhost_scsi_put_inflight(inflight);
 }
 
@@ -1194,19 +1188,11 @@ vhost_scsi_handle_tmf(struct vhost_scsi *vs, struct vhost_scsi_tpg *tpg,
 		goto send_reject;
 	}
 
-	mutex_lock(&tpg->tv_tpg_mutex);
-	if (list_empty(&tpg->tmf_queue)) {
-		pr_err("Missing reserve TMF. Could not handle LUN RESET.\n");
-		mutex_unlock(&tpg->tv_tpg_mutex);
+	tmf = kzalloc(sizeof(*tmf), GFP_KERNEL);
+	if (!tmf)
 		goto send_reject;
-	}
-
-	tmf = list_first_entry(&tpg->tmf_queue, struct vhost_scsi_tmf,
-			       queue_entry);
-	list_del_init(&tmf->queue_entry);
-	mutex_unlock(&tpg->tv_tpg_mutex);
 
-	tmf->tpg = tpg;
+	vhost_work_init(&tmf->vwork, vhost_scsi_tmf_resp_work);
 	tmf->vhost = vs;
 	tmf->svq = svq;
 	tmf->resp_iov = vq->iov[vc->out];
@@ -2035,19 +2021,11 @@ static int vhost_scsi_port_link(struct se_portal_group *se_tpg,
 {
 	struct vhost_scsi_tpg *tpg = container_of(se_tpg,
 				struct vhost_scsi_tpg, se_tpg);
-	struct vhost_scsi_tmf *tmf;
-
-	tmf = kzalloc(sizeof(*tmf), GFP_KERNEL);
-	if (!tmf)
-		return -ENOMEM;
-	INIT_LIST_HEAD(&tmf->queue_entry);
-	vhost_work_init(&tmf->vwork, vhost_scsi_tmf_resp_work);
 
 	mutex_lock(&vhost_scsi_mutex);
 
 	mutex_lock(&tpg->tv_tpg_mutex);
 	tpg->tv_tpg_port_count++;
-	list_add_tail(&tmf->queue_entry, &tpg->tmf_queue);
 	mutex_unlock(&tpg->tv_tpg_mutex);
 
 	vhost_scsi_hotplug(tpg, lun);
@@ -2062,16 +2040,11 @@ static void vhost_scsi_port_unlink(struct se_portal_group *se_tpg,
 {
 	struct vhost_scsi_tpg *tpg = container_of(se_tpg,
 				struct vhost_scsi_tpg, se_tpg);
-	struct vhost_scsi_tmf *tmf;
 
 	mutex_lock(&vhost_scsi_mutex);
 
 	mutex_lock(&tpg->tv_tpg_mutex);
 	tpg->tv_tpg_port_count--;
-	tmf = list_first_entry(&tpg->tmf_queue, struct vhost_scsi_tmf,
-			       queue_entry);
-	list_del(&tmf->queue_entry);
-	kfree(tmf);
 	mutex_unlock(&tpg->tv_tpg_mutex);
 
 	vhost_scsi_hotunplug(tpg, lun);
@@ -2332,7 +2305,6 @@ vhost_scsi_make_tpg(struct se_wwn *wwn, const char *name)
 	}
 	mutex_init(&tpg->tv_tpg_mutex);
 	INIT_LIST_HEAD(&tpg->tv_tpg_list);
-	INIT_LIST_HEAD(&tpg->tmf_queue);
 	tpg->tport = tport;
 	tpg->tport_tpgt = tpgt;
 
-- 
2.25.1



* [PATCH v2 3/7] vhost-scsi: Delay releasing our refcount on the tpg
  2023-03-21  2:06 [PATCH v2 0/7] vhost-scsi: Fix crashes and management op hangs Mike Christie
  2023-03-21  2:06 ` [PATCH v2 1/7] vhost-scsi: Fix vhost_scsi struct use after free Mike Christie
  2023-03-21  2:06 ` [PATCH v2 2/7] vhost-scsi: Fix crash during LUN unmapping Mike Christie
@ 2023-03-21  2:06 ` Mike Christie
  2023-03-21  2:06 ` [PATCH v2 4/7] vhost-scsi: Drop device mutex use in vhost_scsi_do_plug Mike Christie
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Mike Christie @ 2023-03-21  2:06 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, jasowang, mst, sgarzare,
	virtualization
  Cc: Mike Christie

We currently hold the vhost_scsi_mutex the entire time we are running
vhost_scsi_clear_endpoint. One of the reasons for this is that it prevents
userspace from being able to free the se_tpg from under us after we have
called target_undepend_item. However, it forces management operations for
other devices to have to wait on a flakey device's:

vhost_scsi_clear_endpoint -> vhost_scsi_flush()

call, which can take a long time.

This moves the target_undepend_item call and the tpg unsetup code to after
we have stopped new IO from starting up and after we have waited on
running IO. We can then release our refcount on the tpg and session
knowing our device is no longer accessing them. We can then drop the
vhost_scsi_mutex use during the flush call in later patches in this set,
when we have removed other reasons for holding it.
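
Condensed, the new vhost_scsi_clear_endpoint ordering is sketched below;
this is a summary of the diff that follows with error handling and the
variable declarations dropped, not a literal excerpt:

	/* 1. Stop new cmds from starting and touching the tpgs/sessions. */
	for (i = 0; i < vs->dev.nvqs; i++) {
		vq = &vs->vqs[i].vq;
		mutex_lock(&vq->mutex);
		vhost_vq_set_backend(vq, NULL);
		mutex_unlock(&vq->mutex);
	}

	/* 2. Wait for running cmds, then tear down the per-vq cmds. */
	vhost_scsi_flush(vs);
	for (i = 0; i < vs->dev.nvqs; i++)
		vhost_scsi_destroy_vq_cmds(&vs->vqs[i].vq);

	/* 3. Only now drop our refs; userspace may free the tpg/session. */
	for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) {
		tpg = vs->vs_tpg[i];
		if (!tpg)
			continue;
		mutex_lock(&tpg->tv_tpg_mutex);
		tpg->tv_tpg_vhost_count--;
		tpg->vhost_scsi = NULL;
		vs->vs_tpg[i] = NULL;
		mutex_unlock(&tpg->tv_tpg_mutex);
		target_undepend_item(&tpg->se_tpg.tpg_group.cg_item);
	}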

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 61 +++++++++++++++++++++++++++-----------------
 1 file changed, 37 insertions(+), 24 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 32d0be968103..502d6803df0b 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1691,11 +1691,10 @@ vhost_scsi_clear_endpoint(struct vhost_scsi *vs,
 		if (!tpg)
 			continue;
 
-		mutex_lock(&tpg->tv_tpg_mutex);
 		tv_tport = tpg->tport;
 		if (!tv_tport) {
 			ret = -ENODEV;
-			goto err_tpg;
+			goto err_dev;
 		}
 
 		if (strcmp(tv_tport->tport_name, t->vhost_wwpn)) {
@@ -1704,35 +1703,51 @@ vhost_scsi_clear_endpoint(struct vhost_scsi *vs,
 				tv_tport->tport_name, tpg->tport_tpgt,
 				t->vhost_wwpn, t->vhost_tpgt);
 			ret = -EINVAL;
-			goto err_tpg;
+			goto err_dev;
 		}
+		match = true;
+	}
+	if (!match)
+		goto free_vs_tpg;
+
+	/* Prevent new cmds from starting and accessing the tpgs/sessions */
+	for (i = 0; i < vs->dev.nvqs; i++) {
+		vq = &vs->vqs[i].vq;
+		mutex_lock(&vq->mutex);
+		vhost_vq_set_backend(vq, NULL);
+		mutex_unlock(&vq->mutex);
+	}
+	/* Make sure cmds are not running before tearing them down. */
+	vhost_scsi_flush(vs);
+
+	for (i = 0; i < vs->dev.nvqs; i++) {
+		vq = &vs->vqs[i].vq;
+		vhost_scsi_destroy_vq_cmds(vq);
+	}
+
+	/*
+	 * We can now release our hold on the tpg and sessions and userspace
+	 * can free them after this point.
+	 */
+	for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) {
+		target = i;
+		tpg = vs->vs_tpg[target];
+		if (!tpg)
+			continue;
+
+		mutex_lock(&tpg->tv_tpg_mutex);
+
 		tpg->tv_tpg_vhost_count--;
 		tpg->vhost_scsi = NULL;
 		vs->vs_tpg[target] = NULL;
-		match = true;
+
 		mutex_unlock(&tpg->tv_tpg_mutex);
-		/*
-		 * Release se_tpg->tpg_group.cg_item configfs dependency now
-		 * to allow vhost-scsi WWPN se_tpg->tpg_group shutdown to occur.
-		 */
+
 		se_tpg = &tpg->se_tpg;
 		target_undepend_item(&se_tpg->tpg_group.cg_item);
 	}
-	if (match) {
-		for (i = 0; i < vs->dev.nvqs; i++) {
-			vq = &vs->vqs[i].vq;
-			mutex_lock(&vq->mutex);
-			vhost_vq_set_backend(vq, NULL);
-			mutex_unlock(&vq->mutex);
-		}
-		/* Make sure cmds are not running before tearing them down. */
-		vhost_scsi_flush(vs);
 
-		for (i = 0; i < vs->dev.nvqs; i++) {
-			vq = &vs->vqs[i].vq;
-			vhost_scsi_destroy_vq_cmds(vq);
-		}
-	}
+free_vs_tpg:
 	/*
 	 * Act as synchronize_rcu to make sure access to
 	 * old vs->vs_tpg is finished.
@@ -1745,8 +1760,6 @@ vhost_scsi_clear_endpoint(struct vhost_scsi *vs,
 	mutex_unlock(&vhost_scsi_mutex);
 	return 0;
 
-err_tpg:
-	mutex_unlock(&tpg->tv_tpg_mutex);
 err_dev:
 	mutex_unlock(&vs->dev.mutex);
 	mutex_unlock(&vhost_scsi_mutex);
-- 
2.25.1



* [PATCH v2 4/7] vhost-scsi: Drop device mutex use in vhost_scsi_do_plug
  2023-03-21  2:06 [PATCH v2 0/7] vhost-scsi: Fix crashes and management op hangs Mike Christie
                   ` (2 preceding siblings ...)
  2023-03-21  2:06 ` [PATCH v2 3/7] vhost-scsi: Delay releasing our refcount on the tpg Mike Christie
@ 2023-03-21  2:06 ` Mike Christie
  2023-03-21  2:06 ` [PATCH v2 5/7] vhost-scsi: Check for a cleared backend before queueing an event Mike Christie
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Mike Christie @ 2023-03-21  2:06 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, jasowang, mst, sgarzare,
	virtualization
  Cc: Mike Christie

We don't need the device mutex in vhost_scsi_do_plug because:
1. we have the vhost_scsi_mutex so the tpg->vhost_scsi pointer will not
change on us and the vhost_scsi can't be freed from under us if it was
set.
2. vhost_scsi_clear_endpoint will stop the virtqueues and flush them while
holding the vhost_scsi_mutex so we know once vhost_scsi_clear_endpoint
has completed that vhost_scsi_do_plug can't send new events and any
queued ones have completed.

So this patch drops the device mutex use in vhost_scsi_do_plug.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 502d6803df0b..c945136ecf18 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -2003,8 +2003,6 @@ vhost_scsi_do_plug(struct vhost_scsi_tpg *tpg,
 	if (!vs)
 		return;
 
-	mutex_lock(&vs->dev.mutex);
-
 	if (plug)
 		reason = VIRTIO_SCSI_EVT_RESET_RESCAN;
 	else
@@ -2016,7 +2014,6 @@ vhost_scsi_do_plug(struct vhost_scsi_tpg *tpg,
 		vhost_scsi_send_evt(vs, tpg, lun,
 				   VIRTIO_SCSI_T_TRANSPORT_RESET, reason);
 	mutex_unlock(&vq->mutex);
-	mutex_unlock(&vs->dev.mutex);
 }
 
 static void vhost_scsi_hotplug(struct vhost_scsi_tpg *tpg, struct se_lun *lun)
-- 
2.25.1



* [PATCH v2 5/7] vhost-scsi: Check for a cleared backend before queueing an event
  2023-03-21  2:06 [PATCH v2 0/7] vhost-scsi: Fix crashes and management op hangs Mike Christie
                   ` (3 preceding siblings ...)
  2023-03-21  2:06 ` [PATCH v2 4/7] vhost-scsi: Drop device mutex use in vhost_scsi_do_plug Mike Christie
@ 2023-03-21  2:06 ` Mike Christie
  2023-03-21  2:06 ` [PATCH v2 6/7] vhost-scsi: Drop vhost_scsi_mutex use in port callouts Mike Christie
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Mike Christie @ 2023-03-21  2:06 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, jasowang, mst, sgarzare,
	virtualization
  Cc: Mike Christie

We currently hold the vhost_scsi_mutex while clearing the endpoint and
while performing vhost_scsi_do_plug, so tpg->vhost_scsi can't be freed
from under us, and to make sure anything queued is handled by the flush
call in vhost_scsi_clear_endpoint.

This patch removes the need for the vhost_scsi_mutex for the latter
case. In the next patches we won't hold the vhost_scsi_mutex while
flushing, so this patch adds a check for whether the virtqueue's backend
has been cleared by vhost_scsi_clear_endpoint. We then know that once
vhost_scsi_clear_endpoint has cleared the backend, no new events will be
queued, and the flush after the vhost_vq_set_backend(vq, NULL) call will
see everything that was queued up to that point. So the flush will then
handle all events without the need for the vhost_scsi_mutex.
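
Put together with the clear-endpoint side from the earlier patch in this
set, the ordering relied on above looks like this (a sketch combining the
two sides, not a literal excerpt):

	/* vhost_scsi_clear_endpoint side: */
	mutex_lock(&vq->mutex);
	vhost_vq_set_backend(vq, NULL);	/* new events refuse to queue after this */
	mutex_unlock(&vq->mutex);
	vhost_scsi_flush(vs);		/* handles everything queued before the clear */

	/* vhost_scsi_do_plug side (this patch): */
	mutex_lock(&vq->mutex);
	if (vhost_vq_get_backend(vq) &&
	    vhost_has_feature(vq, VIRTIO_SCSI_F_HOTPLUG))
		vhost_scsi_send_evt(vs, tpg, lun,
				    VIRTIO_SCSI_T_TRANSPORT_RESET, reason);
	mutex_unlock(&vq->mutex);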

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index c945136ecf18..ba8097fcea43 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -2010,9 +2010,17 @@ vhost_scsi_do_plug(struct vhost_scsi_tpg *tpg,
 
 	vq = &vs->vqs[VHOST_SCSI_VQ_EVT].vq;
 	mutex_lock(&vq->mutex);
+	/*
+	 * We can't queue events if the backend has been cleared, because
+	 * we could end up queueing an event after the flush.
+	 */
+	if (!vhost_vq_get_backend(vq))
+		goto unlock;
+
 	if (vhost_has_feature(vq, VIRTIO_SCSI_F_HOTPLUG))
 		vhost_scsi_send_evt(vs, tpg, lun,
 				   VIRTIO_SCSI_T_TRANSPORT_RESET, reason);
+unlock:
 	mutex_unlock(&vq->mutex);
 }
 
-- 
2.25.1



* [PATCH v2 6/7] vhost-scsi: Drop vhost_scsi_mutex use in port callouts
  2023-03-21  2:06 [PATCH v2 0/7] vhost-scsi: Fix crashes and management op hangs Mike Christie
                   ` (4 preceding siblings ...)
  2023-03-21  2:06 ` [PATCH v2 5/7] vhost-scsi: Check for a cleared backend before queueing an event Mike Christie
@ 2023-03-21  2:06 ` Mike Christie
  2023-03-21  2:06 ` [PATCH v2 7/7] vhost-scsi: Reduce vhost_scsi_mutex use Mike Christie
  2023-03-21  2:29 ` [PATCH v2 0/7] vhost-scsi: Fix crashes and management op hangs michael.christie
  7 siblings, 0 replies; 11+ messages in thread
From: Mike Christie @ 2023-03-21  2:06 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, jasowang, mst, sgarzare,
	virtualization
  Cc: Mike Christie

We are using the vhost_scsi_mutex to make sure vhost_scsi_port_link and
vhost_scsi_port_unlink see whether vhost_scsi_clear_endpoint has cleared
tpg->vhost_scsi, and so that it can't be freed while they are using it.

However, we currently set the tpg->vhost_scsi pointer while holding
tv_tpg_mutex, so we can just hold that while calling
vhost_scsi_hotplug/hotunplug. We then don't need to take the
vhost_scsi_mutex, which vhost_scsi_clear_endpoint may be holding while it
does a flush, and which could otherwise force the LUN map/unmap to wait
on another device's flush.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index ba8097fcea43..d4372a4aff49 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -2040,15 +2040,10 @@ static int vhost_scsi_port_link(struct se_portal_group *se_tpg,
 	struct vhost_scsi_tpg *tpg = container_of(se_tpg,
 				struct vhost_scsi_tpg, se_tpg);
 
-	mutex_lock(&vhost_scsi_mutex);
-
 	mutex_lock(&tpg->tv_tpg_mutex);
 	tpg->tv_tpg_port_count++;
-	mutex_unlock(&tpg->tv_tpg_mutex);
-
 	vhost_scsi_hotplug(tpg, lun);
-
-	mutex_unlock(&vhost_scsi_mutex);
+	mutex_unlock(&tpg->tv_tpg_mutex);
 
 	return 0;
 }
@@ -2059,15 +2054,10 @@ static void vhost_scsi_port_unlink(struct se_portal_group *se_tpg,
 	struct vhost_scsi_tpg *tpg = container_of(se_tpg,
 				struct vhost_scsi_tpg, se_tpg);
 
-	mutex_lock(&vhost_scsi_mutex);
-
 	mutex_lock(&tpg->tv_tpg_mutex);
 	tpg->tv_tpg_port_count--;
-	mutex_unlock(&tpg->tv_tpg_mutex);
-
 	vhost_scsi_hotunplug(tpg, lun);
-
-	mutex_unlock(&vhost_scsi_mutex);
+	mutex_unlock(&tpg->tv_tpg_mutex);
 }
 
 static ssize_t vhost_scsi_tpg_attrib_fabric_prot_type_store(
-- 
2.25.1



* [PATCH v2 7/7] vhost-scsi: Reduce vhost_scsi_mutex use
  2023-03-21  2:06 [PATCH v2 0/7] vhost-scsi: Fix crashes and management op hangs Mike Christie
                   ` (5 preceding siblings ...)
  2023-03-21  2:06 ` [PATCH v2 6/7] vhost-scsi: Drop vhost_scsi_mutex use in port callouts Mike Christie
@ 2023-03-21  2:06 ` Mike Christie
  2023-03-21  2:29 ` [PATCH v2 0/7] vhost-scsi: Fix crashes and management op hangs michael.christie
  7 siblings, 0 replies; 11+ messages in thread
From: Mike Christie @ 2023-03-21  2:06 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, jasowang, mst, sgarzare,
	virtualization
  Cc: Mike Christie

We no longer need to hold the vhost_scsi_mutex the entire time we
set/clear the endpoint. The tv_tpg_mutex handles tpg accesses not related
to the tpg list, the port link/unlink functions use the tv_tpg_mutex while
accessing the tpg->vhost_scsi pointer, vhost_scsi_do_plug will no longer
queue events after the virtqueue's backend has been cleared and flushed,
and we don't drop our refcount on the tpg until after we have stopped new
cmds and waited for outstanding cmds to complete.

This moves the vhost_scsi_mutex use to its documented purpose of
protecting the tpg list. We then don't need to hold it while a flush is
being performed, which would cause other devices' vhost_scsi_set_endpoint
and vhost_scsi_make_tpg/vhost_scsi_drop_tpg calls to have to wait on a
flakey device.
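
The resulting pattern in vhost_scsi_set_endpoint, condensed from the diff
below (not a literal excerpt), keeps the global mutex only around the tpg
list walk:

	mutex_lock(&vs->dev.mutex);		/* per-device state */

	mutex_lock(&vhost_scsi_mutex);		/* only for the tpg list walk */
	list_for_each_entry(tpg, &vhost_scsi_list, tv_tpg_list) {
		mutex_lock(&tpg->tv_tpg_mutex);
		/* match the wwpn and take refs on the tpg ... */
		mutex_unlock(&tpg->tv_tpg_mutex);
	}
	mutex_unlock(&vhost_scsi_mutex);

	/* vq setup and any flushing now run without the global mutex held */
	mutex_unlock(&vs->dev.mutex);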

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index d4372a4aff49..3b0b556c57ef 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -229,7 +229,10 @@ struct vhost_scsi_ctx {
 	struct iov_iter out_iter;
 };
 
-/* Global spinlock to protect vhost_scsi TPG list for vhost IOCTL access */
+/*
+ * Global mutex to protect vhost_scsi TPG list for vhost IOCTLs and LIO
+ * configfs management operations.
+ */
 static DEFINE_MUTEX(vhost_scsi_mutex);
 static LIST_HEAD(vhost_scsi_list);
 
@@ -1526,7 +1529,7 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
  * vhost_scsi_tpg with an active struct vhost_scsi_nexus
  *
  *  The lock nesting rule is:
- *    vhost_scsi_mutex -> vs->dev.mutex -> tpg->tv_tpg_mutex -> vq->mutex
+ *    vs->dev.mutex -> vhost_scsi_mutex -> tpg->tv_tpg_mutex -> vq->mutex
  */
 static int
 vhost_scsi_set_endpoint(struct vhost_scsi *vs,
@@ -1540,7 +1543,6 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
 	int index, ret, i, len;
 	bool match = false;
 
-	mutex_lock(&vhost_scsi_mutex);
 	mutex_lock(&vs->dev.mutex);
 
 	/* Verify that ring has been setup correctly. */
@@ -1561,6 +1563,7 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
 	if (vs->vs_tpg)
 		memcpy(vs_tpg, vs->vs_tpg, len);
 
+	mutex_lock(&vhost_scsi_mutex);
 	list_for_each_entry(tpg, &vhost_scsi_list, tv_tpg_list) {
 		mutex_lock(&tpg->tv_tpg_mutex);
 		if (!tpg->tpg_nexus) {
@@ -1576,6 +1579,7 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
 		if (!strcmp(tv_tport->tport_name, t->vhost_wwpn)) {
 			if (vs->vs_tpg && vs->vs_tpg[tpg->tport_tpgt]) {
 				mutex_unlock(&tpg->tv_tpg_mutex);
+				mutex_unlock(&vhost_scsi_mutex);
 				ret = -EEXIST;
 				goto undepend;
 			}
@@ -1590,6 +1594,7 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
 			if (ret) {
 				pr_warn("target_depend_item() failed: %d\n", ret);
 				mutex_unlock(&tpg->tv_tpg_mutex);
+				mutex_unlock(&vhost_scsi_mutex);
 				goto undepend;
 			}
 			tpg->tv_tpg_vhost_count++;
@@ -1599,6 +1604,7 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
 		}
 		mutex_unlock(&tpg->tv_tpg_mutex);
 	}
+	mutex_unlock(&vhost_scsi_mutex);
 
 	if (match) {
 		memcpy(vs->vs_vhost_wwpn, t->vhost_wwpn,
@@ -1654,7 +1660,6 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
 	kfree(vs_tpg);
 out:
 	mutex_unlock(&vs->dev.mutex);
-	mutex_unlock(&vhost_scsi_mutex);
 	return ret;
 }
 
@@ -1670,7 +1675,6 @@ vhost_scsi_clear_endpoint(struct vhost_scsi *vs,
 	int index, ret, i;
 	u8 target;
 
-	mutex_lock(&vhost_scsi_mutex);
 	mutex_lock(&vs->dev.mutex);
 	/* Verify that ring has been setup correctly. */
 	for (index = 0; index < vs->dev.nvqs; ++index) {
@@ -1757,12 +1761,10 @@ vhost_scsi_clear_endpoint(struct vhost_scsi *vs,
 	vs->vs_tpg = NULL;
 	WARN_ON(vs->vs_events_nr);
 	mutex_unlock(&vs->dev.mutex);
-	mutex_unlock(&vhost_scsi_mutex);
 	return 0;
 
 err_dev:
 	mutex_unlock(&vs->dev.mutex);
-	mutex_unlock(&vhost_scsi_mutex);
 	return ret;
 }
 
-- 
2.25.1



* Re: [PATCH v2 0/7] vhost-scsi: Fix crashes and management op hangs
  2023-03-21  2:06 [PATCH v2 0/7] vhost-scsi: Fix crashes and management op hangs Mike Christie
                   ` (6 preceding siblings ...)
  2023-03-21  2:06 ` [PATCH v2 7/7] vhost-scsi: Reduce vhost_scsi_mutex use Mike Christie
@ 2023-03-21  2:29 ` michael.christie
  2023-03-21  7:12   ` Michael S. Tsirkin
  7 siblings, 1 reply; 11+ messages in thread
From: michael.christie @ 2023-03-21  2:29 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, jasowang, mst, sgarzare,
	virtualization

On 3/20/23 9:06 PM, Mike Christie wrote:
> The following patches were made over Linus tree.

Hi Michael, I see you merged my first version of the patchset in your
vhost branch.

Do you want me to just send a followup patchset?

The major diff between the 2 versions:

1. I added the first 2 patches which fix some bugs in the existing code
I found while doing some code review and testing another LIO patchset
plus v1.

Note: The other day I posted that I thought the 3rd patch in v1 caused
the bugs but they were already in the code.

2. In v2 I made one of the patches not need the vhost device lock when
unmapping/mapping LUNs, so you can add new LUNs even if one LUN on the same
vhost_scsi device was hung.

Since these aren't regressions from the existing patches, I can just send those
as a followup patchset if that's preferred.


* Re: [PATCH v2 0/7] vhost-scsi: Fix crashes and management op hangs
  2023-03-21  2:29 ` [PATCH v2 0/7] vhost-scsi: Fix crashes and management op hangs michael.christie
@ 2023-03-21  7:12   ` Michael S. Tsirkin
  2023-03-21 17:25     ` Mike Christie
  0 siblings, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2023-03-21  7:12 UTC (permalink / raw)
  To: michael.christie
  Cc: target-devel, linux-scsi, stefanha, jasowang, sgarzare, virtualization

On Mon, Mar 20, 2023 at 09:29:50PM -0500, michael.christie@oracle.com wrote:
> On 3/20/23 9:06 PM, Mike Christie wrote:
> > The following patches were made over Linus tree.
> 
> Hi Michael, I see you merged my first version of the patchset in your
> vhost branch.
> 
> Do you want me to just send a followup patchset?
> 
> The major diff between the 2 versions:
> 
> 1. I added the first 2 patches which fix some bugs in the existing code
> I found while doing some code review and testing another LIO patchset
> plus v1.
> 
> Note: The other day I posted that I thought the 3rd patch in v1 caused
> the bugs but they were already in the code.
> 
> 2. In v2 I made one of the patches not need the vhost device lock when
> unmapping/mapping LUNs, so you can add new LUNs even if one LUN on the same
> vhost_scsi device was hung.
> 
> Since these aren't regressions from the existing patches, I can just send those
> as a followup patchset if that's preferred.

It's ok, I will drop v1 and replace it with v2.
Do you feel any of this is needed in this release?

-- 
MST



* Re: [PATCH v2 0/7] vhost-scsi: Fix crashes and management op hangs
  2023-03-21  7:12   ` Michael S. Tsirkin
@ 2023-03-21 17:25     ` Mike Christie
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Christie @ 2023-03-21 17:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: target-devel, linux-scsi, stefanha, jasowang, sgarzare, virtualization

On 3/21/23 2:12 AM, Michael S. Tsirkin wrote:
> On Mon, Mar 20, 2023 at 09:29:50PM -0500, michael.christie@oracle.com wrote:
>> On 3/20/23 9:06 PM, Mike Christie wrote:
>>> The following patches were made over Linus tree.
>>
>> Hi Michael, I see you merged my first version of the patchset in your
>> vhost branch.
>>
>> Do you want me to just send a followup patchset?
>>
>> The major diff between the 2 versions:
>>
>> 1. I added the first 2 patches which fix some bugs in the existing code
>> I found while doing some code review and testing another LIO patchset
>> plus v1.
>>
>> Note: The other day I posted that I thought the 3rd patch in v1 caused
>> the bugs but they were already in the code.
>>
>> 2. In v2 I made one of the patches not need the vhost device lock when
>> unmapping/mapping LUNs, so you can add new LUNs even if one LUN on the same
>> vhost_scsi device was hung.
>>
>> Since these aren't regressions from the existing patches, I can just send those
>> as a followup patchset if that's preferred.
> 
> It's ok, I will drop v1 and replace it with v2.
> Do you feel any of this is needed in this release?
> 

No. It can wait for 6.4. Thanks.


