Target-devel archive on lore.kernel.org
* [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support
@ 2020-10-22  0:34 Mike Christie
  2020-10-22  0:34 ` [PATCH 01/17] vhost scsi: add lun parser helper Mike Christie
                   ` (17 more replies)
  0 siblings, 18 replies; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:34 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

The following patches were made over Michael's vhost branch here:

https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/log/?h=vhost

They fix a couple of issues with vhost-scsi when we hit the 256 cmd limit
that result in the guest getting IO errors, add LUN reset support so
devices are not offlined during transient errors, allow us to manage
vhost-scsi IO with cgroups, and improve IOPs by up to 2X.

The following patches are a follow up to this post:
https://patchwork.kernel.org/project/target-devel/cover/1600712588-9514-1-git-send-email-michael.christie@oracle.com/
which originally was fixing how vhost-scsi handled cmds so we would
not get IO errors when sending more than 256 cmds.

In that patchset I needed to detect if a vq was in use and for this
patch:
https://patchwork.kernel.org/project/target-devel/patch/1600712588-9514-3-git-send-email-michael.christie@oracle.com/
It was suggested to add support for VHOST_RING_ENABLE. While doing
that though I hit a couple problems:

1. The patches moved how vhost-scsi allocated cmds from per lio
session to per vhost vq. To support both VHOST_RING_ENABLE and
userspace that didn't support it, I would have to keep around the
old per session/device cmd allocator/completion and then also maintain
the new code. Or, I would still have to use this patch
patchwork.kernel.org/cover/11790763/ for the compat case, so
adding the new ioctl would not help much.

2. For vhost-scsi I also wanted to avoid allocating iovecs
for 128 vqs when we normally use only a couple. To do this, I needed
something similar to #1, but the problem is that the VHOST_RING_ENABLE
call would come too late.

To try and balance #1 and #2, these patches just allow vhost-scsi
to set up a vq when userspace starts to configure it. This allows the
driver to fully set up only what is used (we still waste some memory to
support older setups but do not have to preallocate everything like
before), and I do not need to maintain 2 code paths.

V3:
- fix compile errors
- fix possible crash where cmd could be freed while adding it to
completion list
- fix issue where we added the worker thread to the blk cgroup but
the blk IO was submitted by a driver workqueue.

V2:
- fix use before set cpu var errors
- drop vhost_vq_is_setup
- include patches to do a worker thread per scsi IO vq

* [PATCH 01/17] vhost scsi: add lun parser helper
  2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
@ 2020-10-22  0:34 ` Mike Christie
  2020-10-26  3:33   ` Jason Wang
  2020-10-22  0:34 ` [PATCH 02/17] vhost: remove work arg from vhost_work_flush Mike Christie
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:34 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

Move code to parse lun from req's lun_buf to helper, so tmf code
can use it in the next patch.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 drivers/vhost/scsi.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index b22adf0..0ea78d0 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -907,6 +907,11 @@ static void vhost_scsi_submission_work(struct work_struct *work)
 	return ret;
 }
 
+static u16 vhost_buf_to_lun(u8 *lun_buf)
+{
+	return ((lun_buf[2] << 8) | lun_buf[3]) & 0x3FFF;
+}
+
 static void
 vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 {
@@ -1045,12 +1050,12 @@ static void vhost_scsi_submission_work(struct work_struct *work)
 			tag = vhost64_to_cpu(vq, v_req_pi.tag);
 			task_attr = v_req_pi.task_attr;
 			cdb = &v_req_pi.cdb[0];
-			lun = ((v_req_pi.lun[2] << 8) | v_req_pi.lun[3]) & 0x3FFF;
+			lun = vhost_buf_to_lun(v_req_pi.lun);
 		} else {
 			tag = vhost64_to_cpu(vq, v_req.tag);
 			task_attr = v_req.task_attr;
 			cdb = &v_req.cdb[0];
-			lun = ((v_req.lun[2] << 8) | v_req.lun[3]) & 0x3FFF;
+			lun = vhost_buf_to_lun(v_req.lun);
 		}
 		/*
 		 * Check that the received CDB size does not exceeded our
-- 
1.8.3.1

* [PATCH 02/17] vhost: remove work arg from vhost_work_flush
  2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
  2020-10-22  0:34 ` [PATCH 01/17] vhost scsi: add lun parser helper Mike Christie
@ 2020-10-22  0:34 ` Mike Christie
  2020-10-22  0:51   ` Chaitanya Kulkarni
  2020-10-22  0:34 ` [PATCH 03/17] vhost net: use goto error handling in open Mike Christie
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:34 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

vhost_work_flush doesn't do anything with the work arg. This patch drops
it and then renames vhost_work_flush to vhost_work_dev_flush to reflect
that the function flushes all the works in the dev and not just a
specific queue or work item.
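
To show why the work argument carries no information, here is a small
single-threaded model of a device-wide flush. It is only a sketch: the
demo_* names are hypothetical, and the real function queues a flush work
item to the vhost worker and waits for its completion, whereas this
model simply runs the queued items inline in FIFO order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for vhost_work and vhost_dev. */
struct demo_work {
	void (*fn)(struct demo_work *work);
	struct demo_work *next;
};

struct demo_dev {
	struct demo_work *head;
	struct demo_work *tail;
};

static int demo_ran;	/* counts callbacks that have run */

static void demo_count(struct demo_work *work)
{
	(void)work;
	demo_ran++;
}

/* Append a work item to the device's FIFO. */
static void demo_work_queue(struct demo_dev *dev, struct demo_work *work)
{
	assert(work->fn != NULL);
	work->next = NULL;
	if (dev->tail)
		dev->tail->next = work;
	else
		dev->head = work;
	dev->tail = work;
}

/* Run everything queued so far, in order. No work argument is needed. */
static void demo_work_dev_flush(struct demo_dev *dev)
{
	struct demo_work *work;

	while ((work = dev->head) != NULL) {
		dev->head = work->next;
		if (!dev->head)
			dev->tail = NULL;
		work->fn(work);
	}
}
```

Queueing two items that use demo_count and then calling
demo_work_dev_flush() runs both, whichever item the caller cared about.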

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/scsi.c  | 4 ++--
 drivers/vhost/vhost.c | 8 ++++----
 drivers/vhost/vhost.h | 2 +-
 drivers/vhost/vsock.c | 2 +-
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 0ea78d0..86617bb 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1370,8 +1370,8 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 	/* Flush both the vhost poll and vhost work */
 	for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
 		vhost_scsi_flush_vq(vs, i);
-	vhost_work_flush(&vs->dev, &vs->vs_completion_work);
-	vhost_work_flush(&vs->dev, &vs->vs_event_work);
+	vhost_work_dev_flush(&vs->dev);
+	vhost_work_dev_flush(&vs->dev);
 
 	/* Wait for all reqs issued before the flush to be finished */
 	for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5c835a2..6818f71 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -231,7 +231,7 @@ void vhost_poll_stop(struct vhost_poll *poll)
 }
 EXPORT_SYMBOL_GPL(vhost_poll_stop);
 
-void vhost_work_flush(struct vhost_dev *dev, struct vhost_work *work)
+void vhost_work_dev_flush(struct vhost_dev *dev)
 {
 	struct vhost_flush_struct flush;
 
@@ -243,13 +243,13 @@ void vhost_work_flush(struct vhost_dev *dev, struct vhost_work *work)
 		wait_for_completion(&flush.wait_event);
 	}
 }
-EXPORT_SYMBOL_GPL(vhost_work_flush);
+EXPORT_SYMBOL_GPL(vhost_work_dev_flush);
 
 /* Flush any work that has been scheduled. When calling this, don't hold any
  * locks that are also used by the callback. */
 void vhost_poll_flush(struct vhost_poll *poll)
 {
-	vhost_work_flush(poll->dev, &poll->work);
+	vhost_work_dev_flush(poll->dev);
 }
 EXPORT_SYMBOL_GPL(vhost_poll_flush);
 
@@ -532,7 +532,7 @@ static int vhost_attach_cgroups(struct vhost_dev *dev)
 	attach.owner = current;
 	vhost_work_init(&attach.work, vhost_attach_cgroups_work);
 	vhost_work_queue(dev, &attach.work);
-	vhost_work_flush(dev, &attach.work);
+	vhost_work_dev_flush(dev);
 	return attach.ret;
 }
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index e016cd3..1365f33 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -46,7 +46,7 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
 void vhost_poll_stop(struct vhost_poll *poll);
 void vhost_poll_flush(struct vhost_poll *poll);
 void vhost_poll_queue(struct vhost_poll *poll);
-void vhost_work_flush(struct vhost_dev *dev, struct vhost_work *work);
+void vhost_work_dev_flush(struct vhost_dev *dev);
 long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp);
 
 struct vhost_log {
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index a483cec..f40205f 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -652,7 +652,7 @@ static void vhost_vsock_flush(struct vhost_vsock *vsock)
 	for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++)
 		if (vsock->vqs[i].handle_kick)
 			vhost_poll_flush(&vsock->vqs[i].poll);
-	vhost_work_flush(&vsock->dev, &vsock->send_pkt_work);
+	vhost_work_dev_flush(&vsock->dev);
 }
 
 static void vhost_vsock_reset_orphans(struct sock *sk)
-- 
1.8.3.1

* [PATCH 03/17] vhost net: use goto error handling in open
  2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
  2020-10-22  0:34 ` [PATCH 01/17] vhost scsi: add lun parser helper Mike Christie
  2020-10-22  0:34 ` [PATCH 02/17] vhost: remove work arg from vhost_work_flush Mike Christie
@ 2020-10-22  0:34 ` Mike Christie
  2020-10-22  0:45   ` Chaitanya Kulkarni
  2020-10-26  3:34   ` Jason Wang
  2020-10-22  0:34 ` [PATCH 04/17] vhost: prep vhost_dev_init users to handle failures Mike Christie
                   ` (14 subsequent siblings)
  17 siblings, 2 replies; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:34 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

In the next patches vhost_dev_init will be able to fail. This patch has
vhost_net_open use goto error handling, as is done in the other vhost
code, to make vhost_dev_init failures easier to handle and extend in
the future.
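
For readers less familiar with the pattern, a minimal userspace sketch
of goto-based unwinding follows. Everything in it is hypothetical
(struct demo_dev, demo_open, and the fail_at knob that forces a chosen
allocation to fail); it only shows how each label frees what was
allocated before the failing step, in reverse order.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical device with three separately allocated parts. */
struct demo_dev {
	int *vqs;
	void **queue;
	int *xdp;
};

/*
 * Allocate all three parts or none of them. fail_at (1..3) forces that
 * allocation to fail for demonstration; 0 means no forced failure.
 */
static int demo_open(struct demo_dev *d, int fail_at)
{
	d->vqs = NULL;
	d->queue = NULL;
	d->xdp = NULL;

	d->vqs = fail_at == 1 ? NULL : malloc(4 * sizeof(*d->vqs));
	if (!d->vqs)
		goto err_vqs;

	d->queue = fail_at == 2 ? NULL : malloc(4 * sizeof(*d->queue));
	if (!d->queue)
		goto err_queue;

	d->xdp = fail_at == 3 ? NULL : malloc(4 * sizeof(*d->xdp));
	if (!d->xdp)
		goto err_xdp;

	return 0;

err_xdp:
	free(d->queue);
	d->queue = NULL;
err_queue:
	free(d->vqs);
	d->vqs = NULL;
err_vqs:
	return -ENOMEM;
}

static void demo_release(struct demo_dev *d)
{
	assert(d != NULL);
	free(d->xdp);
	free(d->queue);
	free(d->vqs);
}
```

Adding a fourth resource then only needs one more label at the top of
the unwind chain, which is the property the later patches rely on.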

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/net.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 531a00d..831d824 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1286,27 +1286,18 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 	if (!n)
 		return -ENOMEM;
 	vqs = kmalloc_array(VHOST_NET_VQ_MAX, sizeof(*vqs), GFP_KERNEL);
-	if (!vqs) {
-		kvfree(n);
-		return -ENOMEM;
-	}
+	if (!vqs)
+		goto err_vqs;
 
 	queue = kmalloc_array(VHOST_NET_BATCH, sizeof(void *),
 			      GFP_KERNEL);
-	if (!queue) {
-		kfree(vqs);
-		kvfree(n);
-		return -ENOMEM;
-	}
+	if (!queue)
+		goto err_queue;
 	n->vqs[VHOST_NET_VQ_RX].rxq.queue = queue;
 
 	xdp = kmalloc_array(VHOST_NET_BATCH, sizeof(*xdp), GFP_KERNEL);
-	if (!xdp) {
-		kfree(vqs);
-		kvfree(n);
-		kfree(queue);
-		return -ENOMEM;
-	}
+	if (!xdp)
+		goto err_xdp;
 	n->vqs[VHOST_NET_VQ_TX].xdp = xdp;
 
 	dev = &n->dev;
@@ -1338,6 +1329,14 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 	n->refcnt_bias = 0;
 
 	return 0;
+
+err_xdp:
+	kfree(queue);
+err_queue:
+	kfree(vqs);
+err_vqs:
+	kvfree(n);
+	return -ENOMEM;
 }
 
 static struct socket *vhost_net_stop_vq(struct vhost_net *n,
-- 
1.8.3.1

* [PATCH 04/17] vhost: prep vhost_dev_init users to handle failures
  2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
                   ` (2 preceding siblings ...)
  2020-10-22  0:34 ` [PATCH 03/17] vhost net: use goto error handling in open Mike Christie
@ 2020-10-22  0:34 ` Mike Christie
  2020-10-22  5:22   ` kernel test robot
                     ` (3 more replies)
  2020-10-22  0:34 ` [PATCH 05/17] vhost: move vq iovec allocation to dev init time Mike Christie
                   ` (13 subsequent siblings)
  17 siblings, 4 replies; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:34 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

This is just a prep patch to get vhost_dev_init callers ready to handle
the next patch where the function can fail. In this patch vhost_dev_init
just returns 0, but I think it's easier to check for goto/error handling
errors separately from the next patch.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/net.c   | 11 +++++++----
 drivers/vhost/scsi.c  |  7 +++++--
 drivers/vhost/test.c  |  9 +++++++--
 drivers/vhost/vdpa.c  |  7 +++++--
 drivers/vhost/vhost.c | 14 ++++++++------
 drivers/vhost/vhost.h | 10 +++++-----
 drivers/vhost/vsock.c |  9 ++++++---
 7 files changed, 43 insertions(+), 24 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 831d824..fd30b53 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1316,10 +1316,11 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 		n->vqs[i].rx_ring = NULL;
 		vhost_net_buf_init(&n->vqs[i].rxq);
 	}
-	vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
-		       UIO_MAXIOV + VHOST_NET_BATCH,
-		       VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT, true,
-		       NULL);
+	if (vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
+			   UIO_MAXIOV + VHOST_NET_BATCH,
+			   VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT, true,
+			   NULL))
+		goto err_dev_init;
 
 	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, EPOLLOUT, dev);
 	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, EPOLLIN, dev);
@@ -1330,6 +1331,8 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 
 	return 0;
 
+err_dev_init:
+	kfree(xdp);
 err_xdp:
 	kfree(queue);
 err_queue:
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 86617bb..63ba363 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1632,14 +1632,17 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 		vqs[i] = &vs->vqs[i].vq;
 		vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick;
 	}
-	vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, UIO_MAXIOV,
-		       VHOST_SCSI_WEIGHT, 0, true, NULL);
+	if (vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, UIO_MAXIOV,
+			   VHOST_SCSI_WEIGHT, 0, true, NULL))
+		goto err_dev_init;
 
 	vhost_scsi_init_inflight(vs, NULL);
 
 	f->private_data = vs;
 	return 0;
 
+err_dev_init:
+	kfree(vqs);
 err_vqs:
 	kvfree(vs);
 err_vs:
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index a09dedc..c255ae5 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -119,12 +119,17 @@ static int vhost_test_open(struct inode *inode, struct file *f)
 	dev = &n->dev;
 	vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ];
 	n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
-	vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV,
-		       VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT, true, NULL);
+	if (vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV,
+			   VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT, true, NULL))
+		goto err_dev_init;
 
 	f->private_data = n;
 
 	return 0;
+
+err_dev_init:
+	kfree(vqs);
+	return -ENOMEM;
 }
 
 static void *vhost_test_stop_vq(struct vhost_test *n,
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index a2dbc85..9c8a686 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -810,8 +810,10 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep)
 		vqs[i] = &v->vqs[i];
 		vqs[i]->handle_kick = handle_vq_kick;
 	}
-	vhost_dev_init(dev, vqs, nvqs, 0, 0, 0, false,
-		       vhost_vdpa_process_iotlb_msg);
+	r = vhost_dev_init(dev, vqs, nvqs, 0, 0, 0, false,
+			   vhost_vdpa_process_iotlb_msg);
+	if (r)
+		goto err_dev_init;
 
 	dev->iotlb = vhost_iotlb_alloc(0, 0);
 	if (!dev->iotlb) {
@@ -829,6 +831,7 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep)
 
 err_init_iotlb:
 	vhost_dev_cleanup(&v->vdev);
+err_dev_init:
 	kfree(vqs);
 err:
 	atomic_dec(&v->opened);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 6818f71..b35229e 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -458,12 +458,12 @@ static size_t vhost_get_desc_size(struct vhost_virtqueue *vq,
 	return sizeof(*vq->desc) * num;
 }
 
-void vhost_dev_init(struct vhost_dev *dev,
-		    struct vhost_virtqueue **vqs, int nvqs,
-		    int iov_limit, int weight, int byte_weight,
-		    bool use_worker,
-		    int (*msg_handler)(struct vhost_dev *dev,
-				       struct vhost_iotlb_msg *msg))
+int vhost_dev_init(struct vhost_dev *dev,
+		   struct vhost_virtqueue **vqs, int nvqs,
+		   int iov_limit, int weight, int byte_weight,
+		   bool use_worker,
+		   int (*msg_handler)(struct vhost_dev *dev,
+				      struct vhost_iotlb_msg *msg))
 {
 	struct vhost_virtqueue *vq;
 	int i;
@@ -500,6 +500,8 @@ void vhost_dev_init(struct vhost_dev *dev,
 			vhost_poll_init(&vq->poll, vq->handle_kick,
 					EPOLLIN, dev);
 	}
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(vhost_dev_init);
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 1365f33..9ad34b1 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -167,11 +167,11 @@ struct vhost_dev {
 };
 
 bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len);
-void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs,
-		    int nvqs, int iov_limit, int weight, int byte_weight,
-		    bool use_worker,
-		    int (*msg_handler)(struct vhost_dev *dev,
-				       struct vhost_iotlb_msg *msg));
+int vhost_dev_init(struct vhost_dev *dev, struct vhost_virtqueue **vqs,
+		   int nvqs, int iov_limit, int weight, int byte_weight,
+		   bool use_worker,
+		   int (*msg_handler)(struct vhost_dev *dev,
+				      struct vhost_iotlb_msg *msg));
 long vhost_dev_set_owner(struct vhost_dev *dev);
 bool vhost_dev_has_owner(struct vhost_dev *dev);
 long vhost_dev_check_owner(struct vhost_dev *);
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index f40205f..a1a35e1 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -630,9 +630,10 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 	vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;
 	vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
 
-	vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
-		       UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
-		       VHOST_VSOCK_WEIGHT, true, NULL);
+	if (vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
+			   UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
+			   VHOST_VSOCK_WEIGHT, true, NULL))
+		goto err_dev_init;
 
 	file->private_data = vsock;
 	spin_lock_init(&vsock->send_pkt_list_lock);
@@ -640,6 +641,8 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 	vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work);
 	return 0;
 
+err_dev_init:
+	kfree(vqs);
 out:
 	vhost_vsock_free(vsock);
 	return ret;
-- 
1.8.3.1

* [PATCH 05/17] vhost: move vq iovec allocation to dev init time
  2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
                   ` (3 preceding siblings ...)
  2020-10-22  0:34 ` [PATCH 04/17] vhost: prep vhost_dev_init users to handle failures Mike Christie
@ 2020-10-22  0:34 ` Mike Christie
  2020-10-22  0:34 ` [PATCH 06/17] vhost: support delayed vq creation Mike Christie
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:34 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

The next patches allow us to create vqs on demand after vhost_dev_init
and vhost_dev_set_owner have been called. For vhost-scsi we don't
know the number of vqs we really want until the vring/vq setup
operations have started up. For other devices we know the number of vqs
at vhost_dev_init time, so for those devs we init the vq and allocate
the needed iovecs. For vhost-scsi we will do it later when userspace has
instructed us to create a vq.
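
The shape of the per-vq allocation can be sketched in userspace as
below. This is an illustration under assumptions, not the kernel code:
the demo_* names, the element sizes, DEMO_MAXIOV (standing in for
UIO_MAXIOV) and the fail_at knob are all hypothetical. It shows the two
points that matter: log/heads are skipped when iov_limit is 0, and a
failure on vq i frees the vqs that were already set up.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define DEMO_MAXIOV 1024	/* stand-in for UIO_MAXIOV */

struct demo_vq {
	void *indirect;
	void *log;
	void *heads;
};

static void demo_vq_free_iovecs(struct demo_vq *vq)
{
	free(vq->indirect);
	vq->indirect = NULL;
	free(vq->log);
	vq->log = NULL;
	free(vq->heads);
	vq->heads = NULL;
}

/* Allocate one vq's arrays; log and heads are skipped when iov_limit is 0. */
static int demo_vq_alloc_iovecs(struct demo_vq *vq, int iov_limit)
{
	assert(iov_limit >= 0);
	vq->indirect = calloc(DEMO_MAXIOV, 16);
	if (!vq->indirect)
		return -ENOMEM;

	if (!iov_limit)
		return 0;

	vq->log = calloc((size_t)iov_limit, 16);
	vq->heads = calloc((size_t)iov_limit, 8);
	if (!vq->log || !vq->heads) {
		demo_vq_free_iovecs(vq);
		return -ENOMEM;
	}
	return 0;
}

/* fail_at is a demo knob: index of the vq whose setup fails, or -1. */
static int demo_dev_init(struct demo_vq *vqs, int nvqs, int iov_limit,
			 int fail_at)
{
	int i;

	for (i = 0; i < nvqs; ++i) {
		vqs[i].indirect = vqs[i].log = vqs[i].heads = NULL;
		if (i == fail_at || demo_vq_alloc_iovecs(&vqs[i], iov_limit))
			goto err_vq_init;
	}
	return 0;

err_vq_init:
	/* Unwind only the vqs that were fully set up before the failure. */
	for (--i; i >= 0; --i)
		demo_vq_free_iovecs(&vqs[i]);
	return -ENOMEM;
}
```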

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/vhost.c | 71 +++++++++++++++++++++++++++------------------------
 1 file changed, 38 insertions(+), 33 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index b35229e..a4a4450 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -383,29 +383,27 @@ static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq)
 	vq->heads = NULL;
 }
 
-/* Helper to allocate iovec buffers for all vqs. */
-static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
+static int vhost_vq_alloc_iovecs(struct vhost_dev *dev,
+				 struct vhost_virtqueue *vq)
 {
-	struct vhost_virtqueue *vq;
-	int i;
+	vq->indirect = kmalloc_array(UIO_MAXIOV, sizeof(*vq->indirect),
+				     GFP_KERNEL);
+	if (!vq->indirect)
+		return -ENOMEM;
+
+	if (!dev->iov_limit)
+		return 0;
+
+	vq->log = kmalloc_array(dev->iov_limit, sizeof(*vq->log), GFP_KERNEL);
+	vq->heads = kmalloc_array(dev->iov_limit, sizeof(*vq->heads),
+				  GFP_KERNEL);
+	if (!vq->log || !vq->heads)
+		goto err_nomem;
 
-	for (i = 0; i < dev->nvqs; ++i) {
-		vq = dev->vqs[i];
-		vq->indirect = kmalloc_array(UIO_MAXIOV,
-					     sizeof(*vq->indirect),
-					     GFP_KERNEL);
-		vq->log = kmalloc_array(dev->iov_limit, sizeof(*vq->log),
-					GFP_KERNEL);
-		vq->heads = kmalloc_array(dev->iov_limit, sizeof(*vq->heads),
-					  GFP_KERNEL);
-		if (!vq->indirect || !vq->log || !vq->heads)
-			goto err_nomem;
-	}
 	return 0;
 
 err_nomem:
-	for (; i >= 0; --i)
-		vhost_vq_free_iovecs(dev->vqs[i]);
+	vhost_vq_free_iovecs(vq);
 	return -ENOMEM;
 }
 
@@ -458,6 +456,21 @@ static size_t vhost_get_desc_size(struct vhost_virtqueue *vq,
 	return sizeof(*vq->desc) * num;
 }
 
+static int vhost_vq_init(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+{
+	vq->log = NULL;
+	vq->indirect = NULL;
+	vq->heads = NULL;
+	vq->dev = dev;
+	mutex_init(&vq->mutex);
+	vhost_vq_reset(dev, vq);
+
+	if (vq->handle_kick)
+		vhost_poll_init(&vq->poll, vq->handle_kick, EPOLLIN, dev);
+
+	return vhost_vq_alloc_iovecs(dev, vq);
+}
+
 int vhost_dev_init(struct vhost_dev *dev,
 		   struct vhost_virtqueue **vqs, int nvqs,
 		   int iov_limit, int weight, int byte_weight,
@@ -465,7 +478,6 @@ int vhost_dev_init(struct vhost_dev *dev,
 		   int (*msg_handler)(struct vhost_dev *dev,
 				      struct vhost_iotlb_msg *msg))
 {
-	struct vhost_virtqueue *vq;
 	int i;
 
 	dev->vqs = vqs;
@@ -489,19 +501,16 @@ int vhost_dev_init(struct vhost_dev *dev,
 
 
 	for (i = 0; i < dev->nvqs; ++i) {
-		vq = dev->vqs[i];
-		vq->log = NULL;
-		vq->indirect = NULL;
-		vq->heads = NULL;
-		vq->dev = dev;
-		mutex_init(&vq->mutex);
-		vhost_vq_reset(dev, vq);
-		if (vq->handle_kick)
-			vhost_poll_init(&vq->poll, vq->handle_kick,
-					EPOLLIN, dev);
+		if (vhost_vq_init(dev, dev->vqs[i]))
+			goto err_vq_init;
 	}
 
 	return 0;
+
+err_vq_init:
+	for (--i; i >= 0; --i)
+		vhost_vq_free_iovecs(dev->vqs[i]);
+	return -ENOMEM;
 }
 EXPORT_SYMBOL_GPL(vhost_dev_init);
 
@@ -606,10 +615,6 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 			goto err_cgroup;
 	}
 
-	err = vhost_dev_alloc_iovecs(dev);
-	if (err)
-		goto err_cgroup;
-
 	return 0;
 err_cgroup:
 	if (dev->worker) {
-- 
1.8.3.1

* [PATCH 06/17] vhost: support delayed vq creation
  2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
                   ` (4 preceding siblings ...)
  2020-10-22  0:34 ` [PATCH 05/17] vhost: move vq iovec allocation to dev init time Mike Christie
@ 2020-10-22  0:34 ` Mike Christie
  2020-10-22  0:34 ` [PATCH 07/17] vhost scsi: support delayed IO " Mike Christie
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:34 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

This allows vq creation to be done when it's first accessed by
userspace. vhost-scsi doesn't know how many queues the user requested
until they are first set up, and we don't want to allocate resources
like the iovecs for 128 vqs when we are only using 1 or 2 most of the
time. In the next patches, vhost-scsi will also switch to preallocating
cmds per vq instead of per lio session and we don't want to allocate
them for 127 extra vqs if they are not in use.

With this patch when a driver calls vhost_dev_init they pass in the
number of vqs that they know they need and the max they can support.
This patch has all the drivers pass in the same value for both the
initial number of vqs and the max. The next patch will convert scsi.
The other drivers like net/vsock have their vqs hard coded in the
kernel or setup/discovered via other methods like with vdpa.
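
The nvqs/max_nvqs split can be modelled in userspace roughly as follows.
All names here are hypothetical (demo_dev, demo_vq_access, and so on),
the single malloc stands in for the per-vq iovec arrays, and the sketch
checks the index against max_nvqs before touching the array, which is a
choice made for the demo rather than a description of the ioctl path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_vq {
	bool initialized;
	void *iov;	/* stands in for the per-vq iovec arrays */
};

struct demo_dev {
	struct demo_vq *vqs;
	int nvqs;	/* vqs fully set up so far */
	int max_nvqs;	/* slots the driver can support */
};

/* Fully set up one vq and count it. */
static int demo_vq_init(struct demo_dev *dev, int idx)
{
	struct demo_vq *vq;

	if (idx < 0 || idx >= dev->max_nvqs)
		return -ENOBUFS;

	vq = &dev->vqs[idx];
	vq->iov = malloc(64);
	if (!vq->iov)
		return -ENOMEM;

	vq->initialized = true;
	dev->nvqs++;
	return 0;
}

/* Set up the first nvqs slots now; the rest stay cleared until used. */
static int demo_dev_init(struct demo_dev *dev, int nvqs, int max_nvqs)
{
	int i;

	assert(nvqs <= max_nvqs);
	dev->vqs = calloc((size_t)max_nvqs, sizeof(*dev->vqs));
	if (!dev->vqs)
		return -ENOMEM;
	dev->nvqs = 0;
	dev->max_nvqs = max_nvqs;

	for (i = 0; i < nvqs; ++i)
		if (demo_vq_init(dev, i))
			goto err_vq_init;
	return 0;

err_vq_init:
	for (--i; i >= 0; --i)
		free(dev->vqs[i].iov);
	free(dev->vqs);
	dev->vqs = NULL;
	return -ENOMEM;
}

/* First access to a slot creates the vq; later accesses are no-ops. */
static int demo_vq_access(struct demo_dev *dev, int idx)
{
	if (idx < 0 || idx >= dev->max_nvqs)
		return -ENOBUFS;
	if (!dev->vqs[idx].initialized)
		return demo_vq_init(dev, idx);
	return 0;
}

static void demo_dev_cleanup(struct demo_dev *dev)
{
	/* Walk max_nvqs, not nvqs: created slots need not be contiguous. */
	for (int i = 0; i < dev->max_nvqs; ++i)
		free(dev->vqs[i].iov);
	free(dev->vqs);
}
```

A device created with demo_dev_init(&dev, 2, 128) pays for two vqs up
front; touching slot 5 later creates a third, and slot 128 is rejected
with -ENOBUFS.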

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/net.c   |  2 +-
 drivers/vhost/scsi.c  |  4 +--
 drivers/vhost/test.c  |  5 ++--
 drivers/vhost/vdpa.c  |  2 +-
 drivers/vhost/vhost.c | 71 ++++++++++++++++++++++++++++++++++-----------------
 drivers/vhost/vhost.h |  7 +++--
 drivers/vhost/vsock.c | 11 ++++----
 7 files changed, 66 insertions(+), 36 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index fd30b53..fce46f0 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1316,7 +1316,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 		n->vqs[i].rx_ring = NULL;
 		vhost_net_buf_init(&n->vqs[i].rxq);
 	}
-	if (vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
+	if (vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX, VHOST_NET_VQ_MAX,
 			   UIO_MAXIOV + VHOST_NET_BATCH,
 			   VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT, true,
 			   NULL))
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 63ba363..5d412f1 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1632,8 +1632,8 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 		vqs[i] = &vs->vqs[i].vq;
 		vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick;
 	}
-	if (vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, UIO_MAXIOV,
-			   VHOST_SCSI_WEIGHT, 0, true, NULL))
+	if (vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, VHOST_SCSI_MAX_VQ,
+			   UIO_MAXIOV, VHOST_SCSI_WEIGHT, 0, true, NULL))
 		goto err_dev_init;
 
 	vhost_scsi_init_inflight(vs, NULL);
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index c255ae5..9d2bfa3 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -119,8 +119,9 @@ static int vhost_test_open(struct inode *inode, struct file *f)
 	dev = &n->dev;
 	vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ];
 	n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
-	if (vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV,
-			   VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT, true, NULL))
+	if (vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, VHOST_TEST_VQ_MAX,
+			   UIO_MAXIOV, VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT,
+			   true, NULL))
 		goto err_dev_init;
 
 	f->private_data = n;
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 9c8a686..313ff5a 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -810,7 +810,7 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep)
 		vqs[i] = &v->vqs[i];
 		vqs[i]->handle_kick = handle_vq_kick;
 	}
-	r = vhost_dev_init(dev, vqs, nvqs, 0, 0, 0, false,
+	r = vhost_dev_init(dev, vqs, nvqs, nvqs, 0, 0, 0, false,
 			   vhost_vdpa_process_iotlb_msg);
 	if (r)
 		goto err_dev_init;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a4a4450..2ca2e71 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -294,7 +294,7 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
 {
 	int i;
 
-	for (i = 0; i < d->nvqs; ++i)
+	for (i = 0; i < d->max_nvqs; ++i)
 		__vhost_vq_meta_reset(d->vqs[i]);
 }
 
@@ -331,6 +331,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vq->busyloop_timeout = 0;
 	vq->umem = NULL;
 	vq->iotlb = NULL;
+	vq->initialized = false;
 	vhost_vring_call_reset(&vq->call_ctx);
 	__vhost_vq_meta_reset(vq);
 }
@@ -411,7 +412,7 @@ static void vhost_dev_free_iovecs(struct vhost_dev *dev)
 {
 	int i;
 
-	for (i = 0; i < dev->nvqs; ++i)
+	for (i = 0; i < dev->max_nvqs; ++i)
 		vhost_vq_free_iovecs(dev->vqs[i]);
 }
 
@@ -456,7 +457,7 @@ static size_t vhost_get_desc_size(struct vhost_virtqueue *vq,
 	return sizeof(*vq->desc) * num;
 }
 
-static int vhost_vq_init(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+static void __vhost_vq_init(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
 	vq->log = NULL;
 	vq->indirect = NULL;
@@ -467,12 +468,29 @@ static int vhost_vq_init(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 
 	if (vq->handle_kick)
 		vhost_poll_init(&vq->poll, vq->handle_kick, EPOLLIN, dev);
+}
+
+static int vhost_vq_init(struct vhost_dev *dev, int vq_idx)
+{
+	struct vhost_virtqueue *vq;
+	int ret;
+
+	if (vq_idx >= dev->max_nvqs)
+		return -ENOBUFS;
+
+	vq = dev->vqs[vq_idx];
+	__vhost_vq_init(dev, vq);
+	ret = vhost_vq_alloc_iovecs(dev, vq);
+	if (ret)
+		return ret;
 
-	return vhost_vq_alloc_iovecs(dev, vq);
+	vq->initialized = true;
+	dev->nvqs++;
+	return 0;
 }
 
 int vhost_dev_init(struct vhost_dev *dev,
-		   struct vhost_virtqueue **vqs, int nvqs,
+		   struct vhost_virtqueue **vqs, int nvqs, int max_nvqs,
 		   int iov_limit, int weight, int byte_weight,
 		   bool use_worker,
 		   int (*msg_handler)(struct vhost_dev *dev,
@@ -481,7 +499,8 @@ int vhost_dev_init(struct vhost_dev *dev,
 	int i;
 
 	dev->vqs = vqs;
-	dev->nvqs = nvqs;
+	dev->nvqs = 0;
+	dev->max_nvqs = max_nvqs;
 	mutex_init(&dev->mutex);
 	dev->log_ctx = NULL;
 	dev->umem = NULL;
@@ -499,12 +518,15 @@ int vhost_dev_init(struct vhost_dev *dev,
 	INIT_LIST_HEAD(&dev->pending_list);
 	spin_lock_init(&dev->iotlb_lock);
 
-
-	for (i = 0; i < dev->nvqs; ++i) {
-		if (vhost_vq_init(dev, dev->vqs[i]))
+	for (i = 0; i < nvqs; ++i) {
+		if (vhost_vq_init(dev, i))
 			goto err_vq_init;
 	}
 
+	for (; i < dev->max_nvqs; ++i)
+		/* Just prep/clear the fields and set initialized=false */
+		__vhost_vq_init(dev, dev->vqs[i]);
+
 	return 0;
 
 err_vq_init:
@@ -652,7 +674,7 @@ void vhost_dev_reset_owner(struct vhost_dev *dev, struct vhost_iotlb *umem)
 	/* We don't need VQ locks below since vhost_dev_cleanup makes sure
 	 * VQs aren't running.
 	 */
-	for (i = 0; i < dev->nvqs; ++i)
+	for (i = 0; i < dev->max_nvqs; ++i)
 		dev->vqs[i]->umem = umem;
 }
 EXPORT_SYMBOL_GPL(vhost_dev_reset_owner);
@@ -661,7 +683,7 @@ void vhost_dev_stop(struct vhost_dev *dev)
 {
 	int i;
 
-	for (i = 0; i < dev->nvqs; ++i) {
+	for (i = 0; i < dev->max_nvqs; ++i) {
 		if (dev->vqs[i]->kick && dev->vqs[i]->handle_kick) {
 			vhost_poll_stop(&dev->vqs[i]->poll);
 			vhost_poll_flush(&dev->vqs[i]->poll);
@@ -693,7 +715,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 {
 	int i;
 
-	for (i = 0; i < dev->nvqs; ++i) {
+	for (i = 0; i < dev->max_nvqs; ++i) {
 		if (dev->vqs[i]->error_ctx)
 			eventfd_ctx_put(dev->vqs[i]->error_ctx);
 		if (dev->vqs[i]->kick)
@@ -787,7 +809,7 @@ static bool memory_access_ok(struct vhost_dev *d, struct vhost_iotlb *umem,
 {
 	int i;
 
-	for (i = 0; i < d->nvqs; ++i) {
+	for (i = 0; i < d->max_nvqs; ++i) {
 		bool ok;
 		bool log;
 
@@ -999,14 +1021,14 @@ static inline int vhost_put_used_idx(struct vhost_virtqueue *vq)
 static void vhost_dev_lock_vqs(struct vhost_dev *d)
 {
 	int i = 0;
-	for (i = 0; i < d->nvqs; ++i)
+	for (i = 0; i < d->max_nvqs; ++i)
 		mutex_lock_nested(&d->vqs[i]->mutex, i);
 }
 
 static void vhost_dev_unlock_vqs(struct vhost_dev *d)
 {
 	int i = 0;
-	for (i = 0; i < d->nvqs; ++i)
+	for (i = 0; i < d->max_nvqs; ++i)
 		mutex_unlock(&d->vqs[i]->mutex);
 }
 
@@ -1462,7 +1484,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 	d->umem = newumem;
 
 	/* All memory accesses are done under some VQ mutex. */
-	for (i = 0; i < d->nvqs; ++i) {
+	for (i = 0; i < d->max_nvqs; ++i) {
 		mutex_lock(&d->vqs[i]->mutex);
 		d->vqs[i]->umem = newumem;
 		mutex_unlock(&d->vqs[i]->mutex);
@@ -1590,11 +1612,14 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
 	r = get_user(idx, idxp);
 	if (r < 0)
 		return r;
-	if (idx >= d->nvqs)
-		return -ENOBUFS;
 
-	idx = array_index_nospec(idx, d->nvqs);
+	idx = array_index_nospec(idx, d->max_nvqs);
 	vq = d->vqs[idx];
+	if (!vq->initialized) {
+		r = vhost_vq_init(d, idx);
+		if (r)
+			return r;
+	}
 
 	if (ioctl == VHOST_SET_VRING_NUM ||
 	    ioctl == VHOST_SET_VRING_ADDR) {
@@ -1724,7 +1749,7 @@ int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled)
 	oiotlb = d->iotlb;
 	d->iotlb = niotlb;
 
-	for (i = 0; i < d->nvqs; ++i) {
+	for (i = 0; i < d->max_nvqs; ++i) {
 		struct vhost_virtqueue *vq = d->vqs[i];
 
 		mutex_lock(&vq->mutex);
@@ -1771,7 +1796,7 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp)
 			r = -EFAULT;
 			break;
 		}
-		for (i = 0; i < d->nvqs; ++i) {
+		for (i = 0; i < d->max_nvqs; ++i) {
 			struct vhost_virtqueue *vq;
 			void __user *base = (void __user *)(unsigned long)p;
 			vq = d->vqs[i];
@@ -1794,7 +1819,7 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp)
 			break;
 		}
 		swap(ctx, d->log_ctx);
-		for (i = 0; i < d->nvqs; ++i) {
+		for (i = 0; i < d->max_nvqs; ++i) {
 			mutex_lock(&d->vqs[i]->mutex);
 			d->vqs[i]->log_ctx = d->log_ctx;
 			mutex_unlock(&d->vqs[i]->mutex);
@@ -2609,7 +2634,7 @@ void vhost_set_backend_features(struct vhost_dev *dev, u64 features)
 	int i;
 
 	mutex_lock(&dev->mutex);
-	for (i = 0; i < dev->nvqs; ++i) {
+	for (i = 0; i < dev->max_nvqs; ++i) {
 		vq = dev->vqs[i];
 		mutex_lock(&vq->mutex);
 		vq->acked_backend_features = features;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 9ad34b1..9677870 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -132,6 +132,8 @@ struct vhost_virtqueue {
 	bool user_be;
 #endif
 	u32 busyloop_timeout;
+
+	bool initialized;
 };
 
 struct vhost_msg_node {
@@ -148,6 +150,7 @@ struct vhost_dev {
 	struct mutex mutex;
 	struct vhost_virtqueue **vqs;
 	int nvqs;
+	int max_nvqs;
 	struct eventfd_ctx *log_ctx;
 	struct llist_head work_list;
 	struct task_struct *worker;
@@ -168,8 +171,8 @@ struct vhost_dev {
 
 bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len);
 int vhost_dev_init(struct vhost_dev *dev, struct vhost_virtqueue **vqs,
-		   int nvqs, int iov_limit, int weight, int byte_weight,
-		   bool use_worker,
+		   int nvqs, int max_nvqs, int iov_limit, int weight,
+		   int byte_weight, bool use_worker,
 		   int (*msg_handler)(struct vhost_dev *dev,
 				      struct vhost_iotlb_msg *msg));
 long vhost_dev_set_owner(struct vhost_dev *dev);
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index a1a35e1..9200868 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -606,7 +606,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 {
 	struct vhost_virtqueue **vqs;
 	struct vhost_vsock *vsock;
-	int ret;
+	int ret, nvqs;
 
 	/* This struct is large and allocation could fail, fall back to vmalloc
 	 * if there is no other way.
@@ -615,7 +615,8 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 	if (!vsock)
 		return -ENOMEM;
 
-	vqs = kmalloc_array(ARRAY_SIZE(vsock->vqs), sizeof(*vqs), GFP_KERNEL);
+	nvqs = ARRAY_SIZE(vsock->vqs);
+	vqs = kmalloc_array(nvqs, sizeof(*vqs), GFP_KERNEL);
 	if (!vqs) {
 		ret = -ENOMEM;
 		goto out;
@@ -630,9 +631,9 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 	vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;
 	vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
 
-	if (vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
-			   UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
-			   VHOST_VSOCK_WEIGHT, true, NULL))
+	if (vhost_dev_init(&vsock->dev, vqs, nvqs, nvqs, UIO_MAXIOV,
+			   VHOST_VSOCK_PKT_WEIGHT, VHOST_VSOCK_WEIGHT, true,
+			   NULL))
 		goto err_dev_init;
 
 	file->private_data = vsock;
-- 
1.8.3.1

* [PATCH 07/17] vhost scsi: support delayed IO vq creation
  2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
                   ` (5 preceding siblings ...)
  2020-10-22  0:34 ` [PATCH 06/17] vhost: support delayed vq creation Mike Christie
@ 2020-10-22  0:34 ` Mike Christie
  2020-10-26  3:51   ` Jason Wang
  2020-10-22  0:34 ` [PATCH 08/17] vhost scsi: alloc cmds per vq instead of session Mike Christie
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:34 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

Each vhost-scsi device needs an evt and a ctl queue, but the number
of IO queues depends on what the user has configured in userspace.
This patch has vhost-scsi create the evt, ctl and one IO vq at device
open time. We then create the other IO vqs when userspace starts to
set them up. We still waste some mem on the vq and scsi vq structs,
but we no longer waste mem on the iovec related arrays, and later
patches can tell which queues are in use from the dev->nvqs value.
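
The delayed creation described above can be sketched in userspace C.
This is only an illustration, not the vhost code: every toy_* name and
constant is hypothetical, and it assumes that nvqs counts the vqs
initialized so far. It also does an explicit bounds check against
max_nvqs, which is a choice made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_MAX_VQS	8	/* hypothetical stand-in for VHOST_SCSI_MAX_VQ */
#define TOY_IOV_SLOTS	1024	/* hypothetical stand-in for the iovec arrays */

struct toy_vq {
	bool initialized;
	int *iov;		/* the expensive part we want to allocate lazily */
};

struct toy_dev {
	struct toy_vq vqs[TOY_MAX_VQS];
	int nvqs;		/* vqs initialized so far (assumed meaning) */
	int max_nvqs;		/* slots userspace may still ask for */
};

/* Full init: allocate the big array and mark the vq usable. */
static int toy_vq_init(struct toy_dev *dev, int idx)
{
	struct toy_vq *vq = &dev->vqs[idx];

	vq->iov = calloc(TOY_IOV_SLOTS, sizeof(*vq->iov));
	if (!vq->iov)
		return -ENOMEM;
	vq->initialized = true;
	dev->nvqs++;
	return 0;
}

/* Walk every slot, like the max_nvqs loops in the patch. */
void toy_dev_cleanup(struct toy_dev *dev)
{
	int i;

	for (i = 0; i < dev->max_nvqs; i++) {
		free(dev->vqs[i].iov);
		dev->vqs[i].iov = NULL;
		dev->vqs[i].initialized = false;
	}
	dev->nvqs = 0;
}

/* Open time: fully init the first nvqs, only clear the remaining slots. */
int toy_dev_init(struct toy_dev *dev, int nvqs, int max_nvqs)
{
	int i;

	if (nvqs < 0 || nvqs > max_nvqs || max_nvqs > TOY_MAX_VQS)
		return -EINVAL;

	dev->nvqs = 0;
	dev->max_nvqs = max_nvqs;
	for (i = 0; i < max_nvqs; i++) {
		dev->vqs[i].initialized = false;
		dev->vqs[i].iov = NULL;
	}

	for (i = 0; i < nvqs; i++) {
		if (toy_vq_init(dev, i)) {
			toy_dev_cleanup(dev);
			return -ENOMEM;
		}
	}
	return 0;
}

/* Ring setup path: finish initializing a vq the first time it is touched. */
int toy_vring_setup(struct toy_dev *dev, int idx)
{
	if (idx < 0 || idx >= dev->max_nvqs)
		return -ENOBUFS;
	if (!dev->vqs[idx].initialized)
		return toy_vq_init(dev, idx);
	return 0;
}
```

With toy_dev_init(&dev, 3, TOY_MAX_VQS) only three vqs pay for the big
array up front. A later toy_vring_setup(&dev, 5) allocates the array for
vq 5 alone, and calling it a second time is a no-op.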

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 5d412f1..ab1b656 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1401,7 +1401,7 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 	mutex_lock(&vs->dev.mutex);
 
 	/* Verify that ring has been setup correctly. */
-	for (index = 0; index < vs->dev.nvqs; ++index) {
+	for (index = 0; index < vs->dev.max_nvqs; ++index) {
 		/* Verify that ring has been setup correctly. */
 		if (!vhost_vq_access_ok(&vs->vqs[index].vq)) {
 			ret = -EFAULT;
@@ -1464,6 +1464,9 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 		       sizeof(vs->vs_vhost_wwpn));
 		for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
 			vq = &vs->vqs[i].vq;
+			if (!vq->initialized)
+				continue;
+
 			mutex_lock(&vq->mutex);
 			vhost_vq_set_backend(vq, vs_tpg);
 			vhost_vq_init_access(vq);
@@ -1503,7 +1506,7 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 	mutex_lock(&vhost_scsi_mutex);
 	mutex_lock(&vs->dev.mutex);
 	/* Verify that ring has been setup correctly. */
-	for (index = 0; index < vs->dev.nvqs; ++index) {
+	for (index = 0; index < vs->dev.max_nvqs; ++index) {
 		if (!vhost_vq_access_ok(&vs->vqs[index].vq)) {
 			ret = -EFAULT;
 			goto err_dev;
@@ -1551,6 +1554,9 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 	if (match) {
 		for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
 			vq = &vs->vqs[i].vq;
+			if (!vq->initialized)
+				continue;
+
 			mutex_lock(&vq->mutex);
 			vhost_vq_set_backend(vq, NULL);
 			mutex_unlock(&vq->mutex);
@@ -1632,8 +1638,13 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 		vqs[i] = &vs->vqs[i].vq;
 		vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick;
 	}
-	if (vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, VHOST_SCSI_MAX_VQ,
-			   UIO_MAXIOV, VHOST_SCSI_WEIGHT, 0, true, NULL))
+
+	/*
+	 * We will always need the ctl, evt and at least 1 IO vq. Create more
+	 * IO vqs if userspace requests them.
+	 */
+	if (vhost_dev_init(&vs->dev, vqs, 3, VHOST_SCSI_MAX_VQ, UIO_MAXIOV,
+			   VHOST_SCSI_WEIGHT, 0, true, NULL))
 		goto err_dev_init;
 
 	vhost_scsi_init_inflight(vs, NULL);
-- 
1.8.3.1

* [PATCH 08/17] vhost scsi: alloc cmds per vq instead of session
  2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
                   ` (6 preceding siblings ...)
  2020-10-22  0:34 ` [PATCH 07/17] vhost scsi: support delayed IO " Mike Christie
@ 2020-10-22  0:34 ` Mike Christie
  2020-10-22  0:34 ` [PATCH 09/17] vhost scsi: fix cmd completion race Mike Christie
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:34 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

We are currently limited to 256 cmds per session. This leads to
problems: if the user has increased virtqueue_size to more than 2 or
cmd_per_lun to more than 256, vhost_scsi_get_tag can fail and the guest
will get IO errors.

This patch moves the cmd allocation to per vq so we can easily match
whatever the user has specified for num_queues and
virtqueue_size/cmd_per_lun. It also makes it easier to control how much
memory we preallocate. For cases where perf is not as important and
we can use the current defaults (1 vq and 128 cmds per vq), memory use
from preallocated cmds is cut in half. For cases where we are willing
to use more memory for higher perf, cmd mem use will now increase as
the num queues and queue depth increase.
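
A rough userspace sketch of the per-vq allocation idea follows. It is
not the driver code: the toy_* names are hypothetical, and a plain bool
array stands in for the sbitmap the patch uses. The point is only that
each queue owns a pool sized from its own depth, so one queue running
out of cmds does not affect the others.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_cmd {
	int tag;		/* index of this cmd in its pool */
};

/* One pool per virtqueue, sized from that queue's depth. */
struct toy_cmd_pool {
	struct toy_cmd *cmds;
	bool *in_use;		/* simple stand-in for a tag bitmap */
	int max_cmds;
};

int toy_pool_setup(struct toy_cmd_pool *pool, int max_cmds)
{
	if (max_cmds <= 0)
		return -1;

	pool->cmds = calloc(max_cmds, sizeof(*pool->cmds));
	pool->in_use = calloc(max_cmds, sizeof(*pool->in_use));
	if (!pool->cmds || !pool->in_use) {
		free(pool->cmds);
		free(pool->in_use);
		pool->cmds = NULL;
		pool->in_use = NULL;
		pool->max_cmds = 0;
		return -1;
	}
	pool->max_cmds = max_cmds;
	return 0;
}

void toy_pool_destroy(struct toy_cmd_pool *pool)
{
	free(pool->cmds);
	free(pool->in_use);
	pool->cmds = NULL;
	pool->in_use = NULL;
	pool->max_cmds = 0;
}

/* Returns NULL when every cmd in this queue's pool is busy. */
struct toy_cmd *toy_get_cmd(struct toy_cmd_pool *pool)
{
	int tag;

	for (tag = 0; tag < pool->max_cmds; tag++) {
		if (!pool->in_use[tag]) {
			pool->in_use[tag] = true;
			pool->cmds[tag].tag = tag;
			return &pool->cmds[tag];
		}
	}
	return NULL;
}

/* Give the tag back so the next request on this queue can reuse it. */
void toy_put_cmd(struct toy_cmd_pool *pool, struct toy_cmd *cmd)
{
	assert(cmd->tag >= 0 && cmd->tag < pool->max_cmds);
	pool->in_use[cmd->tag] = false;
}
```

Sizing the pool to the queue depth means a guest that fills the ring
can always get a cmd, provided completed cmds are returned before the
guest is told about them.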

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 207 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 128 insertions(+), 79 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index ab1b656..f6b9010 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -52,7 +52,6 @@
 #define VHOST_SCSI_VERSION  "v0.1"
 #define VHOST_SCSI_NAMELEN 256
 #define VHOST_SCSI_MAX_CDB_SIZE 32
-#define VHOST_SCSI_DEFAULT_TAGS 256
 #define VHOST_SCSI_PREALLOC_SGLS 2048
 #define VHOST_SCSI_PREALLOC_UPAGES 2048
 #define VHOST_SCSI_PREALLOC_PROT_SGLS 2048
@@ -189,6 +188,9 @@ struct vhost_scsi_virtqueue {
 	 * Writers must also take dev mutex and flush under it.
 	 */
 	int inflight_idx;
+	struct vhost_scsi_cmd *scsi_cmds;
+	struct sbitmap scsi_tags;
+	int max_cmds;
 };
 
 struct vhost_scsi {
@@ -324,7 +326,9 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
 {
 	struct vhost_scsi_cmd *tv_cmd = container_of(se_cmd,
 				struct vhost_scsi_cmd, tvc_se_cmd);
-	struct se_session *se_sess = tv_cmd->tvc_nexus->tvn_se_sess;
+	struct vhost_scsi_virtqueue *svq = container_of(tv_cmd->tvc_vq,
+				struct vhost_scsi_virtqueue, vq);
+	struct vhost_scsi_inflight *inflight = tv_cmd->inflight;
 	int i;
 
 	if (tv_cmd->tvc_sgl_count) {
@@ -336,8 +340,8 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
 			put_page(sg_page(&tv_cmd->tvc_prot_sgl[i]));
 	}
 
-	vhost_scsi_put_inflight(tv_cmd->inflight);
-	target_free_tag(se_sess, se_cmd);
+	sbitmap_clear_bit(&svq->scsi_tags, se_cmd->map_tag);
+	vhost_scsi_put_inflight(inflight);
 }
 
 static u32 vhost_scsi_sess_get_index(struct se_session *se_sess)
@@ -566,31 +570,31 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
 }
 
 static struct vhost_scsi_cmd *
-vhost_scsi_get_tag(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg,
+vhost_scsi_get_cmd(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg,
 		   unsigned char *cdb, u64 scsi_tag, u16 lun, u8 task_attr,
 		   u32 exp_data_len, int data_direction)
 {
+	struct vhost_scsi_virtqueue *svq = container_of(vq,
+					struct vhost_scsi_virtqueue, vq);
 	struct vhost_scsi_cmd *cmd;
 	struct vhost_scsi_nexus *tv_nexus;
-	struct se_session *se_sess;
 	struct scatterlist *sg, *prot_sg;
 	struct page **pages;
-	int tag, cpu;
+	int tag;
 
 	tv_nexus = tpg->tpg_nexus;
 	if (!tv_nexus) {
 		pr_err("Unable to locate active struct vhost_scsi_nexus\n");
 		return ERR_PTR(-EIO);
 	}
-	se_sess = tv_nexus->tvn_se_sess;
 
-	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
+	tag = sbitmap_get(&svq->scsi_tags, 0, false);
 	if (tag < 0) {
 		pr_err("Unable to obtain tag for vhost_scsi_cmd\n");
 		return ERR_PTR(-ENOMEM);
 	}
 
-	cmd = &((struct vhost_scsi_cmd *)se_sess->sess_cmd_map)[tag];
+	cmd = &svq->scsi_cmds[tag];
 	sg = cmd->tvc_sgl;
 	prot_sg = cmd->tvc_prot_sgl;
 	pages = cmd->tvc_upages;
@@ -599,7 +603,6 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
 	cmd->tvc_prot_sgl = prot_sg;
 	cmd->tvc_upages = pages;
 	cmd->tvc_se_cmd.map_tag = tag;
-	cmd->tvc_se_cmd.map_cpu = cpu;
 	cmd->tvc_tag = scsi_tag;
 	cmd->tvc_lun = lun;
 	cmd->tvc_task_attr = task_attr;
@@ -1070,11 +1073,11 @@ static u16 vhost_buf_to_lun(u8 *lun_buf)
 				scsi_command_size(cdb), VHOST_SCSI_MAX_CDB_SIZE);
 				goto err;
 		}
-		cmd = vhost_scsi_get_tag(vq, tpg, cdb, tag, lun, task_attr,
+		cmd = vhost_scsi_get_cmd(vq, tpg, cdb, tag, lun, task_attr,
 					 exp_data_len + prot_bytes,
 					 data_direction);
 		if (IS_ERR(cmd)) {
-			vq_err(vq, "vhost_scsi_get_tag failed %ld\n",
+			vq_err(vq, "vhost_scsi_get_cmd failed %ld\n",
 			       PTR_ERR(cmd));
 			goto err;
 		}
@@ -1378,6 +1381,83 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 		wait_for_completion(&old_inflight[i]->comp);
 }
 
+static void vhost_scsi_destroy_vq_cmds(struct vhost_virtqueue *vq)
+{
+	struct vhost_scsi_virtqueue *svq = container_of(vq,
+					struct vhost_scsi_virtqueue, vq);
+	struct vhost_scsi_cmd *tv_cmd;
+	unsigned int i;
+
+	if (!svq->scsi_cmds)
+		return;
+
+	for (i = 0; i < svq->max_cmds; i++) {
+		tv_cmd = &svq->scsi_cmds[i];
+
+		kfree(tv_cmd->tvc_sgl);
+		kfree(tv_cmd->tvc_prot_sgl);
+		kfree(tv_cmd->tvc_upages);
+	}
+
+	sbitmap_free(&svq->scsi_tags);
+	kfree(svq->scsi_cmds);
+	svq->scsi_cmds = NULL;
+}
+
+static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
+{
+	struct vhost_scsi_virtqueue *svq = container_of(vq,
+					struct vhost_scsi_virtqueue, vq);
+	struct vhost_scsi_cmd *tv_cmd;
+	unsigned int i;
+
+	if (svq->scsi_cmds)
+		return 0;
+
+	if (sbitmap_init_node(&svq->scsi_tags, max_cmds, -1, GFP_KERNEL,
+			      NUMA_NO_NODE))
+		return -ENOMEM;
+	svq->max_cmds = max_cmds;
+
+	svq->scsi_cmds = kcalloc(max_cmds, sizeof(*tv_cmd), GFP_KERNEL);
+	if (!svq->scsi_cmds) {
+		sbitmap_free(&svq->scsi_tags);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < max_cmds; i++) {
+		tv_cmd = &svq->scsi_cmds[i];
+
+		tv_cmd->tvc_sgl = kcalloc(VHOST_SCSI_PREALLOC_SGLS,
+					  sizeof(struct scatterlist),
+					  GFP_KERNEL);
+		if (!tv_cmd->tvc_sgl) {
+			pr_err("Unable to allocate tv_cmd->tvc_sgl\n");
+			goto out;
+		}
+
+		tv_cmd->tvc_upages = kcalloc(VHOST_SCSI_PREALLOC_UPAGES,
+					     sizeof(struct page *),
+					     GFP_KERNEL);
+		if (!tv_cmd->tvc_upages) {
+			pr_err("Unable to allocate tv_cmd->tvc_upages\n");
+			goto out;
+		}
+
+		tv_cmd->tvc_prot_sgl = kcalloc(VHOST_SCSI_PREALLOC_PROT_SGLS,
+					       sizeof(struct scatterlist),
+					       GFP_KERNEL);
+		if (!tv_cmd->tvc_prot_sgl) {
+			pr_err("Unable to allocate tv_cmd->tvc_prot_sgl\n");
+			goto out;
+		}
+	}
+	return 0;
+out:
+	vhost_scsi_destroy_vq_cmds(vq);
+	return -ENOMEM;
+}
+
 /*
  * Called from vhost_scsi_ioctl() context to walk the list of available
  * vhost_scsi_tpg with an active struct vhost_scsi_nexus
@@ -1432,10 +1512,9 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 
 		if (!strcmp(tv_tport->tport_name, t->vhost_wwpn)) {
 			if (vs->vs_tpg && vs->vs_tpg[tpg->tport_tpgt]) {
-				kfree(vs_tpg);
 				mutex_unlock(&tpg->tv_tpg_mutex);
 				ret = -EEXIST;
-				goto out;
+				goto undepend;
 			}
 			/*
 			 * In order to ensure individual vhost-scsi configfs
@@ -1447,9 +1526,8 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 			ret = target_depend_item(&se_tpg->tpg_group.cg_item);
 			if (ret) {
 				pr_warn("target_depend_item() failed: %d\n", ret);
-				kfree(vs_tpg);
 				mutex_unlock(&tpg->tv_tpg_mutex);
-				goto out;
+				goto undepend;
 			}
 			tpg->tv_tpg_vhost_count++;
 			tpg->vhost_scsi = vs;
@@ -1462,6 +1540,16 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 	if (match) {
 		memcpy(vs->vs_vhost_wwpn, t->vhost_wwpn,
 		       sizeof(vs->vs_vhost_wwpn));
+
+		for (i = VHOST_SCSI_VQ_IO; i < VHOST_SCSI_MAX_VQ; i++) {
+			vq = &vs->vqs[i].vq;
+			if (!vq->initialized)
+				continue;
+
+			if (vhost_scsi_setup_vq_cmds(vq, vq->num))
+				goto destroy_vq_cmds;
+		}
+
 		for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
 			vq = &vs->vqs[i].vq;
 			if (!vq->initialized)
@@ -1484,7 +1572,22 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 	vhost_scsi_flush(vs);
 	kfree(vs->vs_tpg);
 	vs->vs_tpg = vs_tpg;
+	goto out;
 
+destroy_vq_cmds:
+	for (i--; i >= VHOST_SCSI_VQ_IO; i--) {
+		if (!vhost_vq_get_backend(&vs->vqs[i].vq))
+			vhost_scsi_destroy_vq_cmds(&vs->vqs[i].vq);
+	}
+undepend:
+	for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) {
+		tpg = vs_tpg[i];
+		if (tpg) {
+			tpg->tv_tpg_vhost_count--;
+			target_undepend_item(&tpg->se_tpg.tpg_group.cg_item);
+		}
+	}
+	kfree(vs_tpg);
 out:
 	mutex_unlock(&vs->dev.mutex);
 	mutex_unlock(&vhost_scsi_mutex);
@@ -1560,6 +1663,12 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 			mutex_lock(&vq->mutex);
 			vhost_vq_set_backend(vq, NULL);
 			mutex_unlock(&vq->mutex);
+			/*
+			 * Make sure cmds are not running before tearing them
+			 * down.
+			 */
+			vhost_scsi_flush(vs);
+			vhost_scsi_destroy_vq_cmds(vq);
 		}
 	}
 	/*
@@ -1861,23 +1970,6 @@ static void vhost_scsi_port_unlink(struct se_portal_group *se_tpg,
 	mutex_unlock(&vhost_scsi_mutex);
 }
 
-static void vhost_scsi_free_cmd_map_res(struct se_session *se_sess)
-{
-	struct vhost_scsi_cmd *tv_cmd;
-	unsigned int i;
-
-	if (!se_sess->sess_cmd_map)
-		return;
-
-	for (i = 0; i < VHOST_SCSI_DEFAULT_TAGS; i++) {
-		tv_cmd = &((struct vhost_scsi_cmd *)se_sess->sess_cmd_map)[i];
-
-		kfree(tv_cmd->tvc_sgl);
-		kfree(tv_cmd->tvc_prot_sgl);
-		kfree(tv_cmd->tvc_upages);
-	}
-}
-
 static ssize_t vhost_scsi_tpg_attrib_fabric_prot_type_store(
 		struct config_item *item, const char *page, size_t count)
 {
@@ -1917,45 +2009,6 @@ static ssize_t vhost_scsi_tpg_attrib_fabric_prot_type_show(
 	NULL,
 };
 
-static int vhost_scsi_nexus_cb(struct se_portal_group *se_tpg,
-			       struct se_session *se_sess, void *p)
-{
-	struct vhost_scsi_cmd *tv_cmd;
-	unsigned int i;
-
-	for (i = 0; i < VHOST_SCSI_DEFAULT_TAGS; i++) {
-		tv_cmd = &((struct vhost_scsi_cmd *)se_sess->sess_cmd_map)[i];
-
-		tv_cmd->tvc_sgl = kcalloc(VHOST_SCSI_PREALLOC_SGLS,
-					  sizeof(struct scatterlist),
-					  GFP_KERNEL);
-		if (!tv_cmd->tvc_sgl) {
-			pr_err("Unable to allocate tv_cmd->tvc_sgl\n");
-			goto out;
-		}
-
-		tv_cmd->tvc_upages = kcalloc(VHOST_SCSI_PREALLOC_UPAGES,
-					     sizeof(struct page *),
-					     GFP_KERNEL);
-		if (!tv_cmd->tvc_upages) {
-			pr_err("Unable to allocate tv_cmd->tvc_upages\n");
-			goto out;
-		}
-
-		tv_cmd->tvc_prot_sgl = kcalloc(VHOST_SCSI_PREALLOC_PROT_SGLS,
-					       sizeof(struct scatterlist),
-					       GFP_KERNEL);
-		if (!tv_cmd->tvc_prot_sgl) {
-			pr_err("Unable to allocate tv_cmd->tvc_prot_sgl\n");
-			goto out;
-		}
-	}
-	return 0;
-out:
-	vhost_scsi_free_cmd_map_res(se_sess);
-	return -ENOMEM;
-}
-
 static int vhost_scsi_make_nexus(struct vhost_scsi_tpg *tpg,
 				const char *name)
 {
@@ -1979,12 +2032,9 @@ static int vhost_scsi_make_nexus(struct vhost_scsi_tpg *tpg,
 	 * struct se_node_acl for the vhost_scsi struct se_portal_group with
 	 * the SCSI Initiator port name of the passed configfs group 'name'.
 	 */
-	tv_nexus->tvn_se_sess = target_setup_session(&tpg->se_tpg,
-					VHOST_SCSI_DEFAULT_TAGS,
-					sizeof(struct vhost_scsi_cmd),
+	tv_nexus->tvn_se_sess = target_setup_session(&tpg->se_tpg, 0, 0,
 					TARGET_PROT_DIN_PASS | TARGET_PROT_DOUT_PASS,
-					(unsigned char *)name, tv_nexus,
-					vhost_scsi_nexus_cb);
+					(unsigned char *)name, tv_nexus, NULL);
 	if (IS_ERR(tv_nexus->tvn_se_sess)) {
 		mutex_unlock(&tpg->tv_tpg_mutex);
 		kfree(tv_nexus);
@@ -2034,7 +2084,6 @@ static int vhost_scsi_drop_nexus(struct vhost_scsi_tpg *tpg)
 		" %s Initiator Port: %s\n", vhost_scsi_dump_proto_id(tpg->tport),
 		tv_nexus->tvn_se_sess->se_node_acl->initiatorname);
 
-	vhost_scsi_free_cmd_map_res(se_sess);
 	/*
 	 * Release the SCSI I_T Nexus to the emulated vhost Target Port
 	 */
-- 
1.8.3.1

* [PATCH 09/17] vhost scsi: fix cmd completion race
  2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
                   ` (7 preceding siblings ...)
  2020-10-22  0:34 ` [PATCH 08/17] vhost scsi: alloc cmds per vq instead of session Mike Christie
@ 2020-10-22  0:34 ` Mike Christie
  2020-10-27 13:07   ` Maurizio Lombardi
  2020-10-30  8:51   ` Michael S. Tsirkin
  2020-10-22  0:34 ` [PATCH 10/17] vhost scsi: Add support for LUN resets Mike Christie
                   ` (8 subsequent siblings)
  17 siblings, 2 replies; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:34 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

We might not do the final se_cmd put from vhost_scsi_complete_cmd_work.
If the last put happens a little later, we can race:
vhost_scsi_complete_cmd_work does vhost_signal, the guest runs and sends
more IO, and vhost_scsi_handle_vq runs but does not find any free cmds.

This patch has us delay completing the cmd until the last lio core ref
is dropped. We then know that once we signal to the guest that the cmd
is completed, any new command it queues will find a free cmd.
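
The ordering this relies on can be shown with a small userspace sketch.
The toy_* names and fields are hypothetical and there is no real
concurrency here. It only models the rule that the guest is signalled
from the release path that runs on the last put, after the cmd slot has
been handed back.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_cmd {
	int refs;			/* outstanding references on the cmd */
	bool slot_free;			/* cmd slot returned to the allocator */
	bool guest_signalled;		/* completion made visible to the guest */
	bool slot_free_at_signal;	/* what the guest would have observed */
};

void toy_cmd_start(struct toy_cmd *cmd, int refs)
{
	cmd->refs = refs;
	cmd->slot_free = false;
	cmd->guest_signalled = false;
	cmd->slot_free_at_signal = false;
}

/* Runs only once the final reference is gone. */
static void toy_cmd_release(struct toy_cmd *cmd)
{
	cmd->slot_free = true;			/* free the slot first ... */
	cmd->slot_free_at_signal = cmd->slot_free;
	cmd->guest_signalled = true;		/* ... then tell the guest */
}

/* Drop one reference; the last put triggers release and signalling. */
void toy_cmd_put(struct toy_cmd *cmd)
{
	assert(cmd->refs > 0);
	if (--cmd->refs == 0)
		toy_cmd_release(cmd);
}
```

With two references, the first toy_cmd_put() neither frees the slot nor
signals. Only the second one does, and by then the slot is already free
for whatever the guest queues next.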

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 42 +++++++++++++++---------------------------
 1 file changed, 15 insertions(+), 27 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index f6b9010..2fa48dd 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -322,7 +322,7 @@ static u32 vhost_scsi_tpg_get_inst_index(struct se_portal_group *se_tpg)
 	return 1;
 }
 
-static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
+static void vhost_scsi_release_cmd_res(struct se_cmd *se_cmd)
 {
 	struct vhost_scsi_cmd *tv_cmd = container_of(se_cmd,
 				struct vhost_scsi_cmd, tvc_se_cmd);
@@ -344,6 +344,16 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
 	vhost_scsi_put_inflight(inflight);
 }
 
+static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
+{
+	struct vhost_scsi_cmd *cmd = container_of(se_cmd,
+					struct vhost_scsi_cmd, tvc_se_cmd);
+	struct vhost_scsi *vs = cmd->tvc_vhost;
+
+	llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
+	vhost_work_queue(&vs->dev, &vs->vs_completion_work);
+}
+
 static u32 vhost_scsi_sess_get_index(struct se_session *se_sess)
 {
 	return 0;
@@ -366,28 +376,15 @@ static int vhost_scsi_get_cmd_state(struct se_cmd *se_cmd)
 	return 0;
 }
 
-static void vhost_scsi_complete_cmd(struct vhost_scsi_cmd *cmd)
-{
-	struct vhost_scsi *vs = cmd->tvc_vhost;
-
-	llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
-
-	vhost_work_queue(&vs->dev, &vs->vs_completion_work);
-}
-
 static int vhost_scsi_queue_data_in(struct se_cmd *se_cmd)
 {
-	struct vhost_scsi_cmd *cmd = container_of(se_cmd,
-				struct vhost_scsi_cmd, tvc_se_cmd);
-	vhost_scsi_complete_cmd(cmd);
+	transport_generic_free_cmd(se_cmd, 0);
 	return 0;
 }
 
 static int vhost_scsi_queue_status(struct se_cmd *se_cmd)
 {
-	struct vhost_scsi_cmd *cmd = container_of(se_cmd,
-				struct vhost_scsi_cmd, tvc_se_cmd);
-	vhost_scsi_complete_cmd(cmd);
+	transport_generic_free_cmd(se_cmd, 0);
 	return 0;
 }
 
@@ -433,15 +430,6 @@ static void vhost_scsi_free_evt(struct vhost_scsi *vs, struct vhost_scsi_evt *ev
 	return evt;
 }
 
-static void vhost_scsi_free_cmd(struct vhost_scsi_cmd *cmd)
-{
-	struct se_cmd *se_cmd = &cmd->tvc_se_cmd;
-
-	/* TODO locking against target/backend threads? */
-	transport_generic_free_cmd(se_cmd, 0);
-
-}
-
 static int vhost_scsi_check_stop_free(struct se_cmd *se_cmd)
 {
 	return target_put_sess_cmd(se_cmd);
@@ -560,7 +548,7 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
 		} else
 			pr_err("Faulted on virtio_scsi_cmd_resp\n");
 
-		vhost_scsi_free_cmd(cmd);
+		vhost_scsi_release_cmd_res(se_cmd);
 	}
 
 	vq = -1;
@@ -1096,7 +1084,7 @@ static u16 vhost_buf_to_lun(u8 *lun_buf)
 						      &prot_iter, exp_data_len,
 						      &data_iter))) {
 				vq_err(vq, "Failed to map iov to sgl\n");
-				vhost_scsi_release_cmd(&cmd->tvc_se_cmd);
+				vhost_scsi_release_cmd_res(&cmd->tvc_se_cmd);
 				goto err;
 			}
 		}
-- 
1.8.3.1

* [PATCH 10/17] vhost scsi: Add support for LUN resets.
  2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
                   ` (8 preceding siblings ...)
  2020-10-22  0:34 ` [PATCH 09/17] vhost scsi: fix cmd completion race Mike Christie
@ 2020-10-22  0:34 ` Mike Christie
  2020-10-22  0:34 ` [PATCH 11/17] vhost scsi: remove extra flushes Mike Christie
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:34 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

In newer versions of virtio-scsi we just reset the timer when a
command times out, so TMFs are never sent for the cmd time out case.
However, in older kernels and for the TMF inject cases, we can still get
resets, and we end up just failing them immediately, so the guest might
see the device get offlined and IO errors.

For the older kernel cases, we want the same end result as the
modern virtio-scsi driver where we let the lower levels fire their error
handling and handle the problem. And at the upper levels we want to
wait. This patch ties the LUN reset handling into the LIO TMF code which
will just wait for outstanding commands to complete like we are doing in
the modern virtio-scsi case.

Note: I did not handle the ABORT case to keep this simple. For ABORTs
LIO just waits on the cmd as it does for the RESET case. If an ABORT
fails, the guest OS ends up escalating to LUN RESET, so in the end we
get the same behavior where we wait on the outstanding cmds.
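
As a rough model of the behavior described above (not the LIO or vhost
code), the sketch below rejects every TMF other than a LUN reset and
completes a reset only once the outstanding commands have drained. All
toy_* names and enum values are hypothetical and do not correspond to
the virtio-scsi constants.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_tmf_subtype { TOY_TMF_ABORT_TASK, TOY_TMF_LUN_RESET };
enum toy_tmf_resp { TOY_TMF_PENDING, TOY_TMF_SUCCEEDED, TOY_TMF_REJECTED };

struct toy_lun {
	int outstanding;	/* cmds still running on this LUN */
	bool reset_pending;	/* a LUN reset is waiting for them */
};

/* Only LUN reset is handled; everything else is rejected right away. */
enum toy_tmf_resp toy_handle_tmf(struct toy_lun *lun,
				 enum toy_tmf_subtype subtype)
{
	if (subtype != TOY_TMF_LUN_RESET)
		return TOY_TMF_REJECTED;
	if (lun->outstanding == 0)
		return TOY_TMF_SUCCEEDED;
	lun->reset_pending = true;	/* wait instead of failing the guest */
	return TOY_TMF_PENDING;
}

/* Called as each cmd finishes; the last one completes a pending reset. */
enum toy_tmf_resp toy_cmd_done(struct toy_lun *lun)
{
	assert(lun->outstanding > 0);
	lun->outstanding--;
	if (lun->reset_pending && lun->outstanding == 0) {
		lun->reset_pending = false;
		return TOY_TMF_SUCCEEDED;
	}
	return TOY_TMF_PENDING;
}
```

In this model an ABORT is rejected, so a guest that escalates to LUN
reset ends up on the same waiting path.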

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 134 insertions(+), 13 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 2fa48dd..f543fa0 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -139,6 +139,7 @@ struct vhost_scsi_tpg {
 	struct se_portal_group se_tpg;
 	/* Pointer back to vhost_scsi, protected by tv_tpg_mutex */
 	struct vhost_scsi *vhost_scsi;
+	struct list_head tmf_queue;
 };
 
 struct vhost_scsi_tport {
@@ -211,6 +212,20 @@ struct vhost_scsi {
 	int vs_events_nr; /* num of pending events, protected by vq->mutex */
 };
 
+struct vhost_scsi_tmf {
+	struct vhost_work vwork;
+	struct vhost_scsi_tpg *tpg;
+	struct vhost_scsi *vhost;
+	struct vhost_scsi_virtqueue *svq;
+	struct list_head queue_entry;
+
+	struct se_cmd se_cmd;
+	struct vhost_scsi_inflight *inflight;
+	struct iovec resp_iov;
+	int in_iovs;
+	int vq_desc;
+};
+
 /*
  * Context for processing request and control queue operations.
  */
@@ -344,14 +359,32 @@ static void vhost_scsi_release_cmd_res(struct se_cmd *se_cmd)
 	vhost_scsi_put_inflight(inflight);
 }
 
+static void vhost_scsi_release_tmf_res(struct vhost_scsi_tmf *tmf)
+{
+	struct vhost_scsi_tpg *tpg = tmf->tpg;
+	struct vhost_scsi_inflight *inflight = tmf->inflight;
+
+	mutex_lock(&tpg->tv_tpg_mutex);
+	list_add_tail(&tmf->queue_entry, &tpg->tmf_queue);
+	mutex_unlock(&tpg->tv_tpg_mutex);
+	vhost_scsi_put_inflight(inflight);
+}
+
 static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
 {
-	struct vhost_scsi_cmd *cmd = container_of(se_cmd,
+	if (se_cmd->se_cmd_flags & SCF_SCSI_TMR_CDB) {
+		struct vhost_scsi_tmf *tmf = container_of(se_cmd,
+					struct vhost_scsi_tmf, se_cmd);
+
+		vhost_work_queue(&tmf->vhost->dev, &tmf->vwork);
+	} else {
+		struct vhost_scsi_cmd *cmd = container_of(se_cmd,
 					struct vhost_scsi_cmd, tvc_se_cmd);
-	struct vhost_scsi *vs = cmd->tvc_vhost;
+		struct vhost_scsi *vs = cmd->tvc_vhost;
 
-	llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
-	vhost_work_queue(&vs->dev, &vs->vs_completion_work);
+		llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
+		vhost_work_queue(&vs->dev, &vs->vs_completion_work);
+	}
 }
 
 static u32 vhost_scsi_sess_get_index(struct se_session *se_sess)
@@ -390,7 +423,10 @@ static int vhost_scsi_queue_status(struct se_cmd *se_cmd)
 
 static void vhost_scsi_queue_tm_rsp(struct se_cmd *se_cmd)
 {
-	return;
+	struct vhost_scsi_tmf *tmf = container_of(se_cmd, struct vhost_scsi_tmf,
+						  se_cmd);
+
+	transport_generic_free_cmd(&tmf->se_cmd, 0);
 }
 
 static void vhost_scsi_aborted_task(struct se_cmd *se_cmd)
@@ -1120,9 +1156,9 @@ static u16 vhost_buf_to_lun(u8 *lun_buf)
 }
 
 static void
-vhost_scsi_send_tmf_reject(struct vhost_scsi *vs,
-			   struct vhost_virtqueue *vq,
-			   struct vhost_scsi_ctx *vc)
+vhost_scsi_send_tmf_resp(struct vhost_scsi *vs, struct vhost_virtqueue *vq,
+			 int in_iovs, int vq_desc, struct iovec *resp_iov,
+			 int tmf_resp_code)
 {
 	struct virtio_scsi_ctrl_tmf_resp rsp;
 	struct iov_iter iov_iter;
@@ -1130,17 +1166,87 @@ static u16 vhost_buf_to_lun(u8 *lun_buf)
 
 	pr_debug("%s\n", __func__);
 	memset(&rsp, 0, sizeof(rsp));
-	rsp.response = VIRTIO_SCSI_S_FUNCTION_REJECTED;
+	rsp.response = tmf_resp_code;
 
-	iov_iter_init(&iov_iter, READ, &vq->iov[vc->out], vc->in, sizeof(rsp));
+	iov_iter_init(&iov_iter, READ, resp_iov, in_iovs, sizeof(rsp));
 
 	ret = copy_to_iter(&rsp, sizeof(rsp), &iov_iter);
 	if (likely(ret == sizeof(rsp)))
-		vhost_add_used_and_signal(&vs->dev, vq, vc->head, 0);
+		vhost_add_used_and_signal(&vs->dev, vq, vq_desc, 0);
 	else
 		pr_err("Faulted on virtio_scsi_ctrl_tmf_resp\n");
 }
 
+static void vhost_scsi_tmf_resp_work(struct vhost_work *work)
+{
+	struct vhost_scsi_tmf *tmf = container_of(work, struct vhost_scsi_tmf,
+						  vwork);
+	int resp_code;
+
+	if (tmf->se_cmd.se_tmr_req->response == TMR_FUNCTION_COMPLETE)
+		resp_code = VIRTIO_SCSI_S_FUNCTION_SUCCEEDED;
+	else
+		resp_code = VIRTIO_SCSI_S_FUNCTION_REJECTED;
+
+	vhost_scsi_send_tmf_resp(tmf->vhost, &tmf->svq->vq, tmf->in_iovs,
+				 tmf->vq_desc, &tmf->resp_iov, resp_code);
+	vhost_scsi_release_tmf_res(tmf);
+}
+
+static void
+vhost_scsi_handle_tmf(struct vhost_scsi *vs, struct vhost_scsi_tpg *tpg,
+		      struct vhost_virtqueue *vq,
+		      struct virtio_scsi_ctrl_tmf_req *vtmf,
+		      struct vhost_scsi_ctx *vc)
+{
+	struct vhost_scsi_virtqueue *svq = container_of(vq,
+					struct vhost_scsi_virtqueue, vq);
+	struct vhost_scsi_tmf *tmf;
+
+	if (vhost32_to_cpu(vq, vtmf->subtype) !=
+	    VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET)
+		goto send_reject;
+
+	if (!tpg->tpg_nexus || !tpg->tpg_nexus->tvn_se_sess) {
+		pr_err("Unable to locate active struct vhost_scsi_nexus for LUN RESET.\n");
+		goto send_reject;
+	}
+
+	mutex_lock(&tpg->tv_tpg_mutex);
+	if (list_empty(&tpg->tmf_queue)) {
+		pr_err("Missing reserve TMF. Could not handle LUN RESET.\n");
+		mutex_unlock(&tpg->tv_tpg_mutex);
+		goto send_reject;
+	}
+
+	tmf = list_first_entry(&tpg->tmf_queue, struct vhost_scsi_tmf,
+			       queue_entry);
+	list_del_init(&tmf->queue_entry);
+	mutex_unlock(&tpg->tv_tpg_mutex);
+
+	tmf->tpg = tpg;
+	tmf->vhost = vs;
+	tmf->svq = svq;
+	tmf->resp_iov = vq->iov[vc->out];
+	tmf->vq_desc = vc->head;
+	tmf->in_iovs = vc->in;
+	tmf->inflight = vhost_scsi_get_inflight(vq);
+
+	if (target_submit_tmr(&tmf->se_cmd, tpg->tpg_nexus->tvn_se_sess, NULL,
+			      vhost_buf_to_lun(vtmf->lun), NULL,
+			      TMR_LUN_RESET, GFP_KERNEL, 0,
+			      TARGET_SCF_ACK_KREF) < 0) {
+		vhost_scsi_release_tmf_res(tmf);
+		goto send_reject;
+	}
+
+	return;
+
+send_reject:
+	vhost_scsi_send_tmf_resp(vs, vq, vc->in, vc->head, &vq->iov[vc->out],
+				 VIRTIO_SCSI_S_FUNCTION_REJECTED);
+}
+
 static void
 vhost_scsi_send_an_resp(struct vhost_scsi *vs,
 			struct vhost_virtqueue *vq,
@@ -1166,6 +1272,7 @@ static u16 vhost_buf_to_lun(u8 *lun_buf)
 static void
 vhost_scsi_ctl_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 {
+	struct vhost_scsi_tpg *tpg;
 	union {
 		__virtio32 type;
 		struct virtio_scsi_ctrl_an_req an;
@@ -1247,12 +1354,12 @@ static u16 vhost_buf_to_lun(u8 *lun_buf)
 		vc.req += typ_size;
 		vc.req_size -= typ_size;
 
-		ret = vhost_scsi_get_req(vq, &vc, NULL);
+		ret = vhost_scsi_get_req(vq, &vc, &tpg);
 		if (ret)
 			goto err;
 
 		if (v_req.type == VIRTIO_SCSI_T_TMF)
-			vhost_scsi_send_tmf_reject(vs, vq, &vc);
+			vhost_scsi_handle_tmf(vs, tpg, vq, &v_req.tmf, &vc);
 		else
 			vhost_scsi_send_an_resp(vs, vq, &vc);
 err:
@@ -1927,11 +2034,19 @@ static int vhost_scsi_port_link(struct se_portal_group *se_tpg,
 {
 	struct vhost_scsi_tpg *tpg = container_of(se_tpg,
 				struct vhost_scsi_tpg, se_tpg);
+	struct vhost_scsi_tmf *tmf;
+
+	tmf = kzalloc(sizeof(*tmf), GFP_KERNEL);
+	if (!tmf)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&tmf->queue_entry);
+	vhost_work_init(&tmf->vwork, vhost_scsi_tmf_resp_work);
 
 	mutex_lock(&vhost_scsi_mutex);
 
 	mutex_lock(&tpg->tv_tpg_mutex);
 	tpg->tv_tpg_port_count++;
+	list_add_tail(&tmf->queue_entry, &tpg->tmf_queue);
 	mutex_unlock(&tpg->tv_tpg_mutex);
 
 	vhost_scsi_hotplug(tpg, lun);
@@ -1946,11 +2061,16 @@ static void vhost_scsi_port_unlink(struct se_portal_group *se_tpg,
 {
 	struct vhost_scsi_tpg *tpg = container_of(se_tpg,
 				struct vhost_scsi_tpg, se_tpg);
+	struct vhost_scsi_tmf *tmf;
 
 	mutex_lock(&vhost_scsi_mutex);
 
 	mutex_lock(&tpg->tv_tpg_mutex);
 	tpg->tv_tpg_port_count--;
+	tmf = list_first_entry(&tpg->tmf_queue, struct vhost_scsi_tmf,
+			       queue_entry);
+	list_del(&tmf->queue_entry);
+	kfree(tmf);
 	mutex_unlock(&tpg->tv_tpg_mutex);
 
 	vhost_scsi_hotunplug(tpg, lun);
@@ -2211,6 +2331,7 @@ static ssize_t vhost_scsi_tpg_nexus_store(struct config_item *item,
 	}
 	mutex_init(&tpg->tv_tpg_mutex);
 	INIT_LIST_HEAD(&tpg->tv_tpg_list);
+	INIT_LIST_HEAD(&tpg->tmf_queue);
 	tpg->tport = tport;
 	tpg->tport_tpgt = tpgt;
 
-- 
1.8.3.1


* [PATCH 11/17] vhost scsi: remove extra flushes
  2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
                   ` (9 preceding siblings ...)
  2020-10-22  0:34 ` [PATCH 10/17] vhost scsi: Add support for LUN resets Mike Christie
@ 2020-10-22  0:34 ` Mike Christie
  2020-10-22  0:34 ` [PATCH 12/17] vhost poll: fix coding style Mike Christie
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:34 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

The vhost work flush function was flushing the entire work queue, so
there is no need for the double vhost_work_dev_flush calls in
vhost_scsi_flush.

And we do not need to call vhost_poll_flush for each poller because
that call also ends up flushing the same work queue thread the
vhost_work_dev_flush call flushed.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index f543fa0..b348e9c 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1443,11 +1443,6 @@ static void vhost_scsi_handle_kick(struct vhost_work *work)
 	vhost_scsi_handle_vq(vs, vq);
 }
 
-static void vhost_scsi_flush_vq(struct vhost_scsi *vs, int index)
-{
-	vhost_poll_flush(&vs->vqs[index].vq.poll);
-}
-
 /* Callers must hold dev mutex */
 static void vhost_scsi_flush(struct vhost_scsi *vs)
 {
@@ -1466,9 +1461,6 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 		kref_put(&old_inflight[i]->kref, vhost_scsi_done_inflight);
 
 	/* Flush both the vhost poll and vhost work */
-	for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
-		vhost_scsi_flush_vq(vs, i);
-	vhost_work_dev_flush(&vs->dev);
 	vhost_work_dev_flush(&vs->dev);
 
 	/* Wait for all reqs issued before the flush to be finished */
-- 
1.8.3.1


* [PATCH 12/17] vhost poll: fix coding style
  2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
                   ` (10 preceding siblings ...)
  2020-10-22  0:34 ` [PATCH 11/17] vhost scsi: remove extra flushes Mike Christie
@ 2020-10-22  0:34 ` Mike Christie
  2020-10-22  0:39   ` Chaitanya Kulkarni
  2020-10-22  0:34 ` [PATCH 13/17] vhost: support multiple worker threads Mike Christie
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:34 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

We use like 3 coding styles in this struct. Switch to just tabs.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/vhost.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 9677870..08c5aef 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -28,12 +28,12 @@ struct vhost_work {
 /* Poll a file (eventfd or socket) */
 /* Note: there's nothing vhost specific about this structure. */
 struct vhost_poll {
-	poll_table                table;
-	wait_queue_head_t        *wqh;
-	wait_queue_entry_t              wait;
-	struct vhost_work	  work;
-	__poll_t		  mask;
-	struct vhost_dev	 *dev;
+	poll_table		table;
+	wait_queue_head_t	*wqh;
+	wait_queue_entry_t	wait;
+	struct vhost_work	work;
+	__poll_t		mask;
+	struct vhost_dev	*dev;
 };
 
 void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
-- 
1.8.3.1


* [PATCH 13/17] vhost: support multiple worker threads
  2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
                   ` (11 preceding siblings ...)
  2020-10-22  0:34 ` [PATCH 12/17] vhost poll: fix coding style Mike Christie
@ 2020-10-22  0:34 ` Mike Christie
  2020-10-22  0:35 ` [PATCH 14/17] vhost: poll support support multiple workers Mike Christie
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:34 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

This is a prep patch to support multiple vhost worker threads per vhost
dev. This patch converts the code that had assumed a single worker
thread by:

1. Moving worker related fields to a new struct vhost_worker.
2. Converting vhost.c code to use the new struct and assume we will
have an array of workers.
3. It also exports a helper function that will be used in the last
patch when vhost-scsi is converted to use this new functionality.

Why do we need multiple worker threads?

For vhost-scsi, we do the initial submission and completion from the
vhost worker thread and after adding 2 vqs this single thread becomes a
bottleneck.

With the null_blk driver we max out at 360K IOPs when doing a random
workload like:

fio --direct=1 --rw=randrw --bs=4k --ioengine=libaio \
--iodepth=VQ_QUEUE_DEPTH --numjobs=NUM_VQS --filename  /dev/sdXYZ

where NUM_VQS gets up to 8 (number of cores per numa node on my system)
and VQ_QUEUE_DEPTH can be anywhere from 32 to 128.

With the patches in this set, we are able to get IOPs from a single
LUN up to 640K. With some other changes I am working on (to the LIO
locking and to binding worker threads to specific CPUs), we can get
this up to 880K.
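
As a rough illustration of the conversion listed above, here is a small
userspace model of a device that owns an array of workers, each with its
own pending work. It is only a sketch under stated assumptions: the
names model_worker, model_dev, model_workers_create, model_queue_on,
model_has_work and model_dev_flush are hypothetical, a plain counter
stands in for the llist, and there are no kthreads or cgroups here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct vhost_worker. */
struct model_worker {
	int pending;	/* queued work items; stands in for work_list */
};

/* Hypothetical stand-in for the worker fields of struct vhost_dev. */
struct model_dev {
	struct model_worker **workers;
	int num_workers;
};

/* Grow the array to new_num workers. Shrinking is rejected. */
static int model_workers_create(struct model_dev *dev, int new_num)
{
	struct model_worker **tmp;
	int i;

	if (new_num == dev->num_workers)
		return 0;
	if (new_num < dev->num_workers)
		return -1;

	tmp = realloc(dev->workers, new_num * sizeof(*tmp));
	if (!tmp)
		return -1;
	dev->workers = tmp;

	for (i = dev->num_workers; i < new_num; i++) {
		dev->workers[i] = calloc(1, sizeof(**tmp));
		if (!dev->workers[i])
			return -1;
		dev->num_workers++;
	}
	return 0;
}

/* Queue one work item on a specific worker. */
static void model_queue_on(struct model_dev *dev, int worker_id)
{
	if (!dev->workers)
		return;
	dev->workers[worker_id]->pending++;
}

/* True if any worker still has work queued. */
static bool model_has_work(const struct model_dev *dev)
{
	int i;

	for (i = 0; i < dev->num_workers; i++)
		if (dev->workers[i]->pending)
			return true;
	return false;
}

/* Drain every worker in turn, the way a device-wide flush visits each one. */
static void model_dev_flush(struct model_dev *dev)
{
	int i;

	for (i = 0; i < dev->num_workers; i++)
		dev->workers[i]->pending = 0;
}
```

The point of the model is that queueing targets one worker, while
"has work" and "flush" have to walk all of them.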

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/vhost.c | 232 +++++++++++++++++++++++++++++++++++++++-----------
 drivers/vhost/vhost.h |  12 ++-
 2 files changed, 190 insertions(+), 54 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 2ca2e71..75866a2 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -231,18 +231,48 @@ void vhost_poll_stop(struct vhost_poll *poll)
 }
 EXPORT_SYMBOL_GPL(vhost_poll_stop);
 
-void vhost_work_dev_flush(struct vhost_dev *dev)
+static void vhost_work_queue_on(struct vhost_dev *dev, struct vhost_work *work,
+				int worker_id)
+{
+	if (!dev->workers)
+		return;
+
+	if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
+		/* We can only add the work to the list after we're
+		 * sure it was not in the list.
+		 * test_and_set_bit() implies a memory barrier.
+		 */
+		llist_add(&work->node, &dev->workers[worker_id]->work_list);
+		wake_up_process(dev->workers[worker_id]->task);
+	}
+}
+
+void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
+{
+	vhost_work_queue_on(dev, work, 0);
+}
+EXPORT_SYMBOL_GPL(vhost_work_queue);
+
+static void vhost_work_flush_on(struct vhost_dev *dev, int worker_id)
 {
 	struct vhost_flush_struct flush;
 
-	if (dev->worker) {
+	if (dev->workers) {
 		init_completion(&flush.wait_event);
 		vhost_work_init(&flush.work, vhost_flush_work);
 
-		vhost_work_queue(dev, &flush.work);
+		vhost_work_queue_on(dev, &flush.work, worker_id);
 		wait_for_completion(&flush.wait_event);
 	}
 }
+
+void vhost_work_dev_flush(struct vhost_dev *dev)
+{
+	int i;
+
+	for (i = 0; i < dev->num_workers; i++)
+		vhost_work_flush_on(dev, i);
+}
 EXPORT_SYMBOL_GPL(vhost_work_dev_flush);
 
 /* Flush any work that has been scheduled. When calling this, don't hold any
@@ -253,26 +283,20 @@ void vhost_poll_flush(struct vhost_poll *poll)
 }
 EXPORT_SYMBOL_GPL(vhost_poll_flush);
 
-void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
+/* A lockless hint for busy polling code to exit the loop */
+bool vhost_has_work(struct vhost_dev *dev)
 {
-	if (!dev->worker)
-		return;
+	int i;
 
-	if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
-		/* We can only add the work to the list after we're
-		 * sure it was not in the list.
-		 * test_and_set_bit() implies a memory barrier.
-		 */
-		llist_add(&work->node, &dev->work_list);
-		wake_up_process(dev->worker);
+	if (!dev->workers)
+		return false;
+
+	for (i = 0; i < dev->num_workers; i++) {
+		if (!llist_empty(&dev->workers[i]->work_list))
+			return true;
 	}
-}
-EXPORT_SYMBOL_GPL(vhost_work_queue);
 
-/* A lockless hint for busy polling code to exit the loop */
-bool vhost_has_work(struct vhost_dev *dev)
-{
-	return !llist_empty(&dev->work_list);
+	return false;
 }
 EXPORT_SYMBOL_GPL(vhost_has_work);
 
@@ -338,7 +362,8 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 
 static int vhost_worker(void *data)
 {
-	struct vhost_dev *dev = data;
+	struct vhost_worker *worker = data;
+	struct vhost_dev *dev = worker->dev;
 	struct vhost_work *work, *work_next;
 	struct llist_node *node;
 
@@ -352,8 +377,7 @@ static int vhost_worker(void *data)
 			__set_current_state(TASK_RUNNING);
 			break;
 		}
-
-		node = llist_del_all(&dev->work_list);
+		node = llist_del_all(&worker->work_list);
 		if (!node)
 			schedule();
 
@@ -506,13 +530,13 @@ int vhost_dev_init(struct vhost_dev *dev,
 	dev->umem = NULL;
 	dev->iotlb = NULL;
 	dev->mm = NULL;
-	dev->worker = NULL;
+	dev->workers = NULL;
+	dev->num_workers = 0;
 	dev->iov_limit = iov_limit;
 	dev->weight = weight;
-	dev->byte_weight = byte_weight;
 	dev->use_worker = use_worker;
+	dev->byte_weight = byte_weight;
 	dev->msg_handler = msg_handler;
-	init_llist_head(&dev->work_list);
 	init_waitqueue_head(&dev->wait);
 	INIT_LIST_HEAD(&dev->read_list);
 	INIT_LIST_HEAD(&dev->pending_list);
@@ -558,16 +582,28 @@ static void vhost_attach_cgroups_work(struct vhost_work *work)
 	s->ret = cgroup_attach_task_all(s->owner, current);
 }
 
-static int vhost_attach_cgroups(struct vhost_dev *dev)
+static int vhost_attach_cgroups_on(struct vhost_dev *dev, int worker_id)
 {
 	struct vhost_attach_cgroups_struct attach;
 
 	attach.owner = current;
 	vhost_work_init(&attach.work, vhost_attach_cgroups_work);
-	vhost_work_queue(dev, &attach.work);
-	vhost_work_dev_flush(dev);
+	vhost_work_queue_on(dev, &attach.work, worker_id);
+	vhost_work_flush_on(dev, worker_id);
 	return attach.ret;
 }
+static int vhost_attach_cgroups(struct vhost_dev *dev, int first_worker)
+{
+	int i, ret = 0;
+
+	for (i = first_worker; i < dev->num_workers; i++) {
+		ret = vhost_attach_cgroups_on(dev, i);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
 
 /* Caller should have device mutex */
 bool vhost_dev_has_owner(struct vhost_dev *dev)
@@ -606,10 +642,117 @@ static void vhost_detach_mm(struct vhost_dev *dev)
 	dev->mm = NULL;
 }
 
+static void vhost_workers_free(struct vhost_dev *dev)
+{
+	struct vhost_worker *worker;
+	int i;
+
+	if (!dev->workers)
+		return;
+
+	for (i = 0; i < dev->num_workers; i++) {
+		worker = dev->workers[i];
+
+		WARN_ON(!llist_empty(&worker->work_list));
+		kthread_stop(worker->task);
+		kfree(worker);
+	}
+
+	kfree(dev->workers);
+	dev->workers = NULL;
+	dev->num_workers = 0;
+}
+
+static int vhost_worker_create(struct vhost_dev *dev, int worker_id)
+{
+	struct vhost_worker *worker;
+	struct task_struct *task;
+	int ret;
+
+	worker = kzalloc(sizeof(*worker), GFP_KERNEL);
+	if (!worker)
+		return -ENOMEM;
+
+	init_llist_head(&worker->work_list);
+	worker->dev = dev;
+
+	task = kthread_create(vhost_worker, worker, "vhost-%d", current->pid);
+	if (IS_ERR(task)) {
+		ret = PTR_ERR(task);
+		goto free_worker;
+	}
+
+	dev->workers[worker_id] = worker;
+	worker->task = task;
+	wake_up_process(task); /* avoid contributing to loadavg */
+	return 0;
+
+free_worker:
+	kfree(worker);
+	return ret;
+}
+
+/**
+ * vhost_workers_create - create vhost workers and attach to cgroup
+ * @dev: vhost device
+ * @new_num_workers: the total number of workers we want after this returns
+ *
+ * Caller must have the device mutex and have stopped operations that
+ * can access the workers array.
+ */
+int vhost_workers_create(struct vhost_dev *dev, int new_num_workers)
+{
+	struct vhost_worker **new_workers;
+	struct mm_struct *mm;
+	bool owner_match = true;
+	int i, err, start;
+
+	if (new_num_workers == dev->num_workers)
+		return 0;
+
+	if (new_num_workers < dev->num_workers)
+		return -EINVAL;
+
+	if (vhost_dev_has_owner(dev)) {
+		mm = get_task_mm(current);
+		if (mm != dev->mm)
+			owner_match = false;
+		mmput(mm);
+		if (!owner_match)
+			return -EBUSY;
+	}
+
+	new_workers = krealloc(dev->workers, new_num_workers * sizeof(*new_workers),
+			       GFP_KERNEL);
+	if (!new_workers) {
+		err = -ENOMEM;
+		goto free_workers;
+	}
+	dev->workers = new_workers;
+
+	start = dev->num_workers;
+	for (i = start; i < new_num_workers; i++) {
+		err = vhost_worker_create(dev, i);
+		if (err)
+			goto free_workers;
+		dev->num_workers++;
+	}
+
+	err = vhost_attach_cgroups(dev, start);
+	if (err)
+		goto free_workers;
+
+	return 0;
+
+free_workers:
+	vhost_workers_free(dev);
+	return err;
+}
+EXPORT_SYMBOL_GPL(vhost_workers_create);
+
 /* Caller should have device mutex */
 long vhost_dev_set_owner(struct vhost_dev *dev)
 {
-	struct task_struct *worker;
 	int err;
 
 	/* Is there an owner already? */
@@ -622,27 +765,16 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 
 	dev->kcov_handle = kcov_common_handle();
 	if (dev->use_worker) {
-		worker = kthread_create(vhost_worker, dev,
-					"vhost-%d", current->pid);
-		if (IS_ERR(worker)) {
-			err = PTR_ERR(worker);
-			goto err_worker;
-		}
-
-		dev->worker = worker;
-		wake_up_process(worker); /* avoid contributing to loadavg */
-
-		err = vhost_attach_cgroups(dev);
+		/*
+		 * All drivers that set use_worker=true, use at least one
+		 * worker. Drivers like vhost-scsi may override this later.
+		 */
+		err = vhost_workers_create(dev, 1);
 		if (err)
-			goto err_cgroup;
+			goto err_worker;
 	}
 
 	return 0;
-err_cgroup:
-	if (dev->worker) {
-		kthread_stop(dev->worker);
-		dev->worker = NULL;
-	}
 err_worker:
 	vhost_detach_mm(dev);
 	dev->kcov_handle = 0;
@@ -735,12 +867,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 	dev->iotlb = NULL;
 	vhost_clear_msg(dev);
 	wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
-	WARN_ON(!llist_empty(&dev->work_list));
-	if (dev->worker) {
-		kthread_stop(dev->worker);
-		dev->worker = NULL;
-		dev->kcov_handle = 0;
-	}
+	vhost_workers_free(dev);
+	dev->kcov_handle = 0;
 	vhost_detach_mm(dev);
 }
 EXPORT_SYMBOL_GPL(vhost_dev_cleanup);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 08c5aef..b0973e6 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -25,6 +25,12 @@ struct vhost_work {
 	unsigned long		  flags;
 };
 
+struct vhost_worker {
+	struct task_struct *task;
+	struct llist_head work_list;
+	struct vhost_dev *dev;
+};
+
 /* Poll a file (eventfd or socket) */
 /* Note: there's nothing vhost specific about this structure. */
 struct vhost_poll {
@@ -39,6 +45,7 @@ struct vhost_poll {
 void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
 bool vhost_has_work(struct vhost_dev *dev);
+int vhost_workers_create(struct vhost_dev *dev, int new_num_workers);
 
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
 		     __poll_t mask, struct vhost_dev *dev);
@@ -152,8 +159,8 @@ struct vhost_dev {
 	int nvqs;
 	int max_nvqs;
 	struct eventfd_ctx *log_ctx;
-	struct llist_head work_list;
-	struct task_struct *worker;
+	struct vhost_worker **workers;
+	int num_workers;
 	struct vhost_iotlb *umem;
 	struct vhost_iotlb *iotlb;
 	spinlock_t iotlb_lock;
@@ -175,6 +182,7 @@ int vhost_dev_init(struct vhost_dev *dev, struct vhost_virtqueue **vqs,
 		   int byte_weight, bool use_worker,
 		   int (*msg_handler)(struct vhost_dev *dev,
 				      struct vhost_iotlb_msg *msg));
+int vhost_vq_set_worker(struct vhost_virtqueue *vq, int worker_id);
 long vhost_dev_set_owner(struct vhost_dev *dev);
 bool vhost_dev_has_owner(struct vhost_dev *dev);
 long vhost_dev_check_owner(struct vhost_dev *);
-- 
1.8.3.1


* [PATCH 14/17] vhost: poll support support multiple workers
  2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
                   ` (12 preceding siblings ...)
  2020-10-22  0:34 ` [PATCH 13/17] vhost: support multiple worker threads Mike Christie
@ 2020-10-22  0:35 ` Mike Christie
  2020-10-22  0:35 ` [PATCH 15/17] host: support delayed vq creation Mike Christie
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:35 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

The final patches are going to have vhost scsi create a vhost worker
per IO vq. This patch converts the poll code to poll and queue work on
the worker that is tied to the vq (in this patch we maintain the old
behavior where all vqs use a single worker).

For drivers that do not convert to the multiple worker support, or
for the case where the user does not want to allocate the extra
resources, we maintain support for the single worker case.
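
The routing described above can be sketched in userspace C as follows.
This is an assumption-laden model, not the kernel code: model_vq,
model_vq_init, model_vq_set_worker and model_vq_target_worker are
hypothetical names, and a plain pointer stands in for the vq backend.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of a vq that matter for routing. */
struct model_vq {
	void *backend;	/* non-NULL once the vq is in use */
	int worker_id;	/* index of the worker that runs this vq's work */
};

/* New vqs default to worker 0, which keeps the single worker behavior. */
static void model_vq_init(struct model_vq *vq)
{
	vq->backend = NULL;
	vq->worker_id = 0;
}

/* Rebinding is refused once a backend is attached. */
static int model_vq_set_worker(struct model_vq *vq, int worker_id)
{
	if (vq->backend)
		return -EBUSY;

	vq->worker_id = worker_id;
	return 0;
}

/* Work queued through the vq (for example from its poll) lands here. */
static int model_vq_target_worker(const struct model_vq *vq)
{
	return vq->worker_id;
}
```

A driver that never calls the setter keeps every vq on worker 0, so
nothing changes for it.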

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/net.c   |  6 ++++--
 drivers/vhost/vhost.c | 25 ++++++++++++++++++++++---
 drivers/vhost/vhost.h |  9 +++++++--
 3 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index fce46f0..a316ed0 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1322,8 +1322,10 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 			   NULL))
 		goto err_dev_init;
 
-	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, EPOLLOUT, dev);
-	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, EPOLLIN, dev);
+	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, EPOLLOUT, dev,
+			vqs[VHOST_NET_VQ_TX]);
+	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, EPOLLIN, dev,
+			vqs[VHOST_NET_VQ_RX]);
 
 	f->private_data = n;
 	n->page_frag.page = NULL;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 75866a2..991f781 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -187,13 +187,15 @@ void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
 
 /* Init poll structure */
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
-		     __poll_t mask, struct vhost_dev *dev)
+		     __poll_t mask, struct vhost_dev *dev,
+		     struct vhost_virtqueue *vq)
 {
 	init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
 	init_poll_funcptr(&poll->table, vhost_poll_func);
 	poll->mask = mask;
 	poll->dev = dev;
 	poll->wqh = NULL;
+	poll->vq = vq;
 
 	vhost_work_init(&poll->work, fn);
 }
@@ -283,6 +285,12 @@ void vhost_poll_flush(struct vhost_poll *poll)
 }
 EXPORT_SYMBOL_GPL(vhost_poll_flush);
 
+void vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work *work)
+{
+	vhost_work_queue_on(vq->dev, work, vq->worker_id);
+}
+EXPORT_SYMBOL_GPL(vhost_vq_work_queue);
+
 /* A lockless hint for busy polling code to exit the loop */
 bool vhost_has_work(struct vhost_dev *dev)
 {
@@ -302,7 +310,7 @@ bool vhost_has_work(struct vhost_dev *dev)
 
 void vhost_poll_queue(struct vhost_poll *poll)
 {
-	vhost_work_queue(poll->dev, &poll->work);
+	vhost_vq_work_queue(poll->vq, &poll->work);
 }
 EXPORT_SYMBOL_GPL(vhost_poll_queue);
 
@@ -487,11 +495,12 @@ static void __vhost_vq_init(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	vq->indirect = NULL;
 	vq->heads = NULL;
 	vq->dev = dev;
+	vq->worker_id = 0;
 	mutex_init(&vq->mutex);
 	vhost_vq_reset(dev, vq);
 
 	if (vq->handle_kick)
-		vhost_poll_init(&vq->poll, vq->handle_kick, EPOLLIN, dev);
+		vhost_poll_init(&vq->poll, vq->handle_kick, EPOLLIN, dev, vq);
 }
 
 static int vhost_vq_init(struct vhost_dev *dev, int vq_idx)
@@ -642,6 +651,16 @@ static void vhost_detach_mm(struct vhost_dev *dev)
 	dev->mm = NULL;
 }
 
+int vhost_vq_set_worker(struct vhost_virtqueue *vq, int worker_id)
+{
+	if (vhost_vq_get_backend(vq))
+		return -EBUSY;
+
+	vq->worker_id = worker_id;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vhost_vq_set_worker);
+
 static void vhost_workers_free(struct vhost_dev *dev)
 {
 	struct vhost_worker *worker;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index b0973e6..598aee7 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -16,6 +16,7 @@
 #include <linux/irqbypass.h>
 
 struct vhost_work;
+struct vhost_virtqueue;
 typedef void (*vhost_work_fn_t)(struct vhost_work *work);
 
 #define VHOST_WORK_QUEUED 1
@@ -32,7 +33,6 @@ struct vhost_worker {
 };
 
 /* Poll a file (eventfd or socket) */
-/* Note: there's nothing vhost specific about this structure. */
 struct vhost_poll {
 	poll_table		table;
 	wait_queue_head_t	*wqh;
@@ -40,15 +40,19 @@ struct vhost_poll {
 	struct vhost_work	work;
 	__poll_t		mask;
 	struct vhost_dev	*dev;
+	struct vhost_virtqueue	*vq;
 };
 
+int vhost_vq_set_worker(struct vhost_virtqueue *vq, int worker_id);
+void vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work *work);
 void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
 bool vhost_has_work(struct vhost_dev *dev);
 int vhost_workers_create(struct vhost_dev *dev, int new_num_workers);
 
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
-		     __poll_t mask, struct vhost_dev *dev);
+		     __poll_t mask, struct vhost_dev *dev,
+		     struct vhost_virtqueue *vq);
 int vhost_poll_start(struct vhost_poll *poll, struct file *file);
 void vhost_poll_stop(struct vhost_poll *poll);
 void vhost_poll_flush(struct vhost_poll *poll);
@@ -91,6 +95,7 @@ struct vhost_virtqueue {
 
 	struct vhost_poll poll;
 
+	int worker_id;
 	/* The routine to call when the Guest pings us, or timeout. */
 	vhost_work_fn_t handle_kick;
 
-- 
1.8.3.1


* [PATCH 15/17] host: support delayed vq creation
  2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
                   ` (13 preceding siblings ...)
  2020-10-22  0:35 ` [PATCH 14/17] vhost: poll support support multiple workers Mike Christie
@ 2020-10-22  0:35 ` Mike Christie
  2020-10-22  0:50   ` Mike Christie
  2020-10-22  0:35 ` [PATCH 16/17] vhost scsi: multiple worker support Mike Christie
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:35 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

This allows vq creation to be done when it's first accessed by
userspace. vhost-scsi doesn't know how many queues the user requested
until they are first setup, and we don't want to allocate resources
like the iovecs for 128 vqs when we are only using 1 or 2 most of the
time. In the next patches, vhost-scsi will also switch to preallocating
cmds per vq instead of per lio session and we don't want to allocate
them for 127 extra vqs if they are not in use.

With this patch when a driver calls vhost_dev_init they pass in the
number of vqs that they know they need and the max they can support.
This patch has all the drivers pass in the same value for both the
initial number of vqs and the max. The next patch will convert scsi.
The other drivers like net/vsock have their vqs hard coded in the
kernel or setup/discovered via other methods like with vdpa.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 48 +++++++++++++++++++++++++-----------------------
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index b348e9c..5d6dc15 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -178,6 +178,7 @@ enum {
 
 struct vhost_scsi_virtqueue {
 	struct vhost_virtqueue vq;
+	struct vhost_scsi *vs;
 	/*
 	 * Reference counting for inflight reqs, used for flush operation. At
 	 * each time, one reference tracks new commands submitted, while we
@@ -192,6 +193,9 @@ struct vhost_scsi_virtqueue {
 	struct vhost_scsi_cmd *scsi_cmds;
 	struct sbitmap scsi_tags;
 	int max_cmds;
+
+	struct vhost_work completion_work;
+	struct llist_head completion_list;
 };
 
 struct vhost_scsi {
@@ -202,9 +206,6 @@ struct vhost_scsi {
 	struct vhost_dev dev;
 	struct vhost_scsi_virtqueue vqs[VHOST_SCSI_MAX_VQ];
 
-	struct vhost_work vs_completion_work; /* cmd completion work item */
-	struct llist_head vs_completion_list; /* cmd completion queue */
-
 	struct vhost_work vs_event_work; /* evt injection work item */
 	struct llist_head vs_event_list; /* evt injection queue */
 
@@ -380,10 +381,11 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
 	} else {
 		struct vhost_scsi_cmd *cmd = container_of(se_cmd,
 					struct vhost_scsi_cmd, tvc_se_cmd);
-		struct vhost_scsi *vs = cmd->tvc_vhost;
+		struct vhost_scsi_virtqueue *svq =  container_of(cmd->tvc_vq,
+					struct vhost_scsi_virtqueue, vq);
 
-		llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
-		vhost_work_queue(&vs->dev, &vs->vs_completion_work);
+		llist_add(&cmd->tvc_completion_list, &svq->completion_list);
+		vhost_vq_work_queue(&svq->vq, &svq->completion_work);
 	}
 }
 
@@ -545,18 +547,17 @@ static void vhost_scsi_evt_work(struct vhost_work *work)
  */
 static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
 {
-	struct vhost_scsi *vs = container_of(work, struct vhost_scsi,
-					vs_completion_work);
-	DECLARE_BITMAP(signal, VHOST_SCSI_MAX_VQ);
+	struct vhost_scsi_virtqueue *svq = container_of(work,
+				struct vhost_scsi_virtqueue, completion_work);
 	struct virtio_scsi_cmd_resp v_rsp;
 	struct vhost_scsi_cmd *cmd, *t;
 	struct llist_node *llnode;
 	struct se_cmd *se_cmd;
 	struct iov_iter iov_iter;
-	int ret, vq;
+	bool signal = false;
+	int ret;
 
-	bitmap_zero(signal, VHOST_SCSI_MAX_VQ);
-	llnode = llist_del_all(&vs->vs_completion_list);
+	llnode = llist_del_all(&svq->completion_list);
 	llist_for_each_entry_safe(cmd, t, llnode, tvc_completion_list) {
 		se_cmd = &cmd->tvc_se_cmd;
 
@@ -576,21 +577,16 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
 			      cmd->tvc_in_iovs, sizeof(v_rsp));
 		ret = copy_to_iter(&v_rsp, sizeof(v_rsp), &iov_iter);
 		if (likely(ret == sizeof(v_rsp))) {
-			struct vhost_scsi_virtqueue *q;
+			signal = true;
 			vhost_add_used(cmd->tvc_vq, cmd->tvc_vq_desc, 0);
-			q = container_of(cmd->tvc_vq, struct vhost_scsi_virtqueue, vq);
-			vq = q - vs->vqs;
-			__set_bit(vq, signal);
 		} else
 			pr_err("Faulted on virtio_scsi_cmd_resp\n");
 
 		vhost_scsi_release_cmd_res(se_cmd);
 	}
 
-	vq = -1;
-	while ((vq = find_next_bit(signal, VHOST_SCSI_MAX_VQ, vq + 1))
-		< VHOST_SCSI_MAX_VQ)
-		vhost_signal(&vs->dev, &vs->vqs[vq].vq);
+	if (signal)
+		vhost_signal(&svq->vs->dev, &svq->vq);
 }
 
 static struct vhost_scsi_cmd *
@@ -1805,6 +1801,7 @@ static int vhost_scsi_set_features(struct vhost_scsi *vs, u64 features)
 
 static int vhost_scsi_open(struct inode *inode, struct file *f)
 {
+	struct vhost_scsi_virtqueue *svq;
 	struct vhost_scsi *vs;
 	struct vhost_virtqueue **vqs;
 	int r = -ENOMEM, i;
@@ -1820,7 +1817,6 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 	if (!vqs)
 		goto err_vqs;
 
-	vhost_work_init(&vs->vs_completion_work, vhost_scsi_complete_cmd_work);
 	vhost_work_init(&vs->vs_event_work, vhost_scsi_evt_work);
 
 	vs->vs_events_nr = 0;
@@ -1831,8 +1827,14 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 	vs->vqs[VHOST_SCSI_VQ_CTL].vq.handle_kick = vhost_scsi_ctl_handle_kick;
 	vs->vqs[VHOST_SCSI_VQ_EVT].vq.handle_kick = vhost_scsi_evt_handle_kick;
 	for (i = VHOST_SCSI_VQ_IO; i < VHOST_SCSI_MAX_VQ; i++) {
-		vqs[i] = &vs->vqs[i].vq;
-		vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick;
+		svq = &vs->vqs[i];
+
+		vqs[i] = &svq->vq;
+		svq->vs = vs;
+		init_llist_head(&svq->completion_list);
+		vhost_work_init(&svq->completion_work,
+				vhost_scsi_complete_cmd_work);
+		svq->vq.handle_kick = vhost_scsi_handle_kick;
 	}
 
 	/*
-- 
1.8.3.1


* [PATCH 16/17] vhost scsi: multiple worker support
  2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
                   ` (14 preceding siblings ...)
  2020-10-22  0:35 ` [PATCH 15/17] host: support delayed vq creation Mike Christie
@ 2020-10-22  0:35 ` Mike Christie
  2020-10-22  0:35 ` [PATCH 17/17] vhost scsi: drop submission workqueue Mike Christie
  2020-10-29 21:47 ` [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Michael S. Tsirkin
  17 siblings, 0 replies; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:35 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

Create a vhost_worker per IO vq. When using more than 2 vqs and/or
multiple LUNs per vhost-scsi dev, we hit a bottleneck with the single
worker where we want to start and complete all vqs and all LUNs from the
same thread.

Combined with the previous patches that allow us to increase the
queue depths and virtqueue count, for a single LUN/device with 8
virtqueues at queue depth of 128 cmds per queue, IOPs heavy workloads
(like 50/50 randrw 4K IOs with numjobs=virtqueues and iodepth=queue
depth) go from 180K to 400K where the native device can get 500K IOPs.

When using the null_blk driver, with a single LUN/device and the
same number of virtqueues/queuedepth and fio workload we see IOPs go
from 360K to 640K.
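
To make the vq to worker assignment concrete, here is a minimal sketch
of the mapping this patch sets up. It assumes the ctl, evt and first IO
vqs sit at indexes 0, 1 and 2 (the MODEL_VQ_* values below are an
assumption mirroring the driver's ordering), and model_num_workers and
model_worker_for_vq are hypothetical helper names.

```c
/* Assumed vq ordering: ctl, evt, then the IO vqs. */
enum {
	MODEL_VQ_CTL = 0,
	MODEL_VQ_EVT = 1,
	MODEL_VQ_IO  = 2,
};

/* One worker per IO vq: the ctl and evt vqs do not get their own. */
static int model_num_workers(int nvqs)
{
	return nvqs - MODEL_VQ_IO;
}

/*
 * ctl, evt and the first IO vq share worker 0. Each additional IO vq
 * gets the worker whose id is its offset from the first IO vq.
 */
static int model_worker_for_vq(int vq_index)
{
	if (vq_index < MODEL_VQ_IO)
		return 0;

	return vq_index - MODEL_VQ_IO;
}
```

With 8 IO vqs (10 vqs in total) this gives 8 workers, with worker 0
also handling the ctl and evt vqs.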

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 5d6dc15..4e91a90 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1624,6 +1624,22 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
 		memcpy(vs->vs_vhost_wwpn, t->vhost_wwpn,
 		       sizeof(vs->vs_vhost_wwpn));
 
+		/*
+		 * For compat, have the evt and ctl vqs share worker0 with
+		 * the first IO vq like is setup as default already. Any
+		 * additional vqs will get their own worker.
+		 *
+		 * Note: if we fail later, then the vhost_dev_cleanup call on
+		 * release() will clean up all the workers.
+		 */
+		ret = vhost_workers_create(&vs->dev,
+					   vs->dev.nvqs - VHOST_SCSI_VQ_IO);
+		if (ret) {
+			pr_err("Could not create vhost-scsi workers. Error %d.",
+			       ret);
+			goto undepend;
+		}
+
 		for (i = VHOST_SCSI_VQ_IO; i < VHOST_SCSI_MAX_VQ; i++) {
 			vq = &vs->vqs[i].vq;
 			if (!vq->initialized)
@@ -1631,6 +1647,7 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
 
 			if (vhost_scsi_setup_vq_cmds(vq, vq->num))
 				goto destroy_vq_cmds;
+			vhost_vq_set_worker(vq, i - VHOST_SCSI_VQ_IO);
 		}
 
 		for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 17/17] vhost scsi: drop submission workqueue
  2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
                   ` (15 preceding siblings ...)
  2020-10-22  0:35 ` [PATCH 16/17] vhost scsi: multiple worker support Mike Christie
@ 2020-10-22  0:35 ` Mike Christie
  2020-10-29 21:47 ` [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Michael S. Tsirkin
  17 siblings, 0 replies; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:35 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

We can't control a VM's vhost scsi IO right now because the vhost worker
is added to the VM's blk cgroup, but the vhost worker thread actually
just passes the cmd to a vhost-scsi driver workqueue which ends up
submitting the cmd to the block layer.

This patch has us submit from the vhost worker thread and removes the
work queue.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 30 +++---------------------------
 1 file changed, 3 insertions(+), 27 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 4e91a90..3178bf54 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -101,8 +101,6 @@ struct vhost_scsi_cmd {
 	struct vhost_scsi_nexus *tvc_nexus;
 	/* The TCM I/O descriptor that is accessed via container_of() */
 	struct se_cmd tvc_se_cmd;
-	/* work item used for cmwq dispatch to vhost_scsi_submission_work() */
-	struct work_struct work;
 	/* Copy of the incoming SCSI command descriptor block (CDB) */
 	unsigned char tvc_cdb[VHOST_SCSI_MAX_CDB_SIZE];
 	/* Sense buffer that will be mapped into outgoing status */
@@ -240,8 +238,6 @@ struct vhost_scsi_ctx {
 	struct iov_iter out_iter;
 };
 
-static struct workqueue_struct *vhost_scsi_workqueue;
-
 /* Global spinlock to protect vhost_scsi TPG list for vhost IOCTL access */
 static DEFINE_MUTEX(vhost_scsi_mutex);
 static LIST_HEAD(vhost_scsi_list);
@@ -776,10 +772,8 @@ static int vhost_scsi_to_tcm_attr(int attr)
 	return TCM_SIMPLE_TAG;
 }
 
-static void vhost_scsi_submission_work(struct work_struct *work)
+static void vhost_scsi_target_submit(struct vhost_scsi_cmd *cmd)
 {
-	struct vhost_scsi_cmd *cmd =
-		container_of(work, struct vhost_scsi_cmd, work);
 	struct vhost_scsi_nexus *tv_nexus;
 	struct se_cmd *se_cmd = &cmd->tvc_se_cmd;
 	struct scatterlist *sg_ptr, *sg_prot_ptr = NULL;
@@ -1126,14 +1120,7 @@ static u16 vhost_buf_to_lun(u8 *lun_buf)
 		 * vhost_scsi_queue_data_in() and vhost_scsi_queue_status()
 		 */
 		cmd->tvc_vq_desc = vc.head;
-		/*
-		 * Dispatch cmd descriptor for cmwq execution in process
-		 * context provided by vhost_scsi_workqueue.  This also ensures
-		 * cmd is executed on the same kworker CPU as this vhost
-		 * thread to gain positive L2 cache locality effects.
-		 */
-		INIT_WORK(&cmd->work, vhost_scsi_submission_work);
-		queue_work(vhost_scsi_workqueue, &cmd->work);
+		vhost_scsi_target_submit(cmd);
 		ret = 0;
 err:
 		/*
@@ -2511,17 +2498,9 @@ static int __init vhost_scsi_init(void)
 		" on "UTS_RELEASE"\n", VHOST_SCSI_VERSION, utsname()->sysname,
 		utsname()->machine);
 
-	/*
-	 * Use our own dedicated workqueue for submitting I/O into
-	 * target core to avoid contention within system_wq.
-	 */
-	vhost_scsi_workqueue = alloc_workqueue("vhost_scsi", 0, 0);
-	if (!vhost_scsi_workqueue)
-		goto out;
-
 	ret = vhost_scsi_register();
 	if (ret < 0)
-		goto out_destroy_workqueue;
+		goto out;
 
 	ret = target_register_template(&vhost_scsi_ops);
 	if (ret < 0)
@@ -2531,8 +2510,6 @@ static int __init vhost_scsi_init(void)
 
 out_vhost_scsi_deregister:
 	vhost_scsi_deregister();
-out_destroy_workqueue:
-	destroy_workqueue(vhost_scsi_workqueue);
 out:
 	return ret;
 };
@@ -2541,7 +2518,6 @@ static void vhost_scsi_exit(void)
 {
 	target_unregister_template(&vhost_scsi_ops);
 	vhost_scsi_deregister();
-	destroy_workqueue(vhost_scsi_workqueue);
 };
 
 MODULE_DESCRIPTION("VHOST_SCSI series fabric driver");
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 12/17] vhost poll: fix coding style
  2020-10-22  0:34 ` [PATCH 12/17] vhost poll: fix coding style Mike Christie
@ 2020-10-22  0:39   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 43+ messages in thread
From: Chaitanya Kulkarni @ 2020-10-22  0:39 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	jasowang, pbonzini, stefanha, virtualization

On 10/21/20 17:35, Mike Christie wrote:
> We use like 3 coding styles in this struct. Switch to just tabs.
>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>

Looks good.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 03/17] vhost net: use goto error handling in open
  2020-10-22  0:34 ` [PATCH 03/17] vhost net: use goto error handling in open Mike Christie
@ 2020-10-22  0:45   ` Chaitanya Kulkarni
  2020-10-26  3:34   ` Jason Wang
  1 sibling, 0 replies; 43+ messages in thread
From: Chaitanya Kulkarni @ 2020-10-22  0:45 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	jasowang, pbonzini, stefanha, virtualization

On 10/21/20 17:35, Mike Christie wrote:
> In the next patches vhost_dev_init will be able to fail. This patch has
> vhost_net_open use goto error handling like is done in the other vhost
> code to make handling vhost_dev_init failures easier to handle and
> extend in the future.
>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---

Also, note that it reduces the repeated kfree() calls in the code for vqs
and n. Looks good.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
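
For readers less familiar with the pattern the patch switches to, here is a
minimal userspace sketch of goto-based unwinding. It is not the vhost_net
code: demo_dev, demo_alloc, demo_open, demo_close and the fail_at knob are
hypothetical names, and malloc/free stand in for the kernel allocators. It
only illustrates that the labels unwind in the reverse order of allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device that needs three allocations. */
struct demo_dev {
	void *vqs;
	void *queue;
	void *xdp;
};

/* Allocation helper that can be told to fail, to exercise each error path. */
static void *demo_alloc(int step, int fail_at)
{
	return step == fail_at ? NULL : malloc(16);
}

/*
 * Allocate in order, unwind in reverse order. Each label frees only what
 * was successfully allocated before the failing step.
 */
static int demo_open(struct demo_dev *dev, int fail_at)
{
	dev->vqs = demo_alloc(1, fail_at);
	if (!dev->vqs)
		goto err_vqs;

	dev->queue = demo_alloc(2, fail_at);
	if (!dev->queue)
		goto err_queue;

	dev->xdp = demo_alloc(3, fail_at);
	if (!dev->xdp)
		goto err_xdp;

	return 0;

err_xdp:
	free(dev->queue);
err_queue:
	free(dev->vqs);
err_vqs:
	dev->vqs = dev->queue = dev->xdp = NULL;
	return -ENOMEM;
}

/* Release everything a successful demo_open() allocated. */
static void demo_close(struct demo_dev *dev)
{
	free(dev->xdp);
	free(dev->queue);
	free(dev->vqs);
}
```

Adding a fourth allocation later only needs one more label at the top of the
unwind chain, which is presumably what makes the next patches easier.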

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 15/17] host: support delayed vq creation
  2020-10-22  0:35 ` [PATCH 15/17] host: support delayed vq creation Mike Christie
@ 2020-10-22  0:50   ` Mike Christie
  0 siblings, 0 replies; 43+ messages in thread
From: Mike Christie @ 2020-10-22  0:50 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

The email subject and patch description got cut and paste from the wrong
patch when I updated this patch. It should be:


[PATCH 15/17] vhost scsi: make completion per vq

In the last patches we are going to have a worker thread per IO vq.
This patch separates the scsi cmd completion code paths so we can
complete cmds based on their vq instead of having all cmds complete
on the same worker thread.


---

When I handle other review comments in the set I'll fix this up too.
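
To make the corrected description concrete, here is a rough userspace sketch
of per-vq completion. It assumes C11 atomics in place of the kernel's llist
helpers, and every demo_* name is hypothetical. It is not the patch itself,
only the idea: each vq owns its own completion list, and its worker drains
that list and signals that one vq at most once per pass.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical completed command, linked through 'next'. */
struct demo_node {
	struct demo_node *next;
	int tag;
};

/* Hypothetical per-vq state: one completion list and a signal counter. */
struct demo_vq {
	_Atomic(struct demo_node *) completion_list;
	int signalled;
};

static void demo_vq_init(struct demo_vq *vq)
{
	atomic_init(&vq->completion_list, NULL);
	vq->signalled = 0;
}

/* Push one completed cmd; safe against concurrent pushers (like llist_add). */
static void demo_complete(struct demo_vq *vq, struct demo_node *n)
{
	struct demo_node *head = atomic_load(&vq->completion_list);

	do {
		n->next = head;
	} while (!atomic_compare_exchange_weak(&vq->completion_list, &head, n));
}

/*
 * Worker side: detach the whole list in one step (like llist_del_all),
 * walk it, and signal this vq once if anything was completed.
 * Returns the number of cmds handled.
 */
static int demo_completion_work(struct demo_vq *vq)
{
	struct demo_node *n = atomic_exchange(&vq->completion_list, NULL);
	int handled = 0;

	for (; n; n = n->next)
		handled++;
	if (handled)
		vq->signalled++;
	return handled;
}
```

Because each list belongs to a single vq, the worker no longer needs a bitmap
to remember which vqs to signal; a single flag (a counter in this sketch) is
enough.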


On 10/21/20 7:35 PM, Mike Christie wrote:
> This allows vq creation to be done when it's first accessed by
> userspace. vhost-scsi doesn't know how many queues the user requested
> until they are first setup, and we don't want to allocate resources
> like the iovecs for 128 vqs when we are only using 1 or 2 most of the
> time. In the next patches, vhost-scsi will also switch to preallocating
> cmds per vq instead of per lio session and we don't want to allocate
> them for 127 extra vqs if they are not in use.
> 
> With this patch when a driver calls vhost_dev_init they pass in the
> number of vqs that they know they need and the max they can support.
> This patch has all the drivers pass in the same value for both the
> initial number of vqs and the max. The next patch will convert scsi.
> The other drivers like net/vsock have their vqs hard coded in the
> kernel or setup/discovered via other methods like with vdpa.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/vhost/scsi.c | 48 +++++++++++++++++++++++++-----------------------
>  1 file changed, 25 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index b348e9c..5d6dc15 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -178,6 +178,7 @@ enum {
>  
>  struct vhost_scsi_virtqueue {
>  	struct vhost_virtqueue vq;
> +	struct vhost_scsi *vs;
>  	/*
>  	 * Reference counting for inflight reqs, used for flush operation. At
>  	 * each time, one reference tracks new commands submitted, while we
> @@ -192,6 +193,9 @@ struct vhost_scsi_virtqueue {
>  	struct vhost_scsi_cmd *scsi_cmds;
>  	struct sbitmap scsi_tags;
>  	int max_cmds;
> +
> +	struct vhost_work completion_work;
> +	struct llist_head completion_list;
>  };
>  
>  struct vhost_scsi {
> @@ -202,9 +206,6 @@ struct vhost_scsi {
>  	struct vhost_dev dev;
>  	struct vhost_scsi_virtqueue vqs[VHOST_SCSI_MAX_VQ];
>  
> -	struct vhost_work vs_completion_work; /* cmd completion work item */
> -	struct llist_head vs_completion_list; /* cmd completion queue */
> -
>  	struct vhost_work vs_event_work; /* evt injection work item */
>  	struct llist_head vs_event_list; /* evt injection queue */
>  
> @@ -380,10 +381,11 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
>  	} else {
>  		struct vhost_scsi_cmd *cmd = container_of(se_cmd,
>  					struct vhost_scsi_cmd, tvc_se_cmd);
> -		struct vhost_scsi *vs = cmd->tvc_vhost;
> +		struct vhost_scsi_virtqueue *svq =  container_of(cmd->tvc_vq,
> +					struct vhost_scsi_virtqueue, vq);
>  
> -		llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
> -		vhost_work_queue(&vs->dev, &vs->vs_completion_work);
> +		llist_add(&cmd->tvc_completion_list, &svq->completion_list);
> +		vhost_vq_work_queue(&svq->vq, &svq->completion_work);
>  	}
>  }
>  
> @@ -545,18 +547,17 @@ static void vhost_scsi_evt_work(struct vhost_work *work)
>   */
>  static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
>  {
> -	struct vhost_scsi *vs = container_of(work, struct vhost_scsi,
> -					vs_completion_work);
> -	DECLARE_BITMAP(signal, VHOST_SCSI_MAX_VQ);
> +	struct vhost_scsi_virtqueue *svq = container_of(work,
> +				struct vhost_scsi_virtqueue, completion_work);
>  	struct virtio_scsi_cmd_resp v_rsp;
>  	struct vhost_scsi_cmd *cmd, *t;
>  	struct llist_node *llnode;
>  	struct se_cmd *se_cmd;
>  	struct iov_iter iov_iter;
> -	int ret, vq;
> +	bool signal = false;
> +	int ret;
>  
> -	bitmap_zero(signal, VHOST_SCSI_MAX_VQ);
> -	llnode = llist_del_all(&vs->vs_completion_list);
> +	llnode = llist_del_all(&svq->completion_list);
>  	llist_for_each_entry_safe(cmd, t, llnode, tvc_completion_list) {
>  		se_cmd = &cmd->tvc_se_cmd;
>  
> @@ -576,21 +577,16 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
>  			      cmd->tvc_in_iovs, sizeof(v_rsp));
>  		ret = copy_to_iter(&v_rsp, sizeof(v_rsp), &iov_iter);
>  		if (likely(ret == sizeof(v_rsp))) {
> -			struct vhost_scsi_virtqueue *q;
> +			signal = true;
>  			vhost_add_used(cmd->tvc_vq, cmd->tvc_vq_desc, 0);
> -			q = container_of(cmd->tvc_vq, struct vhost_scsi_virtqueue, vq);
> -			vq = q - vs->vqs;
> -			__set_bit(vq, signal);
>  		} else
>  			pr_err("Faulted on virtio_scsi_cmd_resp\n");
>  
>  		vhost_scsi_release_cmd_res(se_cmd);
>  	}
>  
> -	vq = -1;
> -	while ((vq = find_next_bit(signal, VHOST_SCSI_MAX_VQ, vq + 1))
> -		< VHOST_SCSI_MAX_VQ)
> -		vhost_signal(&vs->dev, &vs->vqs[vq].vq);
> +	if (signal)
> +		vhost_signal(&svq->vs->dev, &svq->vq);
>  }
>  
>  static struct vhost_scsi_cmd *
> @@ -1805,6 +1801,7 @@ static int vhost_scsi_set_features(struct vhost_scsi *vs, u64 features)
>  
>  static int vhost_scsi_open(struct inode *inode, struct file *f)
>  {
> +	struct vhost_scsi_virtqueue *svq;
>  	struct vhost_scsi *vs;
>  	struct vhost_virtqueue **vqs;
>  	int r = -ENOMEM, i;
> @@ -1820,7 +1817,6 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
>  	if (!vqs)
>  		goto err_vqs;
>  
> -	vhost_work_init(&vs->vs_completion_work, vhost_scsi_complete_cmd_work);
>  	vhost_work_init(&vs->vs_event_work, vhost_scsi_evt_work);
>  
>  	vs->vs_events_nr = 0;
> @@ -1831,8 +1827,14 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
>  	vs->vqs[VHOST_SCSI_VQ_CTL].vq.handle_kick = vhost_scsi_ctl_handle_kick;
>  	vs->vqs[VHOST_SCSI_VQ_EVT].vq.handle_kick = vhost_scsi_evt_handle_kick;
>  	for (i = VHOST_SCSI_VQ_IO; i < VHOST_SCSI_MAX_VQ; i++) {
> -		vqs[i] = &vs->vqs[i].vq;
> -		vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick;
> +		svq = &vs->vqs[i];
> +
> +		vqs[i] = &svq->vq;
> +		svq->vs = vs;
> +		init_llist_head(&svq->completion_list);
> +		vhost_work_init(&svq->completion_work,
> +				vhost_scsi_complete_cmd_work);
> +		svq->vq.handle_kick = vhost_scsi_handle_kick;
>  	}
>  
>  	/*
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 02/17] vhost: remove work arg from vhost_work_flush
  2020-10-22  0:34 ` [PATCH 02/17] vhost: remove work arg from vhost_work_flush Mike Christie
@ 2020-10-22  0:51   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 43+ messages in thread
From: Chaitanya Kulkarni @ 2020-10-22  0:51 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	jasowang, pbonzini, stefanha, virtualization

On 10/21/20 17:35, Mike Christie wrote:
> vhost_work_flush doesn't do anything with the work arg. This patch drops
> it and then renames vhost_work_flush to vhost_work_dev_flush to reflect
> that the function flushes all the works in the dev and not just a
> specific queue or work item.
>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> Acked-by: Jason Wang <jasowang@redhat.com>

Apparently it used the local flush.work; not sure if it was supposed to
use work as an argument instead of the local variable. If so, looks good.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 04/17] vhost: prep vhost_dev_init users to handle failures
  2020-10-22  0:34 ` [PATCH 04/17] vhost: prep vhost_dev_init users to handle failures Mike Christie
@ 2020-10-22  5:22   ` kernel test robot
  2020-10-23 16:15   ` Mike Christie
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 43+ messages in thread
From: kernel test robot @ 2020-10-22  5:22 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	jasowang, pbonzini, stefanha, virtualization
  Cc: kbuild-all, clang-built-linux


[-- Attachment #1: Type: text/plain, Size: 4184 bytes --]

Hi Mike,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on vhost/linux-next]
[also build test WARNING on v5.9 next-20201021]
[cannot apply to target/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Mike-Christie/vhost-fix-scsi-cmd-handling-and-cgroup-support/20201022-083844
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: x86_64-randconfig-a013-20201021 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project ee6abef5323d59b983129bf3514ef6775d1d6cd5)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/6e1629548d318c2c9af7490379a3c9d7e3cba0d5
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mike-Christie/vhost-fix-scsi-cmd-handling-and-cgroup-support/20201022-083844
        git checkout 6e1629548d318c2c9af7490379a3c9d7e3cba0d5
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/vhost/vsock.c:633:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
           if (vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/vhost/vsock.c:648:9: note: uninitialized use occurs here
           return ret;
                  ^~~
   drivers/vhost/vsock.c:633:2: note: remove the 'if' if its condition is always false
           if (vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/vhost/vsock.c:609:9: note: initialize the variable 'ret' to silence this warning
           int ret;
                  ^
                   = 0
   1 warning generated.

vim +633 drivers/vhost/vsock.c

   604	
   605	static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
   606	{
   607		struct vhost_virtqueue **vqs;
   608		struct vhost_vsock *vsock;
   609		int ret;
   610	
   611		/* This struct is large and allocation could fail, fall back to vmalloc
   612		 * if there is no other way.
   613		 */
   614		vsock = kvmalloc(sizeof(*vsock), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
   615		if (!vsock)
   616			return -ENOMEM;
   617	
   618		vqs = kmalloc_array(ARRAY_SIZE(vsock->vqs), sizeof(*vqs), GFP_KERNEL);
   619		if (!vqs) {
   620			ret = -ENOMEM;
   621			goto out;
   622		}
   623	
   624		vsock->guest_cid = 0; /* no CID assigned yet */
   625	
   626		atomic_set(&vsock->queued_replies, 0);
   627	
   628		vqs[VSOCK_VQ_TX] = &vsock->vqs[VSOCK_VQ_TX];
   629		vqs[VSOCK_VQ_RX] = &vsock->vqs[VSOCK_VQ_RX];
   630		vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;
   631		vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
   632	
 > 633		if (vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
   634				   UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
   635				   VHOST_VSOCK_WEIGHT, true, NULL))
   636			goto err_dev_init;
   637	
   638		file->private_data = vsock;
   639		spin_lock_init(&vsock->send_pkt_list_lock);
   640		INIT_LIST_HEAD(&vsock->send_pkt_list);
   641		vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work);
   642		return 0;
   643	
   644	err_dev_init:
   645		kfree(vqs);
   646	out:
   647		vhost_vsock_free(vsock);
   648		return ret;
   649	}
   650	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30985 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 04/17] vhost: prep vhost_dev_init users to handle failures
  2020-10-22  0:34 ` [PATCH 04/17] vhost: prep vhost_dev_init users to handle failures Mike Christie
  2020-10-22  5:22   ` kernel test robot
@ 2020-10-23 16:15   ` Mike Christie
  2020-11-02  5:57   ` Jason Wang
  2020-11-03 10:04   ` Dan Carpenter
  3 siblings, 0 replies; 43+ messages in thread
From: Mike Christie @ 2020-10-23 16:15 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

On 10/21/20 7:34 PM, Mike Christie wrote:
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index 86617bb..63ba363 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -1632,14 +1632,17 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
>   		vqs[i] = &vs->vqs[i].vq;
>   		vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick;
>   	}
> -	vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, UIO_MAXIOV,
> -		       VHOST_SCSI_WEIGHT, 0, true, NULL);
> +	if (vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, UIO_MAXIOV,
> +			   VHOST_SCSI_WEIGHT, 0, true, NULL))


> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index f40205f..a1a35e1 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -630,9 +630,10 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
>   	vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;
>   	vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
>   
> -	vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
> -		       UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
> -		       VHOST_VSOCK_WEIGHT, true, NULL);
> +	if (vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
> +			   UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
>

Just giving myself a review comment, so it doesn't happen like last time 
where multiple people waste their time and hit the same issue :)

I will fix this one found by the kernel test robot and fix up a similar 
scsi.c case where the return value is not propagated above.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 01/17] vhost scsi: add lun parser helper
  2020-10-22  0:34 ` [PATCH 01/17] vhost scsi: add lun parser helper Mike Christie
@ 2020-10-26  3:33   ` Jason Wang
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Wang @ 2020-10-26  3:33 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	pbonzini, stefanha, virtualization


On 2020/10/22 8:34 AM, Mike Christie wrote:
> Move code to parse lun from req's lun_buf to helper, so tmf code
> can use it in the next patch.
>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   drivers/vhost/scsi.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)


Acked-by: Jason Wang <jasowang@redhat.com>


>
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index b22adf0..0ea78d0 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -907,6 +907,11 @@ static void vhost_scsi_submission_work(struct work_struct *work)
>   	return ret;
>   }
>   
> +static u16 vhost_buf_to_lun(u8 *lun_buf)
> +{
> +	return ((lun_buf[2] << 8) | lun_buf[3]) & 0x3FFF;
> +}
> +
>   static void
>   vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
>   {
> @@ -1045,12 +1050,12 @@ static void vhost_scsi_submission_work(struct work_struct *work)
>   			tag = vhost64_to_cpu(vq, v_req_pi.tag);
>   			task_attr = v_req_pi.task_attr;
>   			cdb = &v_req_pi.cdb[0];
> -			lun = ((v_req_pi.lun[2] << 8) | v_req_pi.lun[3]) & 0x3FFF;
> +			lun = vhost_buf_to_lun(v_req_pi.lun);
>   		} else {
>   			tag = vhost64_to_cpu(vq, v_req.tag);
>   			task_attr = v_req.task_attr;
>   			cdb = &v_req.cdb[0];
> -			lun = ((v_req.lun[2] << 8) | v_req.lun[3]) & 0x3FFF;
> +			lun = vhost_buf_to_lun(v_req.lun);
>   		}
>   		/*
>   		 * Check that the received CDB size does not exceeded our
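
As a quick illustration of what the helper computes, here is a standalone
sketch using the same expression as vhost_buf_to_lun() above. The name
buf_to_lun and the sample buffers are made up for the example. The byte
layout in the comments (byte 0 set to 1, byte 1 the target, bytes 2 and 3
the LUN with the 0x40 flag set in byte 2) is my understanding of the
virtio-scsi request format and should be treated as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same expression as vhost_buf_to_lun() in the quoted patch: bytes 2 and 3
 * form a big-endian 16-bit value, and the top two bits are masked off.
 */
static uint16_t buf_to_lun(const uint8_t *lun_buf)
{
	return ((lun_buf[2] << 8) | lun_buf[3]) & 0x3FFF;
}

/* Sample 8-byte LUN fields (assumed layout: 1, target, 0x40 | hi, lo, ...). */
static const uint8_t demo_lun0[8]   = { 1, 0, 0x40, 0x00, 0, 0, 0, 0 };
static const uint8_t demo_lun258[8] = { 1, 0, 0x41, 0x02, 0, 0, 0, 0 };
```

With these buffers, buf_to_lun(demo_lun0) yields 0 and
buf_to_lun(demo_lun258) yields 0x0102 (258), since 0x4102 & 0x3FFF drops the
0x4000 flag bit.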

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 03/17] vhost net: use goto error handling in open
  2020-10-22  0:34 ` [PATCH 03/17] vhost net: use goto error handling in open Mike Christie
  2020-10-22  0:45   ` Chaitanya Kulkarni
@ 2020-10-26  3:34   ` Jason Wang
  1 sibling, 0 replies; 43+ messages in thread
From: Jason Wang @ 2020-10-26  3:34 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	pbonzini, stefanha, virtualization


On 2020/10/22 8:34 AM, Mike Christie wrote:
> In the next patches vhost_dev_init will be able to fail. This patch has
> vhost_net_open use goto error handling like is done in the other vhost
> code to make handling vhost_dev_init failures easier to handle and
> extend in the future.
>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>


Acked-by: Jason Wang <jasowang@redhat.com>


> ---
>   drivers/vhost/net.c | 29 ++++++++++++++---------------
>   1 file changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 531a00d..831d824 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -1286,27 +1286,18 @@ static int vhost_net_open(struct inode *inode, struct file *f)
>   	if (!n)
>   		return -ENOMEM;
>   	vqs = kmalloc_array(VHOST_NET_VQ_MAX, sizeof(*vqs), GFP_KERNEL);
> -	if (!vqs) {
> -		kvfree(n);
> -		return -ENOMEM;
> -	}
> +	if (!vqs)
> +		goto err_vqs;
>   
>   	queue = kmalloc_array(VHOST_NET_BATCH, sizeof(void *),
>   			      GFP_KERNEL);
> -	if (!queue) {
> -		kfree(vqs);
> -		kvfree(n);
> -		return -ENOMEM;
> -	}
> +	if (!queue)
> +		goto err_queue;
>   	n->vqs[VHOST_NET_VQ_RX].rxq.queue = queue;
>   
>   	xdp = kmalloc_array(VHOST_NET_BATCH, sizeof(*xdp), GFP_KERNEL);
> -	if (!xdp) {
> -		kfree(vqs);
> -		kvfree(n);
> -		kfree(queue);
> -		return -ENOMEM;
> -	}
> +	if (!xdp)
> +		goto err_xdp;
>   	n->vqs[VHOST_NET_VQ_TX].xdp = xdp;
>   
>   	dev = &n->dev;
> @@ -1338,6 +1329,14 @@ static int vhost_net_open(struct inode *inode, struct file *f)
>   	n->refcnt_bias = 0;
>   
>   	return 0;
> +
> +err_xdp:
> +	kfree(queue);
> +err_queue:
> +	kfree(vqs);
> +err_vqs:
> +	kvfree(n);
> +	return -ENOMEM;
>   }
>   
>   static struct socket *vhost_net_stop_vq(struct vhost_net *n,

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 07/17] vhost scsi: support delayed IO vq creation
  2020-10-22  0:34 ` [PATCH 07/17] vhost scsi: support delayed IO " Mike Christie
@ 2020-10-26  3:51   ` Jason Wang
  2020-10-27  5:47     ` Mike Christie
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Wang @ 2020-10-26  3:51 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	pbonzini, stefanha, virtualization


On 2020/10/22 8:34 AM, Mike Christie wrote:
> Each vhost-scsi device will need an evt and ctl queue, but the number
> of IO queues depends on whatever the user has configured in userspace.
> This patch has vhost-scsi create the evt, ctl and one IO vq at device
> open time. We then create the other IO vqs when userspace starts to
> set them up. We still waste some mem on the vq and scsi vq structs,
> but we don't waste mem on iovec related arrays and for later patches
> we know which queues are used by the dev->nvqs value.
>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>   drivers/vhost/scsi.c | 19 +++++++++++++++----
>   1 file changed, 15 insertions(+), 4 deletions(-)


Not familiar with SCSI. But I wonder if it could behave like vhost-net.

E.g. userspace should know the number of virtqueues so it can just open
and close multiple vhost-scsi file descriptors.

Thanks

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 07/17] vhost scsi: support delayed IO vq creation
  2020-10-26  3:51   ` Jason Wang
@ 2020-10-27  5:47     ` Mike Christie
  2020-10-28  1:55       ` Jason Wang
  2020-10-30  8:47       ` Michael S. Tsirkin
  0 siblings, 2 replies; 43+ messages in thread
From: Mike Christie @ 2020-10-27  5:47 UTC (permalink / raw)
  To: Jason Wang, martin.petersen, linux-scsi, target-devel, mst,
	pbonzini, stefanha, virtualization

On 10/25/20 10:51 PM, Jason Wang wrote:
> 
> On 2020/10/22 8:34 AM, Mike Christie wrote:
>> Each vhost-scsi device will need an evt and ctl queue, but the number
>> of IO queues depends on whatever the user has configured in userspace.
>> This patch has vhost-scsi create the evt, ctl and one IO vq at device
>> open time. We then create the other IO vqs when userspace starts to
>> set them up. We still waste some mem on the vq and scsi vq structs,
>> but we don't waste mem on iovec related arrays and for later patches
>> we know which queues are used by the dev->nvqs value.
>>
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>> ---
>>   drivers/vhost/scsi.c | 19 +++++++++++++++----
>>   1 file changed, 15 insertions(+), 4 deletions(-)
> 
> 
> Not familiar with SCSI. But I wonder if it could behave like vhost-net.
> 
> E.g. userspace should know the number of virtqueues so it can just open 
> and close multiple vhost-scsi file descriptors.
> 

One hiccup I'm hitting is that we might end up creating about 3x more 
vqs than we need. The problem is that for scsi each vhost device has:

vq=0: special control vq
vq=1: event vq
vq=2 and above: SCSI CMD/IO vqs. We want to create N of these.

Today we do:

Userspace does open(/dev/vhost-scsi)
         vhost_dev_init(create 128 vqs and then later we setup and use N 
of them);

Qemu does ioctl(VHOST_SET_OWNER)
         vhost_dev_set_owner()

For N vqs userspace does:
         // virtqueue setup related ioctls

Qemu does ioctl(VHOST_SCSI_SET_ENDPOINT)
         - match LIO/target port to vhost_dev


So we could change that to:

For N IO vqs userspace does
         open(/dev/vhost-scsi)
                 vhost_dev_init(create IO, evt, and ctl);

for N IO vqs Qemu does:
         ioctl(VHOST_SET_OWNER)
                 vhost_dev_set_owner()

for N IO vqs Qemu does:
         // virtqueue setup related ioctls

for N IO vqs Qemu does:
         ioctl(VHOST_SCSI_SET_ENDPOINT)
                 - match LIO/target port to vhost_dev and assemble the 
multiple vhost_dev device.

The problem is that we have to setup some of the evt/ctl specific parts 
at open() time when vhost_dev_init does vhost_poll_init for example.

- At open time, we don't know if this vhost_dev is going to be part of a 
multiple vhost_device device or a single one so we need to create at 
least 3 of them
- If it is a multiple device we don't know if it's the first device being 
created for the device or the N'th, so we don't know if the dev's vqs 
will be used for IO or ctls/evts, so we have to create all 3.

When we get the first VHOST_SCSI_SET_ENDPOINT call for a new style 
multiple vhost_dev device, we can use that dev's evt/ctl vqs for 
events/controls requests. When we get the other VHOST_SCSI_SET_ENDPOINT 
calls for the multiple vhost_dev device then those dev's evt/ctl vqs 
will be ignored and we will only use their IO vqs. So we end up with a 
lot of extra vqs.


One other question/issue I have is that qemu can open the 
/dev/vhost-scsi device or it allows tools like libvirtd to open the 
device and pass in the fd to use. For the latter case, would we continue 
to have those tools pass in the leading fd, then have qemu do the other 
num_queues - 1 open(/dev/vhost-scsi) calls? Or do these apps that pass 
in the fd need to know about all of the fds for some management reason?
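
To put numbers on the "about 3x" estimate above, here is a small sketch of
the vq accounting for the one-fd-per-IO-queue scheme. The demo_* names are
hypothetical, and the sketch assumes exactly what is described above: every
open() creates a ctl, an evt and one IO vq, and only the first device's
ctl/evt vqs end up being used.

```c
#include <assert.h>

/* Per-fd layout from the description above: vq 0 = ctl, vq 1 = evt, vq 2 = IO. */
enum {
	DEMO_VQ_CTL,
	DEMO_VQ_EVT,
	DEMO_VQ_IO,
	DEMO_VQS_PER_FD	/* 3 vqs created by each open() */
};

/* vqs created if every IO queue gets its own fd (ctl + evt + 1 IO each). */
static unsigned int demo_vqs_created(unsigned int n_io)
{
	return n_io * DEMO_VQS_PER_FD;
}

/* vqs actually used: one ctl and one evt (from the first fd) plus n_io IO vqs. */
static unsigned int demo_vqs_used(unsigned int n_io)
{
	return n_io ? n_io + 2 : 0;
}

/* ctl/evt vqs that are created but then ignored. */
static unsigned int demo_vqs_ignored(unsigned int n_io)
{
	return demo_vqs_created(n_io) - demo_vqs_used(n_io);
}
```

For 8 IO queues this gives 24 vqs created, 10 used and 14 ignored; the ratio
of created to used approaches 3 as the number of IO queues grows.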

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 09/17] vhost scsi: fix cmd completion race
  2020-10-22  0:34 ` [PATCH 09/17] vhost scsi: fix cmd completion race Mike Christie
@ 2020-10-27 13:07   ` Maurizio Lombardi
  2020-10-30  8:51   ` Michael S. Tsirkin
  1 sibling, 0 replies; 43+ messages in thread
From: Maurizio Lombardi @ 2020-10-27 13:07 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	jasowang, pbonzini, stefanha, virtualization



On 22. 10. 20 at 2:34, Mike Christie wrote:
> We might not do the final se_cmd put from vhost_scsi_complete_cmd_work.
> When the last put happens a little later then we could race where
> vhost_scsi_complete_cmd_work does vhost_signal, the guest runs and sends
> more IO, and vhost_scsi_handle_vq runs but does not find any free cmds.
> 
> This patch has us delay completing the cmd until the last lio core ref
> is dropped. We then know that once we signal to the guest that the cmd
> is completed that if it queues a new command it will find a free cmd.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/vhost/scsi.c | 42 +++++++++++++++---------------------------
>  1 file changed, 15 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index f6b9010..2fa48dd 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -322,7 +322,7 @@ static u32 vhost_scsi_tpg_get_inst_index(struct se_portal_group *se_tpg)
>  	return 1;
>  }
>  
> -static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
> +static void vhost_scsi_release_cmd_res(struct se_cmd *se_cmd)
>  {
>  	struct vhost_scsi_cmd *tv_cmd = container_of(se_cmd,
>  				struct vhost_scsi_cmd, tvc_se_cmd);
> @@ -344,6 +344,16 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
>  	vhost_scsi_put_inflight(inflight);
>  }
>  
> +static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
> +{
> +	struct vhost_scsi_cmd *cmd = container_of(se_cmd,
> +					struct vhost_scsi_cmd, tvc_se_cmd);
> +	struct vhost_scsi *vs = cmd->tvc_vhost;
> +
> +	llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
> +	vhost_work_queue(&vs->dev, &vs->vs_completion_work);
> +}
> +
>  static u32 vhost_scsi_sess_get_index(struct se_session *se_sess)
>  {
>  	return 0;
> @@ -366,28 +376,15 @@ static int vhost_scsi_get_cmd_state(struct se_cmd *se_cmd)
>  	return 0;
>  }
>  
> -static void vhost_scsi_complete_cmd(struct vhost_scsi_cmd *cmd)
> -{
> -	struct vhost_scsi *vs = cmd->tvc_vhost;
> -
> -	llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
> -
> -	vhost_work_queue(&vs->dev, &vs->vs_completion_work);
> -}
> -
>  static int vhost_scsi_queue_data_in(struct se_cmd *se_cmd)
>  {
> -	struct vhost_scsi_cmd *cmd = container_of(se_cmd,
> -				struct vhost_scsi_cmd, tvc_se_cmd);
> -	vhost_scsi_complete_cmd(cmd);
> +	transport_generic_free_cmd(se_cmd, 0);
>  	return 0;
>  }
>  
>  static int vhost_scsi_queue_status(struct se_cmd *se_cmd)
>  {
> -	struct vhost_scsi_cmd *cmd = container_of(se_cmd,
> -				struct vhost_scsi_cmd, tvc_se_cmd);
> -	vhost_scsi_complete_cmd(cmd);
> +	transport_generic_free_cmd(se_cmd, 0);
>  	return 0;
>  }
>  
> @@ -433,15 +430,6 @@ static void vhost_scsi_free_evt(struct vhost_scsi *vs, struct vhost_scsi_evt *ev
>  	return evt;
>  }
>  
> -static void vhost_scsi_free_cmd(struct vhost_scsi_cmd *cmd)
> -{
> -	struct se_cmd *se_cmd = &cmd->tvc_se_cmd;
> -
> -	/* TODO locking against target/backend threads? */
> -	transport_generic_free_cmd(se_cmd, 0);
> -
> -}
> -
>  static int vhost_scsi_check_stop_free(struct se_cmd *se_cmd)
>  {
>  	return target_put_sess_cmd(se_cmd);
> @@ -560,7 +548,7 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
>  		} else
>  			pr_err("Faulted on virtio_scsi_cmd_resp\n");
>  
> -		vhost_scsi_free_cmd(cmd);
> +		vhost_scsi_release_cmd_res(se_cmd);
>  	}
>  
>  	vq = -1;
> @@ -1096,7 +1084,7 @@ static u16 vhost_buf_to_lun(u8 *lun_buf)
>  						      &prot_iter, exp_data_len,
>  						      &data_iter))) {
>  				vq_err(vq, "Failed to map iov to sgl\n");
> -				vhost_scsi_release_cmd(&cmd->tvc_se_cmd);
> +				vhost_scsi_release_cmd_res(&cmd->tvc_se_cmd);
>  				goto err;
>  			}
>  		}
> 

Looks ok to me.

Reviewed-by: Maurizio Lombardi <mlombard@redhat.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 07/17] vhost scsi: support delayed IO vq creation
  2020-10-27  5:47     ` Mike Christie
@ 2020-10-28  1:55       ` Jason Wang
  2020-10-30  8:47       ` Michael S. Tsirkin
  1 sibling, 0 replies; 43+ messages in thread
From: Jason Wang @ 2020-10-28  1:55 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	pbonzini, stefanha, virtualization


On 2020/10/27 1:47 PM, Mike Christie wrote:
> On 10/25/20 10:51 PM, Jason Wang wrote:
>>
>> On 2020/10/22 8:34 AM, Mike Christie wrote:
>>> Each vhost-scsi device will need a evt and ctl queue, but the number
>>> of IO queues depends on whatever the user has configured in userspace.
>>> This patch has vhost-scsi create the evt, ctl and one IO vq at device
>>> open time. We then create the other IO vqs when userspace starts to
>>> set them up. We still waste some mem on the vq and scsi vq structs,
>>> but we don't waste mem on iovec related arrays and for later patches
>>> we know which queues are used by the dev->nvqs value.
>>>
>>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>>> ---
>>>   drivers/vhost/scsi.c | 19 +++++++++++++++----
>>>   1 file changed, 15 insertions(+), 4 deletions(-)
>>
>>
>> Not familiar with SCSI. But I wonder if it could behave like vhost-net.
>>
>> E.g userspace should known the number of virtqueues so it can just 
>> open and close multiple vhost-scsi file descriptors.
>>
>
> One hiccup I'm hitting is that we might end up creating about 3x more 
> vqs than we need. The problem is that for scsi each vhost device has:
>
> vq=0: special control vq
> vq=1: event vq
> vq=2 and above: SCSI CMD/IO vqs. We want to create N of these.
>
> Today we do:
>
> Userspace does open(/dev/vhost-scsi)
>         vhost_dev_init(create 128 vqs and then later we setup and use 
> N of them);
>
> Qemu does ioctl(VHOST_SET_OWNER)
>         vhost_dev_set_owner()
>
> For N vqs userspace does:
>         // virtqueue setup related ioctls
>
> Qemu does ioctl(VHOST_SCSI_SET_ENDPOINT)
>         - match LIO/target port to vhost_dev
>
>
> So we could change that to:
>
> For N IO vqs userspace does
>         open(/dev/vhost-scsi)
>                 vhost_dev_init(create IO, evt, and ctl);
>
> for N IO vqs Qemu does:
>         ioctl(VHOST_SET_OWNER)
>                 vhost_dev_set_owner()
>
> for N IO vqs Qemu does:
>         // virtqueue setup related ioctls
>
> for N IO vqs Qemu does:
>         ioctl(VHOST_SCSI_SET_ENDPOINT)
>                 - match LIO/target port to vhost_dev and assemble the 
> multiple vhost_dev device.
>
> The problem is that we have to setup some of the evt/ctl specific 
> parts at open() time when vhost_dev_init does vhost_poll_init for 
> example.
>
> - At open time, we don't know if this vhost_dev is going to be part of 
> a multiple vhost_device device or a single one so we need to create at 
> least 3 of them
> - If it is a multiple device we don't know if it's the first device 
> being created for the device or the N'th, so we don't know if the 
> dev's vqs will be used for IO or ctls/evts, so we have to create all 3.
>
> When we get the first VHOST_SCSI_SET_ENDPOINT call for a new style 
> multiple vhost_dev device, we can use that dev's evt/ctl vqs for 
> events/controls requests. When we get the other 
> VHOST_SCSI_SET_ENDPOINT calls for the multiple vhost_dev device then 
> those dev's evt/ctl vqs will be ignored and we will only use their IO 
> vqs. So we end up with a lot of extra vqs.


Right, so in this case we can probably use this patch to address this 
issue. If the evt/ctl vqs are not used, we won't even create them.


>
>
> One other question/issue I have is that qemu can open the 
> /dev/vhost-scsi device or it allows tools like libvirtd to open the 
> device and pass in the fd to use.


It allows libvirt to open and pass fds to qemu. This is how multi-queue 
virtio-net is done: libvirt is in charge of opening multiple file 
descriptors and passing them to qemu.


> For the latter case, would we continue to have those tools pass in the 
> leading fd, then have qemu do the other num_queues - 1 
> open(/dev/vhost-scsi) calls? Or do these apps that pass in the fd need 
> to know about all of the fds for some management reason?


Usually qemu runs without privileges, so it depends on the management 
layer to open the device.

Note that I'm not objecting to your proposal, I just want to see if it 
could be done in an easier way. During the development of multiqueue 
virtio-net, something similar to what you've done was proposed, but we 
ended up with the multiple vhost-net fd model, which keeps the kernel 
code unchanged.

Thanks

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support
  2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
                   ` (16 preceding siblings ...)
  2020-10-22  0:35 ` [PATCH 17/17] vhost scsi: drop submission workqueue Mike Christie
@ 2020-10-29 21:47 ` Michael S. Tsirkin
  2020-10-29 22:19   ` Mike Christie
  17 siblings, 1 reply; 43+ messages in thread
From: Michael S. Tsirkin @ 2020-10-29 21:47 UTC (permalink / raw)
  To: Mike Christie
  Cc: martin.petersen, linux-scsi, target-devel, jasowang, pbonzini,
	stefanha, virtualization

On Wed, Oct 21, 2020 at 07:34:46PM -0500, Mike Christie wrote:
> In-Reply-To: 
> 
> The following patches were made over Michael's vhost branch here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/log/?h=vhost
> 
> They fix a couple issues with vhost-scsi when we hit the 256 cmd limit
> that result in the guest getting IO errors, add LUN reset support so
> devices are not offlined during transient errors, allow us to manage
> vhost scsi IO with cgroups, and improve IOPs up to 2X.
> 
> The following patches are a follow up to this post:
> https://patchwork.kernel.org/project/target-devel/cover/1600712588-9514-1-git-send-email-michael.christie@oracle.com/
> which originally was fixing how vhost-scsi handled cmds so we would
> not get IO errors when sending more than 256 cmds.
> 
> In that patchset I needed to detect if a vq was in use and for this
> patch:
> https://patchwork.kernel.org/project/target-devel/patch/1600712588-9514-3-git-send-email-michael.christie@oracle.com/
> It was suggested to add support for VHOST_RING_ENABLE. While doing
> that though I hit a couple problems:
> 
> 1. The patches moved how vhost-scsi allocated cmds from per lio
> session to per vhost vq. To support both VHOST_RING_ENABLE and
> where userspace didn't support it, I would have to keep around the
> old per session/device cmd allocator/completion and then also maintain
> the new code. Or, I would still have to use this patch
> patchwork.kernel.org/cover/11790763/ for the compat case so there
> adding the new ioctl would not help much.
> 
> 2. For vhost-scsi I also wanted to prevent where we allocate iovecs
> for 128 vqs even though we normally use a couple. To do this, I needed
> something similar to #1, but the problem is that the VHOST_RING_ENABLE
> call would come too late.
> 
> To try and balance #1 and #2, these patches just allow vhost-scsi
> to setup a vq when userspace starts to config it. This allows the
> driver to only fully setup (we still waste some memory to support older
> setups but do not have to preallocate everything like before) what
> is used plus I do not need to maintain 2 code paths.


OK, so could we get a patchset with just bugfixes for this release
please?
And features should go into next one ...

> V3:
> - fix compile errors
> - fix possible crash where cmd could be freed while adding it to
> completion list
> - fix issue where we added the worker thread to the blk cgroup but
> the blk IO was submitted by a driver workqueue.
> 
> V2:
> - fix use before set cpu var errors
> - drop vhost_vq_is_setup
> - include patches to do a worker thread per scsi IO vq
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support
  2020-10-29 21:47 ` [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Michael S. Tsirkin
@ 2020-10-29 22:19   ` Mike Christie
  0 siblings, 0 replies; 43+ messages in thread
From: Mike Christie @ 2020-10-29 22:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: martin.petersen, linux-scsi, target-devel, jasowang, pbonzini,
	stefanha, virtualization

On 10/29/20 4:47 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 21, 2020 at 07:34:46PM -0500, Mike Christie wrote:
>> In-Reply-To:
>>
>> The following patches were made over Michael's vhost branch here:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/log/?h=vhost
>>
>> They fix a couple issues with vhost-scsi when we hit the 256 cmd limit
>> that result in the guest getting IO errors, add LUN reset support so
>> devices are not offlined during transient errors, allow us to manage
>> vhost scsi IO with cgroups, and improve IOPs up to 2X.
>>
>> The following patches are a follow up to this post:
>> https://patchwork.kernel.org/project/target-devel/cover/1600712588-9514-1-git-send-email-michael.christie@oracle.com/
>> which originally was fixing how vhost-scsi handled cmds so we would
>> not get IO errors when sending more than 256 cmds.
>>
>> In that patchset I needed to detect if a vq was in use and for this
>> patch:
>> https://patchwork.kernel.org/project/target-devel/patch/1600712588-9514-3-git-send-email-michael.christie@oracle.com/
>> It was suggested to add support for VHOST_RING_ENABLE. While doing
>> that though I hit a couple problems:
>>
>> 1. The patches moved how vhost-scsi allocated cmds from per lio
>> session to per vhost vq. To support both VHOST_RING_ENABLE and
>> where userspace didn't support it, I would have to keep around the
>> old per session/device cmd allocator/completion and then also maintain
>> the new code. Or, I would still have to use this patch
>> patchwork.kernel.org/cover/11790763/ for the compat case so there
>> adding the new ioctl would not help much.
>>
>> 2. For vhost-scsi I also wanted to prevent where we allocate iovecs
>> for 128 vqs even though we normally use a couple. To do this, I needed
>> something similar to #1, but the problem is that the VHOST_RING_ENABLE
>> call would come too late.
>>
>> To try and balance #1 and #2, these patches just allow vhost-scsi
>> to setup a vq when userspace starts to config it. This allows the
>> driver to only fully setup (we still waste some memory to support older
>> setups but do not have to preallocate everything like before) what
>> is used plus I do not need to maintain 2 code paths.
> 
> 
> OK, so could we get a patchset with just bugfixes for this release
> please?
> And features should go into next one ...

Yeah, that sounds good.

Just to make sure I am on the same page as you and Jason about which 
patches are features vs fixes/cleanups: I'm thinking patches 1 - 11 are 
the ones to resend in the fixes patchset?

0. Patches 1 - 2 are adding helpers I use later.

1. Patches 3 - 8 are related to fixing IO errors due to the VM sending 
virtqueue_size/cmd_per_lun commands but vhost-scsi having 256 hard coded.

Patch:

[PATCH 08/17] vhost scsi: alloc cmds per vq instead of session

is what fixes the issue in vhost-scsi so we allocate enough resources to 
match what the VM is going to send us, but is built on patches 3 - 7 
which allow vhost-scsi to know which vqs to allocate cmd resources for.

[PATCH 03/17] vhost net: use goto error handling in open
[PATCH 04/17] vhost: prep vhost_dev_init users to handle failures
[PATCH 05/17] vhost: move vq iovec allocation to dev init time
[PATCH 06/17] vhost: support delayed vq creation
[PATCH 07/17] vhost scsi: support delayed IO vq creation

2. Patch 9 fixes a race where we signal userspace the cmd is done before 
we were done with it. We can then get IO errors if the VM sends a new IO 
before we free up the old cmd.


3. Patch 10 adds LUN reset support. Currently, if the real/backing 
device hits a temp issue, the VM's scsi/block layer cmd timer might fire 
and send a reset. We don't implement this, so we just fail. The VM's 
device goes offline and the app in the VM gets IO errors.

4. Patch 11 removes extra flush calls.


Patches 12 - 17 add support for a thread per VQ, which is a perf 
feature, so I'll resend them separately when we figure out what is best.
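
To make the patch 9 race described above a bit more concrete, here is a 
small single-threaded model in C. It is only a sketch: the pool, its size 
and all function names are invented for this example and are not the 
vhost-scsi code. It compares "signal the guest, then drop the last ref" 
with "drop the last ref, then signal the guest" when the guest resubmits 
immediately after being signalled.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_SIZE 2	/* stand-in for the cmd pool; the value is arbitrary */

struct cmd_pool {
	bool in_use[POOL_SIZE];
};

/* Returns a free slot index, or -1 if the pool is exhausted. */
int pool_get(struct cmd_pool *p)
{
	for (int i = 0; i < POOL_SIZE; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = true;
			return i;
		}
	}
	return -1;
}

void pool_put(struct cmd_pool *p, int tag)
{
	assert(tag >= 0 && tag < POOL_SIZE);
	p->in_use[tag] = false;
}

/*
 * Old ordering: the guest is signalled first and its next request
 * arrives before the final put. Returns the slot the new request got;
 * -1 models the guest seeing an IO error.
 */
int complete_then_free(struct cmd_pool *p, int tag)
{
	int next = pool_get(p);	/* guest was signalled and resubmits */

	pool_put(p, tag);	/* final put happens too late */
	return next;
}

/* New ordering: the slot is released on the last put, then we signal. */
int free_then_complete(struct cmd_pool *p, int tag)
{
	pool_put(p, tag);	/* last ref dropped, slot is free */
	return pool_get(p);	/* guest was signalled and resubmits */
}
```

With a full pool, complete_then_free() fails to find a slot for the new 
request while free_then_complete() reuses the slot that was just 
released, which is the ordering patch 9 is meant to guarantee.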

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 07/17] vhost scsi: support delayed IO vq creation
  2020-10-27  5:47     ` Mike Christie
  2020-10-28  1:55       ` Jason Wang
@ 2020-10-30  8:47       ` Michael S. Tsirkin
  2020-10-30 16:30         ` Mike Christie
                           ` (2 more replies)
  1 sibling, 3 replies; 43+ messages in thread
From: Michael S. Tsirkin @ 2020-10-30  8:47 UTC (permalink / raw)
  To: Mike Christie
  Cc: Jason Wang, martin.petersen, linux-scsi, target-devel, pbonzini,
	stefanha, virtualization

On Tue, Oct 27, 2020 at 12:47:34AM -0500, Mike Christie wrote:
> On 10/25/20 10:51 PM, Jason Wang wrote:
> > 
> > On 2020/10/22 8:34 AM, Mike Christie wrote:
> > > Each vhost-scsi device will need a evt and ctl queue, but the number
> > > of IO queues depends on whatever the user has configured in userspace.
> > > This patch has vhost-scsi create the evt, ctl and one IO vq at device
> > > open time. We then create the other IO vqs when userspace starts to
> > > set them up. We still waste some mem on the vq and scsi vq structs,
> > > but we don't waste mem on iovec related arrays and for later patches
> > > we know which queues are used by the dev->nvqs value.
> > > 
> > > Signed-off-by: Mike Christie <michael.christie@oracle.com>
> > > ---
> > >   drivers/vhost/scsi.c | 19 +++++++++++++++----
> > >   1 file changed, 15 insertions(+), 4 deletions(-)
> > 
> > 
> > Not familiar with SCSI. But I wonder if it could behave like vhost-net.
> > 
> > E.g userspace should known the number of virtqueues so it can just open
> > and close multiple vhost-scsi file descriptors.
> > 
> 
> One hiccup I'm hitting is that we might end up creating about 3x more vqs
> than we need. The problem is that for scsi each vhost device has:
> 
> vq=0: special control vq
> vq=1: event vq
> vq=2 and above: SCSI CMD/IO vqs. We want to create N of these.
> 
> Today we do:
> 
> Userspace does open(/dev/vhost-scsi)
>         vhost_dev_init(create 128 vqs and then later we setup and use N of
> them);
> 
> Qemu does ioctl(VHOST_SET_OWNER)
>         vhost_dev_set_owner()
> 
> For N vqs userspace does:
>         // virtqueue setup related ioctls
> 
> Qemu does ioctl(VHOST_SCSI_SET_ENDPOINT)
>         - match LIO/target port to vhost_dev
> 
> 
> So we could change that to:
> 
> For N IO vqs userspace does
>         open(/dev/vhost-scsi)
>                 vhost_dev_init(create IO, evt, and ctl);
> 
> for N IO vqs Qemu does:
>         ioctl(VHOST_SET_OWNER)
>                 vhost_dev_set_owner()
> 
> for N IO vqs Qemu does:
>         // virtqueue setup related ioctls
> 
> for N IO vqs Qemu does:
>         ioctl(VHOST_SCSI_SET_ENDPOINT)
>                 - match LIO/target port to vhost_dev and assemble the
> multiple vhost_dev device.
> 
> The problem is that we have to setup some of the evt/ctl specific parts at
> open() time when vhost_dev_init does vhost_poll_init for example.
> 
> - At open time, we don't know if this vhost_dev is going to be part of a
> multiple vhost_device device or a single one so we need to create at least 3
> of them
> - If it is a multiple device we don't know if it's the first device being
> created for the device or the N'th, so we don't know if the dev's vqs will
> be used for IO or ctls/evts, so we have to create all 3.
> 
> When we get the first VHOST_SCSI_SET_ENDPOINT call for a new style multiple
> vhost_dev device, we can use that dev's evt/ctl vqs for events/controls
> requests. When we get the other VHOST_SCSI_SET_ENDPOINT calls for the
> multiple vhost_dev device then those dev's evt/ctl vqs will be ignored and
> we will only use their IO vqs. So we end up with a lot of extra vqs.

The issue Jason's hinting at is how can admins control the amount
of resources a given qemu instance can consume?
After all vhost vqs all live in host kernel memory ...
Limiting # of open fds would be one way to do that ...

The need to share event/control vqs between devices is a problem though,
and sending lots of ioctls on things like reset is also not that elegant.
Jason, did you have a good solution in mind?
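
On limiting the number of open fds: one generic way a management layer 
could do that is RLIMIT_NOFILE. This is an assumption about the mechanism, 
not something this thread settles on, and the helper name below is made up 
for the example. A minimal sketch in C:

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Lower the soft RLIMIT_NOFILE of the calling process to max_fds.
 * Returns 0 on success, -1 on failure (errno is set by the libc call).
 */
int cap_open_fds(rlim_t max_fds)
{
	struct rlimit rl;

	assert(max_fds > 0);
	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -1;
	if (max_fds > rl.rlim_max)
		max_fds = rl.rlim_max;	/* cannot exceed the hard limit */
	rl.rlim_cur = max_fds;
	return setrlimit(RLIMIT_NOFILE, &rl);
}
```

Note that this caps every fd in the process, not only the vhost ones, so 
it bounds vq memory only indirectly.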

> One other question/issue I have is that qemu can open the /dev/vhost-scsi
> device or it allows tools like libvirtd to open the device and pass in the
> fd to use. For the latter case, would we continue to have those tools pass
> in the leading fd, then have qemu do the other num_queues - 1
> open(/dev/vhost-scsi) calls? Or do these apps that pass in the fd need to
> know about all of the fds for some management reason?

They know about all the fds, for resource control and privilege
separation reasons.

-- 
MST

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 09/17] vhost scsi: fix cmd completion race
  2020-10-22  0:34 ` [PATCH 09/17] vhost scsi: fix cmd completion race Mike Christie
  2020-10-27 13:07   ` Maurizio Lombardi
@ 2020-10-30  8:51   ` Michael S. Tsirkin
  2020-10-30 16:04     ` Paolo Bonzini
  1 sibling, 1 reply; 43+ messages in thread
From: Michael S. Tsirkin @ 2020-10-30  8:51 UTC (permalink / raw)
  To: Mike Christie
  Cc: martin.petersen, linux-scsi, target-devel, jasowang, pbonzini,
	stefanha, virtualization

On Wed, Oct 21, 2020 at 07:34:55PM -0500, Mike Christie wrote:
> We might not do the final se_cmd put from vhost_scsi_complete_cmd_work.
> When the last put happens a little later then we could race where
> vhost_scsi_complete_cmd_work does vhost_signal, the guest runs and sends
> more IO, and vhost_scsi_handle_vq runs but does not find any free cmds.
> 
> This patch has us delay completing the cmd until the last lio core ref
> is dropped. We then know that once we signal to the guest that the cmd
> is completed that if it queues a new command it will find a free cmd.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>


Paolo, could you review this one?

> ---
>  drivers/vhost/scsi.c | 42 +++++++++++++++---------------------------
>  1 file changed, 15 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index f6b9010..2fa48dd 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -322,7 +322,7 @@ static u32 vhost_scsi_tpg_get_inst_index(struct se_portal_group *se_tpg)
>  	return 1;
>  }
>  
> -static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
> +static void vhost_scsi_release_cmd_res(struct se_cmd *se_cmd)
>  {
>  	struct vhost_scsi_cmd *tv_cmd = container_of(se_cmd,
>  				struct vhost_scsi_cmd, tvc_se_cmd);
> @@ -344,6 +344,16 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
>  	vhost_scsi_put_inflight(inflight);
>  }
>  
> +static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
> +{
> +	struct vhost_scsi_cmd *cmd = container_of(se_cmd,
> +					struct vhost_scsi_cmd, tvc_se_cmd);
> +	struct vhost_scsi *vs = cmd->tvc_vhost;
> +
> +	llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
> +	vhost_work_queue(&vs->dev, &vs->vs_completion_work);
> +}
> +
>  static u32 vhost_scsi_sess_get_index(struct se_session *se_sess)
>  {
>  	return 0;
> @@ -366,28 +376,15 @@ static int vhost_scsi_get_cmd_state(struct se_cmd *se_cmd)
>  	return 0;
>  }
>  
> -static void vhost_scsi_complete_cmd(struct vhost_scsi_cmd *cmd)
> -{
> -	struct vhost_scsi *vs = cmd->tvc_vhost;
> -
> -	llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
> -
> -	vhost_work_queue(&vs->dev, &vs->vs_completion_work);
> -}
> -
>  static int vhost_scsi_queue_data_in(struct se_cmd *se_cmd)
>  {
> -	struct vhost_scsi_cmd *cmd = container_of(se_cmd,
> -				struct vhost_scsi_cmd, tvc_se_cmd);
> -	vhost_scsi_complete_cmd(cmd);
> +	transport_generic_free_cmd(se_cmd, 0);
>  	return 0;
>  }
>  
>  static int vhost_scsi_queue_status(struct se_cmd *se_cmd)
>  {
> -	struct vhost_scsi_cmd *cmd = container_of(se_cmd,
> -				struct vhost_scsi_cmd, tvc_se_cmd);
> -	vhost_scsi_complete_cmd(cmd);
> +	transport_generic_free_cmd(se_cmd, 0);
>  	return 0;
>  }
>  
> @@ -433,15 +430,6 @@ static void vhost_scsi_free_evt(struct vhost_scsi *vs, struct vhost_scsi_evt *ev
>  	return evt;
>  }
>  
> -static void vhost_scsi_free_cmd(struct vhost_scsi_cmd *cmd)
> -{
> -	struct se_cmd *se_cmd = &cmd->tvc_se_cmd;
> -
> -	/* TODO locking against target/backend threads? */
> -	transport_generic_free_cmd(se_cmd, 0);
> -
> -}
> -
>  static int vhost_scsi_check_stop_free(struct se_cmd *se_cmd)
>  {
>  	return target_put_sess_cmd(se_cmd);
> @@ -560,7 +548,7 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
>  		} else
>  			pr_err("Faulted on virtio_scsi_cmd_resp\n");
>  
> -		vhost_scsi_free_cmd(cmd);
> +		vhost_scsi_release_cmd_res(se_cmd);
>  	}
>  
>  	vq = -1;
> @@ -1096,7 +1084,7 @@ static u16 vhost_buf_to_lun(u8 *lun_buf)
>  						      &prot_iter, exp_data_len,
>  						      &data_iter))) {
>  				vq_err(vq, "Failed to map iov to sgl\n");
> -				vhost_scsi_release_cmd(&cmd->tvc_se_cmd);
> +				vhost_scsi_release_cmd_res(&cmd->tvc_se_cmd);
>  				goto err;
>  			}
>  		}
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 09/17] vhost scsi: fix cmd completion race
  2020-10-30  8:51   ` Michael S. Tsirkin
@ 2020-10-30 16:04     ` Paolo Bonzini
  0 siblings, 0 replies; 43+ messages in thread
From: Paolo Bonzini @ 2020-10-30 16:04 UTC (permalink / raw)
  To: Michael S. Tsirkin, Mike Christie
  Cc: martin.petersen, linux-scsi, target-devel, jasowang, stefanha,
	virtualization

On 30/10/20 09:51, Michael S. Tsirkin wrote:
> On Wed, Oct 21, 2020 at 07:34:55PM -0500, Mike Christie wrote:
>> We might not do the final se_cmd put from vhost_scsi_complete_cmd_work.
>> When the last put happens a little later then we could race where
>> vhost_scsi_complete_cmd_work does vhost_signal, the guest runs and sends
>> more IO, and vhost_scsi_handle_vq runs but does not find any free cmds.
>>
>> This patch has us delay completing the cmd until the last lio core ref
>> is dropped. We then know that once we signal to the guest that the cmd
>> is completed that if it queues a new command it will find a free cmd.
>>
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> 
> Paolo, could you review this one?

I don't know how LIO does all the callbacks, honestly (I have only ever
worked on the virtio-scsi driver, not vhost-scsi, and I have only ever
reviewed some virtio-scsi spec bits of vhost-scsi).

The vhost_scsi_complete_cmd_work parts look fine, but I have no idea why
vhost_scsi_queue_data_in and vhost_scsi_queue_status call.

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 07/17] vhost scsi: support delayed IO vq creation
  2020-10-30  8:47       ` Michael S. Tsirkin
@ 2020-10-30 16:30         ` Mike Christie
  2020-10-30 17:26           ` Mike Christie
  2020-11-01 22:06         ` Mike Christie
  2020-11-02  6:36         ` Jason Wang
  2 siblings, 1 reply; 43+ messages in thread
From: Mike Christie @ 2020-10-30 16:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, martin.petersen, linux-scsi, target-devel, pbonzini,
	stefanha, virtualization

On 10/30/20 3:47 AM, Michael S. Tsirkin wrote:
> On Tue, Oct 27, 2020 at 12:47:34AM -0500, Mike Christie wrote:
>> On 10/25/20 10:51 PM, Jason Wang wrote:
>>>
>>> On 2020/10/22 8:34 AM, Mike Christie wrote:
>>>> Each vhost-scsi device will need a evt and ctl queue, but the number
>>>> of IO queues depends on whatever the user has configured in userspace.
>>>> This patch has vhost-scsi create the evt, ctl and one IO vq at device
>>>> open time. We then create the other IO vqs when userspace starts to
>>>> set them up. We still waste some mem on the vq and scsi vq structs,
>>>> but we don't waste mem on iovec related arrays and for later patches
>>>> we know which queues are used by the dev->nvqs value.
>>>>
>>>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>>>> ---
>>>>    drivers/vhost/scsi.c | 19 +++++++++++++++----
>>>>    1 file changed, 15 insertions(+), 4 deletions(-)
>>>
>>>
>>> Not familiar with SCSI. But I wonder if it could behave like vhost-net.
>>>
>>> E.g userspace should known the number of virtqueues so it can just open
>>> and close multiple vhost-scsi file descriptors.
>>>
>>
>> One hiccup I'm hitting is that we might end up creating about 3x more vqs
>> than we need. The problem is that for scsi each vhost device has:
>>
>> vq=0: special control vq
>> vq=1: event vq
>> vq=2 and above: SCSI CMD/IO vqs. We want to create N of these.
>>
>> Today we do:
>>
>> Userspace does open(/dev/vhost-scsi)
>>          vhost_dev_init(create 128 vqs and then later we setup and use N of
>> them);
>>
>> Qemu does ioctl(VHOST_SET_OWNER)
>>          vhost_dev_set_owner()
>>
>> For N vqs userspace does:
>>          // virtqueue setup related ioctls
>>
>> Qemu does ioctl(VHOST_SCSI_SET_ENDPOINT)
>>          - match LIO/target port to vhost_dev
>>
>>
>> So we could change that to:
>>
>> For N IO vqs userspace does
>>          open(/dev/vhost-scsi)
>>                  vhost_dev_init(create IO, evt, and ctl);
>>
>> for N IO vqs Qemu does:
>>          ioctl(VHOST_SET_OWNER)
>>                  vhost_dev_set_owner()
>>
>> for N IO vqs Qemu does:
>>          // virtqueue setup related ioctls
>>
>> for N IO vqs Qemu does:
>>          ioctl(VHOST_SCSI_SET_ENDPOINT)
>>                  - match LIO/target port to vhost_dev and assemble the
>> multiple vhost_dev device.
>>
>> The problem is that we have to setup some of the evt/ctl specific parts at
>> open() time when vhost_dev_init does vhost_poll_init for example.
>>
>> - At open time, we don't know if this vhost_dev is going to be part of a
>> multiple vhost_device device or a single one so we need to create at least 3
>> of them
>> - If it is a multiple device we don't know if it's the first device being
>> created for the device or the N'th, so we don't know if the dev's vqs will
>> be used for IO or ctls/evts, so we have to create all 3.
>>
>> When we get the first VHOST_SCSI_SET_ENDPOINT call for a new style multiple
>> vhost_dev device, we can use that dev's evt/ctl vqs for events/controls
>> requests. When we get the other VHOST_SCSI_SET_ENDPOINT calls for the
>> multiple vhost_dev device then those dev's evt/ctl vqs will be ignored and
>> we will only use their IO vqs. So we end up with a lot of extra vqs.
> 
> The issue Jason's hinting at is how can admins control the amount
> of resources a given qemu instance can consume?
> After all vhost vqs all live in host kernel memory ...
> Limiting # of open fds would be one way to do that ...

If I understand you, then the answer is that vhost scsi already has a 
num_queues setting that controls the number of vqs. The upstream 
kernel's vhost scsi driver and qemu's vhost scsi code support multiqueue 
today. To enable it, the admin sets the qemu property num_queues 
(qemu/hw/scsi/vhost-scsi.c). In the current code, we are already doing 
what I described in "Today we do:".

In the second chunk of patches (patches 13 - 16) I'm just trying to make 
it so vhost-scsi gets a thread per IO vq.

Patch 17 then fixes up the cgroup support so the user can control the IO 
vqs with cgroups. Today for vhost scsi the vhost work thread takes the 
request from the vq, then passes it to a workqueue_struct workqueue to 
submit it to the block layer. So today we are putting the vhost work 
thread in the cgroup, but it's a different thread interacting with the 
block layer, and the cgroup settings/limits are not applying.


> 
> The need to share event/control vqs between devices is a problem though,
> and sending lots of ioctls on things like reset is also not that elegant.
> Jason, did you have a good solution in mind?
> 
>> One other question/issue I have is that qemu can open the /dev/vhost-scsi
>> device or it allows tools like libvirtd to open the device and pass in the
>> fd to use. For the latter case, would we continue to have those tools pass
>> in the leading fd, then have qemu do the other num_queues - 1
>> open(/dev/vhost-scsi) calls? Or do these apps that pass in the fd need to
>> know about all of the fds for some management reason?
> 
> They know about all the fds, for resource control and privilege
> separation reasons.
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 07/17] vhost scsi: support delayed IO vq creation
  2020-10-30 16:30         ` Mike Christie
@ 2020-10-30 17:26           ` Mike Christie
  0 siblings, 0 replies; 43+ messages in thread
From: Mike Christie @ 2020-10-30 17:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, martin.petersen, linux-scsi, target-devel, pbonzini,
	stefanha, virtualization

On 10/30/20 11:30 AM, Mike Christie wrote:
> On 10/30/20 3:47 AM, Michael S. Tsirkin wrote:
>> On Tue, Oct 27, 2020 at 12:47:34AM -0500, Mike Christie wrote:
>>> On 10/25/20 10:51 PM, Jason Wang wrote:
>>>>
>>>> On 2020/10/22 上午8:34, Mike Christie wrote:
>>>>> Each vhost-scsi device will need a evt and ctl queue, but the number
>>>>> of IO queues depends on whatever the user has configured in userspace.
>>>>> This patch has vhost-scsi create the evt, ctl and one IO vq at device
>>>>> open time. We then create the other IO vqs when userspace starts to
>>>>> set them up. We still waste some mem on the vq and scsi vq structs,
>>>>> but we don't waste mem on iovec related arrays and for later patches
>>>>> we know which queues are used by the dev->nvqs value.
>>>>>
>>>>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>>>>> ---
>>>>>    drivers/vhost/scsi.c | 19 +++++++++++++++----
>>>>>    1 file changed, 15 insertions(+), 4 deletions(-)
>>>>
>>>>
>>>> Not familiar with SCSI. But I wonder if it could behave like vhost-net.
>>>>
>>>> E.g. userspace should know the number of virtqueues so it can just open
>>>> and close multiple vhost-scsi file descriptors.
>>>>
>>>
>>> One hiccup I'm hitting is that we might end up creating about 3x more vqs
>>> than we need. The problem is that for scsi each vhost device has:
>>>
>>> vq=0: special control vq
>>> vq=1: event vq
>>> vq=2 and above: SCSI CMD/IO vqs. We want to create N of these.
>>>
>>> Today we do:
>>>
>>> Userspace does open(/dev/vhost-scsi)
>>>          vhost_dev_init(create 128 vqs and then later we setup and use N of
>>> them);
>>>
>>> Qemu does ioctl(VHOST_SET_OWNER)
>>>          vhost_dev_set_owner()
>>>
>>> For N vqs userspace does:
>>>          // virtqueue setup related ioctls
>>>
>>> Qemu does ioctl(VHOST_SCSI_SET_ENDPOINT)
>>>          - match LIO/target port to vhost_dev
>>>
>>>
>>> So we could change that to:
>>>
>>> For N IO vqs userspace does
>>>          open(/dev/vhost-scsi)
>>>                  vhost_dev_init(create IO, evt, and ctl);
>>>
>>> for N IO vqs Qemu does:
>>>          ioctl(VHOST_SET_OWNER)
>>>                  vhost_dev_set_owner()
>>>
>>> for N IO vqs Qemu does:
>>>          // virtqueue setup related ioctls
>>>
>>> for N IO vqs Qemu does:
>>>          ioctl(VHOST_SCSI_SET_ENDPOINT)
>>>                  - match LIO/target port to vhost_dev and assemble the
>>> multiple vhost_dev device.
>>>
>>> The problem is that we have to setup some of the evt/ctl specific parts at
>>> open() time when vhost_dev_init does vhost_poll_init for example.
>>>
>>> - At open time, we don't know if this vhost_dev is going to be part of a
>>> multiple vhost_device device or a single one so we need to create at least 3
>>> of them
>>> - If it is a multiple device we don't know if its the first device being
>>> created for the device or the N'th, so we don't know if the dev's vqs will
>>> be used for IO or ctls/evts, so we have to create all 3.
>>>
>>> When we get the first VHOST_SCSI_SET_ENDPOINT call for a new style multiple
>>> vhost_dev device, we can use that dev's evt/ctl vqs for events/controls
>>> requests. When we get the other VHOST_SCSI_SET_ENDPOINT calls for the
>>> multiple vhost_dev device then those dev's evt/ctl vqs will be ignored and
>>> we will only use their IO vqs. So we end up with a lot of extra vqs.
>>
>> The issue Jason's hinting at is how can admins control the amount
>> of resources a given qemu instance can consume?
>> After all vhost vqs all live in host kernel memory ...
>> Limiting # of open fds would be one way to do that ...
> 
> If I understand you, then the answer is vhost scsi has a setting 
> num_queues already that controls the number of vqs. The upstream 
> kernel's vhost scsi driver and qemu's vhost scsi code support multiqueue 
> today. To enable it, the admin sets the qemu property num_queues 
> (qemu/hw/scsi/vhost-scsi.c). In the current code, we are already doing 
> what I described in "Today we do:".
> 
> In the second chunk of patches (patches 13 - 16) I'm just trying to make 
> it so vhost-scsi gets a thread per IO vq.
> 
> Patch 17 then fixes up the cgroup support so the user can control the IO 
> vqs with cgroups. Today for vhost scsi the vhost work thread takes the 
> request from the vq, then passes it to a workqueue_struct workqueue to 
> submit it to the block layer. So today we are putting the vhost work 
> thread in the cgroup, but it's a different thread interacting with the 
> block layer, and the cgroup settings/limits are not applying.
> 

Ah, I think I did misunderstand you. Today, you can set the fd limit to N 
and that would limit the total number of devices. But right now the user 
can set each of those N devices to have anywhere from num_queues=1 to 
128, which could be a wide range of resource use. You want something 
finer grained, right?
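
To put rough numbers on that spread, here is a small sketch. It assumes an
fd limit of N maps to N devices, that each device creates at least 3 vqs
(ctl, evt and one IO vq) and at most the 128 vqs mentioned in this thread.
The helper names are made up for illustration and are not kernel or qemu
interfaces.

```c
/* Illustrative arithmetic only; these names are hypothetical. */
#define MIN_VQS_PER_DEV 3	/* ctl, evt and one IO vq */
#define MAX_VQS_PER_DEV 128	/* the 128 vq ceiling discussed in this thread */

/* Fewest vqs that nr_devs open devices can add up to. */
static unsigned int min_total_vqs(unsigned int nr_devs)
{
	return nr_devs * MIN_VQS_PER_DEV;
}

/* Most vqs that nr_devs open devices can add up to. */
static unsigned int max_total_vqs(unsigned int nr_devs)
{
	return nr_devs * MAX_VQS_PER_DEV;
}
```

With an fd limit of 4, min_total_vqs(4) is 12 and max_total_vqs(4) is 512,
so the fd limit alone leaves a wide gap in how much host memory is used.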

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 07/17] vhost scsi: support delayed IO vq creation
  2020-10-30  8:47       ` Michael S. Tsirkin
  2020-10-30 16:30         ` Mike Christie
@ 2020-11-01 22:06         ` Mike Christie
  2020-11-02  6:36         ` Jason Wang
  2 siblings, 0 replies; 43+ messages in thread
From: Mike Christie @ 2020-11-01 22:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, martin.petersen, linux-scsi, target-devel, pbonzini,
	stefanha, virtualization

On 10/30/20 3:47 AM, Michael S. Tsirkin wrote:
> On Tue, Oct 27, 2020 at 12:47:34AM -0500, Mike Christie wrote:
>> On 10/25/20 10:51 PM, Jason Wang wrote:
>>>
>>> On 2020/10/22 上午8:34, Mike Christie wrote:
>>>> Each vhost-scsi device will need a evt and ctl queue, but the number
>>>> of IO queues depends on whatever the user has configured in userspace.
>>>> This patch has vhost-scsi create the evt, ctl and one IO vq at device
>>>> open time. We then create the other IO vqs when userspace starts to
>>>> set them up. We still waste some mem on the vq and scsi vq structs,
>>>> but we don't waste mem on iovec related arrays and for later patches
>>>> we know which queues are used by the dev->nvqs value.
>>>>
>>>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>>>> ---
>>>>   drivers/vhost/scsi.c | 19 +++++++++++++++----
>>>>   1 file changed, 15 insertions(+), 4 deletions(-)
>>>
>>>
>>> Not familiar with SCSI. But I wonder if it could behave like vhost-net.
>>>
>>> E.g. userspace should know the number of virtqueues so it can just open
>>> and close multiple vhost-scsi file descriptors.
>>>
>>
>> One hiccup I'm hitting is that we might end up creating about 3x more vqs
>> than we need. The problem is that for scsi each vhost device has:
>>
>> vq=0: special control vq
>> vq=1: event vq
>> vq=2 and above: SCSI CMD/IO vqs. We want to create N of these.
>>
>> Today we do:
>>
>> Userspace does open(/dev/vhost-scsi)
>>         vhost_dev_init(create 128 vqs and then later we setup and use N of
>> them);
>>
>> Qemu does ioctl(VHOST_SET_OWNER)
>>         vhost_dev_set_owner()
>>
>> For N vqs userspace does:
>>         // virtqueue setup related ioctls
>>
>> Qemu does ioctl(VHOST_SCSI_SET_ENDPOINT)
>>         - match LIO/target port to vhost_dev
>>
>>
>> So we could change that to:
>>
>> For N IO vqs userspace does
>>         open(/dev/vhost-scsi)
>>                 vhost_dev_init(create IO, evt, and ctl);
>>
>> for N IO vqs Qemu does:
>>         ioctl(VHOST_SET_OWNER)
>>                 vhost_dev_set_owner()
>>
>> for N IO vqs Qemu does:
>>         // virtqueue setup related ioctls
>>
>> for N IO vqs Qemu does:
>>         ioctl(VHOST_SCSI_SET_ENDPOINT)
>>                 - match LIO/target port to vhost_dev and assemble the
>> multiple vhost_dev device.
>>
>> The problem is that we have to setup some of the evt/ctl specific parts at
>> open() time when vhost_dev_init does vhost_poll_init for example.
>>
>> - At open time, we don't know if this vhost_dev is going to be part of a
>> multiple vhost_device device or a single one so we need to create at least 3
>> of them
>> - If it is a multiple device we don't know if its the first device being
>> created for the device or the N'th, so we don't know if the dev's vqs will
>> be used for IO or ctls/evts, so we have to create all 3.
>>
>> When we get the first VHOST_SCSI_SET_ENDPOINT call for a new style multiple
>> vhost_dev device, we can use that dev's evt/ctl vqs for events/controls
>> requests. When we get the other VHOST_SCSI_SET_ENDPOINT calls for the
>> multiple vhost_dev device then those dev's evt/ctl vqs will be ignored and
>> we will only use their IO vqs. So we end up with a lot of extra vqs.
> 
> The issue Jason's hinting at is how can admins control the amount
> of resources a given qemu instance can consume?
> After all vhost vqs all live in host kernel memory ...
> Limiting # of open fds would be one way to do that ...
> 
> The need to share event/control vqs between devices is a problem though,
> and sending lots of ioctls on things like reset is also not that elegant.
> Jason, did you have a good solution in mind?
> 

Hey, so here is a prototype/outline of how we could add support for the
multiple device approach and keep compat support for the existing single
device multiple vq code. And, for the new style multiple dev approach we
keep the vq allocation to a minimum.

This patch was made over patches 0 - 11 in this patchset, but do not waste
your time reviewing this patch line by line. It's still really broken :) It
should give you an idea of what I was saying above about the evt/ctl queue
issue and give you an idea of how ugly/nice it is vs the patches 12 - 16
in this set.

--------------

In this patch we add a new struct vhost_scsi_md that represents multiple
vhost_scsi devices that are being combined to make one device.

Userspace signals the kernel it supports the new md approach by writing to
a new mod param multiple_vhost_devs_per_nexus (stored in the variable
vhost_scsi_multi_dev_per_nexus). If that is set then at open() scsi.c will
do vhost_dev_init with 1 vq. The vq's handle_kick function is a dummy no op
(vhost_scsi_no_op_kick), because at this time we don't know if this device's
vq is going to be an evt, ctl or IO vq.

Userspace would then do open() N times, once for each vq it wanted to create.
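
As a rough illustration of the counts involved, the sketch below compares
the "every open() creates ctl, evt and one IO vq" layout discussed earlier
with this one-vq-per-open() prototype. It assumes one ctl and one evt vq per
assembled device and at least one IO vq. The function names are invented for
this example only.

```c
/* Illustrative arithmetic only; these names are hypothetical. */
#define FIXED_VQS 2	/* one ctl vq and one evt vq per assembled device */

/* vqs the guest actually uses for nr_io_vqs IO queues (nr_io_vqs >= 1). */
static unsigned int vqs_needed(unsigned int nr_io_vqs)
{
	return nr_io_vqs + FIXED_VQS;
}

/* Layout where each open() creates ctl, evt and one IO vq. */
static unsigned int three_per_open_vqs_created(unsigned int nr_io_vqs)
{
	return 3 * nr_io_vqs;
}

/* Extra vqs that layout creates but never uses. */
static unsigned int three_per_open_vqs_unused(unsigned int nr_io_vqs)
{
	return three_per_open_vqs_created(nr_io_vqs) - vqs_needed(nr_io_vqs);
}

/* Prototype: one vq per open(), so opens equal the vqs needed. */
static unsigned int md_opens_needed(unsigned int nr_io_vqs)
{
	return vqs_needed(nr_io_vqs);
}
```

For 8 IO queues the three-per-open layout creates 24 vqs and leaves 14
unused, while the prototype needs 10 open() calls and creates 10 vqs.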

Qemu then does its dev and ring/vq setup.

Lastly qemu does the ioctl that calls into vhost_scsi_set_endpoint. For this
function scsi has to figure out if it's a md device or old style one. If a md
device then we figure out which vq this will be in the new function
vhost_scsi_md_add_vs().

Here is where it gets a little gross. Because we don't know what type of
vq it is at vhost_dev_init/open time, I've added a new function
vhost_vq_reset_kick_handler which just resets the handle_kick callout
that we had setup in vhost_dev_init. We call this in vhost_scsi_md_add_vs
to set the correct handle_kick function.
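
A stripped-down userspace model of that idea, in case it helps while reading
the diff: the vq starts with a placeholder handler at open time and gets its
real handler when the device joins the md group, with join order picking the
role (first is ctl, second is evt, the rest are IO). All names below are
invented for the model and are not the kernel's.

```c
/* Userspace model only; none of these names exist in the kernel. */
enum vq_role {
	VQ_ROLE_NONE = -1,	/* role not known yet (just opened) */
	VQ_ROLE_CTL = 0,	/* first device to join */
	VQ_ROLE_EVT = 1,	/* second device to join */
	VQ_ROLE_IO = 2,		/* every later device */
};

typedef void (*kick_fn)(int *seen_role);

struct model_vq {
	kick_fn handle_kick;
};

struct model_md {
	int vs_cnt;	/* devices joined so far */
};

/* Each handler records which role ran, so a caller can observe the swap. */
static void noop_kick(int *seen_role) { *seen_role = VQ_ROLE_NONE; }
static void ctl_kick(int *seen_role) { *seen_role = VQ_ROLE_CTL; }
static void evt_kick(int *seen_role) { *seen_role = VQ_ROLE_EVT; }
static void io_kick(int *seen_role) { *seen_role = VQ_ROLE_IO; }

/* open() time: the role is unknown, so install the placeholder. */
static void model_open(struct model_vq *vq)
{
	vq->handle_kick = noop_kick;
}

/* SET_ENDPOINT time: join order decides the role and the real handler. */
static enum vq_role model_md_add(struct model_md *md, struct model_vq *vq)
{
	enum vq_role role;

	switch (md->vs_cnt) {
	case VQ_ROLE_CTL:
		role = VQ_ROLE_CTL;
		vq->handle_kick = ctl_kick;
		break;
	case VQ_ROLE_EVT:
		role = VQ_ROLE_EVT;
		vq->handle_kick = evt_kick;
		break;
	default:
		role = VQ_ROLE_IO;
		vq->handle_kick = io_kick;
		break;
	}

	md->vs_cnt++;
	return role;
}
```

The model leaves out locking, the wwpn match and the poll/work re-init that
the real code needs; it only shows the handler swap.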



diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 774bffe..f18f7b1 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -62,6 +62,12 @@
  */
 #define VHOST_SCSI_WEIGHT 256
 
+static bool vhost_scsi_multi_dev_per_nexus;
+module_param_named(multiple_vhost_devs_per_nexus,
+		   vhost_scsi_multi_dev_per_nexus, bool, 0644);
+MODULE_PARM_DESC(multiple_vhost_devs_per_nexus,
+		 "Turn on support for combining multiple vhost-scsi device instances into a single I_T Nexus. Set to true to turn on. Default is off.");
+
 struct vhost_scsi_inflight {
 	/* Wait for the flush operation to finish */
 	struct completion comp;
@@ -127,7 +133,7 @@ struct vhost_scsi_tpg {
 	int tv_tpg_vhost_count;
 	/* Used for enabling T10-PI with legacy devices */
 	int tv_fabric_prot_type;
-	/* list for vhost_scsi_list */
+	/* list for vhost_scsi_tpg_list */
 	struct list_head tv_tpg_list;
 	/* Used to protect access for tpg_nexus */
 	struct mutex tv_tpg_mutex;
@@ -137,7 +143,7 @@ struct vhost_scsi_tpg {
 	struct vhost_scsi_tport *tport;
 	/* Returned by vhost_scsi_make_tpg() */
 	struct se_portal_group se_tpg;
-	/* Pointer back to vhost_scsi, protected by tv_tpg_mutex */
+	/* Pointer back to vhost_scsi used for events, protected by tv_tpg_mutex */
 	struct vhost_scsi *vhost_scsi;
 	struct list_head tmf_queue;
 };
@@ -194,13 +200,22 @@ struct vhost_scsi_virtqueue {
 	int max_cmds;
 };
 
+struct vhost_scsi_md {
+	struct list_head vhost_scsi_md_list_entry;
+	struct list_head vhost_scsi_list;
+	int vs_cnt;
+};
+
 struct vhost_scsi {
 	/* Protected by vhost_scsi->dev.mutex */
 	struct vhost_scsi_tpg **vs_tpg;
+	struct list_head vhost_scsi_list_entry;
 	char vs_vhost_wwpn[TRANSPORT_IQN_LEN];
+	struct vhost_scsi_md *md;
+	bool md_enabled;
 
 	struct vhost_dev dev;
-	struct vhost_scsi_virtqueue vqs[VHOST_SCSI_MAX_VQ];
+	struct vhost_scsi_virtqueue *vqs;
 
 	struct vhost_work vs_completion_work; /* cmd completion work item */
 	struct llist_head vs_completion_list; /* cmd completion queue */
@@ -242,8 +257,11 @@ struct vhost_scsi_ctx {
 static struct workqueue_struct *vhost_scsi_workqueue;
 
 /* Global spinlock to protect vhost_scsi TPG list for vhost IOCTL access */
-static DEFINE_MUTEX(vhost_scsi_mutex);
-static LIST_HEAD(vhost_scsi_list);
+static DEFINE_MUTEX(vhost_scsi_tpg_mutex);
+static LIST_HEAD(vhost_scsi_tpg_list);
+
+/* List of multiple device (md) devs accessed under the vhost_scsi_tpg_mutex */
+static LIST_HEAD(vhost_scsi_md_list);
 
 static void vhost_scsi_done_inflight(struct kref *kref)
 {
@@ -260,7 +278,7 @@ static void vhost_scsi_init_inflight(struct vhost_scsi *vs,
 	struct vhost_virtqueue *vq;
 	int idx, i;
 
-	for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
+	for (i = 0; i < vs->dev.max_nvqs; i++) {
 		vq = &vs->vqs[i].vq;
 
 		mutex_lock(&vq->mutex);
@@ -588,8 +606,7 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
 	}
 
 	vq = -1;
-	while ((vq = find_next_bit(signal, VHOST_SCSI_MAX_VQ, vq + 1))
-		< VHOST_SCSI_MAX_VQ)
+	while ((vq = find_next_bit(signal, vs->dev.nvqs, vq + 1)) < vs->dev.nvqs)
 		vhost_signal(&vs->dev, &vs->vqs[vq].vq);
 }
 
@@ -1443,6 +1460,11 @@ static void vhost_scsi_handle_kick(struct vhost_work *work)
 	vhost_scsi_handle_vq(vs, vq);
 }
 
+static void vhost_scsi_no_op_kick(struct vhost_work *work)
+{
+	pr_err("Invalid no op kick call\n");
+}
+
 /* Callers must hold dev mutex */
 static void vhost_scsi_flush(struct vhost_scsi *vs)
 {
@@ -1457,14 +1479,14 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 	 * indicate the start of the flush operation so that it will reach 0
 	 * when all the reqs are finished.
 	 */
-	for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
+	for (i = 0; i < vs->dev.nvqs; i++)
 		kref_put(&old_inflight[i]->kref, vhost_scsi_done_inflight);
 
 	/* Flush both the vhost poll and vhost work */
 	vhost_work_dev_flush(&vs->dev);
 
 	/* Wait for all reqs issued before the flush to be finished */
-	for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
+	for (i = 0; i < vs->dev.nvqs; i++)
 		wait_for_completion(&old_inflight[i]->comp);
 }
 
@@ -1545,12 +1567,87 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
 	return -ENOMEM;
 }
 
+static void vhost_scsi_md_del_vs(struct vhost_scsi *vs)
+{
+	struct vhost_scsi_md *md;
+
+	if (!vs->md_enabled)
+		return;
+
+	if (list_empty(&vs->vhost_scsi_list_entry))
+		return;
+
+	md = vs->md;
+	vs->md = NULL;
+	md->vs_cnt--;
+	list_del_init(&vs->vhost_scsi_list_entry);
+
+	if (!md->vs_cnt) {
+		list_del(&md->vhost_scsi_md_list_entry);
+		kfree(md);
+	}
+}
+
+static int vhost_scsi_md_add_vs(struct vhost_scsi *vs,
+				struct vhost_scsi_target *tgt)
+{
+	struct vhost_scsi *lead_vs;
+	struct vhost_scsi_md *md;
+
+	if (!vs->md_enabled)
+		return 0;
+
+	if (!list_empty(&vs->vhost_scsi_list_entry))
+		return 0;
+
+	list_for_each_entry(md, &vhost_scsi_md_list, vhost_scsi_md_list_entry) {
+		lead_vs = list_first_entry(&md->vhost_scsi_list,
+					   struct vhost_scsi,
+					   vhost_scsi_list_entry);
+		if (memcmp(lead_vs->vs_vhost_wwpn, tgt->vhost_wwpn,
+			   sizeof(tgt->vhost_wwpn)))
+			continue;
+
+		goto add_vs;
+	}
+
+	md = kzalloc(sizeof(*md), GFP_KERNEL);
+	if (!md)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&md->vhost_scsi_list);
+	INIT_LIST_HEAD(&md->vhost_scsi_md_list_entry);
+
+	list_add_tail(&md->vhost_scsi_md_list_entry, &vhost_scsi_md_list);
+
+add_vs:
+	switch (md->vs_cnt) {
+	case VHOST_SCSI_VQ_CTL:
+		vhost_vq_reset_kick_handler(&vs->vqs[0].vq,
+					    vhost_scsi_ctl_handle_kick);
+		break;
+	case VHOST_SCSI_VQ_EVT:
+		vhost_vq_reset_kick_handler(&vs->vqs[0].vq,
+					    vhost_scsi_evt_handle_kick);
+		break;
+	default:
+		vhost_vq_reset_kick_handler(&vs->vqs[0].vq,
+					    vhost_scsi_handle_kick);
+		break;
+	}
+
+	vs->md = md;
+	md->vs_cnt++;
+	list_add_tail(&vs->vhost_scsi_list_entry, &md->vhost_scsi_list);
+	return 0;
+}
+
 /*
  * Called from vhost_scsi_ioctl() context to walk the list of available
  * vhost_scsi_tpg with an active struct vhost_scsi_nexus
  *
  *  The lock nesting rule is:
- *    vhost_scsi_mutex -> vs->dev.mutex -> tpg->tv_tpg_mutex -> vq->mutex
+ *    vhost_scsi_tpg_mutex -> vs->dev.mutex -> tpg->tv_tpg_mutex -> vq->mutex
  */
 static int
 vhost_scsi_set_endpoint(struct vhost_scsi *vs,
@@ -1564,7 +1661,7 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
 	int index, ret, i, len;
 	bool match = false;
 
-	mutex_lock(&vhost_scsi_mutex);
+	mutex_lock(&vhost_scsi_tpg_mutex);
 	mutex_lock(&vs->dev.mutex);
 
 	/* Verify that ring has been setup correctly. */
@@ -1585,13 +1682,14 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
 	if (vs->vs_tpg)
 		memcpy(vs_tpg, vs->vs_tpg, len);
 
-	list_for_each_entry(tpg, &vhost_scsi_list, tv_tpg_list) {
+	list_for_each_entry(tpg, &vhost_scsi_tpg_list, tv_tpg_list) {
 		mutex_lock(&tpg->tv_tpg_mutex);
 		if (!tpg->tpg_nexus) {
 			mutex_unlock(&tpg->tv_tpg_mutex);
 			continue;
 		}
-		if (tpg->tv_tpg_vhost_count != 0) {
+		if (!vhost_scsi_multi_dev_per_nexus &&
+		    tpg->tv_tpg_vhost_count != 0) {
 			mutex_unlock(&tpg->tv_tpg_mutex);
 			continue;
 		}
@@ -1616,8 +1714,19 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
 				mutex_unlock(&tpg->tv_tpg_mutex);
 				goto undepend;
 			}
+
+			ret = vhost_scsi_md_add_vs(vs, t);
+			if (ret)
+				goto undepend;
+
+			/*
+			 * In md mode the first vs added will be used for the
+			 * event queue. In non-md mode we only have the 1 vs.
+			 */
+			if (!tpg->vhost_scsi)
+				tpg->vhost_scsi = vs;
+
 			tpg->tv_tpg_vhost_count++;
-			tpg->vhost_scsi = vs;
 			vs_tpg[tpg->tport_tpgt] = tpg;
 			match = true;
 		}
@@ -1628,7 +1737,7 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
 		memcpy(vs->vs_vhost_wwpn, t->vhost_wwpn,
 		       sizeof(vs->vs_vhost_wwpn));
 
-		for (i = VHOST_SCSI_VQ_IO; i < VHOST_SCSI_MAX_VQ; i++) {
+		for (i = VHOST_SCSI_VQ_IO; i < vs->dev.nvqs; i++) {
 			vq = &vs->vqs[i].vq;
 			if (!vq->initialized)
 				continue;
@@ -1637,7 +1746,7 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
 				goto destroy_vq_cmds;
 		}
 
-		for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
+		for (i = 0; i < vs->dev.nvqs; i++) {
 			vq = &vs->vqs[i].vq;
 			if (!vq->initialized)
 				continue;
@@ -1670,6 +1779,7 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
 	for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) {
 		tpg = vs_tpg[i];
 		if (tpg) {
+			vhost_scsi_md_del_vs(vs);
 			tpg->tv_tpg_vhost_count--;
 			target_undepend_item(&tpg->se_tpg.tpg_group.cg_item);
 		}
@@ -1677,7 +1787,7 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
 	kfree(vs_tpg);
 out:
 	mutex_unlock(&vs->dev.mutex);
-	mutex_unlock(&vhost_scsi_mutex);
+	mutex_unlock(&vhost_scsi_tpg_mutex);
 	return ret;
 }
 
@@ -1693,7 +1803,7 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
 	int index, ret, i;
 	u8 target;
 
-	mutex_lock(&vhost_scsi_mutex);
+	mutex_lock(&vhost_scsi_tpg_mutex);
 	mutex_lock(&vs->dev.mutex);
 	/* Verify that ring has been setup correctly. */
 	for (index = 0; index < vs->dev.max_nvqs; ++index) {
@@ -1732,6 +1842,7 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
 		tpg->tv_tpg_vhost_count--;
 		tpg->vhost_scsi = NULL;
 		vs->vs_tpg[target] = NULL;
+		vhost_scsi_md_del_vs(vs);
 		match = true;
 		mutex_unlock(&tpg->tv_tpg_mutex);
 		/*
@@ -1742,7 +1853,7 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
 		target_undepend_item(&se_tpg->tpg_group.cg_item);
 	}
 	if (match) {
-		for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
+		for (i = 0; i < vs->dev.nvqs; i++) {
 			vq = &vs->vqs[i].vq;
 			if (!vq->initialized)
 				continue;
@@ -1767,14 +1878,14 @@ static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
 	vs->vs_tpg = NULL;
 	WARN_ON(vs->vs_events_nr);
 	mutex_unlock(&vs->dev.mutex);
-	mutex_unlock(&vhost_scsi_mutex);
+	mutex_unlock(&vhost_scsi_tpg_mutex);
 	return 0;
 
 err_tpg:
 	mutex_unlock(&tpg->tv_tpg_mutex);
 err_dev:
 	mutex_unlock(&vs->dev.mutex);
-	mutex_unlock(&vhost_scsi_mutex);
+	mutex_unlock(&vhost_scsi_tpg_mutex);
 	return ret;
 }
 
@@ -1793,7 +1904,7 @@ static int vhost_scsi_set_features(struct vhost_scsi *vs, u64 features)
 		return -EFAULT;
 	}
 
-	for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
+	for (i = 0; i < vs->dev.nvqs; i++) {
 		vq = &vs->vqs[i].vq;
 		mutex_lock(&vq->mutex);
 		vq->acked_features = features;
@@ -1803,11 +1914,48 @@ static int vhost_scsi_set_features(struct vhost_scsi *vs, u64 features)
 	return 0;
 }
 
+static struct vhost_virtqueue **
+vhost_scsi_vqs_init(struct vhost_scsi *vs, int max_nvqs)
+{
+	struct vhost_virtqueue **vqs;
+	int i;
+
+	vs->vqs = kcalloc(max_nvqs, sizeof(*vs->vqs), GFP_KERNEL);
+	if (!vs->vqs)
+		return NULL;
+
+	vqs = kcalloc(max_nvqs, sizeof(*vqs), GFP_KERNEL);
+	if (!vqs)
+		goto err_vqs;
+
+	if (!vs->md_enabled) {
+		vqs[VHOST_SCSI_VQ_CTL] = &vs->vqs[VHOST_SCSI_VQ_CTL].vq;
+		vqs[VHOST_SCSI_VQ_EVT] = &vs->vqs[VHOST_SCSI_VQ_EVT].vq;
+		vs->vqs[VHOST_SCSI_VQ_CTL].vq.handle_kick =
+						vhost_scsi_ctl_handle_kick;
+		vs->vqs[VHOST_SCSI_VQ_EVT].vq.handle_kick =
+						vhost_scsi_evt_handle_kick;
+		for (i = VHOST_SCSI_VQ_IO; i < max_nvqs; i++) {
+			vqs[i] = &vs->vqs[i].vq;
+			vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick;
+		}
+	} else {
+		vqs[0] = &vs->vqs[0].vq;
+		vs->vqs[0].vq.handle_kick = vhost_scsi_no_op_kick;
+	}
+
+	return vqs;
+
+err_vqs:
+	kfree(vs->vqs);
+	return NULL;
+}
+
 static int vhost_scsi_open(struct inode *inode, struct file *f)
 {
 	struct vhost_scsi *vs;
 	struct vhost_virtqueue **vqs;
-	int r = -ENOMEM, i;
+	int r = -ENOMEM, nvqs, max_nvqs;
 
 	vs = kzalloc(sizeof(*vs), GFP_KERNEL | __GFP_NOWARN | __GFP_RETRY_MAYFAIL);
 	if (!vs) {
@@ -1815,10 +1963,20 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 		if (!vs)
 			goto err_vs;
 	}
+	INIT_LIST_HEAD(&vs->vhost_scsi_list_entry);
+	vs->md_enabled = vhost_scsi_multi_dev_per_nexus;
 
-	vqs = kmalloc_array(VHOST_SCSI_MAX_VQ, sizeof(*vqs), GFP_KERNEL);
-	if (!vqs)
-		goto err_vqs;
+	if (vs->md_enabled) {
+		max_nvqs = 1;
+		nvqs = 1;
+	} else {
+		/*
+		 * We will always need the ctl, evt and at least 1 IO vq.
+		 * Create more IO vqs if userspace requests them.
+		 */
+		max_nvqs = VHOST_SCSI_MAX_VQ;
+		nvqs = 3;
+	}
 
 	vhost_work_init(&vs->vs_completion_work, vhost_scsi_complete_cmd_work);
 	vhost_work_init(&vs->vs_event_work, vhost_scsi_evt_work);
@@ -1826,20 +1984,11 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 	vs->vs_events_nr = 0;
 	vs->vs_events_missed = false;
 
-	vqs[VHOST_SCSI_VQ_CTL] = &vs->vqs[VHOST_SCSI_VQ_CTL].vq;
-	vqs[VHOST_SCSI_VQ_EVT] = &vs->vqs[VHOST_SCSI_VQ_EVT].vq;
-	vs->vqs[VHOST_SCSI_VQ_CTL].vq.handle_kick = vhost_scsi_ctl_handle_kick;
-	vs->vqs[VHOST_SCSI_VQ_EVT].vq.handle_kick = vhost_scsi_evt_handle_kick;
-	for (i = VHOST_SCSI_VQ_IO; i < VHOST_SCSI_MAX_VQ; i++) {
-		vqs[i] = &vs->vqs[i].vq;
-		vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick;
-	}
+	vqs = vhost_scsi_vqs_init(vs, max_nvqs);
+	if (!vqs)
+		goto err_vqs_init;
 
-	/*
-	 * We will always need the ctl, evt and at least 1 IO vq. Create more
-	 * IO vqs if userspace requests them.
-	 */
-	r = vhost_dev_init(&vs->dev, vqs, 3, VHOST_SCSI_MAX_VQ, UIO_MAXIOV,
+	r = vhost_dev_init(&vs->dev, vqs, nvqs, max_nvqs, UIO_MAXIOV,
 			   VHOST_SCSI_WEIGHT, 0, true, NULL);
 	if (r)
 		goto err_dev_init;
@@ -1851,7 +2000,8 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 
 err_dev_init:
 	kfree(vqs);
-err_vqs:
+	kfree(vs->vqs);
+err_vqs_init:
 	kvfree(vs);
 err_vs:
 	return r;
@@ -1871,6 +2021,7 @@ static int vhost_scsi_release(struct inode *inode, struct file *f)
 	/* Jobs can re-queue themselves in evt kick handler. Do extra flush. */
 	vhost_scsi_flush(vs);
 	kfree(vs->dev.vqs);
+	kfree(vs->vqs);
 	kvfree(vs);
 	return 0;
 }
@@ -2035,7 +2186,7 @@ static int vhost_scsi_port_link(struct se_portal_group *se_tpg,
 	INIT_LIST_HEAD(&tmf->queue_entry);
 	vhost_work_init(&tmf->vwork, vhost_scsi_tmf_resp_work);
 
-	mutex_lock(&vhost_scsi_mutex);
+	mutex_lock(&vhost_scsi_tpg_mutex);
 
 	mutex_lock(&tpg->tv_tpg_mutex);
 	tpg->tv_tpg_port_count++;
@@ -2044,7 +2195,7 @@ static int vhost_scsi_port_link(struct se_portal_group *se_tpg,
 
 	vhost_scsi_hotplug(tpg, lun);
 
-	mutex_unlock(&vhost_scsi_mutex);
+	mutex_unlock(&vhost_scsi_tpg_mutex);
 
 	return 0;
 }
@@ -2056,7 +2207,7 @@ static void vhost_scsi_port_unlink(struct se_portal_group *se_tpg,
 				struct vhost_scsi_tpg, se_tpg);
 	struct vhost_scsi_tmf *tmf;
 
-	mutex_lock(&vhost_scsi_mutex);
+	mutex_lock(&vhost_scsi_tpg_mutex);
 
 	mutex_lock(&tpg->tv_tpg_mutex);
 	tpg->tv_tpg_port_count--;
@@ -2068,7 +2219,7 @@ static void vhost_scsi_port_unlink(struct se_portal_group *se_tpg,
 
 	vhost_scsi_hotunplug(tpg, lun);
 
-	mutex_unlock(&vhost_scsi_mutex);
+	mutex_unlock(&vhost_scsi_tpg_mutex);
 }
 
 static ssize_t vhost_scsi_tpg_attrib_fabric_prot_type_store(
@@ -2333,9 +2484,9 @@ static ssize_t vhost_scsi_tpg_nexus_store(struct config_item *item,
 		kfree(tpg);
 		return NULL;
 	}
-	mutex_lock(&vhost_scsi_mutex);
-	list_add_tail(&tpg->tv_tpg_list, &vhost_scsi_list);
-	mutex_unlock(&vhost_scsi_mutex);
+	mutex_lock(&vhost_scsi_tpg_mutex);
+	list_add_tail(&tpg->tv_tpg_list, &vhost_scsi_tpg_list);
+	mutex_unlock(&vhost_scsi_tpg_mutex);
 
 	return &tpg->se_tpg;
 }
@@ -2345,9 +2496,9 @@ static void vhost_scsi_drop_tpg(struct se_portal_group *se_tpg)
 	struct vhost_scsi_tpg *tpg = container_of(se_tpg,
 				struct vhost_scsi_tpg, se_tpg);
 
-	mutex_lock(&vhost_scsi_mutex);
+	mutex_lock(&vhost_scsi_tpg_mutex);
 	list_del(&tpg->tv_tpg_list);
-	mutex_unlock(&vhost_scsi_mutex);
+	mutex_unlock(&vhost_scsi_tpg_mutex);
 	/*
 	 * Release the virtual I_T Nexus for this vhost TPG
 	 */
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 2ca2e71..d44351c 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -199,6 +199,25 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
 }
 EXPORT_SYMBOL_GPL(vhost_poll_init);
 
+static void vhost_poll_reinit_work(struct vhost_poll *poll, vhost_work_fn_t fn)
+{
+	vhost_work_init(&poll->work, fn);
+}
+
+/**
+ * vhost_vq_reset_kick_handler: reset the vq's kick handler
+ * @vq: vq to reset
+ * @fn: new kick function
+ *
+ * This must be called before the vq is activated
+ */
+void vhost_vq_reset_kick_handler(struct vhost_virtqueue *vq, vhost_work_fn_t fn)
+{
+	vq->handle_kick = fn;
+	vhost_poll_reinit_work(&vq->poll, fn);
+}
+EXPORT_SYMBOL_GPL(vhost_vq_reset_kick_handler);
+
 /* Start polling a file. We add ourselves to file's wait queue. The caller must
  * keep a reference to a file until after vhost_poll_stop is called. */
 int vhost_poll_start(struct vhost_poll *poll, struct file *file)
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 08c5aef..d748cb7 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -193,6 +193,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *,
 		      struct vhost_log *log, unsigned int *log_num);
 void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
 
+void vhost_vq_reset_kick_handler(struct vhost_virtqueue *vq, vhost_work_fn_t fn);
 int vhost_vq_init_access(struct vhost_virtqueue *);
 int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
 int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads,

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 04/17] vhost: prep vhost_dev_init users to handle failures
  2020-10-22  0:34 ` [PATCH 04/17] vhost: prep vhost_dev_init users to handle failures Mike Christie
  2020-10-22  5:22   ` kernel test robot
  2020-10-23 16:15   ` Mike Christie
@ 2020-11-02  5:57   ` Jason Wang
  2020-11-03 10:04   ` Dan Carpenter
  3 siblings, 0 replies; 43+ messages in thread
From: Jason Wang @ 2020-11-02  5:57 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	pbonzini, stefanha, virtualization


On 2020/10/22 上午8:34, Mike Christie wrote:
> This is just a prep patch to get vhost_dev_init callers ready to handle
> the next patch where the function can fail. In this patch vhost_dev_init
> just returns 0, but I think it's easier to check for goto/error handling
> errors separated from the next patch.
>
> Signed-off-by: Mike Christie<michael.christie@oracle.com>
> ---
>   drivers/vhost/net.c   | 11 +++++++----
>   drivers/vhost/scsi.c  |  7 +++++--
>   drivers/vhost/test.c  |  9 +++++++--
>   drivers/vhost/vdpa.c  |  7 +++++--
>   drivers/vhost/vhost.c | 14 ++++++++------
>   drivers/vhost/vhost.h | 10 +++++-----
>   drivers/vhost/vsock.c |  9 ++++++---
>   7 files changed, 43 insertions(+), 24 deletions(-)


Acked-by: Jason Wang <jasowang@redhat.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 07/17] vhost scsi: support delayed IO vq creation
  2020-10-30  8:47       ` Michael S. Tsirkin
  2020-10-30 16:30         ` Mike Christie
  2020-11-01 22:06         ` Mike Christie
@ 2020-11-02  6:36         ` Jason Wang
  2020-11-02  6:49           ` Jason Wang
  2 siblings, 1 reply; 43+ messages in thread
From: Jason Wang @ 2020-11-02  6:36 UTC (permalink / raw)
  To: Michael S. Tsirkin, Mike Christie
  Cc: martin.petersen, linux-scsi, target-devel, pbonzini, stefanha,
	virtualization


On 2020/10/30 下午4:47, Michael S. Tsirkin wrote:
> On Tue, Oct 27, 2020 at 12:47:34AM -0500, Mike Christie wrote:
>> On 10/25/20 10:51 PM, Jason Wang wrote:
>>> On 2020/10/22 上午8:34, Mike Christie wrote:
>>>> Each vhost-scsi device will need a evt and ctl queue, but the number
>>>> of IO queues depends on whatever the user has configured in userspace.
>>>> This patch has vhost-scsi create the evt, ctl and one IO vq at device
>>>> open time. We then create the other IO vqs when userspace starts to
>>>> set them up. We still waste some mem on the vq and scsi vq structs,
>>>> but we don't waste mem on iovec related arrays and for later patches
>>>> we know which queues are used by the dev->nvqs value.
>>>>
>>>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>>>> ---
>>>>    drivers/vhost/scsi.c | 19 +++++++++++++++----
>>>>    1 file changed, 15 insertions(+), 4 deletions(-)
>>>
>>> Not familiar with SCSI. But I wonder if it could behave like vhost-net.
>>>
>>> E.g. userspace should know the number of virtqueues so it can just open
>>> and close multiple vhost-scsi file descriptors.
>>>
>> One hiccup I'm hitting is that we might end up creating about 3x more vqs
>> than we need. The problem is that for scsi each vhost device has:
>>
>> vq=0: special control vq
>> vq=1: event vq
>> vq=2 and above: SCSI CMD/IO vqs. We want to create N of these.
>>
>> Today we do:
>>
>> Userspace does open(/dev/vhost-scsi)
>>          vhost_dev_init(create 128 vqs and then later we setup and use N of
>> them);
>>
>> Qemu does ioctl(VHOST_SET_OWNER)
>>          vhost_dev_set_owner()
>>
>> For N vqs userspace does:
>>          // virtqueue setup related ioctls
>>
>> Qemu does ioctl(VHOST_SCSI_SET_ENDPOINT)
>>          - match LIO/target port to vhost_dev
>>
>>
>> So we could change that to:
>>
>> For N IO vqs userspace does
>>          open(/dev/vhost-scsi)
>>                  vhost_dev_init(create IO, evt, and ctl);
>>
>> for N IO vqs Qemu does:
>>          ioctl(VHOST_SET_OWNER)
>>                  vhost_dev_set_owner()
>>
>> for N IO vqs Qemu does:
>>          // virtqueue setup related ioctls
>>
>> for N IO vqs Qemu does:
>>          ioctl(VHOST_SCSI_SET_ENDPOINT)
>>                  - match LIO/target port to vhost_dev and assemble the
>> multiple vhost_dev device.
>>
>> The problem is that we have to setup some of the evt/ctl specific parts at
>> open() time when vhost_dev_init does vhost_poll_init for example.
>>
>> - At open time, we don't know if this vhost_dev is going to be part of a
>> multiple vhost_device device or a single one so we need to create at least 3
>> of them
>> - If it is a multiple device we don't know if it's the first device being
>> created for the device or the N'th, so we don't know if the dev's vqs will
>> be used for IO or ctls/evts, so we have to create all 3.
>>
>> When we get the first VHOST_SCSI_SET_ENDPOINT call for a new style multiple
>> vhost_dev device, we can use that dev's evt/ctl vqs for events/controls
>> requests. When we get the other VHOST_SCSI_SET_ENDPOINT calls for the
>> multiple vhost_dev device then those dev's evt/ctl vqs will be ignored and
>> we will only use their IO vqs. So we end up with a lot of extra vqs.
> The issue Jason's hinting at is how can admins control the amount
> of resources a given qemu instance can consume?
> After all vhost vqs all live in host kernel memory ...
> Limiting # of open fds would be one way to do that ...
>
> The need to share event/control vqs between devices is a problem though,
> and sending lots of ioctls on things like reset is also not that elegant.
> Jason, did you have a good solution in mind?


Nope, I'm not familiar with SCSI so I don't even know sharing evt/cvq is 
possible. Consider VHOST_SCSI_MAX_VQ is already 128 per device. Mike's 
proposal seems to be better.

Thanks


>
>> One other question/issue I have is that qemu can open the /dev/vhost-scsi
>> device or it allows tools like libvirtd to open the device and pass in the
>> fd to use. For the latter case, would we continue to have those tools pass
>> in the leading fd, then have qemu do the other num_queues - 1
>> open(/dev/vhost-scsi) calls? Or do these apps that pass in the fd need to
>> know about all of the fds for some management reason?
> They know about all the fds, for resource control and privilege
> separation reasons.
>
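
The vq accounting behind the "about 3x more vqs" concern quoted above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the helper names (vqs_single_device, vqs_multi_device, vqs_wasted_multi_device) and the enum are hypothetical and do not exist in drivers/vhost; only the layout (vq 0 = control, vq 1 = event, vq 2 and above = IO) comes from the thread.

```c
/*
 * Hypothetical sketch of the vq accounting discussed in this thread.
 * None of these names exist in the kernel; they only illustrate the math.
 */

/* Layout described in the thread: vq 0 is control, vq 1 is event, IO starts at 2. */
enum { VQ_CTL = 0, VQ_EVT = 1, VQ_IO_BASE = 2 };

/* Current design: one vhost-scsi device with ctl, evt and n_io IO vqs. */
static unsigned int vqs_single_device(unsigned int n_io)
{
	return VQ_IO_BASE + n_io;
}

/*
 * Multiple-device design: one vhost_dev per IO vq. At open() time each
 * device cannot know whether it is the first one, so each creates
 * ctl, evt and one IO vq.
 */
static unsigned int vqs_multi_device(unsigned int n_io)
{
	return 3 * n_io;
}

/* Only the first device's ctl/evt vqs are used; the rest are ignored. */
static unsigned int vqs_wasted_multi_device(unsigned int n_io)
{
	return n_io ? 2 * (n_io - 1) : 0;
}
```

For example, with 8 IO queues the single-device layout needs 10 vqs, while the multiple-device layout creates 24 and leaves 14 of them unused.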


* Re: [PATCH 07/17] vhost scsi: support delayed IO vq creation
  2020-11-02  6:36         ` Jason Wang
@ 2020-11-02  6:49           ` Jason Wang
  2020-11-02 16:19             ` Mike Christie
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Wang @ 2020-11-02  6:49 UTC (permalink / raw)
  To: Michael S. Tsirkin, Mike Christie
  Cc: martin.petersen, linux-scsi, virtualization, target-devel,
	stefanha, pbonzini


On 2020/11/2 2:36 PM, Jason Wang wrote:
>>
>> The need to share event/control vqs between devices is a problem though,
>> and sending lots of ioctls on things like reset is also not that 
>> elegant.
>> Jason, did you have a good solution in mind?
>
>
> Nope, I'm not familiar with SCSI so I don't even know sharing evt/cvq 
> is possible. Consider VHOST_SCSI_MAX_VQ is already 128 per device. 
> Mike's proposal seems to be better.
>
> Thanks 


Btw, it looks to me vhost_scsi_do_evt_work() has the assumption of iovec 
layout which needs to be fixed.

Thanks


* Re: [PATCH 07/17] vhost scsi: support delayed IO vq creation
  2020-11-02  6:49           ` Jason Wang
@ 2020-11-02 16:19             ` Mike Christie
  0 siblings, 0 replies; 43+ messages in thread
From: Mike Christie @ 2020-11-02 16:19 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: martin.petersen, linux-scsi, virtualization, target-devel,
	stefanha, pbonzini

On 11/2/20 12:49 AM, Jason Wang wrote:
> 
> On 2020/11/2 2:36 PM, Jason Wang wrote:
>>>
>>> The need to share event/control vqs between devices is a problem though,
>>> and sending lots of ioctls on things like reset is also not that 
>>> elegant.
>>> Jason, did you have a good solution in mind?
>>
>>
>> Nope, I'm not familiar with SCSI so I don't even know sharing evt/cvq 
>> is possible. Consider VHOST_SCSI_MAX_VQ is already 128 per device. 
>> Mike's proposal seems to be better.

Hey, which proposal are you saying was best?

1. Add on to the current scsi mq design where we are doing a single 
device and multiple vqs already. So basically just fix what we have and 
add in patches 12 - 16 to do a thread per VQ?

2. The proposal I started to hack up over the weekend to try and support
the current design and then add in support for your multiple device 
single vq design:

http://archive.lwn.net:8080/linux-scsi/292879d9-915d-8587-0678-8677a800c613@oracle.com/

>>
>> Thanks 
> 
> 
> Btw, it looks to me vhost_scsi_do_evt_work() has the assumption of iovec 
> layout which needs to be fixed.

I wanted to be clear, because I thought you meant #1, but this comment 
seems like it would only be for #2.


* Re: [PATCH 04/17] vhost: prep vhost_dev_init users to handle failures
  2020-10-22  0:34 ` [PATCH 04/17] vhost: prep vhost_dev_init users to handle failures Mike Christie
                     ` (2 preceding siblings ...)
  2020-11-02  5:57   ` Jason Wang
@ 2020-11-03 10:04   ` Dan Carpenter
  3 siblings, 0 replies; 43+ messages in thread
From: Dan Carpenter @ 2020-11-03 10:04 UTC (permalink / raw)
  To: kbuild, Mike Christie, martin.petersen, linux-scsi, target-devel,
	mst, jasowang, pbonzini, stefanha, virtualization
  Cc: lkp, Dan Carpenter, kbuild-all


[-- Attachment #1: Type: text/plain, Size: 4277 bytes --]

Hi Mike,

url:    https://github.com/0day-ci/linux/commits/Mike-Christie/vhost-fix-scsi-cmd-handling-and-cgroup-support/20201022-083844
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: i386-randconfig-m021-20201101 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
drivers/vhost/vsock.c:648 vhost_vsock_dev_open() error: uninitialized symbol 'ret'.

vim +/ret +648 drivers/vhost/vsock.c

433fc58e6bf2c8b Asias He        2016-07-28  605  static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
433fc58e6bf2c8b Asias He        2016-07-28  606  {
433fc58e6bf2c8b Asias He        2016-07-28  607  	struct vhost_virtqueue **vqs;
433fc58e6bf2c8b Asias He        2016-07-28  608  	struct vhost_vsock *vsock;
433fc58e6bf2c8b Asias He        2016-07-28  609  	int ret;
433fc58e6bf2c8b Asias He        2016-07-28  610  
433fc58e6bf2c8b Asias He        2016-07-28  611  	/* This struct is large and allocation could fail, fall back to vmalloc
433fc58e6bf2c8b Asias He        2016-07-28  612  	 * if there is no other way.
433fc58e6bf2c8b Asias He        2016-07-28  613  	 */
dcda9b04713c3f6 Michal Hocko    2017-07-12  614  	vsock = kvmalloc(sizeof(*vsock), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
433fc58e6bf2c8b Asias He        2016-07-28  615  	if (!vsock)
433fc58e6bf2c8b Asias He        2016-07-28  616  		return -ENOMEM;
433fc58e6bf2c8b Asias He        2016-07-28  617  
433fc58e6bf2c8b Asias He        2016-07-28  618  	vqs = kmalloc_array(ARRAY_SIZE(vsock->vqs), sizeof(*vqs), GFP_KERNEL);
433fc58e6bf2c8b Asias He        2016-07-28  619  	if (!vqs) {
433fc58e6bf2c8b Asias He        2016-07-28  620  		ret = -ENOMEM;
433fc58e6bf2c8b Asias He        2016-07-28  621  		goto out;
433fc58e6bf2c8b Asias He        2016-07-28  622  	}
433fc58e6bf2c8b Asias He        2016-07-28  623  
a72b69dc083a931 Stefan Hajnoczi 2017-11-09  624  	vsock->guest_cid = 0; /* no CID assigned yet */
a72b69dc083a931 Stefan Hajnoczi 2017-11-09  625  
433fc58e6bf2c8b Asias He        2016-07-28  626  	atomic_set(&vsock->queued_replies, 0);
433fc58e6bf2c8b Asias He        2016-07-28  627  
433fc58e6bf2c8b Asias He        2016-07-28  628  	vqs[VSOCK_VQ_TX] = &vsock->vqs[VSOCK_VQ_TX];
433fc58e6bf2c8b Asias He        2016-07-28  629  	vqs[VSOCK_VQ_RX] = &vsock->vqs[VSOCK_VQ_RX];
433fc58e6bf2c8b Asias He        2016-07-28  630  	vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;
433fc58e6bf2c8b Asias He        2016-07-28  631  	vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
433fc58e6bf2c8b Asias He        2016-07-28  632  
6e1629548d318c2 Mike Christie   2020-10-21  633  	if (vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
e82b9b0727ff6d6 Jason Wang      2019-05-17  634  			   UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
6e1629548d318c2 Mike Christie   2020-10-21  635  			   VHOST_VSOCK_WEIGHT, true, NULL))
6e1629548d318c2 Mike Christie   2020-10-21  636  		goto err_dev_init;
                                                                ^^^^^^^^^^^^^^^^^
"ret" needs to be set here.

433fc58e6bf2c8b Asias He        2016-07-28  637  
433fc58e6bf2c8b Asias He        2016-07-28  638  	file->private_data = vsock;
433fc58e6bf2c8b Asias He        2016-07-28  639  	spin_lock_init(&vsock->send_pkt_list_lock);
433fc58e6bf2c8b Asias He        2016-07-28  640  	INIT_LIST_HEAD(&vsock->send_pkt_list);
433fc58e6bf2c8b Asias He        2016-07-28  641  	vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work);
433fc58e6bf2c8b Asias He        2016-07-28  642  	return 0;
433fc58e6bf2c8b Asias He        2016-07-28  643  
6e1629548d318c2 Mike Christie   2020-10-21  644  err_dev_init:
6e1629548d318c2 Mike Christie   2020-10-21  645  	kfree(vqs);
433fc58e6bf2c8b Asias He        2016-07-28  646  out:
433fc58e6bf2c8b Asias He        2016-07-28  647  	vhost_vsock_free(vsock);
433fc58e6bf2c8b Asias He        2016-07-28 @648  	return ret;
433fc58e6bf2c8b Asias He        2016-07-28  649  }
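
One way to address the warning is to capture the return value before jumping to the error label. The following is a minimal userspace sketch of that pattern, not the actual fix that was applied: fake_dev_init() and fake_dev_open() are hypothetical stand-ins for vhost_dev_init() and vhost_vsock_dev_open(), and malloc()/free() stand in for the kernel allocators.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for vhost_dev_init(): returns 0 or a negative errno. */
static int fake_dev_init(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/* Mirrors the goto layout of vhost_vsock_dev_open() in simplified form. */
static int fake_dev_open(int should_fail)
{
	int *vqs;
	int ret;

	vqs = malloc(2 * sizeof(*vqs));
	if (!vqs) {
		ret = -ENOMEM;
		goto out;
	}

	/* Store the result so the error path returns an initialized value. */
	ret = fake_dev_init(should_fail);
	if (ret)
		goto err_dev_init;

	/* The real driver keeps vqs; it is freed here so the sketch does not leak. */
	free(vqs);
	return 0;

err_dev_init:
	free(vqs);
out:
	return ret;
}
```

With this shape, every path that reaches the final return has assigned ret, which is what the smatch report asks for.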

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32485 bytes --]



Thread overview: 43+ messages
2020-10-22  0:34 [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Mike Christie
2020-10-22  0:34 ` [PATCH 01/17] vhost scsi: add lun parser helper Mike Christie
2020-10-26  3:33   ` Jason Wang
2020-10-22  0:34 ` [PATCH 02/17] vhost: remove work arg from vhost_work_flush Mike Christie
2020-10-22  0:51   ` Chaitanya Kulkarni
2020-10-22  0:34 ` [PATCH 03/17] vhost net: use goto error handling in open Mike Christie
2020-10-22  0:45   ` Chaitanya Kulkarni
2020-10-26  3:34   ` Jason Wang
2020-10-22  0:34 ` [PATCH 04/17] vhost: prep vhost_dev_init users to handle failures Mike Christie
2020-10-22  5:22   ` kernel test robot
2020-10-23 16:15   ` Mike Christie
2020-11-02  5:57   ` Jason Wang
2020-11-03 10:04   ` Dan Carpenter
2020-10-22  0:34 ` [PATCH 05/17] vhost: move vq iovec allocation to dev init time Mike Christie
2020-10-22  0:34 ` [PATCH 06/17] vhost: support delayed vq creation Mike Christie
2020-10-22  0:34 ` [PATCH 07/17] vhost scsi: support delayed IO " Mike Christie
2020-10-26  3:51   ` Jason Wang
2020-10-27  5:47     ` Mike Christie
2020-10-28  1:55       ` Jason Wang
2020-10-30  8:47       ` Michael S. Tsirkin
2020-10-30 16:30         ` Mike Christie
2020-10-30 17:26           ` Mike Christie
2020-11-01 22:06         ` Mike Christie
2020-11-02  6:36         ` Jason Wang
2020-11-02  6:49           ` Jason Wang
2020-11-02 16:19             ` Mike Christie
2020-10-22  0:34 ` [PATCH 08/17] vhost scsi: alloc cmds per vq instead of session Mike Christie
2020-10-22  0:34 ` [PATCH 09/17] vhost scsi: fix cmd completion race Mike Christie
2020-10-27 13:07   ` Maurizio Lombardi
2020-10-30  8:51   ` Michael S. Tsirkin
2020-10-30 16:04     ` Paolo Bonzini
2020-10-22  0:34 ` [PATCH 10/17] vhost scsi: Add support for LUN resets Mike Christie
2020-10-22  0:34 ` [PATCH 11/17] vhost scsi: remove extra flushes Mike Christie
2020-10-22  0:34 ` [PATCH 12/17] vhost poll: fix coding style Mike Christie
2020-10-22  0:39   ` Chaitanya Kulkarni
2020-10-22  0:34 ` [PATCH 13/17] vhost: support multiple worker threads Mike Christie
2020-10-22  0:35 ` [PATCH 14/17] vhost: poll support support multiple workers Mike Christie
2020-10-22  0:35 ` [PATCH 15/17] host: support delayed vq creation Mike Christie
2020-10-22  0:50   ` Mike Christie
2020-10-22  0:35 ` [PATCH 16/17] vhost scsi: multiple worker support Mike Christie
2020-10-22  0:35 ` [PATCH 17/17] vhost scsi: drop submission workqueue Mike Christie
2020-10-29 21:47 ` [PATCH 00/17 V3] vhost: fix scsi cmd handling and cgroup support Michael S. Tsirkin
2020-10-29 22:19   ` Mike Christie
