target-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/11 V4] vhost: vhost-scsi bug fixes
@ 2020-11-04 22:26 Mike Christie
  2020-11-04 22:26 ` [PATCH 01/11] vhost scsi: add lun parser helper Mike Christie
                   ` (10 more replies)
  0 siblings, 11 replies; 19+ messages in thread
From: Mike Christie @ 2020-11-04 22:26 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

The following patches made over Michael's vhost branch
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/log/?h=vhost

The patches fix issues with vhost-scsi's 256 cmd limit
where if the guest sends more than 256 cmds it will get IO
errors, or even if exactly 256 are being sent then we can race and
get IO errors. The patches also add LUN RESET support so guests
that do not support resetting command timers will not get IO
errors and offlined devices when the physical device hits temp
issues.

Jason, you reviewed ever non scsi patch but 2 of them. I was not
sure if that was an oversight or you were ok with them. I'm sending
this out hoping you were ok with them and forgot to reply (or I guess
will be ok with them if you just missed them last time).

V4:
- really really fix compile errors
- dropped threading patches so we can figure that out separately.

V3:
- fix compile errors
- fix possible crash where cmd could be freed while adding it to
completion list
- fix issue where we added the worker thread to the blk cgroup but
the blk IO was submitted by a driver workqueue.

V2:
- fix use before set cpu var errors
- drop vhost_vq_is_setup
- include patches to do a worker thread per scsi IO vq

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 01/11] vhost scsi: add lun parser helper
  2020-11-04 22:26 [PATCH 00/11 V4] vhost: vhost-scsi bug fixes Mike Christie
@ 2020-11-04 22:26 ` Mike Christie
  2020-11-04 22:26 ` [PATCH 02/11] vhost: remove work arg from vhost_work_flush Mike Christie
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Mike Christie @ 2020-11-04 22:26 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

Move code to parse lun from req's lun_buf to helper, so tmf code
can use it in the next patch.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/scsi.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index b22adf0..0ea78d0 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -907,6 +907,11 @@ static void vhost_scsi_submission_work(struct work_struct *work)
 	return ret;
 }
 
+static u16 vhost_buf_to_lun(u8 *lun_buf)
+{
+	return ((lun_buf[2] << 8) | lun_buf[3]) & 0x3FFF;
+}
+
 static void
 vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 {
@@ -1045,12 +1050,12 @@ static void vhost_scsi_submission_work(struct work_struct *work)
 			tag = vhost64_to_cpu(vq, v_req_pi.tag);
 			task_attr = v_req_pi.task_attr;
 			cdb = &v_req_pi.cdb[0];
-			lun = ((v_req_pi.lun[2] << 8) | v_req_pi.lun[3]) & 0x3FFF;
+			lun = vhost_buf_to_lun(v_req_pi.lun);
 		} else {
 			tag = vhost64_to_cpu(vq, v_req.tag);
 			task_attr = v_req.task_attr;
 			cdb = &v_req.cdb[0];
-			lun = ((v_req.lun[2] << 8) | v_req.lun[3]) & 0x3FFF;
+			lun = vhost_buf_to_lun(v_req.lun);
 		}
 		/*
 		 * Check that the received CDB size does not exceeded our
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 02/11] vhost: remove work arg from vhost_work_flush
  2020-11-04 22:26 [PATCH 00/11 V4] vhost: vhost-scsi bug fixes Mike Christie
  2020-11-04 22:26 ` [PATCH 01/11] vhost scsi: add lun parser helper Mike Christie
@ 2020-11-04 22:26 ` Mike Christie
  2020-11-04 22:26 ` [PATCH 03/11] vhost net: use goto error handling in open Mike Christie
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Mike Christie @ 2020-11-04 22:26 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

vhost_work_flush doesn't do anything with the work arg. This patch drops
it and then renames vhost_work_flush to vhost_work_dev_flush to reflect
that the function flushes all the works in the dev and not just a
specific queue or work item.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/vhost/scsi.c  | 4 ++--
 drivers/vhost/vhost.c | 8 ++++----
 drivers/vhost/vhost.h | 2 +-
 drivers/vhost/vsock.c | 2 +-
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 0ea78d0..86617bb 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1370,8 +1370,8 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 	/* Flush both the vhost poll and vhost work */
 	for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
 		vhost_scsi_flush_vq(vs, i);
-	vhost_work_flush(&vs->dev, &vs->vs_completion_work);
-	vhost_work_flush(&vs->dev, &vs->vs_event_work);
+	vhost_work_dev_flush(&vs->dev);
+	vhost_work_dev_flush(&vs->dev);
 
 	/* Wait for all reqs issued before the flush to be finished */
 	for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5c835a2..6818f71 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -231,7 +231,7 @@ void vhost_poll_stop(struct vhost_poll *poll)
 }
 EXPORT_SYMBOL_GPL(vhost_poll_stop);
 
-void vhost_work_flush(struct vhost_dev *dev, struct vhost_work *work)
+void vhost_work_dev_flush(struct vhost_dev *dev)
 {
 	struct vhost_flush_struct flush;
 
@@ -243,13 +243,13 @@ void vhost_work_flush(struct vhost_dev *dev, struct vhost_work *work)
 		wait_for_completion(&flush.wait_event);
 	}
 }
-EXPORT_SYMBOL_GPL(vhost_work_flush);
+EXPORT_SYMBOL_GPL(vhost_work_dev_flush);
 
 /* Flush any work that has been scheduled. When calling this, don't hold any
  * locks that are also used by the callback. */
 void vhost_poll_flush(struct vhost_poll *poll)
 {
-	vhost_work_flush(poll->dev, &poll->work);
+	vhost_work_dev_flush(poll->dev);
 }
 EXPORT_SYMBOL_GPL(vhost_poll_flush);
 
@@ -532,7 +532,7 @@ static int vhost_attach_cgroups(struct vhost_dev *dev)
 	attach.owner = current;
 	vhost_work_init(&attach.work, vhost_attach_cgroups_work);
 	vhost_work_queue(dev, &attach.work);
-	vhost_work_flush(dev, &attach.work);
+	vhost_work_dev_flush(dev);
 	return attach.ret;
 }
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index e016cd3..1365f33 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -46,7 +46,7 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
 void vhost_poll_stop(struct vhost_poll *poll);
 void vhost_poll_flush(struct vhost_poll *poll);
 void vhost_poll_queue(struct vhost_poll *poll);
-void vhost_work_flush(struct vhost_dev *dev, struct vhost_work *work);
+void vhost_work_dev_flush(struct vhost_dev *dev);
 long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp);
 
 struct vhost_log {
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index a483cec..f40205f 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -652,7 +652,7 @@ static void vhost_vsock_flush(struct vhost_vsock *vsock)
 	for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++)
 		if (vsock->vqs[i].handle_kick)
 			vhost_poll_flush(&vsock->vqs[i].poll);
-	vhost_work_flush(&vsock->dev, &vsock->send_pkt_work);
+	vhost_work_dev_flush(&vsock->dev);
 }
 
 static void vhost_vsock_reset_orphans(struct sock *sk)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 03/11] vhost net: use goto error handling in open
  2020-11-04 22:26 [PATCH 00/11 V4] vhost: vhost-scsi bug fixes Mike Christie
  2020-11-04 22:26 ` [PATCH 01/11] vhost scsi: add lun parser helper Mike Christie
  2020-11-04 22:26 ` [PATCH 02/11] vhost: remove work arg from vhost_work_flush Mike Christie
@ 2020-11-04 22:26 ` Mike Christie
  2020-11-04 22:26 ` [PATCH 04/11] vhost: prep vhost_dev_init users to handle failures Mike Christie
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Mike Christie @ 2020-11-04 22:26 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

In the next patches vhost_dev_init will be able to fail. This patch has
vhost_net_open use goto error handling like is done in the other vhost
code to make handling vhost_dev_init failures easier to handle and
extend in the future.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 531a00d..831d824 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1286,27 +1286,18 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 	if (!n)
 		return -ENOMEM;
 	vqs = kmalloc_array(VHOST_NET_VQ_MAX, sizeof(*vqs), GFP_KERNEL);
-	if (!vqs) {
-		kvfree(n);
-		return -ENOMEM;
-	}
+	if (!vqs)
+		goto err_vqs;
 
 	queue = kmalloc_array(VHOST_NET_BATCH, sizeof(void *),
 			      GFP_KERNEL);
-	if (!queue) {
-		kfree(vqs);
-		kvfree(n);
-		return -ENOMEM;
-	}
+	if (!queue)
+		goto err_queue;
 	n->vqs[VHOST_NET_VQ_RX].rxq.queue = queue;
 
 	xdp = kmalloc_array(VHOST_NET_BATCH, sizeof(*xdp), GFP_KERNEL);
-	if (!xdp) {
-		kfree(vqs);
-		kvfree(n);
-		kfree(queue);
-		return -ENOMEM;
-	}
+	if (!xdp)
+		goto err_xdp;
 	n->vqs[VHOST_NET_VQ_TX].xdp = xdp;
 
 	dev = &n->dev;
@@ -1338,6 +1329,14 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 	n->refcnt_bias = 0;
 
 	return 0;
+
+err_xdp:
+	kfree(queue);
+err_queue:
+	kfree(vqs);
+err_vqs:
+	kvfree(n);
+	return -ENOMEM;
 }
 
 static struct socket *vhost_net_stop_vq(struct vhost_net *n,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 04/11] vhost: prep vhost_dev_init users to handle failures
  2020-11-04 22:26 [PATCH 00/11 V4] vhost: vhost-scsi bug fixes Mike Christie
                   ` (2 preceding siblings ...)
  2020-11-04 22:26 ` [PATCH 03/11] vhost net: use goto error handling in open Mike Christie
@ 2020-11-04 22:26 ` Mike Christie
  2020-11-04 22:26 ` [PATCH 05/11] vhost: move vq iovec allocation to dev init time Mike Christie
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Mike Christie @ 2020-11-04 22:26 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

This is just a prep patch to get vhost_dev_init callers ready to handle
the next patch where the function can fail. In this patch vhost_dev_init
just returns 0, but I think it's easier to check for goto/error handling
errors separated from the next patch.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com> 
---
 drivers/vhost/net.c   | 11 +++++++----
 drivers/vhost/scsi.c  |  8 ++++++--
 drivers/vhost/test.c  |  9 +++++++--
 drivers/vhost/vdpa.c  |  7 +++++--
 drivers/vhost/vhost.c | 14 ++++++++------
 drivers/vhost/vhost.h | 10 +++++-----
 drivers/vhost/vsock.c | 10 +++++++---
 7 files changed, 45 insertions(+), 24 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 831d824..fd30b53 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1316,10 +1316,11 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 		n->vqs[i].rx_ring = NULL;
 		vhost_net_buf_init(&n->vqs[i].rxq);
 	}
-	vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
-		       UIO_MAXIOV + VHOST_NET_BATCH,
-		       VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT, true,
-		       NULL);
+	if (vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
+			   UIO_MAXIOV + VHOST_NET_BATCH,
+			   VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT, true,
+			   NULL))
+		goto err_dev_init;
 
 	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, EPOLLOUT, dev);
 	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, EPOLLIN, dev);
@@ -1330,6 +1331,8 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 
 	return 0;
 
+err_dev_init:
+	kfree(xdp);
 err_xdp:
 	kfree(queue);
 err_queue:
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 86617bb..ed8baf9 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1632,14 +1632,18 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 		vqs[i] = &vs->vqs[i].vq;
 		vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick;
 	}
-	vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, UIO_MAXIOV,
-		       VHOST_SCSI_WEIGHT, 0, true, NULL);
+	r = vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, UIO_MAXIOV,
+			   VHOST_SCSI_WEIGHT, 0, true, NULL);
+	if (r)
+		goto err_dev_init;
 
 	vhost_scsi_init_inflight(vs, NULL);
 
 	f->private_data = vs;
 	return 0;
 
+err_dev_init:
+	kfree(vqs);
 err_vqs:
 	kvfree(vs);
 err_vs:
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index a09dedc..c255ae5 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -119,12 +119,17 @@ static int vhost_test_open(struct inode *inode, struct file *f)
 	dev = &n->dev;
 	vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ];
 	n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
-	vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV,
-		       VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT, true, NULL);
+	if (vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV,
+			   VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT, true, NULL)
+		goto err_dev_init;
 
 	f->private_data = n;
 
 	return 0;
+
+err_dev_init:
+	kfree(vqs);
+	return -ENOMEM;
 }
 
 static void *vhost_test_stop_vq(struct vhost_test *n,
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 2754f30..c1615a7 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -829,8 +829,10 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep)
 		vqs[i] = &v->vqs[i];
 		vqs[i]->handle_kick = handle_vq_kick;
 	}
-	vhost_dev_init(dev, vqs, nvqs, 0, 0, 0, false,
-		       vhost_vdpa_process_iotlb_msg);
+	r = vhost_dev_init(dev, vqs, nvqs, 0, 0, 0, false,
+			   vhost_vdpa_process_iotlb_msg);
+	if (r)
+		goto err_dev_init;
 
 	dev->iotlb = vhost_iotlb_alloc(0, 0);
 	if (!dev->iotlb) {
@@ -850,6 +852,7 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep)
 
 err_init_iotlb:
 	vhost_dev_cleanup(&v->vdev);
+err_dev_init:
 	kfree(vqs);
 err:
 	atomic_dec(&v->opened);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 6818f71..b35229e 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -458,12 +458,12 @@ static size_t vhost_get_desc_size(struct vhost_virtqueue *vq,
 	return sizeof(*vq->desc) * num;
 }
 
-void vhost_dev_init(struct vhost_dev *dev,
-		    struct vhost_virtqueue **vqs, int nvqs,
-		    int iov_limit, int weight, int byte_weight,
-		    bool use_worker,
-		    int (*msg_handler)(struct vhost_dev *dev,
-				       struct vhost_iotlb_msg *msg))
+int vhost_dev_init(struct vhost_dev *dev,
+		   struct vhost_virtqueue **vqs, int nvqs,
+		   int iov_limit, int weight, int byte_weight,
+		   bool use_worker,
+		   int (*msg_handler)(struct vhost_dev *dev,
+				      struct vhost_iotlb_msg *msg))
 {
 	struct vhost_virtqueue *vq;
 	int i;
@@ -500,6 +500,8 @@ void vhost_dev_init(struct vhost_dev *dev,
 			vhost_poll_init(&vq->poll, vq->handle_kick,
 					EPOLLIN, dev);
 	}
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(vhost_dev_init);
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 1365f33..9ad34b1 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -167,11 +167,11 @@ struct vhost_dev {
 };
 
 bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len);
-void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs,
-		    int nvqs, int iov_limit, int weight, int byte_weight,
-		    bool use_worker,
-		    int (*msg_handler)(struct vhost_dev *dev,
-				       struct vhost_iotlb_msg *msg));
+int vhost_dev_init(struct vhost_dev *dev, struct vhost_virtqueue **vqs,
+		   int nvqs, int iov_limit, int weight, int byte_weight,
+		   bool use_worker,
+		   int (*msg_handler)(struct vhost_dev *dev,
+				      struct vhost_iotlb_msg *msg));
 long vhost_dev_set_owner(struct vhost_dev *dev);
 bool vhost_dev_has_owner(struct vhost_dev *dev);
 long vhost_dev_check_owner(struct vhost_dev *);
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index f40205f..cae0083 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -630,9 +630,11 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 	vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;
 	vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
 
-	vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
-		       UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
-		       VHOST_VSOCK_WEIGHT, true, NULL);
+	ret = vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
+			     UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
+			     VHOST_VSOCK_WEIGHT, true, NULL);
+	if (ret)
+		goto err_dev_init;
 
 	file->private_data = vsock;
 	spin_lock_init(&vsock->send_pkt_list_lock);
@@ -640,6 +642,8 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 	vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work);
 	return 0;
 
+err_dev_init:
+	kfree(vqs);
 out:
 	vhost_vsock_free(vsock);
 	return ret;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 05/11] vhost: move vq iovec allocation to dev init time
  2020-11-04 22:26 [PATCH 00/11 V4] vhost: vhost-scsi bug fixes Mike Christie
                   ` (3 preceding siblings ...)
  2020-11-04 22:26 ` [PATCH 04/11] vhost: prep vhost_dev_init users to handle failures Mike Christie
@ 2020-11-04 22:26 ` Mike Christie
  2020-11-09  3:41   ` Jason Wang
  2020-11-04 22:26 ` [PATCH 06/11] vhost: support delayed vq creation Mike Christie
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: Mike Christie @ 2020-11-04 22:26 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

The next patches allow us to create vqs on demand after vhost_dev_init
and vhost_dev_set_owner have been called. For vhost-scsi we don't
know the number of vqs we really want until the vring/vq setup
operations have started up. For other devices we know the number of vqs
at vhost_dev_init time, so for those devs we init the vq and allocate
the needed iovecs. For vhost-scsi we will do it later when userspace has
indicated to us that it's going to use a vq.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/vhost.c | 71 +++++++++++++++++++++++++++------------------------
 1 file changed, 38 insertions(+), 33 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index b35229e..a4a4450 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -383,29 +383,27 @@ static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq)
 	vq->heads = NULL;
 }
 
-/* Helper to allocate iovec buffers for all vqs. */
-static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
+static int vhost_vq_alloc_iovecs(struct vhost_dev *dev,
+				 struct vhost_virtqueue *vq)
 {
-	struct vhost_virtqueue *vq;
-	int i;
+	vq->indirect = kmalloc_array(UIO_MAXIOV, sizeof(*vq->indirect),
+				     GFP_KERNEL);
+	if (!vq->indirect)
+		return -ENOMEM;
+
+	if (!dev->iov_limit)
+		return 0;
+
+	vq->log = kmalloc_array(dev->iov_limit, sizeof(*vq->log), GFP_KERNEL);
+	vq->heads = kmalloc_array(dev->iov_limit, sizeof(*vq->heads),
+				  GFP_KERNEL);
+	if (!vq->log || !vq->heads)
+		goto err_nomem;
 
-	for (i = 0; i < dev->nvqs; ++i) {
-		vq = dev->vqs[i];
-		vq->indirect = kmalloc_array(UIO_MAXIOV,
-					     sizeof(*vq->indirect),
-					     GFP_KERNEL);
-		vq->log = kmalloc_array(dev->iov_limit, sizeof(*vq->log),
-					GFP_KERNEL);
-		vq->heads = kmalloc_array(dev->iov_limit, sizeof(*vq->heads),
-					  GFP_KERNEL);
-		if (!vq->indirect || !vq->log || !vq->heads)
-			goto err_nomem;
-	}
 	return 0;
 
 err_nomem:
-	for (; i >= 0; --i)
-		vhost_vq_free_iovecs(dev->vqs[i]);
+	vhost_vq_free_iovecs(vq);
 	return -ENOMEM;
 }
 
@@ -458,6 +456,21 @@ static size_t vhost_get_desc_size(struct vhost_virtqueue *vq,
 	return sizeof(*vq->desc) * num;
 }
 
+static int vhost_vq_init(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+{
+	vq->log = NULL;
+	vq->indirect = NULL;
+	vq->heads = NULL;
+	vq->dev = dev;
+	mutex_init(&vq->mutex);
+	vhost_vq_reset(dev, vq);
+
+	if (vq->handle_kick)
+		vhost_poll_init(&vq->poll, vq->handle_kick, EPOLLIN, dev);
+
+	return vhost_vq_alloc_iovecs(dev, vq);
+}
+
 int vhost_dev_init(struct vhost_dev *dev,
 		   struct vhost_virtqueue **vqs, int nvqs,
 		   int iov_limit, int weight, int byte_weight,
@@ -465,7 +478,6 @@ int vhost_dev_init(struct vhost_dev *dev,
 		   int (*msg_handler)(struct vhost_dev *dev,
 				      struct vhost_iotlb_msg *msg))
 {
-	struct vhost_virtqueue *vq;
 	int i;
 
 	dev->vqs = vqs;
@@ -489,19 +501,16 @@ int vhost_dev_init(struct vhost_dev *dev,
 
 
 	for (i = 0; i < dev->nvqs; ++i) {
-		vq = dev->vqs[i];
-		vq->log = NULL;
-		vq->indirect = NULL;
-		vq->heads = NULL;
-		vq->dev = dev;
-		mutex_init(&vq->mutex);
-		vhost_vq_reset(dev, vq);
-		if (vq->handle_kick)
-			vhost_poll_init(&vq->poll, vq->handle_kick,
-					EPOLLIN, dev);
+		if (vhost_vq_init(dev, dev->vqs[i]))
+			goto err_vq_init;
 	}
 
 	return 0;
+
+err_vq_init:
+	for (--i; i >= 0; --i)
+		vhost_vq_free_iovecs(dev->vqs[i]);
+	return -ENOMEM;
 }
 EXPORT_SYMBOL_GPL(vhost_dev_init);
 
@@ -606,10 +615,6 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 			goto err_cgroup;
 	}
 
-	err = vhost_dev_alloc_iovecs(dev);
-	if (err)
-		goto err_cgroup;
-
 	return 0;
 err_cgroup:
 	if (dev->worker) {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 06/11] vhost: support delayed vq creation
  2020-11-04 22:26 [PATCH 00/11 V4] vhost: vhost-scsi bug fixes Mike Christie
                   ` (4 preceding siblings ...)
  2020-11-04 22:26 ` [PATCH 05/11] vhost: move vq iovec allocation to dev init time Mike Christie
@ 2020-11-04 22:26 ` Mike Christie
  2020-11-09  4:01   ` Jason Wang
  2020-11-04 22:26 ` [PATCH 07/11] vhost scsi: support delayed IO " Mike Christie
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: Mike Christie @ 2020-11-04 22:26 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

This allows vq creation to be done when it's first accessed by
userspace. vhost-scsi doesn't know how many queues the user requested
until they are first setup, and we don't want to allocate resources
like the iovecs for 128 vqs when we are only using 1 or 2 most of the
time. In the next pathces, vhost-scsi will also switch to preallocating
cmds per vq instead of per lio session and we don't want to allocate
them for 127 extra vqs if they are not in use.

With this patch when a driver calls vhost_dev_init they pass in the
number of vqs that they know they need and the max they can support.
This patch has all the drivers pass in the same value for both the
initial number of vqs and the max. The next patch will convert scsi.
The other drivers like net/vsock have their vqs hard coded in the
kernel or setup/discovered via other methods like with vdpa.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/net.c   |  2 +-
 drivers/vhost/scsi.c  |  4 +--
 drivers/vhost/test.c  |  5 ++--
 drivers/vhost/vdpa.c  |  2 +-
 drivers/vhost/vhost.c | 71 ++++++++++++++++++++++++++++++++++-----------------
 drivers/vhost/vhost.h |  7 +++--
 drivers/vhost/vsock.c | 11 ++++----
 7 files changed, 66 insertions(+), 36 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index fd30b53..fce46f0 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1316,7 +1316,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 		n->vqs[i].rx_ring = NULL;
 		vhost_net_buf_init(&n->vqs[i].rxq);
 	}
-	if (vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
+	if (vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX, VHOST_NET_VQ_MAX,
 			   UIO_MAXIOV + VHOST_NET_BATCH,
 			   VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT, true,
 			   NULL))
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index ed8baf9..5b3720e 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1632,8 +1632,8 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 		vqs[i] = &vs->vqs[i].vq;
 		vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick;
 	}
-	r = vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, UIO_MAXIOV,
-			   VHOST_SCSI_WEIGHT, 0, true, NULL);
+	r = vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, VHOST_SCSI_MAX_VQ,
+			   UIO_MAXIOV, VHOST_SCSI_WEIGHT, 0, true, NULL);
 	if (r)
 		goto err_dev_init;
 
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index c255ae5..9d2bfa3 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -119,8 +119,9 @@ static int vhost_test_open(struct inode *inode, struct file *f)
 	dev = &n->dev;
 	vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ];
 	n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
-	if (vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV,
-			   VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT, true, NULL)
+	if (vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, VHOST_TEST_VQ_MAX,
+			   UIO_MAXIOV, VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT,
+			   true, NULL)
 		goto err_dev_init;
 
 	f->private_data = n;
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index c1615a7..6abd9d8 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -829,7 +829,7 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep)
 		vqs[i] = &v->vqs[i];
 		vqs[i]->handle_kick = handle_vq_kick;
 	}
-	r = vhost_dev_init(dev, vqs, nvqs, 0, 0, 0, false,
+	r = vhost_dev_init(dev, vqs, nvqs, nvqs, 0, 0, 0, false,
 			   vhost_vdpa_process_iotlb_msg);
 	if (r)
 		goto err_dev_init;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a4a4450..2ca2e71 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -294,7 +294,7 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
 {
 	int i;
 
-	for (i = 0; i < d->nvqs; ++i)
+	for (i = 0; i < d->max_nvqs; ++i)
 		__vhost_vq_meta_reset(d->vqs[i]);
 }
 
@@ -331,6 +331,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vq->busyloop_timeout = 0;
 	vq->umem = NULL;
 	vq->iotlb = NULL;
+	vq->initialized = false;
 	vhost_vring_call_reset(&vq->call_ctx);
 	__vhost_vq_meta_reset(vq);
 }
@@ -411,7 +412,7 @@ static void vhost_dev_free_iovecs(struct vhost_dev *dev)
 {
 	int i;
 
-	for (i = 0; i < dev->nvqs; ++i)
+	for (i = 0; i < dev->max_nvqs; ++i)
 		vhost_vq_free_iovecs(dev->vqs[i]);
 }
 
@@ -456,7 +457,7 @@ static size_t vhost_get_desc_size(struct vhost_virtqueue *vq,
 	return sizeof(*vq->desc) * num;
 }
 
-static int vhost_vq_init(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+static void __vhost_vq_init(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
 	vq->log = NULL;
 	vq->indirect = NULL;
@@ -467,12 +468,29 @@ static int vhost_vq_init(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 
 	if (vq->handle_kick)
 		vhost_poll_init(&vq->poll, vq->handle_kick, EPOLLIN, dev);
+}
+
+static int vhost_vq_init(struct vhost_dev *dev, int vq_idx)
+{
+	struct vhost_virtqueue *vq;
+	int ret;
+
+	if (vq_idx >= dev->max_nvqs)
+		return -ENOBUFS;
+
+	vq = dev->vqs[vq_idx];
+	__vhost_vq_init(dev, vq);
+	ret = vhost_vq_alloc_iovecs(dev, vq);
+	if (ret)
+		return ret;
 
-	return vhost_vq_alloc_iovecs(dev, vq);
+	vq->initialized = true;
+	dev->nvqs++;
+	return 0;
 }
 
 int vhost_dev_init(struct vhost_dev *dev,
-		   struct vhost_virtqueue **vqs, int nvqs,
+		   struct vhost_virtqueue **vqs, int nvqs, int max_nvqs,
 		   int iov_limit, int weight, int byte_weight,
 		   bool use_worker,
 		   int (*msg_handler)(struct vhost_dev *dev,
@@ -481,7 +499,8 @@ int vhost_dev_init(struct vhost_dev *dev,
 	int i;
 
 	dev->vqs = vqs;
-	dev->nvqs = nvqs;
+	dev->nvqs = 0;
+	dev->max_nvqs = max_nvqs;
 	mutex_init(&dev->mutex);
 	dev->log_ctx = NULL;
 	dev->umem = NULL;
@@ -499,12 +518,15 @@ int vhost_dev_init(struct vhost_dev *dev,
 	INIT_LIST_HEAD(&dev->pending_list);
 	spin_lock_init(&dev->iotlb_lock);
 
-
-	for (i = 0; i < dev->nvqs; ++i) {
-		if (vhost_vq_init(dev, dev->vqs[i]))
+	for (i = 0; i < nvqs; ++i) {
+		if (vhost_vq_init(dev, i))
 			goto err_vq_init;
 	}
 
+	for (; i < dev->max_nvqs; ++i)
+		/* Just prep/clear the fields and set initializedúlse */
+		__vhost_vq_init(dev, dev->vqs[i]);
+
 	return 0;
 
 err_vq_init:
@@ -652,7 +674,7 @@ void vhost_dev_reset_owner(struct vhost_dev *dev, struct vhost_iotlb *umem)
 	/* We don't need VQ locks below since vhost_dev_cleanup makes sure
 	 * VQs aren't running.
 	 */
-	for (i = 0; i < dev->nvqs; ++i)
+	for (i = 0; i < dev->max_nvqs; ++i)
 		dev->vqs[i]->umem = umem;
 }
 EXPORT_SYMBOL_GPL(vhost_dev_reset_owner);
@@ -661,7 +683,7 @@ void vhost_dev_stop(struct vhost_dev *dev)
 {
 	int i;
 
-	for (i = 0; i < dev->nvqs; ++i) {
+	for (i = 0; i < dev->max_nvqs; ++i) {
 		if (dev->vqs[i]->kick && dev->vqs[i]->handle_kick) {
 			vhost_poll_stop(&dev->vqs[i]->poll);
 			vhost_poll_flush(&dev->vqs[i]->poll);
@@ -693,7 +715,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 {
 	int i;
 
-	for (i = 0; i < dev->nvqs; ++i) {
+	for (i = 0; i < dev->max_nvqs; ++i) {
 		if (dev->vqs[i]->error_ctx)
 			eventfd_ctx_put(dev->vqs[i]->error_ctx);
 		if (dev->vqs[i]->kick)
@@ -787,7 +809,7 @@ static bool memory_access_ok(struct vhost_dev *d, struct vhost_iotlb *umem,
 {
 	int i;
 
-	for (i = 0; i < d->nvqs; ++i) {
+	for (i = 0; i < d->max_nvqs; ++i) {
 		bool ok;
 		bool log;
 
@@ -999,14 +1021,14 @@ static inline int vhost_put_used_idx(struct vhost_virtqueue *vq)
 static void vhost_dev_lock_vqs(struct vhost_dev *d)
 {
 	int i = 0;
-	for (i = 0; i < d->nvqs; ++i)
+	for (i = 0; i < d->max_nvqs; ++i)
 		mutex_lock_nested(&d->vqs[i]->mutex, i);
 }
 
 static void vhost_dev_unlock_vqs(struct vhost_dev *d)
 {
 	int i = 0;
-	for (i = 0; i < d->nvqs; ++i)
+	for (i = 0; i < d->max_nvqs; ++i)
 		mutex_unlock(&d->vqs[i]->mutex);
 }
 
@@ -1462,7 +1484,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 	d->umem = newumem;
 
 	/* All memory accesses are done under some VQ mutex. */
-	for (i = 0; i < d->nvqs; ++i) {
+	for (i = 0; i < d->max_nvqs; ++i) {
 		mutex_lock(&d->vqs[i]->mutex);
 		d->vqs[i]->umem = newumem;
 		mutex_unlock(&d->vqs[i]->mutex);
@@ -1590,11 +1612,14 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
 	r = get_user(idx, idxp);
 	if (r < 0)
 		return r;
-	if (idx >= d->nvqs)
-		return -ENOBUFS;
 
-	idx = array_index_nospec(idx, d->nvqs);
+	idx = array_index_nospec(idx, d->max_nvqs);
 	vq = d->vqs[idx];
+	if (!vq->initialized) {
+		r = vhost_vq_init(d, idx);
+		if (r)
+			return r;
+	}
 
 	if (ioctl = VHOST_SET_VRING_NUM ||
 	    ioctl = VHOST_SET_VRING_ADDR) {
@@ -1724,7 +1749,7 @@ int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled)
 	oiotlb = d->iotlb;
 	d->iotlb = niotlb;
 
-	for (i = 0; i < d->nvqs; ++i) {
+	for (i = 0; i < d->max_nvqs; ++i) {
 		struct vhost_virtqueue *vq = d->vqs[i];
 
 		mutex_lock(&vq->mutex);
@@ -1771,7 +1796,7 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp)
 			r = -EFAULT;
 			break;
 		}
-		for (i = 0; i < d->nvqs; ++i) {
+		for (i = 0; i < d->max_nvqs; ++i) {
 			struct vhost_virtqueue *vq;
 			void __user *base = (void __user *)(unsigned long)p;
 			vq = d->vqs[i];
@@ -1794,7 +1819,7 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp)
 			break;
 		}
 		swap(ctx, d->log_ctx);
-		for (i = 0; i < d->nvqs; ++i) {
+		for (i = 0; i < d->max_nvqs; ++i) {
 			mutex_lock(&d->vqs[i]->mutex);
 			d->vqs[i]->log_ctx = d->log_ctx;
 			mutex_unlock(&d->vqs[i]->mutex);
@@ -2609,7 +2634,7 @@ void vhost_set_backend_features(struct vhost_dev *dev, u64 features)
 	int i;
 
 	mutex_lock(&dev->mutex);
-	for (i = 0; i < dev->nvqs; ++i) {
+	for (i = 0; i < dev->max_nvqs; ++i) {
 		vq = dev->vqs[i];
 		mutex_lock(&vq->mutex);
 		vq->acked_backend_features = features;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 9ad34b1..9677870 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -132,6 +132,8 @@ struct vhost_virtqueue {
 	bool user_be;
 #endif
 	u32 busyloop_timeout;
+
+	bool initialized;
 };
 
 struct vhost_msg_node {
@@ -148,6 +150,7 @@ struct vhost_dev {
 	struct mutex mutex;
 	struct vhost_virtqueue **vqs;
 	int nvqs;
+	int max_nvqs;
 	struct eventfd_ctx *log_ctx;
 	struct llist_head work_list;
 	struct task_struct *worker;
@@ -168,8 +171,8 @@ struct vhost_dev {
 
 bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len);
 int vhost_dev_init(struct vhost_dev *dev, struct vhost_virtqueue **vqs,
-		   int nvqs, int iov_limit, int weight, int byte_weight,
-		   bool use_worker,
+		   int nvqs, int max_nvqs, int iov_limit, int weight,
+		   int byte_weight, bool use_worker,
 		   int (*msg_handler)(struct vhost_dev *dev,
 				      struct vhost_iotlb_msg *msg));
 long vhost_dev_set_owner(struct vhost_dev *dev);
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index cae0083..bcdfd58 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -606,7 +606,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 {
 	struct vhost_virtqueue **vqs;
 	struct vhost_vsock *vsock;
-	int ret;
+	int ret, nvqs;
 
 	/* This struct is large and allocation could fail, fall back to vmalloc
 	 * if there is no other way.
@@ -615,7 +615,8 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 	if (!vsock)
 		return -ENOMEM;
 
-	vqs = kmalloc_array(ARRAY_SIZE(vsock->vqs), sizeof(*vqs), GFP_KERNEL);
+	nvqs = ARRAY_SIZE(vsock->vqs);
+	vqs = kmalloc_array(nvqs, sizeof(*vqs), GFP_KERNEL);
 	if (!vqs) {
 		ret = -ENOMEM;
 		goto out;
@@ -630,9 +631,9 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 	vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;
 	vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
 
-	ret = vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
-			     UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
-			     VHOST_VSOCK_WEIGHT, true, NULL);
+	ret = vhost_dev_init(&vsock->dev, vqs, nvqs, nvqs, UIO_MAXIOV,
+			     VHOST_VSOCK_PKT_WEIGHT, VHOST_VSOCK_WEIGHT, true,
+			     NULL);
 	if (ret)
 		goto err_dev_init;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 07/11] vhost scsi: support delayed IO vq creation
  2020-11-04 22:26 [PATCH 00/11 V4] vhost: vhost-scsi bug fixes Mike Christie
                   ` (5 preceding siblings ...)
  2020-11-04 22:26 ` [PATCH 06/11] vhost: support delayed vq creation Mike Christie
@ 2020-11-04 22:26 ` Mike Christie
  2020-11-04 22:26 ` [PATCH 08/11] vhost scsi: alloc cmds per vq instead of session Mike Christie
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Mike Christie @ 2020-11-04 22:26 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

Each vhost-scsi device will need a evt and ctl queue, but the number
of IO queues depends on whatever the user has configured in userspace.
This patch has vhost-scsi create the evt, ctl and one IO vq at device
open time. We then create the other IO vqs when userspace starts to
set them up. We still waste some mem on the vq and scsi vq structs,
but we don't waste mem on iovec related arrays and for later patches
we know which queues are used by the dev->nvqs value.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 5b3720e..24c345c 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1401,7 +1401,7 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 	mutex_lock(&vs->dev.mutex);
 
 	/* Verify that ring has been setup correctly. */
-	for (index = 0; index < vs->dev.nvqs; ++index) {
+	for (index = 0; index < vs->dev.max_nvqs; ++index) {
 		/* Verify that ring has been setup correctly. */
 		if (!vhost_vq_access_ok(&vs->vqs[index].vq)) {
 			ret = -EFAULT;
@@ -1464,6 +1464,9 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 		       sizeof(vs->vs_vhost_wwpn));
 		for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
 			vq = &vs->vqs[i].vq;
+			if (!vq->initialized)
+				continue;
+
 			mutex_lock(&vq->mutex);
 			vhost_vq_set_backend(vq, vs_tpg);
 			vhost_vq_init_access(vq);
@@ -1503,7 +1506,7 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 	mutex_lock(&vhost_scsi_mutex);
 	mutex_lock(&vs->dev.mutex);
 	/* Verify that ring has been setup correctly. */
-	for (index = 0; index < vs->dev.nvqs; ++index) {
+	for (index = 0; index < vs->dev.max_nvqs; ++index) {
 		if (!vhost_vq_access_ok(&vs->vqs[index].vq)) {
 			ret = -EFAULT;
 			goto err_dev;
@@ -1551,6 +1554,9 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 	if (match) {
 		for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
 			vq = &vs->vqs[i].vq;
+			if (!vq->initialized)
+				continue;
+
 			mutex_lock(&vq->mutex);
 			vhost_vq_set_backend(vq, NULL);
 			mutex_unlock(&vq->mutex);
@@ -1632,8 +1638,13 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 		vqs[i] = &vs->vqs[i].vq;
 		vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick;
 	}
-	r = vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, VHOST_SCSI_MAX_VQ,
-			   UIO_MAXIOV, VHOST_SCSI_WEIGHT, 0, true, NULL);
+
+	/*
+	 * We will always need the ctl, evt and at least 1 IO vq. Create more
+	 * IO vqs if userspace requests them.
+	 */
+	r = vhost_dev_init(&vs->dev, vqs, 3, VHOST_SCSI_MAX_VQ, UIO_MAXIOV,
+			   VHOST_SCSI_WEIGHT, 0, true, NULL);
 	if (r)
 		goto err_dev_init;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 08/11] vhost scsi: alloc cmds per vq instead of session
  2020-11-04 22:26 [PATCH 00/11 V4] vhost: vhost-scsi bug fixes Mike Christie
                   ` (6 preceding siblings ...)
  2020-11-04 22:26 ` [PATCH 07/11] vhost scsi: support delayed IO " Mike Christie
@ 2020-11-04 22:26 ` Mike Christie
  2020-11-04 22:26 ` [PATCH 09/11] vhost scsi: fix cmd completion race Mike Christie
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Mike Christie @ 2020-11-04 22:26 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

We currently are limited to 256 cmds per session. This leads to problems
where if the user has increased virtqueue_size to more than 2 or
cmd_per_lun to more than 256 vhost_scsi_get_tag can fail and the guest
will get IO errors.

This patch moves the cmd allocation to per vq so we can easily match
whatever the user has specified for num_queues and
virtqueue_size/cmd_per_lun. It also makes it easier to control how much
memory we preallocate. For cases, where perf is not as important and
we can use the current defaults (1 vq and 128 cmds per vq) memory use
from preallocate cmds is cut in half. For cases, where we are willing
to use more memory for higher perf, cmd mem use will now increase as
the num queues and queue depth increases.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 207 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 128 insertions(+), 79 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 24c345c..aecfa5f 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -52,7 +52,6 @@
 #define VHOST_SCSI_VERSION  "v0.1"
 #define VHOST_SCSI_NAMELEN 256
 #define VHOST_SCSI_MAX_CDB_SIZE 32
-#define VHOST_SCSI_DEFAULT_TAGS 256
 #define VHOST_SCSI_PREALLOC_SGLS 2048
 #define VHOST_SCSI_PREALLOC_UPAGES 2048
 #define VHOST_SCSI_PREALLOC_PROT_SGLS 2048
@@ -189,6 +188,9 @@ struct vhost_scsi_virtqueue {
 	 * Writers must also take dev mutex and flush under it.
 	 */
 	int inflight_idx;
+	struct vhost_scsi_cmd *scsi_cmds;
+	struct sbitmap scsi_tags;
+	int max_cmds;
 };
 
 struct vhost_scsi {
@@ -324,7 +326,9 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
 {
 	struct vhost_scsi_cmd *tv_cmd = container_of(se_cmd,
 				struct vhost_scsi_cmd, tvc_se_cmd);
-	struct se_session *se_sess = tv_cmd->tvc_nexus->tvn_se_sess;
+	struct vhost_scsi_virtqueue *svq = container_of(tv_cmd->tvc_vq,
+				struct vhost_scsi_virtqueue, vq);
+	struct vhost_scsi_inflight *inflight = tv_cmd->inflight;
 	int i;
 
 	if (tv_cmd->tvc_sgl_count) {
@@ -336,8 +340,8 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
 			put_page(sg_page(&tv_cmd->tvc_prot_sgl[i]));
 	}
 
-	vhost_scsi_put_inflight(tv_cmd->inflight);
-	target_free_tag(se_sess, se_cmd);
+	sbitmap_clear_bit(&svq->scsi_tags, se_cmd->map_tag);
+	vhost_scsi_put_inflight(inflight);
 }
 
 static u32 vhost_scsi_sess_get_index(struct se_session *se_sess)
@@ -566,31 +570,31 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
 }
 
 static struct vhost_scsi_cmd *
-vhost_scsi_get_tag(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg,
+vhost_scsi_get_cmd(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg,
 		   unsigned char *cdb, u64 scsi_tag, u16 lun, u8 task_attr,
 		   u32 exp_data_len, int data_direction)
 {
+	struct vhost_scsi_virtqueue *svq = container_of(vq,
+					struct vhost_scsi_virtqueue, vq);
 	struct vhost_scsi_cmd *cmd;
 	struct vhost_scsi_nexus *tv_nexus;
-	struct se_session *se_sess;
 	struct scatterlist *sg, *prot_sg;
 	struct page **pages;
-	int tag, cpu;
+	int tag;
 
 	tv_nexus = tpg->tpg_nexus;
 	if (!tv_nexus) {
 		pr_err("Unable to locate active struct vhost_scsi_nexus\n");
 		return ERR_PTR(-EIO);
 	}
-	se_sess = tv_nexus->tvn_se_sess;
 
-	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
+	tag = sbitmap_get(&svq->scsi_tags, 0, false);
 	if (tag < 0) {
 		pr_err("Unable to obtain tag for vhost_scsi_cmd\n");
 		return ERR_PTR(-ENOMEM);
 	}
 
-	cmd = &((struct vhost_scsi_cmd *)se_sess->sess_cmd_map)[tag];
+	cmd = &svq->scsi_cmds[tag];
 	sg = cmd->tvc_sgl;
 	prot_sg = cmd->tvc_prot_sgl;
 	pages = cmd->tvc_upages;
@@ -599,7 +603,6 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
 	cmd->tvc_prot_sgl = prot_sg;
 	cmd->tvc_upages = pages;
 	cmd->tvc_se_cmd.map_tag = tag;
-	cmd->tvc_se_cmd.map_cpu = cpu;
 	cmd->tvc_tag = scsi_tag;
 	cmd->tvc_lun = lun;
 	cmd->tvc_task_attr = task_attr;
@@ -1070,11 +1073,11 @@ static u16 vhost_buf_to_lun(u8 *lun_buf)
 				scsi_command_size(cdb), VHOST_SCSI_MAX_CDB_SIZE);
 				goto err;
 		}
-		cmd = vhost_scsi_get_tag(vq, tpg, cdb, tag, lun, task_attr,
+		cmd = vhost_scsi_get_cmd(vq, tpg, cdb, tag, lun, task_attr,
 					 exp_data_len + prot_bytes,
 					 data_direction);
 		if (IS_ERR(cmd)) {
-			vq_err(vq, "vhost_scsi_get_tag failed %ld\n",
+			vq_err(vq, "vhost_scsi_get_cmd failed %ld\n",
 			       PTR_ERR(cmd));
 			goto err;
 		}
@@ -1378,6 +1381,83 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 		wait_for_completion(&old_inflight[i]->comp);
 }
 
+static void vhost_scsi_destroy_vq_cmds(struct vhost_virtqueue *vq)
+{
+	struct vhost_scsi_virtqueue *svq = container_of(vq,
+					struct vhost_scsi_virtqueue, vq);
+	struct vhost_scsi_cmd *tv_cmd;
+	unsigned int i;
+
+	if (!svq->scsi_cmds)
+		return;
+
+	for (i = 0; i < svq->max_cmds; i++) {
+		tv_cmd = &svq->scsi_cmds[i];
+
+		kfree(tv_cmd->tvc_sgl);
+		kfree(tv_cmd->tvc_prot_sgl);
+		kfree(tv_cmd->tvc_upages);
+	}
+
+	sbitmap_free(&svq->scsi_tags);
+	kfree(svq->scsi_cmds);
+	svq->scsi_cmds = NULL;
+}
+
+static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds)
+{
+	struct vhost_scsi_virtqueue *svq = container_of(vq,
+					struct vhost_scsi_virtqueue, vq);
+	struct vhost_scsi_cmd *tv_cmd;
+	unsigned int i;
+
+	if (svq->scsi_cmds)
+		return 0;
+
+	if (sbitmap_init_node(&svq->scsi_tags, max_cmds, -1, GFP_KERNEL,
+			      NUMA_NO_NODE))
+		return -ENOMEM;
+	svq->max_cmds = max_cmds;
+
+	svq->scsi_cmds = kcalloc(max_cmds, sizeof(*tv_cmd), GFP_KERNEL);
+	if (!svq->scsi_cmds) {
+		sbitmap_free(&svq->scsi_tags);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < max_cmds; i++) {
+		tv_cmd = &svq->scsi_cmds[i];
+
+		tv_cmd->tvc_sgl = kcalloc(VHOST_SCSI_PREALLOC_SGLS,
+					  sizeof(struct scatterlist),
+					  GFP_KERNEL);
+		if (!tv_cmd->tvc_sgl) {
+			pr_err("Unable to allocate tv_cmd->tvc_sgl\n");
+			goto out;
+		}
+
+		tv_cmd->tvc_upages = kcalloc(VHOST_SCSI_PREALLOC_UPAGES,
+					     sizeof(struct page *),
+					     GFP_KERNEL);
+		if (!tv_cmd->tvc_upages) {
+			pr_err("Unable to allocate tv_cmd->tvc_upages\n");
+			goto out;
+		}
+
+		tv_cmd->tvc_prot_sgl = kcalloc(VHOST_SCSI_PREALLOC_PROT_SGLS,
+					       sizeof(struct scatterlist),
+					       GFP_KERNEL);
+		if (!tv_cmd->tvc_prot_sgl) {
+			pr_err("Unable to allocate tv_cmd->tvc_prot_sgl\n");
+			goto out;
+		}
+	}
+	return 0;
+out:
+	vhost_scsi_destroy_vq_cmds(vq);
+	return -ENOMEM;
+}
+
 /*
  * Called from vhost_scsi_ioctl() context to walk the list of available
  * vhost_scsi_tpg with an active struct vhost_scsi_nexus
@@ -1432,10 +1512,9 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 
 		if (!strcmp(tv_tport->tport_name, t->vhost_wwpn)) {
 			if (vs->vs_tpg && vs->vs_tpg[tpg->tport_tpgt]) {
-				kfree(vs_tpg);
 				mutex_unlock(&tpg->tv_tpg_mutex);
 				ret = -EEXIST;
-				goto out;
+				goto undepend;
 			}
 			/*
 			 * In order to ensure individual vhost-scsi configfs
@@ -1447,9 +1526,8 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 			ret = target_depend_item(&se_tpg->tpg_group.cg_item);
 			if (ret) {
 				pr_warn("target_depend_item() failed: %d\n", ret);
-				kfree(vs_tpg);
 				mutex_unlock(&tpg->tv_tpg_mutex);
-				goto out;
+				goto undepend;
 			}
 			tpg->tv_tpg_vhost_count++;
 			tpg->vhost_scsi = vs;
@@ -1462,6 +1540,16 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 	if (match) {
 		memcpy(vs->vs_vhost_wwpn, t->vhost_wwpn,
 		       sizeof(vs->vs_vhost_wwpn));
+
+		for (i = VHOST_SCSI_VQ_IO; i < VHOST_SCSI_MAX_VQ; i++) {
+			vq = &vs->vqs[i].vq;
+			if (!vq->initialized)
+				continue;
+
+			if (vhost_scsi_setup_vq_cmds(vq, vq->num))
+				goto destroy_vq_cmds;
+		}
+
 		for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
 			vq = &vs->vqs[i].vq;
 			if (!vq->initialized)
@@ -1484,7 +1572,22 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 	vhost_scsi_flush(vs);
 	kfree(vs->vs_tpg);
 	vs->vs_tpg = vs_tpg;
+	goto out;
 
+destroy_vq_cmds:
+	for (i--; i >= VHOST_SCSI_VQ_IO; i--) {
+		if (!vhost_vq_get_backend(&vs->vqs[i].vq))
+			vhost_scsi_destroy_vq_cmds(&vs->vqs[i].vq);
+	}
+undepend:
+	for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) {
+		tpg = vs_tpg[i];
+		if (tpg) {
+			tpg->tv_tpg_vhost_count--;
+			target_undepend_item(&tpg->se_tpg.tpg_group.cg_item);
+		}
+	}
+	kfree(vs_tpg);
 out:
 	mutex_unlock(&vs->dev.mutex);
 	mutex_unlock(&vhost_scsi_mutex);
@@ -1560,6 +1663,12 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 			mutex_lock(&vq->mutex);
 			vhost_vq_set_backend(vq, NULL);
 			mutex_unlock(&vq->mutex);
+			/*
+			 * Make sure cmds are not running before tearing them
+			 * down.
+			 */
+			vhost_scsi_flush(vs);
+			vhost_scsi_destroy_vq_cmds(vq);
 		}
 	}
 	/*
@@ -1862,23 +1971,6 @@ static void vhost_scsi_port_unlink(struct se_portal_group *se_tpg,
 	mutex_unlock(&vhost_scsi_mutex);
 }
 
-static void vhost_scsi_free_cmd_map_res(struct se_session *se_sess)
-{
-	struct vhost_scsi_cmd *tv_cmd;
-	unsigned int i;
-
-	if (!se_sess->sess_cmd_map)
-		return;
-
-	for (i = 0; i < VHOST_SCSI_DEFAULT_TAGS; i++) {
-		tv_cmd = &((struct vhost_scsi_cmd *)se_sess->sess_cmd_map)[i];
-
-		kfree(tv_cmd->tvc_sgl);
-		kfree(tv_cmd->tvc_prot_sgl);
-		kfree(tv_cmd->tvc_upages);
-	}
-}
-
 static ssize_t vhost_scsi_tpg_attrib_fabric_prot_type_store(
 		struct config_item *item, const char *page, size_t count)
 {
@@ -1918,45 +2010,6 @@ static ssize_t vhost_scsi_tpg_attrib_fabric_prot_type_show(
 	NULL,
 };
 
-static int vhost_scsi_nexus_cb(struct se_portal_group *se_tpg,
-			       struct se_session *se_sess, void *p)
-{
-	struct vhost_scsi_cmd *tv_cmd;
-	unsigned int i;
-
-	for (i = 0; i < VHOST_SCSI_DEFAULT_TAGS; i++) {
-		tv_cmd = &((struct vhost_scsi_cmd *)se_sess->sess_cmd_map)[i];
-
-		tv_cmd->tvc_sgl = kcalloc(VHOST_SCSI_PREALLOC_SGLS,
-					  sizeof(struct scatterlist),
-					  GFP_KERNEL);
-		if (!tv_cmd->tvc_sgl) {
-			pr_err("Unable to allocate tv_cmd->tvc_sgl\n");
-			goto out;
-		}
-
-		tv_cmd->tvc_upages = kcalloc(VHOST_SCSI_PREALLOC_UPAGES,
-					     sizeof(struct page *),
-					     GFP_KERNEL);
-		if (!tv_cmd->tvc_upages) {
-			pr_err("Unable to allocate tv_cmd->tvc_upages\n");
-			goto out;
-		}
-
-		tv_cmd->tvc_prot_sgl = kcalloc(VHOST_SCSI_PREALLOC_PROT_SGLS,
-					       sizeof(struct scatterlist),
-					       GFP_KERNEL);
-		if (!tv_cmd->tvc_prot_sgl) {
-			pr_err("Unable to allocate tv_cmd->tvc_prot_sgl\n");
-			goto out;
-		}
-	}
-	return 0;
-out:
-	vhost_scsi_free_cmd_map_res(se_sess);
-	return -ENOMEM;
-}
-
 static int vhost_scsi_make_nexus(struct vhost_scsi_tpg *tpg,
 				const char *name)
 {
@@ -1980,12 +2033,9 @@ static int vhost_scsi_make_nexus(struct vhost_scsi_tpg *tpg,
 	 * struct se_node_acl for the vhost_scsi struct se_portal_group with
 	 * the SCSI Initiator port name of the passed configfs group 'name'.
 	 */
-	tv_nexus->tvn_se_sess = target_setup_session(&tpg->se_tpg,
-					VHOST_SCSI_DEFAULT_TAGS,
-					sizeof(struct vhost_scsi_cmd),
+	tv_nexus->tvn_se_sess = target_setup_session(&tpg->se_tpg, 0, 0,
 					TARGET_PROT_DIN_PASS | TARGET_PROT_DOUT_PASS,
-					(unsigned char *)name, tv_nexus,
-					vhost_scsi_nexus_cb);
+					(unsigned char *)name, tv_nexus, NULL);
 	if (IS_ERR(tv_nexus->tvn_se_sess)) {
 		mutex_unlock(&tpg->tv_tpg_mutex);
 		kfree(tv_nexus);
@@ -2035,7 +2085,6 @@ static int vhost_scsi_drop_nexus(struct vhost_scsi_tpg *tpg)
 		" %s Initiator Port: %s\n", vhost_scsi_dump_proto_id(tpg->tport),
 		tv_nexus->tvn_se_sess->se_node_acl->initiatorname);
 
-	vhost_scsi_free_cmd_map_res(se_sess);
 	/*
 	 * Release the SCSI I_T Nexus to the emulated vhost Target Port
 	 */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 09/11] vhost scsi: fix cmd completion race
  2020-11-04 22:26 [PATCH 00/11 V4] vhost: vhost-scsi bug fixes Mike Christie
                   ` (7 preceding siblings ...)
  2020-11-04 22:26 ` [PATCH 08/11] vhost scsi: alloc cmds per vq instead of session Mike Christie
@ 2020-11-04 22:26 ` Mike Christie
  2020-11-04 22:26 ` [PATCH 10/11] vhost scsi: Add support for LUN resets Mike Christie
  2020-11-04 22:26 ` [PATCH 11/11] vhost scsi: remove extra flushes Mike Christie
  10 siblings, 0 replies; 19+ messages in thread
From: Mike Christie @ 2020-11-04 22:26 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

We might not do the final se_cmd put from vhost_scsi_complete_cmd_work.
When the last put happens a little later then we could race where
vhost_scsi_complete_cmd_work does vhost_signal, the guest runs and sends
more IO, and vhost_scsi_handle_vq runs but does not find any free cmds.

This patch has us delay completing the cmd until the last lio core ref
is dropped. We then know that once we signal to the guest that the cmd
is completed that if it queues a new command it will find a free cmd.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/vhost/scsi.c | 42 +++++++++++++++---------------------------
 1 file changed, 15 insertions(+), 27 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index aecfa5f..0d5cec5 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -322,7 +322,7 @@ static u32 vhost_scsi_tpg_get_inst_index(struct se_portal_group *se_tpg)
 	return 1;
 }
 
-static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
+static void vhost_scsi_release_cmd_res(struct se_cmd *se_cmd)
 {
 	struct vhost_scsi_cmd *tv_cmd = container_of(se_cmd,
 				struct vhost_scsi_cmd, tvc_se_cmd);
@@ -344,6 +344,16 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
 	vhost_scsi_put_inflight(inflight);
 }
 
+static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
+{
+	struct vhost_scsi_cmd *cmd = container_of(se_cmd,
+					struct vhost_scsi_cmd, tvc_se_cmd);
+	struct vhost_scsi *vs = cmd->tvc_vhost;
+
+	llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
+	vhost_work_queue(&vs->dev, &vs->vs_completion_work);
+}
+
 static u32 vhost_scsi_sess_get_index(struct se_session *se_sess)
 {
 	return 0;
@@ -366,28 +376,15 @@ static int vhost_scsi_get_cmd_state(struct se_cmd *se_cmd)
 	return 0;
 }
 
-static void vhost_scsi_complete_cmd(struct vhost_scsi_cmd *cmd)
-{
-	struct vhost_scsi *vs = cmd->tvc_vhost;
-
-	llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
-
-	vhost_work_queue(&vs->dev, &vs->vs_completion_work);
-}
-
 static int vhost_scsi_queue_data_in(struct se_cmd *se_cmd)
 {
-	struct vhost_scsi_cmd *cmd = container_of(se_cmd,
-				struct vhost_scsi_cmd, tvc_se_cmd);
-	vhost_scsi_complete_cmd(cmd);
+	transport_generic_free_cmd(se_cmd, 0);
 	return 0;
 }
 
 static int vhost_scsi_queue_status(struct se_cmd *se_cmd)
 {
-	struct vhost_scsi_cmd *cmd = container_of(se_cmd,
-				struct vhost_scsi_cmd, tvc_se_cmd);
-	vhost_scsi_complete_cmd(cmd);
+	transport_generic_free_cmd(se_cmd, 0);
 	return 0;
 }
 
@@ -433,15 +430,6 @@ static void vhost_scsi_free_evt(struct vhost_scsi *vs, struct vhost_scsi_evt *ev
 	return evt;
 }
 
-static void vhost_scsi_free_cmd(struct vhost_scsi_cmd *cmd)
-{
-	struct se_cmd *se_cmd = &cmd->tvc_se_cmd;
-
-	/* TODO locking against target/backend threads? */
-	transport_generic_free_cmd(se_cmd, 0);
-
-}
-
 static int vhost_scsi_check_stop_free(struct se_cmd *se_cmd)
 {
 	return target_put_sess_cmd(se_cmd);
@@ -560,7 +548,7 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
 		} else
 			pr_err("Faulted on virtio_scsi_cmd_resp\n");
 
-		vhost_scsi_free_cmd(cmd);
+		vhost_scsi_release_cmd_res(se_cmd);
 	}
 
 	vq = -1;
@@ -1096,7 +1084,7 @@ static u16 vhost_buf_to_lun(u8 *lun_buf)
 						      &prot_iter, exp_data_len,
 						      &data_iter))) {
 				vq_err(vq, "Failed to map iov to sgl\n");
-				vhost_scsi_release_cmd(&cmd->tvc_se_cmd);
+				vhost_scsi_release_cmd_res(&cmd->tvc_se_cmd);
 				goto err;
 			}
 		}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 10/11] vhost scsi: Add support for LUN resets.
  2020-11-04 22:26 [PATCH 00/11 V4] vhost: vhost-scsi bug fixes Mike Christie
                   ` (8 preceding siblings ...)
  2020-11-04 22:26 ` [PATCH 09/11] vhost scsi: fix cmd completion race Mike Christie
@ 2020-11-04 22:26 ` Mike Christie
  2020-11-04 22:26 ` [PATCH 11/11] vhost scsi: remove extra flushes Mike Christie
  10 siblings, 0 replies; 19+ messages in thread
From: Mike Christie @ 2020-11-04 22:26 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

In newer versions of virtio-scsi we just reset the timer when an a
command times out, so TMFs are never sent for the cmd time out case.
However, in older kernels and for the TMF inject cases, we can still get
resets and we end up just failing immediately so the guest might see the
device get offlined and IO errors.

For the older kernel cases, we want the same end result as the
modern virtio-scsi driver where we let the lower levels fire their error
handling and handle the problem. And at the upper levels we want to
wait. This patch ties the LUN reset handling into the LIO TMF code which
will just wait for outstanding commands to complete like we are doing in
the modern virtio-scsi case.

Note: I did not handle the ABORT case to keep this simple. For ABORTs
LIO just waits on the cmd like how it does for the RESET case. If
an ABORT fails, the guest OS ends up escalating to LUN RESET, so in
the end we get the same behavior where we wait on the outstanding
cmds.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 134 insertions(+), 13 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 0d5cec5..144cd05 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -139,6 +139,7 @@ struct vhost_scsi_tpg {
 	struct se_portal_group se_tpg;
 	/* Pointer back to vhost_scsi, protected by tv_tpg_mutex */
 	struct vhost_scsi *vhost_scsi;
+	struct list_head tmf_queue;
 };
 
 struct vhost_scsi_tport {
@@ -211,6 +212,20 @@ struct vhost_scsi {
 	int vs_events_nr; /* num of pending events, protected by vq->mutex */
 };
 
+struct vhost_scsi_tmf {
+	struct vhost_work vwork;
+	struct vhost_scsi_tpg *tpg;
+	struct vhost_scsi *vhost;
+	struct vhost_scsi_virtqueue *svq;
+	struct list_head queue_entry;
+
+	struct se_cmd se_cmd;
+	struct vhost_scsi_inflight *inflight;
+	struct iovec resp_iov;
+	int in_iovs;
+	int vq_desc;
+};
+
 /*
  * Context for processing request and control queue operations.
  */
@@ -344,14 +359,32 @@ static void vhost_scsi_release_cmd_res(struct se_cmd *se_cmd)
 	vhost_scsi_put_inflight(inflight);
 }
 
+static void vhost_scsi_release_tmf_res(struct vhost_scsi_tmf *tmf)
+{
+	struct vhost_scsi_tpg *tpg = tmf->tpg;
+	struct vhost_scsi_inflight *inflight = tmf->inflight;
+
+	mutex_lock(&tpg->tv_tpg_mutex);
+	list_add_tail(&tpg->tmf_queue, &tmf->queue_entry);
+	mutex_unlock(&tpg->tv_tpg_mutex);
+	vhost_scsi_put_inflight(inflight);
+}
+
 static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
 {
-	struct vhost_scsi_cmd *cmd = container_of(se_cmd,
+	if (se_cmd->se_cmd_flags & SCF_SCSI_TMR_CDB) {
+		struct vhost_scsi_tmf *tmf = container_of(se_cmd,
+					struct vhost_scsi_tmf, se_cmd);
+
+		vhost_work_queue(&tmf->vhost->dev, &tmf->vwork);
+	} else {
+		struct vhost_scsi_cmd *cmd = container_of(se_cmd,
 					struct vhost_scsi_cmd, tvc_se_cmd);
-	struct vhost_scsi *vs = cmd->tvc_vhost;
+		struct vhost_scsi *vs = cmd->tvc_vhost;
 
-	llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
-	vhost_work_queue(&vs->dev, &vs->vs_completion_work);
+		llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
+		vhost_work_queue(&vs->dev, &vs->vs_completion_work);
+	}
 }
 
 static u32 vhost_scsi_sess_get_index(struct se_session *se_sess)
@@ -390,7 +423,10 @@ static int vhost_scsi_queue_status(struct se_cmd *se_cmd)
 
 static void vhost_scsi_queue_tm_rsp(struct se_cmd *se_cmd)
 {
-	return;
+	struct vhost_scsi_tmf *tmf = container_of(se_cmd, struct vhost_scsi_tmf,
+						  se_cmd);
+
+	transport_generic_free_cmd(&tmf->se_cmd, 0);
 }
 
 static void vhost_scsi_aborted_task(struct se_cmd *se_cmd)
@@ -1120,9 +1156,9 @@ static u16 vhost_buf_to_lun(u8 *lun_buf)
 }
 
 static void
-vhost_scsi_send_tmf_reject(struct vhost_scsi *vs,
-			   struct vhost_virtqueue *vq,
-			   struct vhost_scsi_ctx *vc)
+vhost_scsi_send_tmf_resp(struct vhost_scsi *vs, struct vhost_virtqueue *vq,
+			 int in_iovs, int vq_desc, struct iovec *resp_iov,
+			 int tmf_resp_code)
 {
 	struct virtio_scsi_ctrl_tmf_resp rsp;
 	struct iov_iter iov_iter;
@@ -1130,17 +1166,87 @@ static u16 vhost_buf_to_lun(u8 *lun_buf)
 
 	pr_debug("%s\n", __func__);
 	memset(&rsp, 0, sizeof(rsp));
-	rsp.response = VIRTIO_SCSI_S_FUNCTION_REJECTED;
+	rsp.response = tmf_resp_code;
 
-	iov_iter_init(&iov_iter, READ, &vq->iov[vc->out], vc->in, sizeof(rsp));
+	iov_iter_init(&iov_iter, READ, resp_iov, in_iovs, sizeof(rsp));
 
 	ret = copy_to_iter(&rsp, sizeof(rsp), &iov_iter);
 	if (likely(ret = sizeof(rsp)))
-		vhost_add_used_and_signal(&vs->dev, vq, vc->head, 0);
+		vhost_add_used_and_signal(&vs->dev, vq, vq_desc, 0);
 	else
 		pr_err("Faulted on virtio_scsi_ctrl_tmf_resp\n");
 }
 
+static void vhost_scsi_tmf_resp_work(struct vhost_work *work)
+{
+	struct vhost_scsi_tmf *tmf = container_of(work, struct vhost_scsi_tmf,
+						  vwork);
+	int resp_code;
+
+	if (tmf->se_cmd.se_tmr_req->response = TMR_FUNCTION_COMPLETE)
+		resp_code = VIRTIO_SCSI_S_FUNCTION_SUCCEEDED;
+	else
+		resp_code = VIRTIO_SCSI_S_FUNCTION_REJECTED;
+
+	vhost_scsi_send_tmf_resp(tmf->vhost, &tmf->svq->vq, tmf->in_iovs,
+				 tmf->vq_desc, &tmf->resp_iov, resp_code);
+	vhost_scsi_release_tmf_res(tmf);
+}
+
+static void
+vhost_scsi_handle_tmf(struct vhost_scsi *vs, struct vhost_scsi_tpg *tpg,
+		      struct vhost_virtqueue *vq,
+		      struct virtio_scsi_ctrl_tmf_req *vtmf,
+		      struct vhost_scsi_ctx *vc)
+{
+	struct vhost_scsi_virtqueue *svq = container_of(vq,
+					struct vhost_scsi_virtqueue, vq);
+	struct vhost_scsi_tmf *tmf;
+
+	if (vhost32_to_cpu(vq, vtmf->subtype) !+	    VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET)
+		goto send_reject;
+
+	if (!tpg->tpg_nexus || !tpg->tpg_nexus->tvn_se_sess) {
+		pr_err("Unable to locate active struct vhost_scsi_nexus for LUN RESET.\n");
+		goto send_reject;
+	}
+
+	mutex_lock(&tpg->tv_tpg_mutex);
+	if (list_empty(&tpg->tmf_queue)) {
+		pr_err("Missing reserve TMF. Could not handle LUN RESET.\n");
+		mutex_unlock(&tpg->tv_tpg_mutex);
+		goto send_reject;
+	}
+
+	tmf = list_first_entry(&tpg->tmf_queue, struct vhost_scsi_tmf,
+			       queue_entry);
+	list_del_init(&tmf->queue_entry);
+	mutex_unlock(&tpg->tv_tpg_mutex);
+
+	tmf->tpg = tpg;
+	tmf->vhost = vs;
+	tmf->svq = svq;
+	tmf->resp_iov = vq->iov[vc->out];
+	tmf->vq_desc = vc->head;
+	tmf->in_iovs = vc->in;
+	tmf->inflight = vhost_scsi_get_inflight(vq);
+
+	if (target_submit_tmr(&tmf->se_cmd, tpg->tpg_nexus->tvn_se_sess, NULL,
+			      vhost_buf_to_lun(vtmf->lun), NULL,
+			      TMR_LUN_RESET, GFP_KERNEL, 0,
+			      TARGET_SCF_ACK_KREF) < 0) {
+		vhost_scsi_release_tmf_res(tmf);
+		goto send_reject;
+	}
+
+	return;
+
+send_reject:
+	vhost_scsi_send_tmf_resp(vs, vq, vc->in, vc->head, &vq->iov[vc->out],
+				 VIRTIO_SCSI_S_FUNCTION_REJECTED);
+}
+
 static void
 vhost_scsi_send_an_resp(struct vhost_scsi *vs,
 			struct vhost_virtqueue *vq,
@@ -1166,6 +1272,7 @@ static u16 vhost_buf_to_lun(u8 *lun_buf)
 static void
 vhost_scsi_ctl_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 {
+	struct vhost_scsi_tpg *tpg;
 	union {
 		__virtio32 type;
 		struct virtio_scsi_ctrl_an_req an;
@@ -1247,12 +1354,12 @@ static u16 vhost_buf_to_lun(u8 *lun_buf)
 		vc.req += typ_size;
 		vc.req_size -= typ_size;
 
-		ret = vhost_scsi_get_req(vq, &vc, NULL);
+		ret = vhost_scsi_get_req(vq, &vc, &tpg);
 		if (ret)
 			goto err;
 
 		if (v_req.type = VIRTIO_SCSI_T_TMF)
-			vhost_scsi_send_tmf_reject(vs, vq, &vc);
+			vhost_scsi_handle_tmf(vs, tpg, vq, &v_req.tmf, &vc);
 		else
 			vhost_scsi_send_an_resp(vs, vq, &vc);
 err:
@@ -1928,11 +2035,19 @@ static int vhost_scsi_port_link(struct se_portal_group *se_tpg,
 {
 	struct vhost_scsi_tpg *tpg = container_of(se_tpg,
 				struct vhost_scsi_tpg, se_tpg);
+	struct vhost_scsi_tmf *tmf;
+
+	tmf = kzalloc(sizeof(*tmf), GFP_KERNEL);
+	if (!tmf)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&tmf->queue_entry);
+	vhost_work_init(&tmf->vwork, vhost_scsi_tmf_resp_work);
 
 	mutex_lock(&vhost_scsi_mutex);
 
 	mutex_lock(&tpg->tv_tpg_mutex);
 	tpg->tv_tpg_port_count++;
+	list_add_tail(&tmf->queue_entry, &tpg->tmf_queue);
 	mutex_unlock(&tpg->tv_tpg_mutex);
 
 	vhost_scsi_hotplug(tpg, lun);
@@ -1947,11 +2062,16 @@ static void vhost_scsi_port_unlink(struct se_portal_group *se_tpg,
 {
 	struct vhost_scsi_tpg *tpg = container_of(se_tpg,
 				struct vhost_scsi_tpg, se_tpg);
+	struct vhost_scsi_tmf *tmf;
 
 	mutex_lock(&vhost_scsi_mutex);
 
 	mutex_lock(&tpg->tv_tpg_mutex);
 	tpg->tv_tpg_port_count--;
+	tmf = list_first_entry(&tpg->tmf_queue, struct vhost_scsi_tmf,
+			       queue_entry);
+	list_del(&tmf->queue_entry);
+	kfree(tmf);
 	mutex_unlock(&tpg->tv_tpg_mutex);
 
 	vhost_scsi_hotunplug(tpg, lun);
@@ -2212,6 +2332,7 @@ static ssize_t vhost_scsi_tpg_nexus_store(struct config_item *item,
 	}
 	mutex_init(&tpg->tv_tpg_mutex);
 	INIT_LIST_HEAD(&tpg->tv_tpg_list);
+	INIT_LIST_HEAD(&tpg->tmf_queue);
 	tpg->tport = tport;
 	tpg->tport_tpgt = tpgt;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 11/11] vhost scsi: remove extra flushes
  2020-11-04 22:26 [PATCH 00/11 V4] vhost: vhost-scsi bug fixes Mike Christie
                   ` (9 preceding siblings ...)
  2020-11-04 22:26 ` [PATCH 10/11] vhost scsi: Add support for LUN resets Mike Christie
@ 2020-11-04 22:26 ` Mike Christie
  10 siblings, 0 replies; 19+ messages in thread
From: Mike Christie @ 2020-11-04 22:26 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	pbonzini, stefanha, virtualization

The vhost work flush function was flushing the entire work queue, so
there is no need for the double vhost_work_dev_flush calls in
vhost_scsi_flush.

And we do not need to call vhost_poll_flush for each poller because
that call also ends up flushing the same work queue thread the
vhost_work_dev_flush call flushed.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 144cd05..774bffe 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1443,11 +1443,6 @@ static void vhost_scsi_handle_kick(struct vhost_work *work)
 	vhost_scsi_handle_vq(vs, vq);
 }
 
-static void vhost_scsi_flush_vq(struct vhost_scsi *vs, int index)
-{
-	vhost_poll_flush(&vs->vqs[index].vq.poll);
-}
-
 /* Callers must hold dev mutex */
 static void vhost_scsi_flush(struct vhost_scsi *vs)
 {
@@ -1466,9 +1461,6 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 		kref_put(&old_inflight[i]->kref, vhost_scsi_done_inflight);
 
 	/* Flush both the vhost poll and vhost work */
-	for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
-		vhost_scsi_flush_vq(vs, i);
-	vhost_work_dev_flush(&vs->dev);
 	vhost_work_dev_flush(&vs->dev);
 
 	/* Wait for all reqs issued before the flush to be finished */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 05/11] vhost: move vq iovec allocation to dev init time
  2020-11-04 22:26 ` [PATCH 05/11] vhost: move vq iovec allocation to dev init time Mike Christie
@ 2020-11-09  3:41   ` Jason Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2020-11-09  3:41 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	pbonzini, stefanha, virtualization


On 2020/11/5 上午6:26, Mike Christie wrote:
> The next patches allow us to create vqs on demand after vhost_dev_init
> and vhost_dev_set_owner have been called. For vhost-scsi we don't
> know the number of vqs we really want until the vring/vq setup
> operations have started up. For other devices we know the number of vqs
> at vhost_dev_init time, so for those devs we init the vq and allocate
> the needed iovecs. For vhost-scsi we will do it later when userspace has
> indicated to us that it's going to use a vq.
>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>   drivers/vhost/vhost.c | 71 +++++++++++++++++++++++++++------------------------
>   1 file changed, 38 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index b35229e..a4a4450 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -383,29 +383,27 @@ static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq)
>   	vq->heads = NULL;
>   }
>   
> -/* Helper to allocate iovec buffers for all vqs. */
> -static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
> +static int vhost_vq_alloc_iovecs(struct vhost_dev *dev,
> +				 struct vhost_virtqueue *vq)
>   {
> -	struct vhost_virtqueue *vq;
> -	int i;
> +	vq->indirect = kmalloc_array(UIO_MAXIOV, sizeof(*vq->indirect),
> +				     GFP_KERNEL);
> +	if (!vq->indirect)
> +		return -ENOMEM;
> +
> +	if (!dev->iov_limit)
> +		return 0;


This looks like an optimization. Let's try to defer this into another patch.


> +
> +	vq->log = kmalloc_array(dev->iov_limit, sizeof(*vq->log), GFP_KERNEL);
> +	vq->heads = kmalloc_array(dev->iov_limit, sizeof(*vq->heads),
> +				  GFP_KERNEL);
> +	if (!vq->log || !vq->heads)
> +		goto err_nomem;
>   
> -	for (i = 0; i < dev->nvqs; ++i) {
> -		vq = dev->vqs[i];
> -		vq->indirect = kmalloc_array(UIO_MAXIOV,
> -					     sizeof(*vq->indirect),
> -					     GFP_KERNEL);
> -		vq->log = kmalloc_array(dev->iov_limit, sizeof(*vq->log),
> -					GFP_KERNEL);
> -		vq->heads = kmalloc_array(dev->iov_limit, sizeof(*vq->heads),
> -					  GFP_KERNEL);
> -		if (!vq->indirect || !vq->log || !vq->heads)
> -			goto err_nomem;
> -	}
>   	return 0;
>   
>   err_nomem:
> -	for (; i >= 0; --i)
> -		vhost_vq_free_iovecs(dev->vqs[i]);
> +	vhost_vq_free_iovecs(vq);
>   	return -ENOMEM;
>   }
>   
> @@ -458,6 +456,21 @@ static size_t vhost_get_desc_size(struct vhost_virtqueue *vq,
>   	return sizeof(*vq->desc) * num;
>   }
>   
> +static int vhost_vq_init(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> +{
> +	vq->log = NULL;
> +	vq->indirect = NULL;
> +	vq->heads = NULL;
> +	vq->dev = dev;
> +	mutex_init(&vq->mutex);
> +	vhost_vq_reset(dev, vq);
> +
> +	if (vq->handle_kick)
> +		vhost_poll_init(&vq->poll, vq->handle_kick, EPOLLIN, dev);
> +
> +	return vhost_vq_alloc_iovecs(dev, vq);
> +}


If it's possible, I would do a patch to introduce vhost_vq_init() and 
then add vhost_vq_alloc_iovecs() on top.

Thanks


> +
>   int vhost_dev_init(struct vhost_dev *dev,
>   		   struct vhost_virtqueue **vqs, int nvqs,
>   		   int iov_limit, int weight, int byte_weight,
> @@ -465,7 +478,6 @@ int vhost_dev_init(struct vhost_dev *dev,
>   		   int (*msg_handler)(struct vhost_dev *dev,
>   				      struct vhost_iotlb_msg *msg))
>   {
> -	struct vhost_virtqueue *vq;
>   	int i;
>   
>   	dev->vqs = vqs;
> @@ -489,19 +501,16 @@ int vhost_dev_init(struct vhost_dev *dev,
>   
>   
>   	for (i = 0; i < dev->nvqs; ++i) {
> -		vq = dev->vqs[i];
> -		vq->log = NULL;
> -		vq->indirect = NULL;
> -		vq->heads = NULL;
> -		vq->dev = dev;
> -		mutex_init(&vq->mutex);
> -		vhost_vq_reset(dev, vq);
> -		if (vq->handle_kick)
> -			vhost_poll_init(&vq->poll, vq->handle_kick,
> -					EPOLLIN, dev);
> +		if (vhost_vq_init(dev, dev->vqs[i]))
> +			goto err_vq_init;
>   	}
>   
>   	return 0;
> +
> +err_vq_init:
> +	for (--i; i >= 0; --i)
> +		vhost_vq_free_iovecs(dev->vqs[i]);
> +	return -ENOMEM;
>   }
>   EXPORT_SYMBOL_GPL(vhost_dev_init);
>   
> @@ -606,10 +615,6 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
>   			goto err_cgroup;
>   	}
>   
> -	err = vhost_dev_alloc_iovecs(dev);
> -	if (err)
> -		goto err_cgroup;
> -
>   	return 0;
>   err_cgroup:
>   	if (dev->worker) {

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 06/11] vhost: support delayed vq creation
  2020-11-04 22:26 ` [PATCH 06/11] vhost: support delayed vq creation Mike Christie
@ 2020-11-09  4:01   ` Jason Wang
  2020-11-09 18:41     ` Mike Christie
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2020-11-09  4:01 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	pbonzini, stefanha, virtualization


On 2020/11/5 上午6:26, Mike Christie wrote:
> This allows vq creation to be done when it's first accessed by
> userspace. vhost-scsi doesn't know how many queues the user requested
> until they are first setup, and we don't want to allocate resources
> like the iovecs for 128 vqs when we are only using 1 or 2 most of the
> time. In the next pathces, vhost-scsi will also switch to preallocating
> cmds per vq instead of per lio session and we don't want to allocate
> them for 127 extra vqs if they are not in use.
>
> With this patch when a driver calls vhost_dev_init they pass in the
> number of vqs that they know they need and the max they can support.
> This patch has all the drivers pass in the same value for both the
> initial number of vqs and the max. The next patch will convert scsi.
> The other drivers like net/vsock have their vqs hard coded in the
> kernel or setup/discovered via other methods like with vdpa.
>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>


This looks like a delayed vq metadata creation instead of vq itself.

Several questions:

- can we simply introduce new ioctls to set the #max_vqs for SCSI? Or 
simply introduce VRING_ENABLE ioctl for SCSI and let qemu to call that 
ioctl for 1.0 device, qemu can simply forward the ENABLE command to 
vhost-scsi, for 0.9x device qemu can simply check whether ring.addr is 
zero or not?
- if not, it's better to convert all the devices/queues to behave as if 
SCSI to have a consistent way allocate metadata
- do we need to care about vq free, e.g free vqs during reset?

Thanks


> ---
>   drivers/vhost/net.c   |  2 +-
>   drivers/vhost/scsi.c  |  4 +--
>   drivers/vhost/test.c  |  5 ++--
>   drivers/vhost/vdpa.c  |  2 +-
>   drivers/vhost/vhost.c | 71 ++++++++++++++++++++++++++++++++++-----------------
>   drivers/vhost/vhost.h |  7 +++--
>   drivers/vhost/vsock.c | 11 ++++----
>   7 files changed, 66 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index fd30b53..fce46f0 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -1316,7 +1316,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
>   		n->vqs[i].rx_ring = NULL;
>   		vhost_net_buf_init(&n->vqs[i].rxq);
>   	}
> -	if (vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
> +	if (vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX, VHOST_NET_VQ_MAX,
>   			   UIO_MAXIOV + VHOST_NET_BATCH,
>   			   VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT, true,
>   			   NULL))
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index ed8baf9..5b3720e 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -1632,8 +1632,8 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
>   		vqs[i] = &vs->vqs[i].vq;
>   		vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick;
>   	}
> -	r = vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, UIO_MAXIOV,
> -			   VHOST_SCSI_WEIGHT, 0, true, NULL);
> +	r = vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, VHOST_SCSI_MAX_VQ,
> +			   UIO_MAXIOV, VHOST_SCSI_WEIGHT, 0, true, NULL);
>   	if (r)
>   		goto err_dev_init;
>   
> diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
> index c255ae5..9d2bfa3 100644
> --- a/drivers/vhost/test.c
> +++ b/drivers/vhost/test.c
> @@ -119,8 +119,9 @@ static int vhost_test_open(struct inode *inode, struct file *f)
>   	dev = &n->dev;
>   	vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ];
>   	n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
> -	if (vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV,
> -			   VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT, true, NULL)
> +	if (vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, VHOST_TEST_VQ_MAX,
> +			   UIO_MAXIOV, VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT,
> +			   true, NULL)
>   		goto err_dev_init;
>   
>   	f->private_data = n;
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index c1615a7..6abd9d8 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -829,7 +829,7 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep)
>   		vqs[i] = &v->vqs[i];
>   		vqs[i]->handle_kick = handle_vq_kick;
>   	}
> -	r = vhost_dev_init(dev, vqs, nvqs, 0, 0, 0, false,
> +	r = vhost_dev_init(dev, vqs, nvqs, nvqs, 0, 0, 0, false,
>   			   vhost_vdpa_process_iotlb_msg);
>   	if (r)
>   		goto err_dev_init;
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index a4a4450..2ca2e71 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -294,7 +294,7 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
>   {
>   	int i;
>   
> -	for (i = 0; i < d->nvqs; ++i)
> +	for (i = 0; i < d->max_nvqs; ++i)
>   		__vhost_vq_meta_reset(d->vqs[i]);
>   }
>   
> @@ -331,6 +331,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>   	vq->busyloop_timeout = 0;
>   	vq->umem = NULL;
>   	vq->iotlb = NULL;
> +	vq->initialized = false;
>   	vhost_vring_call_reset(&vq->call_ctx);
>   	__vhost_vq_meta_reset(vq);
>   }
> @@ -411,7 +412,7 @@ static void vhost_dev_free_iovecs(struct vhost_dev *dev)
>   {
>   	int i;
>   
> -	for (i = 0; i < dev->nvqs; ++i)
> +	for (i = 0; i < dev->max_nvqs; ++i)
>   		vhost_vq_free_iovecs(dev->vqs[i]);
>   }
>   
> @@ -456,7 +457,7 @@ static size_t vhost_get_desc_size(struct vhost_virtqueue *vq,
>   	return sizeof(*vq->desc) * num;
>   }
>   
> -static int vhost_vq_init(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> +static void __vhost_vq_init(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>   {
>   	vq->log = NULL;
>   	vq->indirect = NULL;
> @@ -467,12 +468,29 @@ static int vhost_vq_init(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>   
>   	if (vq->handle_kick)
>   		vhost_poll_init(&vq->poll, vq->handle_kick, EPOLLIN, dev);
> +}
> +
> +static int vhost_vq_init(struct vhost_dev *dev, int vq_idx)
> +{
> +	struct vhost_virtqueue *vq;
> +	int ret;
> +
> +	if (vq_idx >= dev->max_nvqs)
> +		return -ENOBUFS;
> +
> +	vq = dev->vqs[vq_idx];
> +	__vhost_vq_init(dev, vq);
> +	ret = vhost_vq_alloc_iovecs(dev, vq);
> +	if (ret)
> +		return ret;
>   
> -	return vhost_vq_alloc_iovecs(dev, vq);
> +	vq->initialized = true;
> +	dev->nvqs++;
> +	return 0;
>   }
>   
>   int vhost_dev_init(struct vhost_dev *dev,
> -		   struct vhost_virtqueue **vqs, int nvqs,
> +		   struct vhost_virtqueue **vqs, int nvqs, int max_nvqs,
>   		   int iov_limit, int weight, int byte_weight,
>   		   bool use_worker,
>   		   int (*msg_handler)(struct vhost_dev *dev,
> @@ -481,7 +499,8 @@ int vhost_dev_init(struct vhost_dev *dev,
>   	int i;
>   
>   	dev->vqs = vqs;
> -	dev->nvqs = nvqs;
> +	dev->nvqs = 0;
> +	dev->max_nvqs = max_nvqs;
>   	mutex_init(&dev->mutex);
>   	dev->log_ctx = NULL;
>   	dev->umem = NULL;
> @@ -499,12 +518,15 @@ int vhost_dev_init(struct vhost_dev *dev,
>   	INIT_LIST_HEAD(&dev->pending_list);
>   	spin_lock_init(&dev->iotlb_lock);
>   
> -
> -	for (i = 0; i < dev->nvqs; ++i) {
> -		if (vhost_vq_init(dev, dev->vqs[i]))
> +	for (i = 0; i < nvqs; ++i) {
> +		if (vhost_vq_init(dev, i))
>   			goto err_vq_init;
>   	}
>   
> +	for (; i < dev->max_nvqs; ++i)
> +		/* Just prep/clear the fields and set initialized�lse */
> +		__vhost_vq_init(dev, dev->vqs[i]);
> +
>   	return 0;
>   
>   err_vq_init:
> @@ -652,7 +674,7 @@ void vhost_dev_reset_owner(struct vhost_dev *dev, struct vhost_iotlb *umem)
>   	/* We don't need VQ locks below since vhost_dev_cleanup makes sure
>   	 * VQs aren't running.
>   	 */
> -	for (i = 0; i < dev->nvqs; ++i)
> +	for (i = 0; i < dev->max_nvqs; ++i)
>   		dev->vqs[i]->umem = umem;
>   }
>   EXPORT_SYMBOL_GPL(vhost_dev_reset_owner);
> @@ -661,7 +683,7 @@ void vhost_dev_stop(struct vhost_dev *dev)
>   {
>   	int i;
>   
> -	for (i = 0; i < dev->nvqs; ++i) {
> +	for (i = 0; i < dev->max_nvqs; ++i) {
>   		if (dev->vqs[i]->kick && dev->vqs[i]->handle_kick) {
>   			vhost_poll_stop(&dev->vqs[i]->poll);
>   			vhost_poll_flush(&dev->vqs[i]->poll);
> @@ -693,7 +715,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>   {
>   	int i;
>   
> -	for (i = 0; i < dev->nvqs; ++i) {
> +	for (i = 0; i < dev->max_nvqs; ++i) {
>   		if (dev->vqs[i]->error_ctx)
>   			eventfd_ctx_put(dev->vqs[i]->error_ctx);
>   		if (dev->vqs[i]->kick)
> @@ -787,7 +809,7 @@ static bool memory_access_ok(struct vhost_dev *d, struct vhost_iotlb *umem,
>   {
>   	int i;
>   
> -	for (i = 0; i < d->nvqs; ++i) {
> +	for (i = 0; i < d->max_nvqs; ++i) {
>   		bool ok;
>   		bool log;
>   
> @@ -999,14 +1021,14 @@ static inline int vhost_put_used_idx(struct vhost_virtqueue *vq)
>   static void vhost_dev_lock_vqs(struct vhost_dev *d)
>   {
>   	int i = 0;
> -	for (i = 0; i < d->nvqs; ++i)
> +	for (i = 0; i < d->max_nvqs; ++i)
>   		mutex_lock_nested(&d->vqs[i]->mutex, i);
>   }
>   
>   static void vhost_dev_unlock_vqs(struct vhost_dev *d)
>   {
>   	int i = 0;
> -	for (i = 0; i < d->nvqs; ++i)
> +	for (i = 0; i < d->max_nvqs; ++i)
>   		mutex_unlock(&d->vqs[i]->mutex);
>   }
>   
> @@ -1462,7 +1484,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
>   	d->umem = newumem;
>   
>   	/* All memory accesses are done under some VQ mutex. */
> -	for (i = 0; i < d->nvqs; ++i) {
> +	for (i = 0; i < d->max_nvqs; ++i) {
>   		mutex_lock(&d->vqs[i]->mutex);
>   		d->vqs[i]->umem = newumem;
>   		mutex_unlock(&d->vqs[i]->mutex);
> @@ -1590,11 +1612,14 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
>   	r = get_user(idx, idxp);
>   	if (r < 0)
>   		return r;
> -	if (idx >= d->nvqs)
> -		return -ENOBUFS;
>   
> -	idx = array_index_nospec(idx, d->nvqs);
> +	idx = array_index_nospec(idx, d->max_nvqs);
>   	vq = d->vqs[idx];
> +	if (!vq->initialized) {
> +		r = vhost_vq_init(d, idx);
> +		if (r)
> +			return r;
> +	}
>   
>   	if (ioctl = VHOST_SET_VRING_NUM ||
>   	    ioctl = VHOST_SET_VRING_ADDR) {
> @@ -1724,7 +1749,7 @@ int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled)
>   	oiotlb = d->iotlb;
>   	d->iotlb = niotlb;
>   
> -	for (i = 0; i < d->nvqs; ++i) {
> +	for (i = 0; i < d->max_nvqs; ++i) {
>   		struct vhost_virtqueue *vq = d->vqs[i];
>   
>   		mutex_lock(&vq->mutex);
> @@ -1771,7 +1796,7 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp)
>   			r = -EFAULT;
>   			break;
>   		}
> -		for (i = 0; i < d->nvqs; ++i) {
> +		for (i = 0; i < d->max_nvqs; ++i) {
>   			struct vhost_virtqueue *vq;
>   			void __user *base = (void __user *)(unsigned long)p;
>   			vq = d->vqs[i];
> @@ -1794,7 +1819,7 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp)
>   			break;
>   		}
>   		swap(ctx, d->log_ctx);
> -		for (i = 0; i < d->nvqs; ++i) {
> +		for (i = 0; i < d->max_nvqs; ++i) {
>   			mutex_lock(&d->vqs[i]->mutex);
>   			d->vqs[i]->log_ctx = d->log_ctx;
>   			mutex_unlock(&d->vqs[i]->mutex);
> @@ -2609,7 +2634,7 @@ void vhost_set_backend_features(struct vhost_dev *dev, u64 features)
>   	int i;
>   
>   	mutex_lock(&dev->mutex);
> -	for (i = 0; i < dev->nvqs; ++i) {
> +	for (i = 0; i < dev->max_nvqs; ++i) {
>   		vq = dev->vqs[i];
>   		mutex_lock(&vq->mutex);
>   		vq->acked_backend_features = features;
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 9ad34b1..9677870 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -132,6 +132,8 @@ struct vhost_virtqueue {
>   	bool user_be;
>   #endif
>   	u32 busyloop_timeout;
> +
> +	bool initialized;
>   };
>   
>   struct vhost_msg_node {
> @@ -148,6 +150,7 @@ struct vhost_dev {
>   	struct mutex mutex;
>   	struct vhost_virtqueue **vqs;
>   	int nvqs;
> +	int max_nvqs;
>   	struct eventfd_ctx *log_ctx;
>   	struct llist_head work_list;
>   	struct task_struct *worker;
> @@ -168,8 +171,8 @@ struct vhost_dev {
>   
>   bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len);
>   int vhost_dev_init(struct vhost_dev *dev, struct vhost_virtqueue **vqs,
> -		   int nvqs, int iov_limit, int weight, int byte_weight,
> -		   bool use_worker,
> +		   int nvqs, int max_nvqs, int iov_limit, int weight,
> +		   int byte_weight, bool use_worker,
>   		   int (*msg_handler)(struct vhost_dev *dev,
>   				      struct vhost_iotlb_msg *msg));
>   long vhost_dev_set_owner(struct vhost_dev *dev);
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index cae0083..bcdfd58 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -606,7 +606,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
>   {
>   	struct vhost_virtqueue **vqs;
>   	struct vhost_vsock *vsock;
> -	int ret;
> +	int ret, nvqs;
>   
>   	/* This struct is large and allocation could fail, fall back to vmalloc
>   	 * if there is no other way.
> @@ -615,7 +615,8 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
>   	if (!vsock)
>   		return -ENOMEM;
>   
> -	vqs = kmalloc_array(ARRAY_SIZE(vsock->vqs), sizeof(*vqs), GFP_KERNEL);
> +	nvqs = ARRAY_SIZE(vsock->vqs);
> +	vqs = kmalloc_array(nvqs, sizeof(*vqs), GFP_KERNEL);
>   	if (!vqs) {
>   		ret = -ENOMEM;
>   		goto out;
> @@ -630,9 +631,9 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
>   	vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;
>   	vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
>   
> -	ret = vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
> -			     UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
> -			     VHOST_VSOCK_WEIGHT, true, NULL);
> +	ret = vhost_dev_init(&vsock->dev, vqs, nvqs, nvqs, UIO_MAXIOV,
> +			     VHOST_VSOCK_PKT_WEIGHT, VHOST_VSOCK_WEIGHT, true,
> +			     NULL);
>   	if (ret)
>   		goto err_dev_init;
>   

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 06/11] vhost: support delayed vq creation
  2020-11-09  4:01   ` Jason Wang
@ 2020-11-09 18:41     ` Mike Christie
  2020-11-09 20:30       ` Mike Christie
  2020-11-10  2:44       ` Jason Wang
  0 siblings, 2 replies; 19+ messages in thread
From: Mike Christie @ 2020-11-09 18:41 UTC (permalink / raw)
  To: Jason Wang, martin.petersen, linux-scsi, target-devel, mst,
	pbonzini, stefanha, virtualization

On 11/8/20 10:01 PM, Jason Wang wrote:
> 
> On 2020/11/5 上午6:26, Mike Christie wrote:
>> This allows vq creation to be done when it's first accessed by
>> userspace. vhost-scsi doesn't know how many queues the user requested
>> until they are first setup, and we don't want to allocate resources
>> like the iovecs for 128 vqs when we are only using 1 or 2 most of the
>> time. In the next pathces, vhost-scsi will also switch to preallocating
>> cmds per vq instead of per lio session and we don't want to allocate
>> them for 127 extra vqs if they are not in use.
>>
>> With this patch when a driver calls vhost_dev_init they pass in the
>> number of vqs that they know they need and the max they can support.
>> This patch has all the drivers pass in the same value for both the
>> initial number of vqs and the max. The next patch will convert scsi.
>> The other drivers like net/vsock have their vqs hard coded in the
>> kernel or setup/discovered via other methods like with vdpa.
>>
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> 
> 
> This looks like a delayed vq metadata creation instead of vq itself. >
> Several questions:
> 
> - can we simply introduce new ioctls to set the #max_vqs for SCSI? Or 

We can do this code wise, but I was not sure about the spec and 
VHOST_SET_OWNER's requirements. For the latter, the first ioctl cmd 
quemu does is VHOST_SET_OWNER. The vhost_dev_ioctl then requires that 
the caller be the owner to be able to do any other ioctls. The 
individual drivers like scsi, net, etc allow userspace to do ioctls 
without being the owner, and I was not sure if that is by accident or 
design.

> simply introduce VRING_ENABLE ioctl for SCSI and let qemu to call that 
> ioctl for 1.0 device, qemu can simply forward the ENABLE command to 

If it's allowed to do ioctl cmds that set vhost driver settings before 
VHOST_SET_OWNER then we can do the VRING_ENABLE calls first. For that 
though, I had one question about the spec. In section "4.1.4.3.2 Driver 
Requirements: Common configuration structure layout" it had the line:

-----
The driver MUST configure the other virtqueue fields before enabling the 
virtqueue with queue_enable.
-----

Does that apply to our case, or only when it's being used in a 
virtio_pci_common_cfg struct based command? If it applies to our case, 
then we have to change userspace to set config settings like the 
queue_size and features earlier, right?

I was thinking if we had to deal with that then maybe it's better to add 
a new ioctl like VRING_CREATE. It would be done first, singles that the 
kernel should do the vq creation (allocation, init steps we do in 
vhost_dev_init, etc).


> vhost-scsi, for 0.9x device qemu can simply check whether ring.addr is 
> zero or not
I had a question about that and my thread per vq patches.

If you want to pair this patchset down to it's bare bug fixes that 
prevent users from getting IO errors, I would do ring.addr check only 
approach for this bug fix patchset.

For my other patchset to do a vhost worker thread per vq, I think it 
would be cleanest to have a new ioctl or a userspace change. The problem 
is that we create the worker thread in vhost_dev_set_owner and at that 
time we do not know how many vqs there are going to be. For that we could:
1. Add a new VHOST_SET_OWNER_V2 that has an arg which contains the 
number of queues or worker threads
2. do the VRING_ENABLE approach if allowed spec/VHOST_SET_OWNER wise.
3. add a new ioctl that is done before VHOST_SET_OWNER

If we split it up like this, current users get the IO error bug fixes. 
This patchset gets a lot smaller. Users would then only have to upgrade 
userspace to get the multple thread feature.

Compat setup support might be cleaner too. If we did the 
VHOST_SET_OWNER_V2 or added a new ioctl then vhost_dev_set_owner just 
takes the new arg and loops num_queues/workers times to create more threads.


> - if not, it's better to convert all the devices/queues to behave as if 
> SCSI to have a consistent way allocate metadata
> - do we need to care about vq free, e.g free vqs during reset?
> 
> Thanks
> 
> 
>> ---
>>   drivers/vhost/net.c   |  2 +-
>>   drivers/vhost/scsi.c  |  4 +--
>>   drivers/vhost/test.c  |  5 ++--
>>   drivers/vhost/vdpa.c  |  2 +-
>>   drivers/vhost/vhost.c | 71 
>> ++++++++++++++++++++++++++++++++++-----------------
>>   drivers/vhost/vhost.h |  7 +++--
>>   drivers/vhost/vsock.c | 11 ++++----
>>   7 files changed, 66 insertions(+), 36 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index fd30b53..fce46f0 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -1316,7 +1316,7 @@ static int vhost_net_open(struct inode *inode, 
>> struct file *f)
>>           n->vqs[i].rx_ring = NULL;
>>           vhost_net_buf_init(&n->vqs[i].rxq);
>>       }
>> -    if (vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
>> +    if (vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX, VHOST_NET_VQ_MAX,
>>                  UIO_MAXIOV + VHOST_NET_BATCH,
>>                  VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT, true,
>>                  NULL))
>> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
>> index ed8baf9..5b3720e 100644
>> --- a/drivers/vhost/scsi.c
>> +++ b/drivers/vhost/scsi.c
>> @@ -1632,8 +1632,8 @@ static int vhost_scsi_open(struct inode *inode, 
>> struct file *f)
>>           vqs[i] = &vs->vqs[i].vq;
>>           vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick;
>>       }
>> -    r = vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, UIO_MAXIOV,
>> -               VHOST_SCSI_WEIGHT, 0, true, NULL);
>> +    r = vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, 
>> VHOST_SCSI_MAX_VQ,
>> +               UIO_MAXIOV, VHOST_SCSI_WEIGHT, 0, true, NULL);
>>       if (r)
>>           goto err_dev_init;
>> diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
>> index c255ae5..9d2bfa3 100644
>> --- a/drivers/vhost/test.c
>> +++ b/drivers/vhost/test.c
>> @@ -119,8 +119,9 @@ static int vhost_test_open(struct inode *inode, 
>> struct file *f)
>>       dev = &n->dev;
>>       vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ];
>>       n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
>> -    if (vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV,
>> -               VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT, true, NULL)
>> +    if (vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, VHOST_TEST_VQ_MAX,
>> +               UIO_MAXIOV, VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT,
>> +               true, NULL)
>>           goto err_dev_init;
>>       f->private_data = n;
>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>> index c1615a7..6abd9d8 100644
>> --- a/drivers/vhost/vdpa.c
>> +++ b/drivers/vhost/vdpa.c
>> @@ -829,7 +829,7 @@ static int vhost_vdpa_open(struct inode *inode, 
>> struct file *filep)
>>           vqs[i] = &v->vqs[i];
>>           vqs[i]->handle_kick = handle_vq_kick;
>>       }
>> -    r = vhost_dev_init(dev, vqs, nvqs, 0, 0, 0, false,
>> +    r = vhost_dev_init(dev, vqs, nvqs, nvqs, 0, 0, 0, false,
>>                  vhost_vdpa_process_iotlb_msg);
>>       if (r)
>>           goto err_dev_init;
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index a4a4450..2ca2e71 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -294,7 +294,7 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
>>   {
>>       int i;
>> -    for (i = 0; i < d->nvqs; ++i)
>> +    for (i = 0; i < d->max_nvqs; ++i)
>>           __vhost_vq_meta_reset(d->vqs[i]);
>>   }
>> @@ -331,6 +331,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>>       vq->busyloop_timeout = 0;
>>       vq->umem = NULL;
>>       vq->iotlb = NULL;
>> +    vq->initialized = false;
>>       vhost_vring_call_reset(&vq->call_ctx);
>>       __vhost_vq_meta_reset(vq);
>>   }
>> @@ -411,7 +412,7 @@ static void vhost_dev_free_iovecs(struct vhost_dev 
>> *dev)
>>   {
>>       int i;
>> -    for (i = 0; i < dev->nvqs; ++i)
>> +    for (i = 0; i < dev->max_nvqs; ++i)
>>           vhost_vq_free_iovecs(dev->vqs[i]);
>>   }
>> @@ -456,7 +457,7 @@ static size_t vhost_get_desc_size(struct 
>> vhost_virtqueue *vq,
>>       return sizeof(*vq->desc) * num;
>>   }
>> -static int vhost_vq_init(struct vhost_dev *dev, struct 
>> vhost_virtqueue *vq)
>> +static void __vhost_vq_init(struct vhost_dev *dev, struct 
>> vhost_virtqueue *vq)
>>   {
>>       vq->log = NULL;
>>       vq->indirect = NULL;
>> @@ -467,12 +468,29 @@ static int vhost_vq_init(struct vhost_dev *dev, 
>> struct vhost_virtqueue *vq)
>>       if (vq->handle_kick)
>>           vhost_poll_init(&vq->poll, vq->handle_kick, EPOLLIN, dev);
>> +}
>> +
>> +static int vhost_vq_init(struct vhost_dev *dev, int vq_idx)
>> +{
>> +    struct vhost_virtqueue *vq;
>> +    int ret;
>> +
>> +    if (vq_idx >= dev->max_nvqs)
>> +        return -ENOBUFS;
>> +
>> +    vq = dev->vqs[vq_idx];
>> +    __vhost_vq_init(dev, vq);
>> +    ret = vhost_vq_alloc_iovecs(dev, vq);
>> +    if (ret)
>> +        return ret;
>> -    return vhost_vq_alloc_iovecs(dev, vq);
>> +    vq->initialized = true;
>> +    dev->nvqs++;
>> +    return 0;
>>   }
>>   int vhost_dev_init(struct vhost_dev *dev,
>> -           struct vhost_virtqueue **vqs, int nvqs,
>> +           struct vhost_virtqueue **vqs, int nvqs, int max_nvqs,
>>              int iov_limit, int weight, int byte_weight,
>>              bool use_worker,
>>              int (*msg_handler)(struct vhost_dev *dev,
>> @@ -481,7 +499,8 @@ int vhost_dev_init(struct vhost_dev *dev,
>>       int i;
>>       dev->vqs = vqs;
>> -    dev->nvqs = nvqs;
>> +    dev->nvqs = 0;
>> +    dev->max_nvqs = max_nvqs;
>>       mutex_init(&dev->mutex);
>>       dev->log_ctx = NULL;
>>       dev->umem = NULL;
>> @@ -499,12 +518,15 @@ int vhost_dev_init(struct vhost_dev *dev,
>>       INIT_LIST_HEAD(&dev->pending_list);
>>       spin_lock_init(&dev->iotlb_lock);
>> -
>> -    for (i = 0; i < dev->nvqs; ++i) {
>> -        if (vhost_vq_init(dev, dev->vqs[i]))
>> +    for (i = 0; i < nvqs; ++i) {
>> +        if (vhost_vq_init(dev, i))
>>               goto err_vq_init;
>>       }
>> +    for (; i < dev->max_nvqs; ++i)
>> +        /* Just prep/clear the fields and set initialized�lse */
>> +        __vhost_vq_init(dev, dev->vqs[i]);
>> +
>>       return 0;
>>   err_vq_init:
>> @@ -652,7 +674,7 @@ void vhost_dev_reset_owner(struct vhost_dev *dev, 
>> struct vhost_iotlb *umem)
>>       /* We don't need VQ locks below since vhost_dev_cleanup makes sure
>>        * VQs aren't running.
>>        */
>> -    for (i = 0; i < dev->nvqs; ++i)
>> +    for (i = 0; i < dev->max_nvqs; ++i)
>>           dev->vqs[i]->umem = umem;
>>   }
>>   EXPORT_SYMBOL_GPL(vhost_dev_reset_owner);
>> @@ -661,7 +683,7 @@ void vhost_dev_stop(struct vhost_dev *dev)
>>   {
>>       int i;
>> -    for (i = 0; i < dev->nvqs; ++i) {
>> +    for (i = 0; i < dev->max_nvqs; ++i) {
>>           if (dev->vqs[i]->kick && dev->vqs[i]->handle_kick) {
>>               vhost_poll_stop(&dev->vqs[i]->poll);
>>               vhost_poll_flush(&dev->vqs[i]->poll);
>> @@ -693,7 +715,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>>   {
>>       int i;
>> -    for (i = 0; i < dev->nvqs; ++i) {
>> +    for (i = 0; i < dev->max_nvqs; ++i) {
>>           if (dev->vqs[i]->error_ctx)
>>               eventfd_ctx_put(dev->vqs[i]->error_ctx);
>>           if (dev->vqs[i]->kick)
>> @@ -787,7 +809,7 @@ static bool memory_access_ok(struct vhost_dev *d, 
>> struct vhost_iotlb *umem,
>>   {
>>       int i;
>> -    for (i = 0; i < d->nvqs; ++i) {
>> +    for (i = 0; i < d->max_nvqs; ++i) {
>>           bool ok;
>>           bool log;
>> @@ -999,14 +1021,14 @@ static inline int vhost_put_used_idx(struct 
>> vhost_virtqueue *vq)
>>   static void vhost_dev_lock_vqs(struct vhost_dev *d)
>>   {
>>       int i = 0;
>> -    for (i = 0; i < d->nvqs; ++i)
>> +    for (i = 0; i < d->max_nvqs; ++i)
>>           mutex_lock_nested(&d->vqs[i]->mutex, i);
>>   }
>>   static void vhost_dev_unlock_vqs(struct vhost_dev *d)
>>   {
>>       int i = 0;
>> -    for (i = 0; i < d->nvqs; ++i)
>> +    for (i = 0; i < d->max_nvqs; ++i)
>>           mutex_unlock(&d->vqs[i]->mutex);
>>   }
>> @@ -1462,7 +1484,7 @@ static long vhost_set_memory(struct vhost_dev 
>> *d, struct vhost_memory __user *m)
>>       d->umem = newumem;
>>       /* All memory accesses are done under some VQ mutex. */
>> -    for (i = 0; i < d->nvqs; ++i) {
>> +    for (i = 0; i < d->max_nvqs; ++i) {
>>           mutex_lock(&d->vqs[i]->mutex);
>>           d->vqs[i]->umem = newumem;
>>           mutex_unlock(&d->vqs[i]->mutex);
>> @@ -1590,11 +1612,14 @@ long vhost_vring_ioctl(struct vhost_dev *d, 
>> unsigned int ioctl, void __user *arg
>>       r = get_user(idx, idxp);
>>       if (r < 0)
>>           return r;
>> -    if (idx >= d->nvqs)
>> -        return -ENOBUFS;
>> -    idx = array_index_nospec(idx, d->nvqs);
>> +    idx = array_index_nospec(idx, d->max_nvqs);
>>       vq = d->vqs[idx];
>> +    if (!vq->initialized) {
>> +        r = vhost_vq_init(d, idx);
>> +        if (r)
>> +            return r;
>> +    }
>>       if (ioctl = VHOST_SET_VRING_NUM ||
>>           ioctl = VHOST_SET_VRING_ADDR) {
>> @@ -1724,7 +1749,7 @@ int vhost_init_device_iotlb(struct vhost_dev *d, 
>> bool enabled)
>>       oiotlb = d->iotlb;
>>       d->iotlb = niotlb;
>> -    for (i = 0; i < d->nvqs; ++i) {
>> +    for (i = 0; i < d->max_nvqs; ++i) {
>>           struct vhost_virtqueue *vq = d->vqs[i];
>>           mutex_lock(&vq->mutex);
>> @@ -1771,7 +1796,7 @@ long vhost_dev_ioctl(struct vhost_dev *d, 
>> unsigned int ioctl, void __user *argp)
>>               r = -EFAULT;
>>               break;
>>           }
>> -        for (i = 0; i < d->nvqs; ++i) {
>> +        for (i = 0; i < d->max_nvqs; ++i) {
>>               struct vhost_virtqueue *vq;
>>               void __user *base = (void __user *)(unsigned long)p;
>>               vq = d->vqs[i];
>> @@ -1794,7 +1819,7 @@ long vhost_dev_ioctl(struct vhost_dev *d, 
>> unsigned int ioctl, void __user *argp)
>>               break;
>>           }
>>           swap(ctx, d->log_ctx);
>> -        for (i = 0; i < d->nvqs; ++i) {
>> +        for (i = 0; i < d->max_nvqs; ++i) {
>>               mutex_lock(&d->vqs[i]->mutex);
>>               d->vqs[i]->log_ctx = d->log_ctx;
>>               mutex_unlock(&d->vqs[i]->mutex);
>> @@ -2609,7 +2634,7 @@ void vhost_set_backend_features(struct vhost_dev 
>> *dev, u64 features)
>>       int i;
>>       mutex_lock(&dev->mutex);
>> -    for (i = 0; i < dev->nvqs; ++i) {
>> +    for (i = 0; i < dev->max_nvqs; ++i) {
>>           vq = dev->vqs[i];
>>           mutex_lock(&vq->mutex);
>>           vq->acked_backend_features = features;
>> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
>> index 9ad34b1..9677870 100644
>> --- a/drivers/vhost/vhost.h
>> +++ b/drivers/vhost/vhost.h
>> @@ -132,6 +132,8 @@ struct vhost_virtqueue {
>>       bool user_be;
>>   #endif
>>       u32 busyloop_timeout;
>> +
>> +    bool initialized;
>>   };
>>   struct vhost_msg_node {
>> @@ -148,6 +150,7 @@ struct vhost_dev {
>>       struct mutex mutex;
>>       struct vhost_virtqueue **vqs;
>>       int nvqs;
>> +    int max_nvqs;
>>       struct eventfd_ctx *log_ctx;
>>       struct llist_head work_list;
>>       struct task_struct *worker;
>> @@ -168,8 +171,8 @@ struct vhost_dev {
>>   bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int 
>> total_len);
>>   int vhost_dev_init(struct vhost_dev *dev, struct vhost_virtqueue **vqs,
>> -           int nvqs, int iov_limit, int weight, int byte_weight,
>> -           bool use_worker,
>> +           int nvqs, int max_nvqs, int iov_limit, int weight,
>> +           int byte_weight, bool use_worker,
>>              int (*msg_handler)(struct vhost_dev *dev,
>>                         struct vhost_iotlb_msg *msg));
>>   long vhost_dev_set_owner(struct vhost_dev *dev);
>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> index cae0083..bcdfd58 100644
>> --- a/drivers/vhost/vsock.c
>> +++ b/drivers/vhost/vsock.c
>> @@ -606,7 +606,7 @@ static int vhost_vsock_dev_open(struct inode 
>> *inode, struct file *file)
>>   {
>>       struct vhost_virtqueue **vqs;
>>       struct vhost_vsock *vsock;
>> -    int ret;
>> +    int ret, nvqs;
>>       /* This struct is large and allocation could fail, fall back to 
>> vmalloc
>>        * if there is no other way.
>> @@ -615,7 +615,8 @@ static int vhost_vsock_dev_open(struct inode 
>> *inode, struct file *file)
>>       if (!vsock)
>>           return -ENOMEM;
>> -    vqs = kmalloc_array(ARRAY_SIZE(vsock->vqs), sizeof(*vqs), 
>> GFP_KERNEL);
>> +    nvqs = ARRAY_SIZE(vsock->vqs);
>> +    vqs = kmalloc_array(nvqs, sizeof(*vqs), GFP_KERNEL);
>>       if (!vqs) {
>>           ret = -ENOMEM;
>>           goto out;
>> @@ -630,9 +631,9 @@ static int vhost_vsock_dev_open(struct inode 
>> *inode, struct file *file)
>>       vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;
>>       vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
>> -    ret = vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
>> -                 UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
>> -                 VHOST_VSOCK_WEIGHT, true, NULL);
>> +    ret = vhost_dev_init(&vsock->dev, vqs, nvqs, nvqs, UIO_MAXIOV,
>> +                 VHOST_VSOCK_PKT_WEIGHT, VHOST_VSOCK_WEIGHT, true,
>> +                 NULL);
>>       if (ret)
>>           goto err_dev_init;
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 06/11] vhost: support delayed vq creation
  2020-11-09 18:41     ` Mike Christie
@ 2020-11-09 20:30       ` Mike Christie
  2020-11-09 22:32         ` Michael S. Tsirkin
  2020-11-10  2:50         ` Jason Wang
  2020-11-10  2:44       ` Jason Wang
  1 sibling, 2 replies; 19+ messages in thread
From: Mike Christie @ 2020-11-09 20:30 UTC (permalink / raw)
  To: Jason Wang, martin.petersen, linux-scsi, target-devel, mst,
	pbonzini, stefanha, virtualization

On 11/9/20 12:41 PM, Mike Christie wrote:
> 
> If you want to pair this patchset down to it's bare bug fixes that 
> prevent users from getting IO errors, I would do ring.addr check only 
> approach for this bug fix patchset.

Oh yeah, just so we don't have to burn an extra day, above I'm proposing 
I repost the original patchset:

https://lore.kernel.org/linux-scsi/1600712588-9514-1-git-send-email-michael.christie@oracle.com/t/

for the bug fix only patches. It will have the compile error fixed and 
Bart's comment handled.

To even trim it down more I can also drop the last 2 patches:

0007-vhost-remove-work-arg-from-vhost_work_flush.patch
0008-vhost-scsi-remove-extra-flushes.patch

and send separately in a cleanups patchset since the extra flushes it 
kills don't really hurt.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 06/11] vhost: support delayed vq creation
  2020-11-09 20:30       ` Mike Christie
@ 2020-11-09 22:32         ` Michael S. Tsirkin
  2020-11-10  2:50         ` Jason Wang
  1 sibling, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2020-11-09 22:32 UTC (permalink / raw)
  To: Mike Christie
  Cc: Jason Wang, martin.petersen, linux-scsi, target-devel, pbonzini,
	stefanha, virtualization

On Mon, Nov 09, 2020 at 02:30:36PM -0600, Mike Christie wrote:
> On 11/9/20 12:41 PM, Mike Christie wrote:
> > 
> > If you want to pair this patchset down to it's bare bug fixes that
> > prevent users from getting IO errors, I would do ring.addr check only
> > approach for this bug fix patchset.
> 
> Oh yeah, just so we don't have to burn an extra day, above I'm proposing I
> repost the original patchset:
> 
> https://lore.kernel.org/linux-scsi/1600712588-9514-1-git-send-email-michael.christie@oracle.com/t/
> 
> for the bug fix only patches. It will have the compile error fixed and
> Bart's comment handled.
> 
> To even trim it down more I can also drop the last 2 patches:
> 
> 0007-vhost-remove-work-arg-from-vhost_work_flush.patch
> 0008-vhost-scsi-remove-extra-flushes.patch
> 
> and send separately in a cleanups patchset since the extra flushes it kills
> don't really hurt.

Makes sense to me.

-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 06/11] vhost: support delayed vq creation
  2020-11-09 18:41     ` Mike Christie
  2020-11-09 20:30       ` Mike Christie
@ 2020-11-10  2:44       ` Jason Wang
  1 sibling, 0 replies; 19+ messages in thread
From: Jason Wang @ 2020-11-10  2:44 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	pbonzini, stefanha, virtualization


On 2020/11/10 上午2:41, Mike Christie wrote:
> On 11/8/20 10:01 PM, Jason Wang wrote:
>>
>> On 2020/11/5 上午6:26, Mike Christie wrote:
>>> This allows vq creation to be done when it's first accessed by
>>> userspace. vhost-scsi doesn't know how many queues the user requested
>>> until they are first setup, and we don't want to allocate resources
>>> like the iovecs for 128 vqs when we are only using 1 or 2 most of the
>>> time. In the next pathces, vhost-scsi will also switch to preallocating
>>> cmds per vq instead of per lio session and we don't want to allocate
>>> them for 127 extra vqs if they are not in use.
>>>
>>> With this patch when a driver calls vhost_dev_init they pass in the
>>> number of vqs that they know they need and the max they can support.
>>> This patch has all the drivers pass in the same value for both the
>>> initial number of vqs and the max. The next patch will convert scsi.
>>> The other drivers like net/vsock have their vqs hard coded in the
>>> kernel or setup/discovered via other methods like with vdpa.
>>>
>>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>>
>>
>> This looks like a delayed vq metadata creation instead of vq itself. >
>> Several questions:
>>
>> - can we simply introduce new ioctls to set the #max_vqs for SCSI? Or 
>
> We can do this code wise, but I was not sure about the spec and 
> VHOST_SET_OWNER's requirements. For the latter, the first ioctl cmd 
> quemu does is VHOST_SET_OWNER. The vhost_dev_ioctl then requires that 
> the caller be the owner to be able to do any other ioctls. The 
> individual drivers like scsi, net, etc allow userspace to do ioctls 
> without being the owner, and I was not sure if that is by accident or 
> design.


My understanding is that owner is required for memory mapping 
operations. For setting #max_vqs, net will talk with the backend 
(TUN/TAP) instead of vhost. For SCSI, it looks OK to do it without an 
owner (my understanding is that vhost-scsi doesn't need to tell #max_vqs 
to the under-layer).


>
>> simply introduce VRING_ENABLE ioctl for SCSI and let qemu to call 
>> that ioctl for 1.0 device, qemu can simply forward the ENABLE command to 
>
> If it's allowed to do ioctl cmds that set vhost driver settings before 
> VHOST_SET_OWNER then we can do the VRING_ENABLE calls first. For that 
> though, I had one question about the spec. In section "4.1.4.3.2 
> Driver Requirements: Common configuration structure layout" it had the 
> line:
>
> -----
> The driver MUST configure the other virtqueue fields before enabling 
> the virtqueue with queue_enable.
> -----
>
> Does that apply to our case, or only when it's being used in a 
> virtio_pci_common_cfg struct based command? 


The spec is for virtio(pci) only, for vhost, it could choose to follow 
the spec (which should be better) or not.


> If it applies to our case, then we have to change userspace to set 
> config settings like the queue_size and features earlier, right?


Yes, or do VRING_ENABLE after queue_size and features are set.


>
> I was thinking if we had to deal with that then maybe it's better to 
> add a new ioctl like VRING_CREATE. It would be done first, singles 
> that the kernel should do the vq creation (allocation, init steps we 
> do in vhost_dev_init, etc).
>

Technically, that's possible, but it looks like a new version of vhost 
ioctl protocol which requires new features to be advertised to userspace 
to choose between the two. I'm not sure it's worth to do that.


>
>> vhost-scsi, for 0.9x device qemu can simply check whether ring.addr 
>> is zero or not
> I had a question about that and my thread per vq patches.
>
> If you want to pair this patchset down to it's bare bug fixes that 
> prevent users from getting IO errors, I would do ring.addr check only 
> approach for this bug fix patchset.


Yes, I think it would be helpful if we can limit the changes to bug fixes.


>
> For my other patchset to do a vhost worker thread per vq, I think it 
> would be cleanest to have a new ioctl or a userspace change. The 
> problem is that we create the worker thread in vhost_dev_set_owner and 
> at that time we do not know how many vqs there are going to be. For 
> that we could:
> 1. Add a new VHOST_SET_OWNER_V2 that has an arg which contains the 
> number of queues or worker threads
> 2. do the VRING_ENABLE approach if allowed spec/VHOST_SET_OWNER wise.
> 3. add a new ioctl that is done before VHOST_SET_OWNER


Can we just create one worker during SET_OWNER and for the rest during 
VRING_ENABLE?


>
> If we split it up like this, current users get the IO error bug fixes. 
> This patchset gets a lot smaller. Users would then only have to 
> upgrade userspace to get the multple thread feature.


Yes.


>
> Compat setup support might be cleaner too. If we did the 
> VHOST_SET_OWNER_V2 or added a new ioctl then vhost_dev_set_owner just 
> takes the new arg and loops num_queues/workers times to create more 
> threads.
>
>

Thanks


>> - if not, it's better to convert all the devices/queues to behave as 
>> if SCSI to have a consistent way allocate metadata
>> - do we need to care about vq free, e.g free vqs during reset?
>>
>> Thanks
>>
>>
>>> ---
>>>   drivers/vhost/net.c   |  2 +-
>>>   drivers/vhost/scsi.c  |  4 +--
>>>   drivers/vhost/test.c  |  5 ++--
>>>   drivers/vhost/vdpa.c  |  2 +-
>>>   drivers/vhost/vhost.c | 71 
>>> ++++++++++++++++++++++++++++++++++-----------------
>>>   drivers/vhost/vhost.h |  7 +++--
>>>   drivers/vhost/vsock.c | 11 ++++----
>>>   7 files changed, 66 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>> index fd30b53..fce46f0 100644
>>> --- a/drivers/vhost/net.c
>>> +++ b/drivers/vhost/net.c
>>> @@ -1316,7 +1316,7 @@ static int vhost_net_open(struct inode *inode, 
>>> struct file *f)
>>>           n->vqs[i].rx_ring = NULL;
>>>           vhost_net_buf_init(&n->vqs[i].rxq);
>>>       }
>>> -    if (vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
>>> +    if (vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX, VHOST_NET_VQ_MAX,
>>>                  UIO_MAXIOV + VHOST_NET_BATCH,
>>>                  VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT, true,
>>>                  NULL))
>>> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
>>> index ed8baf9..5b3720e 100644
>>> --- a/drivers/vhost/scsi.c
>>> +++ b/drivers/vhost/scsi.c
>>> @@ -1632,8 +1632,8 @@ static int vhost_scsi_open(struct inode 
>>> *inode, struct file *f)
>>>           vqs[i] = &vs->vqs[i].vq;
>>>           vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick;
>>>       }
>>> -    r = vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, UIO_MAXIOV,
>>> -               VHOST_SCSI_WEIGHT, 0, true, NULL);
>>> +    r = vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, 
>>> VHOST_SCSI_MAX_VQ,
>>> +               UIO_MAXIOV, VHOST_SCSI_WEIGHT, 0, true, NULL);
>>>       if (r)
>>>           goto err_dev_init;
>>> diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
>>> index c255ae5..9d2bfa3 100644
>>> --- a/drivers/vhost/test.c
>>> +++ b/drivers/vhost/test.c
>>> @@ -119,8 +119,9 @@ static int vhost_test_open(struct inode *inode, 
>>> struct file *f)
>>>       dev = &n->dev;
>>>       vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ];
>>>       n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
>>> -    if (vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV,
>>> -               VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT, true, NULL)
>>> +    if (vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, VHOST_TEST_VQ_MAX,
>>> +               UIO_MAXIOV, VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT,
>>> +               true, NULL)
>>>           goto err_dev_init;
>>>       f->private_data = n;
>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>>> index c1615a7..6abd9d8 100644
>>> --- a/drivers/vhost/vdpa.c
>>> +++ b/drivers/vhost/vdpa.c
>>> @@ -829,7 +829,7 @@ static int vhost_vdpa_open(struct inode *inode, 
>>> struct file *filep)
>>>           vqs[i] = &v->vqs[i];
>>>           vqs[i]->handle_kick = handle_vq_kick;
>>>       }
>>> -    r = vhost_dev_init(dev, vqs, nvqs, 0, 0, 0, false,
>>> +    r = vhost_dev_init(dev, vqs, nvqs, nvqs, 0, 0, 0, false,
>>>                  vhost_vdpa_process_iotlb_msg);
>>>       if (r)
>>>           goto err_dev_init;
>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>> index a4a4450..2ca2e71 100644
>>> --- a/drivers/vhost/vhost.c
>>> +++ b/drivers/vhost/vhost.c
>>> @@ -294,7 +294,7 @@ static void vhost_vq_meta_reset(struct vhost_dev 
>>> *d)
>>>   {
>>>       int i;
>>> -    for (i = 0; i < d->nvqs; ++i)
>>> +    for (i = 0; i < d->max_nvqs; ++i)
>>>           __vhost_vq_meta_reset(d->vqs[i]);
>>>   }
>>> @@ -331,6 +331,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>>>       vq->busyloop_timeout = 0;
>>>       vq->umem = NULL;
>>>       vq->iotlb = NULL;
>>> +    vq->initialized = false;
>>>       vhost_vring_call_reset(&vq->call_ctx);
>>>       __vhost_vq_meta_reset(vq);
>>>   }
>>> @@ -411,7 +412,7 @@ static void vhost_dev_free_iovecs(struct 
>>> vhost_dev *dev)
>>>   {
>>>       int i;
>>> -    for (i = 0; i < dev->nvqs; ++i)
>>> +    for (i = 0; i < dev->max_nvqs; ++i)
>>>           vhost_vq_free_iovecs(dev->vqs[i]);
>>>   }
>>> @@ -456,7 +457,7 @@ static size_t vhost_get_desc_size(struct 
>>> vhost_virtqueue *vq,
>>>       return sizeof(*vq->desc) * num;
>>>   }
>>> -static int vhost_vq_init(struct vhost_dev *dev, struct 
>>> vhost_virtqueue *vq)
>>> +static void __vhost_vq_init(struct vhost_dev *dev, struct 
>>> vhost_virtqueue *vq)
>>>   {
>>>       vq->log = NULL;
>>>       vq->indirect = NULL;
>>> @@ -467,12 +468,29 @@ static int vhost_vq_init(struct vhost_dev 
>>> *dev, struct vhost_virtqueue *vq)
>>>       if (vq->handle_kick)
>>>           vhost_poll_init(&vq->poll, vq->handle_kick, EPOLLIN, dev);
>>> +}
>>> +
>>> +static int vhost_vq_init(struct vhost_dev *dev, int vq_idx)
>>> +{
>>> +    struct vhost_virtqueue *vq;
>>> +    int ret;
>>> +
>>> +    if (vq_idx >= dev->max_nvqs)
>>> +        return -ENOBUFS;
>>> +
>>> +    vq = dev->vqs[vq_idx];
>>> +    __vhost_vq_init(dev, vq);
>>> +    ret = vhost_vq_alloc_iovecs(dev, vq);
>>> +    if (ret)
>>> +        return ret;
>>> -    return vhost_vq_alloc_iovecs(dev, vq);
>>> +    vq->initialized = true;
>>> +    dev->nvqs++;
>>> +    return 0;
>>>   }
>>>   int vhost_dev_init(struct vhost_dev *dev,
>>> -           struct vhost_virtqueue **vqs, int nvqs,
>>> +           struct vhost_virtqueue **vqs, int nvqs, int max_nvqs,
>>>              int iov_limit, int weight, int byte_weight,
>>>              bool use_worker,
>>>              int (*msg_handler)(struct vhost_dev *dev,
>>> @@ -481,7 +499,8 @@ int vhost_dev_init(struct vhost_dev *dev,
>>>       int i;
>>>       dev->vqs = vqs;
>>> -    dev->nvqs = nvqs;
>>> +    dev->nvqs = 0;
>>> +    dev->max_nvqs = max_nvqs;
>>>       mutex_init(&dev->mutex);
>>>       dev->log_ctx = NULL;
>>>       dev->umem = NULL;
>>> @@ -499,12 +518,15 @@ int vhost_dev_init(struct vhost_dev *dev,
>>>       INIT_LIST_HEAD(&dev->pending_list);
>>>       spin_lock_init(&dev->iotlb_lock);
>>> -
>>> -    for (i = 0; i < dev->nvqs; ++i) {
>>> -        if (vhost_vq_init(dev, dev->vqs[i]))
>>> +    for (i = 0; i < nvqs; ++i) {
>>> +        if (vhost_vq_init(dev, i))
>>>               goto err_vq_init;
>>>       }
>>> +    for (; i < dev->max_nvqs; ++i)
>>> +        /* Just prep/clear the fields and set initialized�lse */
>>> +        __vhost_vq_init(dev, dev->vqs[i]);
>>> +
>>>       return 0;
>>>   err_vq_init:
>>> @@ -652,7 +674,7 @@ void vhost_dev_reset_owner(struct vhost_dev 
>>> *dev, struct vhost_iotlb *umem)
>>>       /* We don't need VQ locks below since vhost_dev_cleanup makes 
>>> sure
>>>        * VQs aren't running.
>>>        */
>>> -    for (i = 0; i < dev->nvqs; ++i)
>>> +    for (i = 0; i < dev->max_nvqs; ++i)
>>>           dev->vqs[i]->umem = umem;
>>>   }
>>>   EXPORT_SYMBOL_GPL(vhost_dev_reset_owner);
>>> @@ -661,7 +683,7 @@ void vhost_dev_stop(struct vhost_dev *dev)
>>>   {
>>>       int i;
>>> -    for (i = 0; i < dev->nvqs; ++i) {
>>> +    for (i = 0; i < dev->max_nvqs; ++i) {
>>>           if (dev->vqs[i]->kick && dev->vqs[i]->handle_kick) {
>>>               vhost_poll_stop(&dev->vqs[i]->poll);
>>>               vhost_poll_flush(&dev->vqs[i]->poll);
>>> @@ -693,7 +715,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>>>   {
>>>       int i;
>>> -    for (i = 0; i < dev->nvqs; ++i) {
>>> +    for (i = 0; i < dev->max_nvqs; ++i) {
>>>           if (dev->vqs[i]->error_ctx)
>>>               eventfd_ctx_put(dev->vqs[i]->error_ctx);
>>>           if (dev->vqs[i]->kick)
>>> @@ -787,7 +809,7 @@ static bool memory_access_ok(struct vhost_dev 
>>> *d, struct vhost_iotlb *umem,
>>>   {
>>>       int i;
>>> -    for (i = 0; i < d->nvqs; ++i) {
>>> +    for (i = 0; i < d->max_nvqs; ++i) {
>>>           bool ok;
>>>           bool log;
>>> @@ -999,14 +1021,14 @@ static inline int vhost_put_used_idx(struct 
>>> vhost_virtqueue *vq)
>>>   static void vhost_dev_lock_vqs(struct vhost_dev *d)
>>>   {
>>>       int i = 0;
>>> -    for (i = 0; i < d->nvqs; ++i)
>>> +    for (i = 0; i < d->max_nvqs; ++i)
>>>           mutex_lock_nested(&d->vqs[i]->mutex, i);
>>>   }
>>>   static void vhost_dev_unlock_vqs(struct vhost_dev *d)
>>>   {
>>>       int i = 0;
>>> -    for (i = 0; i < d->nvqs; ++i)
>>> +    for (i = 0; i < d->max_nvqs; ++i)
>>>           mutex_unlock(&d->vqs[i]->mutex);
>>>   }
>>> @@ -1462,7 +1484,7 @@ static long vhost_set_memory(struct vhost_dev 
>>> *d, struct vhost_memory __user *m)
>>>       d->umem = newumem;
>>>       /* All memory accesses are done under some VQ mutex. */
>>> -    for (i = 0; i < d->nvqs; ++i) {
>>> +    for (i = 0; i < d->max_nvqs; ++i) {
>>>           mutex_lock(&d->vqs[i]->mutex);
>>>           d->vqs[i]->umem = newumem;
>>>           mutex_unlock(&d->vqs[i]->mutex);
>>> @@ -1590,11 +1612,14 @@ long vhost_vring_ioctl(struct vhost_dev *d, 
>>> unsigned int ioctl, void __user *arg
>>>       r = get_user(idx, idxp);
>>>       if (r < 0)
>>>           return r;
>>> -    if (idx >= d->nvqs)
>>> -        return -ENOBUFS;
>>> -    idx = array_index_nospec(idx, d->nvqs);
>>> +    idx = array_index_nospec(idx, d->max_nvqs);
>>>       vq = d->vqs[idx];
>>> +    if (!vq->initialized) {
>>> +        r = vhost_vq_init(d, idx);
>>> +        if (r)
>>> +            return r;
>>> +    }
>>>       if (ioctl = VHOST_SET_VRING_NUM ||
>>>           ioctl = VHOST_SET_VRING_ADDR) {
>>> @@ -1724,7 +1749,7 @@ int vhost_init_device_iotlb(struct vhost_dev 
>>> *d, bool enabled)
>>>       oiotlb = d->iotlb;
>>>       d->iotlb = niotlb;
>>> -    for (i = 0; i < d->nvqs; ++i) {
>>> +    for (i = 0; i < d->max_nvqs; ++i) {
>>>           struct vhost_virtqueue *vq = d->vqs[i];
>>>           mutex_lock(&vq->mutex);
>>> @@ -1771,7 +1796,7 @@ long vhost_dev_ioctl(struct vhost_dev *d, 
>>> unsigned int ioctl, void __user *argp)
>>>               r = -EFAULT;
>>>               break;
>>>           }
>>> -        for (i = 0; i < d->nvqs; ++i) {
>>> +        for (i = 0; i < d->max_nvqs; ++i) {
>>>               struct vhost_virtqueue *vq;
>>>               void __user *base = (void __user *)(unsigned long)p;
>>>               vq = d->vqs[i];
>>> @@ -1794,7 +1819,7 @@ long vhost_dev_ioctl(struct vhost_dev *d, 
>>> unsigned int ioctl, void __user *argp)
>>>               break;
>>>           }
>>>           swap(ctx, d->log_ctx);
>>> -        for (i = 0; i < d->nvqs; ++i) {
>>> +        for (i = 0; i < d->max_nvqs; ++i) {
>>>               mutex_lock(&d->vqs[i]->mutex);
>>>               d->vqs[i]->log_ctx = d->log_ctx;
>>>               mutex_unlock(&d->vqs[i]->mutex);
>>> @@ -2609,7 +2634,7 @@ void vhost_set_backend_features(struct 
>>> vhost_dev *dev, u64 features)
>>>       int i;
>>>       mutex_lock(&dev->mutex);
>>> -    for (i = 0; i < dev->nvqs; ++i) {
>>> +    for (i = 0; i < dev->max_nvqs; ++i) {
>>>           vq = dev->vqs[i];
>>>           mutex_lock(&vq->mutex);
>>>           vq->acked_backend_features = features;
>>> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
>>> index 9ad34b1..9677870 100644
>>> --- a/drivers/vhost/vhost.h
>>> +++ b/drivers/vhost/vhost.h
>>> @@ -132,6 +132,8 @@ struct vhost_virtqueue {
>>>       bool user_be;
>>>   #endif
>>>       u32 busyloop_timeout;
>>> +
>>> +    bool initialized;
>>>   };
>>>   struct vhost_msg_node {
>>> @@ -148,6 +150,7 @@ struct vhost_dev {
>>>       struct mutex mutex;
>>>       struct vhost_virtqueue **vqs;
>>>       int nvqs;
>>> +    int max_nvqs;
>>>       struct eventfd_ctx *log_ctx;
>>>       struct llist_head work_list;
>>>       struct task_struct *worker;
>>> @@ -168,8 +171,8 @@ struct vhost_dev {
>>>   bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, 
>>> int total_len);
>>>   int vhost_dev_init(struct vhost_dev *dev, struct vhost_virtqueue 
>>> **vqs,
>>> -           int nvqs, int iov_limit, int weight, int byte_weight,
>>> -           bool use_worker,
>>> +           int nvqs, int max_nvqs, int iov_limit, int weight,
>>> +           int byte_weight, bool use_worker,
>>>              int (*msg_handler)(struct vhost_dev *dev,
>>>                         struct vhost_iotlb_msg *msg));
>>>   long vhost_dev_set_owner(struct vhost_dev *dev);
>>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>>> index cae0083..bcdfd58 100644
>>> --- a/drivers/vhost/vsock.c
>>> +++ b/drivers/vhost/vsock.c
>>> @@ -606,7 +606,7 @@ static int vhost_vsock_dev_open(struct inode 
>>> *inode, struct file *file)
>>>   {
>>>       struct vhost_virtqueue **vqs;
>>>       struct vhost_vsock *vsock;
>>> -    int ret;
>>> +    int ret, nvqs;
>>>       /* This struct is large and allocation could fail, fall back 
>>> to vmalloc
>>>        * if there is no other way.
>>> @@ -615,7 +615,8 @@ static int vhost_vsock_dev_open(struct inode 
>>> *inode, struct file *file)
>>>       if (!vsock)
>>>           return -ENOMEM;
>>> -    vqs = kmalloc_array(ARRAY_SIZE(vsock->vqs), sizeof(*vqs), 
>>> GFP_KERNEL);
>>> +    nvqs = ARRAY_SIZE(vsock->vqs);
>>> +    vqs = kmalloc_array(nvqs, sizeof(*vqs), GFP_KERNEL);
>>>       if (!vqs) {
>>>           ret = -ENOMEM;
>>>           goto out;
>>> @@ -630,9 +631,9 @@ static int vhost_vsock_dev_open(struct inode 
>>> *inode, struct file *file)
>>>       vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;
>>>       vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
>>> -    ret = vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
>>> -                 UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
>>> -                 VHOST_VSOCK_WEIGHT, true, NULL);
>>> +    ret = vhost_dev_init(&vsock->dev, vqs, nvqs, nvqs, UIO_MAXIOV,
>>> +                 VHOST_VSOCK_PKT_WEIGHT, VHOST_VSOCK_WEIGHT, true,
>>> +                 NULL);
>>>       if (ret)
>>>           goto err_dev_init;
>>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 06/11] vhost: support delayed vq creation
  2020-11-09 20:30       ` Mike Christie
  2020-11-09 22:32         ` Michael S. Tsirkin
@ 2020-11-10  2:50         ` Jason Wang
  1 sibling, 0 replies; 19+ messages in thread
From: Jason Wang @ 2020-11-10  2:50 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	pbonzini, stefanha, virtualization


On 2020/11/10 上午4:30, Mike Christie wrote:
> On 11/9/20 12:41 PM, Mike Christie wrote:
>>
>> If you want to pair this patchset down to it's bare bug fixes that 
>> prevent users from getting IO errors, I would do ring.addr check only 
>> approach for this bug fix patchset.
>
> Oh yeah, just so we don't have to burn an extra day, above I'm 
> proposing I repost the original patchset:
>
> https://lore.kernel.org/linux-scsi/1600712588-9514-1-git-send-email-michael.christie@oracle.com/t/ 
>
>
> for the bug fix only patches. It will have the compile error fixed and 
> Bart's comment handled.
>
> To even trim it down more I can also drop the last 2 patches:
>
> 0007-vhost-remove-work-arg-from-vhost_work_flush.patch
> 0008-vhost-scsi-remove-extra-flushes.patch
>
> and send separately in a cleanups patchset since the extra flushes it 
> kills don't really hurt.


Fine with me.

Thanks

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-11-10  2:50 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-04 22:26 [PATCH 00/11 V4] vhost: vhost-scsi bug fixes Mike Christie
2020-11-04 22:26 ` [PATCH 01/11] vhost scsi: add lun parser helper Mike Christie
2020-11-04 22:26 ` [PATCH 02/11] vhost: remove work arg from vhost_work_flush Mike Christie
2020-11-04 22:26 ` [PATCH 03/11] vhost net: use goto error handling in open Mike Christie
2020-11-04 22:26 ` [PATCH 04/11] vhost: prep vhost_dev_init users to handle failures Mike Christie
2020-11-04 22:26 ` [PATCH 05/11] vhost: move vq iovec allocation to dev init time Mike Christie
2020-11-09  3:41   ` Jason Wang
2020-11-04 22:26 ` [PATCH 06/11] vhost: support delayed vq creation Mike Christie
2020-11-09  4:01   ` Jason Wang
2020-11-09 18:41     ` Mike Christie
2020-11-09 20:30       ` Mike Christie
2020-11-09 22:32         ` Michael S. Tsirkin
2020-11-10  2:50         ` Jason Wang
2020-11-10  2:44       ` Jason Wang
2020-11-04 22:26 ` [PATCH 07/11] vhost scsi: support delayed IO " Mike Christie
2020-11-04 22:26 ` [PATCH 08/11] vhost scsi: alloc cmds per vq instead of session Mike Christie
2020-11-04 22:26 ` [PATCH 09/11] vhost scsi: fix cmd completion race Mike Christie
2020-11-04 22:26 ` [PATCH 10/11] vhost scsi: Add support for LUN resets Mike Christie
2020-11-04 22:26 ` [PATCH 11/11] vhost scsi: remove extra flushes Mike Christie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).