* [PATCH V3 00/11] vhost: multiple worker support
@ 2021-10-22  5:18 ` Mike Christie
  0 siblings, 0 replies; 74+ messages in thread
From: Mike Christie @ 2021-10-22  5:18 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, pbonzini, jasowang, mst,
	sgarzare, virtualization

The following patches apply over Linus's tree and this patchset:

https://lore.kernel.org/all/20211007214448.6282-1-michael.christie@oracle.com/

which allows us to check the vhost owner thread's RLIMITs.

It looks like that patchset has been OK'd by all the major parties
and just needs a small cleanup to apply to Jens's and Paul's trees, so I
wanted to post my threading patches, which are based on it, for review.

The following patches allow us to support multiple vhost workers per
device. I ended up just doing Stefan's original idea, where userspace has
the kernel create a worker and we pass back the pid. Compared to the
workqueue and userspace-thread approaches, this has the benefit that there
is roughly only one code path in the kernel: only setup needs to detect old
tools, and the main IO paths and device/vq setup/teardown paths all use
common code.
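
If it helps to see the intended flow, here is a minimal userspace sketch
against the uapi proposed in this series (struct vhost_vring_worker,
VHOST_VRING_NEW_WORKER and VHOST_SET_VRING_WORKER come from the header
updates below; the function name and the vhost_fd/nvqs handling are just
made up for illustration):

#include <sys/ioctl.h>
#include <linux/vhost.h>	/* assumes headers updated with this series */

/* Bind a freshly created worker to every vq. Per the proposed uapi this
 * must be done after VHOST_SET_OWNER and before the vqs are activated.
 */
static int bind_worker_per_vq(int vhost_fd, unsigned int nvqs)
{
	struct vhost_vring_worker w;
	unsigned int i;

	for (i = 0; i < nvqs; i++) {
		w.index = i;
		/* Ask the kernel to create a new worker for this vq. */
		w.pid = VHOST_VRING_NEW_WORKER;
		if (ioctl(vhost_fd, VHOST_SET_VRING_WORKER, &w) < 0)
			return -1;
		/* The new worker's pid is returned in w.pid. Passing that
		 * pid back in a later call would instead bind another vq
		 * to the same worker.
		 */
	}
	return 0;
}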

I've also included a patch for QEMU so you can get an idea of how it
works. If we are OK with the kernel code, then I'll break that up into
a patchset and send it to qemu-devel for review.

Results:
--------

fio jobs        1       2       4       8       12      16
----------------------------------------------------------
1 worker        84k    492k    510k    -       -       -
worker per vq   184k   380k    744k    1422k   2256k   2434k

Notes:
0. This used a simple fio command:

fio --filename=/dev/sdb  --direct=1 --rw=randrw --bs=4k \
--ioengine=libaio --iodepth=128  --numjobs=$JOBS_ABOVE

and I used a VM with 16 vCPUs and 16 virtqueues.

1. The patches were tested with emulate_pr=0 and these patches:

https://lore.kernel.org/all/yq1tuhge4bg.fsf@ca-mkp.ca.oracle.com/t/

which are in mkp's scsi branches for the next kernel. They fix the perf
issues where IOPS dropped at 12 vqs/jobs.

2. Because we have a hard limit of 1024 cmds, if num jobs * iodepth was
greater than 1024, I decreased the iodepth. So 12 jobs used an iodepth of
85, and 16 jobs used 64.

3. The perf issue above at 2 jobs (where the single worker setup is faster)
is because when we only have 1 worker we execute more cmds per vhost_work,
since all vqs funnel to one worker. This results in fewer context switches
and better batching without having to tweak any settings. I'm working on
patches to add back batching during LIO completion and to do polling on the
submission side.

We will still want the threading patches, because if we batch at the fio
level and also use the vhost threading patches, we see a big boost like the
one below. So hopefully doing the batching in the kernel will allow apps to
just work without having to be smart like fio.

fio using io_uring and batching with the iodepth_batch* settings:

fio jobs        1       2       4       8       12      16
-------------------------------------------------------------
1 worker        494k    520k    -       -       -       -
worker per vq   496k    878k    1542k   2436k   2304k   2590k

V3:
- Fully convert the vhost code to use vq-based APIs instead of leaving it
  half per dev and half per vq.
- Rebase against the kernel worker API.
- Drop delayed worker creation. We always create the default worker at
  VHOST_SET_OWNER time. Userspace can create and bind workers after that.

v2:
- Changed the loop in which we take a refcount on the worker.
- Replaced pid == -1 with a define.
- Fixed a tabbing/spacing coding style issue.
- Use a hash instead of a list to look up workers.
- Dropped the patch that added an ioctl cmd to get a vq worker's pid,
  since it looks like we might do a generic netlink interface instead.






* [PATCH] QEMU vhost-scsi: add support for VHOST_SET_VRING_WORKER
  2021-10-22  5:18 ` Mike Christie
@ 2021-10-22  5:19   ` Mike Christie
  -1 siblings, 0 replies; 74+ messages in thread
From: Mike Christie @ 2021-10-22  5:19 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, pbonzini, jasowang, mst,
	sgarzare, virtualization
  Cc: Mike Christie

This patch adds support for the proposed ioctl that allows userspace
to create virtqueue workers. For vhost-scsi you can set virtqueue_workers
to:

 0: default behavior where we have 1 worker for all vqs.
-1: create a worker per vq.
>0: create N workers and allow the vqs to share them.

TODO:
- Better support for sharing workers, where we bind workers based on the
  ISR-to-vq mapping.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 hw/scsi/vhost-scsi.c                         | 65 ++++++++++++++++++++
 hw/virtio/vhost-backend.c                    |  8 +++
 include/hw/virtio/vhost-backend.h            |  4 ++
 include/hw/virtio/virtio-scsi.h              |  1 +
 include/standard-headers/linux/vhost_types.h | 12 ++++
 linux-headers/linux/vhost.h                  | 11 ++++
 6 files changed, 101 insertions(+)

diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 039caf2614eb..74363f9a519e 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -163,6 +163,62 @@ static const VMStateDescription vmstate_virtio_vhost_scsi = {
     .pre_save = vhost_scsi_pre_save,
 };
 
+static int vhost_scsi_set_workers(VHostSCSICommon *vsc, int vq_workers)
+{
+    struct vhost_dev *dev = &vsc->dev;
+    int workers_per_queue = 1, io_queues;
+    struct vhost_vring_worker w;
+    int i, ret, cnt = 0;
+
+    if (vq_workers < VHOST_VRING_NEW_WORKER)
+        return -EINVAL;
+
+    if (vq_workers == 0 ||
+        dev->nvqs == VHOST_SCSI_VQ_NUM_FIXED + 1)
+        /* Use the single default worker */
+        return 0;
+
+    io_queues = dev->nvqs - VHOST_SCSI_VQ_NUM_FIXED;
+    if (vq_workers > 0 && io_queues > vq_workers)
+        workers_per_queue = io_queues / vq_workers;
+
+    w.pid = VHOST_VRING_NEW_WORKER;
+    /*
+     * ctl/evt share the first worker since it will be rare for them
+     * to send cmds while IO is running. The rest of the vqs get their
+     * own worker.
+     */
+    for (i = VHOST_SCSI_VQ_NUM_FIXED + 1; i < dev->nvqs; i++) {
+        w.index = i;
+
+        switch (vq_workers) {
+        case VHOST_VRING_NEW_WORKER:
+            w.pid = VHOST_VRING_NEW_WORKER;
+            break;
+        default:
+            /*
+             * TODO: we should get the ISR to vq mapping and bind workers
+             * so vqs sharing a ISR share a worker.
+             */
+            if (cnt == workers_per_queue) {
+                w.pid = VHOST_VRING_NEW_WORKER;
+                cnt = 0;
+            } else {
+                cnt++;
+            }
+        }
+
+        ret = dev->vhost_ops->vhost_set_vring_worker(dev, &w);
+        if (ret == -ENOTTY) {
+            ret = 0;
+            break;
+        } else if (ret)
+            break;
+    }
+
+    return ret;
+}
+
 static void vhost_scsi_realize(DeviceState *dev, Error **errp)
 {
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
@@ -223,6 +279,13 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp)
         goto free_vqs;
     }
 
+    ret = vhost_scsi_set_workers(vsc, vs->conf.virtqueue_workers);
+    if (ret < 0) {
+        error_setg(errp, "vhost-scsi: vhost worker setup failed: %s",
+                   strerror(-ret));
+        goto free_vqs;
+    }
+
     /* At present, channel and lun both are 0 for bootable vhost-scsi disk */
     vsc->channel = 0;
     vsc->lun = 0;
@@ -281,6 +344,8 @@ static Property vhost_scsi_properties[] = {
                                                  VIRTIO_SCSI_F_T10_PI,
                                                  false),
     DEFINE_PROP_BOOL("migratable", VHostSCSICommon, migratable, false),
+    DEFINE_PROP_INT32("virtqueue_workers", VirtIOSCSICommon,
+                      conf.virtqueue_workers, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index b65f8f7e97bf..175ed1f1c3a8 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -152,6 +152,13 @@ static int vhost_kernel_set_vring_busyloop_timeout(struct vhost_dev *dev,
     return vhost_kernel_call(dev, VHOST_SET_VRING_BUSYLOOP_TIMEOUT, s);
 }
 
+static int vhost_kernel_set_vring_worker(struct vhost_dev *dev,
+                                         struct vhost_vring_worker *worker)
+{
+    return vhost_kernel_call(dev, VHOST_SET_VRING_WORKER, worker);
+}
+
+
 static int vhost_kernel_set_features(struct vhost_dev *dev,
                                      uint64_t features)
 {
@@ -313,6 +320,7 @@ const VhostOps kernel_ops = {
         .vhost_set_vring_call = vhost_kernel_set_vring_call,
         .vhost_set_vring_busyloop_timeout =
                                 vhost_kernel_set_vring_busyloop_timeout,
+        .vhost_set_vring_worker = vhost_kernel_set_vring_worker,
         .vhost_set_features = vhost_kernel_set_features,
         .vhost_get_features = vhost_kernel_get_features,
         .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 81bf3109f837..bede35cf4686 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -33,6 +33,7 @@ struct vhost_memory;
 struct vhost_vring_file;
 struct vhost_vring_state;
 struct vhost_vring_addr;
+struct vhost_vring_worker;
 struct vhost_scsi_target;
 struct vhost_iotlb_msg;
 struct vhost_virtqueue;
@@ -71,6 +72,8 @@ typedef int (*vhost_set_vring_call_op)(struct vhost_dev *dev,
                                        struct vhost_vring_file *file);
 typedef int (*vhost_set_vring_busyloop_timeout_op)(struct vhost_dev *dev,
                                                    struct vhost_vring_state *r);
+typedef int (*vhost_set_vring_worker_op)(struct vhost_dev *dev,
+                                         struct vhost_vring_worker *worker);
 typedef int (*vhost_set_features_op)(struct vhost_dev *dev,
                                      uint64_t features);
 typedef int (*vhost_get_features_op)(struct vhost_dev *dev,
@@ -146,6 +149,7 @@ typedef struct VhostOps {
     vhost_set_vring_kick_op vhost_set_vring_kick;
     vhost_set_vring_call_op vhost_set_vring_call;
     vhost_set_vring_busyloop_timeout_op vhost_set_vring_busyloop_timeout;
+    vhost_set_vring_worker_op vhost_set_vring_worker;
     vhost_set_features_op vhost_set_features;
     vhost_get_features_op vhost_get_features;
     vhost_set_backend_cap_op vhost_set_backend_cap;
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 543681bc1838..694221601dad 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -58,6 +58,7 @@ struct VirtIOSCSIConf {
 #ifdef CONFIG_VHOST_SCSI
     char *vhostfd;
     char *wwpn;
+    int virtqueue_workers;
 #endif
     CharBackend chardev;
     uint32_t boot_tpgt;
diff --git a/include/standard-headers/linux/vhost_types.h b/include/standard-headers/linux/vhost_types.h
index 0bd2684a2ae4..0b844969a524 100644
--- a/include/standard-headers/linux/vhost_types.h
+++ b/include/standard-headers/linux/vhost_types.h
@@ -27,6 +27,18 @@ struct vhost_vring_file {
 
 };
 
+#define VHOST_VRING_NEW_WORKER -1
+
+struct vhost_vring_worker {
+	unsigned int index;
+	/*
+	 * The pid of the vhost worker that the vq will be bound to. If
+	 * pid is VHOST_VRING_NEW_WORKER, a new worker will be created and its
+	 * pid will be returned in pid.
+	 */
+	pid_t pid;
+};
+
 struct vhost_vring_addr {
 	unsigned int index;
 	/* Option flags. */
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
index c998860d7bbc..6823f3b5309c 100644
--- a/linux-headers/linux/vhost.h
+++ b/linux-headers/linux/vhost.h
@@ -70,6 +70,17 @@
 #define VHOST_VRING_BIG_ENDIAN 1
 #define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state)
 #define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
+/* By default, a device gets one vhost_worker created during VHOST_SET_OWNER
+ * that its virtqueues share. This allows userspace to create a vhost_worker
+ * and bind a virtqueue to it or bind a virtqueue to an existing worker.
+ *
+ * If pid > 0 and it matches an existing vhost_worker thread it will be bound
+ * to the virtqueue. If pid is VHOST_VRING_NEW_WORKER, then a new worker will be
+ * created and bound to the virtqueue.
+ *
+ * This must be called after VHOST_SET_OWNER and before the virtqueue is active.
+ */
+#define VHOST_SET_VRING_WORKER _IOWR(VHOST_VIRTIO, 0x15, struct vhost_vring_worker)
 
 /* The following ioctls use eventfd file descriptors to signal and poll
  * for events. */
-- 
2.25.1



* [PATCH V3 01/11] vhost: add vhost_worker pointer to vhost_virtqueue
  2021-10-22  5:18 ` Mike Christie
@ 2021-10-22  5:19   ` Mike Christie
  -1 siblings, 0 replies; 74+ messages in thread
From: Mike Christie @ 2021-10-22  5:19 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, pbonzini, jasowang, mst,
	sgarzare, virtualization

This patchset allows userspace to map vqs to different workers. This
patch adds a worker pointer to the vq so we can store that info.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/vhost.c | 24 +++++++++++++-----------
 drivers/vhost/vhost.h |  1 +
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 9aa04fcdf210..9b4e7d74dab4 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -496,6 +496,7 @@ void vhost_dev_init(struct vhost_dev *dev,
 		vq->log = NULL;
 		vq->indirect = NULL;
 		vq->heads = NULL;
+		vq->worker = NULL;
 		vq->dev = dev;
 		mutex_init(&vq->mutex);
 		vhost_vq_reset(dev, vq);
@@ -574,15 +575,14 @@ static void vhost_worker_free(struct vhost_dev *dev)
 	kfree(worker);
 }
 
-static int vhost_worker_create(struct vhost_dev *dev)
+static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
 {
 	struct vhost_worker *worker;
 	struct task_struct *task;
-	int ret;
 
 	worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT);
 	if (!worker)
-		return -ENOMEM;
+		return NULL;
 
 	dev->worker = worker;
 	worker->kcov_handle = kcov_common_handle();
@@ -594,25 +594,24 @@ static int vhost_worker_create(struct vhost_dev *dev)
 	 */
 	task = kernel_worker(vhost_worker, worker, NUMA_NO_NODE, CLONE_FS,
 			     KERN_WORKER_NO_FILES | KERN_WORKER_SIG_IGN);
-	if (IS_ERR(task)) {
-		ret = PTR_ERR(task);
+	if (IS_ERR(task))
 		goto free_worker;
-	}
 
 	worker->task = task;
 	kernel_worker_start(task, "vhost-%d", current->pid);
-	return 0;
+	return worker;
 
 free_worker:
 	kfree(worker);
 	dev->worker = NULL;
-	return ret;
+	return NULL;
 }
 
 /* Caller should have device mutex */
 long vhost_dev_set_owner(struct vhost_dev *dev)
 {
-	int err;
+	struct vhost_worker *worker;
+	int err, i;
 
 	/* Is there an owner already? */
 	if (vhost_dev_has_owner(dev)) {
@@ -623,9 +622,12 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 	vhost_attach_mm(dev);
 
 	if (dev->use_worker) {
-		err = vhost_worker_create(dev);
-		if (err)
+		worker = vhost_worker_create(dev);
+		if (!worker)
 			goto err_worker;
+
+		for (i = 0; i < dev->nvqs; i++)
+			dev->vqs[i]->worker = worker;
 	}
 
 	err = vhost_dev_alloc_iovecs(dev);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 09748694cb66..7ee38b4a2807 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -80,6 +80,7 @@ struct vhost_vring_call {
 /* The virtqueue structure describes a queue attached to a device. */
 struct vhost_virtqueue {
 	struct vhost_dev *dev;
+	struct vhost_worker *worker;
 
 	/* The actual ring of buffers. */
 	struct mutex mutex;
-- 
2.25.1


* [PATCH V3 02/11] vhost, vhost-net: add helper to check if vq has work
  2021-10-22  5:18 ` Mike Christie
@ 2021-10-22  5:19   ` Mike Christie
  -1 siblings, 0 replies; 74+ messages in thread
From: Mike Christie @ 2021-10-22  5:19 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, pbonzini, jasowang, mst,
	sgarzare, virtualization
  Cc: Mike Christie

This adds a helper to check if a vq has work pending and converts
vhost-net to use it.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/net.c   | 2 +-
 drivers/vhost/vhost.c | 6 +++---
 drivers/vhost/vhost.h | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 28ef323882fb..6c4b2b2158bb 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -543,7 +543,7 @@ static void vhost_net_busy_poll(struct vhost_net *net,
 	endtime = busy_clock() + busyloop_timeout;
 
 	while (vhost_can_busy_poll(endtime)) {
-		if (vhost_has_work(&net->dev)) {
+		if (vhost_vq_has_work(vq)) {
 			*busyloop_intr = true;
 			break;
 		}
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 9b4e7d74dab4..f6bfa1c481df 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -269,11 +269,11 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 EXPORT_SYMBOL_GPL(vhost_work_queue);
 
 /* A lockless hint for busy polling code to exit the loop */
-bool vhost_has_work(struct vhost_dev *dev)
+bool vhost_vq_has_work(struct vhost_virtqueue *vq)
 {
-	return dev->worker && !llist_empty(&dev->worker->work_list);
+	return vq->worker && !llist_empty(&vq->worker->work_list);
 }
-EXPORT_SYMBOL_GPL(vhost_has_work);
+EXPORT_SYMBOL_GPL(vhost_vq_has_work);
 
 void vhost_poll_queue(struct vhost_poll *poll)
 {
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 7ee38b4a2807..37989518a07c 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -50,7 +50,6 @@ struct vhost_poll {
 
 void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
-bool vhost_has_work(struct vhost_dev *dev);
 
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
 		     __poll_t mask, struct vhost_dev *dev);
@@ -200,6 +199,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *,
 		      struct vhost_log *log, unsigned int *log_num);
 void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
 
+bool vhost_vq_has_work(struct vhost_virtqueue *vq);
 bool vhost_vq_is_setup(struct vhost_virtqueue *vq);
 int vhost_vq_init_access(struct vhost_virtqueue *);
 int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
-- 
2.25.1



* [PATCH V3 03/11] vhost: take worker or vq instead of dev for queueing
  2021-10-22  5:18 ` Mike Christie
@ 2021-10-22  5:19   ` Mike Christie
  -1 siblings, 0 replies; 74+ messages in thread
From: Mike Christie @ 2021-10-22  5:19 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, pbonzini, jasowang, mst,
	sgarzare, virtualization
  Cc: Mike Christie

This patch has the core work queueing function take a worker argument, in
preparation for supporting multiple workers. It also adds a helper that
takes a vq when queueing, so modules can control which vq/worker to queue
work on.

This temporarily leaves vhost_work_queue() in place. It will be removed
once the drivers are converted in the next patches.
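
As a rough sketch of the intended call pattern (this just mirrors the
vsock conversion later in this series; the vsock names are only an
example):

	/* Queue the send work on the worker bound to the TX vq instead of
	 * on the device-wide default worker.
	 */
	vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_TX], &vsock->send_pkt_work);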

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/vhost.c | 44 +++++++++++++++++++++++++++----------------
 drivers/vhost/vhost.h |  1 +
 2 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index f6bfa1c481df..f2c6981297d0 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -230,6 +230,34 @@ void vhost_poll_stop(struct vhost_poll *poll)
 }
 EXPORT_SYMBOL_GPL(vhost_poll_stop);
 
+static void vhost_work_queue_on(struct vhost_worker *worker,
+				struct vhost_work *work)
+{
+	if (!worker)
+		return;
+
+	if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
+		/* We can only add the work to the list after we're
+		 * sure it was not in the list.
+		 * test_and_set_bit() implies a memory barrier.
+		 */
+		llist_add(&work->node, &worker->work_list);
+		wake_up_process(worker->task);
+	}
+}
+
+void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
+{
+	vhost_work_queue_on(dev->worker, work);
+}
+EXPORT_SYMBOL_GPL(vhost_work_queue);
+
+void vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work *work)
+{
+	vhost_work_queue_on(vq->worker, work);
+}
+EXPORT_SYMBOL_GPL(vhost_vq_work_queue);
+
 void vhost_work_dev_flush(struct vhost_dev *dev)
 {
 	struct vhost_flush_struct flush;
@@ -252,22 +280,6 @@ void vhost_poll_flush(struct vhost_poll *poll)
 }
 EXPORT_SYMBOL_GPL(vhost_poll_flush);
 
-void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
-{
-	if (!dev->worker)
-		return;
-
-	if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
-		/* We can only add the work to the list after we're
-		 * sure it was not in the list.
-		 * test_and_set_bit() implies a memory barrier.
-		 */
-		llist_add(&work->node, &dev->worker->work_list);
-		wake_up_process(dev->worker->task);
-	}
-}
-EXPORT_SYMBOL_GPL(vhost_work_queue);
-
 /* A lockless hint for busy polling code to exit the loop */
 bool vhost_vq_has_work(struct vhost_virtqueue *vq)
 {
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 37989518a07c..8f209d1c8890 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -199,6 +199,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *,
 		      struct vhost_log *log, unsigned int *log_num);
 void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
 
+void vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work *work);
 bool vhost_vq_has_work(struct vhost_virtqueue *vq);
 bool vhost_vq_is_setup(struct vhost_virtqueue *vq);
 int vhost_vq_init_access(struct vhost_virtqueue *);
-- 
2.25.1



* [PATCH V3 04/11] vhost: take worker or vq instead of dev for flushing
  2021-10-22  5:18 ` Mike Christie
@ 2021-10-22  5:19   ` Mike Christie
  -1 siblings, 0 replies; 74+ messages in thread
From: Mike Christie @ 2021-10-22  5:19 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, pbonzini, jasowang, mst,
	sgarzare, virtualization
  Cc: Mike Christie

This patch has the core work flush function take a worker argument, in
preparation for supporting multiple workers. It also adds a helper that
takes a vq when flushing, so modules can control which vq/worker to flush.

This temporarily leaves vhost_work_dev_flush() in place. It will be removed
once the drivers are converted in the next patches.
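
As a rough sketch of how a driver uses it (again mirroring the vsock
conversion later in this series):

	/* Flush the workers backing the vqs we care about rather than a
	 * single device-wide worker.
	 */
	vhost_vq_work_flush(&vsock->vqs[VSOCK_VQ_TX]);
	vhost_vq_work_flush(&vsock->vqs[VSOCK_VQ_RX]);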

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/vhost.c | 30 +++++++++++++++++++++---------
 drivers/vhost/vhost.h |  1 +
 2 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index f2c6981297d0..e4d765f6c821 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -246,6 +246,26 @@ static void vhost_work_queue_on(struct vhost_worker *worker,
 	}
 }
 
+static void vhost_work_flush_on(struct vhost_worker *worker)
+{
+	struct vhost_flush_struct flush;
+
+	if (!worker)
+		return;
+
+	init_completion(&flush.wait_event);
+	vhost_work_init(&flush.work, vhost_flush_work);
+
+	vhost_work_queue_on(worker, &flush.work);
+	wait_for_completion(&flush.wait_event);
+}
+
+void vhost_vq_work_flush(struct vhost_virtqueue *vq)
+{
+	vhost_work_flush_on(vq->worker);
+}
+EXPORT_SYMBOL_GPL(vhost_vq_work_flush);
+
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 {
 	vhost_work_queue_on(dev->worker, work);
@@ -260,15 +280,7 @@ EXPORT_SYMBOL_GPL(vhost_vq_work_queue);
 
 void vhost_work_dev_flush(struct vhost_dev *dev)
 {
-	struct vhost_flush_struct flush;
-
-	if (dev->worker) {
-		init_completion(&flush.wait_event);
-		vhost_work_init(&flush.work, vhost_flush_work);
-
-		vhost_work_queue(dev, &flush.work);
-		wait_for_completion(&flush.wait_event);
-	}
+	vhost_work_flush_on(dev->worker);
 }
 EXPORT_SYMBOL_GPL(vhost_work_dev_flush);
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 8f209d1c8890..a985caa35633 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -199,6 +199,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *,
 		      struct vhost_log *log, unsigned int *log_num);
 void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
 
+void vhost_vq_work_flush(struct vhost_virtqueue *vq);
 void vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work *work);
 bool vhost_vq_has_work(struct vhost_virtqueue *vq);
 bool vhost_vq_is_setup(struct vhost_virtqueue *vq);
-- 
2.25.1



* [PATCH V3 05/11] vhost: convert poll work to be vq based
  2021-10-22  5:18 ` Mike Christie
@ 2021-10-22  5:19   ` Mike Christie
  -1 siblings, 0 replies; 74+ messages in thread
From: Mike Christie @ 2021-10-22  5:19 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, pbonzini, jasowang, mst,
	sgarzare, virtualization
  Cc: Mike Christie

This has the drivers pass in their poll-to-vq mapping and then converts
the core poll code to use the vq-based helpers.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/net.c   |  6 ++++--
 drivers/vhost/vhost.c | 10 ++++++----
 drivers/vhost/vhost.h |  4 +++-
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 6c4b2b2158bb..2e1abab45f3c 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1339,8 +1339,10 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 		       VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT, true,
 		       NULL);
 
-	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, EPOLLOUT, dev);
-	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, EPOLLIN, dev);
+	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, EPOLLOUT, dev,
+			vqs[VHOST_NET_VQ_TX]);
+	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, EPOLLIN, dev,
+			vqs[VHOST_NET_VQ_RX]);
 
 	f->private_data = n;
 	n->page_frag.page = NULL;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e4d765f6c821..c84024afefff 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -186,13 +186,15 @@ EXPORT_SYMBOL_GPL(vhost_work_init);
 
 /* Init poll structure */
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
-		     __poll_t mask, struct vhost_dev *dev)
+		     __poll_t mask, struct vhost_dev *dev,
+		     struct vhost_virtqueue *vq)
 {
 	init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
 	init_poll_funcptr(&poll->table, vhost_poll_func);
 	poll->mask = mask;
 	poll->dev = dev;
 	poll->wqh = NULL;
+	poll->vq = vq;
 
 	vhost_work_init(&poll->work, fn);
 }
@@ -288,7 +290,7 @@ EXPORT_SYMBOL_GPL(vhost_work_dev_flush);
  * locks that are also used by the callback. */
 void vhost_poll_flush(struct vhost_poll *poll)
 {
-	vhost_work_dev_flush(poll->dev);
+	vhost_vq_work_flush(poll->vq);
 }
 EXPORT_SYMBOL_GPL(vhost_poll_flush);
 
@@ -301,7 +303,7 @@ EXPORT_SYMBOL_GPL(vhost_vq_has_work);
 
 void vhost_poll_queue(struct vhost_poll *poll)
 {
-	vhost_work_queue(poll->dev, &poll->work);
+	vhost_vq_work_queue(poll->vq, &poll->work);
 }
 EXPORT_SYMBOL_GPL(vhost_poll_queue);
 
@@ -526,7 +528,7 @@ void vhost_dev_init(struct vhost_dev *dev,
 		vhost_vq_reset(dev, vq);
 		if (vq->handle_kick)
 			vhost_poll_init(&vq->poll, vq->handle_kick,
-					EPOLLIN, dev);
+					EPOLLIN, dev, vq);
 	}
 }
 EXPORT_SYMBOL_GPL(vhost_dev_init);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index a985caa35633..d9650da2be2c 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -46,13 +46,15 @@ struct vhost_poll {
 	struct vhost_work	work;
 	__poll_t		mask;
 	struct vhost_dev	*dev;
+	struct vhost_virtqueue	*vq;
 };
 
 void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
 
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
-		     __poll_t mask, struct vhost_dev *dev);
+		     __poll_t mask, struct vhost_dev *dev,
+		     struct vhost_virtqueue *vq);
 int vhost_poll_start(struct vhost_poll *poll, struct file *file);
 void vhost_poll_stop(struct vhost_poll *poll);
 void vhost_poll_flush(struct vhost_poll *poll);
-- 
2.25.1



* [PATCH V3 06/11] vhost-sock: convert to vq helpers
  2021-10-22  5:18 ` Mike Christie
@ 2021-10-22  5:19   ` Mike Christie
  -1 siblings, 0 replies; 74+ messages in thread
From: Mike Christie @ 2021-10-22  5:19 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, pbonzini, jasowang, mst,
	sgarzare, virtualization
  Cc: Mike Christie

Convert from the vhost dev based helpers to the vq based ones.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/vsock.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 938aefbc75ec..c50c60d0955e 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -300,7 +300,7 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
 	list_add_tail(&pkt->list, &vsock->send_pkt_list);
 	spin_unlock_bh(&vsock->send_pkt_list_lock);
 
-	vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
+	vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_TX], &vsock->send_pkt_work);
 
 	rcu_read_unlock();
 	return len;
@@ -612,7 +612,7 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
 	/* Some packets may have been queued before the device was started,
 	 * let's kick the send worker to send them.
 	 */
-	vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
+	vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_TX], &vsock->send_pkt_work);
 
 	mutex_unlock(&vsock->dev.mutex);
 	return 0;
@@ -712,7 +712,9 @@ static void vhost_vsock_flush(struct vhost_vsock *vsock)
 	for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++)
 		if (vsock->vqs[i].handle_kick)
 			vhost_poll_flush(&vsock->vqs[i].poll);
-	vhost_work_dev_flush(&vsock->dev);
+
+	vhost_vq_work_flush(&vsock->vqs[VSOCK_VQ_TX]);
+	vhost_vq_work_flush(&vsock->vqs[VSOCK_VQ_RX]);
 }
 
 static void vhost_vsock_reset_orphans(struct sock *sk)
-- 
2.25.1


* [PATCH V3 07/11] vhost-scsi: make SCSI cmd completion per vq
  2021-10-22  5:18 ` Mike Christie
@ 2021-10-22  5:19   ` Mike Christie
  -1 siblings, 0 replies; 74+ messages in thread
From: Mike Christie @ 2021-10-22  5:19 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, pbonzini, jasowang, mst,
	sgarzare, virtualization

This patch separates the scsi cmd completion code paths so we can complete
cmds based on their vq instead of having all cmds complete on the same
worker/CPU. This will be useful with the next patches that allow us to
create multiple worker threads and bind them to different vqs, so we can
have completions running on different threads/CPUs.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 drivers/vhost/scsi.c | 48 +++++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 532e204f2b1b..0d85ddb68420 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -164,6 +164,7 @@ enum {
 
 struct vhost_scsi_virtqueue {
 	struct vhost_virtqueue vq;
+	struct vhost_scsi *vs;
 	/*
 	 * Reference counting for inflight reqs, used for flush operation. At
 	 * each time, one reference tracks new commands submitted, while we
@@ -178,6 +179,9 @@ struct vhost_scsi_virtqueue {
 	struct vhost_scsi_cmd *scsi_cmds;
 	struct sbitmap scsi_tags;
 	int max_cmds;
+
+	struct vhost_work completion_work;
+	struct llist_head completion_list;
 };
 
 struct vhost_scsi {
@@ -188,9 +192,6 @@ struct vhost_scsi {
 	struct vhost_dev dev;
 	struct vhost_scsi_virtqueue vqs[VHOST_SCSI_MAX_VQ];
 
-	struct vhost_work vs_completion_work; /* cmd completion work item */
-	struct llist_head vs_completion_list; /* cmd completion queue */
-
 	struct vhost_work vs_event_work; /* evt injection work item */
 	struct llist_head vs_event_list; /* evt injection queue */
 
@@ -365,10 +366,11 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
 	} else {
 		struct vhost_scsi_cmd *cmd = container_of(se_cmd,
 					struct vhost_scsi_cmd, tvc_se_cmd);
-		struct vhost_scsi *vs = cmd->tvc_vhost;
+		struct vhost_scsi_virtqueue *svq =  container_of(cmd->tvc_vq,
+					struct vhost_scsi_virtqueue, vq);
 
-		llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
-		vhost_work_queue(&vs->dev, &vs->vs_completion_work);
+		llist_add(&cmd->tvc_completion_list, &svq->completion_list);
+		vhost_vq_work_queue(&svq->vq, &svq->completion_work);
 	}
 }
 
@@ -531,18 +533,17 @@ static void vhost_scsi_evt_work(struct vhost_work *work)
  */
 static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
 {
-	struct vhost_scsi *vs = container_of(work, struct vhost_scsi,
-					vs_completion_work);
-	DECLARE_BITMAP(signal, VHOST_SCSI_MAX_VQ);
+	struct vhost_scsi_virtqueue *svq = container_of(work,
+				struct vhost_scsi_virtqueue, completion_work);
 	struct virtio_scsi_cmd_resp v_rsp;
 	struct vhost_scsi_cmd *cmd, *t;
 	struct llist_node *llnode;
 	struct se_cmd *se_cmd;
 	struct iov_iter iov_iter;
-	int ret, vq;
+	bool signal = false;
+	int ret;
 
-	bitmap_zero(signal, VHOST_SCSI_MAX_VQ);
-	llnode = llist_del_all(&vs->vs_completion_list);
+	llnode = llist_del_all(&svq->completion_list);
 	llist_for_each_entry_safe(cmd, t, llnode, tvc_completion_list) {
 		se_cmd = &cmd->tvc_se_cmd;
 
@@ -562,21 +563,16 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
 			      cmd->tvc_in_iovs, sizeof(v_rsp));
 		ret = copy_to_iter(&v_rsp, sizeof(v_rsp), &iov_iter);
 		if (likely(ret == sizeof(v_rsp))) {
-			struct vhost_scsi_virtqueue *q;
+			signal = true;
 			vhost_add_used(cmd->tvc_vq, cmd->tvc_vq_desc, 0);
-			q = container_of(cmd->tvc_vq, struct vhost_scsi_virtqueue, vq);
-			vq = q - vs->vqs;
-			__set_bit(vq, signal);
 		} else
 			pr_err("Faulted on virtio_scsi_cmd_resp\n");
 
 		vhost_scsi_release_cmd_res(se_cmd);
 	}
 
-	vq = -1;
-	while ((vq = find_next_bit(signal, VHOST_SCSI_MAX_VQ, vq + 1))
-		< VHOST_SCSI_MAX_VQ)
-		vhost_signal(&vs->dev, &vs->vqs[vq].vq);
+	if (signal)
+		vhost_signal(&svq->vs->dev, &svq->vq);
 }
 
 static struct vhost_scsi_cmd *
@@ -1776,6 +1772,7 @@ static int vhost_scsi_set_features(struct vhost_scsi *vs, u64 features)
 
 static int vhost_scsi_open(struct inode *inode, struct file *f)
 {
+	struct vhost_scsi_virtqueue *svq;
 	struct vhost_scsi *vs;
 	struct vhost_virtqueue **vqs;
 	int r = -ENOMEM, i;
@@ -1788,7 +1785,6 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 	if (!vqs)
 		goto err_vqs;
 
-	vhost_work_init(&vs->vs_completion_work, vhost_scsi_complete_cmd_work);
 	vhost_work_init(&vs->vs_event_work, vhost_scsi_evt_work);
 
 	vs->vs_events_nr = 0;
@@ -1799,8 +1795,14 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 	vs->vqs[VHOST_SCSI_VQ_CTL].vq.handle_kick = vhost_scsi_ctl_handle_kick;
 	vs->vqs[VHOST_SCSI_VQ_EVT].vq.handle_kick = vhost_scsi_evt_handle_kick;
 	for (i = VHOST_SCSI_VQ_IO; i < VHOST_SCSI_MAX_VQ; i++) {
-		vqs[i] = &vs->vqs[i].vq;
-		vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick;
+		svq = &vs->vqs[i];
+
+		vqs[i] = &svq->vq;
+		svq->vs = vs;
+		init_llist_head(&svq->completion_list);
+		vhost_work_init(&svq->completion_work,
+				vhost_scsi_complete_cmd_work);
+		svq->vq.handle_kick = vhost_scsi_handle_kick;
 	}
 	vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, UIO_MAXIOV,
 		       VHOST_SCSI_WEIGHT, 0, true, NULL);
-- 
2.25.1

* [PATCH V3 08/11] vhost-scsi: convert to vq helpers
  2021-10-22  5:18 ` Mike Christie
@ 2021-10-22  5:19   ` Mike Christie
  -1 siblings, 0 replies; 74+ messages in thread
From: Mike Christie @ 2021-10-22  5:19 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, pbonzini, jasowang, mst,
	sgarzare, virtualization

Convert from the vhost dev based helpers to the vq based ones.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 0d85ddb68420..08beba73ada4 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -361,8 +361,9 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
 	if (se_cmd->se_cmd_flags & SCF_SCSI_TMR_CDB) {
 		struct vhost_scsi_tmf *tmf = container_of(se_cmd,
 					struct vhost_scsi_tmf, se_cmd);
+		struct vhost_virtqueue *vq = &tmf->svq->vq;
 
-		vhost_work_queue(&tmf->vhost->dev, &tmf->vwork);
+		vhost_vq_work_queue(vq, &tmf->vwork);
 	} else {
 		struct vhost_scsi_cmd *cmd = container_of(se_cmd,
 					struct vhost_scsi_cmd, tvc_se_cmd);
@@ -1360,11 +1361,9 @@ static void vhost_scsi_ctl_handle_kick(struct vhost_work *work)
 }
 
 static void
-vhost_scsi_send_evt(struct vhost_scsi *vs,
-		   struct vhost_scsi_tpg *tpg,
-		   struct se_lun *lun,
-		   u32 event,
-		   u32 reason)
+vhost_scsi_send_evt(struct vhost_scsi *vs, struct vhost_virtqueue *vq,
+		    struct vhost_scsi_tpg *tpg, struct se_lun *lun,
+		    u32 event, u32 reason)
 {
 	struct vhost_scsi_evt *evt;
 
@@ -1386,7 +1385,7 @@ vhost_scsi_send_evt(struct vhost_scsi *vs,
 	}
 
 	llist_add(&evt->list, &vs->vs_event_list);
-	vhost_work_queue(&vs->dev, &vs->vs_event_work);
+	vhost_vq_work_queue(vq, &vs->vs_event_work);
 }
 
 static void vhost_scsi_evt_handle_kick(struct vhost_work *work)
@@ -1400,7 +1399,8 @@ static void vhost_scsi_evt_handle_kick(struct vhost_work *work)
 		goto out;
 
 	if (vs->vs_events_missed)
-		vhost_scsi_send_evt(vs, NULL, NULL, VIRTIO_SCSI_T_NO_EVENT, 0);
+		vhost_scsi_send_evt(vs, vq, NULL, NULL, VIRTIO_SCSI_T_NO_EVENT,
+				    0);
 out:
 	mutex_unlock(&vq->mutex);
 }
@@ -1428,11 +1428,10 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
 	 * indicate the start of the flush operation so that it will reach 0
 	 * when all the reqs are finished.
 	 */
-	for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
+	for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) {
 		kref_put(&old_inflight[i]->kref, vhost_scsi_done_inflight);
-
-	/* Flush both the vhost poll and vhost work */
-	vhost_work_dev_flush(&vs->dev);
+		vhost_vq_work_flush(&vs->vqs[i].vq);
+	}
 
 	/* Wait for all reqs issued before the flush to be finished */
 	for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
@@ -1967,8 +1966,8 @@ vhost_scsi_do_plug(struct vhost_scsi_tpg *tpg,
 	vq = &vs->vqs[VHOST_SCSI_VQ_EVT].vq;
 	mutex_lock(&vq->mutex);
 	if (vhost_has_feature(vq, VIRTIO_SCSI_F_HOTPLUG))
-		vhost_scsi_send_evt(vs, tpg, lun,
-				   VIRTIO_SCSI_T_TRANSPORT_RESET, reason);
+		vhost_scsi_send_evt(vs, vq, tpg, lun,
+				    VIRTIO_SCSI_T_TRANSPORT_RESET, reason);
 	mutex_unlock(&vq->mutex);
 	mutex_unlock(&vs->dev.mutex);
 }
-- 
2.25.1

* [PATCH V3 09/11] vhost-scsi: flush IO vqs then send TMF rsp
  2021-10-22  5:18 ` Mike Christie
@ 2021-10-22  5:19   ` Mike Christie
  -1 siblings, 0 replies; 74+ messages in thread
From: Mike Christie @ 2021-10-22  5:19 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, pbonzini, jasowang, mst,
	sgarzare, virtualization

With one worker we will always send the scsi cmd responses and then send
the TMF rsp, because LIO will always complete the scsi cmds first and then
call into us to send the TMF response.

With multiple workers, one of the IO vq threads could run after the TMF
is queued, so this has us flush all the IO vqs before sending the TMF
response.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 08beba73ada4..29d9adcdb4fc 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1153,12 +1153,28 @@ static void vhost_scsi_tmf_resp_work(struct vhost_work *work)
 {
 	struct vhost_scsi_tmf *tmf = container_of(work, struct vhost_scsi_tmf,
 						  vwork);
-	int resp_code;
+	struct vhost_virtqueue *ctl_vq, *vq;
+	int resp_code, i;
+
+	if (tmf->scsi_resp == TMR_FUNCTION_COMPLETE) {
+		/*
+		 * Flush IO vqs that don't share a worker with the ctl to make
+		 * sure they have sent their responses before us.
+		 */
+		ctl_vq = &tmf->vhost->vqs[VHOST_SCSI_VQ_CTL].vq;
+		for (i = VHOST_SCSI_VQ_IO; i < tmf->vhost->dev.nvqs; i++) {
+			vq = &tmf->vhost->vqs[i].vq;
+
+			if (vhost_vq_is_setup(vq) &&
+			    vq->worker != ctl_vq->worker) {
+				vhost_vq_work_flush(vq);
+			}
+		}
 
-	if (tmf->scsi_resp == TMR_FUNCTION_COMPLETE)
 		resp_code = VIRTIO_SCSI_S_FUNCTION_SUCCEEDED;
-	else
+	} else {
 		resp_code = VIRTIO_SCSI_S_FUNCTION_REJECTED;
+	}
 
 	vhost_scsi_send_tmf_resp(tmf->vhost, &tmf->svq->vq, tmf->in_iovs,
 				 tmf->vq_desc, &tmf->resp_iov, resp_code);
-- 
2.25.1

* [PATCH V3 10/11] vhost: remove device wide queue/flushing helpers
  2021-10-22  5:18 ` Mike Christie
@ 2021-10-22  5:19   ` Mike Christie
  -1 siblings, 0 replies; 74+ messages in thread
From: Mike Christie @ 2021-10-22  5:19 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, pbonzini, jasowang, mst,
	sgarzare, virtualization

vhost_work_queue and vhost_work_dev_flush are no longer used, so drop
them.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/vhost.c | 12 ------------
 drivers/vhost/vhost.h |  2 --
 2 files changed, 14 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c84024afefff..04f43a6445e1 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -268,24 +268,12 @@ void vhost_vq_work_flush(struct vhost_virtqueue *vq)
 }
 EXPORT_SYMBOL_GPL(vhost_vq_work_flush);
 
-void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
-{
-	vhost_work_queue_on(dev->worker, work);
-}
-EXPORT_SYMBOL_GPL(vhost_work_queue);
-
 void vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work *work)
 {
 	vhost_work_queue_on(vq->worker, work);
 }
 EXPORT_SYMBOL_GPL(vhost_vq_work_queue);
 
-void vhost_work_dev_flush(struct vhost_dev *dev)
-{
-	vhost_work_flush_on(dev->worker);
-}
-EXPORT_SYMBOL_GPL(vhost_work_dev_flush);
-
 /* Flush any work that has been scheduled. When calling this, don't hold any
  * locks that are also used by the callback. */
 void vhost_poll_flush(struct vhost_poll *poll)
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index d9650da2be2c..33c63b24187a 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -50,7 +50,6 @@ struct vhost_poll {
 };
 
 void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
-void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
 
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
 		     __poll_t mask, struct vhost_dev *dev,
@@ -59,7 +58,6 @@ int vhost_poll_start(struct vhost_poll *poll, struct file *file);
 void vhost_poll_stop(struct vhost_poll *poll);
 void vhost_poll_flush(struct vhost_poll *poll);
 void vhost_poll_queue(struct vhost_poll *poll);
-void vhost_work_dev_flush(struct vhost_dev *dev);
 
 struct vhost_log {
 	u64 addr;
-- 
2.25.1

* [PATCH V3 11/11] vhost: allow userspace to create workers
  2021-10-22  5:18 ` Mike Christie
@ 2021-10-22  5:19   ` Mike Christie
  -1 siblings, 0 replies; 74+ messages in thread
From: Mike Christie @ 2021-10-22  5:19 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, pbonzini, jasowang, mst,
	sgarzare, virtualization

This patch allows userspace to create workers and bind them to vqs. You
can have N workers per device, and a worker can be shared by multiple vqs.
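
For reference, a minimal userspace sketch of how the new ioctl could be
driven (hypothetical helper names, error handling trimmed; this is not
the qemu patch mentioned in the cover letter):

	#include <sys/types.h>
	#include <sys/ioctl.h>
	#include <linux/vhost.h>

	/* Create a new worker, bind vq 'idx' to it and return the worker's
	 * pid, or -1 on error. The vhost fd must already have done
	 * VHOST_SET_OWNER and the vq must not be active yet.
	 */
	static pid_t vhost_new_worker(int vhost_fd, unsigned int idx)
	{
		struct vhost_vring_worker w = {
			.index = idx,
			.pid = VHOST_VRING_NEW_WORKER,
		};

		if (ioctl(vhost_fd, VHOST_SET_VRING_WORKER, &w))
			return -1;
		/* The kernel returns the new worker's pid in w.pid. */
		return w.pid;
	}

	/* Bind vq 'idx' to an existing worker identified by 'pid', so
	 * multiple vqs can share one worker thread.
	 */
	static int vhost_attach_worker(int vhost_fd, unsigned int idx,
				       pid_t pid)
	{
		struct vhost_vring_worker w = {
			.index = idx,
			.pid = pid,
		};

		return ioctl(vhost_fd, VHOST_SET_VRING_WORKER, &w);
	}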

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/vhost.c            | 99 ++++++++++++++++++++++++++++----
 drivers/vhost/vhost.h            |  2 +-
 include/uapi/linux/vhost.h       | 11 ++++
 include/uapi/linux/vhost_types.h | 12 ++++
 4 files changed, 112 insertions(+), 12 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 04f43a6445e1..c86e88d7f35c 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -493,7 +493,6 @@ void vhost_dev_init(struct vhost_dev *dev,
 	dev->umem = NULL;
 	dev->iotlb = NULL;
 	dev->mm = NULL;
-	dev->worker = NULL;
 	dev->iov_limit = iov_limit;
 	dev->weight = weight;
 	dev->byte_weight = byte_weight;
@@ -576,20 +575,40 @@ static void vhost_worker_stop(struct vhost_worker *worker)
 	wait_for_completion(worker->exit_done);
 }
 
-static void vhost_worker_free(struct vhost_dev *dev)
-{
-	struct vhost_worker *worker = dev->worker;
 
+static void vhost_worker_put(struct vhost_worker *worker)
+{
 	if (!worker)
 		return;
 
-	dev->worker = NULL;
+	if (!refcount_dec_and_test(&worker->refcount))
+		return;
+
 	WARN_ON(!llist_empty(&worker->work_list));
 	vhost_worker_stop(worker);
 	kfree(worker);
 }
 
-static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
+static void vhost_vq_clear_worker(struct vhost_virtqueue *vq)
+{
+	if (vq->worker)
+		vhost_worker_put(vq->worker);
+	vq->worker = NULL;
+}
+
+static void vhost_workers_free(struct vhost_dev *dev)
+{
+	int i;
+
+	if (!dev->use_worker)
+		return;
+
+	for (i = 0; i < dev->nvqs; i++)
+		vhost_vq_clear_worker(dev->vqs[i]);
+}
+
+static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev,
+						int init_vq_map_count)
 {
 	struct vhost_worker *worker;
 	struct task_struct *task;
@@ -598,9 +617,9 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
 	if (!worker)
 		return NULL;
 
-	dev->worker = worker;
 	worker->kcov_handle = kcov_common_handle();
 	init_llist_head(&worker->work_list);
+	refcount_set(&worker->refcount, init_vq_map_count);
 
 	/*
 	 * vhost used to use the kthread API which ignores all signals by
@@ -617,10 +636,58 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
 
 free_worker:
 	kfree(worker);
-	dev->worker = NULL;
 	return NULL;
 }
 
+static struct vhost_worker *vhost_worker_find(struct vhost_dev *dev, pid_t pid)
+{
+	struct vhost_worker *worker = NULL;
+	int i;
+
+	for (i = 0; i < dev->nvqs; i++) {
+		if (dev->vqs[i]->worker->task->pid != pid)
+			continue;
+
+		worker = dev->vqs[i]->worker;
+		break;
+	}
+
+	return worker;
+}
+
+/* Caller must have device mutex */
+static int vhost_vq_setup_worker(struct vhost_virtqueue *vq,
+				 struct vhost_vring_worker *info)
+{
+	struct vhost_dev *dev = vq->dev;
+	struct vhost_worker *worker;
+
+	if (!dev->use_worker)
+		return -EINVAL;
+
+	/* We don't support setting a worker on an active vq */
+	if (vq->private_data)
+		return -EBUSY;
+
+	if (info->pid == VHOST_VRING_NEW_WORKER) {
+		worker = vhost_worker_create(dev, 1);
+		if (!worker)
+			return -ENOMEM;
+
+		info->pid = worker->task->pid;
+	} else {
+		worker = vhost_worker_find(dev, info->pid);
+		if (!worker)
+			return -ENODEV;
+
+		refcount_inc(&worker->refcount);
+	}
+
+	vhost_vq_clear_worker(vq);
+	vq->worker = worker;
+	return 0;
+}
+
 /* Caller should have device mutex */
 long vhost_dev_set_owner(struct vhost_dev *dev)
 {
@@ -636,7 +703,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 	vhost_attach_mm(dev);
 
 	if (dev->use_worker) {
-		worker = vhost_worker_create(dev);
+		worker = vhost_worker_create(dev, dev->nvqs);
 		if (!worker)
 			goto err_worker;
 
@@ -650,7 +717,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 
 	return 0;
 err_iovecs:
-	vhost_worker_free(dev);
+	vhost_workers_free(dev);
 err_worker:
 	vhost_detach_mm(dev);
 err_mm:
@@ -742,7 +809,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 	dev->iotlb = NULL;
 	vhost_clear_msg(dev);
 	wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
-	vhost_worker_free(dev);
+	vhost_workers_free(dev);
 	vhost_detach_mm(dev);
 }
 EXPORT_SYMBOL_GPL(vhost_dev_cleanup);
@@ -1612,6 +1679,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
 	struct eventfd_ctx *ctx = NULL;
 	u32 __user *idxp = argp;
 	struct vhost_virtqueue *vq;
+	struct vhost_vring_worker w;
 	struct vhost_vring_state s;
 	struct vhost_vring_file f;
 	u32 idx;
@@ -1719,6 +1787,15 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
 		if (copy_to_user(argp, &s, sizeof(s)))
 			r = -EFAULT;
 		break;
+	case VHOST_SET_VRING_WORKER:
+		if (copy_from_user(&w, argp, sizeof(w))) {
+			r = -EFAULT;
+			break;
+		}
+		r = vhost_vq_setup_worker(vq, &w);
+		if (!r && copy_to_user(argp, &w, sizeof(w)))
+			r = -EFAULT;
+		break;
 	default:
 		r = -ENOIOCTLCMD;
 	}
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 33c63b24187a..0911d1a9bd3b 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -35,6 +35,7 @@ struct vhost_worker {
 	struct llist_head	work_list;
 	u64			kcov_handle;
 	unsigned long		flags;
+	refcount_t		refcount;
 };
 
 /* Poll a file (eventfd or socket) */
@@ -160,7 +161,6 @@ struct vhost_dev {
 	struct vhost_virtqueue **vqs;
 	int nvqs;
 	struct eventfd_ctx *log_ctx;
-	struct vhost_worker *worker;
 	struct vhost_iotlb *umem;
 	struct vhost_iotlb *iotlb;
 	spinlock_t iotlb_lock;
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index c998860d7bbc..e5c0669430e5 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -70,6 +70,17 @@
 #define VHOST_VRING_BIG_ENDIAN 1
 #define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state)
 #define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
+/* By default, a device gets one vhost_worker created during VHOST_SET_OWNER
+ * that its virtqueues share. This allows userspace to create a vhost_worker
+ * and map a virtqueue to it or map a virtqueue to an existing worker.
+ *
+ * If pid > 0 and it matches an existing vhost_worker thread it will be bound
+ * to the vq. If pid is VHOST_VRING_NEW_WORKER, then a new worker will be
+ * created and bound to the vq.
+ *
+ * This must be called after VHOST_SET_OWNER and before the vq is active.
+ */
+#define VHOST_SET_VRING_WORKER _IOWR(VHOST_VIRTIO, 0x15, struct vhost_vring_worker)
 
 /* The following ioctls use eventfd file descriptors to signal and poll
  * for events. */
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index f7f6a3a28977..af654e3cef0e 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -47,6 +47,18 @@ struct vhost_vring_addr {
 	__u64 log_guest_addr;
 };
 
+#define VHOST_VRING_NEW_WORKER -1
+
+struct vhost_vring_worker {
+	unsigned int index;
+	/*
+	 * The pid of the vhost worker that the vq will be bound to. If
+	 * pid is VHOST_VRING_NEW_WORKER a new worker will be created and its
+	 * pid will be returned in pid.
+	 */
+	__kernel_pid_t pid;
+};
+
 /* no alignment requirement */
 struct vhost_iotlb_msg {
 	__u64 iova;
-- 
2.25.1

* Re: [PATCH V3 00/11] vhost: multiple worker support
  2021-10-22  5:18 ` Mike Christie
@ 2021-10-22  6:02   ` michael.christie
  -1 siblings, 0 replies; 74+ messages in thread
From: michael.christie @ 2021-10-22  6:02 UTC (permalink / raw)
  To: target-devel, linux-scsi, stefanha, pbonzini, jasowang, mst,
	sgarzare, virtualization

On 10/22/21 12:18 AM, Mike Christie wrote:
> Results:
> --------
> 
> fio jobs        1       2       4       8       12      16
> ----------------------------------------------------------
> 1 worker        84k    492k    510k    -       -       -

That should be 184k above.

> worker per vq   184k   380k    744k    1422k   2256k   2434k


* Re: [PATCH V3 00/11] vhost: multiple worker support
  2021-10-22  5:18 ` Mike Christie
@ 2021-10-22  9:48   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 74+ messages in thread
From: Michael S. Tsirkin @ 2021-10-22  9:48 UTC (permalink / raw)
  To: Mike Christie
  Cc: target-devel, linux-scsi, stefanha, pbonzini, jasowang, sgarzare,
	virtualization

On Fri, Oct 22, 2021 at 12:18:59AM -0500, Mike Christie wrote:
> The following patches apply over linus's tree and this patchset
> 
> https://lore.kernel.org/all/20211007214448.6282-1-michael.christie@oracle.com/
> 
> which allows us to check the vhost owner thread's RLIMITs:


Unfortunately that patchset in turn triggers kbuild warnings.
I was hoping you would address them; I don't think merging
that patchset is possible before the kbuild issues are addressed.

It also doesn't have many acks; I'm a bit apprehensive
about merging core changes like this through the vhost tree.
Try to CC more widely/ping people?

> It looks like that patchset has been ok'd by all the major parties
> and just needs a small cleanup to apply to Jens and Paul trees, so I
> wanted to post my threading patches based over it for review.
> 
> The following patches allow us to support multiple vhost workers per
> device. I ended up just doing Stefan's original idea where userspace has
> the kernel create a worker and we pass back the pid. This has the benefit
> over the workqueue and userspace thread approach where we only have
> one'ish code path in the kernel during setup to detect old tools. The
> main IO paths and device/vq setup/teardown paths all use common code.
> 
> I've also included a patch for qemu so you can get an idea of how it
> works. If we are ok with the kernel code then I'll break that up into
> a patchset and send to qemu-devel for review.
> 
> Results:
> --------
> 
> fio jobs        1       2       4       8       12      16
> ----------------------------------------------------------
> 1 worker        84k    492k    510k    -       -       -
> worker per vq   184k   380k    744k    1422k   2256k   2434k
> 
> Notes:
> 0. This used a simple fio command:
> 
> fio --filename=/dev/sdb  --direct=1 --rw=randrw --bs=4k \
> --ioengine=libaio --iodepth=128  --numjobs=$JOBS_ABOVE
> 
> and I used a VM with 16 vCPUs and 16 virtqueues.
> 
> 1. The patches were tested with emulate_pr=0 and these patches:
> 
> https://lore.kernel.org/all/yq1tuhge4bg.fsf@ca-mkp.ca.oracle.com/t/
> 
> which are in mkp's scsi branches for the next kernel. They fix the perf
> issues where IOPs dropped at 12 vqs/jobs.
> 
> 2. Because we have a hard limit of 1024 cmds, if the num jobs * iodepth
> was greater than 1024, I would decrease iodepth. So 12 jobs used 85 cmds,
> and 16 used 64.
> 
> 3. The perf issue above at 2 jobs is because when we only have 1 worker
> we execute more cmds per vhost_work due to all vqs funneling to one worker.
> This results in less context switches and better batching without having to
> tweak any settings. I'm working on patches to add back batching during lio
> completion and do polling on the submission side.
> 
> We will still want the threading patches, because if we batch at the fio
> level plus use the vhost theading patches, we can see a big boost like
> below. So hopefully doing it at the kernel will allow apps to just work
> without having to be smart like fio.
> 
> fio using io_uring and batching with the iodepth_batch* settings:
> 
> fio jobs        1       2       4       8       12      16
> -------------------------------------------------------------
> 1 worker        494k    520k    -       -       -       -
> worker per vq   496k    878k    1542k   2436k   2304k   2590k
> 
> V3:
> - fully convert vhost code to use vq based APIs instead of leaving it
> half per dev and half per vq.
> - rebase against kernel worker API.
> - Drop delayed worker creation. We always create the default worker at
> VHOST_SET_OWNER time. Userspace can create and bind workers after that.
> 
> v2:
> - change loop that we take a refcount to the worker in
> - replaced pid == -1 with define.
> - fixed tabbing/spacing coding style issue
> - use hash instead of list to lookup workers.
> - I dropped the patch that added an ioctl cmd to get a vq's worker's
> pid. I saw we might do a generic netlink interface instead.
> 
> 
> 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 00/11] vhost: multiple worker support
@ 2021-10-22  9:48   ` Michael S. Tsirkin
  0 siblings, 0 replies; 74+ messages in thread
From: Michael S. Tsirkin @ 2021-10-22  9:48 UTC (permalink / raw)
  To: Mike Christie
  Cc: linux-scsi, virtualization, target-devel, stefanha, pbonzini

On Fri, Oct 22, 2021 at 12:18:59AM -0500, Mike Christie wrote:
> The following patches apply over linus's tree and this patchset
> 
> https://lore.kernel.org/all/20211007214448.6282-1-michael.christie@oracle.com/
> 
> which allows us to check the vhost owner thread's RLIMITs:


Unfortunately that patchset in turn triggers kbuild warnings.
I was hoping you would address them, I don't think
merging that patchset before kbuild issues are addressed
is possible.

It also doesn't have lots of acks, I'm a bit apprehensive
of merging core changes like this through the vhost tree.
Try to CC more widely/ping people?

> It looks like that patchset has been ok'd by all the major parties
> and just needs a small cleanup to apply to Jens and Paul trees, so I
> wanted to post my threading patches based over it for review.
> 
> The following patches allow us to support multiple vhost workers per
> device. I ended up just doing Stefan's original idea where userspace has
> the kernel create a worker and we pass back the pid. This has the benefit
> over the workqueue and userspace thread approach where we only have
> one'ish code path in the kernel during setup to detect old tools. The
> main IO paths and device/vq setup/teardown paths all use common code.
> 
> I've also included a patch for qemu so you can get an idea of how it
> works. If we are ok with the kernel code then I'll break that up into
> a patchset and send to qemu-devel for review.
> 
> Results:
> --------
> 
> fio jobs        1       2       4       8       12      16
> ----------------------------------------------------------
> 1 worker        84k    492k    510k    -       -       -
> worker per vq   184k   380k    744k    1422k   2256k   2434k
> 
> Notes:
> 0. This used a simple fio command:
> 
> fio --filename=/dev/sdb  --direct=1 --rw=randrw --bs=4k \
> --ioengine=libaio --iodepth=128  --numjobs=$JOBS_ABOVE
> 
> and I used a VM with 16 vCPUs and 16 virtqueues.
> 
> 1. The patches were tested with emulate_pr=0 and these patches:
> 
> https://lore.kernel.org/all/yq1tuhge4bg.fsf@ca-mkp.ca.oracle.com/t/
> 
> which are in mkp's scsi branches for the next kernel. They fix the perf
> issues where IOPs dropped at 12 vqs/jobs.
> 
> 2. Because we have a hard limit of 1024 cmds, if the num jobs * iodepth
> was greater than 1024, I would decrease iodepth. So 12 jobs used 85 cmds,
> and 16 used 64.
> 
> 3. The perf issue above at 2 jobs is because when we only have 1 worker
> we execute more cmds per vhost_work due to all vqs funneling to one worker.
> This results in less context switches and better batching without having to
> tweak any settings. I'm working on patches to add back batching during lio
> completion and do polling on the submission side.
> 
> We will still want the threading patches, because if we batch at the fio
> level plus use the vhost theading patches, we can see a big boost like
> below. So hopefully doing it at the kernel will allow apps to just work
> without having to be smart like fio.
> 
> fio using io_uring and batching with the iodepth_batch* settings:
> 
> fio jobs        1       2       4       8       12      16
> -------------------------------------------------------------
> 1 worker        494k    520k    -       -       -       -
> worker per vq   496k    878k    1542k   2436k   2304k   2590k
> 
> V3:
> - fully convert vhost code to use vq based APIs instead of leaving it
> half per dev and half per vq.
> - rebase against kernel worker API.
> - Drop delayed worker creation. We always create the default worker at
> VHOST_SET_OWNER time. Userspace can create and bind workers after that.
> 
> v2:
> - change loop that we take a refcount to the worker in
> - replaced pid == -1 with define.
> - fixed tabbing/spacing coding style issue
> - use hash instead of list to lookup workers.
> - I dropped the patch that added an ioctl cmd to get a vq's worker's
> pid. I saw we might do a generic netlink interface instead.
> 
> 
> 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 00/11] vhost: multiple worker support
  2021-10-22  6:02   ` michael.christie
@ 2021-10-22  9:49     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 74+ messages in thread
From: Michael S. Tsirkin @ 2021-10-22  9:49 UTC (permalink / raw)
  To: michael.christie
  Cc: linux-scsi, virtualization, target-devel, stefanha, pbonzini

On Fri, Oct 22, 2021 at 01:02:19AM -0500, michael.christie@oracle.com wrote:
> On 10/22/21 12:18 AM, Mike Christie wrote:
> > Results:
> > --------
> > 
> > fio jobs        1       2       4       8       12      16
> > ----------------------------------------------------------
> > 1 worker        84k    492k    510k    -       -       -
> 
> That should be 184k above.

Nice. I'd like to merge this, but I'm blocked because of a dependency
(since we can't allow userspace to create threads without any limit).

> > worker per vq   184k   380k    744k    1422k   2256k   2434k


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 11/11] vhost: allow userspace to create workers
  2021-10-22  5:19   ` Mike Christie
@ 2021-10-22 10:47     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 74+ messages in thread
From: Michael S. Tsirkin @ 2021-10-22 10:47 UTC (permalink / raw)
  To: Mike Christie
  Cc: target-devel, linux-scsi, stefanha, pbonzini, jasowang, sgarzare,
	virtualization

On Fri, Oct 22, 2021 at 12:19:11AM -0500, Mike Christie wrote:
> This patch allows userspace to create workers and bind them to vqs. You
> can have N workers per dev and also share N workers with M vqs.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/vhost/vhost.c            | 99 ++++++++++++++++++++++++++++----
>  drivers/vhost/vhost.h            |  2 +-
>  include/uapi/linux/vhost.h       | 11 ++++
>  include/uapi/linux/vhost_types.h | 12 ++++
>  4 files changed, 112 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 04f43a6445e1..c86e88d7f35c 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -493,7 +493,6 @@ void vhost_dev_init(struct vhost_dev *dev,
>  	dev->umem = NULL;
>  	dev->iotlb = NULL;
>  	dev->mm = NULL;
> -	dev->worker = NULL;
>  	dev->iov_limit = iov_limit;
>  	dev->weight = weight;
>  	dev->byte_weight = byte_weight;
> @@ -576,20 +575,40 @@ static void vhost_worker_stop(struct vhost_worker *worker)
>  	wait_for_completion(worker->exit_done);
>  }
>  
> -static void vhost_worker_free(struct vhost_dev *dev)
> -{
> -	struct vhost_worker *worker = dev->worker;
>  
> +static void vhost_worker_put(struct vhost_worker *worker)
> +{
>  	if (!worker)
>  		return;
>  
> -	dev->worker = NULL;
> +	if (!refcount_dec_and_test(&worker->refcount))
> +		return;
> +
>  	WARN_ON(!llist_empty(&worker->work_list));
>  	vhost_worker_stop(worker);
>  	kfree(worker);
>  }
>  
> -static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
> +static void vhost_vq_clear_worker(struct vhost_virtqueue *vq)
> +{
> +	if (vq->worker)
> +		vhost_worker_put(vq->worker);
> +	vq->worker = NULL;
> +}
> +
> +static void vhost_workers_free(struct vhost_dev *dev)
> +{
> +	int i;
> +
> +	if (!dev->use_worker)
> +		return;
> +
> +	for (i = 0; i < dev->nvqs; i++)
> +		vhost_vq_clear_worker(dev->vqs[i]);
> +}
> +
> +static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev,
> +						int init_vq_map_count)
>  {
>  	struct vhost_worker *worker;
>  	struct task_struct *task;
> @@ -598,9 +617,9 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
>  	if (!worker)
>  		return NULL;
>  
> -	dev->worker = worker;
>  	worker->kcov_handle = kcov_common_handle();
>  	init_llist_head(&worker->work_list);
> +	refcount_set(&worker->refcount, init_vq_map_count);
>  
>  	/*
>  	 * vhost used to use the kthread API which ignores all signals by
> @@ -617,10 +636,58 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
>  
>  free_worker:
>  	kfree(worker);
> -	dev->worker = NULL;
>  	return NULL;
>  }
>  
> +static struct vhost_worker *vhost_worker_find(struct vhost_dev *dev, pid_t pid)
> +{
> +	struct vhost_worker *worker = NULL;
> +	int i;
> +
> +	for (i = 0; i < dev->nvqs; i++) {
> +		if (dev->vqs[i]->worker->task->pid != pid)
> +			continue;
> +
> +		worker = dev->vqs[i]->worker;
> +		break;
> +	}
> +
> +	return worker;
> +}
> +
> +/* Caller must have device mutex */
> +static int vhost_vq_setup_worker(struct vhost_virtqueue *vq,
> +				 struct vhost_vring_worker *info)
> +{
> +	struct vhost_dev *dev = vq->dev;
> +	struct vhost_worker *worker;
> +
> +	if (!dev->use_worker)
> +		return -EINVAL;
> +
> +	/* We don't support setting a worker on an active vq */
> +	if (vq->private_data)
> +		return -EBUSY;
> +
> +	if (info->pid == VHOST_VRING_NEW_WORKER) {
> +		worker = vhost_worker_create(dev, 1);
> +		if (!worker)
> +			return -ENOMEM;
> +
> +		info->pid = worker->task->pid;
> +	} else {
> +		worker = vhost_worker_find(dev, info->pid);
> +		if (!worker)
> +			return -ENODEV;
> +
> +		refcount_inc(&worker->refcount);
> +	}
> +
> +	vhost_vq_clear_worker(vq);
> +	vq->worker = worker;
> +	return 0;
> +}
> +
>  /* Caller should have device mutex */
>  long vhost_dev_set_owner(struct vhost_dev *dev)
>  {
> @@ -636,7 +703,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
>  	vhost_attach_mm(dev);
>  
>  	if (dev->use_worker) {
> -		worker = vhost_worker_create(dev);
> +		worker = vhost_worker_create(dev, dev->nvqs);
>  		if (!worker)
>  			goto err_worker;
>  
> @@ -650,7 +717,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
>  
>  	return 0;
>  err_iovecs:
> -	vhost_worker_free(dev);
> +	vhost_workers_free(dev);
>  err_worker:
>  	vhost_detach_mm(dev);
>  err_mm:
> @@ -742,7 +809,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>  	dev->iotlb = NULL;
>  	vhost_clear_msg(dev);
>  	wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
> -	vhost_worker_free(dev);
> +	vhost_workers_free(dev);
>  	vhost_detach_mm(dev);
>  }
>  EXPORT_SYMBOL_GPL(vhost_dev_cleanup);
> @@ -1612,6 +1679,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
>  	struct eventfd_ctx *ctx = NULL;
>  	u32 __user *idxp = argp;
>  	struct vhost_virtqueue *vq;
> +	struct vhost_vring_worker w;
>  	struct vhost_vring_state s;
>  	struct vhost_vring_file f;
>  	u32 idx;
> @@ -1719,6 +1787,15 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
>  		if (copy_to_user(argp, &s, sizeof(s)))
>  			r = -EFAULT;
>  		break;
> +	case VHOST_SET_VRING_WORKER:
> +		if (copy_from_user(&w, argp, sizeof(w))) {
> +			r = -EFAULT;
> +			break;
> +		}
> +		r = vhost_vq_setup_worker(vq, &w);
> +		if (!r && copy_to_user(argp, &w, sizeof(w)))
> +			r = -EFAULT;
> +		break;
>  	default:
>  		r = -ENOIOCTLCMD;
>  	}
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 33c63b24187a..0911d1a9bd3b 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -35,6 +35,7 @@ struct vhost_worker {
>  	struct llist_head	work_list;
>  	u64			kcov_handle;
>  	unsigned long		flags;
> +	refcount_t		refcount;
>  };
>  
>  /* Poll a file (eventfd or socket) */
> @@ -160,7 +161,6 @@ struct vhost_dev {
>  	struct vhost_virtqueue **vqs;
>  	int nvqs;
>  	struct eventfd_ctx *log_ctx;
> -	struct vhost_worker *worker;
>  	struct vhost_iotlb *umem;
>  	struct vhost_iotlb *iotlb;
>  	spinlock_t iotlb_lock;
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index c998860d7bbc..e5c0669430e5 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -70,6 +70,17 @@
>  #define VHOST_VRING_BIG_ENDIAN 1
>  #define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state)
>  #define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
> +/* By default, a device gets one vhost_worker created during VHOST_SET_OWNER
> + * that its virtqueues share. This allows userspace to create a vhost_worker
> + * and map a virtqueue to it or map a virtqueue to an existing worker.
> + *
> + * If pid > 0 and it matches an existing vhost_worker thread it will be bound
> + * to the vq. If pid is VHOST_VRING_NEW_WORKER, then a new worker will be
> + * created and bound to the vq.
> + *
> + * This must be called after VHOST_SET_OWNER and before the vq is active.
> + */

A couple of things here:
it's probably a good idea not to make it match pid exactly,
if for no other reason than I'm not sure we want to
commit this being a pid. Let's just call it an id?
And maybe byteswap it or xor with some value
just to make sure userspace does not begin abusing it anyway.

Also, interaction with pid namespace is unclear to me.
Can you document what happens here?
No need to fix funky things like moving the fd between
pid namespaces while also creating/destroying workers, but let's
document it's not supported.
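
For reference, a minimal sketch of how userspace would drive this ioctl as
documented in the quoted comment (illustrative only; VHOST_SET_VRING_WORKER,
struct vhost_vring_worker and VHOST_VRING_NEW_WORKER come from the patch
under review, so the names and the pid field may well change per the
comments above):

#include <sys/ioctl.h>
/* Assumes headers patched with the uapi additions from this series. */

/* Give vq 0 a brand new worker, then make vq 1 share it. */
static int bind_vq_workers(int vhost_fd)
{
	struct vhost_vring_worker w = {
		.index = 0,
		.pid = VHOST_VRING_NEW_WORKER,	/* ask the kernel to create a worker */
	};

	/* On success the kernel returns the new worker's pid in w.pid. */
	if (ioctl(vhost_fd, VHOST_SET_VRING_WORKER, &w) < 0)
		return -1;

	/* Reuse the returned worker for vq 1. */
	w.index = 1;
	return ioctl(vhost_fd, VHOST_SET_VRING_WORKER, &w) < 0 ? -1 : 0;
}

As the quoted comment states, this would have to run after VHOST_SET_OWNER
and before the vqs are activated.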


> +#define VHOST_SET_VRING_WORKER _IOWR(VHOST_VIRTIO, 0x15, struct vhost_vring_worker)
>  
>  /* The following ioctls use eventfd file descriptors to signal and poll
>   * for events. */
> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
> index f7f6a3a28977..af654e3cef0e 100644
> --- a/include/uapi/linux/vhost_types.h
> +++ b/include/uapi/linux/vhost_types.h
> @@ -47,6 +47,18 @@ struct vhost_vring_addr {
>  	__u64 log_guest_addr;
>  };
>  
> +#define VHOST_VRING_NEW_WORKER -1
> +
> +struct vhost_vring_worker {
> +	unsigned int index;
> +	/*
> +	 * The pid of the vhost worker that the vq will be bound to. If
> +	 * pid is VHOST_VRING_NEW_WORKER a new worker will be created and its
> +	 * pid will be returned in pid.
> +	 */
> +	__kernel_pid_t pid;
> +};
> +
>  /* no alignment requirement */
>  struct vhost_iotlb_msg {
>  	__u64 iova;
> -- 
> 2.25.1


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 00/11] vhost: multiple worker support
  2021-10-22  9:48   ` Michael S. Tsirkin
@ 2021-10-22 15:54     ` michael.christie
  -1 siblings, 0 replies; 74+ messages in thread
From: michael.christie @ 2021-10-22 15:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-scsi, virtualization, target-devel, stefanha, pbonzini,
	Christian Brauner

Ccing Christian for the kernel worker API merging stuff.

On 10/22/21 4:48 AM, Michael S. Tsirkin wrote:
> On Fri, Oct 22, 2021 at 12:18:59AM -0500, Mike Christie wrote:
>> The following patches apply over linus's tree and this patchset
>>
>> https://urldefense.com/v3/__https://lore.kernel.org/all/20211007214448.6282-1-michael.christie@oracle.com/__;!!ACWV5N9M2RV99hQ!aqbE06mycEW-AMIj5avlBMDSvg2FONlNdYHr8PcNKdvl5FeO4QLCxCOyaVg8g8C2_Kp5$ 
>>
>> which allows us to check the vhost owner thread's RLIMITs:
> 
> 
> Unfortunately that patchset in turn triggers kbuild warnings.

Yeah, that's the Jens/Paul issue I mentioned. I have to remove the
old create_io_thread code and resolve issues with their trees. Paul's
tree has a conflict with Jens and then my patch has an issue with Paul's
patches.

So Christian and I thought we would re-push the patchset through
Christian after that has settled in 5.16-rc1 and then shoot for 5.17
so it has time to bake in next.


> I was hoping you would address them; I don't think
> merging that patchset before kbuild issues are addressed
> is possible.
> 
> It also doesn't have lots of acks; I'm a bit apprehensive
> of merging core changes like this through the vhost tree.

Ok. Just to make sure we are on the same page. Christian was going to
push the kernel worker API changes.

> Try to CC more widely/ping people?

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 11/11] vhost: allow userspace to create workers
  2021-10-22 10:47     ` Michael S. Tsirkin
@ 2021-10-22 16:12       ` michael.christie
  -1 siblings, 0 replies; 74+ messages in thread
From: michael.christie @ 2021-10-22 16:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: target-devel, linux-scsi, stefanha, pbonzini, jasowang, sgarzare,
	virtualization

On 10/22/21 5:47 AM, Michael S. Tsirkin wrote:
>> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
>> index c998860d7bbc..e5c0669430e5 100644
>> --- a/include/uapi/linux/vhost.h
>> +++ b/include/uapi/linux/vhost.h
>> @@ -70,6 +70,17 @@
>>  #define VHOST_VRING_BIG_ENDIAN 1
>>  #define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state)
>>  #define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
>> +/* By default, a device gets one vhost_worker created during VHOST_SET_OWNER
>> + * that its virtqueues share. This allows userspace to create a vhost_worker
>> + * and map a virtqueue to it or map a virtqueue to an existing worker.
>> + *
>> + * If pid > 0 and it matches an existing vhost_worker thread it will be bound
>> + * to the vq. If pid is VHOST_VRING_NEW_WORKER, then a new worker will be
>> + * created and bound to the vq.
>> + *
>> + * This must be called after VHOST_SET_OWNER and before the vq is active.
>> + */
> 
> A couple of things here:
> it's probably a good idea not to make it match pid exactly,
> if for no other reason than I'm not sure we want to
> commit this being a pid. Let's just call it an id?

Ok.

> And maybe byteswap it or xor with some value
> just to make sure userspace does not begin abusing it anyway.
> 
> Also, interaction with pid namespace is unclear to me.
> Can you document what happens here?

This current patchset only allows the vhost_dev owner to
create/bind workers for devices it owns, so namespaces don't come
into play. If a thread from another namespace tried to create/bind
a worker we would hit the owner checks in vhost_dev_ioctl which is
done before vhost_vring_ioctl normally (for vdpa we hit the use_worker
check and fail there).

However, with the kernel worker API changes the worker threads will
now be in the vhost dev owner's namespace and not the kthreadd/default
one, so in the future we are covered if we want to do something more
advanced. For example, I've seen people working on an API to export the
worker pids:

https://lore.kernel.org/netdev/20210507154332.hiblsd6ot5wzwkdj@steredhat/T/

and in the future for interfaces that export that info we could restrict
access to root or users from the same namespace or I guess add interfaces
to allow different namespaces to see the workers and share them.


> No need to fix funky things like moving the fd between
> pid namespaces while also creating/destroying workers, but let's
> document it's not supported.

Ok. I'll add a comment.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 11/11] vhost: allow userspace to create workers
  2021-10-22 16:12       ` michael.christie
@ 2021-10-22 18:17         ` michael.christie
  -1 siblings, 0 replies; 74+ messages in thread
From: michael.christie @ 2021-10-22 18:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: target-devel, linux-scsi, stefanha, pbonzini, jasowang, sgarzare,
	virtualization

On 10/22/21 11:12 AM, michael.christie@oracle.com wrote:
> On 10/22/21 5:47 AM, Michael S. Tsirkin wrote:
>>> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
>>> index c998860d7bbc..e5c0669430e5 100644
>>> --- a/include/uapi/linux/vhost.h
>>> +++ b/include/uapi/linux/vhost.h
>>> @@ -70,6 +70,17 @@
>>>  #define VHOST_VRING_BIG_ENDIAN 1
>>>  #define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state)
>>>  #define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
>>> +/* By default, a device gets one vhost_worker created during VHOST_SET_OWNER
>>> + * that its virtqueues share. This allows userspace to create a vhost_worker
>>> + * and map a virtqueue to it or map a virtqueue to an existing worker.
>>> + *
>>> + * If pid > 0 and it matches an existing vhost_worker thread it will be bound
>>> + * to the vq. If pid is VHOST_VRING_NEW_WORKER, then a new worker will be
>>> + * created and bound to the vq.
>>> + *
>>> + * This must be called after VHOST_SET_OWNER and before the vq is active.
>>> + */
>>
>> A couple of things here:
>> it's probably a good idea not to make it match pid exactly,
>> if for no other reason than I'm not sure we want to
>> commit this being a pid. Let's just call it an id?
> 
> Ok.
> 
>> And maybe byteswap it or xor with some value
>> just to make sure userspace does not begin abusing it anyway.
>>
>> Also, interaction with pid namespace is unclear to me.
>> Can you document what happens here?
> 
> This current patchset only allows the vhost_dev owner to
> create/bind workers for devices it owns, so namespaces don't come

I made a mistake here. The patches do restrict VHOST_SET_VRING_WORKER
to the same owner like I wrote. However, it looks like we could have 2
threads with the same mm pointer so vhost_dev_check_owner returns true,
but they could be in different namespaces.

Even though we are not going to pass the pid_t between user/kernel
space, should I add a pid namespace check when I repost the patches?
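
For illustration, the kind of check being discussed might look like the
sketch below (purely hypothetical, not part of the posted series; it assumes
vhost_dev grows a pid_ns field recorded at VHOST_SET_OWNER time):

#include <linux/pid.h>
#include <linux/pid_namespace.h>

/* Hypothetical: reject worker ioctls issued from a different pid
 * namespace than the one that called VHOST_SET_OWNER. Assumes
 * dev->pid_ns was saved with get_pid_ns(task_active_pid_ns(current))
 * when the owner was set.
 */
static bool vhost_dev_check_pid_ns(struct vhost_dev *dev)
{
	return task_active_pid_ns(current) == dev->pid_ns;
}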



> into play. If a thread from another namespace tried to create/bind
> a worker we would hit the owner checks in vhost_dev_ioctl which is
> done before vhost_vring_ioctl normally (for vdpa we hit the use_worker
> check and fail there).
> 
> However, with the kernel worker API changes the worker threads will
> now be in the vhost dev owner's namespace and not the kthreadd/default
> one, so in the future we are covered if we want to do something more
> advanced. For example, I've seen people working on an API to export the
> worker pids:
> 
> https://lore.kernel.org/netdev/20210507154332.hiblsd6ot5wzwkdj@steredhat/T/
> 
> and in the future for interfaces that export that info we could restrict
> access to root or users from the same namespace or I guess add interfaces
> to allow different namespaces to see the workers and share them.
> 
> 
>> No need to fix funky things like moving the fd between
>> pid namespaces while also creating/destroying workers, but let's
>> document it's not supported.
> 
> Ok. I'll add a comment.
> 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 11/11] vhost: allow userspace to create workers
  2021-10-22 18:17         ` michael.christie
@ 2021-10-23 20:11           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 74+ messages in thread
From: Michael S. Tsirkin @ 2021-10-23 20:11 UTC (permalink / raw)
  To: michael.christie
  Cc: target-devel, linux-scsi, stefanha, pbonzini, jasowang, sgarzare,
	virtualization

On Fri, Oct 22, 2021 at 01:17:26PM -0500, michael.christie@oracle.com wrote:
> On 10/22/21 11:12 AM, michael.christie@oracle.com wrote:
> > On 10/22/21 5:47 AM, Michael S. Tsirkin wrote:
> >>> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> >>> index c998860d7bbc..e5c0669430e5 100644
> >>> --- a/include/uapi/linux/vhost.h
> >>> +++ b/include/uapi/linux/vhost.h
> >>> @@ -70,6 +70,17 @@
> >>>  #define VHOST_VRING_BIG_ENDIAN 1
> >>>  #define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state)
> >>>  #define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
> >>> +/* By default, a device gets one vhost_worker created during VHOST_SET_OWNER
> >>> + * that its virtqueues share. This allows userspace to create a vhost_worker
> >>> + * and map a virtqueue to it or map a virtqueue to an existing worker.
> >>> + *
> >>> + * If pid > 0 and it matches an existing vhost_worker thread it will be bound
> >>> + * to the vq. If pid is VHOST_VRING_NEW_WORKER, then a new worker will be
> >>> + * created and bound to the vq.
> >>> + *
> >>> + * This must be called after VHOST_SET_OWNER and before the vq is active.
> >>> + */
> >>
> >> A couple of things here:
> >> it's probably a good idea not to make it match pid exactly,
> >> if for no other reason than I'm not sure we want to
> >> commit this being a pid. Let's just call it an id?
> > 
> > Ok.
> > 
> >> And maybe byteswap it or xor with some value
> >> just to make sure userspace does not begin abusing it anyway.
> >>
> >> Also, interaction with pid namespace is unclear to me.
> >> Can you document what happens here?
> > 
> > This current patchset only allows the vhost_dev owner to
> > create/bind workers for devices it owns, so namespaces don't come
> 
> I made a mistake here. The patches do restrict VHOST_SET_VRING_WORKER
> to the same owner like I wrote. However, it looks like we could have 2
> threads with the same mm pointer so vhost_dev_check_owner returns true,
> but they could be in different namespaces.
> 
> Even though we are not going to pass the pid_t between user/kernel
> space, should I add a pid namespace check when I repost the patches?

Um, it's part of the ioctl. How are you not going to pass it around?

So if we do worry about this, I would just make it a 64 bit integer,
rename it "id" and increment each time a thread is created.
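
Something along those lines, for illustration only (the posted series still
uses a __kernel_pid_t, so this is just a sketch of the suggested direction):

struct vhost_vring_worker {
	unsigned int index;
	/*
	 * Opaque worker id. A reserved sentinel (the equivalent of
	 * today's VHOST_VRING_NEW_WORKER) would request a new worker,
	 * and the kernel would return the new worker's id here.
	 */
	__u64 id;
};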
 
> 
> > into play. If a thread from another namespace tried to create/bind
> > a worker we would hit the owner checks in vhost_dev_ioctl which is
> > done before vhost_vring_ioctl normally (for vdpa we hit the use_worker
> > check and fail there).
> > 
> > However, with the kernel worker API changes the worker threads will
> > now be in the vhost dev owner's namespace and not the kthreadd/default
> > one, so in the future we are covered if we want to do something more
> > advanced. For example, I've seen people working on an API to export the
> > worker pids:
> > 
> > https://lore.kernel.org/netdev/20210507154332.hiblsd6ot5wzwkdj@steredhat/T/
> > 
> > and in the future for interfaces that export that info we could restrict
> > access to root or users from the same namespace or I guess add interfaces
> > to allow different namespaces to see the workers and share them.
> > 
> > 
> >> No need to fix funky things like moving the fd between
> >> pid namespaces while also creating/destroying workers, but let's
> >> document it's not supported.
> > 
> > Ok. I'll add a comment.
> > 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 00/11] vhost: multiple worker support
  2021-10-22 15:54     ` michael.christie
@ 2021-10-23 20:12       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 74+ messages in thread
From: Michael S. Tsirkin @ 2021-10-23 20:12 UTC (permalink / raw)
  To: michael.christie
  Cc: target-devel, linux-scsi, stefanha, pbonzini, jasowang, sgarzare,
	virtualization, Christian Brauner

On Fri, Oct 22, 2021 at 10:54:24AM -0500, michael.christie@oracle.com wrote:
> Ccing Christian for the kernel worker API merging stuff.
> 
> On 10/22/21 4:48 AM, Michael S. Tsirkin wrote:
> > On Fri, Oct 22, 2021 at 12:18:59AM -0500, Mike Christie wrote:
> >> The following patches apply over linus's tree and this patchset
> >>
> >> https://urldefense.com/v3/__https://lore.kernel.org/all/20211007214448.6282-1-michael.christie@oracle.com/__;!!ACWV5N9M2RV99hQ!aqbE06mycEW-AMIj5avlBMDSvg2FONlNdYHr8PcNKdvl5FeO4QLCxCOyaVg8g8C2_Kp5$ 
> >>
> >> which allows us to check the vhost owner thread's RLIMITs:
> > 
> > 
> > Unfortunately that patchset in turn triggers kbuild warnings.
> 
> Yeah, that's the Jens/Paul issue I mentioned. I have to remove the
> old create_io_thread code and resolve issues with their trees. Paul's
> tree has a conflict with Jens and then my patch has an issue with Paul's
> patches.
> 
> So Christian and I thought we would re-push the patchset through
> Christian after that has settled in 5.16-rc1 and then shoot for 5.17
> so it has time to bake in next.
> 

Sounds good to me.

> > I was hoping you would address them; I don't think
> > merging that patchset before kbuild issues are addressed
> > is possible.
> > 
> > It also doesn't have lots of acks; I'm a bit apprehensive
> > of merging core changes like this through the vhost tree.
> 
> Ok. Just to make sure we are on the same page. Christian was going to
> push the kernel worker API changes.

Fine.

> > Try to CC more widely/ping people?


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 06/11] vhost-sock: convert to vq helpers
  2021-10-22  5:19   ` Mike Christie
@ 2021-10-25  9:08     ` Stefano Garzarella
  -1 siblings, 0 replies; 74+ messages in thread
From: Stefano Garzarella @ 2021-10-25  9:08 UTC (permalink / raw)
  To: Mike Christie
  Cc: target-devel, linux-scsi, stefanha, pbonzini, jasowang, mst,
	virtualization

On Fri, Oct 22, 2021 at 12:19:06AM -0500, Mike Christie wrote:
>Convert from vhost dev based helpers to vq ones.
>
>Signed-off-by: Mike Christie <michael.christie@oracle.com>
>---
> drivers/vhost/vsock.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index 938aefbc75ec..c50c60d0955e 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -300,7 +300,7 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
> 	list_add_tail(&pkt->list, &vsock->send_pkt_list);
> 	spin_unlock_bh(&vsock->send_pkt_list_lock);
>
>-	vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
>+	vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_TX], &vsock->send_pkt_work);

I think we should use VSOCK_VQ_RX. I know the nomenclature is weird,
but it's from the guest's point of view, so the host uses VSOCK_VQ_RX
when sending packets; see vhost_transport_send_pkt_work().
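
In other words, the line in this hunk would presumably become something like
the following (sketch of the suggested fix, not a posted patch):

	/* Host-to-guest packets go on what the guest sees as its RX queue,
	 * so queue the send work on the VSOCK_VQ_RX vq's worker.
	 */
	vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX], &vsock->send_pkt_work);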


>
> 	rcu_read_unlock();
> 	return len;
>@@ -612,7 +612,7 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
> 	/* Some packets may have been queued before the device was started,
> 	 * let's kick the send worker to send them.
> 	 */
>-	vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
>+	vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_TX], &vsock->send_pkt_work);

Ditto.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 11/11] vhost: allow userspace to create workers
  2021-10-23 20:11           ` Michael S. Tsirkin
@ 2021-10-25 16:04             ` michael.christie
  -1 siblings, 0 replies; 74+ messages in thread
From: michael.christie @ 2021-10-25 16:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: target-devel, linux-scsi, stefanha, pbonzini, jasowang, sgarzare,
	virtualization

On 10/23/21 3:11 PM, Michael S. Tsirkin wrote:
> On Fri, Oct 22, 2021 at 01:17:26PM -0500, michael.christie@oracle.com wrote:
>> On 10/22/21 11:12 AM, michael.christie@oracle.com wrote:
>>> On 10/22/21 5:47 AM, Michael S. Tsirkin wrote:
>>>>> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
>>>>> index c998860d7bbc..e5c0669430e5 100644
>>>>> --- a/include/uapi/linux/vhost.h
>>>>> +++ b/include/uapi/linux/vhost.h
>>>>> @@ -70,6 +70,17 @@
>>>>>  #define VHOST_VRING_BIG_ENDIAN 1
>>>>>  #define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state)
>>>>>  #define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
>>>>> +/* By default, a device gets one vhost_worker created during VHOST_SET_OWNER
>>>>> + * that its virtqueues share. This allows userspace to create a vhost_worker
>>>>> + * and map a virtqueue to it or map a virtqueue to an existing worker.
>>>>> + *
>>>>> + * If pid > 0 and it matches an existing vhost_worker thread it will be bound
>>>>> + * to the vq. If pid is VHOST_VRING_NEW_WORKER, then a new worker will be
>>>>> + * created and bound to the vq.
>>>>> + *
>>>>> + * This must be called after VHOST_SET_OWNER and before the vq is active.
>>>>> + */
>>>>
>>>> A couple of things here:
>>>> it's probably a good idea not to make it match pid exactly,
>>>> if for no other reason than I'm not sure we want to
>>>> commit this being a pid. Let's just call it an id?
>>>
>>> Ok.
>>>
>>>> And maybe byteswap it or xor with some value
>>>> just to make sure userspace does not begin abusing it anyway.
>>>>
>>>> Also, interaction with pid namespace is unclear to me.
>>>> Can you document what happens here?
>>>
>>> This current patchset only allows the vhost_dev owner to
>>> create/bind workers for devices it owns, so namespaces don't come
>>
>> I made a mistake here. The patches do restrict VHOST_SET_VRING_WORKER
>> to the same owner like I wrote. However, it looks like we could have 2
>> threads with the same mm pointer so vhost_dev_check_owner returns true,
>> but they could be in different namespaces.
>>
>> Even though we are not going to pass the pid_t between user/kernel
>> space, should I add a pid namespace check when I repost the patches?
> 
> Um, it's part of the ioctl. How are you not going to pass it around?

By "not passing a pid around" I was referring to your comment about
obfuscating the pid. I might have misunderstood you and thought you
wanted to do something more like what you suggested below, where it's
just an opaque int as far as userspace knows.


> 
> So if we do worry about this, I would just make it a 64 bit integer,
> rename it "id" and increment each time a thread is created.
>  
Yeah, this works for me. I just used an ida to allocate the id. We can
then use its lookup functions too.
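
As a point of reference, a minimal sketch of the ida-based allocation being
described (the helper names and the choice to start ids at 1 are illustrative
and not from the posted series):

#include <linux/idr.h>

static DEFINE_IDA(vhost_worker_ida);

/* Hand out a small unique id per worker. Starting at 1 keeps 0 and
 * negative values free for special meanings in the uapi.
 */
static int vhost_worker_alloc_id(void)
{
        return ida_alloc_min(&vhost_worker_ida, 1, GFP_KERNEL);
}

static void vhost_worker_free_id(int id)
{
        ida_free(&vhost_worker_ida, id);
}

Note that an ida only hands out numbers; looking a worker back up by id would
need an idr or xarray keyed on this value.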

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 06/11] vhost-sock: convert to vq helpers
  2021-10-25  9:08     ` Stefano Garzarella
@ 2021-10-25 16:09       ` michael.christie
  -1 siblings, 0 replies; 74+ messages in thread
From: michael.christie @ 2021-10-25 16:09 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: target-devel, linux-scsi, stefanha, pbonzini, jasowang, mst,
	virtualization

On 10/25/21 4:08 AM, Stefano Garzarella wrote:
> On Fri, Oct 22, 2021 at 12:19:06AM -0500, Mike Christie wrote:
>> Convert from vhost dev based helpers to vq ones.
>>
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>> ---
>> drivers/vhost/vsock.c | 8 +++++---
>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> index 938aefbc75ec..c50c60d0955e 100644
>> --- a/drivers/vhost/vsock.c
>> +++ b/drivers/vhost/vsock.c
>> @@ -300,7 +300,7 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
>>     list_add_tail(&pkt->list, &vsock->send_pkt_list);
>>     spin_unlock_bh(&vsock->send_pkt_list_lock);
>>
>> -    vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
>> +    vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_TX], &vsock->send_pkt_work);
> 
> I think we should use VSOCK_VQ_RX. I know, the nomenclature is weird, but it's from the guest's point of view, so the host when sending packets uses the VSOCK_VQ_RX, see vhost_transport_send_pkt_work().
> 
> 
>>
>>     rcu_read_unlock();
>>     return len;
>> @@ -612,7 +612,7 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
>>     /* Some packets may have been queued before the device was started,
>>      * let's kick the send worker to send them.
>>      */
>> -    vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
>> +    vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_TX], &vsock->send_pkt_work);
> 
> Ditto.
> 

You're right. I'll fix. Thanks.
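
Presumably the respin will just flip the index at both call sites, i.e.:

-	vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_TX], &vsock->send_pkt_work);
+	vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX], &vsock->send_pkt_work);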

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 11/11] vhost: allow userspace to create workers
  2021-10-25 16:04             ` michael.christie
@ 2021-10-25 17:14               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 74+ messages in thread
From: Michael S. Tsirkin @ 2021-10-25 17:14 UTC (permalink / raw)
  To: michael.christie
  Cc: target-devel, linux-scsi, stefanha, pbonzini, jasowang, sgarzare,
	virtualization

On Mon, Oct 25, 2021 at 11:04:42AM -0500, michael.christie@oracle.com wrote:
> On 10/23/21 3:11 PM, Michael S. Tsirkin wrote:
> > On Fri, Oct 22, 2021 at 01:17:26PM -0500, michael.christie@oracle.com wrote:
> >> On 10/22/21 11:12 AM, michael.christie@oracle.com wrote:
> >>> On 10/22/21 5:47 AM, Michael S. Tsirkin wrote:
> >>>>> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> >>>>> index c998860d7bbc..e5c0669430e5 100644
> >>>>> --- a/include/uapi/linux/vhost.h
> >>>>> +++ b/include/uapi/linux/vhost.h
> >>>>> @@ -70,6 +70,17 @@
> >>>>>  #define VHOST_VRING_BIG_ENDIAN 1
> >>>>>  #define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state)
> >>>>>  #define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
> >>>>> +/* By default, a device gets one vhost_worker created during VHOST_SET_OWNER
> >>>>> + * that its virtqueues share. This allows userspace to create a vhost_worker
> >>>>> + * and map a virtqueue to it or map a virtqueue to an existing worker.
> >>>>> + *
> >>>>> + * If pid > 0 and it matches an existing vhost_worker thread it will be bound
> >>>>> + * to the vq. If pid is VHOST_VRING_NEW_WORKER, then a new worker will be
> >>>>> + * created and bound to the vq.
> >>>>> + *
> >>>>> + * This must be called after VHOST_SET_OWNER and before the vq is active.
> >>>>> + */
> >>>>
> >>>> A couple of things here:
> >>>> it's probably a good idea not to make it match pid exactly,
> >>>> if for no other reason than I'm not sure we want to
> >>>> commit this being a pid. Let's just call it an id?
> >>>
> >>> Ok.
> >>>
> >>>> And maybe byteswap it or xor with some value
> >>>> just to make sure userspace does not begin abusing it anyway.
> >>>>
> >>>> Also, interaction with pid namespace is unclear to me.
> >>>> Can you document what happens here?
> >>>
> >>> This current patchset only allows the vhost_dev owner to
> >>> create/bind workers for devices it owns, so namespace don't come
> >>
> >> I made a mistake here. The patches do restrict VHOST_SET_VRING_WORKER
> >> to the same owner like I wrote. However, it looks like we could have 2
> >> threads with the same mm pointer so vhost_dev_check_owner returns true,
> >> but they could be in different namespaces.
> >>
> >> Even though we are not going to pass the pid_t between user/kernel
> >> space, should I add a pid namespace check when I repost the patches?
> > 
> > Um it's part of the ioctl. How you are not going to pass it around?
> 
> The part about not passing a pid around was referring to your comment
> about obfuscating the pid. I might have misunderstood you and thought you
> wanted to do something more like what you suggested below, where userspace
> just sees some opaque int.
> 
> 
> > 
> > So if we do worry about this, I would just make it a 64 bit integer,
> > rename it "id" and increment each time a thread is created.
> >  
> Yeah, this works for me. I just used an ida to allocate the id. We can
> then use its lookup functions too.

Probably for the best; linear lookups would make destroying lots of
threads an O(N^2) operation.
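
To illustrate the concern with a sketch (the posted patch scans dev->vqs
instead, and the worker_xa field below is hypothetical): keeping workers in an
xarray keyed by the allocated id makes each lookup O(1), so tearing down N
workers stays roughly O(N).

#include <linux/xarray.h>

/* Assumes dev->worker_xa is xa_init()'d at VHOST_SET_OWNER time and that
 * each worker was stored under its id with xa_store().
 */
static struct vhost_worker *vhost_worker_find(struct vhost_dev *dev,
                                              unsigned long id)
{
        return xa_load(&dev->worker_xa, id);
}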


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 11/11] vhost: allow userspace to create workers
  2021-10-22  5:19   ` Mike Christie
@ 2021-10-26  5:37     ` Jason Wang
  -1 siblings, 0 replies; 74+ messages in thread
From: Jason Wang @ 2021-10-26  5:37 UTC (permalink / raw)
  To: Mike Christie, target-devel, linux-scsi, stefanha, pbonzini, mst,
	sgarzare, virtualization


On 2021/10/22 1:19 PM, Mike Christie wrote:
> This patch allows userspace to create workers and bind them to vqs. You
> can have N workers per dev and also share N workers with M vqs.
>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>


A question, who is the best one to determine the binding? Is it the VMM 
(Qemu etc) or the management stack? If the latter, it looks to me it's 
better to expose this via sysfs?


> ---
>   drivers/vhost/vhost.c            | 99 ++++++++++++++++++++++++++++----
>   drivers/vhost/vhost.h            |  2 +-
>   include/uapi/linux/vhost.h       | 11 ++++
>   include/uapi/linux/vhost_types.h | 12 ++++
>   4 files changed, 112 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 04f43a6445e1..c86e88d7f35c 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -493,7 +493,6 @@ void vhost_dev_init(struct vhost_dev *dev,
>   	dev->umem = NULL;
>   	dev->iotlb = NULL;
>   	dev->mm = NULL;
> -	dev->worker = NULL;
>   	dev->iov_limit = iov_limit;
>   	dev->weight = weight;
>   	dev->byte_weight = byte_weight;
> @@ -576,20 +575,40 @@ static void vhost_worker_stop(struct vhost_worker *worker)
>   	wait_for_completion(worker->exit_done);
>   }
>   
> -static void vhost_worker_free(struct vhost_dev *dev)
> -{
> -	struct vhost_worker *worker = dev->worker;
>   
> +static void vhost_worker_put(struct vhost_worker *worker)
> +{
>   	if (!worker)
>   		return;
>   
> -	dev->worker = NULL;
> +	if (!refcount_dec_and_test(&worker->refcount))
> +		return;
> +
>   	WARN_ON(!llist_empty(&worker->work_list));
>   	vhost_worker_stop(worker);
>   	kfree(worker);
>   }
>   
> -static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
> +static void vhost_vq_clear_worker(struct vhost_virtqueue *vq)
> +{
> +	if (vq->worker)
> +		vhost_worker_put(vq->worker);
> +	vq->worker = NULL;
> +}
> +
> +static void vhost_workers_free(struct vhost_dev *dev)
> +{
> +	int i;
> +
> +	if (!dev->use_worker)
> +		return;
> +
> +	for (i = 0; i < dev->nvqs; i++)
> +		vhost_vq_clear_worker(dev->vqs[i]);
> +}
> +
> +static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev,
> +						int init_vq_map_count)
>   {
>   	struct vhost_worker *worker;
>   	struct task_struct *task;
> @@ -598,9 +617,9 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
>   	if (!worker)
>   		return NULL;
>   
> -	dev->worker = worker;
>   	worker->kcov_handle = kcov_common_handle();
>   	init_llist_head(&worker->work_list);
> +	refcount_set(&worker->refcount, init_vq_map_count);
>   
>   	/*
>   	 * vhost used to use the kthread API which ignores all signals by
> @@ -617,10 +636,58 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
>   
>   free_worker:
>   	kfree(worker);
> -	dev->worker = NULL;
>   	return NULL;
>   }
>   
> +static struct vhost_worker *vhost_worker_find(struct vhost_dev *dev, pid_t pid)
> +{
> +	struct vhost_worker *worker = NULL;
> +	int i;
> +
> +	for (i = 0; i < dev->nvqs; i++) {
> +		if (dev->vqs[i]->worker->task->pid != pid)
> +			continue;
> +
> +		worker = dev->vqs[i]->worker;
> +		break;
> +	}
> +
> +	return worker;
> +}
> +
> +/* Caller must have device mutex */
> +static int vhost_vq_setup_worker(struct vhost_virtqueue *vq,
> +				 struct vhost_vring_worker *info)
> +{
> +	struct vhost_dev *dev = vq->dev;
> +	struct vhost_worker *worker;
> +
> +	if (!dev->use_worker)
> +		return -EINVAL;
> +
> +	/* We don't support setting a worker on an active vq */
> +	if (vq->private_data)
> +		return -EBUSY;


Is it valuable to allow switching the worker on an active vq?


> +
> +	if (info->pid == VHOST_VRING_NEW_WORKER) {
> +		worker = vhost_worker_create(dev, 1);
> +		if (!worker)
> +			return -ENOMEM;
> +
> +		info->pid = worker->task->pid;
> +	} else {
> +		worker = vhost_worker_find(dev, info->pid);
> +		if (!worker)
> +			return -ENODEV;
> +
> +		refcount_inc(&worker->refcount);
> +	}
> +
> +	vhost_vq_clear_worker(vq);
> +	vq->worker = worker;
> +	return 0;
> +}
> +
>   /* Caller should have device mutex */
>   long vhost_dev_set_owner(struct vhost_dev *dev)
>   {
> @@ -636,7 +703,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
>   	vhost_attach_mm(dev);
>   
>   	if (dev->use_worker) {
> -		worker = vhost_worker_create(dev);
> +		worker = vhost_worker_create(dev, dev->nvqs);
>   		if (!worker)
>   			goto err_worker;
>   
> @@ -650,7 +717,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
>   
>   	return 0;
>   err_iovecs:
> -	vhost_worker_free(dev);
> +	vhost_workers_free(dev);
>   err_worker:
>   	vhost_detach_mm(dev);
>   err_mm:
> @@ -742,7 +809,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>   	dev->iotlb = NULL;
>   	vhost_clear_msg(dev);
>   	wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
> -	vhost_worker_free(dev);
> +	vhost_workers_free(dev);
>   	vhost_detach_mm(dev);
>   }
>   EXPORT_SYMBOL_GPL(vhost_dev_cleanup);
> @@ -1612,6 +1679,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
>   	struct eventfd_ctx *ctx = NULL;
>   	u32 __user *idxp = argp;
>   	struct vhost_virtqueue *vq;
> +	struct vhost_vring_worker w;
>   	struct vhost_vring_state s;
>   	struct vhost_vring_file f;
>   	u32 idx;
> @@ -1719,6 +1787,15 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
>   		if (copy_to_user(argp, &s, sizeof(s)))
>   			r = -EFAULT;
>   		break;
> +	case VHOST_SET_VRING_WORKER:
> +		if (copy_from_user(&w, argp, sizeof(w))) {
> +			r = -EFAULT;
> +			break;
> +		}
> +		r = vhost_vq_setup_worker(vq, &w);
> +		if (!r && copy_to_user(argp, &w, sizeof(w)))
> +			r = -EFAULT;
> +		break;
>   	default:
>   		r = -ENOIOCTLCMD;
>   	}
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 33c63b24187a..0911d1a9bd3b 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -35,6 +35,7 @@ struct vhost_worker {
>   	struct llist_head	work_list;
>   	u64			kcov_handle;
>   	unsigned long		flags;
> +	refcount_t		refcount;
>   };
>   
>   /* Poll a file (eventfd or socket) */
> @@ -160,7 +161,6 @@ struct vhost_dev {
>   	struct vhost_virtqueue **vqs;
>   	int nvqs;
>   	struct eventfd_ctx *log_ctx;
> -	struct vhost_worker *worker;
>   	struct vhost_iotlb *umem;
>   	struct vhost_iotlb *iotlb;
>   	spinlock_t iotlb_lock;
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index c998860d7bbc..e5c0669430e5 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -70,6 +70,17 @@
>   #define VHOST_VRING_BIG_ENDIAN 1
>   #define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state)
>   #define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
> +/* By default, a device gets one vhost_worker created during VHOST_SET_OWNER
> + * that its virtqueues share. This allows userspace to create a vhost_worker
> + * and map a virtqueue to it or map a virtqueue to an existing worker.
> + *
> + * If pid > 0 and it matches an existing vhost_worker thread it will be bound
> + * to the vq. If pid is VHOST_VRING_NEW_WORKER, then a new worker will be
> + * created and bound to the vq.
> + *
> + * This must be called after VHOST_SET_OWNER and before the vq is active.
> + */
> +#define VHOST_SET_VRING_WORKER _IOWR(VHOST_VIRTIO, 0x15, struct vhost_vring_worker)
>   
>   /* The following ioctls use eventfd file descriptors to signal and poll
>    * for events. */
> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
> index f7f6a3a28977..af654e3cef0e 100644
> --- a/include/uapi/linux/vhost_types.h
> +++ b/include/uapi/linux/vhost_types.h
> @@ -47,6 +47,18 @@ struct vhost_vring_addr {
>   	__u64 log_guest_addr;
>   };
>   
> +#define VHOST_VRING_NEW_WORKER -1


Do we need VHOST_VRING_FREE_WORKER? And I wonder if using dedicated 
ioctls would be better:

VHOST_VRING_NEW/FREE_WORKER
VHOST_VRING_ATTACH_WORKER

etc.

Thanks
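
To make the suggestion concrete, the split could look something like the
following sketch; the opcode numbers and struct vhost_worker_state do not
exist in the posted series and are purely illustrative:

/* Worker lifetime is handled separately from attaching a worker to a vring.
 * Named VHOST_NEW_WORKER here to avoid clashing with the posted
 * VHOST_VRING_NEW_WORKER pid flag.
 */
struct vhost_worker_state {
	__u64 worker_id;
};

#define VHOST_NEW_WORKER          _IOR(VHOST_VIRTIO, 0x15, struct vhost_worker_state)
#define VHOST_FREE_WORKER         _IOW(VHOST_VIRTIO, 0x16, struct vhost_worker_state)
#define VHOST_ATTACH_VRING_WORKER _IOW(VHOST_VIRTIO, 0x17, struct vhost_vring_worker)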


> +
> +struct vhost_vring_worker {
> +	unsigned int index;
> +	/*
> +	 * The pid of the vhost worker that the vq will be bound to. If
> +	 * pid is VHOST_VRING_NEW_WORKER a new worker will be created and its
> +	 * pid will be returned in pid.
> +	 */
> +	__kernel_pid_t pid;
> +};
> +
>   /* no alignment requirement */
>   struct vhost_iotlb_msg {
>   	__u64 iova;
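
For context, a sketch of how a VMM might drive the uapi proposed above
(compiled against the headers from this series; the interface may still change
based on the discussion in this thread):

#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Create a worker for vq 1, then share that same worker with vq 2. */
static int setup_two_vqs_one_worker(int vhost_fd)
{
        struct vhost_vring_worker w = {
                .index = 1,
                .pid = VHOST_VRING_NEW_WORKER,
        };

        if (ioctl(vhost_fd, VHOST_SET_VRING_WORKER, &w) < 0)
                return -1;
        /* The kernel wrote the new worker's pid/id back into w.pid. */

        w.index = 2;
        if (ioctl(vhost_fd, VHOST_SET_VRING_WORKER, &w) < 0)
                return -1;
        return 0;
}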


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 11/11] vhost: allow userspace to create workers
  2021-10-26  5:37     ` Jason Wang
@ 2021-10-26 13:09       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 74+ messages in thread
From: Michael S. Tsirkin @ 2021-10-26 13:09 UTC (permalink / raw)
  To: Jason Wang
  Cc: Mike Christie, target-devel, linux-scsi, stefanha, pbonzini,
	sgarzare, virtualization

On Tue, Oct 26, 2021 at 01:37:14PM +0800, Jason Wang wrote:
> 
> On 2021/10/22 1:19 PM, Mike Christie wrote:
> > This patch allows userspace to create workers and bind them to vqs. You
> > can have N workers per dev and also share N workers with M vqs.
> > 
> > Signed-off-by: Mike Christie <michael.christie@oracle.com>
> 
> 
> A question, who is the best one to determine the binding? Is it the VMM
> (Qemu etc) or the management stack? If the latter, it looks to me it's
> better to expose this via sysfs?

I think it's a bit much to expect this from management.

> 
> > ---
> >   drivers/vhost/vhost.c            | 99 ++++++++++++++++++++++++++++----
> >   drivers/vhost/vhost.h            |  2 +-
> >   include/uapi/linux/vhost.h       | 11 ++++
> >   include/uapi/linux/vhost_types.h | 12 ++++
> >   4 files changed, 112 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 04f43a6445e1..c86e88d7f35c 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -493,7 +493,6 @@ void vhost_dev_init(struct vhost_dev *dev,
> >   	dev->umem = NULL;
> >   	dev->iotlb = NULL;
> >   	dev->mm = NULL;
> > -	dev->worker = NULL;
> >   	dev->iov_limit = iov_limit;
> >   	dev->weight = weight;
> >   	dev->byte_weight = byte_weight;
> > @@ -576,20 +575,40 @@ static void vhost_worker_stop(struct vhost_worker *worker)
> >   	wait_for_completion(worker->exit_done);
> >   }
> > -static void vhost_worker_free(struct vhost_dev *dev)
> > -{
> > -	struct vhost_worker *worker = dev->worker;
> > +static void vhost_worker_put(struct vhost_worker *worker)
> > +{
> >   	if (!worker)
> >   		return;
> > -	dev->worker = NULL;
> > +	if (!refcount_dec_and_test(&worker->refcount))
> > +		return;
> > +
> >   	WARN_ON(!llist_empty(&worker->work_list));
> >   	vhost_worker_stop(worker);
> >   	kfree(worker);
> >   }
> > -static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
> > +static void vhost_vq_clear_worker(struct vhost_virtqueue *vq)
> > +{
> > +	if (vq->worker)
> > +		vhost_worker_put(vq->worker);
> > +	vq->worker = NULL;
> > +}
> > +
> > +static void vhost_workers_free(struct vhost_dev *dev)
> > +{
> > +	int i;
> > +
> > +	if (!dev->use_worker)
> > +		return;
> > +
> > +	for (i = 0; i < dev->nvqs; i++)
> > +		vhost_vq_clear_worker(dev->vqs[i]);
> > +}
> > +
> > +static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev,
> > +						int init_vq_map_count)
> >   {
> >   	struct vhost_worker *worker;
> >   	struct task_struct *task;
> > @@ -598,9 +617,9 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
> >   	if (!worker)
> >   		return NULL;
> > -	dev->worker = worker;
> >   	worker->kcov_handle = kcov_common_handle();
> >   	init_llist_head(&worker->work_list);
> > +	refcount_set(&worker->refcount, init_vq_map_count);
> >   	/*
> >   	 * vhost used to use the kthread API which ignores all signals by
> > @@ -617,10 +636,58 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
> >   free_worker:
> >   	kfree(worker);
> > -	dev->worker = NULL;
> >   	return NULL;
> >   }
> > +static struct vhost_worker *vhost_worker_find(struct vhost_dev *dev, pid_t pid)
> > +{
> > +	struct vhost_worker *worker = NULL;
> > +	int i;
> > +
> > +	for (i = 0; i < dev->nvqs; i++) {
> > +		if (dev->vqs[i]->worker->task->pid != pid)
> > +			continue;
> > +
> > +		worker = dev->vqs[i]->worker;
> > +		break;
> > +	}
> > +
> > +	return worker;
> > +}
> > +
> > +/* Caller must have device mutex */
> > +static int vhost_vq_setup_worker(struct vhost_virtqueue *vq,
> > +				 struct vhost_vring_worker *info)
> > +{
> > +	struct vhost_dev *dev = vq->dev;
> > +	struct vhost_worker *worker;
> > +
> > +	if (!dev->use_worker)
> > +		return -EINVAL;
> > +
> > +	/* We don't support setting a worker on an active vq */
> > +	if (vq->private_data)
> > +		return -EBUSY;
> 
> 
> Is it valuable to allow switching the worker on an active vq?
> 
> 
> > +
> > +	if (info->pid == VHOST_VRING_NEW_WORKER) {
> > +		worker = vhost_worker_create(dev, 1);
> > +		if (!worker)
> > +			return -ENOMEM;
> > +
> > +		info->pid = worker->task->pid;
> > +	} else {
> > +		worker = vhost_worker_find(dev, info->pid);
> > +		if (!worker)
> > +			return -ENODEV;
> > +
> > +		refcount_inc(&worker->refcount);
> > +	}
> > +
> > +	vhost_vq_clear_worker(vq);
> > +	vq->worker = worker;
> > +	return 0;
> > +}
> > +
> >   /* Caller should have device mutex */
> >   long vhost_dev_set_owner(struct vhost_dev *dev)
> >   {
> > @@ -636,7 +703,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
> >   	vhost_attach_mm(dev);
> >   	if (dev->use_worker) {
> > -		worker = vhost_worker_create(dev);
> > +		worker = vhost_worker_create(dev, dev->nvqs);
> >   		if (!worker)
> >   			goto err_worker;
> > @@ -650,7 +717,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
> >   	return 0;
> >   err_iovecs:
> > -	vhost_worker_free(dev);
> > +	vhost_workers_free(dev);
> >   err_worker:
> >   	vhost_detach_mm(dev);
> >   err_mm:
> > @@ -742,7 +809,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> >   	dev->iotlb = NULL;
> >   	vhost_clear_msg(dev);
> >   	wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
> > -	vhost_worker_free(dev);
> > +	vhost_workers_free(dev);
> >   	vhost_detach_mm(dev);
> >   }
> >   EXPORT_SYMBOL_GPL(vhost_dev_cleanup);
> > @@ -1612,6 +1679,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
> >   	struct eventfd_ctx *ctx = NULL;
> >   	u32 __user *idxp = argp;
> >   	struct vhost_virtqueue *vq;
> > +	struct vhost_vring_worker w;
> >   	struct vhost_vring_state s;
> >   	struct vhost_vring_file f;
> >   	u32 idx;
> > @@ -1719,6 +1787,15 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
> >   		if (copy_to_user(argp, &s, sizeof(s)))
> >   			r = -EFAULT;
> >   		break;
> > +	case VHOST_SET_VRING_WORKER:
> > +		if (copy_from_user(&w, argp, sizeof(w))) {
> > +			r = -EFAULT;
> > +			break;
> > +		}
> > +		r = vhost_vq_setup_worker(vq, &w);
> > +		if (!r && copy_to_user(argp, &w, sizeof(w)))
> > +			r = -EFAULT;
> > +		break;
> >   	default:
> >   		r = -ENOIOCTLCMD;
> >   	}
> > diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> > index 33c63b24187a..0911d1a9bd3b 100644
> > --- a/drivers/vhost/vhost.h
> > +++ b/drivers/vhost/vhost.h
> > @@ -35,6 +35,7 @@ struct vhost_worker {
> >   	struct llist_head	work_list;
> >   	u64			kcov_handle;
> >   	unsigned long		flags;
> > +	refcount_t		refcount;
> >   };
> >   /* Poll a file (eventfd or socket) */
> > @@ -160,7 +161,6 @@ struct vhost_dev {
> >   	struct vhost_virtqueue **vqs;
> >   	int nvqs;
> >   	struct eventfd_ctx *log_ctx;
> > -	struct vhost_worker *worker;
> >   	struct vhost_iotlb *umem;
> >   	struct vhost_iotlb *iotlb;
> >   	spinlock_t iotlb_lock;
> > diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> > index c998860d7bbc..e5c0669430e5 100644
> > --- a/include/uapi/linux/vhost.h
> > +++ b/include/uapi/linux/vhost.h
> > @@ -70,6 +70,17 @@
> >   #define VHOST_VRING_BIG_ENDIAN 1
> >   #define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state)
> >   #define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
> > +/* By default, a device gets one vhost_worker created during VHOST_SET_OWNER
> > + * that its virtqueues share. This allows userspace to create a vhost_worker
> > + * and map a virtqueue to it or map a virtqueue to an existing worker.
> > + *
> > + * If pid > 0 and it matches an existing vhost_worker thread it will be bound
> > + * to the vq. If pid is VHOST_VRING_NEW_WORKER, then a new worker will be
> > + * created and bound to the vq.
> > + *
> > + * This must be called after VHOST_SET_OWNER and before the vq is active.
> > + */
> > +#define VHOST_SET_VRING_WORKER _IOWR(VHOST_VIRTIO, 0x15, struct vhost_vring_worker)
> >   /* The following ioctls use eventfd file descriptors to signal and poll
> >    * for events. */
> > diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
> > index f7f6a3a28977..af654e3cef0e 100644
> > --- a/include/uapi/linux/vhost_types.h
> > +++ b/include/uapi/linux/vhost_types.h
> > @@ -47,6 +47,18 @@ struct vhost_vring_addr {
> >   	__u64 log_guest_addr;
> >   };
> > +#define VHOST_VRING_NEW_WORKER -1
> 
> 
> Do we need VHOST_VRING_FREE_WORKER? And I wonder if using dedicated ioctls
> would be better:
> 
> VHOST_VRING_NEW/FREE_WORKER
> VHOST_VRING_ATTACH_WORKER
> 
> etc.
> 
> Thanks
> 
> 
> > +
> > +struct vhost_vring_worker {
> > +	unsigned int index;
> > +	/*
> > +	 * The pid of the vhost worker that the vq will be bound to. If
> > +	 * pid is VHOST_VRING_NEW_WORKER a new worker will be created and its
> > +	 * pid will be returned in pid.
> > +	 */
> > +	__kernel_pid_t pid;
> > +};
> > +
> >   /* no alignment requirement */
> >   struct vhost_iotlb_msg {
> >   	__u64 iova;


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 11/11] vhost: allow userspace to create workers
  2021-10-22  5:19   ` Mike Christie
@ 2021-10-26 15:22     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 74+ messages in thread
From: Stefan Hajnoczi @ 2021-10-26 15:22 UTC (permalink / raw)
  To: Mike Christie
  Cc: target-devel, linux-scsi, pbonzini, jasowang, mst, sgarzare,
	virtualization

On Fri, Oct 22, 2021 at 12:19:11AM -0500, Mike Christie wrote:
> +/* Caller must have device mutex */
> +static int vhost_vq_setup_worker(struct vhost_virtqueue *vq,
> +				 struct vhost_vring_worker *info)

It's clearer if the function name matches the ioctl name
(VHOST_SET_VRING_WORKER).

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 11/11] vhost: allow userspace to create workers
  2021-10-22  5:19   ` Mike Christie
@ 2021-10-26 15:24     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 74+ messages in thread
From: Stefan Hajnoczi @ 2021-10-26 15:24 UTC (permalink / raw)
  To: Mike Christie
  Cc: target-devel, linux-scsi, pbonzini, jasowang, mst, sgarzare,
	virtualization

On Fri, Oct 22, 2021 at 12:19:11AM -0500, Mike Christie wrote:
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index c998860d7bbc..e5c0669430e5 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -70,6 +70,17 @@
>  #define VHOST_VRING_BIG_ENDIAN 1
>  #define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state)
>  #define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
> +/* By default, a device gets one vhost_worker created during VHOST_SET_OWNER
> + * that its virtqueues share. This allows userspace to create a vhost_worker
> + * and map a virtqueue to it or map a virtqueue to an existing worker.
> + *
> + * If pid > 0 and it matches an existing vhost_worker thread it will be bound
> + * to the vq. If pid is VHOST_VRING_NEW_WORKER, then a new worker will be
> + * created and bound to the vq.
> + *
> + * This must be called after VHOST_SET_OWNER and before the vq is active.
> + */
> +#define VHOST_SET_VRING_WORKER _IOWR(VHOST_VIRTIO, 0x15, struct vhost_vring_worker)

Please clarify whether or not multiple devices can attach vqs to the same worker.

Stefan

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 11/11] vhost: allow userspace to create workers
  2021-10-26  5:37     ` Jason Wang
@ 2021-10-26 15:44       ` Stefan Hajnoczi
  -1 siblings, 0 replies; 74+ messages in thread
From: Stefan Hajnoczi @ 2021-10-26 15:44 UTC (permalink / raw)
  To: Jason Wang
  Cc: Mike Christie, target-devel, linux-scsi, pbonzini, mst, sgarzare,
	virtualization

On Tue, Oct 26, 2021 at 01:37:14PM +0800, Jason Wang wrote:
> 
> 在 2021/10/22 下午1:19, Mike Christie 写道:
> > This patch allows userspace to create workers and bind them to vqs. You
> > can have N workers per dev and also share N workers with M vqs.
> > 
> > Signed-off-by: Mike Christie <michael.christie@oracle.com>
> 
> 
> A question, who is the best one to determine the binding? Is it the VMM
> (Qemu etc) or the management stack? If the latter, it looks to me it's
> better to expose this via sysfs?

A few options that let the management stack control vhost worker CPU
affinity:

1. The management tool opens the vhost device node, calls
   ioctl(VHOST_SET_VRING_WORKER), sets up CPU affinity, and then passes
   the fd to the VMM. In this case the VMM is still able to call the
   ioctl, which may be undesirable from an attack surface perspective.

2. The VMM calls ioctl(VHOST_SET_VRING_WORKER) itself and the management
   tool queries the vq:worker details from the VMM (e.g. a new QEMU QMP
   query-vhost-workers command similar to query-iothreads). The
   management tool can then control CPU affinity on the vhost worker
   threads.

   (This is how CPU affinity works in QEMU and libvirt today.)

3. The sysfs approach you suggested. Does sysfs export vq-0/, vq-1/, etc
   directories with a "worker" attribute? Do we need to define a point
   when the VMM has set up vqs and the management stack is able to query
   them? Vhost devices currently pre-allocate the maximum number of vqs
   and I'm not sure how to determine the number of vqs that will
   actually be used?

   One advantage of this is that access to the vq:worker mapping can be
   limited to the management stack and the VMM cannot access it. But it
   seems a little tricky because the vhost model today doesn't use sysfs
   or define a lifecycle where the management stack can configure
   devices.

Stefan
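To make option 1 concrete, here is a rough sketch of what the management tool
could do, with the caveat that the vhost_vring_worker layout, the
/dev/vhost-scsi path, and the fd handoff are assumptions for illustration
rather than part of this patchset:

/*
 * Sketch of option 1: the management tool creates a worker, pins it, and
 * then hands the vhost fd to the VMM (e.g. via SCM_RIGHTS). Placeholder
 * uapi definitions are used where the patched headers would provide them.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

#ifndef VHOST_SET_VRING_WORKER
#define VHOST_VRING_NEW_WORKER -1
struct vhost_vring_worker {
	unsigned int index;	/* assumed layout */
	int pid;
};
#define VHOST_SET_VRING_WORKER _IOWR(VHOST_VIRTIO, 0x15, struct vhost_vring_worker)
#endif

int main(void)
{
	int fd = open("/dev/vhost-scsi", O_RDWR);	/* device node assumed */
	struct vhost_vring_worker w = { .index = 0, .pid = VHOST_VRING_NEW_WORKER };
	cpu_set_t cpus;

	if (fd < 0 || ioctl(fd, VHOST_SET_OWNER) < 0 ||
	    ioctl(fd, VHOST_SET_VRING_WORKER, &w) < 0) {
		perror("vhost worker setup");
		return 1;
	}

	/* Pin the worker thread the kernel reported back to CPU 2. */
	CPU_ZERO(&cpus);
	CPU_SET(2, &cpus);
	if (sched_setaffinity(w.pid, sizeof(cpus), &cpus) < 0)
		perror("sched_setaffinity");

	/*
	 * ... pass fd to the VMM over an fd-passing channel. The VMM can
	 * still call the ioctl on that fd itself, which is the attack
	 * surface concern mentioned above.
	 */
	return 0;
}

The downside noted in option 1 still applies: once the VMM holds the fd it
can rebind workers itself, so a sketch like this only moves the initial
setup into the management tool.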

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 11/11] vhost: allow userspace to create workers
  2021-10-26 13:09       ` Michael S. Tsirkin
@ 2021-10-26 16:36         ` Stefan Hajnoczi
  -1 siblings, 0 replies; 74+ messages in thread
From: Stefan Hajnoczi @ 2021-10-26 16:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Mike Christie, target-devel, linux-scsi, pbonzini,
	sgarzare, virtualization

On Tue, Oct 26, 2021 at 09:09:52AM -0400, Michael S. Tsirkin wrote:
> On Tue, Oct 26, 2021 at 01:37:14PM +0800, Jason Wang wrote:
> > 
> > 在 2021/10/22 下午1:19, Mike Christie 写道:
> > > This patch allows userspace to create workers and bind them to vqs. You
> > > can have N workers per dev and also share N workers with M vqs.
> > > 
> > > Signed-off-by: Mike Christie <michael.christie@oracle.com>
> > 
> > 
> > A question, who is the best one to determine the binding? Is it the VMM
> > (Qemu etc) or the management stack? If the latter, it looks to me it's
> > better to expose this via sysfs?
> 
> I think it's a bit much to expect this from management.

The management stack controls the number of vqs used as well as the vCPU
and IOThread CPU affinity. It seems natural for it to also control the
vhost worker CPU affinity. Where else should that be controlled?

Stefan

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 11/11] vhost: allow userspace to create workers
  2021-10-26  5:37     ` Jason Wang
@ 2021-10-26 16:49       ` michael.christie
  -1 siblings, 0 replies; 74+ messages in thread
From: michael.christie @ 2021-10-26 16:49 UTC (permalink / raw)
  To: Jason Wang, target-devel, linux-scsi, stefanha, pbonzini, mst,
	sgarzare, virtualization

On 10/26/21 12:37 AM, Jason Wang wrote:
> 
> 在 2021/10/22 下午1:19, Mike Christie 写道:
>> This patch allows userspace to create workers and bind them to vqs. You
>> can have N workers per dev and also share N workers with M vqs.
>>
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> 
> 
> A question, who is the best one to determine the binding? Is it the VMM (Qemu etc) or the management stack? If the latter, it looks to me it's better to expose this via sysfs?

I thought it would be handled where you have management app settings, and
then the management app talks to the qemu control interface like it does
when it adds new devices on the fly.

A problem with the management app doing it is handling the RLIMIT_NPROC
review comment. This patchset:

https://lore.kernel.org/all/20211007214448.6282-1-michael.christie@oracle.com/

basically has the kernel do a clone() from the caller's context. So adding
a worker is like doing the VHOST_SET_OWNER ioctl: it still has to be done
from a process the worker can inherit values from, like the mm, cgroups,
and now RLIMITs.


>> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
>> index f7f6a3a28977..af654e3cef0e 100644
>> --- a/include/uapi/linux/vhost_types.h
>> +++ b/include/uapi/linux/vhost_types.h
>> @@ -47,6 +47,18 @@ struct vhost_vring_addr {
>>       __u64 log_guest_addr;
>>   };
>>   +#define VHOST_VRING_NEW_WORKER -1
> 
> 
> Do we need VHOST_VRING_FREE_WORKER? And I wonder if using dedicated ioctls are better:
> 
> VHOST_VRING_NEW/FREE_WORKER
> VHOST_VRING_ATTACH_WORKER


We didn't need a free worker call, because the kernel handles it for
userspace. I tried to make it easy for userspace because in some cases it
may not be able to do syscalls like close on the device. For example, qemu
may have crashed, or for vhost-scsi we don't do an explicit close during VM
shutdown.

So we start off with the default worker thread that's used by all vqs like
we do today. Userspace can then override it by creating a new worker. That
also unbinds/detaches the existing worker and does a put on the worker's
refcount. We also do a put on the worker when we stop using it during device
shutdown/closure/release. When the worker's refcount goes to zero the kernel
deletes it.

I think separating the calls could be helpful though.
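To make the lifetime rules above concrete, here is a tiny userspace toy
model (not kernel code; the names and structure are invented purely for
illustration): each bound vq holds a reference, rebinding puts the old
worker's reference, and the worker is deleted when the count hits zero.

/* Toy model of the worker lifetime described above. Not kernel code. */
#include <stdio.h>
#include <stdlib.h>

struct worker {
	int id;
	int refcount;	/* one reference per vq currently bound to this worker */
};

struct vq {
	struct worker *worker;
};

static struct worker *worker_create(int id)
{
	struct worker *w = calloc(1, sizeof(*w));

	w->id = id;
	return w;
}

static void worker_put(struct worker *w)
{
	if (--w->refcount == 0) {
		printf("worker %d deleted\n", w->id);	/* kernel frees it */
		free(w);
	}
}

/* Binding a vq to a new worker also unbinds/puts whatever it used before. */
static void vq_attach_worker(struct vq *vq, struct worker *w)
{
	if (vq->worker)
		worker_put(vq->worker);
	vq->worker = w;
	w->refcount++;
}

int main(void)
{
	struct worker *def = worker_create(0);	/* default worker from VHOST_SET_OWNER */
	struct vq vqs[2] = { { NULL }, { NULL } };

	vq_attach_worker(&vqs[0], def);
	vq_attach_worker(&vqs[1], def);

	/* Userspace overrides vq 1: the default worker drops to one ref. */
	vq_attach_worker(&vqs[1], worker_create(1));

	/* Device shutdown/closure/release: every vq puts its worker. */
	worker_put(vqs[0].worker);	/* default worker hits zero, deleted */
	worker_put(vqs[1].worker);	/* new worker hits zero, deleted */
	return 0;
}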


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 11/11] vhost: allow userspace to create workers
  2021-10-26 15:44       ` Stefan Hajnoczi
@ 2021-10-27  2:55         ` Jason Wang
  -1 siblings, 0 replies; 74+ messages in thread
From: Jason Wang @ 2021-10-27  2:55 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Mike Christie, target-devel, linux-scsi, pbonzini, mst,
	Stefano Garzarella, virtualization

On Tue, Oct 26, 2021 at 11:45 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Tue, Oct 26, 2021 at 01:37:14PM +0800, Jason Wang wrote:
> >
> > 在 2021/10/22 下午1:19, Mike Christie 写道:
> > > This patch allows userspace to create workers and bind them to vqs. You
> > > can have N workers per dev and also share N workers with M vqs.
> > >
> > > Signed-off-by: Mike Christie <michael.christie@oracle.com>
> >
> >
> > A question, who is the best one to determine the binding? Is it the VMM
> > (Qemu etc) or the management stack? If the latter, it looks to me it's
> > better to expose this via sysfs?
>
> A few options that let the management stack control vhost worker CPU
> affinity:
>
> 1. The management tool opens the vhost device node, calls
>    ioctl(VHOST_SET_VRING_WORKER), sets up CPU affinity, and then passes
>    the fd to the VMM. In this case the VMM is still able to call the
>    ioctl, which may be undesirable from an attack surface perspective.

Yes, and we can't do post-launch or dynamic configuration after the VM is
launched?

>
> 2. The VMM calls ioctl(VHOST_SET_VRING_WORKER) itself and the management
>    tool queries the vq:worker details from the VMM (e.g. a new QEMU QMP
>    query-vhost-workers command similar to query-iothreads). The
>    management tool can then control CPU affinity on the vhost worker
>    threads.
>
>    (This is how CPU affinity works in QEMU and libvirt today.)

Then we also need a "bind-vhost-workers" command.

>
> 3. The sysfs approach you suggested. Does sysfs export vq-0/, vq-1/, etc
>    directories with a "worker" attribute?

Something like this.

> Do we need to define a point
>    when the VMM has set up vqs and the management stack is able to query
>    them?

It could be the point at which the vhost fd is opened.

>  Vhost devices currently pre-allocate the maximum number of vqs
>    and I'm not sure how to determine the number of vqs that will
>    actually be used?

It requires more information to be exposed. But before this, we should
allow dynamic binding between a vq and a worker.

>
>    One advantage of this is that access to the vq:worker mapping can be
>    limited to the management stack and the VMM cannot access it. But it
>    seems a little tricky because the vhost model today doesn't use sysfs
>    or define a lifecycle where the management stack can configure
>    devices.

Yes.

Thanks

>
> Stefan


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 11/11] vhost: allow userspace to create workers
  2021-10-26 16:49       ` michael.christie
@ 2021-10-27  6:02         ` Jason Wang
  -1 siblings, 0 replies; 74+ messages in thread
From: Jason Wang @ 2021-10-27  6:02 UTC (permalink / raw)
  To: Mike Christie
  Cc: target-devel, linux-scsi, Stefan Hajnoczi, pbonzini, mst,
	Stefano Garzarella, virtualization

On Wed, Oct 27, 2021 at 12:49 AM <michael.christie@oracle.com> wrote:
>
> On 10/26/21 12:37 AM, Jason Wang wrote:
> >
> > 在 2021/10/22 下午1:19, Mike Christie 写道:
> >> This patch allows userspace to create workers and bind them to vqs. You
> >> can have N workers per dev and also share N workers with M vqs.
> >>
> >> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> >
> >
> > A question, who is the best one to determine the binding? Is it the VMM (Qemu etc) or the management stack? If the latter, it looks to me it's better to expose this via sysfs?
>
> I thought it would be where you have management app settings, then the
> management app talks to the qemu control interface like it does when it
> adds new devices on the fly.
>
> A problem with the management app doing it is to handle the RLIMIT_NPROC
> review comment, this patchset:
>
> https://lore.kernel.org/all/20211007214448.6282-1-michael.christie@oracle.com/
>
> basically has the kernel do a clone() from the caller's context. So adding
> a worker is like doing the VHOST_SET_OWNER ioctl where it still has to be done
> from a process you can inherit values like the mm, cgroups, and now RLIMITs.

Right, so as Stefan suggested, we probably need new QMP commands so
management can help there. That would satisfy the model you described
above.

>
>
> >> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
> >> index f7f6a3a28977..af654e3cef0e 100644
> >> --- a/include/uapi/linux/vhost_types.h
> >> +++ b/include/uapi/linux/vhost_types.h
> >> @@ -47,6 +47,18 @@ struct vhost_vring_addr {
> >>       __u64 log_guest_addr;
> >>   };
> >>   +#define VHOST_VRING_NEW_WORKER -1
> >
> >
> > Do we need VHOST_VRING_FREE_WORKER? And I wonder if using dedicated ioctls are better:
> >
> > VHOST_VRING_NEW/FREE_WORKER
> > VHOST_VRING_ATTACH_WORKER
>
>
> We didn't need a free worker, because the kernel handles it for userspace. I
> tried to make it easy for userspace because in some cases it may not be able
> to do syscalls like close on the device. For example if qemu crashes or for
> vhost-scsi we don't do an explicit close during VM shutdown.
>

Ok, the motivation is that in some cases (e.g. when the active number of
queues changes), qemu can choose to free some resources.

> So we start off with the default worker thread that's used by all vqs like we do
> today. Userspace can then override it by creating a new worker. That also unbinds/
> detaches the existing worker and does a put on the workers refcount. We also do a
> put on the worker when we stop using it during device shutdown/closure/release.
> When the worker's refcount goes to zero the kernel deletes it.
>
> I think separating the calls could be helpful though.
>

Ok.

Thanks


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 11/11] vhost: allow userspace to create workers
  2021-10-27  2:55         ` Jason Wang
@ 2021-10-27  9:01           ` Stefan Hajnoczi
  -1 siblings, 0 replies; 74+ messages in thread
From: Stefan Hajnoczi @ 2021-10-27  9:01 UTC (permalink / raw)
  To: Jason Wang
  Cc: Mike Christie, target-devel, linux-scsi, pbonzini, mst,
	Stefano Garzarella, virtualization

On Wed, Oct 27, 2021 at 10:55:04AM +0800, Jason Wang wrote:
> On Tue, Oct 26, 2021 at 11:45 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Tue, Oct 26, 2021 at 01:37:14PM +0800, Jason Wang wrote:
> > >
> > > 在 2021/10/22 下午1:19, Mike Christie 写道:
> > > > This patch allows userspace to create workers and bind them to vqs. You
> > > > can have N workers per dev and also share N workers with M vqs.
> > > >
> > > > Signed-off-by: Mike Christie <michael.christie@oracle.com>
> > >
> > >
> > > A question, who is the best one to determine the binding? Is it the VMM
> > > (Qemu etc) or the management stack? If the latter, it looks to me it's
> > > better to expose this via sysfs?
> >
> > A few options that let the management stack control vhost worker CPU
> > affinity:
> >
> > 1. The management tool opens the vhost device node, calls
> >    ioctl(VHOST_SET_VRING_WORKER), sets up CPU affinity, and then passes
> >    the fd to the VMM. In this case the VMM is still able to call the
> >    ioctl, which may be undesirable from an attack surface perspective.
> 
> Yes, and we can't do post or dynamic configuration afterwards after
> the VM is launched?

Yes, at least it's a little risky for the management stack to keep the
vhost fd open and make ioctl calls while the VMM is using it.

> >
> > 2. The VMM calls ioctl(VHOST_SET_VRING_WORKER) itself and the management
> >    tool queries the vq:worker details from the VMM (e.g. a new QEMU QMP
> >    query-vhost-workers command similar to query-iothreads). The
> >    management tool can then control CPU affinity on the vhost worker
> >    threads.
> >
> >    (This is how CPU affinity works in QEMU and libvirt today.)
> 
> Then we also need a "bind-vhost-workers" command.

The VMM doesn't, but the management tool does.

Stefan

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH V3 11/11] vhost: allow userspace to create workers
  2021-10-26 16:49       ` michael.christie
@ 2021-10-27  9:03         ` Stefan Hajnoczi
  -1 siblings, 0 replies; 74+ messages in thread
From: Stefan Hajnoczi @ 2021-10-27  9:03 UTC (permalink / raw)
  To: michael.christie
  Cc: Jason Wang, target-devel, linux-scsi, pbonzini, mst, sgarzare,
	virtualization

On Tue, Oct 26, 2021 at 11:49:37AM -0500, michael.christie@oracle.com wrote:
> On 10/26/21 12:37 AM, Jason Wang wrote:
> > Do we need VHOST_VRING_FREE_WORKER? And I wonder if using dedicated ioctls are better:
> > 
> > VHOST_VRING_NEW/FREE_WORKER
> > VHOST_VRING_ATTACH_WORKER
> 
> 
> We didn't need a free worker, because the kernel handles it for userspace. I
> tried to make it easy for userspace because in some cases it may not be able
> to do syscalls like close on the device. For example if qemu crashes or for
> vhost-scsi we don't do an explicit close during VM shutdown.
> 
> So we start off with the default worker thread that's used by all vqs like we do
> today. Userspace can then override it by creating a new worker. That also unbinds/
> detaches the existing worker and does a put on the workers refcount. We also do a
> put on the worker when we stop using it during device shutdown/closure/release.
> When the worker's refcount goes to zero the kernel deletes it.

Please document the worker (p)id lifetime for the ioctl. Otherwise
userspace doesn't know whether a previously created worker is still
alive.

Stefan

^ permalink raw reply	[flat|nested] 74+ messages in thread

end of thread, other threads:[~2021-10-27  9:03 UTC | newest]

Thread overview: 74+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-22  5:18 [PATCH V3 00/11] vhost: multiple worker support Mike Christie
2021-10-22  5:18 ` Mike Christie
2021-10-22  5:19 ` [PATCH] QEMU vhost-scsi: add support for VHOST_SET_VRING_WORKER Mike Christie
2021-10-22  5:19   ` Mike Christie
2021-10-22  5:19 ` [PATCH V3 01/11] vhost: add vhost_worker pointer to vhost_virtqueue Mike Christie
2021-10-22  5:19   ` Mike Christie
2021-10-22  5:19 ` [PATCH V3 02/11] vhost, vhost-net: add helper to check if vq has work Mike Christie
2021-10-22  5:19   ` Mike Christie
2021-10-22  5:19 ` [PATCH V3 03/11] vhost: take worker or vq instead of dev for queueing Mike Christie
2021-10-22  5:19   ` Mike Christie
2021-10-22  5:19 ` [PATCH V3 04/11] vhost: take worker or vq instead of dev for flushing Mike Christie
2021-10-22  5:19   ` Mike Christie
2021-10-22  5:19 ` [PATCH V3 05/11] vhost: convert poll work to be vq based Mike Christie
2021-10-22  5:19   ` Mike Christie
2021-10-22  5:19 ` [PATCH V3 06/11] vhost-sock: convert to vq helpers Mike Christie
2021-10-22  5:19   ` Mike Christie
2021-10-25  9:08   ` Stefano Garzarella
2021-10-25  9:08     ` Stefano Garzarella
2021-10-25 16:09     ` michael.christie
2021-10-25 16:09       ` michael.christie
2021-10-22  5:19 ` [PATCH V3 07/11] vhost-scsi: make SCSI cmd completion per vq Mike Christie
2021-10-22  5:19   ` Mike Christie
2021-10-22  5:19 ` [PATCH V3 08/11] vhost-scsi: convert to vq helpers Mike Christie
2021-10-22  5:19   ` Mike Christie
2021-10-22  5:19 ` [PATCH V3 09/11] vhost-scsi: flush IO vqs then send TMF rsp Mike Christie
2021-10-22  5:19   ` Mike Christie
2021-10-22  5:19 ` [PATCH V3 10/11] vhost: remove device wide queu/flushing helpers Mike Christie
2021-10-22  5:19   ` Mike Christie
2021-10-22  5:19 ` [PATCH V3 11/11] vhost: allow userspace to create workers Mike Christie
2021-10-22  5:19   ` Mike Christie
2021-10-22 10:47   ` Michael S. Tsirkin
2021-10-22 10:47     ` Michael S. Tsirkin
2021-10-22 16:12     ` michael.christie
2021-10-22 16:12       ` michael.christie
2021-10-22 18:17       ` michael.christie
2021-10-22 18:17         ` michael.christie
2021-10-23 20:11         ` Michael S. Tsirkin
2021-10-23 20:11           ` Michael S. Tsirkin
2021-10-25 16:04           ` michael.christie
2021-10-25 16:04             ` michael.christie
2021-10-25 17:14             ` Michael S. Tsirkin
2021-10-25 17:14               ` Michael S. Tsirkin
2021-10-26  5:37   ` Jason Wang
2021-10-26  5:37     ` Jason Wang
2021-10-26 13:09     ` Michael S. Tsirkin
2021-10-26 13:09       ` Michael S. Tsirkin
2021-10-26 16:36       ` Stefan Hajnoczi
2021-10-26 16:36         ` Stefan Hajnoczi
2021-10-26 15:44     ` Stefan Hajnoczi
2021-10-26 15:44       ` Stefan Hajnoczi
2021-10-27  2:55       ` Jason Wang
2021-10-27  2:55         ` Jason Wang
2021-10-27  9:01         ` Stefan Hajnoczi
2021-10-27  9:01           ` Stefan Hajnoczi
2021-10-26 16:49     ` michael.christie
2021-10-26 16:49       ` michael.christie
2021-10-27  6:02       ` Jason Wang
2021-10-27  6:02         ` Jason Wang
2021-10-27  9:03       ` Stefan Hajnoczi
2021-10-27  9:03         ` Stefan Hajnoczi
2021-10-26 15:22   ` Stefan Hajnoczi
2021-10-26 15:22     ` Stefan Hajnoczi
2021-10-26 15:24   ` Stefan Hajnoczi
2021-10-26 15:24     ` Stefan Hajnoczi
2021-10-22  6:02 ` [PATCH V3 00/11] vhost: multiple worker support michael.christie
2021-10-22  6:02   ` michael.christie
2021-10-22  9:49   ` Michael S. Tsirkin
2021-10-22  9:49     ` Michael S. Tsirkin
2021-10-22  9:48 ` Michael S. Tsirkin
2021-10-22  9:48   ` Michael S. Tsirkin
2021-10-22 15:54   ` michael.christie
2021-10-22 15:54     ` michael.christie
2021-10-23 20:12     ` Michael S. Tsirkin
2021-10-23 20:12       ` Michael S. Tsirkin
