[PATCH RFC 00/14] vhost: multiple worker support

* [PATCH RFC 00/14] vhost: multiple worker support
@ 2021-04-28 22:36 Mike Christie
  2021-04-28 22:37 ` [PATCH RFC 01/14] vhost: remove work arg from vhost_work_flush Mike Christie
                   ` (15 more replies)
  0 siblings, 16 replies; 23+ messages in thread
From: Mike Christie @ 2021-04-28 22:36 UTC (permalink / raw)
  To: stefanha, pbonzini, jasowang, mst, sgarzare, virtualization

The following patches apply over mst's vhost branch and were tested
againt that branch and also mkp's 5.13 branch which has some vhost-scsi
changes.

These patches allow us to support multiple vhost workers per device. I
ended up just doing Stefan's original idea where userspace has the
kernel create a worker and we pass back the pid. This has the benefit
over the workqueue and userspace thread approach where we only have
one'ish code path in the kernel.

The kernel patches here allow us to then do N workers device and also
share workers across devices.

I included a patch for qemu so you can get an idea of how it works.

TODO:
-----
- polling
- Allow sharing workers across devices. Kernel support is added and I
hacked up userspace to test, but I'm still working on a sane way to
manage it in userspace.
- Bind to specific CPUs. Commands like "virsh emulatorpin" work with
these patches and allow us to set the group of vhost threads to different
CPUs. But we can also set a specific vq's worker to run on a CPU.
- I'm handling old kernel by just checking for EPERM. Does this require
a feature?

Results:
--------
When running with the null_blk driver and vhost-scsi I can get 1.2
million IOPs by just running a simple

fio --filename=/dev/sda --direct=1 --rw=randrw --bs=4k --ioengine=libaio
--iodepth=128  --numjobs=8 --time_based --group_reporting --name=iops
--runtime=60 --eta-newline=1

The VM has 8 vCPUs and sda has 8 virtqueues and we can do a total of
1024 cmds per devices. To get 1.2 million IOPs I did have to tune and
ran the virsh emulatorpin command so the vhost threads were running
on different CPUs than the VM. If the vhost threads share CPUs then I
get around 800K.

For a more real device that are also CPU hogs like iscsi, I can still
get 1 million IOPs using 1 dm-multipath device over 8 iscsi paths
(natively it gets 1.1 million IOPs).

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 23+ messages in thread