On Wed, Nov 18, 2020 at 04:54:07AM -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 17, 2020 at 01:13:14PM -0600, Mike Christie wrote:
> > On 11/17/20 10:40 AM, Stefan Hajnoczi wrote:
> > > On Thu, Nov 12, 2020 at 05:18:59PM -0600, Mike Christie wrote:
> > >> The following kernel patches were made over Michael's vhost branch:
> > >>
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/log/?h=vhost
> > >>
> > >> and the vhost-scsi bug fix patchset:
> > >>
> > >> https://lore.kernel.org/linux-scsi/20201112170008.GB1555653@stefanha-x1.localdomain/T/#t
> > >>
> > >> And the qemu patch was made over the qemu master branch.
> > >>
> > >> vhost-scsi currently supports multiple queues with the num_queues
> > >> setting, but we end up with a setup where the guest's scsi/block
> > >> layer can do a queue per vCPU and the layers below vhost can do
> > >> a queue per CPU. vhost-scsi will then do num_queues virtqueues,
> > >> but all IO gets sent to and completed on a single vhost-scsi
> > >> thread. After 2-4 vqs this becomes a bottleneck.
> > >>
> > >> This patchset allows us to create a worker thread per IO vq, so we
> > >> can better utilize multiple CPUs with the multiple queues. It
> > >> implements Jason's suggestion to create the initial worker like
> > >> normal, then create the extra workers for IO vqs with the
> > >> VHOST_SET_VRING_ENABLE ioctl command added in this patchset.
> > >
> > > How does userspace find out the tids and set their CPU affinity?
> >
> > When we create the worker thread we add it to the device owner's
> > cgroup, so we end up inheriting those settings like affinity.
> >
> > However, are you asking about finer-grained control, e.g. if the
> > guest is doing mq and an mq hw queue is bound to cpu0, it would
> > perform better if we could bind that vq's worker thread to cpu0? I
> > think the problem might be that if you are in the cgroup then we
> > can't set a specific thread's CPU affinity to just one specific CPU.
> > So you can either do cgroups or not.
>
> Something we wanted to try for a while is to allow userspace
> to create threads for us, then specify which vqs each one processes.

Do you mean an interface like a blocking ioctl(vhost_fd,
VHOST_WORKER_RUN) where the vhost processing is done in the context of
the caller's userspace thread?

What is neat about this is that it removes thread configuration from
the kernel vhost code. On the other hand, userspace still needs an
interface indicating which vqs should be processed. Maybe it would even
require an int worker_fd = ioctl(vhost_fd, VHOST_WORKER_CREATE) and
then ioctl(worker_fd, VHOST_WORKER_BIND_VQ, vq_idx)? So then it becomes
complex again...

Stefan
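
P.S. To make that concrete, here is a rough sketch of what the
userspace side could look like. VHOST_WORKER_CREATE,
VHOST_WORKER_BIND_VQ, and VHOST_WORKER_RUN are all hypothetical, the
ioctl numbers below are made up, and the vq -> CPU mapping is just a
placeholder policy:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Hypothetical ioctls; numbers invented for illustration only. */
#define VHOST_WORKER_CREATE  _IO(VHOST_VIRTIO, 0x80)
#define VHOST_WORKER_BIND_VQ _IOW(VHOST_VIRTIO, 0x81, int)
#define VHOST_WORKER_RUN     _IO(VHOST_VIRTIO, 0x82)

struct worker {
	pthread_t thread;
	int fd;		/* worker_fd from VHOST_WORKER_CREATE */
	int cpu;	/* CPU this worker is pinned to */
};

static void *worker_fn(void *opaque)
{
	struct worker *w = opaque;
	cpu_set_t cpus;

	/*
	 * Pin the thread to match the guest's mq hw queue <-> CPU
	 * mapping. This is the per-thread affinity that is awkward to
	 * do from inside the kernel once the worker is in a cgroup.
	 */
	CPU_ZERO(&cpus);
	CPU_SET(w->cpu, &cpus);
	pthread_setaffinity_np(pthread_self(), sizeof(cpus), &cpus);

	/* Block in the kernel, processing the vqs bound to this worker. */
	ioctl(w->fd, VHOST_WORKER_RUN);
	return NULL;
}

static int start_io_workers(int vhost_fd, struct worker *workers,
			    int nr_io_vqs)
{
	int i;

	for (i = 0; i < nr_io_vqs; i++) {
		workers[i].fd = ioctl(vhost_fd, VHOST_WORKER_CREATE);
		if (workers[i].fd < 0)
			return -1;

		/*
		 * One worker per IO vq here, but a worker could just
		 * as well be bound to several vqs.
		 */
		if (ioctl(workers[i].fd, VHOST_WORKER_BIND_VQ, &i) < 0)
			return -1;

		workers[i].cpu = i;	/* placeholder vq -> CPU policy */
		if (pthread_create(&workers[i].thread, NULL, worker_fn,
				   &workers[i]))
			return -1;
	}
	return 0;
}

Teardown would presumably just be close(worker_fd) plus pthread_join(),
which keeps all of the thread lifecycle and affinity policy in
userspace.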