Target-devel archive on lore.kernel.org
 help / color / Atom feed
From: Jason Wang <jasowang@redhat.com>
To: Mike Christie <michael.christie@oracle.com>,
	Stefan Hajnoczi <stefanha@redhat.com>
Cc: fam@euphon.net, linux-scsi@vger.kernel.org, mst@redhat.com,
	qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org,
	target-devel@vger.kernel.org, pbonzini@redhat.com
Subject: Re: [PATCH 00/10] vhost/qemu: thread per IO SCSI vq
Date: Thu, 19 Nov 2020 12:35:59 +0800
Message-ID: <e91e9eee-7ff4-3ef6-955a-706276065d9b@redhat.com> (raw)
In-Reply-To: <d6ffcf17-ab12-4830-cc3c-0f0402fb8a0f@oracle.com>


On 2020/11/19 上午4:06, Mike Christie wrote:
> On 11/18/20 1:54 AM, Jason Wang wrote:
>>
>> On 2020/11/18 下午2:57, Mike Christie wrote:
>>> On 11/17/20 11:17 PM, Jason Wang wrote:
>>>> On 2020/11/18 上午12:40, Stefan Hajnoczi wrote:
>>>>> On Thu, Nov 12, 2020 at 05:18:59PM -0600, Mike Christie wrote:
>>>>>> The following kernel patches were made over Michael's vhost branch:
>>>>>>
>>>>>> https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/log/?h=vhost__;!!GqivPVa7Brio!MzCv3wdRfz5dltunazRWGCeUkMg91pPEOLpIivsebLX9vhYDSi_E1V36e9H0NoRys_hU$ 
>>>>>>
>>>>>> and the vhost-scsi bug fix patchset:
>>>>>>
>>>>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-scsi/20201112170008.GB1555653@stefanha-x1.localdomain/T/*t__;Iw!!GqivPVa7Brio!MzCv3wdRfz5dltunazRWGCeUkMg91pPEOLpIivsebLX9vhYDSi_E1V36e9H0NmuPE_m8$ 
>>>>>>
>>>>>> And the qemu patch was made over the qemu master branch.
>>>>>>
>>>>>> vhost-scsi currently supports multiple queues with the num_queues
>>>>>> setting, but we end up with a setup where the guest's scsi/block
>>>>>> layer can do a queue per vCPU and the layers below vhost can do
>>>>>> a queue per CPU. vhost-scsi will then do a num_queue virtqueues,
>>>>>> but all IO gets set on and completed on a single vhost-scsi thread.
>>>>>> After 2 - 4 vqs this becomes a bottleneck.
>>>>>>
>>>>>> This patchset allows us to create a worker thread per IO vq, so we
>>>>>> can better utilize multiple CPUs with the multiple queues. It
>>>>>> implments Jason's suggestion to create the initial worker like
>>>>>> normal, then create the extra workers for IO vqs with the
>>>>>> VHOST_SET_VRING_ENABLE ioctl command added in this patchset.
>>>>> How does userspace find out the tids and set their CPU affinity?
>>>>>
>>>>> What is the meaning of the new VHOST_SET_VRING_ENABLE ioctl? It 
>>>>> doesn't
>>>>> really "enable" or "disable" the vq, requests are processed 
>>>>> regardless.
>>>>
>>>> Actually I think it should do the real "enable/disable" that tries 
>>>> to follow the virtio spec.
>>>>
>>> What does real mean here?
>>
>>
>> I think it means when a vq is disabled, vhost won't process any 
>> request from that virtqueue.
>>
>>
>>> For the vdpa enable call for example, would it be like
>>> ifcvf_vdpa_set_vq_ready where it sets the ready bit or more like 
>>> mlx5_vdpa_set_vq_ready
>>> where it can do some more work in the disable case?
>>
>>
>> For vDPA, it would be more complicated.
>>
>> E.g for IFCVF, it just delay the setting of queue_enable when it get 
>> DRIVER_OK. Technically it can passthrough the queue_enable to the 
>> hardware as what mlx5e did.
>>
>>
>>>
>>> For net and something like ifcvf_vdpa_set_vq_ready's design would we 
>>> have
>>> vhost_ring_ioctl() set some vhost_virtqueue enable bit. We then have 
>>> some helper
>>> vhost_vq_is_enabled() and some code to detect if userspace supports 
>>> the new ioctl.
>>
>>
>> Yes, vhost support backend capability. When userspace negotiate the 
>> new capability, we should depend on SET_VRING_ENABLE, if not we can 
>> do vhost_vq_is_enable().
>>
>>
>>> And then in vhost_net_set_backend do we call vhost_vq_is_enabled()? 
>>> What is done
>>> for disable then?
>>
>>
>> It needs more thought, but the question is not specific to 
>> SET_VRING_ENABLE. Consider guest may zero ring address as well.
>>
>> For disabling, we can simply flush the work and disable all the polls.
>>
>>
>>> It doesn't seem to buy a lot of new functionality. Is it just
>>> so we follow the spec?
>>
>>
>> My understanding is that, since spec defines queue_enable, we should 
>> support it in vhost. And we can piggyback the delayed vq creation 
>> with this feature. Otherwise we will duplicate the function if we 
>> want to support queue_enable.
>
>
> I had actually given up on the delayed vq creation goal. I'm still not 
> sure how it's related to ENABLE and I think it gets pretty gross.
>
> 1. If we started from a semi-clean slate, and used the ENABLE ioctl 
> more like a CREATE ioctl, and did the ENABLE after vhost dev open() 
> but before any other ioctls, we can allocate the vq when we get the 
> ENABLE ioctl. This fixes the issue where vhost scsi is allocating 128 
> vqs at open() time. We can then allocate metadata like the iovecs at 
> ENABLE time or when we get a setup ioctl that is related to the 
> metadata, so it fixes that too.
>
> That makes sense how ENABLE is related to delayed vq allocation and 
> why we would want it.
>
> If we now need to support old tools though, then you lose me. To try 
> and keep the code paths using the same code, then at vhost dev open() 
> time do we start vhost_dev_init with zero vqs like with the allocate 
> at ENABLE time case? Then when we get the first vring or dev ioctl, do 
> we allocate the vq and related metadata? If so, the ENABLE does not 
> buy us a lot since we get the delayed allocation from the compat code. 
> Also this compat case gets really messy when we are delaying the 
> actual vq and not just the metadata.
>
> If for the compat case, we keep the code that before/during 
> vhost_dev_init allocates all the vqs and does the initialization, then 
> we end up with 2 very very different code paths. And we also need a 
> new modparam or something to tell the drivers to do the old or new 
> open() behavior.


Right, so I think maybe we can take a step back. Instead of depending on 
explicit new ioctl which may cause a lot of issues, can we do something 
similar to vhost_vq_is_setup().

That means, let's create/destory new workers on SET_VRING_ADDR?


>
> 2. If we do an approach that is less invasive to the kernel for the 
> compat case, and do the ENABLE ioctl after other vring ioctl calls 
> then that would not work for the delayed vq allocation goal since the 
> ENABLE call is too late.
>
>
>>
>>
>>>
>>> Or do you want it work more like mlx5_vdpa_set_vq_ready? For this in 
>>> vhost_ring_ioctl
>>> when we get the new ioctl we would call into the drivers and have it 
>>> start queues
>>> and stop queues? For enable, what we you do for net for this case?
>>
>>
>> Net is something different, we can simply use SET_BACKEND to disable 
>> a specific virtqueue without introducing new ioctls. Notice that, net 
>> mq is kind of different with scsi which have a per queue pair vhost 
>> device, and the API allows us to set backend for a specific virtqueue.
>
>
> That's one of the things I am trying to understand. It sounds like 
> ENABLE is not useful to net. Will net even use/implement the ENABLE 
> ioctl or just use the SET_BACKEND?


I think SET_BACKEND is sufficient for net.


> What about vsock?


For vsock (and scsi as well), their backend is per virtqueue, but the 
actual issue is there's no uAPI to configure it per vq. The current uAPI 
is per device.


>
> For net it sounds like it's just going to add an extra code path if 
> you support it.


Yes, so if we really want one w(which is still questionable during our 
discussion). We can start from a SCSI specific one (or an alias of vDPA 
one).


>
>
>>
>>
>>> For disable,
>>> would you do something like vhost_net_stop_vq (we don't free up 
>>> anything allocated
>>> in vhost_vring_ioctl calls, but we can stop what we setup in the net 
>>> driver)?
>>
>>
>> It's up to you, if you think you should free the resources you can do 
>> that.
>>
>>
>>> Is this useful for the current net mq design or is this for 
>>> something like where
>>> you would do one vhost net device with multiple vqs?
>>
>>
>> I think SET_VRING_ENABLE is more useful for SCSI since it have a 
>> model of multiple vqs per vhost device.
>
> That is why I was asking about if you were going to change net.
>
> It would have been useful for scsi if we had it when mq support was 
> added and we don't have to support old tools. But now, if enable=true, 
> is only going to be something where we set some bit so later when 
> VHOST_SCSI_SET_ENDPOINT is run it we can do what we are already doing 
> its just extra code. This patch:
> https://www.spinics.net/lists/linux-scsi/msg150151.html
> would work without the ENABLE ioctl I mean.


That seems to pre-allocate all workers. If we don't care the resources 
(127 workers) consumption it could be fine.


>
>
> And if you guys want to do the completely new interface, then none of 
> this matters I guess :)
>
> For disable see below.
>
>>
>>
>>>
>>> My issue/convern is that in general these calls seems useful, but we 
>>> don't really
>>> need them for scsi because vhost scsi is already stuck creating vqs 
>>> like how it does
>>> due to existing users. If we do the ifcvf_vdpa_set_vq_ready type of 
>>> design where
>>> we just set some bit, then the new ioctl does not give us a lot. 
>>> It's just an extra
>>> check and extra code.
>>>
>>> And for the mlx5_vdpa_set_vq_ready type of design, it doesn't seem 
>>> like it's going
>>> to happen a lot where the admin is going to want to remove vqs from 
>>> a running device.
>>
>>
>> In this case, qemu may just disable the queues of vhost-scsi via 
>> SET_VRING_ENABLE and then we can free resources?
>
>
> Some SCSI background in case it doesn't work like net:
> -------
> When the user sets up mq for vhost-scsi/virtio-scsi, for max perf and 
> no cares about mem use they would normally set num_queues based on the 
> number of vCPUs and MSI-x vectors. I think the default in qemu now is 
> to try and detect that value.
>
> When the virtio_scsi driver is loaded into the guest kernel, it takes 
> the num_queues value and tells the scsi/block mq layer to create 
> num_queues multiqueue hw queues.


If I read the code correctly, for modern device, guest will set 
queue_enable for the queues that it wants to use. So in this ideal case, 
qemu can forward them to VRING_ENABLE and reset VRING_ENABLE during 
device reset.

But it would be complicated to support legacy device and qemu.


>
> ------
>
> I was trying to say in the previous email that is if all we do is set 
> some bits to indicate the queue is disabled, free its resources, stop 
> polling/queueing in the scsi/target layer, flush etc, it does not seem 
> useful. I was trying to ask when would a user only want this behavior?


I think it's device reset, the semantic is that unless the queue is 
enabled, we should treat it as disabled.


>
> I think we need an extra piece where the guests needs to be modified 
> to handle the queue removal or the block/scsi layers would still send 
> IO and we would get IO errors. Without this it seems like some extra 
> code that we will not use.
>
> And then if we are going to make disable useful like this, what about 
> enable? We would want to the reverse where we add the queue and the 
> guest remaps the mq to hw queue layout. To do this, enable has to do 
> more than just set some bits. There is also an issue with how it would 
> need to interact with the SET_BACKEND 
> (VHOST_SCSI_SET_ENDPOINT/VHOST_SCSI_CLEAR_ENDPOINT for scsi) calls.
>
> I think if we wanted the ENABLE ioctl to work like this then that is 
> not related to my patches and I like I've written before I think my 
> patches do not need the ENABLE ioctl in general. We could add the 
> patch where we create the workers threads from 
> VHOST_SCSI_SET_ENDPOINT. And if we ever add this queue hotplug type of 
> code, then the worker thread would just get moved/rearranged with the 
> other vq modification code in 
> vhost_scsi_set_endpoint/vhost_scsi_clear_endpoint.
>
> We could also go the new threading interface route, and also do the 
> ENABLE ioctl separately.


Right, my original idea is to try to make queue_enable (in the spec) 
work for SCSI and we can use that for any delayed stuffs (vq, or workers).

But it looks not as easy as I imaged.

Thanks





  reply index

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-12 23:18 Mike Christie
2020-11-12 23:19 ` [PATCH 1/1] qemu vhost scsi: add VHOST_SET_VRING_ENABLE support Mike Christie
2020-11-17 11:53   ` Stefan Hajnoczi
2020-12-02  9:59   ` Michael S. Tsirkin
2020-12-02 16:05     ` Michael Christie
2020-11-12 23:19 ` [PATCH 01/10] vhost: remove work arg from vhost_work_flush Mike Christie
2020-11-17 13:04   ` Stefan Hajnoczi
2020-11-12 23:19 ` [PATCH 02/10] vhost scsi: remove extra flushes Mike Christie
2020-11-17 13:07   ` Stefan Hajnoczi
2020-11-12 23:19 ` [PATCH 03/10] vhost poll: fix coding style Mike Christie
2020-11-17 13:07   ` Stefan Hajnoczi
2020-11-12 23:19 ` [PATCH 04/10] vhost: support multiple worker threads Mike Christie
2020-11-12 23:19 ` [PATCH 05/10] vhost: poll support support multiple workers Mike Christie
2020-11-17 15:32   ` Stefan Hajnoczi
2020-11-12 23:19 ` [PATCH 06/10] vhost scsi: make SCSI cmd completion per vq Mike Christie
2020-11-17 16:04   ` Stefan Hajnoczi
2020-11-12 23:19 ` [PATCH 07/10] vhost, vhost-scsi: flush IO vqs then send TMF rsp Mike Christie
2020-11-17 16:05   ` Stefan Hajnoczi
2020-11-12 23:19 ` [PATCH 08/10] vhost: move msg_handler to new ops struct Mike Christie
2020-11-17 16:08   ` Stefan Hajnoczi
2020-11-12 23:19 ` [PATCH 09/10] vhost: add VHOST_SET_VRING_ENABLE support Mike Christie
2020-11-17 16:14   ` Stefan Hajnoczi
2020-11-12 23:19 ` [PATCH 10/10] vhost-scsi: create a woker per IO vq Mike Christie
2020-11-17 16:40 ` [PATCH 00/10] vhost/qemu: thread per IO SCSI vq Stefan Hajnoczi
2020-11-17 19:13   ` Mike Christie
2020-11-18  9:54     ` Michael S. Tsirkin
2020-11-19 14:00       ` Stefan Hajnoczi
2020-11-18 11:31     ` Stefan Hajnoczi
2020-11-19 14:46       ` Michael S. Tsirkin
2020-11-19 16:11         ` Mike Christie
2020-11-19 16:24           ` Stefan Hajnoczi
2020-11-19 16:43             ` Mike Christie
2020-11-19 17:08               ` Stefan Hajnoczi
2020-11-20  8:45                 ` Stefan Hajnoczi
2020-11-20 12:31                   ` Michael S. Tsirkin
2020-12-01 12:59                     ` Stefan Hajnoczi
2020-12-01 13:45                       ` Stefano Garzarella
2020-12-01 17:43                         ` Stefan Hajnoczi
2020-12-02 10:35                           ` Stefano Garzarella
2020-11-23 15:17                   ` Stefano Garzarella
2020-11-18  5:17   ` Jason Wang
2020-11-18  6:57     ` Mike Christie
2020-11-18  7:19       ` Mike Christie
2020-11-18  7:54       ` Jason Wang
2020-11-18 20:06         ` Mike Christie
2020-11-19  4:35           ` Jason Wang [this message]
2020-11-19 15:49             ` Mike Christie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e91e9eee-7ff4-3ef6-955a-706276065d9b@redhat.com \
    --to=jasowang@redhat.com \
    --cc=fam@euphon.net \
    --cc=linux-scsi@vger.kernel.org \
    --cc=michael.christie@oracle.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=target-devel@vger.kernel.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Target-devel archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/target-devel/0 target-devel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 target-devel target-devel/ https://lore.kernel.org/target-devel \
		target-devel@vger.kernel.org
	public-inbox-index target-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.target-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git