From: Dongli Zhang <dongli.zhang@oracle.com>
To: Cornelia Huck <cohuck@redhat.com>
Cc: mst@redhat.com, jasowang@redhat.com,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, axboe@kernel.dk,
	linux-kernel@vger.kernel.org
Subject: Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?
Date: Fri, 15 Mar 2019 00:08:18 +0800	[thread overview]
Message-ID: <74cf10ad-34bb-333a-3119-6021697c8e33@oracle.com> (raw)
In-Reply-To: <20190314131339.1b61fff6.cohuck@redhat.com>



On 03/14/2019 08:13 PM, Cornelia Huck wrote:
> On Thu, 14 Mar 2019 14:12:32 +0800
> Dongli Zhang <dongli.zhang@oracle.com> wrote:
> 
>> On 3/13/19 5:39 PM, Cornelia Huck wrote:
>>> On Wed, 13 Mar 2019 11:26:04 +0800
>>> Dongli Zhang <dongli.zhang@oracle.com> wrote:
>>>   
>>>> On 3/13/19 1:33 AM, Cornelia Huck wrote:  
>>>>> On Tue, 12 Mar 2019 10:22:46 -0700 (PDT)
>>>>> Dongli Zhang <dongli.zhang@oracle.com> wrote:
> 
>>>>>> Is this by design on purpose, or can we fix with below?
>>>>>>
>>>>>>
>>>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>>>>> index 4bc083b..df95ce3 100644
>>>>>> --- a/drivers/block/virtio_blk.c
>>>>>> +++ b/drivers/block/virtio_blk.c
>>>>>> @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
>>>>>>  	if (err)
>>>>>>  		num_vqs = 1;
>>>>>>  
>>>>>> +	num_vqs = min(num_possible_cpus(), num_vqs);
>>>>>> +
>>>>>>  	vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL);
>>>>>>  	if (!vblk->vqs)
>>>>>>  		return -ENOMEM;    
>>>>>
>>>>> virtio-blk, however, is not pci-specific.
>>>>>
>>>>> If we are using the ccw transport on s390, a completely different
>>>>> interrupt mechanism is in use ('floating' interrupts, which are not
>>>>> per-cpu). A check like that should therefore not go into the generic
>>>>> driver.
>>>>>     
>>>>
>>>> So far there seems two options.
>>>>
>>>> The 1st option is to ask the qemu user to always specify "-num-queues" with the
>>>> same number of vcpus when running x86 guest with pci for virtio-blk or
>>>> virtio-scsi, in order to assign a vector for each queue.  
>>>
>>> That does seem like an extra burden for the user: IIUC, things work
>>> even if you have too many queues, it's just not optimal. It sounds like
>>> something that can be done by a management layer (e.g. libvirt), though.
>>>   
>>>> Or, is it fine for virtio folks to add a new hook to 'struct virtio_config_ops'
>>>> so that different platforms (e.g., pci or ccw) would use different ways to limit
>>>> the max number of queues in guest, with something like below?  
>>>
>>> That sounds better, as both transports and drivers can opt-in here.
>>>
>>> However, maybe it would be even better to try to come up with a better
>>> strategy of allocating msix vectors in virtio-pci. More vectors in the
>>> num_queues > num_cpus case, even if they still need to be shared?
>>> Individual vectors for n-1 cpus and then a shared one for the remaining
>>> queues?
>>>
>>> It might even be device-specific: Have some low-traffic status queues
>>> share a vector, and provide an individual vector for high-traffic
>>> queues. Would need some device<->transport interface, obviously.
>>>   
>>
>> This sounds a little bit similar to multiple hctx maps?
>>
>> So far, as virtio-blk only supports set->nr_maps = 1, no matter how many hw
>> queues are assigned for virtio-blk, blk_mq_alloc_tag_set() would use at most
>> nr_cpu_ids hw queues.
>>
>> 2981 int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
>> ... ...
>> 3021         /*
>> 3022          * There is no use for more h/w queues than cpus if we just have
>> 3023          * a single map
>> 3024          */
>> 3025         if (set->nr_maps == 1 && set->nr_hw_queues > nr_cpu_ids)
>> 3026                 set->nr_hw_queues = nr_cpu_ids;
>>
>> Even the block layer would limit the number of hw queues by nr_cpu_ids when
>> (set->nr_maps == 1).
> 
> Correct me if I'm wrong, but there seem to be two kinds of limitations
> involved here:
> - Allocation of msix vectors by the virtio-pci transport. We end up
>   with shared vectors if we have more virtqueues than vcpus. Other
>   transports may or may not have similar issues, but essentially, this
>   is something that applies to all kind of virtio devices attached via
>   the virtio-pci transport.

It depends.

For virtio-net, we need to specify the number of available vectors on the qemu
side, e.g.:

-device virtio-net-pci,netdev=tapnet,mq=true,vectors=16

This parameter is specific to virtio-net.

Suppose 'queues=8' while 'vectors=16': since 2*8+1 > 16, there are not enough
vectors and the guest would not be able to assign one vector to each queue.

I was bitten by this a long time ago; it seems qemu minimizes the memory
allocation, and the default 'vectors' is 3.
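
For illustration, a setup that leaves one vector per queue with 8 queue pairs
might look like the following (a sketch only: the count of 17 simply follows
the 2*8+1 arithmetic above, and the tap/vhost details are assumptions):

-netdev tap,id=tapnet,vhost=on,queues=8 \
-device virtio-net-pci,netdev=tapnet,mq=true,vectors=17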

BTW, why can't we have a more consistent configuration across qemu devices?
E.g., so far (illustrated below):

virtio-blk uses 'num-queues'
nvme uses 'num_queues'
virtio-net uses 'queues' on the tap netdev :)
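
For illustration only (the drive/netdev ids here are placeholders, and the
nvme spelling is whatever qemu accepted at the time of writing):

-device virtio-blk-pci,drive=disk0,num-queues=4
-device nvme,drive=disk1,serial=nvme0,num_queues=4
-netdev tap,id=net0,queues=4 -device virtio-net-pci,netdev=net0,mq=true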

> - The block layer limits the number of hw queues to the number of
>   vcpus. This applies only to virtio devices that interact with the
>   block layer, but regardless of the virtio transport.

Yes: virtio-blk and virtio-scsi.
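
For reference, such a cap in the virtio-blk driver could look roughly like the
earlier diff, e.g. the sketch below (not a final patch; min_t() is only used
here to avoid a signedness mismatch between num_possible_cpus() and num_vqs):

	/* sketch: mirror blk-mq's nr_cpu_ids limit for the single-map case */
	num_vqs = min_t(unsigned int, num_vqs, num_possible_cpus());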

> 
>> That's why I think virtio-blk should use the similar solution as nvme
>> (regardless about write_queues and poll_queues) and xen-blkfront.
> 
> Ok, the hw queues limit from above would be an argument to limit to
> #vcpus in the virtio-blk driver, regardless of the transport used. (No
> idea if there are better ways to deal with this, I'm not familiar with
> the interface.)
> 
> For virtio devices that don't interact with the block layer and are
> attached via the virtio-pci transport, it might still make sense to
> revisit vector allocation.
> 

As mentioned above, we need to specify 'vectors' for virtio-net, as the default
value is only 3 (config + tx + rx). Would that make a little difference here?

Dongli Zhang


Thread overview: 16+ messages
2019-03-12 17:22 virtio-blk: should num_vqs be limited by num_possible_cpus()? Dongli Zhang
2019-03-12 17:33 ` Cornelia Huck
2019-03-13  3:26   ` Dongli Zhang
2019-03-13  9:39     ` Cornelia Huck
2019-03-14  6:12       ` Dongli Zhang
2019-03-14 12:13         ` Cornelia Huck
2019-03-14 16:08           ` Dongli Zhang [this message]
2019-03-15  4:50         ` Jason Wang
2019-03-15 12:41           ` Cornelia Huck
2019-03-18  7:47             ` Jason Wang
2019-03-19  2:22               ` Dongli Zhang
2019-03-20 12:53                 ` Jason Wang
2019-03-21  2:14                   ` Dongli Zhang
2019-03-21 15:57                   ` Stefan Hajnoczi
2019-03-14 12:32 ` Michael S. Tsirkin
2019-03-14 15:36   ` Dongli Zhang
