* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-03-04 17:33 Wei Li
  2019-03-05 17:29 ` Stefan Hajnoczi
  0 siblings, 1 reply; 46+ messages in thread
From: Wei Li @ 2019-03-04 17:33 UTC (permalink / raw)
  To: stefanha, qemu-devel

Hi Stefan and all,

I have spent some time getting familiar with QEMU and the relevant concepts. My project uses QEMU 2.9 with a virtio-scsi backend, and I am exploring the proper way to improve its IOPS.

Thanks @Stefan for the response and advice!

Could you please help review and clarify the following questions?

1. @Stefan mentioned that virtio-blk supports additional iothread objects; is this feature also supported by virtio-scsi? I am exploring the performance of multiple I/O threads per VM with the following QMP setup, which creates two iothreads, one per device (an equivalent command-line invocation is sketched below the list):

(QEMU) object-add qom-type=iothread id=iothread0

(QEMU) object-add qom-type=iothread id=iothread1

(QEMU) device_add driver=virtio-scsi-pci id=test0 iothread=iothread0

(QEMU) device_add driver=virtio-scsi-pci id=test1 iothread=iothread1

(QEMU) device_add driver=scsi-block drive=none0 id=v0 bus=test0.0

(QEMU) device_add driver=scsi-block drive=none1 id=v1 bus=test1.0

2. You mentioned the multi-queue devices feature; it seems multi-queue would help improve the IOPS of a single device. Could you please provide more details?

3. What is the current plan for multi-queue device support? Which release will include it, or has it already been included in a release newer than 2.9?

4. Is there a feature branch where I could get more details about the code and its in-progress status?

5. Someone posted multi-queue results at https://marc.info/?l=linux-virtualization&m=135583400026151&w=2, but they only measure bandwidth. Do we have any performance results on the IOPS improvement of the multi-queue approach?
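For reference, an equivalent command-line setup might look roughly like this (the backing drives none0/none1 are not shown in the QMP transcript above, so the -drive lines below are only illustrative placeholders; scsi-block normally points at a host block device):

  qemu-system-x86_64 ... \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -drive if=none,id=none0,file=/dev/sdc,format=raw,cache=none,aio=native \
    -drive if=none,id=none1,file=/dev/sdd,format=raw,cache=none,aio=native \
    -device virtio-scsi-pci,id=test0,iothread=iothread0 \
    -device virtio-scsi-pci,id=test1,iothread=iothread1 \
    -device scsi-block,drive=none0,id=v0,bus=test0.0 \
    -device scsi-block,drive=none1,id=v1,bus=test1.0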
 

Thanks again,

Wei

 

 

On 2/18/19, 2:24 AM, "Stefan Hajnoczi" <stefanha@redhat.com> wrote:

 

    On Thu, Feb 14, 2019 at 08:21:30AM -0800, Wei Li wrote:

    > I learnt that QEMU iothread architecture has one QEMU thread per vCPU and a dedicated event loop thread which is iothread, and I want to better understand whether there is any specific reason to have a single iothead instead of multiple iothreads?

    > Given that single iothread becomes a performance bottleneck in my project, if there any proper way to support multiple iothreads? E.g. have one iothread per volume attachment instead of single iothread per host?  But I am not quite sure whether it is feasible or not. Please let me know if you have any advices.

    

    Hi,

    Please send general questions to qemu-devel@nongnu.org and CC me in the

    future.  That way others can participate in the discussion and it will

    be archived so someone searching for the same question will find the

    answer in the future.

    

    QEMU supports additional IOThread objects:

    

      -object iothread,id=iothread0

      -device virtio-blk-pci,iothread=iothread0,drive=drive0

    

    This virtio-blk device will perform device emulation and I/O in

    iothread0 instead of the main loop thread.
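
    For completeness, a minimal invocation with the backing drive defined might
    look like this (the image path is only an illustrative placeholder):

      -object iothread,id=iothread0 \
      -drive if=none,id=drive0,file=/path/to/disk.img,format=raw,cache=none,aio=native \
      -device virtio-blk-pci,iothread=iothread0,drive=drive0

    The QMP command query-iothreads (or "info iothreads" in the HMP monitor)
    can be used to confirm the IOThread exists and to see its polling settings.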

    

    Currently only 1:1 device<->IOThread association is possible.  In the

    future 1:N should be possible and will allow multi-queue devices to

    achieve better performance.

    

    Stefan

 


* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
  2019-03-04 17:33 [Qemu-devel] Following up questions related to QEMU and I/O Thread Wei Li
@ 2019-03-05 17:29 ` Stefan Hajnoczi
       [not found]   ` <2D7F11D0-4A02-4A0F-961D-854240376B17@oracle.com>
  0 siblings, 1 reply; 46+ messages in thread
From: Stefan Hajnoczi @ 2019-03-05 17:29 UTC (permalink / raw)
  To: Wei Li; +Cc: qemu-devel, Paolo Bonzini


On Mon, Mar 04, 2019 at 09:33:26AM -0800, Wei Li wrote:
> While @Stefan mentioned about additional iothread object support of virtio-blk, Is the feature also supported by virtio-scsi? I am trying to exploring the perf multiple IO threads / per VM via followings:
> QMP setup example to create 2 io threads in QEMU, one io thread per device:
> 
> (QEMU) object-add qom-type=iothread id=iothread0
> 
> (QEMU) object-add qom-type=iothread id=iothread1
> 
>  
> 
> (QEMU) device_add driver=virtio-scsi-pci id=test0 iothread=iothread0
> 
> (QEMU) device_add driver=virtio-scsi-pci id=test1 iothread=iothread1
> 
>  
> 
> (QEMU) device_add driver=scsi-block drive=none0 id=v0 bus=test0.0
> 
> (QEMU) device_add driver=scsi-block drive=none1 id=v1 bus=test1.0

Yes, each virtio-scsi-pci device can be assigned to an iothread.

> You mentioned about the multi-queue devices feature, it seems like the multi-queue feature will help improve the IOPS of  single Device. Could you please provide more details?
> What’s the current plan of support multi-queue device? Which release will include the support or it has already been included in any existing release newer than 2.9?
> Is there any feature branch which I would get more details about the code and in progress status?

I have CCed Paolo, who has worked on multiqueue block layer support in
QEMU.  This feature is not yet complete.

The virtio-scsi device also supports multiqueue, but the QEMU block
layer will still be a single queue.
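
As a sketch (the controller name, queue count, and backing device are only illustrative), an IOThread and multiple queues can be combined on one controller; with the current 1:1 association, all of the controller's queues are serviced by that single IOThread:

  -object iothread,id=iothread0 \
  -drive if=none,id=none0,file=/dev/sdc,format=raw,cache=none,aio=native \
  -device virtio-scsi-pci,id=scsi0,iothread=iothread0,num_queues=4 \
  -device scsi-block,drive=none0,bus=scsi0.0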

Stefan


* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
       [not found]   ` <2D7F11D0-4A02-4A0F-961D-854240376B17@oracle.com>
@ 2019-04-01  9:07     ` Stefan Hajnoczi
  2019-04-05 21:09         ` Wei Li
       [not found]       ` <CC372DF3-1AC6-46B5-98A5-21159497034A@oracle.com>
  0 siblings, 2 replies; 46+ messages in thread
From: Stefan Hajnoczi @ 2019-04-01  9:07 UTC (permalink / raw)
  To: Wei Li; +Cc: Stefan Hajnoczi, Paolo Bonzini, qemu-devel


On Fri, Mar 29, 2019 at 08:16:36AM -0700, Wei Li wrote:
> Thanks Stefan for your reply and guidance!
> 
> We spent some time on exploring the multiple I/O Threads approach per your feedback. Based on the perf measurement data, we did see some IOPS improvement for multiple volumes, which is great. :)
> 
> In addition, IOPS for single Volume will still be a bottleneck, it seems like multiqueue block layer feature which Paolo is working on may be able to help improving the IOPS for single volume.
> 
> @Paolo, @Stefan, 
> Would you mind sharing the multiqueue feature code branch with us? So that we could get some rough idea about this feature and maybe start doing some exploration? 

Paolo last worked on this code, so he may be able to send you a link.

Stefan


* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-04-05 21:09         ` Wei Li
  0 siblings, 0 replies; 46+ messages in thread
From: Wei Li @ 2019-04-05 21:09 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Stefan Hajnoczi, Paolo Bonzini, qemu-devel

Thanks Stefan for your quick response!

Hi Paolo,
Could you please send us a link related to the multiqueue feature which you are working on so that we could start getting some details about the feature.

Thanks again,
Wei 

On 4/1/19, 3:54 AM, "Stefan Hajnoczi" <stefanha@gmail.com> wrote:

    On Fri, Mar 29, 2019 at 08:16:36AM -0700, Wei Li wrote:
    > Thanks Stefan for your reply and guidance!
    > 
    > We spent some time on exploring the multiple I/O Threads approach per your feedback. Based on the perf measurement data, we did see some IOPS improvement for multiple volumes, which is great. :)
    > 
    > In addition, IOPS for single Volume will still be a bottleneck, it seems like multiqueue block layer feature which Paolo is working on may be able to help improving the IOPS for single volume.
    > 
    > @Paolo, @Stefan, 
    > Would you mind sharing the multiqueue feature code branch with us? So that we could get some rough idea about this feature and maybe start doing some exploration? 
    
    Paolo last worked on this code, so he may be able to send you a link.
    
    Stefan
    


* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
       [not found]       ` <CC372DF3-1AC6-46B5-98A5-21159497034A@oracle.com>
@ 2019-04-15 17:34         ` Wei Li
  2019-04-15 23:23             ` Dongli Zhang
  0 siblings, 1 reply; 46+ messages in thread
From: Wei Li @ 2019-04-15 17:34 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefan Hajnoczi, Paolo Bonzini; +Cc: qemu-devel

Hi @Paolo Bonzini & @Stefan Hajnoczi,

Would you please help confirm whether @Paolo Bonzini's multiqueue feature change will benefit virtio-scsi or not? Thanks!

@Stefan Hajnoczi,
I also spent some time exploring the virtio-scsi multi-queue feature via the num_queues parameter, as below. Here is what we found:

1. Increasing the number of queues from one to the same number as the vCPU count gives a good IOPS increase.
2. Increasing the number of queues to a number (e.g. 8) larger than the number of vCPUs (e.g. 2) gives an even better IOPS increase.

In addition, it seems QEMU can get better IOPS when the attachment uses more queues than the number of vCPUs. How is that possible? Could you please help us better understand this behavior? Thanks a lot!


Host CPU Configuration:
CPU(s):                2
Thread(s) per core:    2
Core(s) per socket:    1
Socket(s):             1

Commands for multi queue Setup:
(QEMU)  device_add driver=virtio-scsi-pci num_queues=1 id=test1
(QEMU)  device_add driver=virtio-scsi-pci num_queues=2 id=test2
(QEMU)  device_add driver=virtio-scsi-pci num_queues=4 id=test4
(QEMU)  device_add driver=virtio-scsi-pci num_queues=8 id=test8


Result:
     | 8 Queues | 4 Queues | 2 Queues | Single Queue
IOPS |   +29%   |    27%   |    11%   |   Baseline
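
(An fio job like the one quoted in full later in this thread was presumably used for these measurements: fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=64 --numjobs=4 --time_based --group_reporting --name=iops --runtime=60 --eta-newline=1.)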

Thanks,
Wei

On 4/5/19, 2:09 PM, "Wei Li" <wei.d.li@oracle.com> wrote:

    Thanks Stefan for your quick response!
    
    Hi Paolo,
    Could you please send us a link related to the multiqueue feature which you are working on so that we could start getting some details about the feature.
    
    Thanks again,
    Wei 
    
    On 4/1/19, 3:54 AM, "Stefan Hajnoczi" <stefanha@gmail.com> wrote:
    
        On Fri, Mar 29, 2019 at 08:16:36AM -0700, Wei Li wrote:
        > Thanks Stefan for your reply and guidance!
        > 
        > We spent some time on exploring the multiple I/O Threads approach per your feedback. Based on the perf measurement data, we did see some IOPS improvement for multiple volumes, which is great. :)
        > 
        > In addition, IOPS for single Volume will still be a bottleneck, it seems like multiqueue block layer feature which Paolo is working on may be able to help improving the IOPS for single volume.
        > 
        > @Paolo, @Stefan, 
        > Would you mind sharing the multiqueue feature code branch with us? So that we could get some rough idea about this feature and maybe start doing some exploration? 
        
        Paolo last worked on this code, so he may be able to send you a link.
        
        Stefan
        
    


* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-04-15 23:23             ` Dongli Zhang
  0 siblings, 0 replies; 46+ messages in thread
From: Dongli Zhang @ 2019-04-15 23:23 UTC (permalink / raw)
  To: Wei Li; +Cc: Stefan Hajnoczi, Stefan Hajnoczi, Paolo Bonzini, qemu-devel



On 4/16/19 1:34 AM, Wei Li wrote:
> Hi @Paolo Bonzini & @Stefan Hajnoczi,
> 
> Would you please help confirm whether @Paolo Bonzini's multiqueue feature change will benefit virtio-scsi or not? Thanks!
> 
> @Stefan Hajnoczi,
> I also spent some time on exploring the virtio-scsi multi-queue features via num_queues parameter as below, here are what we found:
> 
> 1. Increase number of Queues from one to the same number as CPU will get better IOPS increase.
> 2. Increase number of Queues to the number (e.g. 8) larger than the number of vCPU (e.g. 2) can get even better IOPS increase.

As mentioned in the link below, when the number of hw queues is larger than
nr_cpu_ids, the blk-mq layer limits itself to at most nr_cpu_ids queues
(e.g., see /sys/block/sda/mq/).

That is, when num_queues=4 while there are only 2 vcpus, there should be only
2 queues available under /sys/block/sda/mq/.

https://lore.kernel.org/lkml/1553682995-5682-1-git-send-email-dongli.zhang@oracle.com/

I am just curious how increasing the num_queues from 2 to 4 would double the
iops, while there are only 2 vcpus available...
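
A quick way to check this from inside the guest (assuming the virtio-scsi disk shows up as sda; exact paths may differ by kernel version):

  nproc                          # number of online vCPUs
  ls /sys/block/sda/mq/          # one directory per blk-mq hardware queue in use
  grep virtio /proc/interrupts   # per-queue MSI-X vectors, if assigned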

Dongli Zhang

> 
> In addition, It seems Qemu can get better IOPS while the attachment uses more queues than the number of vCPU, how could it possible? Could you please help us better understand the behavior? Thanks a lot!
> 
> 
> Host CPU Configuration:
> CPU(s):                2
> Thread(s) per core:    2
> Core(s) per socket:    1
> Socket(s):             1
> 
> Commands for multi queue Setup:
> (QEMU)  device_add driver=virtio-scsi-pci num_queues=1 id=test1
> (QEMU)  device_add driver=virtio-scsi-pci num_queues=2 id=test2
> (QEMU)  device_add driver=virtio-scsi-pci num_queues=4 id=test4
> (QEMU)  device_add driver=virtio-scsi-pci num_queues=8 id=test8
> 
> 
> Result:
> 	|  8 Queues   |  4 Queues     |      2 Queues    |      Single Queue
> IOPS 	|   +29%         |      27%           |        11%           |      Baseline
> 
> Thanks,
> Wei
> 
> On 4/5/19, 2:09 PM, "Wei Li" <wei.d.li@oracle.com> wrote:
> 
>     Thanks Stefan for your quick response!
>     
>     Hi Paolo,
>     Could you please send us a link related to the multiqueue feature which you are working on so that we could start getting some details about the feature.
>     
>     Thanks again,
>     Wei 
>     
>     On 4/1/19, 3:54 AM, "Stefan Hajnoczi" <stefanha@gmail.com> wrote:
>     
>         On Fri, Mar 29, 2019 at 08:16:36AM -0700, Wei Li wrote:
>         > Thanks Stefan for your reply and guidance!
>         > 
>         > We spent some time on exploring the multiple I/O Threads approach per your feedback. Based on the perf measurement data, we did see some IOPS improvement for multiple volumes, which is great. :)
>         > 
>         > In addition, IOPS for single Volume will still be a bottleneck, it seems like multiqueue block layer feature which Paolo is working on may be able to help improving the IOPS for single volume.
>         > 
>         > @Paolo, @Stefan, 
>         > Would you mind sharing the multiqueue feature code branch with us? So that we could get some rough idea about this feature and maybe start doing some exploration? 
>         
>         Paolo last worked on this code, so he may be able to send you a link.
>         
>         Stefan
>         
>     
> 
> 
> 


* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-04-16  9:20               ` Stefan Hajnoczi
  0 siblings, 0 replies; 46+ messages in thread
From: Stefan Hajnoczi @ 2019-04-16  9:20 UTC (permalink / raw)
  To: Dongli Zhang; +Cc: Wei Li, Stefan Hajnoczi, Paolo Bonzini, qemu-devel


On Tue, Apr 16, 2019 at 07:23:38AM +0800, Dongli Zhang wrote:
> 
> 
> On 4/16/19 1:34 AM, Wei Li wrote:
> > Hi @Paolo Bonzini & @Stefan Hajnoczi,
> > 
> > Would you please help confirm whether @Paolo Bonzini's multiqueue feature change will benefit virtio-scsi or not? Thanks!
> > 
> > @Stefan Hajnoczi,
> > I also spent some time on exploring the virtio-scsi multi-queue features via num_queues parameter as below, here are what we found:
> > 
> > 1. Increase number of Queues from one to the same number as CPU will get better IOPS increase.
> > 2. Increase number of Queues to the number (e.g. 8) larger than the number of vCPU (e.g. 2) can get even better IOPS increase.
> 
> As mentioned in below link, when the number of hw queues is larger than
> nr_cpu_ids, the blk-mq layer would limit and only use at most nr_cpu_ids queues
> (e.g., /sys/block/sda/mq/).
> 
> That is, when the num_queus=4 while vcpus is 2, there should be only 2 queues
> available /sys/block/sda/mq/
> 
> https://lore.kernel.org/lkml/1553682995-5682-1-git-send-email-dongli.zhang@oracle.com/
> 
> I am just curious how increasing the num_queues from 2 to 4 would double the
> iops, while there are only 2 vcpus available...

I don't know the answer.  It's especially hard to guess without seeing
the benchmark (fio?) configuration and QEMU command-line.

Common things to look at are:

1. Compare "iostat -dx 1" inside the guest and host.  Are the I/O
   patterns comparable?  blktrace(8) can give you even more detail on
   the exact I/O patterns.  If the guest and host have different I/O
   patterns (blocksize, IOPS, queue depth) then request merging or
   I/O scheduler effects could be responsible for the difference.

2. kvm_stat or perf record -a -e kvm:\* counters for vmexits and
   interrupt injections.  If these counters vary greatly between queue
   sizes, then that is usually a clue.  It's possible to get higher
   performance by spending more CPU cycles although your system doesn't
   have many CPUs available, so I'm not sure if this is the case.

3. Power management and polling (kvm.ko halt_poll_ns, tuned profiles,
   and QEMU iothread poll-max-ns).  It's expensive to wake a CPU when it
   goes into a low power mode due to idle.  There are several features
   that can keep the CPU awake or even poll so that request latency is
   reduced.  The reason why the number of queues may matter is that
   kicking multiple queues may keep the CPU awake more than batching
   multiple requests onto a small number of queues.
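
A rough command sketch for the checks above (the 60-second window and device names are only illustrative):

  # 1. I/O pattern comparison, run in both guest and host during the benchmark
  iostat -dx 1
  # 2. vmexit and interrupt-injection counters on the host
  kvm_stat
  perf record -a -e 'kvm:*' -- sleep 60 && perf report
  # 3. current KVM halt-polling setting on the host; QEMU IOThread polling is
  #    controlled by the iothread object's poll-max-ns property (qom-get/qom-set)
  cat /sys/module/kvm/parameters/halt_poll_ns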

Stefan


* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-04-16 14:01           ` Paolo Bonzini
  0 siblings, 0 replies; 46+ messages in thread
From: Paolo Bonzini @ 2019-04-16 14:01 UTC (permalink / raw)
  To: Wei Li, Stefan Hajnoczi; +Cc: Stefan Hajnoczi, qemu-devel

On 05/04/19 23:09, Wei Li wrote:
> Thanks Stefan for your quick response!
> 
> Hi Paolo, Could you please send us a link related to the multiqueue
> feature which you are working on so that we could start getting some
> details about the feature.

I have never gotten to the point of multiqueue; a prerequisite for that
was to make the block layer thread-safe.

The latest state of the work is at github.com/bonzini/qemu, branch
dataplane7.
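
To browse it locally, something along these lines should work (assuming the branch is still published):

  git clone https://github.com/bonzini/qemu.git
  cd qemu
  git checkout dataplane7
  git log --oneline master..dataplane7   # commits on top of master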

Paolo


* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-04-17  1:38             ` Wei Li
  0 siblings, 0 replies; 46+ messages in thread
From: Wei Li @ 2019-04-17  1:38 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Hajnoczi; +Cc: Stefan Hajnoczi, qemu-devel

Thanks Paolo for your response and clarification. 

Btw, is there any rough schedule for when you are planning to start working on the multi-queue feature? Once you start working on it, I would like to hear more details about the design and better understand how this feature will benefit the performance of virtio-scsi.

Thanks again,
Wei

On 4/16/19, 7:01 AM, "Paolo Bonzini" <pbonzini@redhat.com> wrote:

    On 05/04/19 23:09, Wei Li wrote:
    > Thanks Stefan for your quick response!
    > 
    > Hi Paolo, Could you please send us a link related to the multiqueue
    > feature which you are working on so that we could start getting some
    > details about the feature.
    
    I have never gotten to the point of multiqueue, a prerequisite for that
    was to make the block layer thread safe.
    
    The latest state of the work is at github.com/bonzini/qemu, branch
    dataplane7.
    
    Paolo
    


* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-04-17  1:42                 ` Wei Li
  0 siblings, 0 replies; 46+ messages in thread
From: Wei Li @ 2019-04-17  1:42 UTC (permalink / raw)
  To: Stefan Hajnoczi, Dongli Zhang; +Cc: Stefan Hajnoczi, Paolo Bonzini, qemu-devel

Thanks Stefan and Dongli for your feedback and advice!

I will do further investigation per your advice and get back to you later.

Thanks, 
-Wei

On 4/16/19, 2:20 AM, "Stefan Hajnoczi" <stefanha@gmail.com> wrote:

    On Tue, Apr 16, 2019 at 07:23:38AM +0800, Dongli Zhang wrote:
    > 
    > 
    > On 4/16/19 1:34 AM, Wei Li wrote:
    > > Hi @Paolo Bonzini & @Stefan Hajnoczi,
    > > 
    > > Would you please help confirm whether @Paolo Bonzini's multiqueue feature change will benefit virtio-scsi or not? Thanks!
    > > 
    > > @Stefan Hajnoczi,
    > > I also spent some time on exploring the virtio-scsi multi-queue features via num_queues parameter as below, here are what we found:
    > > 
    > > 1. Increase number of Queues from one to the same number as CPU will get better IOPS increase.
    > > 2. Increase number of Queues to the number (e.g. 8) larger than the number of vCPU (e.g. 2) can get even better IOPS increase.
    > 
    > As mentioned in below link, when the number of hw queues is larger than
    > nr_cpu_ids, the blk-mq layer would limit and only use at most nr_cpu_ids queues
    > (e.g., /sys/block/sda/mq/).
    > 
    > That is, when the num_queus=4 while vcpus is 2, there should be only 2 queues
    > available /sys/block/sda/mq/
    > 
    > https://lore.kernel.org/lkml/1553682995-5682-1-git-send-email-dongli.zhang@oracle.com/
    > 
    > I am just curious how increasing the num_queues from 2 to 4 would double the
    > iops, while there are only 2 vcpus available...
    
    I don't know the answer.  It's especially hard to guess without seeing
    the benchmark (fio?) configuration and QEMU command-line.
    
    Common things to look at are:
    
    1. Compare "iostat -dx 1" inside the guest and host.  Are the I/O
       patterns comparable?  blktrace(8) can give you even more detail on
       the exact I/O patterns.  If the guest and host have different I/O
       patterns (blocksize, IOPS, queue depth) then request merging or
       I/O scheduler effects could be responsible for the difference.
    
    2. kvm_stat or perf record -a -e kvm:\* counters for vmexits and
       interrupt injections.  If these counters vary greatly between queue
       sizes, then that is usually a clue.  It's possible to get higher
       performance by spending more CPU cycles although your system doesn't
       have many CPUs available, so I'm not sure if this is the case.
    
    3. Power management and polling (kvm.ko halt_poll_ns, tuned profiles,
       and QEMU iothread poll-max-ns).  It's expensive to wake a CPU when it
       goes into a low power mode due to idle.  There are several features
       that can keep the CPU awake or even poll so that request latency is
       reduced.  The reason why the number of queues may matter is that
       kicking multiple queues may keep the CPU awake more than batching
       multiple requests onto a small number of queues.
    
    Stefan
    


* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-04-17 12:17               ` Paolo Bonzini
  0 siblings, 0 replies; 46+ messages in thread
From: Paolo Bonzini @ 2019-04-17 12:17 UTC (permalink / raw)
  To: Wei Li, Stefan Hajnoczi; +Cc: Stefan Hajnoczi, qemu-devel, Sergio Lopez Pascual

On 17/04/19 03:38, Wei Li wrote:
> Thanks Paolo for your response and clarification.
> 
> Btw, is there any rough schedule about when are you planning to start
> working on the multi queue feature?  Once you start working on the
> feature, I would like to hear more details about the design and
> better understand how this feature will benefit the performance of
> virtio-scsi.

I wish I knew... :)  However, hopefully I will share the details soon
with Sergio and start flushing that queue in 4.1.

Paolo


* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-04-18  3:34                 ` Wei Li
  0 siblings, 0 replies; 46+ messages in thread
From: Wei Li @ 2019-04-18  3:34 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Hajnoczi
  Cc: Stefan Hajnoczi, qemu-devel, Sergio Lopez Pascual

Sounds good, let's keep in touch.

Thanks,
Wei

On 4/17/19, 5:17 AM, "Paolo Bonzini" <pbonzini@redhat.com> wrote:

    On 17/04/19 03:38, Wei Li wrote:
    > Thanks Paolo for your response and clarification.
    > 
    > Btw, is there any rough schedule about when are you planning to start
    > working on the multi queue feature?  Once you start working on the
    > feature, I would like to hear more details about the design and
    > better understand how this feature will benefit the performance of
    > virtio-scsi.
    
    I wish I knew... :)  However, hopefully I will share the details soon
    with Sergio and start flushing that queue in 4.1.
    
    Paolo
    


* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-04-23  4:21                   ` Wei Li
  0 siblings, 0 replies; 46+ messages in thread
From: Wei Li @ 2019-04-23  4:21 UTC (permalink / raw)
  To: Stefan Hajnoczi, Dongli Zhang; +Cc: Stefan Hajnoczi, Paolo Bonzini, qemu-devel

Hi Stefan,

I did the investigation per your advice; please see inline for the details and questions.
 
       1. Compare "iostat -dx 1" inside the guest and host.  Are the I/O
           patterns comparable?  blktrace(8) can give you even more detail on
           the exact I/O patterns.  If the guest and host have different I/O
           patterns (blocksize, IOPS, queue depth) then request merging or
           I/O scheduler effects could be responsible for the difference.

[wei]: That's a good point. I compared "iostat -dx 1" between the guest and the host, but I have not found an obvious difference between them that could be responsible for the IOPS difference.
        
        2. kvm_stat or perf record -a -e kvm:\* counters for vmexits and
           interrupt injections.  If these counters vary greatly between queue
           sizes, then that is usually a clue.  It's possible to get higher
           performance by spending more CPU cycles although your system doesn't
           have many CPUs available, so I'm not sure if this is the case.

[wei]: vmexits look like one reason. I am using the fio tool to read/write the block storage via the following sample command. Interestingly, the kvm:kvm_exit count decreased from 846K to 395K after I increased num_queues from 2 to 4 while the vCPU count is 2.
           1). Does this mean using more queues than the vCPU count may increase IOPS by spending more CPU cycles?
           2). Could you please help me better understand how more queues are able to spend more CPU cycles? Thanks!
           FIO command: fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=64 --numjobs=4 --time_based --group_reporting --name=iops --runtime=60 --eta-newline=1

        3. Power management and polling (kvm.ko halt_poll_ns, tuned profiles,
           and QEMU iothread poll-max-ns).  It's expensive to wake a CPU when it
           goes into a low power mode due to idle.  There are several features
           that can keep the CPU awake or even poll so that request latency is
           reduced.  The reason why the number of queues may matter is that
           kicking multiple queues may keep the CPU awake more than batching
           multiple requests onto a small number of queues.
[wei]: CPU wakeups could be another reason; I noticed that the kvm:kvm_vcpu_wakeup count decreased from 151K to 47K after I increased num_queues from 2 to 4 while the vCPU count is 2.
           1). Does this mean more queues may keep the CPU busier and awake, which reduces the vCPU wakeups?
           2). If using more queues than the vCPU count gives higher IOPS in this case, is it safe to use 4 queues with only 2 vCPUs, or is there any concern or impact of using more queues than vCPUs that I should keep in mind?
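
(For reference, a sketch of how such counters can be collected on the host while the benchmark runs; event names may vary by kernel version:)

           sudo perf stat -a -e 'kvm:kvm_exit,kvm:kvm_vcpu_wakeup' -- sleep 60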

In addition, does virtio-scsi support a batch I/O submission feature which might be able to increase IOPS by reducing the number of system calls?

Thanks,
Wei

On 4/16/19, 6:42 PM, "Wei Li" <wei.d.li@oracle.com> wrote:

    Thanks Stefan and Dongli for your feedback and advices!
    
    I will do the further investigation per your advices and get back to you later on.
    
    Thanks, 
    -Wei
    
    On 4/16/19, 2:20 AM, "Stefan Hajnoczi" <stefanha@gmail.com> wrote:
    
        On Tue, Apr 16, 2019 at 07:23:38AM +0800, Dongli Zhang wrote:
        > 
        > 
        > On 4/16/19 1:34 AM, Wei Li wrote:
        > > Hi @Paolo Bonzini & @Stefan Hajnoczi,
        > > 
        > > Would you please help confirm whether @Paolo Bonzini's multiqueue feature change will benefit virtio-scsi or not? Thanks!
        > > 
        > > @Stefan Hajnoczi,
        > > I also spent some time on exploring the virtio-scsi multi-queue features via num_queues parameter as below, here are what we found:
        > > 
        > > 1. Increase number of Queues from one to the same number as CPU will get better IOPS increase.
        > > 2. Increase number of Queues to the number (e.g. 8) larger than the number of vCPU (e.g. 2) can get even better IOPS increase.
        > 
        > As mentioned in below link, when the number of hw queues is larger than
        > nr_cpu_ids, the blk-mq layer would limit and only use at most nr_cpu_ids queues
        > (e.g., /sys/block/sda/mq/).
        > 
        > That is, when the num_queus=4 while vcpus is 2, there should be only 2 queues
        > available /sys/block/sda/mq/
        > 
        > https://lore.kernel.org/lkml/1553682995-5682-1-git-send-email-dongli.zhang@oracle.com/
        > 
        > I am just curious how increasing the num_queues from 2 to 4 would double the
        > iops, while there are only 2 vcpus available...
        
        I don't know the answer.  It's especially hard to guess without seeing
        the benchmark (fio?) configuration and QEMU command-line.
        
        Common things to look at are:
        
        1. Compare "iostat -dx 1" inside the guest and host.  Are the I/O
           patterns comparable?  blktrace(8) can give you even more detail on
           the exact I/O patterns.  If the guest and host have different I/O
           patterns (blocksize, IOPS, queue depth) then request merging or
           I/O scheduler effects could be responsible for the difference.
        
        2. kvm_stat or perf record -a -e kvm:\* counters for vmexits and
           interrupt injections.  If these counters vary greatly between queue
           sizes, then that is usually a clue.  It's possible to get higher
           performance by spending more CPU cycles although your system doesn't
           have many CPUs available, so I'm not sure if this is the case.
        
        3. Power management and polling (kvm.ko halt_poll_ns, tuned profiles,
           and QEMU iothread poll-max-ns).  It's expensive to wake a CPU when it
           goes into a low power mode due to idle.  There are several features
           that can keep the CPU awake or even poll so that request latency is
           reduced.  The reason why the number of queues may matter is that
           kicking multiple queues may keep the CPU awake more than batching
           multiple requests onto a small number of queues.
        
        Stefan
        
    


* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-04-23 12:04                     ` Stefan Hajnoczi
  0 siblings, 0 replies; 46+ messages in thread
From: Stefan Hajnoczi @ 2019-04-23 12:04 UTC (permalink / raw)
  To: Wei Li; +Cc: Stefan Hajnoczi, Dongli Zhang, Paolo Bonzini, qemu-devel


On Mon, Apr 22, 2019 at 09:21:53PM -0700, Wei Li wrote:
>         2. kvm_stat or perf record -a -e kvm:\* counters for vmexits and
>            interrupt injections.  If these counters vary greatly between queue
>            sizes, then that is usually a clue.  It's possible to get higher
>            performance by spending more CPU cycles although your system doesn't
>            have many CPUs available, so I'm not sure if this is the case.
> 
> [wei]: vmexits looks like a reason. I am using FIO tool to read/write block storage via following sample command, interesting thing is that kvm:kvm_exit count decreased from 846K to 395K after I increased num_queues from 2 to 4 while the vCPU count is 2.
>            1). Does this mean using more queues than vCPU count may increase IOPS via spending more CPU cycle? 
>            2). Could you please help me better understand how more queues is able to spend more CPU cycle? Thanks!
>            FIO command: fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=64 --numjobs=4 --time_based --group_reporting --name=iops --runtime=60 --eta-newline=1
> 
>         3. Power management and polling (kvm.ko halt_poll_ns, tuned profiles,
>            and QEMU iothread poll-max-ns).  It's expensive to wake a CPU when it
>            goes into a low power mode due to idle.  There are several features
>            that can keep the CPU awake or even poll so that request latency is
>            reduced.  The reason why the number of queues may matter is that
>            kicking multiple queues may keep the CPU awake more than batching
>            multiple requests onto a small number of queues.
> [wei]: CPU awake could be another reason, I noticed that kvm:kvm_vcpu_wakeup count decreased from 151K to 47K after I increased num_queues from 2 to 4 while the vCPU count is 2.

This suggests that wakeups are involved in the performance difference.

>            1). Does this mean more queues may keep CPU more busy and awake which reduced the vcpu wakeup time?

Yes, although it depends on how I/O requests are distributed across the
queues.  You can check /proc/interrupts inside the guest to see
interrupt counts for the virtqueues.
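
For example (names will differ on your guest; "virtio" below is just a
rough filter):

  # inside the guest
  grep virtio /proc/interrupts

With MSI-X each request virtqueue typically gets its own vector, so the
per-queue interrupt counts give an idea of how requests are spread.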

>            2). If using more queues than vCPUs gives higher IOPS in this case, is it safe to use 4 queues with only 2 vCPUs, or is there any concern or impact of using more queues than vCPUs that I should keep in mind?

2 vs 4 queues should be functionally identical.  The only difference is
performance.
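
In case a concrete example helps, the queue count is a device property.
A rough command-line sketch (ids are made up, /dev/sdb is your test
disk):

  -object iothread,id=iothread0 \
  -drive file=/dev/sdb,if=none,id=drive0,format=raw \
  -device virtio-scsi-pci,id=scsi0,iothread=iothread0,num_queues=4 \
  -device scsi-block,bus=scsi0.0,drive=drive0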

> In addition, does Virtio-scsi support Batch I/O Submission feature which may be able to increase the IOPS via reducing the number of system calls?

I don't see obvious batching support in drivers/scsi/virtio_scsi.c.  The
Linux block layer supports batching but I'm not sure if the SCSI layer
does.

Stefan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-04-26  8:14                       ` Paolo Bonzini
  0 siblings, 0 replies; 46+ messages in thread
From: Paolo Bonzini @ 2019-04-26  8:14 UTC (permalink / raw)
  To: Stefan Hajnoczi, Wei Li; +Cc: Stefan Hajnoczi, Dongli Zhang, qemu-devel

On 23/04/19 14:04, Stefan Hajnoczi wrote:
>> In addition, does Virtio-scsi support Batch I/O Submission feature
>> which may be able to increase the IOPS via reducing the number of
>> system calls?
>
> I don't see obvious batching support in drivers/scsi/virtio_scsi.c.
> The Linux block layer supports batching but I'm not sure if the SCSI
> layer does.

I think he's referring to QEMU, in which case yes, virtio-scsi does
batch I/O submission.  See virtio_scsi_handle_cmd_req_prepare and
virtio_scsi_handle_cmd_req_submit in hw/scsi/virtio-scsi.c, they do
blk_io_plug and blk_io_unplug in order to batch I/O requests from QEMU
to the host kernel.

Paolo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-04-26 23:02                         ` Wei Li
  0 siblings, 0 replies; 46+ messages in thread
From: Wei Li @ 2019-04-26 23:02 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Hajnoczi; +Cc: Stefan Hajnoczi, Dongli Zhang, qemu-devel

Thanks Stefan and Paolo for your response and advice!

Hi Paolo,

As to the virtio-scsi batch I/O submission feature in QEMU which you mentioned, is this feature turned on by default in QEMU 2.9, or is there a tunable parameter to turn it on/off?

Thanks,
Wei

On 4/26/19, 1:14 AM, "Paolo Bonzini" <pbonzini@redhat.com> wrote:

    On 23/04/19 14:04, Stefan Hajnoczi wrote:
    >>In addition, does Virtio-scsi support Batch I/O Submission feature
    >>which may be able to increase the IOPS via reducing the number of
    >>system calls?
    >
    >I don't see obvious batching support in drivers/scsi/virtio_scsi.c.
    >The Linux block layer supports batching but I'm not sure if the SCSI
    >layer does.
    
    I think he's referring to QEMU, in which case yes, virtio-scsi does
    batch I/O submission.  See virtio_scsi_handle_cmd_req_prepare and
    virtio_scsi_handle_cmd_req_submit in hw/scsi/virtio-scsi.c, they do
    blk_io_plug and blk_io_unplug in order to batch I/O requests from QEMU
    to the host kernel.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-04-27  4:24                           ` Paolo Bonzini
  0 siblings, 0 replies; 46+ messages in thread
From: Paolo Bonzini @ 2019-04-27  4:24 UTC (permalink / raw)
  To: Wei Li; +Cc: Stefan Hajnoczi, Stefan Hajnoczi, Dongli Zhang, qemu-devel


> Thanks Stefan and Paolo for your response and advice!
> 
> Hi Paolo,
> 
> As to the virtio-scsi batch I/O submission feature in QEMU which you
> mentioned, is this feature turned on by default in QEMU 2.9, or is there a
> tunable parameter to turn it on/off?

Yes, it is available by default since 2.2.0.  It cannot be turned off, however
it is only possible to batch I/O with aio=native (and, since 2.12.0, with the NVMe
backend).
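
As a rough illustration (ids made up), batching takes effect with
something along these lines, since aio=native needs O_DIRECT
(cache=none):

  -drive file=/dev/sdb,if=none,id=drive0,format=raw,cache=none,aio=native \
  -device virtio-scsi-pci,id=scsi0 \
  -device scsi-block,bus=scsi0.0,drive=drive0

With aio=threads each request is instead submitted by a worker thread
with its own preadv/pwritev call, so there is no batched io_submit.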

Paolo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-04-29 13:40                         ` Stefan Hajnoczi
  0 siblings, 0 replies; 46+ messages in thread
From: Stefan Hajnoczi @ 2019-04-29 13:40 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Wei Li, Stefan Hajnoczi, Dongli Zhang, qemu-devel

On Fri, Apr 26, 2019 at 10:14:16AM +0200, Paolo Bonzini wrote:
> On 23/04/19 14:04, Stefan Hajnoczi wrote:
> >> In addition, does Virtio-scsi support Batch I/O Submission feature
> >> which may be able to increase the IOPS via reducing the number of
> >> system calls?
> >
> > I don't see obvious batching support in drivers/scsi/virtio_scsi.c.
> > The Linux block layer supports batching but I'm not sure if the SCSI
> > layer does.
> 
> I think he's referring to QEMU, in which case yes, virtio-scsi does
> batch I/O submission.  See virtio_scsi_handle_cmd_req_prepare and
> virtio_scsi_handle_cmd_req_submit in hw/scsi/virtio-scsi.c, they do
> blk_io_plug and blk_io_unplug in order to batch I/O requests from QEMU
> to the host kernel.

This isn't fully effective since the guest driver kicks once per
request.  Therefore QEMU-level batching you mentioned only works if QEMU
is slower at handling virtqueue kicks than the guest is at submitting
requests.

I wonder if this is something that can be improved.

Stefan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-04-29 17:49                             ` Wei Li
  0 siblings, 0 replies; 46+ messages in thread
From: Wei Li @ 2019-04-29 17:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Stefan Hajnoczi, Stefan Hajnoczi, Dongli Zhang, qemu-devel

Thanks Paolo for your clarification!

Just wanted to confirm: does this mean batch I/O submission won't apply to aio=threads (which is the default mode)?

Thanks,
Wei


On 4/26/19, 9:25 PM, "Paolo Bonzini" <pbonzini@redhat.com> wrote:

    
    > Thanks Stefan and Paolo for your response and advice!
    > 
    > Hi Paolo,
    > 
    > As to the virtio-scsi batch I/O submission feature in QEMU which you
    > mentioned, is this feature turned on by default in QEMU 2.9, or is there a
    > tunable parameter to turn it on/off?
    
    Yes, it is available by default since 2.2.0.  It cannot be turned off, however
    it is only possible to batch I/O with aio=native (and, since 2.12.0, with the NVMe
    backend).
    
    Paolo
    

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-04-29 17:56                           ` Wei Li
  0 siblings, 0 replies; 46+ messages in thread
From: Wei Li @ 2019-04-29 17:56 UTC (permalink / raw)
  To: Stefan Hajnoczi, Paolo Bonzini; +Cc: Stefan Hajnoczi, Dongli Zhang, qemu-devel

Thanks Stefan!

Does this mean the performance could be improved via adding Batch I/O submission support in Guest driver side which will be able to reduce the number of virtqueue kicks?

Thanks,
Wei

On 4/29/19, 6:40 AM, "Stefan Hajnoczi" <stefanha@redhat.com> wrote:

    On Fri, Apr 26, 2019 at 10:14:16AM +0200, Paolo Bonzini wrote:
    > On 23/04/19 14:04, Stefan Hajnoczi wrote:
    > >> In addition, does Virtio-scsi support Batch I/O Submission feature
    > >> which may be able to increase the IOPS via reducing the number of
    > >> system calls?
    > >
    > > I don't see obvious batching support in drivers/scsi/virtio_scsi.c.
    > > The Linux block layer supports batching but I'm not sure if the SCSI
    > > layer does.
    > 
    > I think he's referring to QEMU, in which case yes, virtio-scsi does
    > batch I/O submission.  See virtio_scsi_handle_cmd_req_prepare and
    > virtio_scsi_handle_cmd_req_submit in hw/scsi/virtio-scsi.c, they do
    > blk_io_plug and blk_io_unplug in order to batch I/O requests from QEMU
    > to the host kernel.
    
    This isn't fully effective since the guest driver kicks once per
    request.  Therefore QEMU-level batching you mentioned only works if QEMU
    is slower at handling virtqueue kicks than the guest is at submitting
    requests.
    
    I wonder if this is something that can be improved.
    
    Stefan
    

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-04-30 11:21                           ` Paolo Bonzini
  0 siblings, 0 replies; 46+ messages in thread
From: Paolo Bonzini @ 2019-04-30 11:21 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Wei Li, Stefan Hajnoczi, Dongli Zhang, qemu-devel

On 29/04/19 15:40, Stefan Hajnoczi wrote:
> This isn't fully effective since the guest driver kicks once per
> request.  Therefore QEMU-level batching you mentioned only works if QEMU
> is slower at handling virtqueue kicks than the guest is at submitting
> requests.
> 
> I wonder if this is something that can be improved.

Right, virtscsi_kick_cmd does limit notifications but not submissions.
The SCSI layer does not have separate queue_rq and commit_rqs callbacks.
 That should not be too hard to fix though.

Paolo


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-05-01 16:36                             ` Stefan Hajnoczi
  0 siblings, 0 replies; 46+ messages in thread
From: Stefan Hajnoczi @ 2019-05-01 16:36 UTC (permalink / raw)
  To: Wei Li; +Cc: Stefan Hajnoczi, Paolo Bonzini, Dongli Zhang, qemu-devel

On Mon, Apr 29, 2019 at 10:56:31AM -0700, Wei Li wrote:
> Does this mean the performance could be improved via adding Batch I/O submission support in Guest driver side which will be able to reduce the number of virtqueue kicks?

Yes, I think so.  It's not obvious to me how a Linux SCSI driver is
supposed to implement batching though.  The .queuecommand API doesn't
seem to include information relevant to batching.

Stefan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-05-03 16:21                               ` Wei Li
  0 siblings, 0 replies; 46+ messages in thread
From: Wei Li @ 2019-05-03 16:21 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Stefan Hajnoczi, Paolo Bonzini, Dongli Zhang, qemu-devel

Got it, thanks Stefan for your clarification!

Wei

On 5/1/19, 9:36 AM, "Stefan Hajnoczi" <stefanha@gmail.com> wrote:

    On Mon, Apr 29, 2019 at 10:56:31AM -0700, Wei Li wrote:
    >Does this mean the performance could be improved via adding Batch I/O submission support in Guest driver side which will be able to reduce the number of virtqueue kicks?
    
    Yes, I think so.  It's not obvious to me how a Linux SCSI driver is
    supposed to implement batching though.  The .queuecommand API doesn't
    seem to include information relevant to batching.
    
    Stefan
    
    

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-05-03 18:05                                 ` Paolo Bonzini
  0 siblings, 0 replies; 46+ messages in thread
From: Paolo Bonzini @ 2019-05-03 18:05 UTC (permalink / raw)
  To: Wei Li, Stefan Hajnoczi; +Cc: Stefan Hajnoczi, Dongli Zhang, qemu-devel

On 03/05/19 10:21, Wei Li wrote:
> Got it, thanks Stefan for your clarification!

Hi Wei,

Stefan and I should be posting a patch to add Linux SCSI driver
batching, and an implementation for virtio-scsi.

Paolo

> Wei
> 
> On 5/1/19, 9:36 AM, "Stefan Hajnoczi" <stefanha@gmail.com> wrote:
> 
>     On Mon, Apr 29, 2019 at 10:56:31AM -0700, Wei Li wrote:
>     >Does this mean the performance could be improved via adding Batch I/O submission support in Guest driver side which will be able to reduce the number of virtqueue kicks?
>     
>     Yes, I think so.  It's not obvious to me how a Linux SCSI driver is
>     supposed to implement batching though.  The .queuecommand API doesn't
>     seem to include information relevant to batching.
>     
>     Stefan
>     
>     
> 
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
@ 2019-05-03 18:11                                   ` Wei Li
  0 siblings, 0 replies; 46+ messages in thread
From: Wei Li @ 2019-05-03 18:11 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Hajnoczi; +Cc: Stefan Hajnoczi, Dongli Zhang, qemu-devel

Hi Paolo,

That would be great; I would like to hear more details about the design and implementation once you have those ready.

Thanks a lot,
Wei

On 5/3/19, 11:05 AM, "Paolo Bonzini" <pbonzini@redhat.com> wrote:

    On 03/05/19 10:21, Wei Li wrote:
    > Got it, thanks Stefan for your clarification!
    
    Hi Wei,
    
    Stefan and I should be posting a patch to add Linux SCSI driver
    batching, and an implementation for virtio-scsi.
    
    Paolo
    
    > Wei
    > 
    > On 5/1/19, 9:36 AM, "Stefan Hajnoczi" <stefanha@gmail.com> wrote:
    > 
    >     On Mon, Apr 29, 2019 at 10:56:31AM -0700, Wei Li wrote:
    >     >Does this mean the performance could be improved via adding Batch I/O submission support in Guest driver side which will be able to reduce the number of virtqueue kicks?
    >     
    >     Yes, I think so.  It's not obvious to me how a Linux SCSI driver is
    >     supposed to implement batching though.  The .queuecommand API doesn't
    >     seem to include information relevant to batching.
    >     
    >     Stefan
    >     
    >     
    > 
    > 
    
    

^ permalink raw reply	[flat|nested] 46+ messages in thread
