* QEMU 5.0 virtio-blk performance regression with high queue depths
@ 2020-08-24 13:44 Stefan Hajnoczi
  2020-09-16 13:32 ` Stefan Hajnoczi
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Hajnoczi @ 2020-08-24 13:44 UTC (permalink / raw)
  To: Denis Plotnikov; +Cc: qemu-devel

Hi Denis,
A performance regression was found after the virtio-blk queue-size
property was increased from 128 to 256 in QEMU 5.0 in commit
c9b7d9ec21dfca716f0bb3b68dee75660d86629c ("virtio: increase virtqueue
size for virtio-scsi and virtio-blk"). I wanted to let you know in case
you have ideas or see something similar.
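
If it helps with A/B testing, queue-size can also be overridden per
device on the QEMU command line. A minimal sketch, reusing the
disk-under-test device from the command line below with only
queue-size added:

    -device virtio-blk-pci,id=disk1,drive=drive_disk1,bootindex=1,write-cache=on,bus=pcie-root-port-3,addr=0x0,queue-size=128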

Throughput and IOPS of the following fio benchmarks dropped by 30-40%:

  # mkfs.xfs /dev/vdb
  # mount /dev/vdb /mnt
  # fio --rw=%s --bs=%s --iodepth=64 --runtime=1m --direct=1 --filename=/mnt/%s --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --output=/tmp/fio_result &> /dev/null
    - rw: read write
    - bs: 4k 64k
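
For example, the rw=read, bs=4k combination expands to (the test file
name under /mnt is a placeholder):

  # fio --rw=read --bs=4k --iodepth=64 --runtime=1m --direct=1 --filename=/mnt/testfile --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --output=/tmp/fio_result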

Note that there are 16 threads submitting 64 requests each! The guest
block device queue depth will be maxed out. The virtqueue should be full
most of the time.
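
In other words, 16 jobs x 64 iodepth = 1024 requests potentially in
flight, roughly 8x the old virtqueue size (128) and 4x the new one
(256), so the queue should stay saturated in both configurations.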

Have you seen regressions after virtio-blk queue-size was increased in
QEMU 5.0?

Here are the details of the host storage:

  # mkfs.xfs /dev/sdb # 60GB SSD drive
  # mount /dev/sdb /mnt/test
  # qemu-img create -f qcow2 /mnt/test/storage2.qcow2 40G
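
No options were passed to qemu-img create, so the image should use the
qcow2 defaults (64k clusters); this can be double-checked with:

  # qemu-img info /mnt/test/storage2.qcow2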

The guest command-line is:

  # MALLOC_PERTURB_=1 numactl \
    -m 1  /usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35 \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 4096  \
    -smp 2,maxcpus=2,cores=1,threads=1,dies=1,sockets=2  \
    -cpu 'IvyBridge',+kvm_pv_unhalt \
    -chardev socket,server,id=qmp_id_qmpmonitor1,nowait,path=/var/tmp/avocado_bapfdqao/monitor-qmpmonitor1-20200721-014154-5HJGMjxW  \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,server,id=qmp_id_catch_monitor,nowait,path=/var/tmp/avocado_bapfdqao/monitor-catch_monitor-20200721-014154-5HJGMjxW  \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id31BN83 \
    -chardev socket,server,id=chardev_serial0,nowait,path=/var/tmp/avocado_bapfdqao/serial-serial0-20200721-014154-5HJGMjxW \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20200721-014154-5HJGMjxW,path=/var/tmp/avocado_bapfdqao/seabios-20200721-014154-5HJGMjxW,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20200721-014154-5HJGMjxW,iobase=0x402 \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=rootfs.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,bus=pcie-root-port-2,addr=0x0 \
    -blockdev node-name=file_disk1,driver=file,aio=threads,filename=/mnt/test/storage2.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_disk1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_disk1 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-blk-pci,id=disk1,drive=drive_disk1,bootindex=1,write-cache=on,bus=pcie-root-port-3,addr=0x0 \
    -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
    -device virtio-net-pci,mac=9a:37:37:37:37:4e,id=idBMd7vy,netdev=idLb51aS,bus=pcie-root-port-4,addr=0x0  \
    -netdev tap,id=idLb51aS,fd=14  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=6

Stefan



* Re: QEMU 5.0 virtio-blk performance regression with high queue depths
  2020-08-24 13:44 QEMU 5.0 virtio-blk performance regression with high queue depths Stefan Hajnoczi
@ 2020-09-16 13:32 ` Stefan Hajnoczi
  2020-09-16 14:07   ` Denis V. Lunev
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Hajnoczi @ 2020-09-16 13:32 UTC (permalink / raw)
  To: Denis Plotnikov; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Denis V. Lunev

On Thu, Aug 27, 2020 at 3:24 PM Stefan Hajnoczi <stefanha@gmail.com> wrote:
> Hi Denis,
> A performance regression was found after the virtio-blk queue-size
> property was increased from 128 to 256 in QEMU 5.0 in commit
> c9b7d9ec21dfca716f0bb3b68dee75660d86629c ("virtio: increase virtqueue
> size for virtio-scsi and virtio-blk"). I wanted to let you know in case
> you have ideas or see something similar.

Ping, have you noticed performance regressions after switching to
virtio-blk queue-size 256?

>
> Throughput and IOPS of the following fio benchmarks dropped by 30-40%:
>
>   # mkfs.xfs /dev/vdb
>   # mount /dev/vdb /mnt
>   # fio --rw=%s --bs=%s --iodepth=64 --runtime=1m --direct=1 --filename=/mnt/%s --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --output=/tmp/fio_result &> /dev/null
>     - rw: read write
>     - bs: 4k 64k
>
> Note that there are 16 threads submitting 64 requests each! The guest
> block device queue depth will be maxed out. The virtqueue should be full
> most of the time.
>
> Have you seen regressions after virtio-blk queue-size was increased in
> QEMU 5.0?
>
> Here are the details of the host storage:
>
>   # mkfs.xfs /dev/sdb # 60GB SSD drive
>   # mount /dev/sdb /mnt/test
>   # qemu-img create -f qcow2 /mnt/test/storage2.qcow2 40G
>
> The guest command-line is:
>
>   # MALLOC_PERTURB_=1 numactl \
>     -m 1  /usr/libexec/qemu-kvm \
>     -S  \
>     -name 'avocado-vt-vm1'  \
>     -sandbox on  \
>     -machine q35 \
>     -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
>     -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
>     -nodefaults \
>     -device VGA,bus=pcie.0,addr=0x2 \
>     -m 4096  \
>     -smp 2,maxcpus=2,cores=1,threads=1,dies=1,sockets=2  \
>     -cpu 'IvyBridge',+kvm_pv_unhalt \
>     -chardev socket,server,id=qmp_id_qmpmonitor1,nowait,path=/var/tmp/avocado_bapfdqao/monitor-qmpmonitor1-20200721-014154-5HJGMjxW  \
>     -mon chardev=qmp_id_qmpmonitor1,mode=control \
>     -chardev socket,server,id=qmp_id_catch_monitor,nowait,path=/var/tmp/avocado_bapfdqao/monitor-catch_monitor-20200721-014154-5HJGMjxW  \
>     -mon chardev=qmp_id_catch_monitor,mode=control \
>     -device pvpanic,ioport=0x505,id=id31BN83 \
>     -chardev socket,server,id=chardev_serial0,nowait,path=/var/tmp/avocado_bapfdqao/serial-serial0-20200721-014154-5HJGMjxW \
>     -device isa-serial,id=serial0,chardev=chardev_serial0  \
>     -chardev socket,id=seabioslog_id_20200721-014154-5HJGMjxW,path=/var/tmp/avocado_bapfdqao/seabios-20200721-014154-5HJGMjxW,server,nowait \
>     -device isa-debugcon,chardev=seabioslog_id_20200721-014154-5HJGMjxW,iobase=0x402 \
>     -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
>     -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
>     -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
>     -blockdev node-name=file_image1,driver=file,aio=threads,filename=rootfs.qcow2,cache.direct=on,cache.no-flush=off \
>     -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
>     -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
>     -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,bus=pcie-root-port-2,addr=0x0 \
>     -blockdev node-name=file_disk1,driver=file,aio=threads,filename=/mnt/test/storage2.qcow2,cache.direct=on,cache.no-flush=off \
>     -blockdev node-name=drive_disk1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_disk1 \
>     -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
>     -device virtio-blk-pci,id=disk1,drive=drive_disk1,bootindex=1,write-cache=on,bus=pcie-root-port-3,addr=0x0 \
>     -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
>     -device virtio-net-pci,mac=9a:37:37:37:37:4e,id=idBMd7vy,netdev=idLb51aS,bus=pcie-root-port-4,addr=0x0  \
>     -netdev tap,id=idLb51aS,fd=14  \
>     -vnc :0  \
>     -rtc base=utc,clock=host,driftfix=slew  \
>     -boot menu=off,order=cdn,once=c,strict=off \
>     -enable-kvm \
>     -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=6
>
> Stefan



* Re: QEMU 5.0 virtio-blk performance regression with high queue depths
  2020-09-16 13:32 ` Stefan Hajnoczi
@ 2020-09-16 14:07   ` Denis V. Lunev
  2020-09-16 16:43     ` Denis V. Lunev
  0 siblings, 1 reply; 7+ messages in thread
From: Denis V. Lunev @ 2020-09-16 14:07 UTC (permalink / raw)
  To: Stefan Hajnoczi, Denis Plotnikov; +Cc: qemu-devel, Vladimir Sementsov-Ogievskiy

On 9/16/20 4:32 PM, Stefan Hajnoczi wrote:
> On Thu, Aug 27, 2020 at 3:24 PM Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> Hi Denis,
>> A performance regression was found after the virtio-blk queue-size
>> property was increased from 128 to 256 in QEMU 5.0 in commit
>> c9b7d9ec21dfca716f0bb3b68dee75660d86629c ("virtio: increase virtqueue
>> size for virtio-scsi and virtio-blk"). I wanted to let you know in case
>> you have ideas or see something similar.
> Ping, have you noticed performance regressions after switching to
> virtio-blk queue-size 256?
Oops, I missed the original letter.

Denis Plotnikov has left the team.


>> Throughput and IOPS of the following fio benchmarks dropped by 30-40%:
>>
>>   # mkfs.xfs /dev/vdb
>>   # mount /dev/vdb /mnt
>>   # fio --rw=%s --bs=%s --iodepth=64 --runtime=1m --direct=1 --filename=/mnt/%s --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --output=/tmp/fio_result &> /dev/null
>>     - rw: read write
>>     - bs: 4k 64k
>>
>> Note that there are 16 threads submitting 64 requests each! The guest
>> block device queue depth will be maxed out. The virtqueue should be full
>> most of the time.
>>
>> Have you seen regressions after virtio-blk queue-size was increased in
>> QEMU 5.0?
>>
>> Here are the details of the host storage:
>>
>>   # mkfs.xfs /dev/sdb # 60GB SSD drive
>>   # mount /dev/sdb /mnt/test
>>   # qemu-img create -f qcow2 /mnt/test/storage2.qcow2 40G
>>
>> The guest command-line is:
>>
>>   # MALLOC_PERTURB_=1 numactl \
>>     -m 1  /usr/libexec/qemu-kvm \
>>     -S  \
>>     -name 'avocado-vt-vm1'  \
>>     -sandbox on  \
>>     -machine q35 \
>>     -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
>>     -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
>>     -nodefaults \
>>     -device VGA,bus=pcie.0,addr=0x2 \
>>     -m 4096  \
>>     -smp 2,maxcpus=2,cores=1,threads=1,dies=1,sockets=2  \
>>     -cpu 'IvyBridge',+kvm_pv_unhalt \
>>     -chardev socket,server,id=qmp_id_qmpmonitor1,nowait,path=/var/tmp/avocado_bapfdqao/monitor-qmpmonitor1-20200721-014154-5HJGMjxW  \
>>     -mon chardev=qmp_id_qmpmonitor1,mode=control \
>>     -chardev socket,server,id=qmp_id_catch_monitor,nowait,path=/var/tmp/avocado_bapfdqao/monitor-catch_monitor-20200721-014154-5HJGMjxW  \
>>     -mon chardev=qmp_id_catch_monitor,mode=control \
>>     -device pvpanic,ioport=0x505,id=id31BN83 \
>>     -chardev socket,server,id=chardev_serial0,nowait,path=/var/tmp/avocado_bapfdqao/serial-serial0-20200721-014154-5HJGMjxW \
>>     -device isa-serial,id=serial0,chardev=chardev_serial0  \
>>     -chardev socket,id=seabioslog_id_20200721-014154-5HJGMjxW,path=/var/tmp/avocado_bapfdqao/seabios-20200721-014154-5HJGMjxW,server,nowait \
>>     -device isa-debugcon,chardev=seabioslog_id_20200721-014154-5HJGMjxW,iobase=0x402 \
>>     -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
>>     -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
>>     -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
>>     -blockdev node-name=file_image1,driver=file,aio=threads,filename=rootfs.qcow2,cache.direct=on,cache.no-flush=off \
>>     -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
>>     -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
>>     -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,bus=pcie-root-port-2,addr=0x0 \
>>     -blockdev node-name=file_disk1,driver=file,aio=threads,filename=/mnt/test/storage2.qcow2,cache.direct=on,cache.no-flush=off \
>>     -blockdev node-name=drive_disk1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_disk1 \
>>     -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
>>     -device virtio-blk-pci,id=disk1,drive=drive_disk1,bootindex=1,write-cache=on,bus=pcie-root-port-3,addr=0x0 \
>>     -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
>>     -device virtio-net-pci,mac=9a:37:37:37:37:4e,id=idBMd7vy,netdev=idLb51aS,bus=pcie-root-port-4,addr=0x0  \
>>     -netdev tap,id=idLb51aS,fd=14  \
>>     -vnc :0  \
>>     -rtc base=utc,clock=host,driftfix=slew  \
>>     -boot menu=off,order=cdn,once=c,strict=off \
>>     -enable-kvm \
>>     -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=6
I will run a check today.

Talking about our own performance measurements, we have not
seen ANY performance degradation, let alone 30-40%.
This looks quite strange to me.

Though there is one quite important difference: we are always
using O_DIRECT and the 'native' AIO engine.
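
In terms of the command line above, that difference amounts to
switching aio= on the file blockdev of the disk under test
(cache.direct=on is already set there, which aio=native requires):

    -blockdev node-name=file_disk1,driver=file,aio=native,filename=/mnt/test/storage2.qcow2,cache.direct=on,cache.no-flush=off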

Den



* Re: QEMU 5.0 virtio-blk performance regression with high queue depths
  2020-09-16 14:07   ` Denis V. Lunev
@ 2020-09-16 16:43     ` Denis V. Lunev
  2020-09-17 12:41       ` Stefan Hajnoczi
  0 siblings, 1 reply; 7+ messages in thread
From: Denis V. Lunev @ 2020-09-16 16:43 UTC (permalink / raw)
  To: Stefan Hajnoczi, Denis Plotnikov; +Cc: qemu-devel, Vladimir Sementsov-Ogievskiy

On 9/16/20 5:07 PM, Denis V. Lunev wrote:
> On 9/16/20 4:32 PM, Stefan Hajnoczi wrote:
>> On Thu, Aug 27, 2020 at 3:24 PM Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>> Hi Denis,
>>> A performance regression was found after the virtio-blk queue-size
>>> property was increased from 128 to 256 in QEMU 5.0 in commit
>>> c9b7d9ec21dfca716f0bb3b68dee75660d86629c ("virtio: increase virtqueue
>>> size for virtio-scsi and virtio-blk"). I wanted to let you know in case
>>> you have ideas or see something similar.
>> Ping, have you noticed performance regressions after switching to
>> virtio-blk queue-size 256?
> Oops, I missed the original letter.
>
> Denis Plotnikov has left the team.
>
>
>>> Throughput and IOPS of the following fio benchmarks dropped by 30-40%:
>>>
>>>   # mkfs.xfs /dev/vdb
>>>   # mount /dev/vdb /mnt
>>>   # fio --rw=%s --bs=%s --iodepth=64 --runtime=1m --direct=1 --filename=/mnt/%s --name=job1 --ioengine=libaio --thread --group_reporting --numjobs=16 --size=512MB --time_based --output=/tmp/fio_result &> /dev/null
>>>     - rw: read write
>>>     - bs: 4k 64k
>>>
>>> Note that there are 16 threads submitting 64 requests each! The guest
>>> block device queue depth will be maxed out. The virtqueue should be full
>>> most of the time.
>>>
>>> Have you seen regressions after virtio-blk queue-size was increased in
>>> QEMU 5.0?
>>>
>>> Here are the details of the host storage:
>>>
>>>   # mkfs.xfs /dev/sdb # 60GB SSD drive
>>>   # mount /dev/sdb /mnt/test
>>>   # qemu-img create -f qcow2 /mnt/test/storage2.qcow2 40G
>>>
>>> The guest command-line is:
>>>
>>>   # MALLOC_PERTURB_=1 numactl \
>>>     -m 1  /usr/libexec/qemu-kvm \
>>>     -S  \
>>>     -name 'avocado-vt-vm1'  \
>>>     -sandbox on  \
>>>     -machine q35 \
>>>     -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
>>>     -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
>>>     -nodefaults \
>>>     -device VGA,bus=pcie.0,addr=0x2 \
>>>     -m 4096  \
>>>     -smp 2,maxcpus=2,cores=1,threads=1,dies=1,sockets=2  \
>>>     -cpu 'IvyBridge',+kvm_pv_unhalt \
>>>     -chardev socket,server,id=qmp_id_qmpmonitor1,nowait,path=/var/tmp/avocado_bapfdqao/monitor-qmpmonitor1-20200721-014154-5HJGMjxW  \
>>>     -mon chardev=qmp_id_qmpmonitor1,mode=control \
>>>     -chardev socket,server,id=qmp_id_catch_monitor,nowait,path=/var/tmp/avocado_bapfdqao/monitor-catch_monitor-20200721-014154-5HJGMjxW  \
>>>     -mon chardev=qmp_id_catch_monitor,mode=control \
>>>     -device pvpanic,ioport=0x505,id=id31BN83 \
>>>     -chardev socket,server,id=chardev_serial0,nowait,path=/var/tmp/avocado_bapfdqao/serial-serial0-20200721-014154-5HJGMjxW \
>>>     -device isa-serial,id=serial0,chardev=chardev_serial0  \
>>>     -chardev socket,id=seabioslog_id_20200721-014154-5HJGMjxW,path=/var/tmp/avocado_bapfdqao/seabios-20200721-014154-5HJGMjxW,server,nowait \
>>>     -device isa-debugcon,chardev=seabioslog_id_20200721-014154-5HJGMjxW,iobase=0x402 \
>>>     -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
>>>     -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
>>>     -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
>>>     -blockdev node-name=file_image1,driver=file,aio=threads,filename=rootfs.qcow2,cache.direct=on,cache.no-flush=off \
>>>     -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
>>>     -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
>>>     -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,bus=pcie-root-port-2,addr=0x0 \
>>>     -blockdev node-name=file_disk1,driver=file,aio=threads,filename=/mnt/test/storage2.qcow2,cache.direct=on,cache.no-flush=off \
>>>     -blockdev node-name=drive_disk1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_disk1 \
>>>     -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
>>>     -device virtio-blk-pci,id=disk1,drive=drive_disk1,bootindex=1,write-cache=on,bus=pcie-root-port-3,addr=0x0 \
>>>     -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
>>>     -device virtio-net-pci,mac=9a:37:37:37:37:4e,id=idBMd7vy,netdev=idLb51aS,bus=pcie-root-port-4,addr=0x0  \
>>>     -netdev tap,id=idLb51aS,fd=14  \
>>>     -vnc :0  \
>>>     -rtc base=utc,clock=host,driftfix=slew  \
>>>     -boot menu=off,order=cdn,once=c,strict=off \
>>>     -enable-kvm \
>>>     -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=6
> I will run a check today.
>
> Talking about our own performance measurements, we have not
> seen ANY performance degradation, let alone 30-40%.
> This looks quite strange to me.
>
> Though there is one quite important difference: we are always
> using O_DIRECT and the 'native' AIO engine.
>
> Den

I have put my hands on this and it looks like you are right. There is
a difference. It is not as significant for me as in your case, but I observe
a stable difference of around 10% between queue sizes 128 and 256.

I have checked with:
- QEMU 5.1
- Fedora 31 in guest
- qcow2 (64k and 1MB cluster sizes) and raw images on host
- nocache and both threaded/native IO modes
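
For reference, a qcow2 image with a 1MB cluster size can be created
with something like (the path here is a placeholder):

  # qemu-img create -f qcow2 -o cluster_size=1M /mnt/test/storage-1m.qcow2 40G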

The test was run on a ThinkPad X1 Carbon Gen 6 laptop.

For reference, I have seen up to 330k read IOPS
with native AIO, which looks awesome, and 220k
IOPS with threads.

Den



* Re: QEMU 5.0 virtio-blk performance regression with high queue depths
  2020-09-16 16:43     ` Denis V. Lunev
@ 2020-09-17 12:41       ` Stefan Hajnoczi
  2020-09-18  9:59         ` Denis V. Lunev
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Hajnoczi @ 2020-09-17 12:41 UTC (permalink / raw)
  To: Denis V. Lunev; +Cc: Denis Plotnikov, Vladimir Sementsov-Ogievskiy, qemu-devel

On Wed, Sep 16, 2020 at 5:43 PM Denis V. Lunev <den@virtuozzo.com> wrote:
> On 9/16/20 5:07 PM, Denis V. Lunev wrote:
> > I will run a check today.
> >
> > Talking about our own performance measurements, we have not
> > seen ANY performance degradation, let alone 30-40%.
> > This looks quite strange to me.
> >
> > Though there is one quite important difference: we are always
> > using O_DIRECT and the 'native' AIO engine.
> >
> > Den
>
> I have put my hands on this and it looks like you are right. There is
> a difference. It is not as significant for me as in your case, but I observe
> a stable difference of around 10% between queue sizes 128 and 256.
>
> I have checked with:
> - QEMU 5.1
> - Fedora 31 in guest
> - qcow2 (64k and 1MB cluster sizes) and raw images on host
> - nocache and both threaded/native IO modes
>
> The test was run on a ThinkPad X1 Carbon Gen 6 laptop.
>
> For reference, I have seen up to 330k read IOPS
> with native AIO, which looks awesome, and 220k
> IOPS with threads.

Thanks for confirming! Reverting the commit is unattractive since it
does improve performance in some cases.

It would be good to understand the root cause so the regression can be
fixed without reducing queue-size again.

Do you have time to investigate?

Thanks,
Stefan



* Re: QEMU 5.0 virtio-blk performance regression with high queue depths
  2020-09-17 12:41       ` Stefan Hajnoczi
@ 2020-09-18  9:59         ` Denis V. Lunev
  2020-09-21  8:42           ` Stefan Hajnoczi
  0 siblings, 1 reply; 7+ messages in thread
From: Denis V. Lunev @ 2020-09-18  9:59 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Denis Plotnikov, qemu-devel, Vladimir Sementsov-Ogievskiy

On 9/17/20 3:41 PM, Stefan Hajnoczi wrote:
> On Wed, Sep 16, 2020 at 5:43 PM Denis V. Lunev <den@virtuozzo.com> wrote:
>> On 9/16/20 5:07 PM, Denis V. Lunev wrote:
>>> I will run a check today.
>>>
>>> Talking about our own performance measurements, we have not
>>> seen ANY performance degradation, let alone 30-40%.
>>> This looks quite strange to me.
>>>
>>> Though there is one quite important difference: we are always
>>> using O_DIRECT and the 'native' AIO engine.
>>>
>>> Den
>> I have put my hands on this and it looks like you are right. There is
>> a difference. It is not as significant for me as in your case, but I observe
>> a stable difference of around 10% between queue sizes 128 and 256.
>>
>> I have checked with:
>> - QEMU 5.1
>> - Fedora 31 in guest
>> - qcow2 (64k and 1MB cluster sizes) and raw images on host
>> - nocache and both threaded/native IO modes
>>
>> The test was run on a ThinkPad X1 Carbon Gen 6 laptop.
>>
>> For reference, I have seen up to 330k read IOPS
>> with native AIO, which looks awesome, and 220k
>> IOPS with threads.
> Thanks for confirming! Reverting the commit is unattractive since it
> does improve performance in some cases.
>
> It would be good to understand the root cause so the regression can be
> fixed without reducing queue-size again.
>
> Do you have time to investigate?
I will give it a try next week.

Den



* Re: QEMU 5.0 virtio-blk performance regression with high queue depths
  2020-09-18  9:59         ` Denis V. Lunev
@ 2020-09-21  8:42           ` Stefan Hajnoczi
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Hajnoczi @ 2020-09-21  8:42 UTC (permalink / raw)
  To: Denis V. Lunev; +Cc: Denis Plotnikov, Vladimir Sementsov-Ogievskiy, qemu-devel

On Fri, Sep 18, 2020 at 10:59 AM Denis V. Lunev <den@virtuozzo.com> wrote:
> On 9/17/20 3:41 PM, Stefan Hajnoczi wrote:
> > On Wed, Sep 16, 2020 at 5:43 PM Denis V. Lunev <den@virtuozzo.com> wrote:
> >> On 9/16/20 5:07 PM, Denis V. Lunev wrote:
> >>> I will run a check today.
> >>>
> >>> Talking about our own performance measurements, we have not
> >>> seen ANY performance degradation, let alone 30-40%.
> >>> This looks quite strange to me.
> >>>
> >>> Though there is one quite important difference: we are always
> >>> using O_DIRECT and the 'native' AIO engine.
> >>>
> >>> Den
> >> I have put my hands on this and it looks like you are right. There is
> >> a difference. It is not as significant for me as in your case, but I observe
> >> a stable difference of around 10% between queue sizes 128 and 256.
> >>
> >> I have checked with:
> >> - QEMU 5.1
> >> - Fedora 31 in guest
> >> - qcow2 (64k and 1MB cluster sizes) and raw images on host
> >> - nocache and both threaded/native IO modes
> >>
> >> The test was run on a ThinkPad X1 Carbon Gen 6 laptop.
> >>
> >> For reference, I have seen up to 330k read IOPS
> >> with native AIO, which looks awesome, and 220k
> >> IOPS with threads.
> > Thanks for confirming! Reverting the commit is unattractive since it
> > does improve performance in some cases.
> >
> > It would be good to understand the root cause so the regression can be
> > fixed without reducing queue-size again.
> >
> > Do you have time to investigate?
> I will give it a try next week.

Thanks!

Stefan



end of thread

Thread overview: 7+ messages
2020-08-24 13:44 QEMU 5.0 virtio-blk performance regression with high queue depths Stefan Hajnoczi
2020-09-16 13:32 ` Stefan Hajnoczi
2020-09-16 14:07   ` Denis V. Lunev
2020-09-16 16:43     ` Denis V. Lunev
2020-09-17 12:41       ` Stefan Hajnoczi
2020-09-18  9:59         ` Denis V. Lunev
2020-09-21  8:42           ` Stefan Hajnoczi
