QEMU-Devel Archive on lore.kernel.org
From: Wangyong <wang.yongD@h3c.com>
To: Stefan Hajnoczi <stefanha@redhat.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"mark.kanda@oracle.com" <mark.kanda@oracle.com>,
	"hch@lst.de" <hch@lst.de>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: RE: issue about virtio-blk queue size
Date: Thu, 5 Dec 2019 01:30:09 +0000
Message-ID: <18dcb1c11c1d481eadf491f9074f6306@h3c.com>
In-Reply-To: <20191203143731.GD230219@stefanha-x1.localdomain>

>
> On Thu, Nov 28, 2019 at 08:44:43AM +0000, Wangyong wrote:
> > Hi all,
>
> This looks interesting, please continue this discussion on the QEMU mailing list
> <qemu-devel@nongnu.org> so that others can participate.
>
> >
> > This patch makes virtio_blk queue size configurable
> >
> > commit 6040aedddb5f474a9c2304b6a432a652d82b3d3c
> > Author: Mark Kanda <mark.kanda@oracle.com>
> > Date:   Mon Dec 11 09:16:24 2017 -0600
> >
> >     virtio-blk: make queue size configurable
> >
> > But when we set the queue size to more than 128, it does not take effect.
> >
> > That's because Linux AIO's maximum number of outstanding requests at a
> > time is always less than or equal to 128.
> >
> > The following code limits the outstanding requests at a time:
> >
> > #define MAX_EVENTS 128
> >
> > laio_do_submit()
> > {
> >
> >     if (!s->io_q.blocked &&
> >         (!s->io_q.plugged ||
> >          s->io_q.in_flight + s->io_q.in_queue >= MAX_EVENTS)) {
> >         ioq_submit(s);
> >     }
> > }
> >
> > Should we make the value of MAX_EVENTS configurable ?
>
> Increasing MAX_EVENTS to a larger hardcoded value seems reasonable as a
> short-term fix.  Please first check how /proc/sys/fs/aio-max-nr and
> io_setup(2) handle this resource limit.  The patch must not break existing
> systems where 128 works today.
[root@node2 ~]# cat /etc/centos-release
CentOS Linux release 7.5.1804 (Core)

[root@node2 ~]# cat /proc/sys/fs/aio-max-nr
4294967296

> > MAX_EVENTS should have the same value as queue size ?
>
> Multiple virtio-blk devices can share a single AioContext,
Do you mean multiple virtio-blk devices configured with a single IOThread?
In that case the performance of the multiple virtio-blk devices will be worse.

> so setting it to the
> queue size may not be enough.  That's why I suggest increasing the
> hardcoded limit for now unless someone thinks up a way to size MAX_EVENTS
> correctly.
>
> > I set the virtio blk queue size to 1024, then tested the results as
> > follows
> >
> > fio --filename=/dev/vda -direct=1 -iodepth=1024 -thread -rw=randread
> > -ioengine=libaio -bs=8k -size=50G -numjobs=1 -runtime=600
> > -group_reporting -name=test
> > guest:
> >
> > [root@localhost ~]# cat /sys/module/virtio_blk/parameters/queue_depth
> > 1024
> >
> > [root@localhost ~]# cat /sys/block/vda/queue/nr_requests
> > 1024
> >
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > vda               0.00     0.00    0.00 1432.00     0.00 11456.00    16.00  1024.91  720.82    0.00  720.82   0.70 100.10
>
> This iostat output doesn't correspond to the fio -rw=randread command-line
> you posted because it shows writes instead of reads ;).  I assume nothing else
> was changed in the fio command-line.
fio --filename=/dev/vda -direct=1 -iodepth=1024 -thread -rw=randread -ioengine=libaio -bs=8k -size=50G -numjobs=1 -runtime=600 -group_reporting -name=test

MAX_EVENTS = 128

guest:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00 1324.00    0.00 10592.00     0.00    16.00  1023.90  769.05  769.05    0.00   0.76 100.00

host:

root@cvk~/build# cat /sys/block/sda/queue/nr_requests
1024

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00 1359.00    0.00 10872.00     0.00    16.00   127.91   93.93   93.93    0.00   0.74 100.00


I redefined this macro (MAX_EVENTS = 1024):
#define MAX_EVENTS 1024
Then retested; the results are as follows (IO performance is greatly improved):

guest:

[root@localhost ~]# cat /sys/module/virtio_blk/parameters/queue_depth
1024

[root@localhost ~]# cat /sys/block/vda/queue/nr_requests
1024

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00 1743.00    0.00 13944.00     0.00    16.00  1024.50  584.94  584.94    0.00   0.57 100.10


host:

root@cvk~/build# cat /sys/block/sda/queue/nr_requests
1024


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00 1414.00    1.00 11312.00     1.00    15.99  1023.37  726.36  726.86   24.00   0.71 100.00
>
> >
> > host:
> >
> > root@cvk~/build# cat /sys/block/sda/queue/nr_requests
> > 1024
> >
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > sda               0.00    11.00    0.00 1402.00     0.00 11244.00    16.04   128.00   88.30    0.00   88.30   0.71 100.00
> >
> >
> >
> > I redefined this macro (MAX_EVENTS = 1024): #define MAX_EVENTS 1024
> >
> > Then retested, the results are as follows: (IO performance will be
> > greatly improved)
> >
> > fio --filename=/dev/vda -direct=1 -iodepth=1024 -thread -rw=randread
> > -ioengine=libaio -bs=8k -size=50G -numjobs=1 -runtime=600
> > -group_reporting -name=test
> >
> > guest:
> >
> > [root@localhost ~]# cat /sys/module/virtio_blk/parameters/queue_depth
> > 1024
> >
> > [root@localhost ~]# cat /sys/block/vda/queue/nr_requests
> > 1024
> >
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > vda               0.00     0.00 1743.00    0.00 13944.00     0.00    16.00  1024.50  584.94  584.94    0.00   0.57 100.10
>
> Now the iostat output shows reads instead of writes.  Please check again and
> make sure you're comparing reads with reads.
>
> Thanks,
> Stefan

Thanks
