QEMU-Devel Archive on lore.kernel.org
From: Wangyong <wang.yongD@h3c.com>
To: Stefan Hajnoczi <stefanha@redhat.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"mark.kanda@oracle.com" <mark.kanda@oracle.com>,
	"hch@lst.de" <hch@lst.de>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: RE: issue about virtio-blk queue size
Date: Thu, 5 Dec 2019 01:30:09 +0000
Message-ID: <18dcb1c11c1d481eadf491f9074f6306@h3c.com>
In-Reply-To: <20191203143731.GD230219@stefanha-x1.localdomain>

>
> On Thu, Nov 28, 2019 at 08:44:43AM +0000, Wangyong wrote:
> > Hi all,
>
> This looks interesting, please continue this discussion on the QEMU mailing list
> <qemu-devel@nongnu.org> so that others can participate.
>
> >
> > This patch makes virtio_blk queue size configurable
> >
> > commit 6040aedddb5f474a9c2304b6a432a652d82b3d3c
> > Author: Mark Kanda <mark.kanda@oracle.com>
> > Date:   Mon Dec 11 09:16:24 2017 -0600
> >
> >     virtio-blk: make queue size configurable
> >
> > But when we set the queue size to more than 128, it does not take effect.
> >
> > That's because Linux AIO's maximum number of outstanding requests at a
> > time is always less than or equal to 128.
> >
> > The following code limits the outstanding requests at a time:
> >
> > #define MAX_EVENTS 128
> >
> > laio_do_submit()
> > {
> >
> >     if (!s->io_q.blocked &&
> >         (!s->io_q.plugged ||
> >          s->io_q.in_flight + s->io_q.in_queue >= MAX_EVENTS)) {
> >         ioq_submit(s);
> >     }
> > }
> >
> > Should we make the value of MAX_EVENTS configurable ?
>
> Increasing MAX_EVENTS to a larger hardcoded value seems reasonable as a
> short-term fix.  Please first check how /proc/sys/fs/aio-max-nr and
> io_setup(2) handle this resource limit.  The patch must not break existing
> systems where 128 works today.
[root@node2 ~]# cat /etc/centos-release
CentOS Linux release 7.5.1804 (Core)

[root@node2 ~]# cat /proc/sys/fs/aio-max-nr
4294967296

> > MAX_EVENTS should have the same value as queue size ?
>
> Multiple virtio-blk devices can share a single AioContext,
Do you mean multiple virtio-blk devices configured with a single IOThread?
In that case the performance of the multiple virtio-blk devices will be worse.

> so setting it to the
> queue size may not be enough.  That's why I suggest increasing the
> hardcoded limit for now unless someone thinks up a way to size MAX_EVENTS
> correctly.
>
> > I set the virtio blk queue size to 1024, then tested the results as
> > follows
> >
> > fio --filename=/dev/vda -direct=1 -iodepth=1024 -thread -rw=randread
> > -ioengine=libaio -bs=8k -size=50G -numjobs=1 -runtime=600
> > -group_reporting -name=test
> > guest:
> >
> > [root@localhost ~]# cat /sys/module/virtio_blk/parameters/queue_depth
> > 1024
> >
> > [root@localhost ~]# cat /sys/block/vda/queue/nr_requests
> > 1024
> >
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > vda               0.00     0.00    0.00 1432.00     0.00 11456.00    16.00  1024.91  720.82    0.00  720.82   0.70 100.10
>
> This iostat output doesn't correspond to the fio -rw=randread command-line
> you posted because it shows writes instead of reads ;).  I assume nothing else
> was changed in the fio command-line.
fio --filename=/dev/vda -direct=1 -iodepth=1024 -thread -rw=randread -ioengine=libaio -bs=8k -size=50G -numjobs=1 -runtime=600 -group_reporting -name=test

MAX_EVENTS = 128

guest:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00 1324.00    0.00 10592.00     0.00    16.00  1023.90  769.05  769.05    0.00   0.76 100.00

host:

root@cvk~/build# cat /sys/block/sda/queue/nr_requests
1024

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00 1359.00    0.00 10872.00     0.00    16.00   127.91   93.93   93.93    0.00   0.74 100.00


I redefined this macro (MAX_EVENTS = 1024):
#define MAX_EVENTS 1024
Then retested; the results are as follows (IO performance is greatly improved):

guest:

[root@localhost ~]# cat /sys/module/virtio_blk/parameters/queue_depth
1024

[root@localhost ~]# cat /sys/block/vda/queue/nr_requests
1024

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00 1743.00    0.00 13944.00     0.00    16.00  1024.50  584.94  584.94    0.00   0.57 100.10


host:

root@cvk~/build# cat /sys/block/sda/queue/nr_requests
1024


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00 1414.00    1.00 11312.00     1.00    15.99  1023.37  726.36  726.86   24.00   0.71 100.00
>
> >
> > host:
> >
> > root@cvk~/build# cat /sys/block/sda/queue/nr_requests
> > 1024
> >
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > sda               0.00    11.00    0.00 1402.00     0.00 11244.00    16.04   128.00   88.30    0.00   88.30   0.71 100.00
> >
> >
> >
> > I redefined this macro (MAX_EVENTS = 1024): #define MAX_EVENTS 1024
> >
> > Then retested, the results are as follows: (IO performance will be
> > greatly improved)
> >
> > fio --filename=/dev/vda -direct=1 -iodepth=1024 -thread -rw=randread
> > -ioengine=libaio -bs=8k -size=50G -numjobs=1 -runtime=600
> > -group_reporting -name=test
> >
> > guest:
> >
> > [root@localhost ~]# cat /sys/module/virtio_blk/parameters/queue_depth
> > 1024
> >
> > [root@localhost ~]# cat /sys/block/vda/queue/nr_requests
> > 1024
> >
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > vda               0.00     0.00 1743.00    0.00 13944.00     0.00    16.00  1024.50  584.94  584.94    0.00   0.57 100.10
>
> Now the iostat output shows reads instead of writes.  Please check again and
> make sure you're comparing reads with reads.
>
> Thanks,
> Stefan

Thanks
