From: Stefan Hajnoczi <stefanha@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
Christoph Hellwig <hch@lst.de>,
Paolo Bonzini <pbonzini@redhat.com>, Jens Axboe <axboe@kernel.dk>,
slp@redhat.com, sgarzare@redhat.com,
"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH 3/3] virtio_blk: implement blk_mq_ops->poll()
Date: Thu, 3 Jun 2021 16:24:16 +0100 [thread overview]
Message-ID: <YLj0IN6rFuBMjzQQ@stefanha-x1.localdomain> (raw)
In-Reply-To: <c51191b8-d741-2abc-0446-8a139e2ea401@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 5053 bytes --]
On Thu, May 27, 2021 at 01:48:36PM +0800, Jason Wang wrote:
>
> 在 2021/5/25 下午4:59, Stefan Hajnoczi 写道:
> > On Tue, May 25, 2021 at 11:21:41AM +0800, Jason Wang wrote:
> > > 在 2021/5/20 下午10:13, Stefan Hajnoczi 写道:
> > > > Request completion latency can be reduced by using polling instead of
> > > > irqs. Even Posted Interrupts or similar hardware support doesn't beat
> > > > polling. The reason is that disabling virtqueue notifications saves
> > > > critical-path CPU cycles on the host by skipping irq injection and in
> > > > the guest by skipping the irq handler. So let's add blk_mq_ops->poll()
> > > > support to virtio_blk.
> > > >
> > > > The approach taken by this patch differs from the NVMe driver's
> > > > approach. NVMe dedicates hardware queues to polling and submits
> > > > REQ_HIPRI requests only on those queues. This patch does not require
> > > > exclusive polling queues for virtio_blk. Instead, it switches between
> > > > irqs and polling when one or more REQ_HIPRI requests are in flight on a
> > > > virtqueue.
> > > >
> > > > This is possible because toggling virtqueue notifications is cheap even
> > > > while the virtqueue is running. NVMe cqs can't do this because irqs are
> > > > only enabled/disabled at queue creation time.
> > > >
> > > > This toggling approach requires no configuration. There is no need to
> > > > dedicate queues ahead of time or to teach users and orchestration tools
> > > > how to set up polling queues.
> > > >
> > > > Possible drawbacks of this approach:
> > > >
> > > > - Hardware virtio_blk implementations may find virtqueue_disable_cb()
> > > > expensive since it requires DMA.
> > >
> > > Note that it's probably not related to the behavior of the driver but the
> > > design of the event suppression mechanism.
> > >
> > > Device can choose to ignore the suppression flag and keep sending
> > > interrupts.
> > Yes, it's the design of the event suppression mechanism.
> >
> > If we use dedicated polling virtqueues then the hardware doesn't need to
> > check whether interrupts are enabled for each notification. However,
> > there's no mechanism to tell the device that virtqueue interrupts are
> > permanently disabled. This means that as of today, even dedicated
> > virtqueues cannot suppress interrupts without the device checking for
> > each notification.
>
>
> This can be detected via a transport specific way.
>
> E.g in the case of MSI, VIRTIO_MSI_NO_VECTOR could be a hint.
Nice idea :). Then there would be no need for changes to the hardware
interface. IRQ-less virtqueues is could still be mentioned explicitly in
the VIRTIO spec so that driver/device authors are aware of the
VIRTIO_MSI_NO_VECTOR trick.
> > > > +static int virtblk_poll(struct blk_mq_hw_ctx *hctx)
> > > > +{
> > > > + struct virtio_blk *vblk = hctx->queue->queuedata;
> > > > + struct virtqueue *vq = vblk->vqs[hctx->queue_num].vq;
> > > > +
> > > > + if (!virtqueue_more_used(vq))
> > >
> > > I'm not familiar with block polling but what happens if a buffer is made
> > > available after virtqueue_more_used() returns false here?
> > Can you explain the scenario, I'm not sure I understand? "buffer is made
> > available" -> are you thinking about additional requests being submitted
> > by the driver or an in-flight request being marked used by the device?
>
>
> Something like that:
>
> 1) requests are submitted
> 2) poll but virtqueue_more_used() return false
> 3) device make buffer used
>
> In this case, will poll() be triggered again by somebody else? (I think
> interrupt is disabled here).
Yes. An example blk_poll() user is
fs/block_dev.c:__blkdev_direct_IO_simple():
qc = submit_bio(&bio);
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
if (!READ_ONCE(bio.bi_private))
break;
if (!(iocb->ki_flags & IOCB_HIPRI) ||
!blk_poll(bdev_get_queue(bdev), qc, true))
blk_io_schedule();
}
That's the infinite loop. The block layer implements the generic portion
of blk_poll(). blk_poll() calls mq_ops->poll() (virtblk_poll()).
So in general the polling loop will keep iterating, but there are
exceptions:
1. need_resched() causes blk_poll() to return 0 and blk_io_schedule()
will be called.
2. blk-mq has a fancier io_poll algorithm that estimates I/O time and
sleeps until the expected completion time to save CPU cycles. I
haven't looked into detail at this one.
Both these cases affect existing mq_ops->poll() implementations (e.g.
NVMe). What's new in this patch series is that virtio-blk could have
non-polling requests on the virtqueue which now has irqs disabled. So we
could wait for them.
I think there's an easy solution for this: don't disable virtqueue irqs
when there are non-REQ_HIPRI requests in flight. The disadvantage is
that we'll keep irqs disable in more situations so the performance
improvement may not apply in some configurations.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2021-06-03 15:24 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-20 14:13 [PATCH 0/3] virtio_blk: blk-mq io_poll support Stefan Hajnoczi
2021-05-20 14:13 ` [PATCH 1/3] virtio: add virtioqueue_more_used() Stefan Hajnoczi
2021-05-25 2:23 ` Jason Wang
2021-05-25 8:48 ` Stefan Hajnoczi
2021-05-20 14:13 ` [PATCH 2/3] virtio_blk: avoid repeating vblk->vqs[qid] Stefan Hajnoczi
2021-05-25 2:25 ` Jason Wang
2021-05-20 14:13 ` [PATCH 3/3] virtio_blk: implement blk_mq_ops->poll() Stefan Hajnoczi
2021-05-24 14:59 ` Christoph Hellwig
2021-05-25 7:22 ` Paolo Bonzini
2021-05-25 7:38 ` Ming Lei
2021-05-25 8:06 ` Paolo Bonzini
2021-05-25 13:20 ` Stefan Hajnoczi
2021-05-25 13:19 ` Stefan Hajnoczi
2021-05-25 3:21 ` Jason Wang
2021-05-25 8:59 ` Stefan Hajnoczi
2021-05-27 5:48 ` Jason Wang
2021-06-03 15:24 ` Stefan Hajnoczi [this message]
2021-05-27 2:44 ` Ming Lei
2021-06-03 15:12 ` Stefan Hajnoczi
2021-06-03 15:30 ` [PATCH 0/3] virtio_blk: blk-mq io_poll support Stefan Hajnoczi
2021-06-16 7:43 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YLj0IN6rFuBMjzQQ@stefanha-x1.localdomain \
--to=stefanha@redhat.com \
--cc=axboe@kernel.dk \
--cc=hch@lst.de \
--cc=jasowang@redhat.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=sgarzare@redhat.com \
--cc=slp@redhat.com \
--cc=virtualization@lists.linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).