From: Stefan Hajnoczi <stefanha@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	Christoph Hellwig <hch@lst.de>,
	Paolo Bonzini <pbonzini@redhat.com>, Jens Axboe <axboe@kernel.dk>,
	slp@redhat.com, sgarzare@redhat.com,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH 3/3] virtio_blk: implement blk_mq_ops->poll()
Date: Thu, 3 Jun 2021 16:24:16 +0100
Message-ID: <YLj0IN6rFuBMjzQQ@stefanha-x1.localdomain>
In-Reply-To: <c51191b8-d741-2abc-0446-8a139e2ea401@redhat.com>

On Thu, May 27, 2021 at 01:48:36PM +0800, Jason Wang wrote:
> 
> On 2021/5/25 4:59 PM, Stefan Hajnoczi wrote:
> > On Tue, May 25, 2021 at 11:21:41AM +0800, Jason Wang wrote:
> > > On 2021/5/20 10:13 PM, Stefan Hajnoczi wrote:
> > > > Request completion latency can be reduced by using polling instead of
> > > > irqs. Even Posted Interrupts or similar hardware support doesn't beat
> > > > polling. The reason is that disabling virtqueue notifications saves
> > > > critical-path CPU cycles on the host by skipping irq injection and in
> > > > the guest by skipping the irq handler. So let's add blk_mq_ops->poll()
> > > > support to virtio_blk.
> > > > 
> > > > The approach taken by this patch differs from the NVMe driver's
> > > > approach. NVMe dedicates hardware queues to polling and submits
> > > > REQ_HIPRI requests only on those queues. This patch does not require
> > > > exclusive polling queues for virtio_blk. Instead, it switches between
> > > > irqs and polling when one or more REQ_HIPRI requests are in flight on a
> > > > virtqueue.
> > > > 
> > > > This is possible because toggling virtqueue notifications is cheap even
> > > > while the virtqueue is running. NVMe cqs can't do this because irqs are
> > > > only enabled/disabled at queue creation time.
> > > > 
> > > > This toggling approach requires no configuration. There is no need to
> > > > dedicate queues ahead of time or to teach users and orchestration tools
> > > > how to set up polling queues.
> > > > 
> > > > Possible drawbacks of this approach:
> > > > 
> > > > - Hardware virtio_blk implementations may find virtqueue_disable_cb()
> > > >     expensive since it requires DMA.
> > > 
> > > Note that it's probably not related to the behavior of the driver but the
> > > design of the event suppression mechanism.
> > > 
> > > The device can choose to ignore the suppression flag and keep sending
> > > interrupts.
> > Yes, it's the design of the event suppression mechanism.
> > 
> > If we use dedicated polling virtqueues then the hardware doesn't need to
> > check whether interrupts are enabled for each notification. However,
> > there's no mechanism to tell the device that virtqueue interrupts are
> > permanently disabled. This means that as of today, even dedicated
> > virtqueues cannot suppress interrupts without the device checking for
> > each notification.
> 
> 
> This can be detected in a transport-specific way.
> 
> E.g. in the case of MSI, VIRTIO_MSI_NO_VECTOR could be a hint.

Nice idea :). Then there would be no need for changes to the hardware
interface. IRQ-less virtqueues could still be mentioned explicitly in
the VIRTIO spec so that driver/device authors are aware of the
VIRTIO_MSI_NO_VECTOR trick.
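
For the device side that check could be as simple as looking at the
vector the driver assigned. A minimal sketch, assuming a hypothetical
hw_virtqueue struct that mirrors the queue_msix_vector field of the PCI
common config (VIRTIO_MSI_NO_VECTOR is 0xffff in the PCI transport):

  #define VIRTIO_MSI_NO_VECTOR 0xffff

  /* A queue whose MSI-X vector is VIRTIO_MSI_NO_VECTOR can never
   * deliver an interrupt, so the device could treat it as polled and
   * skip the event suppression check for every used buffer. */
  static bool vq_is_polled_only(const struct hw_virtqueue *hwvq)
  {
          return hwvq->msix_vector == VIRTIO_MSI_NO_VECTOR;
  }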

> > > > +static int virtblk_poll(struct blk_mq_hw_ctx *hctx)
> > > > +{
> > > > +	struct virtio_blk *vblk = hctx->queue->queuedata;
> > > > +	struct virtqueue *vq = vblk->vqs[hctx->queue_num].vq;
> > > > +
> > > > +	if (!virtqueue_more_used(vq))
> > > 
> > > I'm not familiar with block polling but what happens if a buffer is made
> > > available after virtqueue_more_used() returns false here?
> > Can you explain the scenario? I'm not sure I understand. "buffer is made
> > available" -> are you thinking about additional requests being submitted
> > by the driver, or an in-flight request being marked used by the device?
> 
> 
> Something like that:
> 
> 1) requests are submitted
> 2) poll, but virtqueue_more_used() returns false
> 3) the device makes the buffer used
> 
> In this case, will poll() be triggered again by somebody else? (I think
> the interrupt is disabled here.)

Yes. An example blk_poll() user is
fs/block_dev.c:__blkdev_direct_IO_simple():

  qc = submit_bio(&bio);
  for (;;) {
      set_current_state(TASK_UNINTERRUPTIBLE);
      if (!READ_ONCE(bio.bi_private))
          break;
      if (!(iocb->ki_flags & IOCB_HIPRI) ||
          !blk_poll(bdev_get_queue(bdev), qc, true))
          blk_io_schedule();
  }

That's the infinite loop. The block layer implements the generic portion
of blk_poll(). blk_poll() calls mq_ops->poll() (virtblk_poll()).
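
To make the contract concrete, here is a sketch of how the rest of such
a poll handler could look. This is hypothetical, not the code from this
patch, and locking is omitted:

  static int virtblk_poll(struct blk_mq_hw_ctx *hctx)
  {
          struct virtio_blk *vblk = hctx->queue->queuedata;
          struct virtqueue *vq = vblk->vqs[hctx->queue_num].vq;
          struct virtblk_req *vbr;
          unsigned int len;
          int found = 0;

          /* Drain completed requests off the used ring. */
          while ((vbr = virtqueue_get_buf(vq, &len)) != NULL) {
                  blk_mq_complete_request(blk_mq_rq_from_pdu(vbr));
                  found++;
          }

          /* A non-zero return tells blk_poll() that progress was made. */
          return found;
  }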

So in general the polling loop will keep iterating, but there are
exceptions:
1. need_resched() causes blk_poll() to return 0 and blk_io_schedule()
   will be called.
2. blk-mq has a fancier hybrid io_poll algorithm that estimates I/O
   completion time and sleeps until then to save CPU cycles (see the
   sysfs knobs below). I haven't looked at this one in detail.
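
For reference, that hybrid mode is controlled by the existing
io_poll_delay sysfs attribute (the device name below is just an
example):

  # enable polling on this queue; for io_poll_delay, -1 selects classic
  # polling and 0 selects hybrid polling with a kernel-estimated sleep
  echo 1 > /sys/block/vda/queue/io_poll
  echo 0 > /sys/block/vda/queue/io_poll_delay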

Both these cases affect existing mq_ops->poll() implementations (e.g.
NVMe). What's new in this patch series is that virtio-blk can have
non-polling requests on a virtqueue whose irqs are now disabled, so we
could end up waiting on their completions.

I think there's an easy solution for this: don't disable virtqueue irqs
when there are non-REQ_HIPRI requests in flight. The disadvantage is
that we'll keep irqs enabled in more situations, so the performance
improvement may not apply in some configurations.
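
Roughly along these lines (a sketch of the idea only; the per-vq
counters are hypothetical, while virtblk_done() is the driver's
existing completion callback):

  struct virtio_blk_vq {
          struct virtqueue *vq;
          spinlock_t lock;
          unsigned int hipri_in_flight; /* REQ_HIPRI requests in flight */
          unsigned int irq_in_flight;   /* all other requests in flight */
  };

  /* Call with vbq->lock held whenever either counter changes. */
  static void virtblk_update_cb_state(struct virtio_blk_vq *vbq)
  {
          if (vbq->hipri_in_flight > 0 && vbq->irq_in_flight == 0) {
                  /* Only polled requests in flight: no irq needed. */
                  virtqueue_disable_cb(vbq->vq);
          } else if (!virtqueue_enable_cb(vbq->vq)) {
                  /* Buffers became used before callbacks were
                   * re-enabled; process them now to avoid a lost irq. */
                  virtblk_done(vbq->vq);
          }
  }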

Stefan
