From: Stefan Hajnoczi <stefanha@redhat.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	Christoph Hellwig <hch@lst.de>, Jason Wang <jasowang@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>, Jens Axboe <axboe@kernel.dk>,
	slp@redhat.com, sgarzare@redhat.com,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH 3/3] virtio_blk: implement blk_mq_ops->poll()
Date: Thu, 3 Jun 2021 16:12:05 +0100	[thread overview]
Message-ID: <YLjxRcFN2G98mwH5@stefanha-x1.localdomain> (raw)
In-Reply-To: <YK8Ho3mC117M8GXS@T590>

On Thu, May 27, 2021 at 10:44:51AM +0800, Ming Lei wrote:
> On Thu, May 20, 2021 at 03:13:05PM +0100, Stefan Hajnoczi wrote:
> > Request completion latency can be reduced by using polling instead of
> > irqs. Even Posted Interrupts or similar hardware support doesn't beat
> > polling. The reason is that disabling virtqueue notifications saves
> > critical-path CPU cycles on the host by skipping irq injection and in
> > the guest by skipping the irq handler. So let's add blk_mq_ops->poll()
> > support to virtio_blk.
> > 
> > The approach taken by this patch differs from the NVMe driver's
> > approach. NVMe dedicates hardware queues to polling and submits
> > REQ_HIPRI requests only on those queues. This patch does not require
> > exclusive polling queues for virtio_blk. Instead, it switches between
> > irqs and polling when one or more REQ_HIPRI requests are in flight on a
> > virtqueue.
> > 
> > This is possible because toggling virtqueue notifications is cheap even
> > while the virtqueue is running. NVMe cqs can't do this because irqs are
> > only enabled/disabled at queue creation time.
> > 
> > This toggling approach requires no configuration. There is no need to
> > dedicate queues ahead of time or to teach users and orchestration tools
> > how to set up polling queues.
> 
> This approach looks good, and is very neat thanks to the per-vq lock.
> 
> BTW, are any vmexits saved by disabling the vq interrupt? I understand
> there aren't, since a vmexit may only be involved in remote completion
> via sending an IPI.

This patch doesn't eliminate vmexits. QEMU already has virtqueue polling
code that disables the vq notification (the virtio-pci hardware register
write that causes a vmexit).

However, when both the guest driver and the emulated device are polling,
there are no vmexits or interrupt injections with this patch.
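
To make the toggling concrete, here is a rough sketch of the idea. This is
not the code from the patch series: struct vblk_poll_vq, the
hipri_in_flight counter, and the helper names are invented for
illustration, and it assumes the int (*poll)(struct blk_mq_hw_ctx *)
signature that blk_mq_ops had around the time of this thread.

#include <linux/blk-mq.h>
#include <linux/spinlock.h>
#include <linux/virtio.h>

/* Hypothetical per-virtqueue state, stashed in hctx->driver_data. */
struct vblk_poll_vq {
	struct virtqueue *vq;
	spinlock_t lock;
	unsigned int hipri_in_flight;	/* made-up bookkeeping for this sketch */
};

/*
 * Submission side (caller holds pvq->lock): the first REQ_HIPRI request
 * in flight switches the vq into polled mode by disabling its callback.
 */
static void vblk_mark_hipri_sketch(struct vblk_poll_vq *pvq,
				   struct request *req)
{
	if ((req->cmd_flags & REQ_HIPRI) && pvq->hipri_in_flight++ == 0)
		virtqueue_disable_cb(pvq->vq);
}

/*
 * blk_mq_ops->poll(): reap completions and fall back to irq mode once the
 * last REQ_HIPRI request on this vq has completed.
 */
static int vblk_poll_sketch(struct blk_mq_hw_ctx *hctx)
{
	struct vblk_poll_vq *pvq = hctx->driver_data;
	unsigned long flags;
	unsigned int len;
	void *vbr;
	int found = 0;

	spin_lock_irqsave(&pvq->lock, flags);

	while ((vbr = virtqueue_get_buf(pvq->vq, &len)) != NULL) {
		struct request *req = blk_mq_rq_from_pdu(vbr);

		found++;
		if (req->cmd_flags & REQ_HIPRI)
			pvq->hipri_in_flight--;
		blk_mq_complete_request(req);
	}

	/*
	 * virtqueue_enable_cb() returns false if buffers were used in the
	 * meantime; a real implementation would reap again in that case
	 * instead of relying on an irq that may never come.
	 */
	if (!pvq->hipri_in_flight)
		virtqueue_enable_cb(pvq->vq);

	spin_unlock_irqrestore(&pvq->lock, flags);

	return found;
}

The nice property is that the mode switch is purely driver-side state:
flipping it is just virtqueue_disable_cb()/virtqueue_enable_cb() under the
per-vq lock, which is why no queue topology changes or user configuration
are needed.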

> > 
> > Possible drawbacks of this approach:
> > 
> > - Hardware virtio_blk implementations may find virtqueue_disable_cb()
> >   expensive since it requires DMA. If such devices become popular then
> 
> You mean the hardware needs to consider the order between DMA completion
> and the interrupt notification? But this is about disabling notifications:
> the guest just calls virtqueue_get_buf() to see whether a buffer is
> available, and if not, it polls again.

Software devices have cheap access to guest RAM for looking at the
virtqueue_disable_cb() state before injecting an irq. Hardware devices
need to perform a DMA transaction to read that state. They have to do
this every time they want to raise an irq because the guest driver may
have changed the value.
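
For the split virtqueue (and ignoring VIRTIO_RING_F_EVENT_IDX, where the
used_event field plays the equivalent role), the state in question is
essentially the flags word of the avail ring, which virtqueue_disable_cb()
uses to publish a "no interrupt needed" hint. A minimal device-side
illustration (hypothetical helper, not taken from any implementation):

#include <linux/virtio_byteorder.h>
#include <linux/virtio_ring.h>

/*
 * Before injecting an irq the device checks the driver's suppression
 * hint.  A software device reads guest memory directly; a hardware
 * device must fetch avail->flags with a DMA read, and must redo it for
 * every interrupt decision because the driver may have toggled it.
 */
static bool device_should_inject_irq(const struct vring_avail *avail,
				     bool little_endian)
{
	u16 flags = __virtio16_to_cpu(little_endian, avail->flags);

	return !(flags & VRING_AVAIL_F_NO_INTERRUPT);
}

Per the spec this flag is only a hint and the driver has to tolerate
spurious interrupts anyway, so the cost being discussed here is purely the
extra DMA read on the device side.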

I'm not sure whether the DMA overhead is acceptable. This problem is not
introduced by this patch; it's a VIRTIO spec design issue.

I was trying to express that dedicated polling queues would avoid the
DMA since the device knows that irqs are never needed for this virtqueue.

> 
> >   the virtio_blk driver could use a similar approach to NVMe when
> >   VIRTIO_F_ACCESS_PLATFORM is detected in the future.
> > 
> > - If a blk_poll() thread is descheduled it not only hurts polling
> >   performance but also delays completion of non-REQ_HIPRI requests on
> >   that virtqueue since vq notifications are disabled.
> > 
> > Performance:
> > 
> > - Benchmark: fio ioengine=pvsync2 numjobs=4 direct=1
> > - Guest: 4 vCPUs with one virtio-blk device (4 virtqueues)
> 
> 4 jobs can consume all 4 vCPUs. I just ran a quick fio test with
> 'ioengine=io_uring --numjobs=1' on a single vq, and IOPS improved by
> ~20% (hipri=1 vs hipri=0) with the 3 patches; the virtio-blk device is
> still backed by an NVMe SSD.

Nice, thank you for sharing the data!

Stefan

