From: Khoa Huynh <khoa@us.ibm.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Anthony Liguori <aliguori@us.ibm.com>,
	Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>,
	kvm@vger.kernel.org, qemu-devel@nongnu.org,
	Paolo Bonzini <pbonzini@redhat.com>, Asias He <asias@redhat.com>
Subject: Re: [RFC v9 00/27] virtio: virtio-blk data plane
Date: Wed, 18 Jul 2012 11:18:29 -0500
Message-ID: <OF445A2873.11F78213-ON85257A3F.00585BA8-86257A3F.00599614@us.ibm.com>
In-Reply-To: <20120718154323.GE1777@redhat.com>


"Michael S. Tsirkin" <mst@redhat.com> wrote on 07/18/2012 10:43:23 AM:

> From: "Michael S. Tsirkin" <mst@redhat.com>
> To: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>,
> Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, Anthony Liguori/
> Austin/IBM@IBMUS, Kevin Wolf <kwolf@redhat.com>, Paolo Bonzini
> <pbonzini@redhat.com>, Asias He <asias@redhat.com>, Khoa Huynh/
> Austin/IBM@IBMUS
> Date: 07/18/2012 10:46 AM
> Subject: Re: [RFC v9 00/27] virtio: virtio-blk data plane
>
> On Wed, Jul 18, 2012 at 04:07:27PM +0100, Stefan Hajnoczi wrote:
> > This series implements a dedicated thread for virtio-blk processing using Linux
> > AIO for raw image files only.  It is based on qemu-kvm.git a0bc8c3 and somewhat
> > old but I wanted to share it on the list since it has been mentioned on mailing
> > lists and IRC recently.
> >
> > These patches can be used for benchmarking and discussion about how to improve
> > block performance.  Paolo Bonzini has also worked in this area and might want
> > to share his patches.
> >
> > The basic approach is:
> > 1. Each virtio-blk device has a thread dedicated to handling ioeventfd
> >    signalling when the guest kicks the virtqueue.
> > 2. Requests are processed without going through the QEMU block layer using
> >    Linux AIO directly.
> > 3. Completion interrupts are injected via ioctl from the dedicated thread.
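
The three steps above can be modeled in a few lines.  This is a toy Python
stand-in, not QEMU code: a queue.Queue plays the role of the virtqueue and its
ioeventfd wakeup, a plain callable plays direct Linux AIO submission, and a
callback plays the interrupt-injecting ioctl.  Every name below
(DataPlaneThread, kick, do_io, notify_guest, run_demo) is hypothetical and
chosen only for illustration.

```python
import queue
import threading


class DataPlaneThread:
    """Toy model of a per-device data-plane thread (hypothetical, not QEMU code)."""

    _STOP = object()  # sentinel that tells the thread to exit

    def __init__(self, do_io, notify_guest):
        self._vring = queue.Queue()        # stand-in for the virtqueue + ioeventfd
        self._do_io = do_io                # stand-in for direct Linux AIO submission
        self._notify_guest = notify_guest  # stand-in for the interrupt-injecting ioctl
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def kick(self, request):
        """Guest side: queue a request and wake the dedicated thread."""
        self._vring.put(request)

    def stop(self):
        """Let queued requests finish, then stop the thread."""
        self._vring.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            request = self._vring.get()          # step 1: block until the guest kicks
            if request is self._STOP:
                break
            result = self._do_io(request)        # step 2: perform the I/O in this thread
            self._notify_guest(request, result)  # step 3: signal completion to the guest


def run_demo():
    """Push two fake requests through the model and return the completions."""
    completed = []
    plane = DataPlaneThread(
        do_io=lambda req: len(req),
        notify_guest=lambda req, res: completed.append((req, res)),
    )
    for payload in (b"abc", b"defgh"):
        plane.kick(payload)
    plane.stop()
    return completed
```

The point of the model is only the control flow: one thread per device owns
the whole request path, so nothing in the loop needs a lock shared with the
rest of the program.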
> >
> > The series also contains request merging as a bdrv_aio_multiwrite() equivalent.
> > This was only to get a comparison against the QEMU block layer and I would drop
> > it for other types of analysis.
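
As a rough illustration of what request merging means here, the sketch below
coalesces requests that are exactly contiguous on disk.  The cover letter does
not describe the series' merging rules, so the representation
(start_sector, num_sectors) and the function name merge_requests are
assumptions made for this example, not the patch's ioscheduler.

```python
def merge_requests(requests):
    """Coalesce requests that are exactly contiguous on disk.

    Each request is a hypothetical (start_sector, num_sectors) tuple.
    Overlapping or non-adjacent requests are left as separate entries.
    """
    merged = []
    for start, count in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == start:
            # The previous request ends where this one begins: extend it.
            prev_start, prev_count = merged[-1]
            merged[-1] = (prev_start, prev_count + count)
        else:
            merged.append((start, count))
    return merged
```

For example, merge_requests([(8, 8), (0, 8), (32, 4)]) yields
[(0, 16), (32, 4)]: the first two requests become one larger submission and
the third stays on its own.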
> >
> > The effect of this series is that O_DIRECT Linux AIO on raw files can bypass
> > the QEMU global mutex and block layer.  This means higher performance.
>
> Do you have any numbers at all?

Yes, we do have a lot of data for this data-plane patch set.  I can send you
detailed charts if you like, but generally, we run into a performance bottleneck
with the existing QEMU due to the QEMU global mutex, and thus could only get
to about 140,000 IOPS for a single guest (at least on my setup).  With this
data-plane patch set, we bypass this bottleneck and have been able to achieve
more than 600,000 IOPS for a single guest, and an aggregate 1.33 million IOPS
with 4 guests on a single host.

Just for reference, VMware has claimed that they could get 300,000 IOPS for a
single VM and 1 million IOPS with 6 VMs on a single vSphere 5.0 host.  So we
definitely need something like this for KVM to be competitive with VMware and
other hypervisors.  Of course, this would also help satisfy the high I/O rate
requirements for big data and other data-intensive applications or benchmarks
running on KVM.
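
To put the figures above side by side, here is a small arithmetic sketch.  It
uses only the numbers quoted in this message; the variable names are made up
for the example, and 600,000 is treated as a lower bound because the result
was reported as "more than 600,000".

```python
# Figures quoted in this message (IOPS).
baseline_single = 140_000          # existing QEMU, one guest
dataplane_single = 600_000         # data-plane patches, one guest (lower bound)
dataplane_aggregate = 1_330_000    # data-plane patches, 4 guests on one host
dataplane_guests = 4
vmware_single = 300_000            # VMware's claim, one VM
vmware_aggregate = 1_000_000       # VMware's claim, 6 VMs on one host
vmware_guests = 6

# Single-guest improvement over the existing QEMU: at least about 4.3x.
speedup = dataplane_single / baseline_single

# Average per-guest rate in the aggregate runs.
dataplane_per_guest = dataplane_aggregate / dataplane_guests  # 332,500 IOPS
vmware_per_guest = vmware_aggregate / vmware_guests           # about 166,667 IOPS

# Single-guest ratio against the VMware claim: at least 2x.
vs_vmware_single = dataplane_single / vmware_single
```

These are simple ratios of reported and claimed numbers from different setups,
so they indicate scale rather than a controlled comparison.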

Thanks,
-Khoa

>
> > A cleaned up version of this approach could be added to QEMU as a raw O_DIRECT
> > Linux AIO fast path.  Image file formats, protocols, and other block layer
> > features are not supported by virtio-blk-data-plane.
> >
> > Git repo:
> > http://repo.or.cz/w/qemu-kvm/stefanha.git/shortlog/refs/heads/virtio-blk-data-plane
> >
> > Stefan Hajnoczi (27):
> >   virtio-blk: Remove virtqueue request handling code
> >   virtio-blk: Set up host notifier for data plane
> >   virtio-blk: Data plane thread event loop
> >   virtio-blk: Map vring
> >   virtio-blk: Do cheapest possible memory mapping
> >   virtio-blk: Take PCI memory range into account
> >   virtio-blk: Put dataplane code into its own directory
> >   virtio-blk: Read requests from the vring
> >   virtio-blk: Add Linux AIO queue
> >   virtio-blk: Stop data plane thread cleanly
> >   virtio-blk: Indirect vring and flush support
> >   virtio-blk: Add workaround for BUG_ON() dependency in virtio_ring.h
> >   virtio-blk: Increase max requests for indirect vring
> >   virtio-blk: Use pthreads instead of qemu-thread
> >   notifier: Add a function to set the notifier
> >   virtio-blk: Kick data plane thread using event notifier set
> >   virtio-blk: Use guest notifier to raise interrupts
> >   virtio-blk: Call ioctl() directly instead of irqfd
> >   virtio-blk: Disable guest->host notifies while processing vring
> >   virtio-blk: Add ioscheduler to detect mergable requests
> >   virtio-blk: Add basic request merging
> >   virtio-blk: Fix request merging
> >   virtio-blk: Stub out SCSI commands
> >   virtio-blk: fix incorrect length
> >   msix: fix irqchip breakage in msix_try_notify_from_thread()
> >   msix: use upstream kvm_irqchip_set_irq()
> >   virtio-blk: add EVENT_IDX support to dataplane
> >
> >  event_notifier.c          |    7 +
> >  event_notifier.h          |    1 +
> >  hw/dataplane/event-poll.h |  116 +++++++
> >  hw/dataplane/ioq.h        |  128 ++++++++
> >  hw/dataplane/iosched.h    |   97 ++++++
> >  hw/dataplane/vring.h      |  334 ++++++++++++++++++++
> >  hw/msix.c                 |   15 +
> >  hw/msix.h                 |    1 +
> >  hw/virtio-blk.c           |  753 +++++++++++++++++++++------------------------
> >  hw/virtio-pci.c           |    8 +
> >  hw/virtio.c               |    9 +
> >  hw/virtio.h               |    3 +
> >  12 files changed, 1074 insertions(+), 398 deletions(-)
> >  create mode 100644 hw/dataplane/event-poll.h
> >  create mode 100644 hw/dataplane/ioq.h
> >  create mode 100644 hw/dataplane/iosched.h
> >  create mode 100644 hw/dataplane/vring.h
> >
> > --
> > 1.7.10.4
>
