From: Peter Xu
To: Alex Williamson
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org
Subject: Re: [Qemu-devel] [RFC PATCH 5/5] vfio/quirks: Enable ioeventfd quirks to be handled by vfio directly
Date: Fri, 9 Feb 2018 15:11:45 +0800
Message-ID: <20180209071145.GE2783@xz-mi>
In-Reply-To: <20180207002646.1156.37051.stgit@gimli.home>
References: <20180207001615.1156.10547.stgit@gimli.home>
 <20180207002646.1156.37051.stgit@gimli.home>

On Tue, Feb 06, 2018 at 05:26:46PM -0700, Alex Williamson wrote:
> With vfio ioeventfd support, we can program vfio-pci to perform a
> specified BAR write when an eventfd is triggered.  This allows the
> KVM ioeventfd to be wired directly to vfio-pci, entirely avoiding
> userspace handling for these events.  On the same micro-benchmark
> where the ioeventfd got us to almost 90% of performance versus
> disabling the GeForce quirks, this gets us to within 95%.
> 
> Signed-off-by: Alex Williamson
> ---
>  hw/vfio/pci-quirks.c |   42 ++++++++++++++++++++++++++++++++++++------
>  1 file changed, 36 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> index e739efe601b1..35a4d5197e2d 100644
> --- a/hw/vfio/pci-quirks.c
> +++ b/hw/vfio/pci-quirks.c
> @@ -16,6 +16,7 @@
>  #include "qemu/range.h"
>  #include "qapi/error.h"
>  #include "qapi/visitor.h"
> +#include <sys/ioctl.h>
>  #include "hw/nvram/fw_cfg.h"
>  #include "pci.h"
>  #include "trace.h"
> @@ -287,13 +288,27 @@ static VFIOQuirk *vfio_quirk_alloc(int nr_mem)
>      return quirk;
>  }
>  
> -static void vfio_ioeventfd_exit(VFIOIOEventFD *ioeventfd)
> +static void vfio_ioeventfd_exit(VFIOPCIDevice *vdev, VFIOIOEventFD *ioeventfd)
>  {
> +    struct vfio_device_ioeventfd vfio_ioeventfd;
> +
>      QLIST_REMOVE(ioeventfd, next);
> +
>      memory_region_del_eventfd(ioeventfd->mr, ioeventfd->addr, ioeventfd->size,
>                                ioeventfd->match_data, ioeventfd->data,
>                                &ioeventfd->e);
> +
>      qemu_set_fd_handler(event_notifier_get_fd(&ioeventfd->e), NULL, NULL, NULL);
> +
> +    vfio_ioeventfd.argsz = sizeof(vfio_ioeventfd);
> +    vfio_ioeventfd.flags = ioeventfd->size;
> +    vfio_ioeventfd.data = ioeventfd->data;
> +    vfio_ioeventfd.offset = ioeventfd->region->fd_offset +
> +                            ioeventfd->region_addr;
> +    vfio_ioeventfd.fd = -1;
> +
> +    ioctl(vdev->vbasedev.fd, VFIO_DEVICE_IOEVENTFD, &vfio_ioeventfd);
> +
>      event_notifier_cleanup(&ioeventfd->e);
>      g_free(ioeventfd);
>  }
> @@ -315,6 +330,8 @@ static VFIOIOEventFD *vfio_ioeventfd_init(VFIOPCIDevice *vdev,
>                                            hwaddr region_addr)
>  {
>      VFIOIOEventFD *ioeventfd = g_malloc0(sizeof(*ioeventfd));
> +    struct vfio_device_ioeventfd vfio_ioeventfd;
> +    char vfio_enabled = '+';
>  
>      if (event_notifier_init(&ioeventfd->e, 0)) {
>          g_free(ioeventfd);
> @@ -329,15 +346,28 @@ static VFIOIOEventFD *vfio_ioeventfd_init(VFIOPCIDevice *vdev,
>      ioeventfd->region = region;
>      ioeventfd->region_addr = region_addr;
>  
> -    qemu_set_fd_handler(event_notifier_get_fd(&ioeventfd->e),
> -                        vfio_ioeventfd_handler, NULL, ioeventfd);
> +    vfio_ioeventfd.argsz = sizeof(vfio_ioeventfd);
> +    vfio_ioeventfd.flags = ioeventfd->size;
> +    vfio_ioeventfd.data = ioeventfd->data;
> +    vfio_ioeventfd.offset = ioeventfd->region->fd_offset +
> +                            ioeventfd->region_addr;
> +
> +    vfio_ioeventfd.fd = event_notifier_get_fd(&ioeventfd->e);
> +
> +    if (ioctl(vdev->vbasedev.fd,
> +              VFIO_DEVICE_IOEVENTFD, &vfio_ioeventfd) != 0) {
> +        qemu_set_fd_handler(event_notifier_get_fd(&ioeventfd->e),
> +                            vfio_ioeventfd_handler, NULL, ioeventfd);
> +        vfio_enabled = '-';

Would the performance be even slower when a new QEMU runs on an old
kernel, due to these ioeventfds (MMIO -> eventfd -> same MMIO again)?

If so, shall we enable this ioeventfd enhancement only if we detect
that the kernel supports this new feature (assuming the feature bit
won't change after the VM starts)?  Then, could we avoid the slow
path (qemu_set_fd_handler) entirely?

Thanks,

-- 
Peter Xu
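
A minimal sketch of the kernel-support probe Peter suggests, written
against this RFC's API where the access size is carried in the flags
field.  vfio_probe_ioeventfd() is a hypothetical helper, not part of
the posted patch; it assumes QEMU's hw/vfio context (VFIOPCIDevice,
<sys/ioctl.h>, <errno.h>) and that an old kernel rejects the unknown
ioctl with ENOTTY, while a supporting kernel fails the fd = -1
unregister form with some other errno:

/*
 * Hypothetical probe, not in the posted patch: try the fd = -1
 * (unregister) form of VFIO_DEVICE_IOEVENTFD once at device setup.
 * A kernel without the ioctl fails with ENOTTY; any other outcome
 * means the ioctl exists and the vfio fast path is available.
 */
static bool vfio_probe_ioeventfd(VFIOPCIDevice *vdev)
{
    struct vfio_device_ioeventfd probe = {
        .argsz  = sizeof(probe),
        .flags  = 4,   /* access size in bytes, as this RFC encodes it */
        .data   = 0,
        .offset = 0,
        .fd     = -1,  /* unregister form: no lasting side effects */
    };

    if (ioctl(vdev->vbasedev.fd, VFIO_DEVICE_IOEVENTFD, &probe) == 0) {
        return true;
    }

    return errno != ENOTTY;
}

With a one-time check like this at realize time, QEMU could skip both
the KVM ioeventfd and the userspace fallback on old kernels, keeping a
trapped write to a single MMIO exit instead of the MMIO -> eventfd ->
same-MMIO round trip described above.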