From: Auger Eric <eric.auger@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC PATCH] vfio/pci: Add ioeventfd support
Date: Thu, 8 Feb 2018 14:48:13 +0100	[thread overview]
Message-ID: <070da5ee-2895-0350-3bc7-d17eb1f19344@redhat.com> (raw)
In-Reply-To: <20180207095746.14cef887@w520.home>

Hi Alex,

On 07/02/18 17:57, Alex Williamson wrote:
> On Wed, 7 Feb 2018 16:46:19 +0100
> Auger Eric <eric.auger@redhat.com> wrote:
> 
>> Hi Alex,
>>
>> On 07/02/18 01:08, Alex Williamson wrote:
>>> The ioeventfd here is actually irqfd handling of an ioeventfd such as
>>> supported in KVM.  A user is able to pre-program a device write to
>>> occur when the eventfd triggers.  This is yet another instance of
>>> eventfd-irqfd triggering between KVM and vfio.  The impetus for this
>>> is high frequency writes to pages which are virtualized in QEMU.
>>> Enabling this near-direct write path for selected registers within
>>> the virtualized page can improve performance and reduce overhead.
>>> Specifically this is initially targeted at NVIDIA graphics cards where
>>> the driver issues a write to an MMIO register within a virtualized
>>> region in order to allow the MSI interrupt to re-trigger.
>>>
>>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>  
>>
>> FYI, it no longer applies on master (conflict in
>> include/uapi/linux/vfio.h with the GFX changes)
> 
> Sorry, I should have noted that this was against v4.15, I didn't want
> the churn of the merge window since I was benchmarking against this.
> Will update for non-RFC.
> 
> ...
>>> +long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
>>> +			uint64_t data, int count, int fd)
>>> +{
>>> +	struct pci_dev *pdev = vdev->pdev;
>>> +	loff_t pos = offset & VFIO_PCI_OFFSET_MASK;
>>> +	int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset);
>>> +	struct vfio_pci_ioeventfd *ioeventfd;
>>> +	int (*handler)(void *, void *);
>>> +	unsigned long val;
>>> +
>>> +	/* Only support ioeventfds into BARs */
>>> +	if (bar > VFIO_PCI_BAR5_REGION_INDEX)
>>> +		return -EINVAL;
>>> +
>>> +	if (pos + count > pci_resource_len(pdev, bar))
>>> +		return -EINVAL;
>>> +
>>> +	/* Disallow ioeventfds working around MSI-X table writes */
>>> +	if (bar == vdev->msix_bar &&
>>> +	    !(pos + count <= vdev->msix_offset ||
>>> +	      pos >= vdev->msix_offset + vdev->msix_size))
>>> +		return -EINVAL;
>>> +
>>> +	switch (count) {
>>> +	case 1:
>>> +		handler = &vfio_pci_ioeventfd_handler8;
>>> +		val = data;
>>> +		break;
>>> +	case 2:
>>> +		handler = &vfio_pci_ioeventfd_handler16;
>>> +		val = le16_to_cpu(data);
>>> +		break;
>>> +	case 4:
>>> +		handler = &vfio_pci_ioeventfd_handler32;
>>> +		val = le32_to_cpu(data);
>>> +		break;
>>> +#ifdef iowrite64
>>> +	case 8:
>>> +		handler = &vfio_pci_ioeventfd_handler64;
>>> +		val = le64_to_cpu(data);
>>> +		break;
>>> +#endif
>>> +	default:
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	ret = vfio_pci_setup_barmap(vdev, bar);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	mutex_lock(&vdev->ioeventfds_lock);
>>> +
>>> +	list_for_each_entry(ioeventfd, &vdev->ioeventfds_list, next) {
>>> +		if (ioeventfd->pos == pos && ioeventfd->bar == bar &&
>>> +		    ioeventfd->data == data && ioeventfd->count == count) {
>>> +			if (fd == -1) {
>>> +				vfio_virqfd_disable(&ioeventfd->virqfd);
>>> +				list_del(&ioeventfd->next);
>>> +				kfree(ioeventfd);
>>> +				ret = 0;
>>> +			} else
>>> +				ret = -EEXIST;
>>> +
>>> +			goto out_unlock;
>>> +		}
>>> +	}
>>> +
>>> +	if (fd < 0) {
>>> +		ret = -ENODEV;
>>> +		goto out_unlock;
>>> +	}
>>> +
>>> +	ioeventfd = kzalloc(sizeof(*ioeventfd), GFP_KERNEL);
>>> +	if (!ioeventfd) {
>>> +		ret = -ENOMEM;
>>> +		goto out_unlock;
>>> +	}
>>> +
>>> +	ioeventfd->pos = pos;
>>> +	ioeventfd->bar = bar;
>>> +	ioeventfd->data = data;
>>> +	ioeventfd->count = count;
>>> +
>>> +	ret = vfio_virqfd_enable(vdev->barmap[ioeventfd->bar] + ioeventfd->pos,  
>> nit: bar and pos could be used directly
> 
> Indeed, probably leftover from development.  Fixed and re-wrapped the
> following lines.
> 
>>> +				 handler, NULL, (void *)val,
>>> +				 &ioeventfd->virqfd, fd);
>>> +	if (ret) {
>>> +		kfree(ioeventfd);
>>> +		goto out_unlock;
>>> +	}
>>> +
>>> +	list_add(&ioeventfd->next, &vdev->ioeventfds_list);
>>> +
>>> +out_unlock:
>>> +	mutex_unlock(&vdev->ioeventfds_lock);
>>> +
>>> +	return ret;
>>> +}
>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>> index e3301dbd27d4..07966a5f0832 100644
>>> --- a/include/uapi/linux/vfio.h
>>> +++ b/include/uapi/linux/vfio.h
>>> @@ -503,6 +503,30 @@ struct vfio_pci_hot_reset {
>>>  
>>>  #define VFIO_DEVICE_PCI_HOT_RESET	_IO(VFIO_TYPE, VFIO_BASE + 13)
>>>  
>>> +/**
>>> + * VFIO_DEVICE_IOEVENTFD - _IOW(VFIO_TYPE, VFIO_BASE + 14,
>>> + *                              struct vfio_device_ioeventfd)
>>> + *
>>> + * Perform a write to the device at the specified device fd offset, with
>>> + * the specified data and width when the provided eventfd is triggered.  
>> Don't you want to document the limitation that only BARs are
>> supported, and that the MSI-X table is excluded?
> 
> Sure, we could.  This is also just an acceleration interface; it's not
> required for functionality, so a user can probe the capabilities by
> trying to enable an ioeventfd to test for support.  I don't really want
> to add a flag to each region to identify support or create yet another
> sparse map identifying which sections of regions are supported.  We
> could enable this for PCI config space, but I didn't think it really
> made sense (and I was too lazy).  Real PCI config space (not MMIO
> mirrors of config space on GPUs) should never be a performance path,
> therefore I question if there's any ROI for the code to support it.
> Device specific regions would need to be handled on a case by case
> basis, and I don't think we have any cases yet where it makes sense
> (IIRC the IGD regions are read-only).  Of course ROMs are read-only, so
> it doesn't make sense to support them.
> 
> I also note that this patch of course only supports directly assigned
> vfio-pci devices, not vfio-platform and not mdev devices.  Since the
> ioctl is intended to be device agnostic, maybe we could have a default
> handler that simply uses the device file write interface.  At least one
> issue there is that those expect a user buffer.  I'll look into how I
> might add the support more generically, if perhaps less optimized.

OK

> 
> Does it seem like a sufficient user API to try an ioeventfd and be
> prepared for it to fail?  Given that this is a new ioctl, users need to
> do that for backwards compatibility anyway.

Looks OK to me.
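For reference, the try-and-fall-back pattern from userspace might look like the sketch below. The RFC excerpt doesn't show the uapi struct, so the field layout, flag names, and size-flag scheme here are assumptions modeled on the ioctl's header comment, and the final numbers may differ:

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>

#define VFIO_TYPE	(';')
#define VFIO_BASE	100
/* The RFC places the ioctl at VFIO_BASE + 14; the merged number may differ */
#define VFIO_DEVICE_IOEVENTFD	_IO(VFIO_TYPE, VFIO_BASE + 14)

/* Assumed data-size flags, one bit per supported write width */
#define VFIO_DEVICE_IOEVENTFD_8		(1 << 0)
#define VFIO_DEVICE_IOEVENTFD_16	(1 << 1)
#define VFIO_DEVICE_IOEVENTFD_32	(1 << 2)
#define VFIO_DEVICE_IOEVENTFD_64	(1 << 3)

/* Assumed argument layout for VFIO_DEVICE_IOEVENTFD */
struct vfio_device_ioeventfd {
	uint32_t argsz;
	uint32_t flags;
	uint64_t offset;	/* device fd offset of the write */
	uint64_t data;		/* data written when the eventfd fires */
	int32_t  fd;		/* eventfd, or -1 to tear down */
};

/* Map a write width in bytes to the assumed size flag, 0 if unsupported */
static uint32_t vfio_ioeventfd_size_flag(int count)
{
	switch (count) {
	case 1: return VFIO_DEVICE_IOEVENTFD_8;
	case 2: return VFIO_DEVICE_IOEVENTFD_16;
	case 4: return VFIO_DEVICE_IOEVENTFD_32;
	case 8: return VFIO_DEVICE_IOEVENTFD_64;
	default: return 0;
	}
}

/*
 * Try to enable an ioeventfd; on any failure the caller just keeps the
 * slow path (trapping the write in QEMU), so no capability flag or
 * sparse map of supported sub-regions is needed.
 */
static int vfio_try_ioeventfd(int device_fd, uint64_t offset,
			      uint64_t data, int count, int eventfd)
{
	struct vfio_device_ioeventfd ie = {
		.argsz	= sizeof(ie),
		.flags	= vfio_ioeventfd_size_flag(count),
		.offset	= offset,
		.data	= data,
		.fd	= eventfd,
	};

	if (!ie.flags)
		return -EINVAL;
	if (ioctl(device_fd, VFIO_DEVICE_IOEVENTFD, &ie))
		return -errno;	/* e.g. old kernel: fall back to trapping */
	return 0;
}
```

Since a failed ioctl leaves the device fully functional through the trapped path, probing this way costs nothing.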
> 
> Something I forgot to mention in my rush to send this out is that I
> debated whether to create a new ioctl or re-use the SET_IRQS ioctl.  In
> the end, I couldn't resolve how to handle the index, start, and count
> fields, so I created this new ioctl.  Would we create an ioeventfd
> index?  We couldn't simply pass an array of ioeventfds since each needs
> to be associated with an offset, data, and size, so we'd need a new
> data type with a structure encompassing those fields.  In the end it
> obviously didn't make sense to me to re-use SET_IRQS, but if others can
> come up with something that makes sense, I'm open to it.  Thanks,

As far as I am concerned, I prefer this API to the SET_IRQS one ;-)
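To illustrate why re-using SET_IRQS is awkward: a plain eventfd array cannot work because each entry needs its own offset, data, and width, so the interface would have required a new per-element payload type, something like the purely hypothetical sketch below (this type does not exist in the uapi), and the index/start/count fields still wouldn't map onto anything meaningful:

```c
#include <stdint.h>

/*
 * Hypothetical per-eventfd payload a SET_IRQS-based interface would need
 * (illustration only, not a real uapi structure): a new eventfd data
 * type whose elements each carry the write parameters alongside the fd.
 */
struct vfio_irq_set_ioeventfd {
	uint64_t offset;	/* device fd offset of the write */
	uint64_t data;		/* value to write on trigger */
	uint32_t size;		/* write width in bytes: 1, 2, 4, or 8 */
	int32_t  fd;		/* eventfd for this entry */
};
```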


Thanks

Eric
> 
> Alex
> 

