From: Peter Xu <peterx@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC PATCH] vfio/pci: Add ioeventfd support
Date: Sun, 11 Feb 2018 11:09:36 +0800	[thread overview]
Message-ID: <20180211030936.GH2783@xz-mi> (raw)
In-Reply-To: <20180209144541.059554bf@w520.home>

On Fri, Feb 09, 2018 at 02:45:41PM -0700, Alex Williamson wrote:
> On Fri, 9 Feb 2018 15:05:11 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Tue, Feb 06, 2018 at 05:08:14PM -0700, Alex Williamson wrote:
> > 
> > [...]
> > 
> > > +long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
> > > +			uint64_t data, int count, int fd)
> > > +{
> > > +	struct pci_dev *pdev = vdev->pdev;
> > > +	loff_t pos = offset & VFIO_PCI_OFFSET_MASK;
> > > +	int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset);
> > > +	struct vfio_pci_ioeventfd *ioeventfd;
> > > +	int (*handler)(void *, void *);
> > > +	unsigned long val;
> > > +
> > > +	/* Only support ioeventfds into BARs */
> > > +	if (bar > VFIO_PCI_BAR5_REGION_INDEX)
> > > +		return -EINVAL;
> > > +
> > > +	if (pos + count > pci_resource_len(pdev, bar))
> > > +		return -EINVAL;
> > > +
> > > +	/* Disallow ioeventfds working around MSI-X table writes */
> > > +	if (bar == vdev->msix_bar &&
> > > +	    !(pos + count <= vdev->msix_offset ||
> > > +	      pos >= vdev->msix_offset + vdev->msix_size))
> > > +		return -EINVAL;
> > > +
> > > +	switch (count) {
> > > +	case 1:
> > > +		handler = &vfio_pci_ioeventfd_handler8;
> > > +		val = data;
> > > +		break;
> > > +	case 2:
> > > +		handler = &vfio_pci_ioeventfd_handler16;
> > > +		val = le16_to_cpu(data);
> > > +		break;
> > > +	case 4:
> > > +		handler = &vfio_pci_ioeventfd_handler32;
> > > +		val = le32_to_cpu(data);
> > > +		break;
> > > +#ifdef iowrite64
> > > +	case 8:
> > > +		handler = &vfio_pci_ioeventfd_handler64;
> > > +		val = le64_to_cpu(data);
> > > +		break;
> > > +#endif
> > > +	default:
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	ret = vfio_pci_setup_barmap(vdev, bar);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	mutex_lock(&vdev->ioeventfds_lock);
> > > +
> > > +	list_for_each_entry(ioeventfd, &vdev->ioeventfds_list, next) {
> > > +		if (ioeventfd->pos == pos && ioeventfd->bar == bar &&
> > > +		    ioeventfd->data == data && ioeventfd->count == count) {
> > > +			if (fd == -1) {
> > > +				vfio_virqfd_disable(&ioeventfd->virqfd);
> > > +				list_del(&ioeventfd->next);
> > > +				kfree(ioeventfd);
> > > +				ret = 0;
> > > +			} else
> > > +				ret = -EEXIST;
> > > +
> > > +			goto out_unlock;
> > > +		}
> > > +	}
> > > +
> > > +	if (fd < 0) {
> > > +		ret = -ENODEV;
> > > +		goto out_unlock;
> > > +	}
> > > +
> > > +	ioeventfd = kzalloc(sizeof(*ioeventfd), GFP_KERNEL);
> > > +	if (!ioeventfd) {
> > > +		ret = -ENOMEM;
> > > +		goto out_unlock;
> > > +	}
> > > +
> > > +	ioeventfd->pos = pos;
> > > +	ioeventfd->bar = bar;
> > > +	ioeventfd->data = data;
> > > +	ioeventfd->count = count;
> > > +
> > > +	ret = vfio_virqfd_enable(vdev->barmap[ioeventfd->bar] + ioeventfd->pos,
> > > +				 handler, NULL, (void *)val,
> > > +				 &ioeventfd->virqfd, fd);
> > > +	if (ret) {
> > > +		kfree(ioeventfd);
> > > +		goto out_unlock;
> > > +	}
> > > +
> > > +	list_add(&ioeventfd->next, &vdev->ioeventfds_list);  
> > 
> > Is there a limit on how many ioeventfds can be created?
> > 
> > IIUC we'll create this eventfd "automatically" once an MMIO addr/data
> > pair is triggered continuously for N=10 times, so would it be safer to
> > have a limit on the maximum number of eventfds?  Otherwise I'm not sure
> > whether a malicious guest could consume host memory by sending:
> > 
> > - addr1/data1, 10 times
> > - addr2/data2, 10 times
> > - ...
> > 
> > to create an unlimited number of ioeventfds.  Thanks,
> 
> Good question.  It is somewhat exploitable in the guest the way it's
> written; however, a user process does have an open file limit and each
> eventfd consumes a file handle, so unless someone is running QEMU with
> unlimited file handles, there is a built-in limit.  Two problems remain
> though:
> 
> First, is it still a bad idea that a user within a guest can target
> this device page to consume all of the QEMU process' open file handles,
> even if ultimately they're only harming themselves?  What would a
> reasonable cap of file descriptors for this purpose be?  How would we
> know which are actively used and which could be expired?  Currently
> only 2 are registered, the MSI-ACK address and some unknown secondary
> one that's low frequency, but enough to trigger the algorithm here (and
> doesn't seem harmful to let it get enabled).  We could therefore
> arbitrarily pick 5 as an upper limit here, maybe with a warning if the
> code hits that limit.
> 
> Second, is there still an exploit in the proposed vfio interface in
> that a user could re-use a single file descriptor for multiple vfio
> ioeventfds?  I don't know.  I thought about looking to see whether a
> file descriptor is re-used, but then I wondered if that might actually
> be kind of a neat and potentially useful interface that a single
> eventfd could trigger multiple device writes.

Do you know of any possible scenarios for this?  It sounds like an
interesting idea.
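
Just so I'm sure I understand it, a rough userspace sketch of the idea,
assuming the VFIO_DEVICE_IOEVENTFD ioctl and struct vfio_device_ioeventfd
proposed in this series; the BAR0 offsets 0x10/0x20 and the helper name
are made up purely for illustration:

  #include <sys/eventfd.h>
  #include <sys/ioctl.h>
  #include <linux/vfio.h>

  /* device_fd: open vfio device fd; bar0_offset: BAR0 region offset as
   * reported by VFIO_DEVICE_GET_REGION_INFO. */
  static int setup_shared_ioeventfd(int device_fd, __u64 bar0_offset)
  {
          int efd = eventfd(0, EFD_CLOEXEC);      /* a single eventfd */
          struct vfio_device_ioeventfd ie = {
                  .argsz = sizeof(ie),
                  .flags = VFIO_DEVICE_IOEVENTFD_32,  /* 4-byte writes */
                  .data  = 0x1,
                  .fd    = efd,
          };

          if (efd < 0)
                  return -1;

          /* Register the same eventfd at two different BAR0 offsets */
          ie.offset = bar0_offset + 0x10;
          if (ioctl(device_fd, VFIO_DEVICE_IOEVENTFD, &ie))
                  return -1;

          ie.offset = bar0_offset + 0x20;
          if (ioctl(device_fd, VFIO_DEVICE_IOEVENTFD, &ie))
                  return -1;

          /* One kick of efd would then trigger both device writes */
          return efd;
  }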

> It looks like KVM
> arbitrarily sets a limit of 1000 iobus devices, where each ioeventfd
> would be such a device.  Perhaps we take the same approach and pick an
> arbitrarily high upper limit per vfio device.  What's your opinion?

I did a "ulimit -n" and I got 1024 as default.  So even if someone
granted the QEMU process with more/unlimited file handles, we'll still
have a hard limit of ~1000 vfio-ioeventfds, which sounds safe.

But I agree that we could still have an even smaller limit for vfio,
especially since we know roughly how many we'll need in most cases.
I'll follow your choice of number, since I don't have a good suggestion
myself.  And if we do impose a limit, I'm wondering whether it would be
good to have a WARN_ONCE when that limit is reached.
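
For example, a minimal sketch of what I had in mind, on top of
vfio_pci_ioeventfd() above (the limit value, the ioeventfds_nr counter,
and the -ENOSPC return are only placeholders, not a concrete proposal):

  /* Hypothetical per-device cap; name and value are placeholders */
  #define VFIO_PCI_IOEVENTFD_MAX	1000

  	/* In vfio_pci_ioeventfd(), after the list walk finds no existing
  	 * entry and we know fd >= 0, before allocating the new one: */
  	if (vdev->ioeventfds_nr >= VFIO_PCI_IOEVENTFD_MAX) {
  		WARN_ONCE(1, "ioeventfd limit %d reached for vfio device\n",
  			  VFIO_PCI_IOEVENTFD_MAX);
  		ret = -ENOSPC;
  		goto out_unlock;
  	}

  	/* ...and on the success path: */
  	vdev->ioeventfds_nr++;
  	list_add(&ioeventfd->next, &vdev->ioeventfds_list);

  	/* (with a matching vdev->ioeventfds_nr-- next to the kfree() on
  	 *  the unregister path) */

Though I'm not sure whether a WARN_ONCE is appropriate for something a
guest can trigger at will; maybe a one-off dev_info() would be enough.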

Thanks,

-- 
Peter Xu

Thread overview:
2018-02-07  0:08 [RFC PATCH] vfio/pci: Add ioeventfd support Alex Williamson
2018-02-07  4:09 ` Alexey Kardashevskiy
2018-02-07  4:25   ` Alex Williamson
2018-02-07  4:48     ` Alexey Kardashevskiy
2018-02-07 14:12       ` Alex Williamson
2018-02-08  1:22         ` Alexey Kardashevskiy
2018-03-13 12:38           ` Auger Eric
2018-03-15 21:23             ` Alex Williamson
2018-02-07 15:46 ` Auger Eric
2018-02-07 16:57   ` Alex Williamson
2018-02-08 13:48     ` Auger Eric
2018-02-09  7:05 ` Peter Xu
2018-02-09 21:45   ` Alex Williamson
2018-02-11  3:09     ` Peter Xu [this message]
