All of lore.kernel.org
 help / color / mirror / Atom feed
* [KVM PATCH v10] kvm: add support for irqfd
@ 2009-05-20 14:30 Gregory Haskins
  2009-05-20 14:35 ` Avi Kivity
                   ` (4 more replies)
  0 siblings, 5 replies; 28+ messages in thread
From: Gregory Haskins @ 2009-05-20 14:30 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi, davidel, mtosatti

(Applies to kvm.git/queue:fd2e987d)

KVM provides a complete virtual system environment for guests, including
support for injecting interrupts modeled after the real exception/interrupt
facilities present on the native platform (such as the IDT on x86).
Virtual interrupts can come from a variety of sources (emulated devices,
pass-through devices, etc) but all must be injected to the guest via
the KVM infrastructure.  This patch adds a new mechanism to inject a specific
interrupt to a guest using a decoupled eventfd mechnanism:  Any legal signal
on the irqfd (using eventfd semantics from either userspace or kernel) will
translate into an injected interrupt in the guest at the next available
interrupt window.

This has been unit-tested with an updated version of my test harness, which
you can download here:

ftp://ftp.novell.com/dev/ghaskins/kvm-eventfd.tar.bz2

The test verifies both assign and deassign paths, and they appear to work
as intended.

The cooresponding userspace patches from v8 are unchanged, which you can find
here:

http://www.mail-archive.com/kvm@vger.kernel.org/msg14913.html

[ Changelog:

   v10:
	*) Fixed formatting/consistency issue in irqfd_remove
	*) Fixed return value error in deassign
	*) Fixed grammatical errors in comments
	*) Rebased to kvm.git/queue branch

   v9:
        *) Fixed a bug in deassign where we could deadlock with the way
           flush_work was being used (Thanks to Marcelo Tosatti's for spotting
           this bug).
        *) Rebased to kvm.git:2ffc3882

   v8:
	*) Re-seperated irqfd and iofd (now called iosignalfd) into two 
  	   distinct series.
        *) We compare both the fd/file and gsi on deassign
        *) De-assign is exhaustive (to support multiple associations in the
           future)
        *) s/KVM_CAP_EVENTFD/KVM_CAP_IRQFD

   v7:
        *) Added "iofd" to allow PIO/MMIO writes to generate an eventfd
           signal.  This was previously discussed as "hypercallfd", but
           since explicit hypercalls are not looking to be very popular,
           and based on the fact that they were not going to carry payload
           anyway, I named them "iofd".
        *) Generalized some of the code so that irqfd and iofd could be
           logically grouped together.  For instance
           s/KVM_CAP_IRQFD/KVM_CAP_EVENTFD and virt/kvm/irqfd.c becomes
	   virt/kvm/eventfd.c
        *) Added support for "deassign" operations to ensure we can properly
           support hot-unplug.
	*) Reinstated the eventfd EXPORT_SYMBOL patch since we need it again
           for supporting iofd.
        *) Rebased to kvm.git:b5e725fa

   v6:
        *) Moved eventfd creation back to userspace, per Avi's request
        *) Dropped no longer necessary supporting patches from series
        *) Rebased to kvm.git:833367b57

   v5:
        *) Added padding to the ioctl structure
        *) Added proper ref-count increment to the file before returning
           success. (Needs review by Al Viro, Davide Libenzi)
	*) Cleaned up error-handling path to make sure we remove ourself
	   from the waitq if necessary.
        *) Make sure we only add ourselves to kvm->irqfds if successful
           creating the irqfd in the first place.
	*) Rebased to kvm.git:66b0aed4

   v4:
        *) Changed allocation model to create the new fd last, after
           we get past the last potential error point by using Davide's
           new eventfd_file_create interface (Al Viro, Davide Libenzi)
	*) We no longer export sys_eventfd2() since it is replaced
           functionally with eventfd_file_create();
        *) Rebased to kvm.git:7da2e3ba

   v3:
        *) The kernel now allocates the eventfd (need to export sys_eventfd2)
        *) Added a flags field for future expansion to kvm_irqfd()
        *) We properly toggle the irq level 1+0.
        *) We re-use the USERSPACE_SRC_ID instead of creating our own
	*) Properly check for failures establishing a poll-table with eventfd
	*) Fixed fd/file leaks on failure
	*) Rebased to lateste kvm.git::41b76d8d04

   v2:
	*) Dropped notifier_chain based callbacks in favor of
	   wait_queue_t::func and file::poll based callbacks (Thanks to
	   Davide for the suggestion)

   v1:
        *) Initial release

]

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

 arch/x86/kvm/Makefile    |    2 
 arch/x86/kvm/x86.c       |    1 
 include/linux/kvm.h      |   11 ++
 include/linux/kvm_host.h |    4 +
 virt/kvm/eventfd.c       |  228 ++++++++++++++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c      |   11 ++
 6 files changed, 256 insertions(+), 1 deletions(-)
 create mode 100644 virt/kvm/eventfd.c

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index bee9512..01e3c61 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -2,7 +2,7 @@
 EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
 
 kvm-y			+= $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
-				coalesced_mmio.o irq_comm.o)
+				coalesced_mmio.o irq_comm.o eventfd.o)
 kvm-$(CONFIG_KVM_TRACE)	+= $(addprefix ../../../virt/kvm/, kvm_trace.o)
 kvm-$(CONFIG_IOMMU_API)	+= $(addprefix ../../../virt/kvm/, iommu.o)
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7978d32..98c2434 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1084,6 +1084,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_REINJECT_CONTROL:
 	case KVM_CAP_IRQ_INJECT_STATUS:
 	case KVM_CAP_ASSIGN_DEV_IRQ:
+	case KVM_CAP_IRQFD:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 7b17141..8f53f24 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -418,6 +418,7 @@ struct kvm_trace_rec {
 #ifdef __KVM_HAVE_MCE
 #define KVM_CAP_MCE 31
 #endif
+#define KVM_CAP_IRQFD 32
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -470,6 +471,15 @@ struct kvm_x86_mce {
 };
 #endif
 
+#define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
+
+struct kvm_irqfd {
+	__u32 fd;
+	__u32 gsi;
+	__u32 flags;
+	__u8  pad[20];
+};
+
 /*
  * ioctls for VM fds
  */
@@ -514,6 +524,7 @@ struct kvm_x86_mce {
 #define KVM_ASSIGN_SET_MSIX_ENTRY \
 			_IOW(KVMIO, 0x74, struct kvm_assigned_msix_entry)
 #define KVM_DEASSIGN_DEV_IRQ       _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
+#define KVM_IRQFD                  _IOW(KVMIO, 0x76, struct kvm_irqfd)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8f410d3..3b6caf5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -134,6 +134,7 @@ struct kvm {
 	struct list_head vm_list;
 	struct kvm_io_bus mmio_bus;
 	struct kvm_io_bus pio_bus;
+	struct list_head irqfds;
 	struct kvm_vm_stat stat;
 	struct kvm_arch arch;
 	atomic_t users_count;
@@ -528,4 +529,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
 
 #endif
 
+int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
+void kvm_irqfd_release(struct kvm *kvm);
+
 #endif
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
new file mode 100644
index 0000000..72a282e
--- /dev/null
+++ b/virt/kvm/eventfd.c
@@ -0,0 +1,228 @@
+/*
+ * kvm eventfd support - use eventfd objects to signal various KVM events
+ *
+ * Copyright 2009 Novell.  All Rights Reserved.
+ *
+ * Author:
+ *	Gregory Haskins <ghaskins@novell.com>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/workqueue.h>
+#include <linux/syscalls.h>
+#include <linux/wait.h>
+#include <linux/poll.h>
+#include <linux/file.h>
+#include <linux/list.h>
+
+/*
+ * --------------------------------------------------------------------
+ * irqfd: Allows an fd to be used to inject an interrupt to the guest
+ *
+ * Credit goes to Avi Kivity for the original idea.
+ * --------------------------------------------------------------------
+ */
+struct _irqfd {
+	struct kvm               *kvm;
+	int                       gsi;
+	struct file              *file;
+	struct list_head          list;
+	poll_table                pt;
+	wait_queue_head_t        *wqh;
+	wait_queue_t              wait;
+	struct work_struct        work;
+};
+
+static void
+irqfd_inject(struct work_struct *work)
+{
+	struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
+	struct kvm *kvm = irqfd->kvm;
+
+	mutex_lock(&kvm->lock);
+	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
+	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
+	mutex_unlock(&kvm->lock);
+}
+
+static int
+irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
+{
+	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
+
+	/*
+	 * The wake_up is called with interrupts disabled.  Therefore we need
+	 * to defer the IRQ injection until later since we need to acquire the
+	 * kvm->lock to do so.
+	 */
+	schedule_work(&irqfd->work);
+
+	return 0;
+}
+
+static void
+irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
+			poll_table *pt)
+{
+	struct _irqfd *irqfd = container_of(pt, struct _irqfd, pt);
+
+	irqfd->wqh = wqh;
+	add_wait_queue(wqh, &irqfd->wait);
+}
+
+static int
+kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi)
+{
+	struct _irqfd *irqfd;
+	struct file *file = NULL;
+	int ret;
+
+	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
+	if (!irqfd)
+		return -ENOMEM;
+
+	irqfd->kvm = kvm;
+	irqfd->gsi = gsi;
+	INIT_LIST_HEAD(&irqfd->list);
+	INIT_WORK(&irqfd->work, irqfd_inject);
+
+	/*
+	 * Embed the file* lifetime in the irqfd.
+	 */
+	file = fget(fd);
+	if (IS_ERR(file)) {
+		ret = PTR_ERR(file);
+		goto fail;
+	}
+
+	/*
+	 * Install our own custom wake-up handling so we are notified via
+	 * a callback whenever someone signals the underlying eventfd
+	 */
+	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
+	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
+
+	ret = file->f_op->poll(file, &irqfd->pt);
+	if (ret < 0)
+		goto fail;
+
+	irqfd->file = file;
+
+	mutex_lock(&kvm->lock);
+	list_add_tail(&irqfd->list, &kvm->irqfds);
+	mutex_unlock(&kvm->lock);
+
+	return 0;
+
+fail:
+	if (irqfd->wqh)
+		remove_wait_queue(irqfd->wqh, &irqfd->wait);
+
+	if (file && !IS_ERR(file))
+		fput(file);
+
+	kfree(irqfd);
+	return ret;
+}
+
+static void
+irqfd_release(struct _irqfd *irqfd)
+{
+	/*
+	 * The ordering is important.  We must remove ourselves from the wqh
+	 * first to ensure no more event callbacks are issued, and then flush
+	 * any previously scheduled work prior to freeing the memory
+	 */
+	remove_wait_queue(irqfd->wqh, &irqfd->wait);
+
+	flush_work(&irqfd->work);
+
+	fput(irqfd->file);
+	kfree(irqfd);
+}
+
+static struct _irqfd *
+irqfd_remove(struct kvm *kvm, struct file *file, int gsi)
+{
+	struct _irqfd *irqfd;
+
+	mutex_lock(&kvm->lock);
+
+	/*
+	 * linear search isn't brilliant, but this should be an infrequent
+	 * slow-path operation, and the list should not grow very large
+	 */
+	list_for_each_entry(irqfd, &kvm->irqfds, list) {
+		if (irqfd->file != file || irqfd->gsi != gsi)
+			continue;
+
+		list_del(&irqfd->list);
+		mutex_unlock(&kvm->lock);
+
+		return irqfd;
+	}
+
+	mutex_unlock(&kvm->lock);
+
+	return NULL;
+}
+
+static int
+kvm_deassign_irqfd(struct kvm *kvm, int fd, int gsi)
+{
+	struct _irqfd *irqfd;
+	struct file *file;
+	int count = 0;
+
+	file = fget(fd);
+	if (IS_ERR(file))
+		return PTR_ERR(file);
+
+	while ((irqfd = irqfd_remove(kvm, file, gsi))) {
+		/*
+		 * We remove the item from the list under the lock, but we
+		 * free it outside the lock to avoid deadlocking with the
+		 * flush_work and the work_item taking the lock
+		 */
+		irqfd_release(irqfd);
+		count++;
+	}
+
+	fput(file);
+
+	return count ? count : -ENOENT;
+}
+
+int
+kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
+{
+	if (flags & KVM_IRQFD_FLAG_DEASSIGN)
+		return kvm_deassign_irqfd(kvm, fd, gsi);
+
+	return kvm_assign_irqfd(kvm, fd, gsi);
+}
+
+void
+kvm_irqfd_release(struct kvm *kvm)
+{
+	struct _irqfd *irqfd, *tmp;
+
+	/* don't bother with the lock..we are shutting down */
+	list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds, list) {
+		list_del(&irqfd->list);
+		irqfd_release(irqfd);
+	}
+}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bebfe59..b58837d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -983,6 +983,7 @@ static struct kvm *kvm_create_vm(void)
 	atomic_inc(&kvm->mm->mm_count);
 	spin_lock_init(&kvm->mmu_lock);
 	kvm_io_bus_init(&kvm->pio_bus);
+	INIT_LIST_HEAD(&kvm->irqfds);
 	mutex_init(&kvm->lock);
 	kvm_io_bus_init(&kvm->mmio_bus);
 	init_rwsem(&kvm->slots_lock);
@@ -1034,6 +1035,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	spin_lock(&kvm_lock);
 	list_del(&kvm->vm_list);
 	spin_unlock(&kvm_lock);
+	kvm_irqfd_release(kvm);
 	kvm_free_irq_routing(kvm);
 	kvm_io_bus_destroy(&kvm->pio_bus);
 	kvm_io_bus_destroy(&kvm->mmio_bus);
@@ -2210,6 +2212,15 @@ static long kvm_vm_ioctl(struct file *filp,
 	}
 #endif
 #endif /* KVM_CAP_IRQ_ROUTING */
+	case KVM_IRQFD: {
+		struct kvm_irqfd data;
+
+		r = -EFAULT;
+		if (copy_from_user(&data, argp, sizeof data))
+			goto out;
+		r = kvm_irqfd(kvm, data.fd, data.gsi, data.flags);
+		break;
+	}
 	default:
 		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
 	}


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-05-20 14:30 [KVM PATCH v10] kvm: add support for irqfd Gregory Haskins
@ 2009-05-20 14:35 ` Avi Kivity
  2009-05-26 16:42 ` Michael S. Tsirkin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 28+ messages in thread
From: Avi Kivity @ 2009-05-20 14:35 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, davidel, mtosatti

Gregory Haskins wrote:
> (Applies to kvm.git/queue:fd2e987d)
>
> KVM provides a complete virtual system environment for guests, including
> support for injecting interrupts modeled after the real exception/interrupt
> facilities present on the native platform (such as the IDT on x86).
> Virtual interrupts can come from a variety of sources (emulated devices,
> pass-through devices, etc) but all must be injected to the guest via
> the KVM infrastructure.  This patch adds a new mechanism to inject a specific
> interrupt to a guest using a decoupled eventfd mechnanism:  Any legal signal
> on the irqfd (using eventfd semantics from either userspace or kernel) will
> translate into an injected interrupt in the guest at the next available
> interrupt window.
>   

Applied, thanks.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-05-20 14:30 [KVM PATCH v10] kvm: add support for irqfd Gregory Haskins
  2009-05-20 14:35 ` Avi Kivity
@ 2009-05-26 16:42 ` Michael S. Tsirkin
  2009-05-26 18:05   ` Gregory Haskins
  2009-05-26 20:00   ` Davide Libenzi
  2009-05-27 13:55 ` Michael S. Tsirkin
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 28+ messages in thread
From: Michael S. Tsirkin @ 2009-05-26 16:42 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, avi, davidel, mtosatti

On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:
> +static int
> +irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
> +{
> +	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
> +
> +	/*
> +	 * The wake_up is called with interrupts disabled.  Therefore we need
> +	 * to defer the IRQ injection until later since we need to acquire the
> +	 * kvm->lock to do so.
> +	 */
> +	schedule_work(&irqfd->work);
> +
> +	return 0;
> +}

This schedule_work is there just to work around the spinlock
in eventfd_signal, which we don't really need. Isn't this right?
And this is on each interrupt. Seems like a pity.
How about a flag in eventfd that would
convert it from waking up someone to a plain function call?

Davide, could we add something like


diff --git a/fs/eventfd.c b/fs/eventfd.c
index 2a701d5..8bfa308 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -29,6 +29,7 @@ struct eventfd_ctx {
 	 */
 	__u64 count;
 	unsigned int flags;
+	int nolock;
 };
 
 /*
@@ -46,6 +47,12 @@ int eventfd_signal(struct file *file, int n)
 
 	if (n < 0)
 		return -EINVAL;
+	if (ctx->nolock) {
+               /* Whoever set nolock
+                  better set wqh.func as well. */
+		ctx->wqh.func(&ctx->wqh, 0, 0, NULL);
+		return 0;
+	}
 	spin_lock_irqsave(&ctx->wqh.lock, flags);
 	if (ULLONG_MAX - ctx->count < n)
 		n = (int) (ULLONG_MAX - ctx->count);

-- 
MST

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-05-26 16:42 ` Michael S. Tsirkin
@ 2009-05-26 18:05   ` Gregory Haskins
  2009-05-26 20:00   ` Davide Libenzi
  1 sibling, 0 replies; 28+ messages in thread
From: Gregory Haskins @ 2009-05-26 18:05 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, linux-kernel, avi, davidel, mtosatti

[-- Attachment #1: Type: text/plain, Size: 3066 bytes --]

Michael S. Tsirkin wrote:
> On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:
>   
>> +static int
>> +irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
>> +{
>> +	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
>> +
>> +	/*
>> +	 * The wake_up is called with interrupts disabled.  Therefore we need
>> +	 * to defer the IRQ injection until later since we need to acquire the
>> +	 * kvm->lock to do so.
>> +	 */
>> +	schedule_work(&irqfd->work);
>> +
>> +	return 0;
>> +}
>>     
>
> This schedule_work is there just to work around the spinlock
> in eventfd_signal, which we don't really need. Isn't this right?
>   

Yep.

> And this is on each interrupt. Seems like a pity.
>   

I agree.  Moving towards a way to be able to inject without deferring to
a workqueue will be a good thing.  Note, however, that addressing it at
the eventfd/wqh-lock layer is only part of the picture since ideally we
can inject (i.e. eventfd_signal()) from any atomic context (e.g.
hard-irq), not just the artificial one created by the wqh based
implementation.  I think Marcelo's irq_lock split-up code is taking us
in that direction by (eventually) allowing the kvm_set_irq() path to be
atomic-context friendly.

> How about a flag in eventfd that would
> convert it from waking up someone to a plain function call?
>
> Davide, could we add something like
>
>
> diff --git a/fs/eventfd.c b/fs/eventfd.c
> index 2a701d5..8bfa308 100644
> --- a/fs/eventfd.c
> +++ b/fs/eventfd.c
> @@ -29,6 +29,7 @@ struct eventfd_ctx {
>  	 */
>  	__u64 count;
>  	unsigned int flags;
> +	int nolock;
>  };
>  
>  /*
> @@ -46,6 +47,12 @@ int eventfd_signal(struct file *file, int n)
>  
>  	if (n < 0)
>  		return -EINVAL;
> +	if (ctx->nolock) {
> +               /* Whoever set nolock
> +                  better set wqh.func as well. */
> +		ctx->wqh.func(&ctx->wqh, 0, 0, NULL);
> +		return 0;
> +	}
>  	spin_lock_irqsave(&ctx->wqh.lock, flags);
>  	if (ULLONG_MAX - ctx->count < n)
>  		n = (int) (ULLONG_MAX - ctx->count);
>
>   

If we think we still need to address it at the eventfd layer (which I am
not 100% convinced we do), I think we should probably generalize it a
little more and make it so it doesn't completely re-route the
notification (there may be other end-points interrested in the event, I
suppose).

I am thinking something along the lines that the internal eventfd uses
an srcu_notifier, and we register a default notifier which points to a
wqh path very much like what we have today.  Then something like kvm
could register an additional srcu_notifier which should allow it to be
invoked lockless (*).  This would theoretically allow the eventfd to
remain free to support an arbitrary number of end-points which support
both locked and lockless operation.

-Greg

(*) disclaimer: I've never looked at the srcu_notifier implementation,
so perhaps this is not what they really offer.  I base this only on
basic RCU understanding.




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-05-26 16:42 ` Michael S. Tsirkin
  2009-05-26 18:05   ` Gregory Haskins
@ 2009-05-26 20:00   ` Davide Libenzi
  1 sibling, 0 replies; 28+ messages in thread
From: Davide Libenzi @ 2009-05-26 20:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Gregory Haskins, kvm, Linux Kernel Mailing List, avi, mtosatti

On Tue, 26 May 2009, Michael S. Tsirkin wrote:

> On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:
> > +static int
> > +irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
> > +{
> > +	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
> > +
> > +	/*
> > +	 * The wake_up is called with interrupts disabled.  Therefore we need
> > +	 * to defer the IRQ injection until later since we need to acquire the
> > +	 * kvm->lock to do so.
> > +	 */
> > +	schedule_work(&irqfd->work);
> > +
> > +	return 0;
> > +}
> 
> This schedule_work is there just to work around the spinlock
> in eventfd_signal, which we don't really need. Isn't this right?
> And this is on each interrupt. Seems like a pity.
> How about a flag in eventfd that would
> convert it from waking up someone to a plain function call?
> 
> Davide, could we add something like

I'm sorry, but it's not very pretty. Please find another way around.



> diff --git a/fs/eventfd.c b/fs/eventfd.c
> index 2a701d5..8bfa308 100644
> --- a/fs/eventfd.c
> +++ b/fs/eventfd.c
> @@ -29,6 +29,7 @@ struct eventfd_ctx {
>  	 */
>  	__u64 count;
>  	unsigned int flags;
> +	int nolock;
>  };
>  
>  /*
> @@ -46,6 +47,12 @@ int eventfd_signal(struct file *file, int n)
>  
>  	if (n < 0)
>  		return -EINVAL;
> +	if (ctx->nolock) {
> +               /* Whoever set nolock
> +                  better set wqh.func as well. */
> +		ctx->wqh.func(&ctx->wqh, 0, 0, NULL);
> +		return 0;
> +	}
>  	spin_lock_irqsave(&ctx->wqh.lock, flags);
>  	if (ULLONG_MAX - ctx->count < n)
>  		n = (int) (ULLONG_MAX - ctx->count);



- Davide



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-05-20 14:30 [KVM PATCH v10] kvm: add support for irqfd Gregory Haskins
  2009-05-20 14:35 ` Avi Kivity
  2009-05-26 16:42 ` Michael S. Tsirkin
@ 2009-05-27 13:55 ` Michael S. Tsirkin
  2009-05-27 14:06   ` Gregory Haskins
  2009-06-11 13:16 ` Michael S. Tsirkin
  2009-06-14  9:25 ` Michael S. Tsirkin
  4 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2009-05-27 13:55 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, avi, davidel, mtosatti

On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:
> +static int
> +kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi)
> +{
> +	struct _irqfd *irqfd;
> +	struct file *file = NULL;
> +	int ret;
> +
> +	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> +	if (!irqfd)
> +		return -ENOMEM;
> +
> +	irqfd->kvm = kvm;
> +	irqfd->gsi = gsi;
> +	INIT_LIST_HEAD(&irqfd->list);
> +	INIT_WORK(&irqfd->work, irqfd_inject);
> +
> +	/*
> +	 * Embed the file* lifetime in the irqfd.
> +	 */
> +	file = fget(fd);
> +	if (IS_ERR(file)) {
> +		ret = PTR_ERR(file);
> +		goto fail;
> +	}

So we get a reference to a file, and unless the user is nice to us, it
will only be dropped when kvm char device file is closed?
I think this will deadlock if the fd in question is the open kvm char device.


-- 
MST

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-05-27 13:55 ` Michael S. Tsirkin
@ 2009-05-27 14:06   ` Gregory Haskins
  2009-05-27 14:36     ` [PATCH 0/2] kvm: validate irqfd type Gregory Haskins
  2009-05-27 18:41     ` [KVM PATCH v10] kvm: add support for irqfd Michael S. Tsirkin
  0 siblings, 2 replies; 28+ messages in thread
From: Gregory Haskins @ 2009-05-27 14:06 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, linux-kernel, avi, davidel, mtosatti

[-- Attachment #1: Type: text/plain, Size: 1464 bytes --]

Michael S. Tsirkin wrote:
> On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:
>   
>> +static int
>> +kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi)
>> +{
>> +	struct _irqfd *irqfd;
>> +	struct file *file = NULL;
>> +	int ret;
>> +
>> +	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
>> +	if (!irqfd)
>> +		return -ENOMEM;
>> +
>> +	irqfd->kvm = kvm;
>> +	irqfd->gsi = gsi;
>> +	INIT_LIST_HEAD(&irqfd->list);
>> +	INIT_WORK(&irqfd->work, irqfd_inject);
>> +
>> +	/*
>> +	 * Embed the file* lifetime in the irqfd.
>> +	 */
>> +	file = fget(fd);
>> +	if (IS_ERR(file)) {
>> +		ret = PTR_ERR(file);
>> +		goto fail;
>> +	}
>>     
>
> So we get a reference to a file, and unless the user is nice to us, it
> will only be dropped when kvm char device file is closed?
> I think this will deadlock if the fd in question is the open kvm char device.
>
>
>   
Hmm...I hadn't considered this possibility, though I am not sure if it
would cause a deadlock in the pattern you suggest.  It seems more like
it would result in, at worst, an extra reference to itself (and thus a
leak) rather than a deadlock...

I digress.  In either case, perhaps I should s/fget/eventfd_fget to at
least limit the type of fd to eventfd.  I was trying to be "slick" by
not needing the eventfd_fget() exported, but I am going to need to
export it later anyway for iosignalfd, so its probably a moot point.

Thanks Michael,
-Greg


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 0/2] kvm: validate irqfd type
  2009-05-27 14:06   ` Gregory Haskins
@ 2009-05-27 14:36     ` Gregory Haskins
  2009-05-27 14:37       ` [PATCH 1/2] eventfd: export eventfd interfaces for module use Gregory Haskins
                         ` (3 more replies)
  2009-05-27 18:41     ` [KVM PATCH v10] kvm: add support for irqfd Michael S. Tsirkin
  1 sibling, 4 replies; 28+ messages in thread
From: Gregory Haskins @ 2009-05-27 14:36 UTC (permalink / raw)
  To: mst; +Cc: kvm, linux-kernel, avi, davidel, mtosatti

Michael Tsirkin pointed out an issue in kvm.git w.r.t. malicious userspace
configuring irqfd objects:

http://lkml.org/lkml/2009/5/27/341

This series applies to kvm.git/master:1a6a35a1 to attempt to close the
vulnerability.

---

Gregory Haskins (2):
      kvm: validate irqfd type
      eventfd: export eventfd interfaces for module use


 fs/eventfd.c       |    3 +++
 virt/kvm/eventfd.c |    3 ++-
 2 files changed, 5 insertions(+), 1 deletions(-)

-- 
Signature

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/2] eventfd: export eventfd interfaces for module use
  2009-05-27 14:36     ` [PATCH 0/2] kvm: validate irqfd type Gregory Haskins
@ 2009-05-27 14:37       ` Gregory Haskins
  2009-05-27 14:37       ` [PATCH 2/2] kvm: validate irqfd type Gregory Haskins
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 28+ messages in thread
From: Gregory Haskins @ 2009-05-27 14:37 UTC (permalink / raw)
  To: mst; +Cc: kvm, linux-kernel, avi, davidel, mtosatti

We want to use eventfd from KVM which can be compiled as a module, so
export the interfaces.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
---

 fs/eventfd.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 2a701d5..3f0e197 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -16,6 +16,7 @@
 #include <linux/anon_inodes.h>
 #include <linux/eventfd.h>
 #include <linux/syscalls.h>
+#include <linux/module.h>
 
 struct eventfd_ctx {
 	wait_queue_head_t wqh;
@@ -56,6 +57,7 @@ int eventfd_signal(struct file *file, int n)
 
 	return n;
 }
+EXPORT_SYMBOL_GPL(eventfd_signal);
 
 static int eventfd_release(struct inode *inode, struct file *file)
 {
@@ -197,6 +199,7 @@ struct file *eventfd_fget(int fd)
 
 	return file;
 }
+EXPORT_SYMBOL_GPL(eventfd_fget);
 
 SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
 {


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/2] kvm: validate irqfd type
  2009-05-27 14:36     ` [PATCH 0/2] kvm: validate irqfd type Gregory Haskins
  2009-05-27 14:37       ` [PATCH 1/2] eventfd: export eventfd interfaces for module use Gregory Haskins
@ 2009-05-27 14:37       ` Gregory Haskins
  2009-05-27 15:06       ` [PATCH 0/2] " Gregory Haskins
  2009-05-31  9:36       ` Avi Kivity
  3 siblings, 0 replies; 28+ messages in thread
From: Gregory Haskins @ 2009-05-27 14:37 UTC (permalink / raw)
  To: mst; +Cc: kvm, linux-kernel, avi, davidel, mtosatti

We should be more vigilant in validating the fd type passed down for use
in irqfd.  A malicious userspace could do something nasty like pass the
kvm fd which would cause problems such as a reference leak on the kvm
object on shutdown.

Therefore, we use the eventfd_fget() routine in place of the plain fget()
to at least make sure its of the proper type.

Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

 virt/kvm/eventfd.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c63ff6a..f3f2ea1 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -27,6 +27,7 @@
 #include <linux/poll.h>
 #include <linux/file.h>
 #include <linux/list.h>
+#include <linux/eventfd.h>
 
 /*
  * --------------------------------------------------------------------
@@ -102,7 +103,7 @@ kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi)
 	/*
 	 * Embed the file* lifetime in the irqfd.
 	 */
-	file = fget(fd);
+	file = eventfd_fget(fd);
 	if (IS_ERR(file)) {
 		ret = PTR_ERR(file);
 		goto fail;


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/2] kvm: validate irqfd type
  2009-05-27 14:36     ` [PATCH 0/2] kvm: validate irqfd type Gregory Haskins
  2009-05-27 14:37       ` [PATCH 1/2] eventfd: export eventfd interfaces for module use Gregory Haskins
  2009-05-27 14:37       ` [PATCH 2/2] kvm: validate irqfd type Gregory Haskins
@ 2009-05-27 15:06       ` Gregory Haskins
  2009-05-31  9:36       ` Avi Kivity
  3 siblings, 0 replies; 28+ messages in thread
From: Gregory Haskins @ 2009-05-27 15:06 UTC (permalink / raw)
  To: avi; +Cc: mst, kvm, linux-kernel, davidel, mtosatti

[-- Attachment #1: Type: text/plain, Size: 694 bytes --]

Gregory Haskins wrote:
> Michael Tsirkin pointed out an issue in kvm.git w.r.t. malicious userspace
> configuring irqfd objects:
>
> http://lkml.org/lkml/2009/5/27/341
>
> This series applies to kvm.git/master:1a6a35a1 to attempt to close the
> vulnerability.
>   

Avi,

FYI: This is also available as a git-pull:

git pull
git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/linux-2.6-hacks.git
for-avi

> ---
>
> Gregory Haskins (2):
>       kvm: validate irqfd type
>       eventfd: export eventfd interfaces for module use
>
>
>  fs/eventfd.c       |    3 +++
>  virt/kvm/eventfd.c |    3 ++-
>  2 files changed, 5 insertions(+), 1 deletions(-)
>
>   



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-05-27 14:06   ` Gregory Haskins
  2009-05-27 14:36     ` [PATCH 0/2] kvm: validate irqfd type Gregory Haskins
@ 2009-05-27 18:41     ` Michael S. Tsirkin
  2009-05-27 19:28       ` Davide Libenzi
  2009-05-27 20:07       ` Gregory Haskins
  1 sibling, 2 replies; 28+ messages in thread
From: Michael S. Tsirkin @ 2009-05-27 18:41 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, avi, davidel, mtosatti

On Wed, May 27, 2009 at 10:06:50AM -0400, Gregory Haskins wrote:
> Michael S. Tsirkin wrote:
> > On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:
> >   
> >> +static int
> >> +kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi)
> >> +{
> >> +	struct _irqfd *irqfd;
> >> +	struct file *file = NULL;
> >> +	int ret;
> >> +
> >> +	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> >> +	if (!irqfd)
> >> +		return -ENOMEM;
> >> +
> >> +	irqfd->kvm = kvm;
> >> +	irqfd->gsi = gsi;
> >> +	INIT_LIST_HEAD(&irqfd->list);
> >> +	INIT_WORK(&irqfd->work, irqfd_inject);
> >> +
> >> +	/*
> >> +	 * Embed the file* lifetime in the irqfd.
> >> +	 */
> >> +	file = fget(fd);
> >> +	if (IS_ERR(file)) {
> >> +		ret = PTR_ERR(file);
> >> +		goto fail;
> >> +	}
> >>     
> >
> > So we get a reference to a file, and unless the user is nice to us, it
> > will only be dropped when kvm char device file is closed?
> > I think this will deadlock if the fd in question is the open kvm char device.
> >
> >
> >   
> Hmm...I hadn't considered this possibility, though I am not sure if it
> would cause a deadlock in the pattern you suggest.  It seems more like
> it would result in, at worst, an extra reference to itself (and thus a
> leak) rather than a deadlock...
> 
> I digress.  In either case, perhaps I should s/fget/eventfd_fget to at
> least limit the type of fd to eventfd.  I was trying to be "slick" by
> not needing the eventfd_fget() exported, but I am going to need to
> export it later anyway for iosignalfd, so its probably a moot point.
> 
> Thanks Michael,
> -Greg
> 

This only works as long as eventfd does not do fget on some fd as well.
Which it does not do now, and may never do - but we create a fragile
system this way.

I think it's really wrong, fundamentally, to keep a reference to a
file until another file is closed, unless you are code under fs/.
We will get nasty circular references sooner or later.

Isn't the real reason we use fd to be able to support the same interface
on top of both kvm and lguest?
And if so, wouldn't some kind of bus be a better solution?

-- 
MST

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-05-27 18:41     ` [KVM PATCH v10] kvm: add support for irqfd Michael S. Tsirkin
@ 2009-05-27 19:28       ` Davide Libenzi
  2009-05-27 20:07       ` Gregory Haskins
  1 sibling, 0 replies; 28+ messages in thread
From: Davide Libenzi @ 2009-05-27 19:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Gregory Haskins, kvm, Linux Kernel Mailing List, avi, mtosatti

On Wed, 27 May 2009, Michael S. Tsirkin wrote:

> On Wed, May 27, 2009 at 10:06:50AM -0400, Gregory Haskins wrote:
> > Michael S. Tsirkin wrote:
> > > On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:
> > >   
> > >> +static int
> > >> +kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi)
> > >> +{
> > >> +	struct _irqfd *irqfd;
> > >> +	struct file *file = NULL;
> > >> +	int ret;
> > >> +
> > >> +	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> > >> +	if (!irqfd)
> > >> +		return -ENOMEM;
> > >> +
> > >> +	irqfd->kvm = kvm;
> > >> +	irqfd->gsi = gsi;
> > >> +	INIT_LIST_HEAD(&irqfd->list);
> > >> +	INIT_WORK(&irqfd->work, irqfd_inject);
> > >> +
> > >> +	/*
> > >> +	 * Embed the file* lifetime in the irqfd.
> > >> +	 */
> > >> +	file = fget(fd);
> > >> +	if (IS_ERR(file)) {
> > >> +		ret = PTR_ERR(file);
> > >> +		goto fail;
> > >> +	}
> > >>     
> > >
> > > So we get a reference to a file, and unless the user is nice to us, it
> > > will only be dropped when kvm char device file is closed?
> > > I think this will deadlock if the fd in question is the open kvm char device.
> > >
> > >
> > >   
> > Hmm...I hadn't considered this possibility, though I am not sure if it
> > would cause a deadlock in the pattern you suggest.  It seems more like
> > it would result in, at worst, an extra reference to itself (and thus a
> > leak) rather than a deadlock...
> > 
> > I digress.  In either case, perhaps I should s/fget/eventfd_fget to at
> > least limit the type of fd to eventfd.  I was trying to be "slick" by
> > not needing the eventfd_fget() exported, but I am going to need to
> > export it later anyway for iosignalfd, so its probably a moot point.
> > 
> > Thanks Michael,
> > -Greg
> > 
> 
> This only works as long as eventfd does not do fget on some fd as well.
> Which it does not do now, and may never do - but we create a fragile
> system this way.
> 
> I think it's really wrong, fundamentally, to keep a reference to a
> file until another file is closed, unless you are code under fs/.
> We will get nasty circular references sooner or later.
> 
> Isn't the real reason we use fd to be able to support the same interface
> on top of both kvm and lguest?
> And if so, wouldn't some kind of bus be a better solution?

Another solution, that I proposed in the past, is having irqfd hold no 
references to the eventfd. It's just register (holding an eventfd-get()) 
for events (in the way that currently does), notice the POLLHUP, unchain 
from it, and propagate the eventfd-close event to whatever the irqfd logic 
is supposed to.



- Davide



---
 fs/eventfd.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Index: linux-2.6.mod/fs/eventfd.c
===================================================================
--- linux-2.6.mod.orig/fs/eventfd.c	2009-05-27 12:10:03.000000000 -0700
+++ linux-2.6.mod/fs/eventfd.c	2009-05-27 12:16:57.000000000 -0700
@@ -59,7 +59,15 @@ int eventfd_signal(struct file *file, in
 
 static int eventfd_release(struct inode *inode, struct file *file)
 {
-	kfree(file->private_data);
+	struct eventfd_ctx *ctx = file->private_data;
+
+	/*
+	 * No need to hold the lock here, since we are on the file cleanup
+	 * path and the ones still attached to the wait queue will be
+	 * serialized by wake_up_locked_poll().
+	 */
+	wake_up_locked_poll(&ctx->wqh, POLLHUP);
+	kfree(ctx);
 	return 0;
 }
 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-05-27 18:41     ` [KVM PATCH v10] kvm: add support for irqfd Michael S. Tsirkin
  2009-05-27 19:28       ` Davide Libenzi
@ 2009-05-27 20:07       ` Gregory Haskins
  2009-05-27 20:43         ` Michael S. Tsirkin
  1 sibling, 1 reply; 28+ messages in thread
From: Gregory Haskins @ 2009-05-27 20:07 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, linux-kernel, avi, davidel, mtosatti

[-- Attachment #1: Type: text/plain, Size: 3868 bytes --]

Michael S. Tsirkin wrote:
> On Wed, May 27, 2009 at 10:06:50AM -0400, Gregory Haskins wrote:
>   
>> Michael S. Tsirkin wrote:
>>     
>>> On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:
>>>   
>>>       
>>>> +static int
>>>> +kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi)
>>>> +{
>>>> +	struct _irqfd *irqfd;
>>>> +	struct file *file = NULL;
>>>> +	int ret;
>>>> +
>>>> +	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
>>>> +	if (!irqfd)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	irqfd->kvm = kvm;
>>>> +	irqfd->gsi = gsi;
>>>> +	INIT_LIST_HEAD(&irqfd->list);
>>>> +	INIT_WORK(&irqfd->work, irqfd_inject);
>>>> +
>>>> +	/*
>>>> +	 * Embed the file* lifetime in the irqfd.
>>>> +	 */
>>>> +	file = fget(fd);
>>>> +	if (IS_ERR(file)) {
>>>> +		ret = PTR_ERR(file);
>>>> +		goto fail;
>>>> +	}
>>>>     
>>>>         
>>> So we get a reference to a file, and unless the user is nice to us, it
>>> will only be dropped when kvm char device file is closed?
>>> I think this will deadlock if the fd in question is the open kvm char device.
>>>
>>>
>>>   
>>>       
>> Hmm...I hadn't considered this possibility, though I am not sure if it
>> would cause a deadlock in the pattern you suggest.  It seems more like
>> it would result in, at worst, an extra reference to itself (and thus a
>> leak) rather than a deadlock...
>>
>> I digress.  In either case, perhaps I should s/fget/eventfd_fget to at
>> least limit the type of fd to eventfd.  I was trying to be "slick" by
>> not needing the eventfd_fget() exported, but I am going to need to
>> export it later anyway for iosignalfd, so its probably a moot point.
>>
>> Thanks Michael,
>> -Greg
>>
>>     
>
> This only works as long as eventfd does not do fget on some fd as well.
> Which it does not do now, and may never do - but we create a fragile
> system this way.
>
> I think it's really wrong, fundamentally, to keep a reference to a
> file until another file is closed, unless you are code under fs/.
> We will get nasty circular references sooner or later.
>   

Hmm.. I understand your concern, but I respectfully disagree.

One object referencing another is a natural expression, regardless of
what type they may be.  The fact is that introducing the concept of
irqfd creates a relationship between an eventfd instance and a kvm
instance whether we like it or not, and this relationship needs to be
managed.  It is therefore IMO perfectly natural to express that
relationship with a reference count, and I do not currently see anything
wrong or even particularly fragile about how I've currently done this. 
I'm sure there are other ways, however.  Do you have a particular
suggestion in mind?

> Isn't the real reason we use fd to be able to support the same interface
> on top of both kvm and lguest?
>   

Actually, the reason why we use an fd is to decouple the
interrupt-producing end-point from the KVM core.  Ignoring eventfd in
specific for a moment, one convenient way to do that is with an fd
because it provides a nice, already written/tested handle-to-pointer
translation, and a polymorphic interface (e.g. f_ops).  Choosing to use
eventfd flavored fd's buys us additional advantages in terms of
leveraging already tested f_ops code, and compatibility with an
interface that is designed-for/used-by other established subsystems for
signaling.
> And if so, wouldn't some kind of bus be a better solution?
>   

Ultimately I aim to implement a bus (vbus, specifically) in terms of
irqfd (and iosignalfd, for that matter).  However, the eventfd
interfaces are general purpose and can be used in other areas as well
(for instance, virtio-pci, or the shared-mem driver recently
discussed).  I realize this is probably not the point you were making
here, but fyi.

Regards,
-Greg



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-05-27 20:07       ` Gregory Haskins
@ 2009-05-27 20:43         ` Michael S. Tsirkin
  2009-05-27 20:46           ` Gregory Haskins
  0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2009-05-27 20:43 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, avi, davidel, mtosatti

On Wed, May 27, 2009 at 04:07:23PM -0400, Gregory Haskins wrote:
> Michael S. Tsirkin wrote:
> > On Wed, May 27, 2009 at 10:06:50AM -0400, Gregory Haskins wrote:
> >   
> >> Michael S. Tsirkin wrote:
> >>     
> >>> On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:
> >>>   
> >>>       
> >>>> +static int
> >>>> +kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi)
> >>>> +{
> >>>> +	struct _irqfd *irqfd;
> >>>> +	struct file *file = NULL;
> >>>> +	int ret;
> >>>> +
> >>>> +	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> >>>> +	if (!irqfd)
> >>>> +		return -ENOMEM;
> >>>> +
> >>>> +	irqfd->kvm = kvm;
> >>>> +	irqfd->gsi = gsi;
> >>>> +	INIT_LIST_HEAD(&irqfd->list);
> >>>> +	INIT_WORK(&irqfd->work, irqfd_inject);
> >>>> +
> >>>> +	/*
> >>>> +	 * Embed the file* lifetime in the irqfd.
> >>>> +	 */
> >>>> +	file = fget(fd);
> >>>> +	if (IS_ERR(file)) {
> >>>> +		ret = PTR_ERR(file);
> >>>> +		goto fail;
> >>>> +	}
> >>>>     
> >>>>         
> >>> So we get a reference to a file, and unless the user is nice to us, it
> >>> will only be dropped when kvm char device file is closed?
> >>> I think this will deadlock if the fd in question is the open kvm char device.
> >>>
> >>>
> >>>   
> >>>       
> >> Hmm...I hadn't considered this possibility, though I am not sure if it
> >> would cause a deadlock in the pattern you suggest.  It seems more like
> >> it would result in, at worst, an extra reference to itself (and thus a
> >> leak) rather than a deadlock...
> >>
> >> I digress.  In either case, perhaps I should s/fget/eventfd_fget to at
> >> least limit the type of fd to eventfd.  I was trying to be "slick" by
> >> not needing the eventfd_fget() exported, but I am going to need to
> >> export it later anyway for iosignalfd, so its probably a moot point.
> >>
> >> Thanks Michael,
> >> -Greg
> >>
> >>     
> >
> > This only works as long as eventfd does not do fget on some fd as well.
> > Which it does not do now, and may never do - but we create a fragile
> > system this way.
> >
> > I think it's really wrong, fundamentally, to keep a reference to a
> > file until another file is closed, unless you are code under fs/.
> > We will get nasty circular references sooner or later.
> >   
> 
> Hmm.. I understand your concern, but I respectfully disagree.
> 
> One object referencing another is a natural expression, regardless of
> what type they may be.  The fact is that introducing the concept of
> irqfd creates a relationship between an eventfd instance and a kvm
> instance whether we like it or not, and this relationship needs to be
> managed.  It is therefore IMO perfectly natural to express that
> relationship with a reference count, and I do not currently see anything
> wrong or even particularly fragile about how I've currently done this. 
> I'm sure there are other ways, however.  Do you have a particular
> suggestion in mind?

Yes. I'll try to post a patch soon.

> > Isn't the real reason we use fd to be able to support the same interface
> > on top of both kvm and lguest?
> >   
> 
> Actually, the reason why we use an fd is to decouple the
> interrupt-producing end-point from the KVM core.  Ignoring eventfd in
> specific for a moment, one convenient way to do that is with an fd
> because it provides a nice, already written/tested handle-to-pointer
> translation, and a polymorphic interface (e.g. f_ops).  Choosing to use
> eventfd flavored fd's buys us additional advantages in terms of
> leveraging already tested f_ops code, and compatibility with an
> interface that is designed-for/used-by other established subsystems for
> signaling.
> > And if so, wouldn't some kind of bus be a better solution?
> >   
> 
> Ultimately I aim to implement a bus (vbus, specifically) in terms of
> irqfd (and iosignalfd, for that matter).  However, the eventfd
> interfaces are general purpose and can be used in other areas as well
> (for instance, virtio-pci, or the shared-mem driver recently
> discussed).  I realize this is probably not the point you were making
> here, but fyi.
> 
> Regards,
> -Greg
> 
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-05-27 20:43         ` Michael S. Tsirkin
@ 2009-05-27 20:46           ` Gregory Haskins
  0 siblings, 0 replies; 28+ messages in thread
From: Gregory Haskins @ 2009-05-27 20:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Gregory Haskins, kvm, linux-kernel, avi, davidel, mtosatti

[-- Attachment #1: Type: text/plain, Size: 254 bytes --]

Michael S. Tsirkin wrote:
> On Wed, May 27, 2009 at 04:07:23PM -0400, Gregory Haskins wrote:
>   
>> Do you have a particular suggestion in mind?
>>     
>
> Yes. I'll try to post a patch soon.
>
>   

Sounds good.  Thanks Michael.

-Greg


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/2] kvm: validate irqfd type
  2009-05-27 14:36     ` [PATCH 0/2] kvm: validate irqfd type Gregory Haskins
                         ` (2 preceding siblings ...)
  2009-05-27 15:06       ` [PATCH 0/2] " Gregory Haskins
@ 2009-05-31  9:36       ` Avi Kivity
  3 siblings, 0 replies; 28+ messages in thread
From: Avi Kivity @ 2009-05-31  9:36 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: mst, kvm, linux-kernel, davidel, mtosatti

Gregory Haskins wrote:
> Michael Tsirkin pointed out an issue in kvm.git w.r.t. malicious userspace
> configuring irqfd objects:
>
> http://lkml.org/lkml/2009/5/27/341
>
> This series applies to kvm.git/master:1a6a35a1 to attempt to close the
> vulnerability.
>
>   

Applied, thanks.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-05-20 14:30 [KVM PATCH v10] kvm: add support for irqfd Gregory Haskins
                   ` (2 preceding siblings ...)
  2009-05-27 13:55 ` Michael S. Tsirkin
@ 2009-06-11 13:16 ` Michael S. Tsirkin
  2009-06-11 13:36   ` Michael S. Tsirkin
  2009-06-14  9:25 ` Michael S. Tsirkin
  4 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2009-06-11 13:16 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, avi, davidel, mtosatti

Going over this code again, I seem to see a minor error handling issue here:

On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> new file mode 100644
> index 0000000..72a282e
> --- /dev/null
> +++ b/virt/kvm/eventfd.c
> @@ -0,0 +1,228 @@
> +/*
> + * kvm eventfd support - use eventfd objects to signal various KVM events
> + *
> + * Copyright 2009 Novell.  All Rights Reserved.
> + *
> + * Author:
> + *	Gregory Haskins <ghaskins@novell.com>
> + *
> + * This file is free software; you can redistribute it and/or modify
> + * it under the terms of version 2 of the GNU General Public License
> + * as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software Foundation,
> + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
> + */
> +
> +#include <linux/kvm_host.h>
> +#include <linux/workqueue.h>
> +#include <linux/syscalls.h>
> +#include <linux/wait.h>
> +#include <linux/poll.h>
> +#include <linux/file.h>
> +#include <linux/list.h>
> +
> +/*
> + * --------------------------------------------------------------------
> + * irqfd: Allows an fd to be used to inject an interrupt to the guest
> + *
> + * Credit goes to Avi Kivity for the original idea.
> + * --------------------------------------------------------------------
> + */
> +struct _irqfd {
> +	struct kvm               *kvm;
> +	int                       gsi;
> +	struct file              *file;
> +	struct list_head          list;
> +	poll_table                pt;
> +	wait_queue_head_t        *wqh;
> +	wait_queue_t              wait;
> +	struct work_struct        work;
> +};
> +
> +static void
> +irqfd_inject(struct work_struct *work)
> +{
> +	struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
> +	struct kvm *kvm = irqfd->kvm;
> +
> +	mutex_lock(&kvm->lock);
> +	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
> +	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> +	mutex_unlock(&kvm->lock);
> +}
> +
> +static int
> +irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
> +{
> +	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
> +
> +	/*
> +	 * The wake_up is called with interrupts disabled.  Therefore we need
> +	 * to defer the IRQ injection until later since we need to acquire the
> +	 * kvm->lock to do so.
> +	 */
> +	schedule_work(&irqfd->work);
> +
> +	return 0;
> +}
> +
> +static void
> +irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
> +			poll_table *pt)
> +{
> +	struct _irqfd *irqfd = container_of(pt, struct _irqfd, pt);
> +
> +	irqfd->wqh = wqh;
> +	add_wait_queue(wqh, &irqfd->wait);
> +}
> +
> +static int
> +kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi)
> +{
> +	struct _irqfd *irqfd;
> +	struct file *file = NULL;
> +	int ret;
> +
> +	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> +	if (!irqfd)
> +		return -ENOMEM;
> +
> +	irqfd->kvm = kvm;
> +	irqfd->gsi = gsi;
> +	INIT_LIST_HEAD(&irqfd->list);
> +	INIT_WORK(&irqfd->work, irqfd_inject);
> +
> +	/*
> +	 * Embed the file* lifetime in the irqfd.
> +	 */
> +	file = fget(fd);
> +	if (IS_ERR(file)) {
> +		ret = PTR_ERR(file);
> +		goto fail;
> +	}
> +
> +	/*
> +	 * Install our own custom wake-up handling so we are notified via
> +	 * a callback whenever someone signals the underlying eventfd
> +	 */
> +	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
> +	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
> +
> +	ret = file->f_op->poll(file, &irqfd->pt);
> +	if (ret < 0)
> +		goto fail;
> +
> +	irqfd->file = file;
> +
> +	mutex_lock(&kvm->lock);
> +	list_add_tail(&irqfd->list, &kvm->irqfds);
> +	mutex_unlock(&kvm->lock);
> +
> +	return 0;
> +
> +fail:
> +	if (irqfd->wqh)
> +		remove_wait_queue(irqfd->wqh, &irqfd->wait);

Why are these 2 lines here? Either we might get a callback even though
poll failed - and then this test without lock is probably racy -
or we can't, and then we can replace the above with BUG_ON(irqfd->wqh).

Which is it? I think the later ...


> +
> +	if (file && !IS_ERR(file))
> +		fput(file);
> +
> +	kfree(irqfd);
> +	return ret;
> +}
> +
> +static void
> +irqfd_release(struct _irqfd *irqfd)
> +{
> +	/*
> +	 * The ordering is important.  We must remove ourselves from the wqh
> +	 * first to ensure no more event callbacks are issued, and then flush
> +	 * any previously scheduled work prior to freeing the memory
> +	 */
> +	remove_wait_queue(irqfd->wqh, &irqfd->wait);
> +
> +	flush_work(&irqfd->work);
> +
> +	fput(irqfd->file);
> +	kfree(irqfd);
> +}
> +
> +static struct _irqfd *
> +irqfd_remove(struct kvm *kvm, struct file *file, int gsi)
> +{
> +	struct _irqfd *irqfd;
> +
> +	mutex_lock(&kvm->lock);
> +
> +	/*
> +	 * linear search isn't brilliant, but this should be an infrequent
> +	 * slow-path operation, and the list should not grow very large
> +	 */
> +	list_for_each_entry(irqfd, &kvm->irqfds, list) {
> +		if (irqfd->file != file || irqfd->gsi != gsi)
> +			continue;
> +
> +		list_del(&irqfd->list);
> +		mutex_unlock(&kvm->lock);
> +
> +		return irqfd;
> +	}
> +
> +	mutex_unlock(&kvm->lock);
> +
> +	return NULL;
> +}
> +
> +static int
> +kvm_deassign_irqfd(struct kvm *kvm, int fd, int gsi)
> +{
> +	struct _irqfd *irqfd;
> +	struct file *file;
> +	int count = 0;
> +
> +	file = fget(fd);
> +	if (IS_ERR(file))
> +		return PTR_ERR(file);
> +
> +	while ((irqfd = irqfd_remove(kvm, file, gsi))) {
> +		/*
> +		 * We remove the item from the list under the lock, but we
> +		 * free it outside the lock to avoid deadlocking with the
> +		 * flush_work and the work_item taking the lock
> +		 */
> +		irqfd_release(irqfd);
> +		count++;
> +	}
> +
> +	fput(file);
> +
> +	return count ? count : -ENOENT;
> +}
> +
> +int
> +kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
> +{
> +	if (flags & KVM_IRQFD_FLAG_DEASSIGN)
> +		return kvm_deassign_irqfd(kvm, fd, gsi);
> +
> +	return kvm_assign_irqfd(kvm, fd, gsi);
> +}
> +
> +void
> +kvm_irqfd_release(struct kvm *kvm)
> +{
> +	struct _irqfd *irqfd, *tmp;
> +
> +	/* don't bother with the lock..we are shutting down */
> +	list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds, list) {
> +		list_del(&irqfd->list);
> +		irqfd_release(irqfd);
> +	}
> +}

-- 
MST

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-06-11 13:16 ` Michael S. Tsirkin
@ 2009-06-11 13:36   ` Michael S. Tsirkin
  2009-06-14 12:25     ` Gregory Haskins
  0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2009-06-11 13:36 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, avi, davidel, mtosatti

On Thu, Jun 11, 2009 at 04:16:47PM +0300, Michael S. Tsirkin wrote:
> > +
> > +	ret = file->f_op->poll(file, &irqfd->pt);
> > +	if (ret < 0)
> > +		goto fail;

Looking at it some more, we have:
struct file_operations {
....
	unsigned int (*poll) (struct file *, struct poll_table_struct *);

So the comparison above does not seem to make sense:
it seems that the return value from poll can not be negative.

Will the callback be executed if someone did a write to eventfd
before we attached it? If no, maybe we should call it here
if ret != 0.


> > +
> > +	irqfd->file = file;
> > +
> > +	mutex_lock(&kvm->lock);
> > +	list_add_tail(&irqfd->list, &kvm->irqfds);
> > +	mutex_unlock(&kvm->lock);
> > +
> > +	return 0;
> > +
> > +fail:
> > +	if (irqfd->wqh)
> > +		remove_wait_queue(irqfd->wqh, &irqfd->wait);
> 
> Why are these 2 lines here? Either we might get a callback even though
> poll failed - and then this test without lock is probably racy -
> or we can't, and then we can replace the above with BUG_ON(irqfd->wqh).
> 
> Which is it? I think the later ...
> 
> 
> > +
> > +	if (file && !IS_ERR(file))
> > +		fput(file);
> > +
> > +	kfree(irqfd);
> > +	return ret;
> > +}
> > +

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-05-20 14:30 [KVM PATCH v10] kvm: add support for irqfd Gregory Haskins
                   ` (3 preceding siblings ...)
  2009-06-11 13:16 ` Michael S. Tsirkin
@ 2009-06-14  9:25 ` Michael S. Tsirkin
  2009-06-14 12:40   ` Gregory Haskins
  4 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2009-06-14  9:25 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, avi, davidel, mtosatti

On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:

...

> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> +static int
> +kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi)
> +{
> +	struct _irqfd *irqfd;
> +	struct file *file = NULL;
> +	int ret;
> +
> +	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> +	if (!irqfd)
> +		return -ENOMEM;
> +
> +	irqfd->kvm = kvm;
> +	irqfd->gsi = gsi;
> +	INIT_LIST_HEAD(&irqfd->list);
> +	INIT_WORK(&irqfd->work, irqfd_inject);
> +
> +	/*
> +	 * Embed the file* lifetime in the irqfd.
> +	 */
> +	file = fget(fd);
> +	if (IS_ERR(file)) {
> +		ret = PTR_ERR(file);
> +		goto fail;
> +	}
> +
> +	/*
> +	 * Install our own custom wake-up handling so we are notified via
> +	 * a callback whenever someone signals the underlying eventfd
> +	 */
> +	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
> +	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
> +
> +	ret = file->f_op->poll(file, &irqfd->pt);
> +	if (ret < 0)
> +		goto fail;
> +
> +	irqfd->file = file;
> +
> +	mutex_lock(&kvm->lock);
> +	list_add_tail(&irqfd->list, &kvm->irqfds);
> +	mutex_unlock(&kvm->lock);
> +
> +	return 0;
> +
> +fail:
> +	if (irqfd->wqh)
> +		remove_wait_queue(irqfd->wqh, &irqfd->wait);
> +
> +	if (file && !IS_ERR(file))
> +		fput(file);
> +
> +	kfree(irqfd);
> +	return ret;
> +}

It seems that this lets the guest assign an unlimited number of fds
to the same gsi, potentially using up all of kernel memory.

Since we don't need multiple fds assigned to the same gsi (instead,
multiple processes can write to the same eventfd to trigger an
interrupt) let's simply check that no fd is yet assigned to this gsi.

Makes sense?

-- 
MST

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-06-11 13:36   ` Michael S. Tsirkin
@ 2009-06-14 12:25     ` Gregory Haskins
  2009-06-14 13:20       ` Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Gregory Haskins @ 2009-06-14 12:25 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, linux-kernel, avi, davidel, mtosatti

[-- Attachment #1: Type: text/plain, Size: 1580 bytes --]

Michael S. Tsirkin wrote:
> On Thu, Jun 11, 2009 at 04:16:47PM +0300, Michael S. Tsirkin wrote:
>   
>>> +
>>> +	ret = file->f_op->poll(file, &irqfd->pt);
>>> +	if (ret < 0)
>>> +		goto fail;
>>>       
>
> Looking at it some more, we have:
> struct file_operations {
> ....
> 	unsigned int (*poll) (struct file *, struct poll_table_struct *);
>
> So the comparison above does not seem to make sense:
> it seems that the return value from poll can not be negative.
>   

Indeed.  Will fix.
> Will the callback be executed if someone did a write to eventfd
> before we attached it? If no, maybe we should call it here
> if ret != 0.
>   

I do the cleanup in case the callback has been called, but poll() fails
somewhere internally afterwards.  Perhaps this is not a realistic
scenario, but it was my motivation for adding the wqh cleanup.
>
>   
>>> +
>>> +	irqfd->file = file;
>>> +
>>> +	mutex_lock(&kvm->lock);
>>> +	list_add_tail(&irqfd->list, &kvm->irqfds);
>>> +	mutex_unlock(&kvm->lock);
>>> +
>>> +	return 0;
>>> +
>>> +fail:
>>> +	if (irqfd->wqh)
>>> +		remove_wait_queue(irqfd->wqh, &irqfd->wait);
>>>       
>> Why are these 2 lines here? Either we might get a callback even though
>> poll failed - and then this test without lock is probably racy -
>> or we can't, and then we can replace the above with BUG_ON(irqfd->wqh).
>>
>> Which is it? I think the later ...
>>
>>
>>     
>>> +
>>> +	if (file && !IS_ERR(file))
>>> +		fput(file);
>>> +
>>> +	kfree(irqfd);
>>> +	return ret;
>>> +}
>>> +
>>>       



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-06-14  9:25 ` Michael S. Tsirkin
@ 2009-06-14 12:40   ` Gregory Haskins
  2009-06-14 13:19     ` Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Gregory Haskins @ 2009-06-14 12:40 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, linux-kernel, avi, davidel, mtosatti

[-- Attachment #1: Type: text/plain, Size: 2134 bytes --]

Michael S. Tsirkin wrote:
> On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:
>
> ...
>
>   
>> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
>> +static int
>> +kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi)
>> +{
>> +	struct _irqfd *irqfd;
>> +	struct file *file = NULL;
>> +	int ret;
>> +
>> +	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
>> +	if (!irqfd)
>> +		return -ENOMEM;
>> +
>> +	irqfd->kvm = kvm;
>> +	irqfd->gsi = gsi;
>> +	INIT_LIST_HEAD(&irqfd->list);
>> +	INIT_WORK(&irqfd->work, irqfd_inject);
>> +
>> +	/*
>> +	 * Embed the file* lifetime in the irqfd.
>> +	 */
>> +	file = fget(fd);
>> +	if (IS_ERR(file)) {
>> +		ret = PTR_ERR(file);
>> +		goto fail;
>> +	}
>> +
>> +	/*
>> +	 * Install our own custom wake-up handling so we are notified via
>> +	 * a callback whenever someone signals the underlying eventfd
>> +	 */
>> +	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
>> +	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
>> +
>> +	ret = file->f_op->poll(file, &irqfd->pt);
>> +	if (ret < 0)
>> +		goto fail;
>> +
>> +	irqfd->file = file;
>> +
>> +	mutex_lock(&kvm->lock);
>> +	list_add_tail(&irqfd->list, &kvm->irqfds);
>> +	mutex_unlock(&kvm->lock);
>> +
>> +	return 0;
>> +
>> +fail:
>> +	if (irqfd->wqh)
>> +		remove_wait_queue(irqfd->wqh, &irqfd->wait);
>> +
>> +	if (file && !IS_ERR(file))
>> +		fput(file);
>> +
>> +	kfree(irqfd);
>> +	return ret;
>> +}
>>     
>
> It seems that this lets the guest assign an unlimited number of fds
> to the same gsi, potentially using up all of kernel memory.
>
> Since we don't need multiple fds assigned to the same gsi (instead,
> multiple processes can write to the same eventfd to trigger an
> interrupt) let's simply check that no fd is yet assigned to this gsi.
>   

I think Avi asked for this specific feature during review which is the
reason why its there today.  However, I agree that it would probably be
a good idea to put an upper limit on the number of supported aliases
that can be registered.  Will fix.

Thanks Michael,

-Greg



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-06-14 12:40   ` Gregory Haskins
@ 2009-06-14 13:19     ` Michael S. Tsirkin
  2009-06-14 13:23       ` Avi Kivity
  0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2009-06-14 13:19 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, avi, davidel, mtosatti

On Sun, Jun 14, 2009 at 08:40:57AM -0400, Gregory Haskins wrote:
> Michael S. Tsirkin wrote:
> > On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:
> >
> > ...
> >
> >   
> >> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> >> +static int
> >> +kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi)
> >> +{
> >> +	struct _irqfd *irqfd;
> >> +	struct file *file = NULL;
> >> +	int ret;
> >> +
> >> +	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> >> +	if (!irqfd)
> >> +		return -ENOMEM;
> >> +
> >> +	irqfd->kvm = kvm;
> >> +	irqfd->gsi = gsi;
> >> +	INIT_LIST_HEAD(&irqfd->list);
> >> +	INIT_WORK(&irqfd->work, irqfd_inject);
> >> +
> >> +	/*
> >> +	 * Embed the file* lifetime in the irqfd.
> >> +	 */
> >> +	file = fget(fd);
> >> +	if (IS_ERR(file)) {
> >> +		ret = PTR_ERR(file);
> >> +		goto fail;
> >> +	}
> >> +
> >> +	/*
> >> +	 * Install our own custom wake-up handling so we are notified via
> >> +	 * a callback whenever someone signals the underlying eventfd
> >> +	 */
> >> +	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
> >> +	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
> >> +
> >> +	ret = file->f_op->poll(file, &irqfd->pt);
> >> +	if (ret < 0)
> >> +		goto fail;
> >> +
> >> +	irqfd->file = file;
> >> +
> >> +	mutex_lock(&kvm->lock);
> >> +	list_add_tail(&irqfd->list, &kvm->irqfds);
> >> +	mutex_unlock(&kvm->lock);
> >> +
> >> +	return 0;
> >> +
> >> +fail:
> >> +	if (irqfd->wqh)
> >> +		remove_wait_queue(irqfd->wqh, &irqfd->wait);
> >> +
> >> +	if (file && !IS_ERR(file))
> >> +		fput(file);
> >> +
> >> +	kfree(irqfd);
> >> +	return ret;
> >> +}
> >>     
> >
> > It seems that this lets the guest assign an unlimited number of fds
> > to the same gsi, potentially using up all of kernel memory.
> >
> > Since we don't need multiple fds assigned to the same gsi (instead,
> > multiple processes can write to the same eventfd to trigger an
> > interrupt) let's simply check that no fd is yet assigned to this gsi.
> >   
> 
> I think Avi asked for this specific feature during review which is the
> reason why its there today.  However, I agree that it would probably be
> a good idea to put an upper limit on the number of supported aliases
> that can be registered.  Will fix.
> 
> Thanks Michael,
> 
> -Greg
> 
> 


Avi, can you elaborate on why do we want to map multiple fds
to the same gsi? I think it's better to allow a 1:1 mapping
only: if many processes want to trigger interrupts they can
all write to the same fd.

-- 
MST

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-06-14 12:25     ` Gregory Haskins
@ 2009-06-14 13:20       ` Michael S. Tsirkin
  0 siblings, 0 replies; 28+ messages in thread
From: Michael S. Tsirkin @ 2009-06-14 13:20 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, avi, davidel, mtosatti

On Sun, Jun 14, 2009 at 08:25:43AM -0400, Gregory Haskins wrote:
> Michael S. Tsirkin wrote:
> > On Thu, Jun 11, 2009 at 04:16:47PM +0300, Michael S. Tsirkin wrote:
> >   
> >>> +
> >>> +	ret = file->f_op->poll(file, &irqfd->pt);
> >>> +	if (ret < 0)
> >>> +		goto fail;
> >>>       
> >
> > Looking at it some more, we have:
> > struct file_operations {
> > ....
> > 	unsigned int (*poll) (struct file *, struct poll_table_struct *);
> >
> > So the comparison above does not seem to make sense:
> > it seems that the return value from poll can not be negative.
> >   
> 
> Indeed.  Will fix.
> > Will the callback be executed if someone did a write to eventfd
> > before we attached it? If no, maybe we should call it here
> > if ret != 0.
> >   
> 
> I do the cleanup in case the callback has been called, but poll() fails
> somewhere internally afterwards.  Perhaps this is not a realistic
> scenario, but it was my motivation for adding the wqh cleanup.

Could it be that poll returns the event mask and you mistake it for
error?

> >   
> >>> +
> >>> +	irqfd->file = file;
> >>> +
> >>> +	mutex_lock(&kvm->lock);
> >>> +	list_add_tail(&irqfd->list, &kvm->irqfds);
> >>> +	mutex_unlock(&kvm->lock);
> >>> +
> >>> +	return 0;
> >>> +
> >>> +fail:
> >>> +	if (irqfd->wqh)
> >>> +		remove_wait_queue(irqfd->wqh, &irqfd->wait);
> >>>       
> >> Why are these 2 lines here? Either we might get a callback even though
> >> poll failed - and then this test without lock is probably racy -
> >> or we can't, and then we can replace the above with BUG_ON(irqfd->wqh).
> >>
> >> Which is it? I think the later ...
> >>
> >>
> >>     
> >>> +
> >>> +	if (file && !IS_ERR(file))
> >>> +		fput(file);
> >>> +
> >>> +	kfree(irqfd);
> >>> +	return ret;
> >>> +}
> >>> +
> >>>       
> 
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-06-14 13:19     ` Michael S. Tsirkin
@ 2009-06-14 13:23       ` Avi Kivity
  2009-06-14 13:30         ` Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Avi Kivity @ 2009-06-14 13:23 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Gregory Haskins, kvm, linux-kernel, davidel, mtosatti

Michael S. Tsirkin wrote:

  

>> I think Avi asked for this specific feature during review which is the
>> reason why its there today.  However, I agree that it would probably be
>> a good idea to put an upper limit on the number of supported aliases
>> that can be registered.  Will fix.
>>
>> Thanks Michael,
>>
>> -Greg
>>
>>
>>     
>
>
> Avi, can you elaborate on why do we want to map multiple fds
> to the same gsi? I think it's better to allow a 1:1 mapping
> only: if many processes want to trigger interrupts they can
> all write to the same fd.
>   

I don't want to assume that the eventfds all come from the same source.

That said, we have a workaround, allocate a new gsi with the same routes 
and attach the excess eventfds there.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-06-14 13:23       ` Avi Kivity
@ 2009-06-14 13:30         ` Michael S. Tsirkin
  2009-06-14 13:40           ` Avi Kivity
  0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2009-06-14 13:30 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gregory Haskins, kvm, linux-kernel, davidel, mtosatti

On Sun, Jun 14, 2009 at 04:23:36PM +0300, Avi Kivity wrote:
> Michael S. Tsirkin wrote:
>
>  
>
>>> I think Avi asked for this specific feature during review which is the
>>> reason why its there today.  However, I agree that it would probably be
>>> a good idea to put an upper limit on the number of supported aliases
>>> that can be registered.  Will fix.
>>>
>>> Thanks Michael,
>>>
>>> -Greg
>>>
>>>
>>>     
>>
>>
>> Avi, can you elaborate on why do we want to map multiple fds
>> to the same gsi? I think it's better to allow a 1:1 mapping
>> only: if many processes want to trigger interrupts they can
>> all write to the same fd.
>>   
>
> I don't want to assume that the eventfds all come from the same source.
>
> That said, we have a workaround, allocate a new gsi with the same routes  
> and attach the excess eventfds there.

Right. So you are ok with 1:1 irqfd:gsi requirement for now?
This seems nicer than N:1 with an arbitrary N.

-- 
MST

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-06-14 13:30         ` Michael S. Tsirkin
@ 2009-06-14 13:40           ` Avi Kivity
  2009-06-14 13:50             ` Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Avi Kivity @ 2009-06-14 13:40 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Gregory Haskins, kvm, linux-kernel, davidel, mtosatti

Michael S. Tsirkin wrote:

  

>> I don't want to assume that the eventfds all come from the same source.
>>
>> That said, we have a workaround, allocate a new gsi with the same routes  
>> and attach the excess eventfds there.
>>     
>
> Right. So you are ok with 1:1 irqfd:gsi requirement for now?
> This seems nicer than N:1 with an arbitrary N

Not too happy, but okay.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [KVM PATCH v10] kvm: add support for irqfd
  2009-06-14 13:40           ` Avi Kivity
@ 2009-06-14 13:50             ` Michael S. Tsirkin
  0 siblings, 0 replies; 28+ messages in thread
From: Michael S. Tsirkin @ 2009-06-14 13:50 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gregory Haskins, kvm, linux-kernel, davidel, mtosatti

On Sun, Jun 14, 2009 at 04:40:11PM +0300, Avi Kivity wrote:
> Michael S. Tsirkin wrote:
>
>  
>
>>> I don't want to assume that the eventfds all come from the same source.
>>>
>>> That said, we have a workaround, allocate a new gsi with the same 
>>> routes  and attach the excess eventfds there.
>>>     
>>
>> Right. So you are ok with 1:1 irqfd:gsi requirement for now?
>> This seems nicer than N:1 with an arbitrary N
>
> Not too happy, but okay.

Is the answer 42:1 then, as usual?


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2009-06-14 13:50 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-20 14:30 [KVM PATCH v10] kvm: add support for irqfd Gregory Haskins
2009-05-20 14:35 ` Avi Kivity
2009-05-26 16:42 ` Michael S. Tsirkin
2009-05-26 18:05   ` Gregory Haskins
2009-05-26 20:00   ` Davide Libenzi
2009-05-27 13:55 ` Michael S. Tsirkin
2009-05-27 14:06   ` Gregory Haskins
2009-05-27 14:36     ` [PATCH 0/2] kvm: validate irqfd type Gregory Haskins
2009-05-27 14:37       ` [PATCH 1/2] eventfd: export eventfd interfaces for module use Gregory Haskins
2009-05-27 14:37       ` [PATCH 2/2] kvm: validate irqfd type Gregory Haskins
2009-05-27 15:06       ` [PATCH 0/2] " Gregory Haskins
2009-05-31  9:36       ` Avi Kivity
2009-05-27 18:41     ` [KVM PATCH v10] kvm: add support for irqfd Michael S. Tsirkin
2009-05-27 19:28       ` Davide Libenzi
2009-05-27 20:07       ` Gregory Haskins
2009-05-27 20:43         ` Michael S. Tsirkin
2009-05-27 20:46           ` Gregory Haskins
2009-06-11 13:16 ` Michael S. Tsirkin
2009-06-11 13:36   ` Michael S. Tsirkin
2009-06-14 12:25     ` Gregory Haskins
2009-06-14 13:20       ` Michael S. Tsirkin
2009-06-14  9:25 ` Michael S. Tsirkin
2009-06-14 12:40   ` Gregory Haskins
2009-06-14 13:19     ` Michael S. Tsirkin
2009-06-14 13:23       ` Avi Kivity
2009-06-14 13:30         ` Michael S. Tsirkin
2009-06-14 13:40           ` Avi Kivity
2009-06-14 13:50             ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.