All of lore.kernel.org
 help / color / mirror / Atom feed
* [KVM PATCH v2 0/2] irqfd
@ 2009-04-24  4:25 Gregory Haskins
  2009-04-24  4:25 ` [KVM PATCH v2 1/2] eventfd: export fget and signal interfaces for module use Gregory Haskins
  2009-04-24  4:25 ` [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface Gregory Haskins
  0 siblings, 2 replies; 20+ messages in thread
From: Gregory Haskins @ 2009-04-24  4:25 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi, davidel

(Applies to kvm.git b59cd3560111)

This series implements a mechanism called "irqfd".  It lets you create
an eventfd based file-desriptor to inject interrupts to the guest.  We
associate one gsi per fd for proper routing granularity.

This is v2.  Changes since v1

*) Dropped notifier_chain based callbacks in favor of wait_queue_t::func
   and file::poll based callbacks (Thanks to Davide for the suggestion)

--------

We do not have a user of this interface in this series, though note
future version of virtual-bus (v4 and above) will be based on this.

The first patch will require mainline buy-in, particularly from Davide
(cc'd).  The last patch is kvm specific.

kvm-userspace.git patch to follow.

-Greg

---

Gregory Haskins (2):
      kvm: add support for irqfd via eventfd-notification interface
      eventfd: export fget and signal interfaces for module use


 arch/x86/kvm/Makefile    |    2 -
 arch/x86/kvm/x86.c       |    1 
 fs/eventfd.c             |    3 +
 include/linux/kvm.h      |    7 ++
 include/linux/kvm_host.h |    7 ++
 virt/kvm/irqfd.c         |  146 ++++++++++++++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c      |   12 ++++
 7 files changed, 177 insertions(+), 1 deletions(-)
 create mode 100644 virt/kvm/irqfd.c

-- 
Signature

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [KVM PATCH v2 1/2] eventfd: export fget and signal interfaces for module use
  2009-04-24  4:25 [KVM PATCH v2 0/2] irqfd Gregory Haskins
@ 2009-04-24  4:25 ` Gregory Haskins
  2009-04-24  4:25 ` [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface Gregory Haskins
  1 sibling, 0 replies; 20+ messages in thread
From: Gregory Haskins @ 2009-04-24  4:25 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi, davidel

We will use this later in the series

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

 fs/eventfd.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 2a701d5..3f0e197 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -16,6 +16,7 @@
 #include <linux/anon_inodes.h>
 #include <linux/eventfd.h>
 #include <linux/syscalls.h>
+#include <linux/module.h>
 
 struct eventfd_ctx {
 	wait_queue_head_t wqh;
@@ -56,6 +57,7 @@ int eventfd_signal(struct file *file, int n)
 
 	return n;
 }
+EXPORT_SYMBOL_GPL(eventfd_signal);
 
 static int eventfd_release(struct inode *inode, struct file *file)
 {
@@ -197,6 +199,7 @@ struct file *eventfd_fget(int fd)
 
 	return file;
 }
+EXPORT_SYMBOL_GPL(eventfd_fget);
 
 SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
 {


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
  2009-04-24  4:25 [KVM PATCH v2 0/2] irqfd Gregory Haskins
  2009-04-24  4:25 ` [KVM PATCH v2 1/2] eventfd: export fget and signal interfaces for module use Gregory Haskins
@ 2009-04-24  4:25 ` Gregory Haskins
  2009-04-24 17:07   ` Gregory Haskins
  2009-04-27  8:55   ` Avi Kivity
  1 sibling, 2 replies; 20+ messages in thread
From: Gregory Haskins @ 2009-04-24  4:25 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, avi, davidel

This allows an eventfd to be registered as an irq source with a guest.  Any
signaling operation on the eventfd (via userspace or kernel) will inject
the registered GSI at the next available window.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

 arch/x86/kvm/Makefile    |    2 -
 arch/x86/kvm/x86.c       |    1 
 include/linux/kvm.h      |    7 ++
 include/linux/kvm_host.h |    7 ++
 virt/kvm/irqfd.c         |  146 ++++++++++++++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c      |   12 ++++
 6 files changed, 174 insertions(+), 1 deletions(-)
 create mode 100644 virt/kvm/irqfd.c

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index b43c4ef..d5fff51 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -3,7 +3,7 @@
 #
 
 common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
-                coalesced_mmio.o irq_comm.o)
+                coalesced_mmio.o irq_comm.o irqfd.o)
 ifeq ($(CONFIG_KVM_TRACE),y)
 common-objs += $(addprefix ../../../virt/kvm/, kvm_trace.o)
 endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8cb8542..88c45e6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1027,6 +1027,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_REINJECT_CONTROL:
 	case KVM_CAP_IRQ_INJECT_STATUS:
 	case KVM_CAP_ASSIGN_DEV_IRQ:
+	case KVM_CAP_IRQFD:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 3db5d8d..2404775 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -415,6 +415,7 @@ struct kvm_trace_rec {
 #define KVM_CAP_ASSIGN_DEV_IRQ 29
 /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */
 #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30
+#define KVM_CAP_IRQFD 31
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -454,6 +455,11 @@ struct kvm_irq_routing {
 
 #endif
 
+struct kvm_irqfd {
+	__u32 fd;
+	__u32 gsi;
+};
+
 /*
  * ioctls for VM fds
  */
@@ -498,6 +504,7 @@ struct kvm_irq_routing {
 #define KVM_ASSIGN_SET_MSIX_ENTRY \
 			_IOW(KVMIO, 0x74, struct kvm_assigned_msix_entry)
 #define KVM_DEASSIGN_DEV_IRQ       _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
+#define KVM_ASSIGN_IRQFD          _IOW(KVMIO, 0x76, struct kvm_irqfd)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 095ebb6..57f8dfe 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -134,6 +134,10 @@ struct kvm {
 	struct list_head vm_list;
 	struct kvm_io_bus mmio_bus;
 	struct kvm_io_bus pio_bus;
+	struct {
+		struct list_head items;
+		int              src;
+	} irqfd;
 	struct kvm_vm_stat stat;
 	struct kvm_arch arch;
 	atomic_t users_count;
@@ -524,4 +528,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
 
 #endif
 
+int kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi);
+void kvm_irqfd_release(struct kvm *kvm);
+
 #endif
diff --git a/virt/kvm/irqfd.c b/virt/kvm/irqfd.c
new file mode 100644
index 0000000..d2e0915
--- /dev/null
+++ b/virt/kvm/irqfd.c
@@ -0,0 +1,146 @@
+/*
+ * irqfd: Allows an eventfd to be used to inject an interrupt to the guest
+ *
+ * Credit goes to Avi Kivity for the original idea.
+ *
+ * Copyright 2009 Novell.  All Rights Reserved.
+ *
+ * Author:
+ *	Gregory Haskins <ghaskins@novell.com>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/eventfd.h>
+#include <linux/workqueue.h>
+#include <linux/wait.h>
+#include <linux/poll.h>
+#include <linux/file.h>
+#include <linux/list.h>
+
+struct _irqfd {
+	struct kvm               *kvm;
+	int                       gsi;
+	struct file              *file;
+	struct list_head          list;
+	poll_table                pt;
+	wait_queue_head_t        *wqh;
+	wait_queue_t              wait;
+	struct work_struct        work;
+};
+
+static void
+irqfd_inject(struct work_struct *work)
+{
+	struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
+	struct kvm *kvm = irqfd->kvm;
+
+	mutex_lock(&kvm->lock);
+	kvm_set_irq(kvm, kvm->irqfd.src, irqfd->gsi, 1);
+	mutex_unlock(&kvm->lock);
+}
+
+static int
+irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
+{
+	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
+
+	/*
+	 * The eventfd calls its wake_up with interrupts disabled,
+	 * so we need to defer the IRQ injection until later since we need
+	 * to acquire the kvm->lock to do so.
+	 */
+	schedule_work(&irqfd->work);
+
+	return 0;
+}
+
+static void
+irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
+			poll_table *pt)
+{
+	struct _irqfd *irqfd = container_of(pt, struct _irqfd, pt);
+
+	irqfd->wqh = wqh;
+	add_wait_queue(wqh, &irqfd->wait);
+}
+
+int
+kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
+{
+	struct _irqfd *irqfd;
+	struct file *file;
+	int ret;
+
+	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
+	if (!irqfd)
+		return -ENOMEM;
+
+	irqfd->kvm = kvm;
+	irqfd->gsi = gsi;
+	INIT_LIST_HEAD(&irqfd->list);
+	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
+	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
+	INIT_WORK(&irqfd->work, irqfd_inject);
+
+	file = eventfd_fget(fd);
+	if (IS_ERR(file)) {
+		ret = PTR_ERR(file);
+		goto fail;
+	}
+
+	ret = file->f_op->poll(file, &irqfd->pt);
+	/* do we need to look for errors in ret? */
+
+	irqfd->file = file;
+
+	mutex_lock(&kvm->lock);
+	if (kvm->irqfd.src == -1) {
+		ret = kvm_request_irq_source_id(kvm);
+		BUG_ON(ret < 0);
+
+		kvm->irqfd.src = ret;
+	}
+
+	list_add_tail(&irqfd->list, &kvm->irqfd.items);
+
+	mutex_unlock(&kvm->lock);
+
+	return 0;
+
+fail:
+	kfree(irqfd);
+	return ret;
+}
+
+void
+kvm_irqfd_release(struct kvm *kvm)
+{
+	struct _irqfd *irqfd, *tmp;
+
+	list_for_each_entry_safe(irqfd, tmp, &kvm->irqfd.items, list) {
+		remove_wait_queue(irqfd->wqh, &irqfd->wait);
+
+		flush_work(&irqfd->work);
+		fput(irqfd->file);
+
+		list_del(&irqfd->list);
+		kfree(irqfd);
+	}
+
+	if (kvm->irqfd.src != -1)
+		kvm_free_irq_source_id(kvm, kvm->irqfd.src);
+
+}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3265566..c822d79 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -972,6 +972,8 @@ static struct kvm *kvm_create_vm(void)
 	atomic_inc(&kvm->mm->mm_count);
 	spin_lock_init(&kvm->mmu_lock);
 	kvm_io_bus_init(&kvm->pio_bus);
+	INIT_LIST_HEAD(&kvm->irqfd.items);
+	kvm->irqfd.src = -1;
 	mutex_init(&kvm->lock);
 	kvm_io_bus_init(&kvm->mmio_bus);
 	init_rwsem(&kvm->slots_lock);
@@ -1023,6 +1025,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	spin_lock(&kvm_lock);
 	list_del(&kvm->vm_list);
 	spin_unlock(&kvm_lock);
+	kvm_irqfd_release(kvm);
 	kvm_free_irq_routing(kvm);
 	kvm_io_bus_destroy(&kvm->pio_bus);
 	kvm_io_bus_destroy(&kvm->mmio_bus);
@@ -2197,6 +2200,15 @@ static long kvm_vm_ioctl(struct file *filp,
 	}
 #endif
 #endif /* KVM_CAP_IRQ_ROUTING */
+	case KVM_ASSIGN_IRQFD: {
+		struct kvm_irqfd data;
+
+		r = -EFAULT;
+		if (copy_from_user(&data, argp, sizeof data))
+			goto out;
+		r = kvm_irqfd_assign(kvm, data.fd, data.gsi);
+		break;
+	}
 	default:
 		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
 	}


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
  2009-04-24  4:25 ` [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface Gregory Haskins
@ 2009-04-24 17:07   ` Gregory Haskins
  2009-04-24 17:47     ` Davide Libenzi
  2009-04-27  8:55   ` Avi Kivity
  1 sibling, 1 reply; 20+ messages in thread
From: Gregory Haskins @ 2009-04-24 17:07 UTC (permalink / raw)
  To: davidel; +Cc: kvm, linux-kernel, avi

[-- Attachment #1: Type: text/plain, Size: 9055 bytes --]

Hi David,
   See inline for an answer to your last question:

Gregory Haskins wrote:
> This allows an eventfd to be registered as an irq source with a guest.  Any
> signaling operation on the eventfd (via userspace or kernel) will inject
> the registered GSI at the next available window.
>
> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
> ---
>
>  arch/x86/kvm/Makefile    |    2 -
>  arch/x86/kvm/x86.c       |    1 
>  include/linux/kvm.h      |    7 ++
>  include/linux/kvm_host.h |    7 ++
>  virt/kvm/irqfd.c         |  146 ++++++++++++++++++++++++++++++++++++++++++++++
>  virt/kvm/kvm_main.c      |   12 ++++
>  6 files changed, 174 insertions(+), 1 deletions(-)
>  create mode 100644 virt/kvm/irqfd.c
>
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index b43c4ef..d5fff51 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -3,7 +3,7 @@
>  #
>  
>  common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
> -                coalesced_mmio.o irq_comm.o)
> +                coalesced_mmio.o irq_comm.o irqfd.o)
>  ifeq ($(CONFIG_KVM_TRACE),y)
>  common-objs += $(addprefix ../../../virt/kvm/, kvm_trace.o)
>  endif
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8cb8542..88c45e6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1027,6 +1027,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	case KVM_CAP_REINJECT_CONTROL:
>  	case KVM_CAP_IRQ_INJECT_STATUS:
>  	case KVM_CAP_ASSIGN_DEV_IRQ:
> +	case KVM_CAP_IRQFD:
>  		r = 1;
>  		break;
>  	case KVM_CAP_COALESCED_MMIO:
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 3db5d8d..2404775 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -415,6 +415,7 @@ struct kvm_trace_rec {
>  #define KVM_CAP_ASSIGN_DEV_IRQ 29
>  /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */
>  #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30
> +#define KVM_CAP_IRQFD 31
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> @@ -454,6 +455,11 @@ struct kvm_irq_routing {
>  
>  #endif
>  
> +struct kvm_irqfd {
> +	__u32 fd;
> +	__u32 gsi;
> +};
> +
>  /*
>   * ioctls for VM fds
>   */
> @@ -498,6 +504,7 @@ struct kvm_irq_routing {
>  #define KVM_ASSIGN_SET_MSIX_ENTRY \
>  			_IOW(KVMIO, 0x74, struct kvm_assigned_msix_entry)
>  #define KVM_DEASSIGN_DEV_IRQ       _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
> +#define KVM_ASSIGN_IRQFD          _IOW(KVMIO, 0x76, struct kvm_irqfd)
>  
>  /*
>   * ioctls for vcpu fds
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 095ebb6..57f8dfe 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -134,6 +134,10 @@ struct kvm {
>  	struct list_head vm_list;
>  	struct kvm_io_bus mmio_bus;
>  	struct kvm_io_bus pio_bus;
> +	struct {
> +		struct list_head items;
> +		int              src;
> +	} irqfd;
>  	struct kvm_vm_stat stat;
>  	struct kvm_arch arch;
>  	atomic_t users_count;
> @@ -524,4 +528,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
>  
>  #endif
>  
> +int kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi);
> +void kvm_irqfd_release(struct kvm *kvm);
> +
>  #endif
> diff --git a/virt/kvm/irqfd.c b/virt/kvm/irqfd.c
> new file mode 100644
> index 0000000..d2e0915
> --- /dev/null
> +++ b/virt/kvm/irqfd.c
> @@ -0,0 +1,146 @@
> +/*
> + * irqfd: Allows an eventfd to be used to inject an interrupt to the guest
> + *
> + * Credit goes to Avi Kivity for the original idea.
> + *
> + * Copyright 2009 Novell.  All Rights Reserved.
> + *
> + * Author:
> + *	Gregory Haskins <ghaskins@novell.com>
> + *
> + * This file is free software; you can redistribute it and/or modify
> + * it under the terms of version 2 of the GNU General Public License
> + * as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software Foundation,
> + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
> + */
> +
> +#include <linux/kvm_host.h>
> +#include <linux/eventfd.h>
> +#include <linux/workqueue.h>
> +#include <linux/wait.h>
> +#include <linux/poll.h>
> +#include <linux/file.h>
> +#include <linux/list.h>
> +
> +struct _irqfd {
> +	struct kvm               *kvm;
> +	int                       gsi;
> +	struct file              *file;
> +	struct list_head          list;
> +	poll_table                pt;
> +	wait_queue_head_t        *wqh;
> +	wait_queue_t              wait;
> +	struct work_struct        work;
> +};
> +
> +static void
> +irqfd_inject(struct work_struct *work)
> +{
> +	struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
> +	struct kvm *kvm = irqfd->kvm;
> +
> +	mutex_lock(&kvm->lock);
> +	kvm_set_irq(kvm, kvm->irqfd.src, irqfd->gsi, 1);
> +	mutex_unlock(&kvm->lock);
> +}
> +
> +static int
> +irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
> +{
> +	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
> +
> +	/*
> +	 * The eventfd calls its wake_up with interrupts disabled,
> +	 * so we need to defer the IRQ injection until later since we need
> +	 * to acquire the kvm->lock to do so.
> +	 */
> +	schedule_work(&irqfd->work);
> +
> +	return 0;
> +}
> +
> +static void
> +irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
> +			poll_table *pt)
> +{
> +	struct _irqfd *irqfd = container_of(pt, struct _irqfd, pt);
> +
> +	irqfd->wqh = wqh;
> +	add_wait_queue(wqh, &irqfd->wait);
> +}
> +
> +int
> +kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
> +{
> +	struct _irqfd *irqfd;
> +	struct file *file;
> +	int ret;
> +
> +	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> +	if (!irqfd)
> +		return -ENOMEM;
> +
> +	irqfd->kvm = kvm;
> +	irqfd->gsi = gsi;
> +	INIT_LIST_HEAD(&irqfd->list);
> +	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
> +	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
> +	INIT_WORK(&irqfd->work, irqfd_inject);
> +
> +	file = eventfd_fget(fd);
>   

I grab the fget() reference here, and I hold it until after I have
already called remove_wait_queue()...

> +	if (IS_ERR(file)) {
> +		ret = PTR_ERR(file);
> +		goto fail;
> +	}
> +
> +	ret = file->f_op->poll(file, &irqfd->pt);
> +	/* do we need to look for errors in ret? */
> +
> +	irqfd->file = file;
> +
> +	mutex_lock(&kvm->lock);
> +	if (kvm->irqfd.src == -1) {
> +		ret = kvm_request_irq_source_id(kvm);
> +		BUG_ON(ret < 0);
> +
> +		kvm->irqfd.src = ret;
> +	}
> +
> +	list_add_tail(&irqfd->list, &kvm->irqfd.items);
> +
> +	mutex_unlock(&kvm->lock);
> +
> +	return 0;
> +
> +fail:
> +	kfree(irqfd);
> +	return ret;
> +}
> +
> +void
> +kvm_irqfd_release(struct kvm *kvm)
> +{
> +	struct _irqfd *irqfd, *tmp;
> +
> +	list_for_each_entry_safe(irqfd, tmp, &kvm->irqfd.items, list) {
> +		remove_wait_queue(irqfd->wqh, &irqfd->wait);
> +
> +		flush_work(&irqfd->work);
> +		fput(irqfd->file);
>   

... here.

Please let me know if you see any holes in that approach.

-Greg


> +
> +		list_del(&irqfd->list);
> +		kfree(irqfd);
> +	}
> +
> +	if (kvm->irqfd.src != -1)
> +		kvm_free_irq_source_id(kvm, kvm->irqfd.src);
> +
> +}
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 3265566..c822d79 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -972,6 +972,8 @@ static struct kvm *kvm_create_vm(void)
>  	atomic_inc(&kvm->mm->mm_count);
>  	spin_lock_init(&kvm->mmu_lock);
>  	kvm_io_bus_init(&kvm->pio_bus);
> +	INIT_LIST_HEAD(&kvm->irqfd.items);
> +	kvm->irqfd.src = -1;
>  	mutex_init(&kvm->lock);
>  	kvm_io_bus_init(&kvm->mmio_bus);
>  	init_rwsem(&kvm->slots_lock);
> @@ -1023,6 +1025,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
>  	spin_lock(&kvm_lock);
>  	list_del(&kvm->vm_list);
>  	spin_unlock(&kvm_lock);
> +	kvm_irqfd_release(kvm);
>  	kvm_free_irq_routing(kvm);
>  	kvm_io_bus_destroy(&kvm->pio_bus);
>  	kvm_io_bus_destroy(&kvm->mmio_bus);
> @@ -2197,6 +2200,15 @@ static long kvm_vm_ioctl(struct file *filp,
>  	}
>  #endif
>  #endif /* KVM_CAP_IRQ_ROUTING */
> +	case KVM_ASSIGN_IRQFD: {
> +		struct kvm_irqfd data;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&data, argp, sizeof data))
> +			goto out;
> +		r = kvm_irqfd_assign(kvm, data.fd, data.gsi);
> +		break;
> +	}
>  	default:
>  		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
>  	}
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>   



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
  2009-04-24 17:07   ` Gregory Haskins
@ 2009-04-24 17:47     ` Davide Libenzi
  0 siblings, 0 replies; 20+ messages in thread
From: Davide Libenzi @ 2009-04-24 17:47 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, Linux Kernel Mailing List, avi

On Fri, 24 Apr 2009, Gregory Haskins wrote:

> Hi David,
>    See inline for an answer to your last question:

Ok, so basically the eventfd lifetime is embedded inside the irqfd 
lifetime. Looks OK to me.



- Davide



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
  2009-04-24  4:25 ` [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface Gregory Haskins
  2009-04-24 17:07   ` Gregory Haskins
@ 2009-04-27  8:55   ` Avi Kivity
  2009-04-27 10:35     ` Gregory Haskins
  1 sibling, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2009-04-27  8:55 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, davidel

Gregory Haskins wrote:
> This allows an eventfd to be registered as an irq source with a guest.  Any
> signaling operation on the eventfd (via userspace or kernel) will inject
> the registered GSI at the next available window.
>
>  
> +struct kvm_irqfd {
> +	__u32 fd;
> +	__u32 gsi;
> +};
> +
>   

I think it's better to have ioctl create and return the fd.  This way we 
aren't tied to eventfd (though it makes a lot of sense to use it).

Also, please add a flags field and some padding so we can extend it later.

> +
> +#include <linux/kvm_host.h>
> +#include <linux/eventfd.h>
> +#include <linux/workqueue.h>
> +#include <linux/wait.h>
> +#include <linux/poll.h>
> +#include <linux/file.h>
> +#include <linux/list.h>
> +
> +struct _irqfd {
> +	struct kvm               *kvm;
> +	int                       gsi;
> +	struct file              *file;
> +	struct list_head          list;
> +	poll_table                pt;
> +	wait_queue_head_t        *wqh;
> +	wait_queue_t              wait;
> +	struct work_struct        work;
> +};
> +
> +static void
> +irqfd_inject(struct work_struct *work)
> +{
> +	struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
> +	struct kvm *kvm = irqfd->kvm;
> +
> +	mutex_lock(&kvm->lock);
> +	kvm_set_irq(kvm, kvm->irqfd.src, irqfd->gsi, 1);
>   

Need to lower the irq too (though irqfd only supports edge triggered 
interrupts).

> +	mutex_unlock(&kvm->lock);
> +}
> +
> +static int
> +irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
> +{
> +	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
> +
> +	/*
> +	 * The eventfd calls its wake_up with interrupts disabled,
> +	 * so we need to defer the IRQ injection until later since we need
> +	 * to acquire the kvm->lock to do so.
> +	 */
> +	schedule_work(&irqfd->work);
> +
> +	return 0;
> +}
>   

One day we'll have lockless injection and we'll want to drop this.  I 
guess if we create the fd ourselves we can make it work, but I don't see 
how we can do this with eventfd.

> +int
> +kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
> +{
> +	struct _irqfd *irqfd;
> +	struct file *file;
> +	int ret;
> +
> +	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> +	if (!irqfd)
> +		return -ENOMEM;
> +
> +	irqfd->kvm = kvm;
> +	irqfd->gsi = gsi;
> +	INIT_LIST_HEAD(&irqfd->list);
> +	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
> +	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
> +	INIT_WORK(&irqfd->work, irqfd_inject);
> +
> +	file = eventfd_fget(fd);
> +	if (IS_ERR(file)) {
> +		ret = PTR_ERR(file);
> +		goto fail;
> +	}
> +
> +	ret = file->f_op->poll(file, &irqfd->pt);
> +	/* do we need to look for errors in ret? */
>   

Do we?

> +
> +	irqfd->file = file;
> +
> +	mutex_lock(&kvm->lock);
> +	if (kvm->irqfd.src == -1) {
> +		ret = kvm_request_irq_source_id(kvm);
> +		BUG_ON(ret < 0);
>   

I think you can reuse the userspace irq source (since it's just another 
way for userspace to inject an interrupt).  It isn't really needed since 
the irq source stuff is only needed to support level triggered interrupts.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
  2009-04-27  8:55   ` Avi Kivity
@ 2009-04-27 10:35     ` Gregory Haskins
  2009-04-27 10:48       ` Avi Kivity
  2009-04-27 10:58       ` Gregory Haskins
  0 siblings, 2 replies; 20+ messages in thread
From: Gregory Haskins @ 2009-04-27 10:35 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel, davidel

[-- Attachment #1: Type: text/plain, Size: 4176 bytes --]

Avi Kivity wrote:
> Gregory Haskins wrote:
>> This allows an eventfd to be registered as an irq source with a
>> guest.  Any
>> signaling operation on the eventfd (via userspace or kernel) will inject
>> the registered GSI at the next available window.
>>
>>  
>> +struct kvm_irqfd {
>> +    __u32 fd;
>> +    __u32 gsi;
>> +};
>> +
>>   
>
> I think it's better to have ioctl create and return the fd.  This way
> we aren't tied to eventfd (though it makes a lot of sense to use it).

I dont mind either way, but I am not sure it buys us much as the one
driving the fd would need to understand if the interface is
eventfd-esque or something else anyway.  Let me know if you still want
to see this changed.

>
> Also, please add a flags field and some padding so we can extend it
> later.
>

Good idea.  Will do.

>> +
>> +#include <linux/kvm_host.h>
>> +#include <linux/eventfd.h>
>> +#include <linux/workqueue.h>
>> +#include <linux/wait.h>
>> +#include <linux/poll.h>
>> +#include <linux/file.h>
>> +#include <linux/list.h>
>> +
>> +struct _irqfd {
>> +    struct kvm               *kvm;
>> +    int                       gsi;
>> +    struct file              *file;
>> +    struct list_head          list;
>> +    poll_table                pt;
>> +    wait_queue_head_t        *wqh;
>> +    wait_queue_t              wait;
>> +    struct work_struct        work;
>> +};
>> +
>> +static void
>> +irqfd_inject(struct work_struct *work)
>> +{
>> +    struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
>> +    struct kvm *kvm = irqfd->kvm;
>> +
>> +    mutex_lock(&kvm->lock);
>> +    kvm_set_irq(kvm, kvm->irqfd.src, irqfd->gsi, 1);
>>   
>
> Need to lower the irq too (though irqfd only supports edge triggered
> interrupts).
>
Should I just do back-to-back 1+0 inside the same lock?

>> +    mutex_unlock(&kvm->lock);
>> +}
>> +
>> +static int
>> +irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
>> +{
>> +    struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
>> +
>> +    /*
>> +     * The eventfd calls its wake_up with interrupts disabled,
>> +     * so we need to defer the IRQ injection until later since we need
>> +     * to acquire the kvm->lock to do so.
>> +     */
>> +    schedule_work(&irqfd->work);
>> +
>> +    return 0;
>> +}
>>   
>
> One day we'll have lockless injection and we'll want to drop this.  I
> guess if we create the fd ourselves we can make it work, but I don't
> see how we can do this with eventfd.
>

Hmm...this is a good point.  There probably is no way to use eventfd
"off the shelf" in a way that doesn't cause this callback to be in a
critical section.  Should we just worry about switching away from
eventfd when this occurs, or should I implement a custom anon-fd now?


>> +int
>> +kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
>> +{
>> +    struct _irqfd *irqfd;
>> +    struct file *file;
>> +    int ret;
>> +
>> +    irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
>> +    if (!irqfd)
>> +        return -ENOMEM;
>> +
>> +    irqfd->kvm = kvm;
>> +    irqfd->gsi = gsi;
>> +    INIT_LIST_HEAD(&irqfd->list);
>> +    init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
>> +    init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
>> +    INIT_WORK(&irqfd->work, irqfd_inject);
>> +
>> +    file = eventfd_fget(fd);
>> +    if (IS_ERR(file)) {
>> +        ret = PTR_ERR(file);
>> +        goto fail;
>> +    }
>> +
>> +    ret = file->f_op->poll(file, &irqfd->pt);
>> +    /* do we need to look for errors in ret? */
>>   
>
> Do we?

Probably.  Will fix in v3.

>
>> +
>> +    irqfd->file = file;
>> +
>> +    mutex_lock(&kvm->lock);
>> +    if (kvm->irqfd.src == -1) {
>> +        ret = kvm_request_irq_source_id(kvm);
>> +        BUG_ON(ret < 0);
>>   
>
> I think you can reuse the userspace irq source (since it's just
> another way for userspace to inject an interrupt).  It isn't really
> needed since the irq source stuff is only needed to support level
> triggered interrupts.
>
Ack, will do.

Thanks Avi,
-Greg




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
  2009-04-27 10:35     ` Gregory Haskins
@ 2009-04-27 10:48       ` Avi Kivity
  2009-04-27 13:27         ` Gregory Haskins
  2009-04-27 10:58       ` Gregory Haskins
  1 sibling, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2009-04-27 10:48 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, davidel

Gregory Haskins wrote:
>>> This allows an eventfd to be registered as an irq source with a
>>> guest.  Any
>>> signaling operation on the eventfd (via userspace or kernel) will inject
>>> the registered GSI at the next available window.
>>>
>>>  
>>> +struct kvm_irqfd {
>>> +    __u32 fd;
>>> +    __u32 gsi;
>>> +};
>>> +
>>>   
>>>       
>> I think it's better to have ioctl create and return the fd.  This way
>> we aren't tied to eventfd (though it makes a lot of sense to use it).
>>     
>
> I dont mind either way, but I am not sure it buys us much as the one
> driving the fd would need to understand if the interface is
> eventfd-esque or something else anyway.  Let me know if you still want
> to see this changed.
>   

Sure, the interface remains the same (write 8 bytes), but the 
implementation can change.  For example, we can implement it to work 
from interrupt context, once we hack the locking appropriately.


>>> +static void
>>> +irqfd_inject(struct work_struct *work)
>>> +{
>>> +    struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
>>> +    struct kvm *kvm = irqfd->kvm;
>>> +
>>> +    mutex_lock(&kvm->lock);
>>> +    kvm_set_irq(kvm, kvm->irqfd.src, irqfd->gsi, 1);
>>>   
>>>       
>> Need to lower the irq too (though irqfd only supports edge triggered
>> interrupts).
>>
>>     
> Should I just do back-to-back 1+0 inside the same lock?
>
>   

Yes.  Might be nice to add a kvm_toggle_irq(), but let's leave that 
until later.

  

>> One day we'll have lockless injection and we'll want to drop this.  I
>> guess if we create the fd ourselves we can make it work, but I don't
>> see how we can do this with eventfd.
>>
>>     
>
> Hmm...this is a good point.  There probably is no way to use eventfd
> "off the shelf" in a way that doesn't cause this callback to be in a
> critical section.  Should we just worry about switching away from
> eventfd when this occurs, or should I implement a custom anon-fd now?
>   

I'd just go with eventfd, and switch when it becomes relevant.  As long 
as the kernel allocates the fd, we're free to do as we like.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
  2009-04-27 10:35     ` Gregory Haskins
  2009-04-27 10:48       ` Avi Kivity
@ 2009-04-27 10:58       ` Gregory Haskins
  2009-04-27 11:23         ` Avi Kivity
  1 sibling, 1 reply; 20+ messages in thread
From: Gregory Haskins @ 2009-04-27 10:58 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel, davidel

[-- Attachment #1: Type: text/plain, Size: 1287 bytes --]

Gregory Haskins wrote:
> Avi Kivity wrote:
>   
>>
>> One day we'll have lockless injection and we'll want to drop this.  I
>> guess if we create the fd ourselves we can make it work, but I don't
>> see how we can do this with eventfd.
>>
>>     
>
> Hmm...this is a good point.  There probably is no way to use eventfd
> "off the shelf" in a way that doesn't cause this callback to be in a
> critical section.  Should we just worry about switching away from
> eventfd when this occurs, or should I implement a custom anon-fd now?
>   

Something else to consider:  eventfd supports calling eventfd_signal()
from interrupt context, so
even if the callback were not invoked from a preempt/irq off CS due to
the wqh->lock, the context may still be a CS anyway.  Now, userspace and
vbus based injection do not need to worry about this, but I wonder if
this is a  desirable attribute for some other source (device-assignment
perhaps)?

If so, is there a way to design the lockless-injection code such that
its still friendly to irq-context, or is this a mode that we would never
want to support anyway?  I think this would be a good thing to have at
least to maintain compatibility with the existing eventfd interface,
which has its own advantages.

-Greg



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
  2009-04-27 10:58       ` Gregory Haskins
@ 2009-04-27 11:23         ` Avi Kivity
  0 siblings, 0 replies; 20+ messages in thread
From: Avi Kivity @ 2009-04-27 11:23 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, davidel

Gregory Haskins wrote:
> Something else to consider:  eventfd supports calling eventfd_signal()
> from interrupt context, so
> even if the callback were not invoked from a preempt/irq off CS due to
> the wqh->lock, the context may still be a CS anyway.  Now, userspace and
> vbus based injection do not need to worry about this, but I wonder if
> this is a  desirable attribute for some other source (device-assignment
> perhaps)?
>   

I'm no networking expert, but device assignment certainly wants to queue 
an interrupt from an interrupt (basically it forwards the interrupt from 
host to guest).  Block is also simple enough to trigger the interrupt 
from the completion context.

For networking, we'd eventually want to do the host->guest copy using a 
dma engine, and we'd want to inject the interrupt from the dma engine's 
completion handler.


> If so, is there a way to design the lockless-injection code such that
> its still friendly to irq-context, or is this a mode that we would never
> want to support anyway?  I think this would be a good thing to have at
> least to maintain compatibility with the existing eventfd interface,
> which has its own advantages.
>   

This has RCU painted all over it in a 1200-point font.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
  2009-04-27 10:48       ` Avi Kivity
@ 2009-04-27 13:27         ` Gregory Haskins
  2009-04-28  9:35           ` Avi Kivity
  0 siblings, 1 reply; 20+ messages in thread
From: Gregory Haskins @ 2009-04-27 13:27 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel, davidel

[-- Attachment #1: Type: text/plain, Size: 2446 bytes --]

Avi Kivity wrote:
> Gregory Haskins wrote:
>>>> This allows an eventfd to be registered as an irq source with a
>>>> guest.  Any
>>>> signaling operation on the eventfd (via userspace or kernel) will
>>>> inject
>>>> the registered GSI at the next available window.
>>>>
>>>>  
>>>> +struct kvm_irqfd {
>>>> +    __u32 fd;
>>>> +    __u32 gsi;
>>>> +};
>>>> +
>>>>         
>>> I think it's better to have ioctl create and return the fd.  This way
>>> we aren't tied to eventfd (though it makes a lot of sense to use it).
>>>     
>>
>> I dont mind either way, but I am not sure it buys us much as the one
>> driving the fd would need to understand if the interface is
>> eventfd-esque or something else anyway.  Let me know if you still want
>> to see this changed.
>>   
>
> Sure, the interface remains the same (write 8 bytes), but the
> implementation can change.  For example, we can implement it to work
> from interrupt context, once we hack the locking appropriately.

I was thinking more along the lines of eventfd_signal().  AIO and vbus
currently use this interface, as opposed to the more polymorhpic
f_ops->write().

>
>
>>>> +static void
>>>> +irqfd_inject(struct work_struct *work)
>>>> +{
>>>> +    struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
>>>> +    struct kvm *kvm = irqfd->kvm;
>>>> +
>>>> +    mutex_lock(&kvm->lock);
>>>> +    kvm_set_irq(kvm, kvm->irqfd.src, irqfd->gsi, 1);
>>>>         
>>> Need to lower the irq too (though irqfd only supports edge triggered
>>> interrupts).
>>>
>>>     
>> Should I just do back-to-back 1+0 inside the same lock?
>>
>>   
>
> Yes.  Might be nice to add a kvm_toggle_irq(), but let's leave that
> until later.

Ok.

>
>
>  
>
>>> One day we'll have lockless injection and we'll want to drop this.  I
>>> guess if we create the fd ourselves we can make it work, but I don't
>>> see how we can do this with eventfd.
>>>
>>>     
>>
>> Hmm...this is a good point.  There probably is no way to use eventfd
>> "off the shelf" in a way that doesn't cause this callback to be in a
>> critical section.  Should we just worry about switching away from
>> eventfd when this occurs, or should I implement a custom anon-fd now?
>>   
>
> I'd just go with eventfd, and switch when it becomes relevant.  As
> long as the kernel allocates the fd, we're free to do as we like.

Sounds good.

-Greg



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
  2009-04-27 13:27         ` Gregory Haskins
@ 2009-04-28  9:35           ` Avi Kivity
  2009-04-28 10:34             ` Gregory Haskins
  0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2009-04-28  9:35 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, davidel

Gregory Haskins wrote:

  

>> Sure, the interface remains the same (write 8 bytes), but the
>> implementation can change.  For example, we can implement it to work
>> from interrupt context, once we hack the locking appropriately.
>>     
>
> I was thinking more along the lines of eventfd_signal().  AIO and vbus
> currently use this interface, as opposed to the more polymorhpic
> f_ops->write().
>
>   

But eventfd_signal basically marries us to eventfd.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
  2009-04-28  9:35           ` Avi Kivity
@ 2009-04-28 10:34             ` Gregory Haskins
  2009-04-28 11:00               ` Avi Kivity
  0 siblings, 1 reply; 20+ messages in thread
From: Gregory Haskins @ 2009-04-28 10:34 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel, davidel

[-- Attachment #1: Type: text/plain, Size: 1157 bytes --]

Avi Kivity wrote:
> Gregory Haskins wrote:
>
>  
>
>>> Sure, the interface remains the same (write 8 bytes), but the
>>> implementation can change.  For example, we can implement it to work
>>> from interrupt context, once we hack the locking appropriately.
>>>     
>>
>> I was thinking more along the lines of eventfd_signal().  AIO and vbus
>> currently use this interface, as opposed to the more polymorhpic
>> f_ops->write().
>>
>>   
>
> But eventfd_signal basically marries us to eventfd.

Well, only if we expect the fd to have eventfd semantics.  There are
advantages to doing so, as we have discussed, because things like AIO
can polymorhpically signal an interrupt without even knowing whats
behind the eventfd.  But this isn't a strict requirement to support
AIO.  Really all we need is a way for both kernel and userspace to
signal.  Perhaps I should export an "irqfd_signal()" function from kvm,
which today will map to eventfd_signal(), and tomorrow to ??.  I don't
think using f_ops->write() is an option for in-kernel signaling, so we
need some kind of interface here.

Does that sound reasonable?

-Greg



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
  2009-04-28 10:34             ` Gregory Haskins
@ 2009-04-28 11:00               ` Avi Kivity
  2009-04-28 11:04                 ` Gregory Haskins
  0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2009-04-28 11:00 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, davidel

Gregory Haskins wrote:

  

>> But eventfd_signal basically marries us to eventfd.
>>     
>
> Well, only if we expect the fd to have eventfd semantics.  There are
> advantages to doing so, as we have discussed, because things like AIO
> can polymorhpically signal an interrupt without even knowing whats
> behind the eventfd.  But this isn't a strict requirement to support
> AIO.  Really all we need is a way for both kernel and userspace to
> signal.  Perhaps I should export an "irqfd_signal()" function from kvm,
> which today will map to eventfd_signal(), and tomorrow to ??.  I don't
> think using f_ops->write() is an option for in-kernel signaling, so we
> need some kind of interface here.
>
> Does that sound reasonable?
>   

irqfd_signal() ties the user of irqfd to kvm.  I want this user to be 
independent of kvm; it should work with eventfd, kvm's eventfd lookalike 
(if we move away from eventfd) or pipes.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
  2009-04-28 11:00               ` Avi Kivity
@ 2009-04-28 11:04                 ` Gregory Haskins
  2009-04-28 11:05                   ` Avi Kivity
  0 siblings, 1 reply; 20+ messages in thread
From: Gregory Haskins @ 2009-04-28 11:04 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel, davidel

[-- Attachment #1: Type: text/plain, Size: 1052 bytes --]

Avi Kivity wrote:
> Gregory Haskins wrote:
>
>  
>
>>> But eventfd_signal basically marries us to eventfd.
>>>     
>>
>> Well, only if we expect the fd to have eventfd semantics.  There are
>> advantages to doing so, as we have discussed, because things like AIO
>> can polymorhpically signal an interrupt without even knowing whats
>> behind the eventfd.  But this isn't a strict requirement to support
>> AIO.  Really all we need is a way for both kernel and userspace to
>> signal.  Perhaps I should export an "irqfd_signal()" function from kvm,
>> which today will map to eventfd_signal(), and tomorrow to ??.  I don't
>> think using f_ops->write() is an option for in-kernel signaling, so we
>> need some kind of interface here.
>>
>> Does that sound reasonable?
>>   
>
> irqfd_signal() ties the user of irqfd to kvm.  I want this user to be
> independent of kvm; it should work with eventfd, kvm's eventfd
> lookalike (if we move away from eventfd) or pipes.

So what is your proposal for such interface?

-Greg



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
  2009-04-28 11:04                 ` Gregory Haskins
@ 2009-04-28 11:05                   ` Avi Kivity
  2009-04-28 11:08                     ` Avi Kivity
  0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2009-04-28 11:05 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, davidel

Gregory Haskins wrote:
>>>> But eventfd_signal basically marries us to eventfd.
>>>>     
>>>>         
>>> Well, only if we expect the fd to have eventfd semantics.  There are
>>> advantages to doing so, as we have discussed, because things like AIO
>>> can polymorhpically signal an interrupt without even knowing whats
>>> behind the eventfd.  But this isn't a strict requirement to support
>>> AIO.  Really all we need is a way for both kernel and userspace to
>>> signal.  Perhaps I should export an "irqfd_signal()" function from kvm,
>>> which today will map to eventfd_signal(), and tomorrow to ??.  I don't
>>> think using f_ops->write() is an option for in-kernel signaling, so we
>>> need some kind of interface here.
>>>
>>> Does that sound reasonable?
>>>   
>>>       
>> irqfd_signal() ties the user of irqfd to kvm.  I want this user to be
>> independent of kvm; it should work with eventfd, kvm's eventfd
>> lookalike (if we move away from eventfd) or pipes.
>>     
>
> So what is your proposal for such interface?
>
>   

->write().

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
  2009-04-28 11:05                   ` Avi Kivity
@ 2009-04-28 11:08                     ` Avi Kivity
  2009-04-28 11:38                       ` Gregory Haskins
  0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2009-04-28 11:08 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, davidel

Avi Kivity wrote:
>>
>> So what is your proposal for such interface?
>>
>>   
>
> ->write().
>

Alternatively, a new fileop ->signal_event(), which would map to 
eventfd_signal() or irqfd_signal().  This would be defined to work in 
irq contexts.  It may also be useful for uio interrupts.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
  2009-04-28 11:08                     ` Avi Kivity
@ 2009-04-28 11:38                       ` Gregory Haskins
  2009-04-28 11:48                         ` Avi Kivity
  0 siblings, 1 reply; 20+ messages in thread
From: Gregory Haskins @ 2009-04-28 11:38 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel, davidel

[-- Attachment #1: Type: text/plain, Size: 1712 bytes --]

Avi Kivity wrote:
> Avi Kivity wrote:
>>>
>>> So what is your proposal for such interface?
>>>
>>>   
>>
>> ->write().
>>
>
> Alternatively, a new fileop ->signal_event(), which would map to
> eventfd_signal() or irqfd_signal().  This would be defined to work in
> irq contexts.  It may also be useful for uio interrupts.
>
Hmm...I'm not crazy about either of those.  write() has obvious
limitations both from a interrupt execution context, as well as the
awkwardness of dealing with creating+passing a viable "userspace"
pointer from kernel code.  On the other hand, a new fileop doesn't quite
seem appropriate either since it doesn't apply to the overall fileop
abstraction very well.

We could potentially have a separate vtable interface just for
event-type fds, and make eventfd and irqfd the first implementations. 
But I am not sure it is worth it.  What I suggest is that we work within
the existing eventfd interface.  It was designed specifically to signal
events, after all.

If at some point in the future we need to ensure that the callbacks are
not invoked from a preempt-off/irq-off critical section, we can revist
the eventfd internals at that time.  Note that since we would like to
support signaling from interrupt context anyway, trying to get rid of
the wqh critical section that we have today may be a fools errand (*). 
Instead, we should probably focus on making the injection path support
non-preemptible contexts, as this will have the biggest benefits and
gains in the long run.

Thoughts?
-Greg

(*) The biggest benefit is that you can do tricks like
                                  "if (preemptible()) do_it_now(); else
do_it_later();"



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
  2009-04-28 11:38                       ` Gregory Haskins
@ 2009-04-28 11:48                         ` Avi Kivity
  2009-04-28 12:07                           ` Gregory Haskins
  0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2009-04-28 11:48 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: kvm, linux-kernel, davidel

Gregory Haskins wrote:
> Avi Kivity wrote:
>   
>> Avi Kivity wrote:
>>     
>>>> So what is your proposal for such interface?
>>>>
>>>>   
>>>>         
>>> ->write().
>>>
>>>       
>> Alternatively, a new fileop ->signal_event(), which would map to
>> eventfd_signal() or irqfd_signal().  This would be defined to work in
>> irq contexts.  It may also be useful for uio interrupts.
>>
>>     
> Hmm...I'm not crazy about either of those.  write() has obvious
> limitations both from a interrupt execution context, as well as the
> awkwardness of dealing with creating+passing a viable "userspace"
> pointer from kernel code.  On the other hand, a new fileop doesn't quite
> seem appropriate either since it doesn't apply to the overall fileop
> abstraction very well.
>
> We could potentially have a separate vtable interface just for
> event-type fds, and make eventfd and irqfd the first implementations. 
> But I am not sure it is worth it.  What I suggest is that we work within
> the existing eventfd interface.  It was designed specifically to signal
> events, after all.
>
> If at some point in the future we need to ensure that the callbacks are
> not invoked from a preempt-off/irq-off critical section, we can revist
> the eventfd internals at that time.  Note that since we would like to
> support signaling from interrupt context anyway, trying to get rid of
> the wqh critical section that we have today may be a fools errand (*). 
> Instead, we should probably focus on making the injection path support
> non-preemptible contexts, as this will have the biggest benefits and
> gains in the long run.
>
>   
But again, you're forcing everyone who uses irqfd to require eventfd.

Maybe we should change eventfd_signal() to fall back to ->write if the 
file happens not to be an eventfd.  It could also handle the 
nonpreemptible context as well.


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
  2009-04-28 11:48                         ` Avi Kivity
@ 2009-04-28 12:07                           ` Gregory Haskins
  0 siblings, 0 replies; 20+ messages in thread
From: Gregory Haskins @ 2009-04-28 12:07 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel, davidel

[-- Attachment #1: Type: text/plain, Size: 2067 bytes --]

Avi Kivity wrote:
> Gregory Haskins wrote:
>> Avi Kivity wrote:
>>  
>>> Avi Kivity wrote:
>>>    
>>>>> So what is your proposal for such interface?
>>>>>
>>>>>           
>>>> ->write().
>>>>
>>>>       
>>> Alternatively, a new fileop ->signal_event(), which would map to
>>> eventfd_signal() or irqfd_signal().  This would be defined to work in
>>> irq contexts.  It may also be useful for uio interrupts.
>>>
>>>     
>> Hmm...I'm not crazy about either of those.  write() has obvious
>> limitations both from a interrupt execution context, as well as the
>> awkwardness of dealing with creating+passing a viable "userspace"
>> pointer from kernel code.  On the other hand, a new fileop doesn't quite
>> seem appropriate either since it doesn't apply to the overall fileop
>> abstraction very well.
>>
>> We could potentially have a separate vtable interface just for
>> event-type fds, and make eventfd and irqfd the first implementations.
>> But I am not sure it is worth it.  What I suggest is that we work within
>> the existing eventfd interface.  It was designed specifically to signal
>> events, after all.
>>
>> If at some point in the future we need to ensure that the callbacks are
>> not invoked from a preempt-off/irq-off critical section, we can revist
>> the eventfd internals at that time.  Note that since we would like to
>> support signaling from interrupt context anyway, trying to get rid of
>> the wqh critical section that we have today may be a fools errand
>> (*). Instead, we should probably focus on making the injection path
>> support
>> non-preemptible contexts, as this will have the biggest benefits and
>> gains in the long run.
>>
>>   
> But again, you're forcing everyone who uses irqfd to require eventfd.
>
> Maybe we should change eventfd_signal() to fall back to ->write if the
> file happens not to be an eventfd.  It could also handle the
> nonpreemptible context as well.
>
>
Yep, that works.  And the nice thing from my perspective is: we don't
need a v4 ;)

-Greg


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2009-04-28 12:07 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-24  4:25 [KVM PATCH v2 0/2] irqfd Gregory Haskins
2009-04-24  4:25 ` [KVM PATCH v2 1/2] eventfd: export fget and signal interfaces for module use Gregory Haskins
2009-04-24  4:25 ` [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface Gregory Haskins
2009-04-24 17:07   ` Gregory Haskins
2009-04-24 17:47     ` Davide Libenzi
2009-04-27  8:55   ` Avi Kivity
2009-04-27 10:35     ` Gregory Haskins
2009-04-27 10:48       ` Avi Kivity
2009-04-27 13:27         ` Gregory Haskins
2009-04-28  9:35           ` Avi Kivity
2009-04-28 10:34             ` Gregory Haskins
2009-04-28 11:00               ` Avi Kivity
2009-04-28 11:04                 ` Gregory Haskins
2009-04-28 11:05                   ` Avi Kivity
2009-04-28 11:08                     ` Avi Kivity
2009-04-28 11:38                       ` Gregory Haskins
2009-04-28 11:48                         ` Avi Kivity
2009-04-28 12:07                           ` Gregory Haskins
2009-04-27 10:58       ` Gregory Haskins
2009-04-27 11:23         ` Avi Kivity

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.