linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9 0/2] kvm: level irqfd support
@ 2012-08-21 19:28 Alex Williamson
  2012-08-21 19:29 ` [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd Alex Williamson
                   ` (4 more replies)
  0 siblings, 5 replies; 35+ messages in thread
From: Alex Williamson @ 2012-08-21 19:28 UTC (permalink / raw)
  To: avi, mst; +Cc: gleb, kvm, linux-kernel

Here's the much anticipated re-write of support for level irqfds.  As
Michael suggested, I've rolled the eoi/ack notification fd into
KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
be objections to associating this specifically with an EOI or an ACK,
I've name this OADN or "On Ack, De-assert & Notify".

Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
the reason is it's racy.  Objects to track OADNs are made dynamically,
we look through existing ones for a match under spinlock and setup a
new one if there's no match.  On teardown, we can remove the OADN from
the list under lock, but that same lock prevents us from de-assigning
the IRQ ACK notifier or waiting for an RCU grace period.  We must make
sure that any unused GSI is de-asserted, but the above means it's
possible that another OADN has been created for this source ID/GSI
and de-asserting the GSI could lead to breakage.  Instead each OADN
object gets it's own source ID, but these are all shared by users
of the same GSI.  So for PCI devices, we might have up to 4 IRQ
source IDs allocated.

Michael had also suggested avoiding reference counting and using
list_empty for this OADN object.  Unfortunately, that doesn't work
for similar reasons.  We want to release the OADN object underlock,
preventing others from re-using it on the free path, but in order
to have lock-less de-assert & notify we use RCU, meaning we can't
trust list_empty until after an RCU grace period, which must be
done outside of spinlocks.

If there are suggestions how we can handle these better, please
make them, but I think this compromise is race-free and still
manages to make allocation of IRQ source IDs mostly a non-issue
for device assignment limits.  Thanks,

Alex

---

Alex Williamson (2):
      kvm: On Ack, De-assert & Notify KVM_IRQFD extension
      kvm: Use a reserved IRQ source ID for irqfd


 Documentation/virtual/kvm/api.txt |   13 ++
 arch/x86/kvm/x86.c                |    4 +
 include/linux/kvm.h               |    7 +
 include/linux/kvm_host.h          |    2 
 virt/kvm/eventfd.c                |  199 ++++++++++++++++++++++++++++++++++++-
 5 files changed, 218 insertions(+), 7 deletions(-)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-08-21 19:28 [PATCH v9 0/2] kvm: level irqfd support Alex Williamson
@ 2012-08-21 19:29 ` Alex Williamson
  2012-08-21 19:58   ` Michael S. Tsirkin
  2012-09-05 14:46   ` Avi Kivity
  2012-08-21 19:29 ` [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension Alex Williamson
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 35+ messages in thread
From: Alex Williamson @ 2012-08-21 19:29 UTC (permalink / raw)
  To: avi, mst; +Cc: gleb, kvm, linux-kernel

KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID
which is also shared with userspace injection methods like
KVM_IRQ_LINE.  This can cause a conflict if an irqfd triggers on
a GSI asserted through KVM_IRQ_LINE.  Move irqfd to it's own
reserved IRQ source ID.  Add a capability for userspace to test
for this fix.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 arch/x86/kvm/x86.c       |    3 +++
 include/linux/kvm.h      |    1 +
 include/linux/kvm_host.h |    1 +
 virt/kvm/eventfd.c       |    6 +++---
 4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 42bce48..cd98673 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2174,6 +2174,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_GET_TSC_KHZ:
 	case KVM_CAP_PCI_2_3:
 	case KVM_CAP_KVMCLOCK_CTRL:
+	case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
@@ -6258,6 +6259,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	/* Reserve bit 0 of irq_sources_bitmap for userspace irq source */
 	set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
+	/* Reserve bit 1 of irq_sources_bitmap for irqfd irq source */
+	set_bit(KVM_IRQFD_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
 
 	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
 
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2ce09aa..ae66b9c 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_GET_SMMU_INFO 78
 #define KVM_CAP_S390_COW 79
 #define KVM_CAP_PPC_ALLOC_HTAB 80
+#define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b70b48b..b763230 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -71,6 +71,7 @@
 #define KVM_REQ_PMI               17
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID	0
+#define KVM_IRQFD_IRQ_SOURCE_ID		1
 
 struct kvm;
 struct kvm_vcpu;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 7d7e2aa..2245cfa 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -67,8 +67,8 @@ irqfd_inject(struct work_struct *work)
 	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
 	struct kvm *kvm = irqfd->kvm;
 
-	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
-	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
+	kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 1);
+	kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0);
 }
 
 /*
@@ -138,7 +138,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
 		irq = rcu_dereference(irqfd->irq_entry);
 		/* An event has been signaled, inject an interrupt */
 		if (irq)
-			kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
+			kvm_set_msi(irq, kvm, KVM_IRQFD_IRQ_SOURCE_ID, 1);
 		else
 			schedule_work(&irqfd->inject);
 		rcu_read_unlock();


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension
  2012-08-21 19:28 [PATCH v9 0/2] kvm: level irqfd support Alex Williamson
  2012-08-21 19:29 ` [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd Alex Williamson
@ 2012-08-21 19:29 ` Alex Williamson
  2012-08-21 20:37   ` Michael S. Tsirkin
  2012-09-05 14:57   ` Avi Kivity
  2012-08-22  0:31 ` [PATCH v9 0/2] kvm: level irqfd support Michael S. Tsirkin
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 35+ messages in thread
From: Alex Williamson @ 2012-08-21 19:29 UTC (permalink / raw)
  To: avi, mst; +Cc: gleb, kvm, linux-kernel

For VFIO based device assignment we'd like a mechanism to allow level
triggered interrutps to be directly injected into KVM.  KVM_IRQFD
already allows this for edge triggered interrupts, but for level, we
need to watch for acknowledgement of the interrupt from the guest to
provide us a hint when to test the device and allow it to re-assert
if necessary.  To do this, we create a new KVM_IRQFD mode called
"On Ack, De-assert & Notify", or OADN.  In this mode, an interrupt
injection provides only a gsi assertion.  We then hook into the IRQ
ACK notifier, which when triggered de-asserts the gsi and notifies
via another eventfd.  It's then the responsibility of the user to
re-assert the interrupt is service is still required.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 Documentation/virtual/kvm/api.txt |   13 ++
 arch/x86/kvm/x86.c                |    1 
 include/linux/kvm.h               |    6 +
 include/linux/kvm_host.h          |    1 
 virt/kvm/eventfd.c                |  193 ++++++++++++++++++++++++++++++++++++-
 5 files changed, 210 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index bf33aaa..87d7321 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1946,6 +1946,19 @@ the guest using the specified gsi pin.  The irqfd is removed using
 the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
 and kvm_irqfd.gsi.
 
+With KVM_CAP_IRQFD_OADN, KVM_IRQFD supports an "On Ack, De-assert &
+Notify" option that allows emulation of level-triggered interrupts.
+When kvm_irqfd.fd is triggered, the requested gsi is asserted and
+remains asserted until interaction with the irqchip indicates the
+VM has acknowledged the interrupt, such as an EOI.  On acknoledgement
+the gsi is automatically de-asserted and the user is notified via
+kvm_irqfd.notifyfd.  The user is then required to re-assert the
+interrupt if the associated device still requires service.  To enable
+this mode, configure the KVM_IRQFD using the KVM_IRQFD_FLAG_OADN flag
+and specify kvm_irqfd.notifyfd.  Note that closing kvm_irqfd.notifyfd
+while configured in this mode does not disable the irqfd.  The
+KVM_IRQFD_FLAG_OADN flag is only necessary on assignment.
+
 4.76 KVM_PPC_ALLOCATE_HTAB
 
 Capability: KVM_CAP_PPC_ALLOC_HTAB
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cd98673..fde7b66 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2175,6 +2175,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_PCI_2_3:
 	case KVM_CAP_KVMCLOCK_CTRL:
 	case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
+	case KVM_CAP_IRQFD_OADN:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ae66b9c..ec0f1d8 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -619,6 +619,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_S390_COW 79
 #define KVM_CAP_PPC_ALLOC_HTAB 80
 #define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81
+#define KVM_CAP_IRQFD_OADN 82
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -684,12 +685,15 @@ struct kvm_xen_hvm_config {
 #endif
 
 #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
+/* Availabie with KVM_CAP_IRQFD_OADN */
+#define KVM_IRQFD_FLAG_OADN (1 << 1)
 
 struct kvm_irqfd {
 	__u32 fd;
 	__u32 gsi;
 	__u32 flags;
-	__u8  pad[20];
+	__u32 notifyfd;
+	__u8  pad[16];
 };
 
 struct kvm_clock_data {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b763230..d502d08 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -284,6 +284,7 @@ struct kvm {
 	struct {
 		spinlock_t        lock;
 		struct list_head  items;
+		struct list_head  oadns;
 	} irqfds;
 	struct list_head ioeventfds;
 #endif
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 2245cfa..dfdb5b2 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -43,6 +43,23 @@
  * --------------------------------------------------------------------
  */
 
+/*
+ * OADN irqfds (On Ack, De-assert & Notify) are a special variety of
+ * irqfds that assert an interrupt to the irqchip on eventfd trigger,
+ * receieve notification when userspace acknowledges the interrupt,
+ * automatically de-asserts the irqchip level, and notifies userspace
+ * via the oadn_eventfd.  This object helps to provide one-to-many
+ * deassert-to-notify so we can share a single irq source ID per OADN.
+ */
+struct _irqfd_oadn {
+	struct kvm *kvm;
+	int irq_source_id; /* IRQ source ID shared by these irqfds */
+	struct list_head irqfds; /* list of irqfds using this object */
+	struct kvm_irq_ack_notifier notifier; /* IRQ ACK notification */
+	struct kref kref; /* Race-free removal */
+	struct list_head list;
+};
+
 struct _irqfd {
 	/* Used for MSI fast-path */
 	struct kvm *kvm;
@@ -52,6 +69,10 @@ struct _irqfd {
 	/* Used for level IRQ fast-path */
 	int gsi;
 	struct work_struct inject;
+	/* Used for OADN (On Ack, De-assert & Notify) path */
+	struct eventfd_ctx *oadn_eventfd;
+	struct list_head oadn_list;
+	struct _irqfd_oadn *oadn;
 	/* Used for setup/shutdown */
 	struct eventfd_ctx *eventfd;
 	struct list_head list;
@@ -71,6 +92,51 @@ irqfd_inject(struct work_struct *work)
 	kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0);
 }
 
+static void
+irqfd_oadn_inject(struct work_struct *work)
+{
+	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
+
+	kvm_set_irq(irqfd->kvm, irqfd->oadn->irq_source_id, irqfd->gsi, 1);
+}
+
+/*
+ * Since OADN irqfds share an IRQ source ID, we de-assert once then
+ * notify all of the OADN irqfds using this GSI.  We can't do multiple
+ * de-asserts or we risk racing with incoming re-asserts.
+ */
+static void
+irqfd_oadn_notify(struct kvm_irq_ack_notifier *kian)
+{
+	struct _irqfd_oadn *oadn;
+	struct _irqfd *irqfd;
+
+	oadn = container_of(kian, struct _irqfd_oadn, notifier);
+
+	rcu_read_lock();
+
+	kvm_set_irq(oadn->kvm, oadn->irq_source_id,
+		    oadn->notifier.gsi, 0);
+
+	list_for_each_entry_rcu(irqfd, &oadn->irqfds, oadn_list)
+		eventfd_signal(irqfd->oadn_eventfd, 1);
+
+	rcu_read_unlock();
+}
+
+/*
+ * Assumes kvm->irqfds.lock is held.  Remove from the list of OADNs, new
+ * users will allocate their own, letting us de-assert this GSI on free,
+ * after the RCU grace period.
+ */
+static void
+irqfd_oadn_release(struct kref *kref)
+{
+	struct _irqfd_oadn *oadn = container_of(kref, struct _irqfd_oadn, kref);
+
+	list_del(&oadn->list);
+}
+
 /*
  * Race-free decouple logic (ordering is critical)
  */
@@ -93,6 +159,35 @@ irqfd_shutdown(struct work_struct *work)
 	flush_work_sync(&irqfd->inject);
 
 	/*
+	 * OADN irqfds use an RCU list for lockless de-assert and user
+	 * notification.  Modify the list using RCU under lock and release
+	 * the OADN object.  Since we later free this irqfd, we must wait
+	 * for an RCU grace period.  If the OADN was released, we can then
+	 * unregister the irq ack notifier, free the irq source ID (assuring
+	 * that the GSI is de-asserted), and free the object.
+	 */
+	if (irqfd->oadn) {
+		struct _irqfd_oadn *oadn = irqfd->oadn;
+		struct kvm *kvm = oadn->kvm;
+		bool released;
+
+		spin_lock_irq(&irqfd->kvm->irqfds.lock);
+		list_del_rcu(&irqfd->oadn_list);
+		released = kref_put(&oadn->kref, irqfd_oadn_release);
+		spin_unlock_irq(&irqfd->kvm->irqfds.lock);
+
+		synchronize_rcu();
+
+		if (released) {
+			kvm_unregister_irq_ack_notifier(kvm, &oadn->notifier);
+			kvm_free_irq_source_id(kvm, oadn->irq_source_id);
+			kfree(oadn);
+		}
+
+		eventfd_ctx_put(irqfd->oadn_eventfd);
+	}
+
+	/*
 	 * It is now safe to release the object's resources
 	 */
 	eventfd_ctx_put(irqfd->eventfd);
@@ -202,8 +297,9 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 {
 	struct kvm_irq_routing_table *irq_rt;
 	struct _irqfd *irqfd, *tmp;
+	struct _irqfd_oadn *oadn = NULL;
 	struct file *file = NULL;
-	struct eventfd_ctx *eventfd = NULL;
+	struct eventfd_ctx *eventfd = NULL, *oadn_eventfd = NULL;
 	int ret;
 	unsigned int events;
 
@@ -214,7 +310,49 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	irqfd->kvm = kvm;
 	irqfd->gsi = args->gsi;
 	INIT_LIST_HEAD(&irqfd->list);
-	INIT_WORK(&irqfd->inject, irqfd_inject);
+	INIT_LIST_HEAD(&irqfd->oadn_list);
+
+	if (args->flags & KVM_IRQFD_FLAG_OADN) {
+		oadn_eventfd = eventfd_ctx_fdget(args->notifyfd);
+		if (IS_ERR(oadn_eventfd)) {
+			ret = PTR_ERR(oadn_eventfd);
+			goto fail;
+		}
+
+		irqfd->oadn_eventfd = oadn_eventfd;
+
+		/*
+		 * We may be able to share an existing OADN, but we won't
+		 * know that until we search under spinlock.  We can't get
+		 * an irq_source_id under spinlock, and we'd prefer not to
+		 * do an atomic allocation, so prep an object here and free
+		 * it if we don't need it.
+		 */
+		oadn = kzalloc(sizeof(*oadn), GFP_KERNEL);
+		if (!oadn) {
+			ret = -ENOMEM;
+			goto fail;
+		}
+
+		oadn->kvm = kvm;
+		INIT_LIST_HEAD(&oadn->irqfds);
+		oadn->notifier.gsi = irqfd->gsi;
+		oadn->notifier.irq_acked = irqfd_oadn_notify;
+		kref_init(&oadn->kref);
+		INIT_LIST_HEAD(&oadn->list);
+
+		oadn->irq_source_id = kvm_request_irq_source_id(kvm);
+		if (oadn->irq_source_id < 0) {
+			ret = oadn->irq_source_id;
+			goto fail;
+		}
+
+		irqfd->oadn = oadn;
+
+		INIT_WORK(&irqfd->inject, irqfd_oadn_inject);
+	} else
+		INIT_WORK(&irqfd->inject, irqfd_inject);
+
 	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
 
 	file = eventfd_fget(args->fd);
@@ -250,6 +388,27 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 		goto fail;
 	}
 
+	/*
+	 * When configuring an OADN irqfd, try to re-use an existing OADN.
+	 */
+	if (oadn) {
+		struct _irqfd_oadn *oadn_tmp;
+
+		list_for_each_entry(oadn_tmp, &kvm->irqfds.oadns, list) {
+			if (oadn_tmp->notifier.gsi == irqfd->gsi) {
+				irqfd->oadn = oadn_tmp;
+				break;
+			}
+		}
+
+		if (irqfd->oadn != oadn)
+			kref_get(&irqfd->oadn->kref);
+		else
+			list_add(&oadn->list, &kvm->irqfds.oadns);
+
+		list_add_rcu(&irqfd->oadn_list, &irqfd->oadn->irqfds);
+	}
+
 	irq_rt = rcu_dereference_protected(kvm->irq_routing,
 					   lockdep_is_held(&kvm->irqfds.lock));
 	irqfd_update(kvm, irqfd, irq_rt);
@@ -267,6 +426,26 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 
 	spin_unlock_irq(&kvm->irqfds.lock);
 
+	if (oadn) {
+		synchronize_rcu();
+
+		/*
+		 * If we weren't able to re-use an OADN, setup the irq ack
+		 * notifier outside of spinlock.  Our interface requires
+		 * users to be able to handle spurious de-assert & notifies,
+		 * so trigger one here to make sure we didn't miss anything.
+		 * Cleanup unused OADN if we share.
+		 */
+		if (irqfd->oadn == oadn)
+			kvm_register_irq_ack_notifier(kvm, &oadn->notifier);
+		else {
+			kvm_free_irq_source_id(kvm, oadn->irq_source_id);
+			kfree(oadn);
+		}
+
+		irqfd_oadn_notify(&irqfd->oadn->notifier);
+	}
+
 	/*
 	 * do not drop the file until the irqfd is fully initialized, otherwise
 	 * we might race against the POLLHUP
@@ -276,12 +455,19 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	return 0;
 
 fail:
+	if (oadn_eventfd && !IS_ERR(oadn_eventfd))
+		eventfd_ctx_put(oadn_eventfd);
+
+	if (oadn && oadn->irq_source_id >= 0)
+		kvm_free_irq_source_id(kvm, oadn->irq_source_id);
+
 	if (eventfd && !IS_ERR(eventfd))
 		eventfd_ctx_put(eventfd);
 
 	if (!IS_ERR(file))
 		fput(file);
 
+	kfree(oadn);
 	kfree(irqfd);
 	return ret;
 }
@@ -291,6 +477,7 @@ kvm_eventfd_init(struct kvm *kvm)
 {
 	spin_lock_init(&kvm->irqfds.lock);
 	INIT_LIST_HEAD(&kvm->irqfds.items);
+	INIT_LIST_HEAD(&kvm->irqfds.oadns);
 	INIT_LIST_HEAD(&kvm->ioeventfds);
 }
 
@@ -340,7 +527,7 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
 int
 kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
 {
-	if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN)
+	if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN | KVM_IRQFD_FLAG_OADN))
 		return -EINVAL;
 
 	if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-08-21 19:29 ` [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd Alex Williamson
@ 2012-08-21 19:58   ` Michael S. Tsirkin
  2012-08-21 20:06     ` Alex Williamson
  2012-09-05 14:46   ` Avi Kivity
  1 sibling, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2012-08-21 19:58 UTC (permalink / raw)
  To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel

On Tue, Aug 21, 2012 at 01:29:06PM -0600, Alex Williamson wrote:
> KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID
> which is also shared with userspace injection methods like
> KVM_IRQ_LINE.  This can cause a conflict if an irqfd triggers on
> a GSI asserted through KVM_IRQ_LINE.

What kind of conflict do you envision?  Pls note level interrupts are
unsupported ATM.

> Move irqfd to it's own reserved IRQ source ID.  Add a capability for
> userspace to test for this fix.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
>  arch/x86/kvm/x86.c       |    3 +++
>  include/linux/kvm.h      |    1 +
>  include/linux/kvm_host.h |    1 +
>  virt/kvm/eventfd.c       |    6 +++---
>  4 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 42bce48..cd98673 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2174,6 +2174,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	case KVM_CAP_GET_TSC_KHZ:
>  	case KVM_CAP_PCI_2_3:
>  	case KVM_CAP_KVMCLOCK_CTRL:
> +	case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
>  		r = 1;
>  		break;
>  	case KVM_CAP_COALESCED_MMIO:
> @@ -6258,6 +6259,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  
>  	/* Reserve bit 0 of irq_sources_bitmap for userspace irq source */
>  	set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
> +	/* Reserve bit 1 of irq_sources_bitmap for irqfd irq source */
> +	set_bit(KVM_IRQFD_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
>  
>  	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
>  
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 2ce09aa..ae66b9c 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_PPC_GET_SMMU_INFO 78
>  #define KVM_CAP_S390_COW 79
>  #define KVM_CAP_PPC_ALLOC_HTAB 80
> +#define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index b70b48b..b763230 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -71,6 +71,7 @@
>  #define KVM_REQ_PMI               17
>  
>  #define KVM_USERSPACE_IRQ_SOURCE_ID	0
> +#define KVM_IRQFD_IRQ_SOURCE_ID		1
>  
>  struct kvm;
>  struct kvm_vcpu;

Above looks fine but I'm not sure why is the below needed.
This changes irqfd behaviour for edge GSIs slightly
in a userspace-visible way. Maybe make it a separate patch
so it can be considered on merits?

> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 7d7e2aa..2245cfa 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -67,8 +67,8 @@ irqfd_inject(struct work_struct *work)
>  	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
>  	struct kvm *kvm = irqfd->kvm;
>  
> -	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
> -	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> +	kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 1);
> +	kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0);
>  }
>  
>  /*
> @@ -138,7 +138,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
>  		irq = rcu_dereference(irqfd->irq_entry);
>  		/* An event has been signaled, inject an interrupt */
>  		if (irq)
> -			kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
> +			kvm_set_msi(irq, kvm, KVM_IRQFD_IRQ_SOURCE_ID, 1);
>  		else
>  			schedule_work(&irqfd->inject);
>  		rcu_read_unlock();

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-08-21 19:58   ` Michael S. Tsirkin
@ 2012-08-21 20:06     ` Alex Williamson
  2012-08-21 20:41       ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Alex Williamson @ 2012-08-21 20:06 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel

On Tue, 2012-08-21 at 22:58 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 21, 2012 at 01:29:06PM -0600, Alex Williamson wrote:
> > KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID
> > which is also shared with userspace injection methods like
> > KVM_IRQ_LINE.  This can cause a conflict if an irqfd triggers on
> > a GSI asserted through KVM_IRQ_LINE.
> 
> What kind of conflict do you envision?  Pls note level interrupts are
> unsupported ATM.

If KVM_IRQ_LINE asserts a level interrupt and KVM_IRQFD triggers on the
same GSI then the pin is no longer asserted as userspace thinks it is.
Do we just chalk this up to userspace error?

> > Move irqfd to it's own reserved IRQ source ID.  Add a capability for
> > userspace to test for this fix.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> > 
> >  arch/x86/kvm/x86.c       |    3 +++
> >  include/linux/kvm.h      |    1 +
> >  include/linux/kvm_host.h |    1 +
> >  virt/kvm/eventfd.c       |    6 +++---
> >  4 files changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 42bce48..cd98673 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2174,6 +2174,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> >  	case KVM_CAP_GET_TSC_KHZ:
> >  	case KVM_CAP_PCI_2_3:
> >  	case KVM_CAP_KVMCLOCK_CTRL:
> > +	case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
> >  		r = 1;
> >  		break;
> >  	case KVM_CAP_COALESCED_MMIO:
> > @@ -6258,6 +6259,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >  
> >  	/* Reserve bit 0 of irq_sources_bitmap for userspace irq source */
> >  	set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
> > +	/* Reserve bit 1 of irq_sources_bitmap for irqfd irq source */
> > +	set_bit(KVM_IRQFD_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
> >  
> >  	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
> >  
> > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > index 2ce09aa..ae66b9c 100644
> > --- a/include/linux/kvm.h
> > +++ b/include/linux/kvm.h
> > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
> >  #define KVM_CAP_PPC_GET_SMMU_INFO 78
> >  #define KVM_CAP_S390_COW 79
> >  #define KVM_CAP_PPC_ALLOC_HTAB 80
> > +#define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81
> >  
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >  
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index b70b48b..b763230 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -71,6 +71,7 @@
> >  #define KVM_REQ_PMI               17
> >  
> >  #define KVM_USERSPACE_IRQ_SOURCE_ID	0
> > +#define KVM_IRQFD_IRQ_SOURCE_ID		1
> >  
> >  struct kvm;
> >  struct kvm_vcpu;
> 
> Above looks fine but I'm not sure why is the below needed.
> This changes irqfd behaviour for edge GSIs slightly
> in a userspace-visible way. Maybe make it a separate patch
> so it can be considered on merits?

Hmm, the above does nothing without the below.  I thought I was just
implementing your idea that IRQFDs should all share a single IRQ source
ID... why is that no longer a good idea?  Thanks,

Alex

> > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > index 7d7e2aa..2245cfa 100644
> > --- a/virt/kvm/eventfd.c
> > +++ b/virt/kvm/eventfd.c
> > @@ -67,8 +67,8 @@ irqfd_inject(struct work_struct *work)
> >  	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> >  	struct kvm *kvm = irqfd->kvm;
> >  
> > -	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
> > -	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > +	kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 1);
> > +	kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0);
> >  }
> >  
> >  /*
> > @@ -138,7 +138,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
> >  		irq = rcu_dereference(irqfd->irq_entry);
> >  		/* An event has been signaled, inject an interrupt */
> >  		if (irq)
> > -			kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
> > +			kvm_set_msi(irq, kvm, KVM_IRQFD_IRQ_SOURCE_ID, 1);
> >  		else
> >  			schedule_work(&irqfd->inject);
> >  		rcu_read_unlock();




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension
  2012-08-21 19:29 ` [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension Alex Williamson
@ 2012-08-21 20:37   ` Michael S. Tsirkin
  2012-08-21 21:11     ` Alex Williamson
  2012-09-05 14:57   ` Avi Kivity
  1 sibling, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2012-08-21 20:37 UTC (permalink / raw)
  To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel

On Tue, Aug 21, 2012 at 01:29:14PM -0600, Alex Williamson wrote:
> For VFIO based device assignment we'd like a mechanism to allow level
> triggered interrutps to be directly injected into KVM.  KVM_IRQFD
> already allows this for edge triggered interrupts, but for level, we
> need to watch for acknowledgement of the interrupt from the guest to
> provide us a hint when to test the device and allow it to re-assert
> if necessary.  To do this, we create a new KVM_IRQFD mode called
> "On Ack, De-assert & Notify", or OADN.  In this mode, an interrupt
> injection provides only a gsi assertion.  We then hook into the IRQ
> ACK notifier, which when triggered de-asserts the gsi and notifies
> via another eventfd.  It's then the responsibility of the user to
> re-assert the interrupt is service is still required.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Naming aside, looks good.
I think I see some minor bugs, and I added some improvement
suggestions below.

Thanks!

> ---
> 
>  Documentation/virtual/kvm/api.txt |   13 ++
>  arch/x86/kvm/x86.c                |    1 
>  include/linux/kvm.h               |    6 +
>  include/linux/kvm_host.h          |    1 
>  virt/kvm/eventfd.c                |  193 ++++++++++++++++++++++++++++++++++++-
>  5 files changed, 210 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index bf33aaa..87d7321 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1946,6 +1946,19 @@ the guest using the specified gsi pin.  The irqfd is removed using
>  the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
>  and kvm_irqfd.gsi.
>  
> +With KVM_CAP_IRQFD_OADN, KVM_IRQFD supports an "On Ack, De-assert &
> +Notify" option that allows emulation of level-triggered interrupts.
> +When kvm_irqfd.fd is triggered, the requested gsi is asserted and
> +remains asserted until interaction with the irqchip indicates the
> +VM has acknowledged the interrupt, such as an EOI.  On acknoledgement
> +the gsi is automatically de-asserted and the user is notified via
> +kvm_irqfd.notifyfd.  The user is then required to re-assert the
> +interrupt if the associated device still requires service.  To enable
> +this mode, configure the KVM_IRQFD using the KVM_IRQFD_FLAG_OADN flag
> +and specify kvm_irqfd.notifyfd.  Note that closing kvm_irqfd.notifyfd
> +while configured in this mode does not disable the irqfd.  The
> +KVM_IRQFD_FLAG_OADN flag is only necessary on assignment.
> +
>  4.76 KVM_PPC_ALLOCATE_HTAB
>  
>  Capability: KVM_CAP_PPC_ALLOC_HTAB
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cd98673..fde7b66 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2175,6 +2175,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	case KVM_CAP_PCI_2_3:
>  	case KVM_CAP_KVMCLOCK_CTRL:
>  	case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
> +	case KVM_CAP_IRQFD_OADN:
>  		r = 1;
>  		break;
>  	case KVM_CAP_COALESCED_MMIO:
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index ae66b9c..ec0f1d8 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -619,6 +619,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_S390_COW 79
>  #define KVM_CAP_PPC_ALLOC_HTAB 80
>  #define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81
> +#define KVM_CAP_IRQFD_OADN 82
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> @@ -684,12 +685,15 @@ struct kvm_xen_hvm_config {
>  #endif
>  
>  #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
> +/* Availabie with KVM_CAP_IRQFD_OADN */

Need to also explain what it is.

> +#define KVM_IRQFD_FLAG_OADN (1 << 1)
>  
>  struct kvm_irqfd {
>  	__u32 fd;
>  	__u32 gsi;
>  	__u32 flags;
> -	__u8  pad[20];
> +	__u32 notifyfd;

Document that this is only valid with OADN flag.  Might be a good idea
to rename this to deassert_on_ack_notifyfd or oadn_notifyfd
to avoid confusion.


> +	__u8  pad[16];
>  };
>  
>  struct kvm_clock_data {
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index b763230..d502d08 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -284,6 +284,7 @@ struct kvm {
>  	struct {
>  		spinlock_t        lock;
>  		struct list_head  items;
> +		struct list_head  oadns;
>  	} irqfds;
>  	struct list_head ioeventfds;
>  #endif
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 2245cfa..dfdb5b2 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -43,6 +43,23 @@
>   * --------------------------------------------------------------------
>   */
>  
> +/*
> + * OADN irqfds (On Ack, De-assert & Notify) are a special variety of
> + * irqfds that assert an interrupt to the irqchip on eventfd trigger,
> + * receieve notification when userspace acknowledges the interrupt,
> + * automatically de-asserts the irqchip level, and notifies userspace
> + * via the oadn_eventfd.  This object helps to provide one-to-many
> + * deassert-to-notify so we can share a single irq source ID per OADN.
> + */
> +struct _irqfd_oadn {
> +	struct kvm *kvm;
> +	int irq_source_id; /* IRQ source ID shared by these irqfds */
> +	struct list_head irqfds; /* list of irqfds using this object */
> +	struct kvm_irq_ack_notifier notifier; /* IRQ ACK notification */
> +	struct kref kref; /* Race-free removal */
> +	struct list_head list;
> +};
> +
>  struct _irqfd {
>  	/* Used for MSI fast-path */
>  	struct kvm *kvm;
> @@ -52,6 +69,10 @@ struct _irqfd {
>  	/* Used for level IRQ fast-path */
>  	int gsi;
>  	struct work_struct inject;
> +	/* Used for OADN (On Ack, De-assert & Notify) path */
> +	struct eventfd_ctx *oadn_eventfd;
> +	struct list_head oadn_list;

Could you document RCU and locking rules for this field please?

> +	struct _irqfd_oadn *oadn;
>  	/* Used for setup/shutdown */
>  	struct eventfd_ctx *eventfd;
>  	struct list_head list;
> @@ -71,6 +92,51 @@ irqfd_inject(struct work_struct *work)
>  	kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0);
>  }
>  
> +static void
> +irqfd_oadn_inject(struct work_struct *work)
> +{
> +	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> +
> +	kvm_set_irq(irqfd->kvm, irqfd->oadn->irq_source_id, irqfd->gsi, 1);
> +}
> +
> +/*
> + * Since OADN irqfds share an IRQ source ID, we de-assert once then
> + * notify all of the OADN irqfds using this GSI.  We can't do multiple
> + * de-asserts or we risk racing with incoming re-asserts.
> + */
> +static void
> +irqfd_oadn_notify(struct kvm_irq_ack_notifier *kian)
> +{
> +	struct _irqfd_oadn *oadn;
> +	struct _irqfd *irqfd;
> +
> +	oadn = container_of(kian, struct _irqfd_oadn, notifier);
> +
> +	rcu_read_lock();
> +
> +	kvm_set_irq(oadn->kvm, oadn->irq_source_id,
> +		    oadn->notifier.gsi, 0);
> +
> +	list_for_each_entry_rcu(irqfd, &oadn->irqfds, oadn_list)
> +		eventfd_signal(irqfd->oadn_eventfd, 1);
> +
> +	rcu_read_unlock();
> +}
> +
> +/*
> + * Assumes kvm->irqfds.lock is held.  Remove from the list of OADNs, new
> + * users will allocate their own, letting us de-assert this GSI on free,
> + * after the RCU grace period.
> + */
> +static void
> +irqfd_oadn_release(struct kref *kref)
> +{
> +	struct _irqfd_oadn *oadn = container_of(kref, struct _irqfd_oadn, kref);
> +
> +	list_del(&oadn->list);
> +}
> +
>  /*
>   * Race-free decouple logic (ordering is critical)
>   */
> @@ -93,6 +159,35 @@ irqfd_shutdown(struct work_struct *work)
>  	flush_work_sync(&irqfd->inject);
>  
>  	/*
> +	 * OADN irqfds use an RCU list for lockless de-assert and user
> +	 * notification.  Modify the list using RCU under lock and release
> +	 * the OADN object.  Since we later free this irqfd, we must wait
> +	 * for an RCU grace period.  If the OADN was released, we can then
> +	 * unregister the irq ack notifier, free the irq source ID (assuring
> +	 * that the GSI is de-asserted), and free the object.
> +	 */
> +	if (irqfd->oadn) {
> +		struct _irqfd_oadn *oadn = irqfd->oadn;
> +		struct kvm *kvm = oadn->kvm;
> +		bool released;
> +
> +		spin_lock_irq(&irqfd->kvm->irqfds.lock);
> +		list_del_rcu(&irqfd->oadn_list);
> +		released = kref_put(&oadn->kref, irqfd_oadn_release);
> +		spin_unlock_irq(&irqfd->kvm->irqfds.lock);
> +
> +		synchronize_rcu();
> +
> +		if (released) {
> +			kvm_unregister_irq_ack_notifier(kvm, &oadn->notifier);
> +			kvm_free_irq_source_id(kvm, oadn->irq_source_id);
> +			kfree(oadn);
> +		}
> +
> +		eventfd_ctx_put(irqfd->oadn_eventfd);
> +	}
> +
> +	/*
>  	 * It is now safe to release the object's resources
>  	 */
>  	eventfd_ctx_put(irqfd->eventfd);
> @@ -202,8 +297,9 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
>  {
>  	struct kvm_irq_routing_table *irq_rt;
>  	struct _irqfd *irqfd, *tmp;
> +	struct _irqfd_oadn *oadn = NULL;
>  	struct file *file = NULL;
> -	struct eventfd_ctx *eventfd = NULL;
> +	struct eventfd_ctx *eventfd = NULL, *oadn_eventfd = NULL;
>  	int ret;
>  	unsigned int events;
>  
> @@ -214,7 +310,49 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
>  	irqfd->kvm = kvm;
>  	irqfd->gsi = args->gsi;
>  	INIT_LIST_HEAD(&irqfd->list);
> -	INIT_WORK(&irqfd->inject, irqfd_inject);
> +	INIT_LIST_HEAD(&irqfd->oadn_list);
> +
> +	if (args->flags & KVM_IRQFD_FLAG_OADN) {
> +		oadn_eventfd = eventfd_ctx_fdget(args->notifyfd);
> +		if (IS_ERR(oadn_eventfd)) {
> +			ret = PTR_ERR(oadn_eventfd);
> +			goto fail;
> +		}
> +
> +		irqfd->oadn_eventfd = oadn_eventfd;
> +
> +		/*
> +		 * We may be able to share an existing OADN, but we won't
> +		 * know that until we search under spinlock.  We can't get
> +		 * an irq_source_id under spinlock, and we'd prefer not to
> +		 * do an atomic allocation, so prep an object here and free
> +		 * it if we don't need it.
> +		 */

So if everyone uses same source ID this code will be simpler too?

> +		oadn = kzalloc(sizeof(*oadn), GFP_KERNEL);
> +		if (!oadn) {
> +			ret = -ENOMEM;
> +			goto fail;
> +		}
> +
> +		oadn->kvm = kvm;
> +		INIT_LIST_HEAD(&oadn->irqfds);
> +		oadn->notifier.gsi = irqfd->gsi;
> +		oadn->notifier.irq_acked = irqfd_oadn_notify;
> +		kref_init(&oadn->kref);
> +		INIT_LIST_HEAD(&oadn->list);
> +
> +		oadn->irq_source_id = kvm_request_irq_source_id(kvm);
> +		if (oadn->irq_source_id < 0) {
> +			ret = oadn->irq_source_id;
> +			goto fail;
> +		}
> +

This still has the problem that by allocating these
OADN irqfds you can run out of source IDs for other uses.
Why don't OADNs just use KVM_IRQFD_IRQ_SOURCE_ID?
I thought this is why it was introduced in this patchset ...


> +		irqfd->oadn = oadn;
> +
> +		INIT_WORK(&irqfd->inject, irqfd_oadn_inject);
> +	} else
> +		INIT_WORK(&irqfd->inject, irqfd_inject);
> +
>  	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
>  
>  	file = eventfd_fget(args->fd);
> @@ -250,6 +388,27 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
>  		goto fail;
>  	}
>  
> +	/*
> +	 * When configuring an OADN irqfd, try to re-use an existing OADN.
> +	 */
> +	if (oadn) {
> +		struct _irqfd_oadn *oadn_tmp;
> +
> +		list_for_each_entry(oadn_tmp, &kvm->irqfds.oadns, list) {
> +			if (oadn_tmp->notifier.gsi == irqfd->gsi) {
> +				irqfd->oadn = oadn_tmp;
> +				break;
> +			}
> +		}
> +
> +		if (irqfd->oadn != oadn)
> +			kref_get(&irqfd->oadn->kref);
> +		else
> +			list_add(&oadn->list, &kvm->irqfds.oadns);
> +
> +		list_add_rcu(&irqfd->oadn_list, &irqfd->oadn->irqfds);

Hang on, don't we need irqfds_lock here? 

> +	}
> +
>  	irq_rt = rcu_dereference_protected(kvm->irq_routing,
>  					   lockdep_is_held(&kvm->irqfds.lock));
>  	irqfd_update(kvm, irqfd, irq_rt);
> @@ -267,6 +426,26 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
>  
>  	spin_unlock_irq(&kvm->irqfds.lock);
>  
> +	if (oadn) {
> +		synchronize_rcu();

Seems unexpected on assign.
What does this synchronize_rcu do?

> +
> +		/*
> +		 * If we weren't able to re-use an OADN, setup the irq ack
> +		 * notifier outside of spinlock.  Our interface requires
> +		 * users to be able to handle spurious de-assert & notifies,
> +		 * so trigger one here to make sure we didn't miss anything.
> +		 * Cleanup unused OADN if we share.
> +		 */
> +		if (irqfd->oadn == oadn)
> +			kvm_register_irq_ack_notifier(kvm, &oadn->notifier);
> +		else {
> +			kvm_free_irq_source_id(kvm, oadn->irq_source_id);

This is unfortunate - so we need a spare source ID for internal use.
Thus if 1 source ID is available, then it is possible that

- allocate irqfd for A
- allocate irqfd for B

succeeds because A uses a shared source id so it is freed here.

While

- allocate irqfd for B
- allocate irqfd for A

fails because B used the last ID and now temporary allocation
above fails.

> +			kfree(oadn);
> +		}
> +
> +		irqfd_oadn_notify(&irqfd->oadn->notifier);
> +	}
> +
>  	/*
>  	 * do not drop the file until the irqfd is fully initialized, otherwise
>  	 * we might race against the POLLHUP
> @@ -276,12 +455,19 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
>  	return 0;
>  
>  fail:
> +	if (oadn_eventfd && !IS_ERR(oadn_eventfd))
> +		eventfd_ctx_put(oadn_eventfd);
> +
> +	if (oadn && oadn->irq_source_id >= 0)
> +		kvm_free_irq_source_id(kvm, oadn->irq_source_id);
> +

A bit cleaner to have distinct labels for failures
at different stages of the setup.
I know it's not written this way ATM so it's not a must
to address this.

>  	if (eventfd && !IS_ERR(eventfd))
>  		eventfd_ctx_put(eventfd);
>  
>  	if (!IS_ERR(file))
>  		fput(file);
>  
> +	kfree(oadn);
>  	kfree(irqfd);
>  	return ret;
>  }
> @@ -291,6 +477,7 @@ kvm_eventfd_init(struct kvm *kvm)
>  {
>  	spin_lock_init(&kvm->irqfds.lock);
>  	INIT_LIST_HEAD(&kvm->irqfds.items);
> +	INIT_LIST_HEAD(&kvm->irqfds.oadns);
>  	INIT_LIST_HEAD(&kvm->ioeventfds);
>  }
>  
> @@ -340,7 +527,7 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
>  int
>  kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
>  {
> -	if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN)
> +	if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN | KVM_IRQFD_FLAG_OADN))
>  		return -EINVAL;
>  
>  	if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-08-21 20:06     ` Alex Williamson
@ 2012-08-21 20:41       ` Michael S. Tsirkin
  2012-08-21 21:14         ` Alex Williamson
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2012-08-21 20:41 UTC (permalink / raw)
  To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel

On Tue, Aug 21, 2012 at 02:06:19PM -0600, Alex Williamson wrote:
> On Tue, 2012-08-21 at 22:58 +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 21, 2012 at 01:29:06PM -0600, Alex Williamson wrote:
> > > KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID
> > > which is also shared with userspace injection methods like
> > > KVM_IRQ_LINE.  This can cause a conflict if an irqfd triggers on
> > > a GSI asserted through KVM_IRQ_LINE.
> > 
> > What kind of conflict do you envision?  Pls note level interrupts are
> > unsupported ATM.
> 
> If KVM_IRQ_LINE asserts a level interrupt and KVM_IRQFD triggers on the
> same GSI then the pin is no longer asserted as userspace thinks it is.
> Do we just chalk this up to userspace error?

Yes: using a level GSI with current irqfd is a userspace error
because you can lose interrupts anyway.

Are edge GSIs affected?

> > > Move irqfd to it's own reserved IRQ source ID.  Add a capability for
> > > userspace to test for this fix.
> > > 
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > ---
> > > 
> > >  arch/x86/kvm/x86.c       |    3 +++
> > >  include/linux/kvm.h      |    1 +
> > >  include/linux/kvm_host.h |    1 +
> > >  virt/kvm/eventfd.c       |    6 +++---
> > >  4 files changed, 8 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 42bce48..cd98673 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -2174,6 +2174,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > >  	case KVM_CAP_GET_TSC_KHZ:
> > >  	case KVM_CAP_PCI_2_3:
> > >  	case KVM_CAP_KVMCLOCK_CTRL:
> > > +	case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
> > >  		r = 1;
> > >  		break;
> > >  	case KVM_CAP_COALESCED_MMIO:
> > > @@ -6258,6 +6259,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> > >  
> > >  	/* Reserve bit 0 of irq_sources_bitmap for userspace irq source */
> > >  	set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
> > > +	/* Reserve bit 1 of irq_sources_bitmap for irqfd irq source */
> > > +	set_bit(KVM_IRQFD_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
> > >  
> > >  	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
> > >  
> > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > index 2ce09aa..ae66b9c 100644
> > > --- a/include/linux/kvm.h
> > > +++ b/include/linux/kvm.h
> > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
> > >  #define KVM_CAP_PPC_GET_SMMU_INFO 78
> > >  #define KVM_CAP_S390_COW 79
> > >  #define KVM_CAP_PPC_ALLOC_HTAB 80
> > > +#define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81
> > >  
> > >  #ifdef KVM_CAP_IRQ_ROUTING
> > >  
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index b70b48b..b763230 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -71,6 +71,7 @@
> > >  #define KVM_REQ_PMI               17
> > >  
> > >  #define KVM_USERSPACE_IRQ_SOURCE_ID	0
> > > +#define KVM_IRQFD_IRQ_SOURCE_ID		1
> > >  
> > >  struct kvm;
> > >  struct kvm_vcpu;
> > 
> > Above looks fine but I'm not sure why is the below needed.
> > This changes irqfd behaviour for edge GSIs slightly
> > in a userspace-visible way. Maybe make it a separate patch
> > so it can be considered on merits?
> 
> Hmm, the above does nothing without the below.

Yes. But you can use the above with the new irqfds you are adding.

> I thought I was just
> implementing your idea that IRQFDs should all share a single IRQ source
> ID...

Sorry I only meant for level irqfds. You are changing edge here.

> why is that no longer a good idea?  Thanks,
> 
> Alex

Maybe it is a good idea. I am just asking for the motivation.

> > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > > index 7d7e2aa..2245cfa 100644
> > > --- a/virt/kvm/eventfd.c
> > > +++ b/virt/kvm/eventfd.c
> > > @@ -67,8 +67,8 @@ irqfd_inject(struct work_struct *work)
> > >  	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> > >  	struct kvm *kvm = irqfd->kvm;
> > >  
> > > -	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
> > > -	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > > +	kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 1);
> > > +	kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > >  }
> > >  
> > >  /*
> > > @@ -138,7 +138,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
> > >  		irq = rcu_dereference(irqfd->irq_entry);
> > >  		/* An event has been signaled, inject an interrupt */
> > >  		if (irq)
> > > -			kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
> > > +			kvm_set_msi(irq, kvm, KVM_IRQFD_IRQ_SOURCE_ID, 1);
> > >  		else
> > >  			schedule_work(&irqfd->inject);
> > >  		rcu_read_unlock();
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension
  2012-08-21 20:37   ` Michael S. Tsirkin
@ 2012-08-21 21:11     ` Alex Williamson
  2012-08-22  0:14       ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Alex Williamson @ 2012-08-21 21:11 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel

On Tue, 2012-08-21 at 23:37 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 21, 2012 at 01:29:14PM -0600, Alex Williamson wrote:
> > For VFIO based device assignment we'd like a mechanism to allow level
> > triggered interrutps to be directly injected into KVM.  KVM_IRQFD
> > already allows this for edge triggered interrupts, but for level, we
> > need to watch for acknowledgement of the interrupt from the guest to
> > provide us a hint when to test the device and allow it to re-assert
> > if necessary.  To do this, we create a new KVM_IRQFD mode called
> > "On Ack, De-assert & Notify", or OADN.  In this mode, an interrupt
> > injection provides only a gsi assertion.  We then hook into the IRQ
> > ACK notifier, which when triggered de-asserts the gsi and notifies
> > via another eventfd.  It's then the responsibility of the user to
> > re-assert the interrupt is service is still required.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> 
> Naming aside, looks good.
> I think I see some minor bugs, and I added some improvement
> suggestions below.
> 
> Thanks!
> 
> > ---
> > 
> >  Documentation/virtual/kvm/api.txt |   13 ++
> >  arch/x86/kvm/x86.c                |    1 
> >  include/linux/kvm.h               |    6 +
> >  include/linux/kvm_host.h          |    1 
> >  virt/kvm/eventfd.c                |  193 ++++++++++++++++++++++++++++++++++++-
> >  5 files changed, 210 insertions(+), 4 deletions(-)
> > 
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index bf33aaa..87d7321 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -1946,6 +1946,19 @@ the guest using the specified gsi pin.  The irqfd is removed using
> >  the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> >  and kvm_irqfd.gsi.
> >  
> > +With KVM_CAP_IRQFD_OADN, KVM_IRQFD supports an "On Ack, De-assert &
> > +Notify" option that allows emulation of level-triggered interrupts.
> > +When kvm_irqfd.fd is triggered, the requested gsi is asserted and
> > +remains asserted until interaction with the irqchip indicates the
> > +VM has acknowledged the interrupt, such as an EOI.  On acknoledgement
> > +the gsi is automatically de-asserted and the user is notified via
> > +kvm_irqfd.notifyfd.  The user is then required to re-assert the
> > +interrupt if the associated device still requires service.  To enable
> > +this mode, configure the KVM_IRQFD using the KVM_IRQFD_FLAG_OADN flag
> > +and specify kvm_irqfd.notifyfd.  Note that closing kvm_irqfd.notifyfd
> > +while configured in this mode does not disable the irqfd.  The
> > +KVM_IRQFD_FLAG_OADN flag is only necessary on assignment.
> > +
> >  4.76 KVM_PPC_ALLOCATE_HTAB
> >  
> >  Capability: KVM_CAP_PPC_ALLOC_HTAB
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index cd98673..fde7b66 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2175,6 +2175,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> >  	case KVM_CAP_PCI_2_3:
> >  	case KVM_CAP_KVMCLOCK_CTRL:
> >  	case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
> > +	case KVM_CAP_IRQFD_OADN:
> >  		r = 1;
> >  		break;
> >  	case KVM_CAP_COALESCED_MMIO:
> > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > index ae66b9c..ec0f1d8 100644
> > --- a/include/linux/kvm.h
> > +++ b/include/linux/kvm.h
> > @@ -619,6 +619,7 @@ struct kvm_ppc_smmu_info {
> >  #define KVM_CAP_S390_COW 79
> >  #define KVM_CAP_PPC_ALLOC_HTAB 80
> >  #define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81
> > +#define KVM_CAP_IRQFD_OADN 82
> >  
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >  
> > @@ -684,12 +685,15 @@ struct kvm_xen_hvm_config {
> >  #endif
> >  
> >  #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
> > +/* Availabie with KVM_CAP_IRQFD_OADN */
> 
> Need to also explain what it is.

Beyond Documentation/virtual/kvm/api.txt?  I don't see much else getting
documented here.  Or maybe you mean

/* On Ack, De-assert & Notify */

> > +#define KVM_IRQFD_FLAG_OADN (1 << 1)
> >  
> >  struct kvm_irqfd {
> >  	__u32 fd;
> >  	__u32 gsi;
> >  	__u32 flags;
> > -	__u8  pad[20];
> > +	__u32 notifyfd;
> 
> Document that this is only valid with OADN flag.  Might be a good idea
> to rename this to deassert_on_ack_notifyfd or oadn_notifyfd
> to avoid confusion.

I'll add a /* only valid with KVM_IRQFD_FLAG_OADN */

I can change the name if you prefer, but it seems pretty clear to me how
a notifyfd might relate to a "On Ack, De-assert & Notify" irqfd without
pulling longer names into userspace.

> > +	__u8  pad[16];
> >  };
> >  
> >  struct kvm_clock_data {
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index b763230..d502d08 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -284,6 +284,7 @@ struct kvm {
> >  	struct {
> >  		spinlock_t        lock;
> >  		struct list_head  items;
> > +		struct list_head  oadns;
> >  	} irqfds;
> >  	struct list_head ioeventfds;
> >  #endif
> > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > index 2245cfa..dfdb5b2 100644
> > --- a/virt/kvm/eventfd.c
> > +++ b/virt/kvm/eventfd.c
> > @@ -43,6 +43,23 @@
> >   * --------------------------------------------------------------------
> >   */
> >  
> > +/*
> > + * OADN irqfds (On Ack, De-assert & Notify) are a special variety of
> > + * irqfds that assert an interrupt to the irqchip on eventfd trigger,
> > + * receieve notification when userspace acknowledges the interrupt,
> > + * automatically de-asserts the irqchip level, and notifies userspace
> > + * via the oadn_eventfd.  This object helps to provide one-to-many
> > + * deassert-to-notify so we can share a single irq source ID per OADN.
> > + */
> > +struct _irqfd_oadn {
> > +	struct kvm *kvm;
> > +	int irq_source_id; /* IRQ source ID shared by these irqfds */
> > +	struct list_head irqfds; /* list of irqfds using this object */
> > +	struct kvm_irq_ack_notifier notifier; /* IRQ ACK notification */
> > +	struct kref kref; /* Race-free removal */
> > +	struct list_head list;
> > +};
> > +
> >  struct _irqfd {
> >  	/* Used for MSI fast-path */
> >  	struct kvm *kvm;
> > @@ -52,6 +69,10 @@ struct _irqfd {
> >  	/* Used for level IRQ fast-path */
> >  	int gsi;
> >  	struct work_struct inject;
> > +	/* Used for OADN (On Ack, De-assert & Notify) path */
> > +	struct eventfd_ctx *oadn_eventfd;
> > +	struct list_head oadn_list;
> 
> Could you document RCU and locking rules for this field please?

Sure

> > +	struct _irqfd_oadn *oadn;
> >  	/* Used for setup/shutdown */
> >  	struct eventfd_ctx *eventfd;
> >  	struct list_head list;
> > @@ -71,6 +92,51 @@ irqfd_inject(struct work_struct *work)
> >  	kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0);
> >  }
> >  
> > +static void
> > +irqfd_oadn_inject(struct work_struct *work)
> > +{
> > +	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> > +
> > +	kvm_set_irq(irqfd->kvm, irqfd->oadn->irq_source_id, irqfd->gsi, 1);
> > +}
> > +
> > +/*
> > + * Since OADN irqfds share an IRQ source ID, we de-assert once then
> > + * notify all of the OADN irqfds using this GSI.  We can't do multiple
> > + * de-asserts or we risk racing with incoming re-asserts.
> > + */
> > +static void
> > +irqfd_oadn_notify(struct kvm_irq_ack_notifier *kian)
> > +{
> > +	struct _irqfd_oadn *oadn;
> > +	struct _irqfd *irqfd;
> > +
> > +	oadn = container_of(kian, struct _irqfd_oadn, notifier);
> > +
> > +	rcu_read_lock();
> > +
> > +	kvm_set_irq(oadn->kvm, oadn->irq_source_id,
> > +		    oadn->notifier.gsi, 0);
> > +
> > +	list_for_each_entry_rcu(irqfd, &oadn->irqfds, oadn_list)
> > +		eventfd_signal(irqfd->oadn_eventfd, 1);
> > +
> > +	rcu_read_unlock();
> > +}
> > +
> > +/*
> > + * Assumes kvm->irqfds.lock is held.  Remove from the list of OADNs, new
> > + * users will allocate their own, letting us de-assert this GSI on free,
> > + * after the RCU grace period.
> > + */
> > +static void
> > +irqfd_oadn_release(struct kref *kref)
> > +{
> > +	struct _irqfd_oadn *oadn = container_of(kref, struct _irqfd_oadn, kref);
> > +
> > +	list_del(&oadn->list);
> > +}
> > +
> >  /*
> >   * Race-free decouple logic (ordering is critical)
> >   */
> > @@ -93,6 +159,35 @@ irqfd_shutdown(struct work_struct *work)
> >  	flush_work_sync(&irqfd->inject);
> >  
> >  	/*
> > +	 * OADN irqfds use an RCU list for lockless de-assert and user
> > +	 * notification.  Modify the list using RCU under lock and release
> > +	 * the OADN object.  Since we later free this irqfd, we must wait
> > +	 * for an RCU grace period.  If the OADN was released, we can then
> > +	 * unregister the irq ack notifier, free the irq source ID (assuring
> > +	 * that the GSI is de-asserted), and free the object.
> > +	 */
> > +	if (irqfd->oadn) {
> > +		struct _irqfd_oadn *oadn = irqfd->oadn;
> > +		struct kvm *kvm = oadn->kvm;
> > +		bool released;
> > +
> > +		spin_lock_irq(&irqfd->kvm->irqfds.lock);
> > +		list_del_rcu(&irqfd->oadn_list);
> > +		released = kref_put(&oadn->kref, irqfd_oadn_release);
> > +		spin_unlock_irq(&irqfd->kvm->irqfds.lock);
> > +
> > +		synchronize_rcu();
> > +
> > +		if (released) {
> > +			kvm_unregister_irq_ack_notifier(kvm, &oadn->notifier);
> > +			kvm_free_irq_source_id(kvm, oadn->irq_source_id);
> > +			kfree(oadn);
> > +		}
> > +
> > +		eventfd_ctx_put(irqfd->oadn_eventfd);
> > +	}
> > +
> > +	/*
> >  	 * It is now safe to release the object's resources
> >  	 */
> >  	eventfd_ctx_put(irqfd->eventfd);
> > @@ -202,8 +297,9 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> >  {
> >  	struct kvm_irq_routing_table *irq_rt;
> >  	struct _irqfd *irqfd, *tmp;
> > +	struct _irqfd_oadn *oadn = NULL;
> >  	struct file *file = NULL;
> > -	struct eventfd_ctx *eventfd = NULL;
> > +	struct eventfd_ctx *eventfd = NULL, *oadn_eventfd = NULL;
> >  	int ret;
> >  	unsigned int events;
> >  
> > @@ -214,7 +310,49 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> >  	irqfd->kvm = kvm;
> >  	irqfd->gsi = args->gsi;
> >  	INIT_LIST_HEAD(&irqfd->list);
> > -	INIT_WORK(&irqfd->inject, irqfd_inject);
> > +	INIT_LIST_HEAD(&irqfd->oadn_list);
> > +
> > +	if (args->flags & KVM_IRQFD_FLAG_OADN) {
> > +		oadn_eventfd = eventfd_ctx_fdget(args->notifyfd);
> > +		if (IS_ERR(oadn_eventfd)) {
> > +			ret = PTR_ERR(oadn_eventfd);
> > +			goto fail;
> > +		}
> > +
> > +		irqfd->oadn_eventfd = oadn_eventfd;
> > +
> > +		/*
> > +		 * We may be able to share an existing OADN, but we won't
> > +		 * know that until we search under spinlock.  We can't get
> > +		 * an irq_source_id under spinlock, and we'd prefer not to
> > +		 * do an atomic allocation, so prep an object here and free
> > +		 * it if we don't need it.
> > +		 */
> 
> So if everyone uses same source ID this code will be simpler too?

No, everyone using the same source ID results in racy cleanup that I
can't figure out how to avoid while maintaining a lockless
de-assert/notify path.  See patch 0/2 for details of the race.

> > +		oadn = kzalloc(sizeof(*oadn), GFP_KERNEL);
> > +		if (!oadn) {
> > +			ret = -ENOMEM;
> > +			goto fail;
> > +		}
> > +
> > +		oadn->kvm = kvm;
> > +		INIT_LIST_HEAD(&oadn->irqfds);
> > +		oadn->notifier.gsi = irqfd->gsi;
> > +		oadn->notifier.irq_acked = irqfd_oadn_notify;
> > +		kref_init(&oadn->kref);
> > +		INIT_LIST_HEAD(&oadn->list);
> > +
> > +		oadn->irq_source_id = kvm_request_irq_source_id(kvm);
> > +		if (oadn->irq_source_id < 0) {
> > +			ret = oadn->irq_source_id;
> > +			goto fail;
> > +		}
> > +
> 
> This still has the problem that by allocating these
> OADN irqfds you can run out of source IDs for other uses.
> Why don't OADNs just use KVM_IRQFD_IRQ_SOURCE_ID?
> I thought this is why it was introduced in this patchset ...

See patch 0/2, it's racy.  We have 64 IRQ source IDs and 24 GSIs
currently.  This let's all users on the same GSI share an IRQ source ID.
Yes, we hold onto one we may not need during setup, but locking doesn't
allow us to allocate it at the point where we'd need it.

> > +		irqfd->oadn = oadn;
> > +
> > +		INIT_WORK(&irqfd->inject, irqfd_oadn_inject);
> > +	} else
> > +		INIT_WORK(&irqfd->inject, irqfd_inject);
> > +
> >  	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
> >  
> >  	file = eventfd_fget(args->fd);
> > @@ -250,6 +388,27 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> >  		goto fail;
> >  	}
> >  
> > +	/*
> > +	 * When configuring an OADN irqfd, try to re-use an existing OADN.
> > +	 */
> > +	if (oadn) {
> > +		struct _irqfd_oadn *oadn_tmp;
> > +
> > +		list_for_each_entry(oadn_tmp, &kvm->irqfds.oadns, list) {
> > +			if (oadn_tmp->notifier.gsi == irqfd->gsi) {
> > +				irqfd->oadn = oadn_tmp;
> > +				break;
> > +			}
> > +		}
> > +
> > +		if (irqfd->oadn != oadn)
> > +			kref_get(&irqfd->oadn->kref);
> > +		else
> > +			list_add(&oadn->list, &kvm->irqfds.oadns);
> > +
> > +		list_add_rcu(&irqfd->oadn_list, &irqfd->oadn->irqfds);
> 
> Hang on, don't we need irqfds_lock here? 

We're in kvm->irqfds.lock.

> > +	}
> > +
> >  	irq_rt = rcu_dereference_protected(kvm->irq_routing,
> >  					   lockdep_is_held(&kvm->irqfds.lock));
> >  	irqfd_update(kvm, irqfd, irq_rt);
> > @@ -267,6 +426,26 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> >  
> >  	spin_unlock_irq(&kvm->irqfds.lock);
> >  
> > +	if (oadn) {
> > +		synchronize_rcu();
> 
> Seems unexpected on assign.
> What does this synchronize_rcu do?

Because we force the notify below and we need the irqfd we just added to
the list to be visible on the list for that.

> > +
> > +		/*
> > +		 * If we weren't able to re-use an OADN, setup the irq ack
> > +		 * notifier outside of spinlock.  Our interface requires
> > +		 * users to be able to handle spurious de-assert & notifies,
> > +		 * so trigger one here to make sure we didn't miss anything.
> > +		 * Cleanup unused OADN if we share.
> > +		 */
> > +		if (irqfd->oadn == oadn)
> > +			kvm_register_irq_ack_notifier(kvm, &oadn->notifier);
> > +		else {
> > +			kvm_free_irq_source_id(kvm, oadn->irq_source_id);
> 
> This is unfortunate - so we need a spare source ID for internal use.
> Thus if 1 source ID is available, then it is possible that
> 
> - allocate irqfd for A
> - allocate irqfd for B
> 
> succeeds because A uses a shared source id so it is freed here.
> 
> While
> 
> - allocate irqfd for B
> - allocate irqfd for A
> 
> fails because B used the last ID and now temporary allocation
> above fails.

Yep, suggestions?  How many KVM_IRQFDs can we expect in-flight at one
time vs how many IRQ source IDs will be used long term.  VFIO assigned
PCI device can only make use of up to 4 GSIs per root bridge, we only
have 24 IOAPIC pins.

> > +			kfree(oadn);
> > +		}
> > +
> > +		irqfd_oadn_notify(&irqfd->oadn->notifier);
> > +	}
> > +
> >  	/*
> >  	 * do not drop the file until the irqfd is fully initialized, otherwise
> >  	 * we might race against the POLLHUP
> > @@ -276,12 +455,19 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> >  	return 0;
> >  
> >  fail:
> > +	if (oadn_eventfd && !IS_ERR(oadn_eventfd))
> > +		eventfd_ctx_put(oadn_eventfd);
> > +
> > +	if (oadn && oadn->irq_source_id >= 0)
> > +		kvm_free_irq_source_id(kvm, oadn->irq_source_id);
> > +
> 
> A bit cleaner to have distinct labels for failures
> at different stages of the setup.
> I know it's not written this way ATM so it's not a must
> to address this.

I often find myself getting lost trying to use distinct labels.  I'm
just following along what this function already does, but find it to be
a nice simplification to handle everything this way.  Thanks,

Alex

> >  	if (eventfd && !IS_ERR(eventfd))
> >  		eventfd_ctx_put(eventfd);
> >  
> >  	if (!IS_ERR(file))
> >  		fput(file);
> >  
> > +	kfree(oadn);
> >  	kfree(irqfd);
> >  	return ret;
> >  }
> > @@ -291,6 +477,7 @@ kvm_eventfd_init(struct kvm *kvm)
> >  {
> >  	spin_lock_init(&kvm->irqfds.lock);
> >  	INIT_LIST_HEAD(&kvm->irqfds.items);
> > +	INIT_LIST_HEAD(&kvm->irqfds.oadns);
> >  	INIT_LIST_HEAD(&kvm->ioeventfds);
> >  }
> >  
> > @@ -340,7 +527,7 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
> >  int
> >  kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
> >  {
> > -	if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN)
> > +	if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN | KVM_IRQFD_FLAG_OADN))
> >  		return -EINVAL;
> >  
> >  	if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-08-21 20:41       ` Michael S. Tsirkin
@ 2012-08-21 21:14         ` Alex Williamson
  2012-08-22  0:41           ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Alex Williamson @ 2012-08-21 21:14 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel

On Tue, 2012-08-21 at 23:41 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 21, 2012 at 02:06:19PM -0600, Alex Williamson wrote:
> > On Tue, 2012-08-21 at 22:58 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Aug 21, 2012 at 01:29:06PM -0600, Alex Williamson wrote:
> > > > KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID
> > > > which is also shared with userspace injection methods like
> > > > KVM_IRQ_LINE.  This can cause a conflict if an irqfd triggers on
> > > > a GSI asserted through KVM_IRQ_LINE.
> > > 
> > > What kind of conflict do you envision?  Pls note level interrupts are
> > > unsupported ATM.
> > 
> > If KVM_IRQ_LINE asserts a level interrupt and KVM_IRQFD triggers on the
> > same GSI then the pin is no longer asserted as userspace thinks it is.
> > Do we just chalk this up to userspace error?
> 
> Yes: using a level GSI with current irqfd is a userspace error
> because you can lose interrupts anyway.
> 
> Are edge GSIs affected?

I wouldn't think so.

> > > > Move irqfd to it's own reserved IRQ source ID.  Add a capability for
> > > > userspace to test for this fix.
> > > > 
> > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > ---
> > > > 
> > > >  arch/x86/kvm/x86.c       |    3 +++
> > > >  include/linux/kvm.h      |    1 +
> > > >  include/linux/kvm_host.h |    1 +
> > > >  virt/kvm/eventfd.c       |    6 +++---
> > > >  4 files changed, 8 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index 42bce48..cd98673 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -2174,6 +2174,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > > >  	case KVM_CAP_GET_TSC_KHZ:
> > > >  	case KVM_CAP_PCI_2_3:
> > > >  	case KVM_CAP_KVMCLOCK_CTRL:
> > > > +	case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
> > > >  		r = 1;
> > > >  		break;
> > > >  	case KVM_CAP_COALESCED_MMIO:
> > > > @@ -6258,6 +6259,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> > > >  
> > > >  	/* Reserve bit 0 of irq_sources_bitmap for userspace irq source */
> > > >  	set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
> > > > +	/* Reserve bit 1 of irq_sources_bitmap for irqfd irq source */
> > > > +	set_bit(KVM_IRQFD_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
> > > >  
> > > >  	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
> > > >  
> > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > > index 2ce09aa..ae66b9c 100644
> > > > --- a/include/linux/kvm.h
> > > > +++ b/include/linux/kvm.h
> > > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
> > > >  #define KVM_CAP_PPC_GET_SMMU_INFO 78
> > > >  #define KVM_CAP_S390_COW 79
> > > >  #define KVM_CAP_PPC_ALLOC_HTAB 80
> > > > +#define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81
> > > >  
> > > >  #ifdef KVM_CAP_IRQ_ROUTING
> > > >  
> > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > index b70b48b..b763230 100644
> > > > --- a/include/linux/kvm_host.h
> > > > +++ b/include/linux/kvm_host.h
> > > > @@ -71,6 +71,7 @@
> > > >  #define KVM_REQ_PMI               17
> > > >  
> > > >  #define KVM_USERSPACE_IRQ_SOURCE_ID	0
> > > > +#define KVM_IRQFD_IRQ_SOURCE_ID		1
> > > >  
> > > >  struct kvm;
> > > >  struct kvm_vcpu;
> > > 
> > > Above looks fine but I'm not sure why is the below needed.
> > > This changes irqfd behaviour for edge GSIs slightly
> > > in a userspace-visible way. Maybe make it a separate patch
> > > so it can be considered on merits?
> > 
> > Hmm, the above does nothing without the below.
> 
> Yes. But you can use the above with the new irqfds you are adding.

Nope, racy.

> > I thought I was just
> > implementing your idea that IRQFDs should all share a single IRQ source
> > ID...
> 
> Sorry I only meant for level irqfds. You are changing edge here.

Ok, I misunderstood then.

> > why is that no longer a good idea?  Thanks,
> > 
> > Alex
> 
> Maybe it is a good idea. I am just asking for the motivation.

I assumed you were pointing out the level vs edge interaction.  If we
call that a userspace bug, I can just drop this.  Thanks,

Alex

> > > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > > > index 7d7e2aa..2245cfa 100644
> > > > --- a/virt/kvm/eventfd.c
> > > > +++ b/virt/kvm/eventfd.c
> > > > @@ -67,8 +67,8 @@ irqfd_inject(struct work_struct *work)
> > > >  	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> > > >  	struct kvm *kvm = irqfd->kvm;
> > > >  
> > > > -	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
> > > > -	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > > > +	kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 1);
> > > > +	kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > > >  }
> > > >  
> > > >  /*
> > > > @@ -138,7 +138,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
> > > >  		irq = rcu_dereference(irqfd->irq_entry);
> > > >  		/* An event has been signaled, inject an interrupt */
> > > >  		if (irq)
> > > > -			kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
> > > > +			kvm_set_msi(irq, kvm, KVM_IRQFD_IRQ_SOURCE_ID, 1);
> > > >  		else
> > > >  			schedule_work(&irqfd->inject);
> > > >  		rcu_read_unlock();
> > 
> > 




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension
  2012-08-21 21:11     ` Alex Williamson
@ 2012-08-22  0:14       ` Michael S. Tsirkin
  2012-08-22  1:48         ` Alex Williamson
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2012-08-22  0:14 UTC (permalink / raw)
  To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel

On Tue, Aug 21, 2012 at 03:11:41PM -0600, Alex Williamson wrote:
> On Tue, 2012-08-21 at 23:37 +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 21, 2012 at 01:29:14PM -0600, Alex Williamson wrote:
> > > For VFIO based device assignment we'd like a mechanism to allow level
> > > triggered interrutps to be directly injected into KVM.  KVM_IRQFD
> > > already allows this for edge triggered interrupts, but for level, we
> > > need to watch for acknowledgement of the interrupt from the guest to
> > > provide us a hint when to test the device and allow it to re-assert
> > > if necessary.  To do this, we create a new KVM_IRQFD mode called
> > > "On Ack, De-assert & Notify", or OADN.  In this mode, an interrupt
> > > injection provides only a gsi assertion.  We then hook into the IRQ
> > > ACK notifier, which when triggered de-asserts the gsi and notifies
> > > via another eventfd.  It's then the responsibility of the user to
> > > re-assert the interrupt is service is still required.
> > > 
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > 
> > Naming aside, looks good.
> > I think I see some minor bugs, and I added some improvement
> > suggestions below.
> > 
> > Thanks!
> > 
> > > ---
> > > 
> > >  Documentation/virtual/kvm/api.txt |   13 ++
> > >  arch/x86/kvm/x86.c                |    1 
> > >  include/linux/kvm.h               |    6 +
> > >  include/linux/kvm_host.h          |    1 
> > >  virt/kvm/eventfd.c                |  193 ++++++++++++++++++++++++++++++++++++-
> > >  5 files changed, 210 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > > index bf33aaa..87d7321 100644
> > > --- a/Documentation/virtual/kvm/api.txt
> > > +++ b/Documentation/virtual/kvm/api.txt
> > > @@ -1946,6 +1946,19 @@ the guest using the specified gsi pin.  The irqfd is removed using
> > >  the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> > >  and kvm_irqfd.gsi.
> > >  
> > > +With KVM_CAP_IRQFD_OADN, KVM_IRQFD supports an "On Ack, De-assert &
> > > +Notify" option that allows emulation of level-triggered interrupts.
> > > +When kvm_irqfd.fd is triggered, the requested gsi is asserted and
> > > +remains asserted until interaction with the irqchip indicates the
> > > +VM has acknowledged the interrupt, such as an EOI.  On acknoledgement
> > > +the gsi is automatically de-asserted and the user is notified via
> > > +kvm_irqfd.notifyfd.  The user is then required to re-assert the
> > > +interrupt if the associated device still requires service.  To enable
> > > +this mode, configure the KVM_IRQFD using the KVM_IRQFD_FLAG_OADN flag
> > > +and specify kvm_irqfd.notifyfd.  Note that closing kvm_irqfd.notifyfd
> > > +while configured in this mode does not disable the irqfd.  The
> > > +KVM_IRQFD_FLAG_OADN flag is only necessary on assignment.
> > > +
> > >  4.76 KVM_PPC_ALLOCATE_HTAB
> > >  
> > >  Capability: KVM_CAP_PPC_ALLOC_HTAB
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index cd98673..fde7b66 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -2175,6 +2175,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > >  	case KVM_CAP_PCI_2_3:
> > >  	case KVM_CAP_KVMCLOCK_CTRL:
> > >  	case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
> > > +	case KVM_CAP_IRQFD_OADN:
> > >  		r = 1;
> > >  		break;
> > >  	case KVM_CAP_COALESCED_MMIO:
> > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > index ae66b9c..ec0f1d8 100644
> > > --- a/include/linux/kvm.h
> > > +++ b/include/linux/kvm.h
> > > @@ -619,6 +619,7 @@ struct kvm_ppc_smmu_info {
> > >  #define KVM_CAP_S390_COW 79
> > >  #define KVM_CAP_PPC_ALLOC_HTAB 80
> > >  #define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81
> > > +#define KVM_CAP_IRQFD_OADN 82
> > >  
> > >  #ifdef KVM_CAP_IRQ_ROUTING
> > >  
> > > @@ -684,12 +685,15 @@ struct kvm_xen_hvm_config {
> > >  #endif
> > >  
> > >  #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
> > > +/* Availabie with KVM_CAP_IRQFD_OADN */
> > 
> > Need to also explain what it is.
> 
> Beyond Documentation/virtual/kvm/api.txt?  I don't see much else getting
> documented here.  Or maybe you mean
> 
> /* On Ack, De-assert & Notify */

 yes


> > > +#define KVM_IRQFD_FLAG_OADN (1 << 1)
> > >  
> > >  struct kvm_irqfd {
> > >  	__u32 fd;
> > >  	__u32 gsi;
> > >  	__u32 flags;
> > > -	__u8  pad[20];
> > > +	__u32 notifyfd;
> > 
> > Document that this is only valid with OADN flag.  Might be a good idea
> > to rename this to deassert_on_ack_notifyfd or oadn_notifyfd
> > to avoid confusion.
> 
> I'll add a /* only valid with KVM_IRQFD_FLAG_OADN */
> 
> I can change the name if you prefer, but it seems pretty clear to me how
> a notifyfd might relate to a "On Ack, De-assert & Notify" irqfd without
> pulling longer names into userspace.

Ideally you can figure out stuff without reading docs.
So eschew abbreviations, make it as long and clear as possible
is my advise.
But it is up to Avi.

> > > +	__u8  pad[16];
> > >  };
> > >  
> > >  struct kvm_clock_data {
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index b763230..d502d08 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -284,6 +284,7 @@ struct kvm {
> > >  	struct {
> > >  		spinlock_t        lock;
> > >  		struct list_head  items;
> > > +		struct list_head  oadns;
> > >  	} irqfds;
> > >  	struct list_head ioeventfds;
> > >  #endif
> > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > > index 2245cfa..dfdb5b2 100644
> > > --- a/virt/kvm/eventfd.c
> > > +++ b/virt/kvm/eventfd.c
> > > @@ -43,6 +43,23 @@
> > >   * --------------------------------------------------------------------
> > >   */
> > >  
> > > +/*
> > > + * OADN irqfds (On Ack, De-assert & Notify) are a special variety of
> > > + * irqfds that assert an interrupt to the irqchip on eventfd trigger,
> > > + * receieve notification when userspace acknowledges the interrupt,
> > > + * automatically de-asserts the irqchip level, and notifies userspace
> > > + * via the oadn_eventfd.  This object helps to provide one-to-many
> > > + * deassert-to-notify so we can share a single irq source ID per OADN.
> > > + */
> > > +struct _irqfd_oadn {
> > > +	struct kvm *kvm;
> > > +	int irq_source_id; /* IRQ source ID shared by these irqfds */
> > > +	struct list_head irqfds; /* list of irqfds using this object */
> > > +	struct kvm_irq_ack_notifier notifier; /* IRQ ACK notification */
> > > +	struct kref kref; /* Race-free removal */
> > > +	struct list_head list;
> > > +};
> > > +
> > >  struct _irqfd {
> > >  	/* Used for MSI fast-path */
> > >  	struct kvm *kvm;
> > > @@ -52,6 +69,10 @@ struct _irqfd {
> > >  	/* Used for level IRQ fast-path */
> > >  	int gsi;
> > >  	struct work_struct inject;
> > > +	/* Used for OADN (On Ack, De-assert & Notify) path */
> > > +	struct eventfd_ctx *oadn_eventfd;
> > > +	struct list_head oadn_list;
> > 
> > Could you document RCU and locking rules for this field please?
> 
> Sure
> 
> > > +	struct _irqfd_oadn *oadn;
> > >  	/* Used for setup/shutdown */
> > >  	struct eventfd_ctx *eventfd;
> > >  	struct list_head list;
> > > @@ -71,6 +92,51 @@ irqfd_inject(struct work_struct *work)
> > >  	kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > >  }
> > >  
> > > +static void
> > > +irqfd_oadn_inject(struct work_struct *work)
> > > +{
> > > +	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> > > +
> > > +	kvm_set_irq(irqfd->kvm, irqfd->oadn->irq_source_id, irqfd->gsi, 1);
> > > +}
> > > +
> > > +/*
> > > + * Since OADN irqfds share an IRQ source ID, we de-assert once then
> > > + * notify all of the OADN irqfds using this GSI.  We can't do multiple
> > > + * de-asserts or we risk racing with incoming re-asserts.
> > > + */
> > > +static void
> > > +irqfd_oadn_notify(struct kvm_irq_ack_notifier *kian)
> > > +{
> > > +	struct _irqfd_oadn *oadn;
> > > +	struct _irqfd *irqfd;
> > > +
> > > +	oadn = container_of(kian, struct _irqfd_oadn, notifier);
> > > +
> > > +	rcu_read_lock();
> > > +
> > > +	kvm_set_irq(oadn->kvm, oadn->irq_source_id,
> > > +		    oadn->notifier.gsi, 0);
> > > +
> > > +	list_for_each_entry_rcu(irqfd, &oadn->irqfds, oadn_list)
> > > +		eventfd_signal(irqfd->oadn_eventfd, 1);
> > > +
> > > +	rcu_read_unlock();
> > > +}
> > > +
> > > +/*
> > > + * Assumes kvm->irqfds.lock is held.  Remove from the list of OADNs, new
> > > + * users will allocate their own, letting us de-assert this GSI on free,
> > > + * after the RCU grace period.
> > > + */
> > > +static void
> > > +irqfd_oadn_release(struct kref *kref)
> > > +{
> > > +	struct _irqfd_oadn *oadn = container_of(kref, struct _irqfd_oadn, kref);
> > > +
> > > +	list_del(&oadn->list);
> > > +}
> > > +
> > >  /*
> > >   * Race-free decouple logic (ordering is critical)
> > >   */
> > > @@ -93,6 +159,35 @@ irqfd_shutdown(struct work_struct *work)
> > >  	flush_work_sync(&irqfd->inject);
> > >  
> > >  	/*
> > > +	 * OADN irqfds use an RCU list for lockless de-assert and user
> > > +	 * notification.  Modify the list using RCU under lock and release
> > > +	 * the OADN object.  Since we later free this irqfd, we must wait
> > > +	 * for an RCU grace period.  If the OADN was released, we can then
> > > +	 * unregister the irq ack notifier, free the irq source ID (assuring
> > > +	 * that the GSI is de-asserted), and free the object.
> > > +	 */
> > > +	if (irqfd->oadn) {
> > > +		struct _irqfd_oadn *oadn = irqfd->oadn;
> > > +		struct kvm *kvm = oadn->kvm;
> > > +		bool released;
> > > +
> > > +		spin_lock_irq(&irqfd->kvm->irqfds.lock);
> > > +		list_del_rcu(&irqfd->oadn_list);
> > > +		released = kref_put(&oadn->kref, irqfd_oadn_release);
> > > +		spin_unlock_irq(&irqfd->kvm->irqfds.lock);
> > > +
> > > +		synchronize_rcu();
> > > +
> > > +		if (released) {
> > > +			kvm_unregister_irq_ack_notifier(kvm, &oadn->notifier);
> > > +			kvm_free_irq_source_id(kvm, oadn->irq_source_id);
> > > +			kfree(oadn);
> > > +		}
> > > +
> > > +		eventfd_ctx_put(irqfd->oadn_eventfd);
> > > +	}
> > > +
> > > +	/*
> > >  	 * It is now safe to release the object's resources
> > >  	 */
> > >  	eventfd_ctx_put(irqfd->eventfd);
> > > @@ -202,8 +297,9 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> > >  {
> > >  	struct kvm_irq_routing_table *irq_rt;
> > >  	struct _irqfd *irqfd, *tmp;
> > > +	struct _irqfd_oadn *oadn = NULL;
> > >  	struct file *file = NULL;
> > > -	struct eventfd_ctx *eventfd = NULL;
> > > +	struct eventfd_ctx *eventfd = NULL, *oadn_eventfd = NULL;
> > >  	int ret;
> > >  	unsigned int events;
> > >  
> > > @@ -214,7 +310,49 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> > >  	irqfd->kvm = kvm;
> > >  	irqfd->gsi = args->gsi;
> > >  	INIT_LIST_HEAD(&irqfd->list);
> > > -	INIT_WORK(&irqfd->inject, irqfd_inject);
> > > +	INIT_LIST_HEAD(&irqfd->oadn_list);
> > > +
> > > +	if (args->flags & KVM_IRQFD_FLAG_OADN) {
> > > +		oadn_eventfd = eventfd_ctx_fdget(args->notifyfd);
> > > +		if (IS_ERR(oadn_eventfd)) {
> > > +			ret = PTR_ERR(oadn_eventfd);
> > > +			goto fail;
> > > +		}
> > > +
> > > +		irqfd->oadn_eventfd = oadn_eventfd;
> > > +
> > > +		/*
> > > +		 * We may be able to share an existing OADN, but we won't
> > > +		 * know that until we search under spinlock.  We can't get
> > > +		 * an irq_source_id under spinlock, and we'd prefer not to
> > > +		 * do an atomic allocation, so prep an object here and free
> > > +		 * it if we don't need it.
> > > +		 */
> > 
> > So if everyone uses same source ID this code will be simpler too?
> 
> No, everyone using the same source ID results in racy cleanup that I
> can't figure out how to avoid while maintaining a lockless
> de-assert/notify path.  See patch 0/2 for details of the race.
> 
> > > +		oadn = kzalloc(sizeof(*oadn), GFP_KERNEL);
> > > +		if (!oadn) {
> > > +			ret = -ENOMEM;
> > > +			goto fail;
> > > +		}
> > > +
> > > +		oadn->kvm = kvm;
> > > +		INIT_LIST_HEAD(&oadn->irqfds);
> > > +		oadn->notifier.gsi = irqfd->gsi;
> > > +		oadn->notifier.irq_acked = irqfd_oadn_notify;
> > > +		kref_init(&oadn->kref);
> > > +		INIT_LIST_HEAD(&oadn->list);
> > > +
> > > +		oadn->irq_source_id = kvm_request_irq_source_id(kvm);
> > > +		if (oadn->irq_source_id < 0) {
> > > +			ret = oadn->irq_source_id;
> > > +			goto fail;
> > > +		}
> > > +
> > 
> > This still has the problem that by allocating these
> > OADN irqfds you can run out of source IDs for other uses.
> > Why don't OADNs just use KVM_IRQFD_IRQ_SOURCE_ID?
> > I thought this is why it was introduced in this patchset ...
> 
> See patch 0/2, it's racy.  We have 64 IRQ source IDs and 24 GSIs
> currently.  This let's all users on the same GSI share an IRQ source ID.
> Yes, we hold onto one we may not need during setup, but locking doesn't
> allow us to allocate it at the point where we'd need it.
> 
> > > +		irqfd->oadn = oadn;
> > > +
> > > +		INIT_WORK(&irqfd->inject, irqfd_oadn_inject);
> > > +	} else
> > > +		INIT_WORK(&irqfd->inject, irqfd_inject);
> > > +
> > >  	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
> > >  
> > >  	file = eventfd_fget(args->fd);
> > > @@ -250,6 +388,27 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> > >  		goto fail;
> > >  	}
> > >  
> > > +	/*
> > > +	 * When configuring an OADN irqfd, try to re-use an existing OADN.
> > > +	 */
> > > +	if (oadn) {
> > > +		struct _irqfd_oadn *oadn_tmp;
> > > +
> > > +		list_for_each_entry(oadn_tmp, &kvm->irqfds.oadns, list) {
> > > +			if (oadn_tmp->notifier.gsi == irqfd->gsi) {
> > > +				irqfd->oadn = oadn_tmp;
> > > +				break;
> > > +			}
> > > +		}
> > > +
> > > +		if (irqfd->oadn != oadn)
> > > +			kref_get(&irqfd->oadn->kref);
> > > +		else
> > > +			list_add(&oadn->list, &kvm->irqfds.oadns);
> > > +
> > > +		list_add_rcu(&irqfd->oadn_list, &irqfd->oadn->irqfds);
> > 
> > Hang on, don't we need irqfds_lock here? 
> 
> We're in kvm->irqfds.lock.

 so we are my bad


> > > +	}
> > > +
> > >  	irq_rt = rcu_dereference_protected(kvm->irq_routing,
> > >  					   lockdep_is_held(&kvm->irqfds.lock));
> > >  	irqfd_update(kvm, irqfd, irq_rt);
> > > @@ -267,6 +426,26 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> > >  
> > >  	spin_unlock_irq(&kvm->irqfds.lock);
> > >  
> > > +	if (oadn) {
> > > +		synchronize_rcu();
> > 
> > Seems unexpected on assign.
> > What does this synchronize_rcu do?
> 
> Because we force the notify below and we need the irqfd we just added to
> the list to be visible on the list for that.

I do not see it yet - maybe add a comment?

> > > +
> > > +		/*
> > > +		 * If we weren't able to re-use an OADN, setup the irq ack
> > > +		 * notifier outside of spinlock.  Our interface requires
> > > +		 * users to be able to handle spurious de-assert & notifies,
> > > +		 * so trigger one here to make sure we didn't miss anything.
> > > +		 * Cleanup unused OADN if we share.
> > > +		 */
> > > +		if (irqfd->oadn == oadn)
> > > +			kvm_register_irq_ack_notifier(kvm, &oadn->notifier);
> > > +		else {
> > > +			kvm_free_irq_source_id(kvm, oadn->irq_source_id);
> > 
> > This is unfortunate - so we need a spare source ID for internal use.
> > Thus if 1 source ID is available, then it is possible that
> > 
> > - allocate irqfd for A
> > - allocate irqfd for B
> > 
> > succeeds because A uses a shared source id so it is freed here.
> > 
> > While
> > 
> > - allocate irqfd for B
> > - allocate irqfd for A
> > 
> > fails because B used the last ID and now temporary allocation
> > above fails.
> 
> Yep, suggestions?  How many KVM_IRQFDs can we expect in-flight at one
> time vs how many IRQ source IDs will be used long term.  VFIO assigned
> PCI device can only make use of up to 4 GSIs per root bridge, we only
> have 24 IOAPIC pins.
> 
> > > +			kfree(oadn);
> > > +		}
> > > +
> > > +		irqfd_oadn_notify(&irqfd->oadn->notifier);
> > > +	}
> > > +
> > >  	/*
> > >  	 * do not drop the file until the irqfd is fully initialized, otherwise
> > >  	 * we might race against the POLLHUP
> > > @@ -276,12 +455,19 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> > >  	return 0;
> > >  
> > >  fail:
> > > +	if (oadn_eventfd && !IS_ERR(oadn_eventfd))
> > > +		eventfd_ctx_put(oadn_eventfd);
> > > +
> > > +	if (oadn && oadn->irq_source_id >= 0)
> > > +		kvm_free_irq_source_id(kvm, oadn->irq_source_id);
> > > +
> > 
> > A bit cleaner to have distinct labels for failures
> > at different stages of the setup.
> > I know it's not written this way ATM so it's not a must
> > to address this.
> 
> I often find myself getting lost trying to use distinct labels.  I'm
> just following along what this function already does, but find it to be
> a nice simplification to handle everything this way.  Thanks,
> 
> Alex
> 
> > >  	if (eventfd && !IS_ERR(eventfd))
> > >  		eventfd_ctx_put(eventfd);
> > >  
> > >  	if (!IS_ERR(file))
> > >  		fput(file);
> > >  
> > > +	kfree(oadn);
> > >  	kfree(irqfd);
> > >  	return ret;
> > >  }
> > > @@ -291,6 +477,7 @@ kvm_eventfd_init(struct kvm *kvm)
> > >  {
> > >  	spin_lock_init(&kvm->irqfds.lock);
> > >  	INIT_LIST_HEAD(&kvm->irqfds.items);
> > > +	INIT_LIST_HEAD(&kvm->irqfds.oadns);
> > >  	INIT_LIST_HEAD(&kvm->ioeventfds);
> > >  }
> > >  
> > > @@ -340,7 +527,7 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
> > >  int
> > >  kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
> > >  {
> > > -	if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN)
> > > +	if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN | KVM_IRQFD_FLAG_OADN))
> > >  		return -EINVAL;
> > >  
> > >  	if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 0/2] kvm: level irqfd support
  2012-08-21 19:28 [PATCH v9 0/2] kvm: level irqfd support Alex Williamson
  2012-08-21 19:29 ` [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd Alex Williamson
  2012-08-21 19:29 ` [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension Alex Williamson
@ 2012-08-22  0:31 ` Michael S. Tsirkin
  2012-08-22  1:28   ` Alex Williamson
  2012-08-22 10:10 ` Michael S. Tsirkin
  2012-09-05 14:42 ` Avi Kivity
  4 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2012-08-22  0:31 UTC (permalink / raw)
  To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel

On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote:
> Here's the much anticipated re-write of support for level irqfds.  As
> Michael suggested, I've rolled the eoi/ack notification fd into
> KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
> be objections to associating this specifically with an EOI or an ACK,
> I've name this OADN or "On Ack, De-assert & Notify".
> 
> Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
> since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
> Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
> the reason is it's racy.  Objects to track OADNs are made dynamically,
> we look through existing ones for a match under spinlock and setup a
> new one if there's no match.  On teardown, we can remove the OADN from
> the list under lock, but that same lock prevents us from de-assigning
> the IRQ ACK notifier or waiting for an RCU grace period.  We must make
> sure that any unused GSI is de-asserted, but the above means it's
> possible that another OADN has been created for this source ID/GSI
> and de-asserting the GSI could lead to breakage.

I do not see it. What breakage? Could you give an example please?


I think what you are saying is last deassign must clear
since otherwise we never will clear.
I agree it is either that or delay deassign until ack.

Can it be as simple as this (after all rcu etc dances)?
	lock irqfds
	if no oadns
		set level to 0
	unlock irqfds
?



>  Instead each OADN
> object gets it's own source ID, but these are all shared by users
> of the same GSI.  So for PCI devices, we might have up to 4 IRQ
> source IDs allocated.
> 
> Michael had also suggested avoiding reference counting and using
> list_empty for this OADN object.  Unfortunately, that doesn't work
> for similar reasons.  We want to release the OADN object underlock,
> preventing others from re-using it on the free path, but in order
> to have lock-less de-assert & notify we use RCU, meaning we can't
> trust list_empty until after an RCU grace period, which must be
> done outside of spinlocks.

confused. list empty on assign/deassing would be under lock
so no need for grace periods to trust it.
what am I missing?

But if you like kref more that is OK too.

> If there are suggestions how we can handle these better, please
> make them, but I think this compromise is race-free and still
> manages to make allocation of IRQ source IDs mostly a non-issue
> for device assignment limits.  Thanks,
> 
> Alex
> 
> ---
> 
> Alex Williamson (2):
>       kvm: On Ack, De-assert & Notify KVM_IRQFD extension
>       kvm: Use a reserved IRQ source ID for irqfd
> 
> 
>  Documentation/virtual/kvm/api.txt |   13 ++
>  arch/x86/kvm/x86.c                |    4 +
>  include/linux/kvm.h               |    7 +
>  include/linux/kvm_host.h          |    2 
>  virt/kvm/eventfd.c                |  199 ++++++++++++++++++++++++++++++++++++-
>  5 files changed, 218 insertions(+), 7 deletions(-)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-08-21 21:14         ` Alex Williamson
@ 2012-08-22  0:41           ` Michael S. Tsirkin
  2012-08-22  1:34             ` Alex Williamson
  2012-09-05 14:35             ` Avi Kivity
  0 siblings, 2 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2012-08-22  0:41 UTC (permalink / raw)
  To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel

On Tue, Aug 21, 2012 at 03:14:54PM -0600, Alex Williamson wrote:
> On Tue, 2012-08-21 at 23:41 +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 21, 2012 at 02:06:19PM -0600, Alex Williamson wrote:
> > > On Tue, 2012-08-21 at 22:58 +0300, Michael S. Tsirkin wrote:
> > > > On Tue, Aug 21, 2012 at 01:29:06PM -0600, Alex Williamson wrote:
> > > > > KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID
> > > > > which is also shared with userspace injection methods like
> > > > > KVM_IRQ_LINE.  This can cause a conflict if an irqfd triggers on
> > > > > a GSI asserted through KVM_IRQ_LINE.
> > > > 
> > > > What kind of conflict do you envision?  Pls note level interrupts are
> > > > unsupported ATM.
> > > 
> > > If KVM_IRQ_LINE asserts a level interrupt and KVM_IRQFD triggers on the
> > > same GSI then the pin is no longer asserted as userspace thinks it is.
> > > Do we just chalk this up to userspace error?
> > 
> > Yes: using a level GSI with current irqfd is a userspace error
> > because you can lose interrupts anyway.
> > 
> > Are edge GSIs affected?
> 
> I wouldn't think so.

No? If userspace does

. set line to 1
. trigger irqfd
. set line to 1
. trigger irqfd
. set line to 1
. trigger irqfd
. set line to 1

it gets 4 interrupts now

With your patch it will get 1, right?

> > > > > Move irqfd to it's own reserved IRQ source ID.  Add a capability for
> > > > > userspace to test for this fix.
> > > > > 
> > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > > ---
> > > > > 
> > > > >  arch/x86/kvm/x86.c       |    3 +++
> > > > >  include/linux/kvm.h      |    1 +
> > > > >  include/linux/kvm_host.h |    1 +
> > > > >  virt/kvm/eventfd.c       |    6 +++---
> > > > >  4 files changed, 8 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > index 42bce48..cd98673 100644
> > > > > --- a/arch/x86/kvm/x86.c
> > > > > +++ b/arch/x86/kvm/x86.c
> > > > > @@ -2174,6 +2174,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > > > >  	case KVM_CAP_GET_TSC_KHZ:
> > > > >  	case KVM_CAP_PCI_2_3:
> > > > >  	case KVM_CAP_KVMCLOCK_CTRL:
> > > > > +	case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
> > > > >  		r = 1;
> > > > >  		break;
> > > > >  	case KVM_CAP_COALESCED_MMIO:
> > > > > @@ -6258,6 +6259,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> > > > >  
> > > > >  	/* Reserve bit 0 of irq_sources_bitmap for userspace irq source */
> > > > >  	set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
> > > > > +	/* Reserve bit 1 of irq_sources_bitmap for irqfd irq source */
> > > > > +	set_bit(KVM_IRQFD_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
> > > > >  
> > > > >  	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
> > > > >  
> > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > > > index 2ce09aa..ae66b9c 100644
> > > > > --- a/include/linux/kvm.h
> > > > > +++ b/include/linux/kvm.h
> > > > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
> > > > >  #define KVM_CAP_PPC_GET_SMMU_INFO 78
> > > > >  #define KVM_CAP_S390_COW 79
> > > > >  #define KVM_CAP_PPC_ALLOC_HTAB 80
> > > > > +#define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81
> > > > >  
> > > > >  #ifdef KVM_CAP_IRQ_ROUTING
> > > > >  
> > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > > index b70b48b..b763230 100644
> > > > > --- a/include/linux/kvm_host.h
> > > > > +++ b/include/linux/kvm_host.h
> > > > > @@ -71,6 +71,7 @@
> > > > >  #define KVM_REQ_PMI               17
> > > > >  
> > > > >  #define KVM_USERSPACE_IRQ_SOURCE_ID	0
> > > > > +#define KVM_IRQFD_IRQ_SOURCE_ID		1
> > > > >  
> > > > >  struct kvm;
> > > > >  struct kvm_vcpu;
> > > > 
> > > > Above looks fine but I'm not sure why is the below needed.
> > > > This changes irqfd behaviour for edge GSIs slightly
> > > > in a userspace-visible way. Maybe make it a separate patch
> > > > so it can be considered on merits?
> > > 
> > > Hmm, the above does nothing without the below.
> > 
> > Yes. But you can use the above with the new irqfds you are adding.
> 
> Nope, racy.
> 
> > > I thought I was just
> > > implementing your idea that IRQFDs should all share a single IRQ source
> > > ID...
> > 
> > Sorry I only meant for level irqfds. You are changing edge here.
> 
> Ok, I misunderstood then.
> 
> > > why is that no longer a good idea?  Thanks,
> > > 
> > > Alex
> > 
> > Maybe it is a good idea. I am just asking for the motivation.
> 
> I assumed you were pointing out the level vs edge interaction.  If we
> call that a userspace bug, I can just drop this.  Thanks,
> 
> Alex

level is userspace bug I think :)

> > > > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > > > > index 7d7e2aa..2245cfa 100644
> > > > > --- a/virt/kvm/eventfd.c
> > > > > +++ b/virt/kvm/eventfd.c
> > > > > @@ -67,8 +67,8 @@ irqfd_inject(struct work_struct *work)
> > > > >  	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> > > > >  	struct kvm *kvm = irqfd->kvm;
> > > > >  
> > > > > -	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
> > > > > -	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > > > > +	kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 1);
> > > > > +	kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > > > >  }
> > > > >  
> > > > >  /*
> > > > > @@ -138,7 +138,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
> > > > >  		irq = rcu_dereference(irqfd->irq_entry);
> > > > >  		/* An event has been signaled, inject an interrupt */
> > > > >  		if (irq)
> > > > > -			kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
> > > > > +			kvm_set_msi(irq, kvm, KVM_IRQFD_IRQ_SOURCE_ID, 1);
> > > > >  		else
> > > > >  			schedule_work(&irqfd->inject);
> > > > >  		rcu_read_unlock();
> > > 
> > > 
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 0/2] kvm: level irqfd support
  2012-08-22  0:31 ` [PATCH v9 0/2] kvm: level irqfd support Michael S. Tsirkin
@ 2012-08-22  1:28   ` Alex Williamson
  2012-08-22  8:25     ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Alex Williamson @ 2012-08-22  1:28 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel

On Wed, 2012-08-22 at 03:31 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote:
> > Here's the much anticipated re-write of support for level irqfds.  As
> > Michael suggested, I've rolled the eoi/ack notification fd into
> > KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
> > be objections to associating this specifically with an EOI or an ACK,
> > I've name this OADN or "On Ack, De-assert & Notify".
> > 
> > Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
> > since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
> > Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
> > the reason is it's racy.  Objects to track OADNs are made dynamically,
> > we look through existing ones for a match under spinlock and setup a
> > new one if there's no match.  On teardown, we can remove the OADN from
> > the list under lock, but that same lock prevents us from de-assigning
> > the IRQ ACK notifier or waiting for an RCU grace period.  We must make
> > sure that any unused GSI is de-asserted, but the above means it's
> > possible that another OADN has been created for this source ID/GSI
> > and de-asserting the GSI could lead to breakage.
> 
> I do not see it. What breakage? Could you give an example please?
> 
> 
> I think what you are saying is last deassign must clear
> since otherwise we never will clear.
> I agree it is either that or delay deassign until ack.
> 
> Can it be as simple as this (after all rcu etc dances)?
> 	lock irqfds
> 	if no oadns
> 		set level to 0
> 	unlock irqfds
> ?

lock irqfds
remove irqfd from oadn list
if no oadns
    remove oadn
    set gsi 0
unlock
                                    lock irqfds
                                    new oadn
                                    unlock irqfds

>> EOI 
                                                          ack notify new oadn
                                                          de-assert gsi
                                                          notify new oadn
>> re-assert irqfd
                                                          ack notify old oadn
                                                          de-assert gsi
                                                          notify old oadn

synchronize_rcu

kvm_unregister_irq_ack_notifier

So, because the unregister is removed from the final clear and because
we share an IRQ source ID there's a window where we can have two oadns
registered for the same GSI.  The new one will de-assert and notify
while the old one has an empty list to notify, but still de-asserts.  We
can therefore de-assert w/o notify.

By using a new source ID, we separate the two so users of the new oadn
can't race the old and we can cleanly free the old source ID,
de-asserting it.

> >  Instead each OADN
> > object gets it's own source ID, but these are all shared by users
> > of the same GSI.  So for PCI devices, we might have up to 4 IRQ
> > source IDs allocated.
> > 
> > Michael had also suggested avoiding reference counting and using
> > list_empty for this OADN object.  Unfortunately, that doesn't work
> > for similar reasons.  We want to release the OADN object underlock,
> > preventing others from re-using it on the free path, but in order
> > to have lock-less de-assert & notify we use RCU, meaning we can't
> > trust list_empty until after an RCU grace period, which must be
> > done outside of spinlocks.
> 
> confused. list empty on assign/deassing would be under lock
> so no need for grace periods to trust it.
> what am I missing?
> 
> But if you like kref more that is OK too.

Maybe I'm misinterpreting this:

include/linux/rculist.h:
/**
 * list_del_rcu - deletes entry from list without re-initialization
 * @entry: the element to delete from the list.
 *
 * Note: list_empty() on entry does not return true after this,
 * the entry is in an undefined state. It is useful for RCU based
 * lockfree traversal.

If I can trust list_empty on oadn->irqfds, which maybe I can re-reading
it again, then we can drop the kref.  Thanks,

Alex


> > If there are suggestions how we can handle these better, please
> > make them, but I think this compromise is race-free and still
> > manages to make allocation of IRQ source IDs mostly a non-issue
> > for device assignment limits.  Thanks,
> > 
> > Alex
> > 
> > ---
> > 
> > Alex Williamson (2):
> >       kvm: On Ack, De-assert & Notify KVM_IRQFD extension
> >       kvm: Use a reserved IRQ source ID for irqfd
> > 
> > 
> >  Documentation/virtual/kvm/api.txt |   13 ++
> >  arch/x86/kvm/x86.c                |    4 +
> >  include/linux/kvm.h               |    7 +
> >  include/linux/kvm_host.h          |    2 
> >  virt/kvm/eventfd.c                |  199 ++++++++++++++++++++++++++++++++++++-
> >  5 files changed, 218 insertions(+), 7 deletions(-)




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-08-22  0:41           ` Michael S. Tsirkin
@ 2012-08-22  1:34             ` Alex Williamson
  2012-09-05 14:35             ` Avi Kivity
  1 sibling, 0 replies; 35+ messages in thread
From: Alex Williamson @ 2012-08-22  1:34 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel

On Wed, 2012-08-22 at 03:41 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 21, 2012 at 03:14:54PM -0600, Alex Williamson wrote:
> > On Tue, 2012-08-21 at 23:41 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Aug 21, 2012 at 02:06:19PM -0600, Alex Williamson wrote:
> > > > On Tue, 2012-08-21 at 22:58 +0300, Michael S. Tsirkin wrote:
> > > > > On Tue, Aug 21, 2012 at 01:29:06PM -0600, Alex Williamson wrote:
> > > > > > KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID
> > > > > > which is also shared with userspace injection methods like
> > > > > > KVM_IRQ_LINE.  This can cause a conflict if an irqfd triggers on
> > > > > > a GSI asserted through KVM_IRQ_LINE.
> > > > > 
> > > > > What kind of conflict do you envision?  Pls note level interrupts are
> > > > > unsupported ATM.
> > > > 
> > > > If KVM_IRQ_LINE asserts a level interrupt and KVM_IRQFD triggers on the
> > > > same GSI then the pin is no longer asserted as userspace thinks it is.
> > > > Do we just chalk this up to userspace error?
> > > 
> > > Yes: using a level GSI with current irqfd is a userspace error
> > > because you can lose interrupts anyway.
> > > 
> > > Are edge GSIs affected?
> > 
> > I wouldn't think so.
> 
> No? If userspace does
> 
> . set line to 1
> . trigger irqfd
> . set line to 1
> . trigger irqfd
> . set line to 1
> . trigger irqfd
> . set line to 1
> 
> it gets 4 interrupts now
> 
> With your patch it will get 1, right?
> 
> > > > > > Move irqfd to it's own reserved IRQ source ID.  Add a capability for
> > > > > > userspace to test for this fix.
> > > > > > 
> > > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > > > ---
> > > > > > 
> > > > > >  arch/x86/kvm/x86.c       |    3 +++
> > > > > >  include/linux/kvm.h      |    1 +
> > > > > >  include/linux/kvm_host.h |    1 +
> > > > > >  virt/kvm/eventfd.c       |    6 +++---
> > > > > >  4 files changed, 8 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > index 42bce48..cd98673 100644
> > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > @@ -2174,6 +2174,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > > > > >  	case KVM_CAP_GET_TSC_KHZ:
> > > > > >  	case KVM_CAP_PCI_2_3:
> > > > > >  	case KVM_CAP_KVMCLOCK_CTRL:
> > > > > > +	case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
> > > > > >  		r = 1;
> > > > > >  		break;
> > > > > >  	case KVM_CAP_COALESCED_MMIO:
> > > > > > @@ -6258,6 +6259,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> > > > > >  
> > > > > >  	/* Reserve bit 0 of irq_sources_bitmap for userspace irq source */
> > > > > >  	set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
> > > > > > +	/* Reserve bit 1 of irq_sources_bitmap for irqfd irq source */
> > > > > > +	set_bit(KVM_IRQFD_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
> > > > > >  
> > > > > >  	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
> > > > > >  
> > > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > > > > index 2ce09aa..ae66b9c 100644
> > > > > > --- a/include/linux/kvm.h
> > > > > > +++ b/include/linux/kvm.h
> > > > > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
> > > > > >  #define KVM_CAP_PPC_GET_SMMU_INFO 78
> > > > > >  #define KVM_CAP_S390_COW 79
> > > > > >  #define KVM_CAP_PPC_ALLOC_HTAB 80
> > > > > > +#define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81
> > > > > >  
> > > > > >  #ifdef KVM_CAP_IRQ_ROUTING
> > > > > >  
> > > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > > > index b70b48b..b763230 100644
> > > > > > --- a/include/linux/kvm_host.h
> > > > > > +++ b/include/linux/kvm_host.h
> > > > > > @@ -71,6 +71,7 @@
> > > > > >  #define KVM_REQ_PMI               17
> > > > > >  
> > > > > >  #define KVM_USERSPACE_IRQ_SOURCE_ID	0
> > > > > > +#define KVM_IRQFD_IRQ_SOURCE_ID		1
> > > > > >  
> > > > > >  struct kvm;
> > > > > >  struct kvm_vcpu;
> > > > > 
> > > > > Above looks fine but I'm not sure why is the below needed.
> > > > > This changes irqfd behaviour for edge GSIs slightly
> > > > > in a userspace-visible way. Maybe make it a separate patch
> > > > > so it can be considered on merits?
> > > > 
> > > > Hmm, the above does nothing without the below.
> > > 
> > > Yes. But you can use the above with the new irqfds you are adding.
> > 
> > Nope, racy.
> > 
> > > > I thought I was just
> > > > implementing your idea that IRQFDs should all share a single IRQ source
> > > > ID...
> > > 
> > > Sorry I only meant for level irqfds. You are changing edge here.
> > 
> > Ok, I misunderstood then.
> > 
> > > > why is that no longer a good idea?  Thanks,
> > > > 
> > > > Alex
> > > 
> > > Maybe it is a good idea. I am just asking for the motivation.
> > 
> > I assumed you were pointing out the level vs edge interaction.  If we
> > call that a userspace bug, I can just drop this.  Thanks,
> > 
> > Alex
> 
> level is userspace bug I think :)

Dropped.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension
  2012-08-22  0:14       ` Michael S. Tsirkin
@ 2012-08-22  1:48         ` Alex Williamson
  0 siblings, 0 replies; 35+ messages in thread
From: Alex Williamson @ 2012-08-22  1:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel

On Wed, 2012-08-22 at 03:14 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 21, 2012 at 03:11:41PM -0600, Alex Williamson wrote:
> > On Tue, 2012-08-21 at 23:37 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Aug 21, 2012 at 01:29:14PM -0600, Alex Williamson wrote:
> > > > +	}
> > > > +
> > > >  	irq_rt = rcu_dereference_protected(kvm->irq_routing,
> > > >  					   lockdep_is_held(&kvm->irqfds.lock));
> > > >  	irqfd_update(kvm, irqfd, irq_rt);
> > > > @@ -267,6 +426,26 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> > > >  
> > > >  	spin_unlock_irq(&kvm->irqfds.lock);
> > > >  
> > > > +	if (oadn) {
> > > > +		synchronize_rcu();
> > > 
> > > Seems unexpected on assign.
> > > What does this synchronize_rcu do?
> > 
> > Because we force the notify below and we need the irqfd we just added to
> > the list to be visible on the list for that.
> 
> I do not see it yet - maybe add a comment?

It's entirely possible I'm wrong too.  This was an after thought and now
that I look at it again, the below irqfd_oadn_notify obviously starts
after the list_add, so there shouldn't be any need to wait.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 0/2] kvm: level irqfd support
  2012-08-22  1:28   ` Alex Williamson
@ 2012-08-22  8:25     ` Michael S. Tsirkin
  2012-09-17 18:08       ` Alex Williamson
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2012-08-22  8:25 UTC (permalink / raw)
  To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel

On Tue, Aug 21, 2012 at 07:28:15PM -0600, Alex Williamson wrote:
> On Wed, 2012-08-22 at 03:31 +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote:
> > > Here's the much anticipated re-write of support for level irqfds.  As
> > > Michael suggested, I've rolled the eoi/ack notification fd into
> > > KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
> > > be objections to associating this specifically with an EOI or an ACK,
> > > I've name this OADN or "On Ack, De-assert & Notify".
> > > 
> > > Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
> > > since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
> > > Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
> > > the reason is it's racy.  Objects to track OADNs are made dynamically,
> > > we look through existing ones for a match under spinlock and setup a
> > > new one if there's no match.  On teardown, we can remove the OADN from
> > > the list under lock, but that same lock prevents us from de-assigning
> > > the IRQ ACK notifier or waiting for an RCU grace period.  We must make
> > > sure that any unused GSI is de-asserted, but the above means it's
> > > possible that another OADN has been created for this source ID/GSI
> > > and de-asserting the GSI could lead to breakage.
> > 
> > I do not see it. What breakage? Could you give an example please?
> > 
> > 
> > I think what you are saying is last deassign must clear
> > since otherwise we never will clear.
> > I agree it is either that or delay deassign until ack.
> > 
> > Can it be as simple as this (after all rcu etc dances)?
> > 	lock irqfds
> > 	if no oadns
> > 		set level to 0
> > 	unlock irqfds
> > ?
> 
> lock irqfds
> remove irqfd from oadn list
> if no oadns
>     remove oadn
>     set gsi 0
> unlock
>                                     lock irqfds
>                                     new oadn
>                                     unlock irqfds
> 
> >> EOI 
>                                                           ack notify new oadn
>                                                           de-assert gsi
>                                                           notify new oadn
> >> re-assert irqfd
>                                                           ack notify old oadn
>                                                           de-assert gsi
>                                                           notify old oadn
> 
> synchronize_rcu
> 
> kvm_unregister_irq_ack_notifier
>
> So, because the unregister is removed from the final clear and because
> we share an IRQ source ID there's a window where we can have two oadns
> registered for the same GSI.  The new one will de-assert and notify
> while the old one has an empty list to notify, but still de-asserts.  We
> can therefore de-assert w/o notify.
> 
> By using a new source ID, we separate the two so users of the new oadn
> can't race the old and we can cleanly free the old source ID,
> de-asserting it.

Need to think about it some more but is the problem two
ack notifiers for the same gsi?

In that case, how about we add __kvm_unregister_irq_ack_notifier
with no locking, and do most of the above under
kvm->irq_lock?

With one change: it is better not to call synchronize_rcu
under irq lock, I think we can safely move it to after
__kvm_unregister_irq_ack_notifier.


> > >  Instead each OADN
> > > object gets it's own source ID, but these are all shared by users
> > > of the same GSI.  So for PCI devices, we might have up to 4 IRQ
> > > source IDs allocated.
> > > 
> > > Michael had also suggested avoiding reference counting and using
> > > list_empty for this OADN object.  Unfortunately, that doesn't work
> > > for similar reasons.  We want to release the OADN object underlock,
> > > preventing others from re-using it on the free path, but in order
> > > to have lock-less de-assert & notify we use RCU, meaning we can't
> > > trust list_empty until after an RCU grace period, which must be
> > > done outside of spinlocks.
> > 
> > confused. list empty on assign/deassing would be under lock
> > so no need for grace periods to trust it.
> > what am I missing?
> > 
> > But if you like kref more that is OK too.
> 
> Maybe I'm misinterpreting this:
> 
> include/linux/rculist.h:
> /**
>  * list_del_rcu - deletes entry from list without re-initialization
>  * @entry: the element to delete from the list.
>  *
>  * Note: list_empty() on entry does not return true after this,
>  * the entry is in an undefined state. It is useful for RCU based
>  * lockfree traversal.
> 
> If I can trust list_empty on oadn->irqfds, which maybe I can re-reading
> it again, then we can drop the kref.  Thanks,
> 
> Alex

I think you are - *the entry you deleted* is not empty.
The list itself naturally is, or so it seems from code.

static inline void list_del_rcu(struct list_head *entry)
{
        __list_del_entry(entry);
        entry->prev = LIST_POISON2;
}

No?

> 
> > > If there are suggestions how we can handle these better, please
> > > make them, but I think this compromise is race-free and still
> > > manages to make allocation of IRQ source IDs mostly a non-issue
> > > for device assignment limits.  Thanks,
> > > 
> > > Alex
> > > 
> > > ---
> > > 
> > > Alex Williamson (2):
> > >       kvm: On Ack, De-assert & Notify KVM_IRQFD extension
> > >       kvm: Use a reserved IRQ source ID for irqfd
> > > 
> > > 
> > >  Documentation/virtual/kvm/api.txt |   13 ++
> > >  arch/x86/kvm/x86.c                |    4 +
> > >  include/linux/kvm.h               |    7 +
> > >  include/linux/kvm_host.h          |    2 
> > >  virt/kvm/eventfd.c                |  199 ++++++++++++++++++++++++++++++++++++-
> > >  5 files changed, 218 insertions(+), 7 deletions(-)
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 0/2] kvm: level irqfd support
  2012-08-21 19:28 [PATCH v9 0/2] kvm: level irqfd support Alex Williamson
                   ` (2 preceding siblings ...)
  2012-08-22  0:31 ` [PATCH v9 0/2] kvm: level irqfd support Michael S. Tsirkin
@ 2012-08-22 10:10 ` Michael S. Tsirkin
  2012-09-05 14:42 ` Avi Kivity
  4 siblings, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2012-08-22 10:10 UTC (permalink / raw)
  To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel

On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote:
> Here's the much anticipated re-write of support for level irqfds.  As
> Michael suggested, I've rolled the eoi/ack notification fd into
> KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
> be objections to associating this specifically with an EOI or an ACK,
> I've name this OADN or "On Ack, De-assert & Notify".

Kid wouldn't let me sleep in the night so I've been thinking of better
names :).

I think a non abbreviated deassert_on_ack is a better name.
Yes e.g. irqfd is an abbreviation too, but it is
actually just a composition of irq and fd which are
both standard, familiar abbreviations.

If you have really set your heart on using an abbreviation,
it is better to not put two vowels together.
ADN (ack deasserts and notifies or auto deassert and notify)
is a bit better in that respect, though it does look a bit like
a miss-spelled AND which is unfortunate.

But obviously, it's all not terribly important.

> Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
> since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
> Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
> the reason is it's racy.  Objects to track OADNs are made dynamically,
> we look through existing ones for a match under spinlock and setup a
> new one if there's no match.  On teardown, we can remove the OADN from
> the list under lock, but that same lock prevents us from de-assigning
> the IRQ ACK notifier or waiting for an RCU grace period.  We must make
> sure that any unused GSI is de-asserted, but the above means it's
> possible that another OADN has been created for this source ID/GSI
> and de-asserting the GSI could lead to breakage.  Instead each OADN
> object gets it's own source ID, but these are all shared by users
> of the same GSI.  So for PCI devices, we might have up to 4 IRQ
> source IDs allocated.
> 
> Michael had also suggested avoiding reference counting and using
> list_empty for this OADN object.  Unfortunately, that doesn't work
> for similar reasons.  We want to release the OADN object underlock,
> preventing others from re-using it on the free path, but in order
> to have lock-less de-assert & notify we use RCU, meaning we can't
> trust list_empty until after an RCU grace period, which must be
> done outside of spinlocks.
> 
> If there are suggestions how we can handle these better, please
> make them, but I think this compromise is race-free and still
> manages to make allocation of IRQ source IDs mostly a non-issue
> for device assignment limits.  Thanks,
> 
> Alex
> 
> ---
> 
> Alex Williamson (2):
>       kvm: On Ack, De-assert & Notify KVM_IRQFD extension
>       kvm: Use a reserved IRQ source ID for irqfd
> 
> 
>  Documentation/virtual/kvm/api.txt |   13 ++
>  arch/x86/kvm/x86.c                |    4 +
>  include/linux/kvm.h               |    7 +
>  include/linux/kvm_host.h          |    2 
>  virt/kvm/eventfd.c                |  199 ++++++++++++++++++++++++++++++++++++-
>  5 files changed, 218 insertions(+), 7 deletions(-)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-08-22  0:41           ` Michael S. Tsirkin
  2012-08-22  1:34             ` Alex Williamson
@ 2012-09-05 14:35             ` Avi Kivity
  2012-09-05 14:51               ` Michael S. Tsirkin
  1 sibling, 1 reply; 35+ messages in thread
From: Avi Kivity @ 2012-09-05 14:35 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On 08/22/2012 03:41 AM, Michael S. Tsirkin wrote:
>> 
>> I assumed you were pointing out the level vs edge interaction.  If we
>> call that a userspace bug, I can just drop this.  Thanks,
>> 
>> Alex
> 
> level is userspace bug I think :)

I don't see how it's a bug.  Suppose we have a vfio device that shares a
gsi with an emulated device.  The emulated device naturally uses
KVM_IRQ_LINE (it has no need to re-sample on ADN), while vfio naturally
has to use irqfd.

Note one would expect that each irqfd gets its own irq source id, since
they are all independent level sources.  The reason they don't is that
we shut them down anyway and let the sources re-trigger (it is more
accurate to say that they have no irq source id, but that would just
muddle the implementation).

Alex, if the conclusion is that we do need this patch, then please add a
comment explaining why we can share the source id among all irqfd users.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 0/2] kvm: level irqfd support
  2012-08-21 19:28 [PATCH v9 0/2] kvm: level irqfd support Alex Williamson
                   ` (3 preceding siblings ...)
  2012-08-22 10:10 ` Michael S. Tsirkin
@ 2012-09-05 14:42 ` Avi Kivity
  2012-09-05 14:52   ` Michael S. Tsirkin
  4 siblings, 1 reply; 35+ messages in thread
From: Avi Kivity @ 2012-09-05 14:42 UTC (permalink / raw)
  To: Alex Williamson; +Cc: mst, gleb, kvm, linux-kernel

On 08/21/2012 10:28 PM, Alex Williamson wrote:
> Here's the much anticipated re-write of support for level irqfds.  As
> Michael suggested, I've rolled the eoi/ack notification fd into
> KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
> be objections to associating this specifically with an EOI or an ACK,
> I've name this OADN or "On Ack, De-assert & Notify".

Perhaps we can call it "resample", since this is what it is doing.
irqfd tells kvm that the line as been asserted, resample tells the
caller to check again.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-08-21 19:29 ` [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd Alex Williamson
  2012-08-21 19:58   ` Michael S. Tsirkin
@ 2012-09-05 14:46   ` Avi Kivity
  2012-09-05 15:07     ` Michael S. Tsirkin
  1 sibling, 1 reply; 35+ messages in thread
From: Avi Kivity @ 2012-09-05 14:46 UTC (permalink / raw)
  To: Alex Williamson; +Cc: mst, gleb, kvm, linux-kernel

On 08/21/2012 10:29 PM, Alex Williamson wrote:
> KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID
> which is also shared with userspace injection methods like
> KVM_IRQ_LINE.  This can cause a conflict if an irqfd triggers on
> a GSI asserted through KVM_IRQ_LINE.  Move irqfd to it's own
> reserved IRQ source ID.  Add a capability for userspace to test
> for this fix.

I don't think we need a cap, rather a backport if we identify real cases
where an edge gsi is shared among several devices.  Otherwise it is just
a theoretical bug before level irqfd is introduced.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-09-05 14:35             ` Avi Kivity
@ 2012-09-05 14:51               ` Michael S. Tsirkin
  2012-09-05 14:59                 ` Avi Kivity
  2012-09-05 15:09                 ` Michael S. Tsirkin
  0 siblings, 2 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2012-09-05 14:51 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On Wed, Sep 05, 2012 at 05:35:43PM +0300, Avi Kivity wrote:
> On 08/22/2012 03:41 AM, Michael S. Tsirkin wrote:
> >> 
> >> I assumed you were pointing out the level vs edge interaction.  If we
> >> call that a userspace bug, I can just drop this.  Thanks,
> >> 
> >> Alex
> > 
> > level is userspace bug I think :)
> 
> I don't see how it's a bug.  Suppose we have a vfio device that shares a
> gsi with an emulated device.  The emulated device naturally uses
> KVM_IRQ_LINE (it has no need to re-sample on ADN), while vfio naturally
> has to use irqfd.

Absolutely. But vfio needs to use irqfd with the new flag.
Using existing irqfd for level is a bug.

> Note one would expect that each irqfd gets its own irq source id, since
> they are all independent level sources.  The reason they don't is that
> we shut them down anyway and let the sources re-trigger (it is more
> accurate to say that they have no irq source id, but that would just
> muddle the implementation).
> 
> Alex, if the conclusion is that we do need this patch, then please add a
> comment explaining why we can share the source id among all irqfd users.
> 
> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 0/2] kvm: level irqfd support
  2012-09-05 14:42 ` Avi Kivity
@ 2012-09-05 14:52   ` Michael S. Tsirkin
  0 siblings, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2012-09-05 14:52 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On Wed, Sep 05, 2012 at 05:42:38PM +0300, Avi Kivity wrote:
> On 08/21/2012 10:28 PM, Alex Williamson wrote:
> > Here's the much anticipated re-write of support for level irqfds.  As
> > Michael suggested, I've rolled the eoi/ack notification fd into
> > KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
> > be objections to associating this specifically with an EOI or an ACK,
> > I've name this OADN or "On Ack, De-assert & Notify".
> 
> Perhaps we can call it "resample", since this is what it is doing.
> irqfd tells kvm that the line as been asserted, resample tells the
> caller to check again.

Sounds good.

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension
  2012-08-21 19:29 ` [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension Alex Williamson
  2012-08-21 20:37   ` Michael S. Tsirkin
@ 2012-09-05 14:57   ` Avi Kivity
  2012-09-17 16:13     ` Alex Williamson
  1 sibling, 1 reply; 35+ messages in thread
From: Avi Kivity @ 2012-09-05 14:57 UTC (permalink / raw)
  To: Alex Williamson; +Cc: mst, gleb, kvm, linux-kernel

On 08/21/2012 10:29 PM, Alex Williamson wrote:
> For VFIO based device assignment we'd like a mechanism to allow level
> triggered interrutps to be directly injected into KVM.  KVM_IRQFD
> already allows this for edge triggered interrupts, but for level, we
> need to watch for acknowledgement of the interrupt from the guest to
> provide us a hint when to test the device and allow it to re-assert
> if necessary.  To do this, we create a new KVM_IRQFD mode called
> "On Ack, De-assert & Notify", or OADN.  In this mode, an interrupt
> injection provides only a gsi assertion.  We then hook into the IRQ
> ACK notifier, which when triggered de-asserts the gsi and notifies
> via another eventfd.  It's then the responsibility of the user to
> re-assert the interrupt is service is still required.
> 
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index bf33aaa..87d7321 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1946,6 +1946,19 @@ the guest using the specified gsi pin.  The irqfd is removed using
>  the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
>  and kvm_irqfd.gsi.
>  
> +With KVM_CAP_IRQFD_OADN, KVM_IRQFD supports an "On Ack, De-assert &
> +Notify" option that allows emulation of level-triggered interrupts.
> +When kvm_irqfd.fd is triggered, the requested gsi is asserted and
> +remains asserted until interaction with the irqchip indicates the
> +VM has acknowledged the interrupt, such as an EOI.  On acknoledgement
> +the gsi is automatically de-asserted and the user is notified via
> +kvm_irqfd.notifyfd.  The user is then required to re-assert the
> +interrupt if the associated device still requires service.  To enable
> +this mode, configure the KVM_IRQFD using the KVM_IRQFD_FLAG_OADN flag
> +and specify kvm_irqfd.notifyfd.  Note that closing kvm_irqfd.notifyfd
> +while configured in this mode does not disable the irqfd.  The
> +KVM_IRQFD_FLAG_OADN flag is only necessary on assignment.

Under my suggested naming, this would be called a "resampling irqfd",
with resampling requested via kvm_irqfd.resamplefd.

> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 2245cfa..dfdb5b2 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -43,6 +43,23 @@
>   * --------------------------------------------------------------------
>   */
>  
> +/*
> + * OADN irqfds (On Ack, De-assert & Notify) are a special variety of
> + * irqfds that assert an interrupt to the irqchip on eventfd trigger,
> + * receieve notification when userspace acknowledges the interrupt,
> + * automatically de-asserts the irqchip level, and notifies userspace
> + * via the oadn_eventfd.  This object helps to provide one-to-many
> + * deassert-to-notify so we can share a single irq source ID per OADN.
> + */
> +struct _irqfd_oadn {
> +	struct kvm *kvm;
> +	int irq_source_id; /* IRQ source ID shared by these irqfds */
> +	struct list_head irqfds; /* list of irqfds using this object */
> +	struct kvm_irq_ack_notifier notifier; /* IRQ ACK notification */
> +	struct kref kref; /* Race-free removal */
> +	struct list_head list;
> +};


Why do you need per-gsi irq source IDs?  irq source ids only matter
within a gsi.  For example KVM_IRQ_LINE shares one source ID for all
lines (with result that userspace is forced to manage the ORing of
shared inputs itself).




-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-09-05 14:51               ` Michael S. Tsirkin
@ 2012-09-05 14:59                 ` Avi Kivity
  2012-09-05 15:13                   ` Michael S. Tsirkin
  2012-09-05 15:09                 ` Michael S. Tsirkin
  1 sibling, 1 reply; 35+ messages in thread
From: Avi Kivity @ 2012-09-05 14:59 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On 09/05/2012 05:51 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 05, 2012 at 05:35:43PM +0300, Avi Kivity wrote:
>> On 08/22/2012 03:41 AM, Michael S. Tsirkin wrote:
>> >> 
>> >> I assumed you were pointing out the level vs edge interaction.  If we
>> >> call that a userspace bug, I can just drop this.  Thanks,
>> >> 
>> >> Alex
>> > 
>> > level is userspace bug I think :)
>> 
>> I don't see how it's a bug.  Suppose we have a vfio device that shares a
>> gsi with an emulated device.  The emulated device naturally uses
>> KVM_IRQ_LINE (it has no need to re-sample on ADN), while vfio naturally
>> has to use irqfd.
> 
> Absolutely. But vfio needs to use irqfd with the new flag.
> Using existing irqfd for level is a bug.

I see we're not reusing this irq source id for level irqfd.  But I think
we should, there's no need for per-gsi irq source id.  Plus I'd like to
fix the theoretical bug even if it doesn't bite in practice.



-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-09-05 14:46   ` Avi Kivity
@ 2012-09-05 15:07     ` Michael S. Tsirkin
  2012-09-05 15:15       ` Avi Kivity
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2012-09-05 15:07 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On Wed, Sep 05, 2012 at 05:46:17PM +0300, Avi Kivity wrote:
> On 08/21/2012 10:29 PM, Alex Williamson wrote:
> > KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID
> > which is also shared with userspace injection methods like
> > KVM_IRQ_LINE.  This can cause a conflict if an irqfd triggers on
> > a GSI asserted through KVM_IRQ_LINE.  Move irqfd to it's own
> > reserved IRQ source ID.  Add a capability for userspace to test
> > for this fix.
> 
> I don't think we need a cap, rather a backport if we identify real cases
> where an edge gsi is shared among several devices.  Otherwise it is just
> a theoretical bug before level irqfd is introduced.

In that case, I think it's safer to preserve the "bug" as is: we are
changing userspace-visible behaviour for edge interrupts otherwise.
For example if userspace uses kvm_irq_line for an edge
interrupt, set it to 1, previously it could then
continue to send any number of interrupts with irqfd,
now it can't.

Basically the logical OR functionality of source IDs
does not make sense for edge.

How about we do
	 if (flags&RESAMPLE)
		source_id = USERSPACE
	else
		source_id = IRQFD

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-09-05 14:51               ` Michael S. Tsirkin
  2012-09-05 14:59                 ` Avi Kivity
@ 2012-09-05 15:09                 ` Michael S. Tsirkin
  2012-09-05 15:12                   ` Avi Kivity
  1 sibling, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2012-09-05 15:09 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On Wed, Sep 05, 2012 at 05:51:53PM +0300, Michael S. Tsirkin wrote:
> On Wed, Sep 05, 2012 at 05:35:43PM +0300, Avi Kivity wrote:
> > On 08/22/2012 03:41 AM, Michael S. Tsirkin wrote:
> > >> 
> > >> I assumed you were pointing out the level vs edge interaction.  If we
> > >> call that a userspace bug, I can just drop this.  Thanks,
> > >> 
> > >> Alex
> > > 
> > > level is userspace bug I think :)
> > 
> > I don't see how it's a bug.  Suppose we have a vfio device that shares a
> > gsi with an emulated device.  The emulated device naturally uses
> > KVM_IRQ_LINE (it has no need to re-sample on ADN), while vfio naturally
> > has to use irqfd.
> 
> Absolutely. But vfio needs to use irqfd with the new flag.
> Using existing irqfd for level is a bug.
> 
> > Note one would expect that each irqfd gets its own irq source id, since
> > they are all independent level sources.  The reason they don't is that
> > we shut them down anyway and let the sources re-trigger (it is more
> > accurate to say that they have no irq source id, but that would just
> > muddle the implementation).
> > 
> > Alex, if the conclusion is that we do need this patch, then please add a
> > comment explaining why we can share the source id among all irqfd users.

Something along the lines of

/* 
   For resample irqfds, level is a logical OR of all inputs;
   to support this, track state for RESAMPLE irqfds separately
   from userspace. We do not need to track state for each input since
   they are all deasserted at the same time, before resampling.
   */

?

> > -- 
> > error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-09-05 15:09                 ` Michael S. Tsirkin
@ 2012-09-05 15:12                   ` Avi Kivity
  2012-09-05 15:15                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Avi Kivity @ 2012-09-05 15:12 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On 09/05/2012 06:09 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 05, 2012 at 05:51:53PM +0300, Michael S. Tsirkin wrote:
>> On Wed, Sep 05, 2012 at 05:35:43PM +0300, Avi Kivity wrote:
>> > On 08/22/2012 03:41 AM, Michael S. Tsirkin wrote:
>> > >> 
>> > >> I assumed you were pointing out the level vs edge interaction.  If we
>> > >> call that a userspace bug, I can just drop this.  Thanks,
>> > >> 
>> > >> Alex
>> > > 
>> > > level is userspace bug I think :)
>> > 
>> > I don't see how it's a bug.  Suppose we have a vfio device that shares a
>> > gsi with an emulated device.  The emulated device naturally uses
>> > KVM_IRQ_LINE (it has no need to re-sample on ADN), while vfio naturally
>> > has to use irqfd.
>> 
>> Absolutely. But vfio needs to use irqfd with the new flag.
>> Using existing irqfd for level is a bug.
>> 
>> > Note one would expect that each irqfd gets its own irq source id, since
>> > they are all independent level sources.  The reason they don't is that
>> > we shut them down anyway and let the sources re-trigger (it is more
>> > accurate to say that they have no irq source id, but that would just
>> > muddle the implementation).
>> > 
>> > Alex, if the conclusion is that we do need this patch, then please add a
>> > comment explaining why we can share the source id among all irqfd users.
> 
> Something along the lines of
> 
> /* 
>    For resample irqfds, level is a logical OR of all inputs;
>    to support this, track state for RESAMPLE irqfds separately
>    from userspace. We do not need to track state for each input since
>    they are all deasserted at the same time, before resampling.
>    */

Well the comment style is wrong.

To expand a little more, irqfd only sends assert events, so assigning
the level is equivalent to an OR.  Clearing an resampling simply builds
the state again.

btw, there can be other irq source IDs if the lines are shared with the
PIT or kvm assigned devices.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-09-05 14:59                 ` Avi Kivity
@ 2012-09-05 15:13                   ` Michael S. Tsirkin
  2012-09-05 15:22                     ` Avi Kivity
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2012-09-05 15:13 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On Wed, Sep 05, 2012 at 05:59:46PM +0300, Avi Kivity wrote:
> On 09/05/2012 05:51 PM, Michael S. Tsirkin wrote:
> > On Wed, Sep 05, 2012 at 05:35:43PM +0300, Avi Kivity wrote:
> >> On 08/22/2012 03:41 AM, Michael S. Tsirkin wrote:
> >> >> 
> >> >> I assumed you were pointing out the level vs edge interaction.  If we
> >> >> call that a userspace bug, I can just drop this.  Thanks,
> >> >> 
> >> >> Alex
> >> > 
> >> > level is userspace bug I think :)
> >> 
> >> I don't see how it's a bug.  Suppose we have a vfio device that shares a
> >> gsi with an emulated device.  The emulated device naturally uses
> >> KVM_IRQ_LINE (it has no need to re-sample on ADN), while vfio naturally
> >> has to use irqfd.
> > 
> > Absolutely. But vfio needs to use irqfd with the new flag.
> > Using existing irqfd for level is a bug.
> 
> I see we're not reusing this irq source id for level irqfd.  But I think
> we should, there's no need for per-gsi irq source id.

I agree. All resample irqfds are deasserted at the same time,
tracking them separately gets us nothing.

> Plus I'd like to
> fix the theoretical bug even if it doesn't bite in practice.
> 

I'm not sure what the bug is, for edge, and how a separate ID fixes it.
Could you clarify?

> 
> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-09-05 15:07     ` Michael S. Tsirkin
@ 2012-09-05 15:15       ` Avi Kivity
  0 siblings, 0 replies; 35+ messages in thread
From: Avi Kivity @ 2012-09-05 15:15 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On 09/05/2012 06:07 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 05, 2012 at 05:46:17PM +0300, Avi Kivity wrote:
>> On 08/21/2012 10:29 PM, Alex Williamson wrote:
>> > KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID
>> > which is also shared with userspace injection methods like
>> > KVM_IRQ_LINE.  This can cause a conflict if an irqfd triggers on
>> > a GSI asserted through KVM_IRQ_LINE.  Move irqfd to it's own
>> > reserved IRQ source ID.  Add a capability for userspace to test
>> > for this fix.
>> 
>> I don't think we need a cap, rather a backport if we identify real cases
>> where an edge gsi is shared among several devices.  Otherwise it is just
>> a theoretical bug before level irqfd is introduced.
> 
> In that case, I think it's safer to preserve the "bug" as is: we are
> changing userspace-visible behaviour for edge interrupts otherwise.
> For example if userspace uses kvm_irq_line for an edge
> interrupt, set it to 1, previously it could then
> continue to send any number of interrupts with irqfd,
> now it can't.

If anyone did that, they should have reported a bug, since they surely
didn't expect edges if the line was held high.

> 
> Basically the logical OR functionality of source IDs
> does not make sense for edge.

Edge is only interpreted at the ioapic or pic input; the line is just a
line (an open collector line that ORs anything connected to it, or an
equivalent).

> 
> How about we do
> 	 if (flags&RESAMPLE)
> 		source_id = USERSPACE
> 	else
> 		source_id = IRQFD

Okay if we identify something that depends on the bug, otherwise not.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-09-05 15:12                   ` Avi Kivity
@ 2012-09-05 15:15                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2012-09-05 15:15 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On Wed, Sep 05, 2012 at 06:12:04PM +0300, Avi Kivity wrote:
> On 09/05/2012 06:09 PM, Michael S. Tsirkin wrote:
> > On Wed, Sep 05, 2012 at 05:51:53PM +0300, Michael S. Tsirkin wrote:
> >> On Wed, Sep 05, 2012 at 05:35:43PM +0300, Avi Kivity wrote:
> >> > On 08/22/2012 03:41 AM, Michael S. Tsirkin wrote:
> >> > >> 
> >> > >> I assumed you were pointing out the level vs edge interaction.  If we
> >> > >> call that a userspace bug, I can just drop this.  Thanks,
> >> > >> 
> >> > >> Alex
> >> > > 
> >> > > level is userspace bug I think :)
> >> > 
> >> > I don't see how it's a bug.  Suppose we have a vfio device that shares a
> >> > gsi with an emulated device.  The emulated device naturally uses
> >> > KVM_IRQ_LINE (it has no need to re-sample on ADN), while vfio naturally
> >> > has to use irqfd.
> >> 
> >> Absolutely. But vfio needs to use irqfd with the new flag.
> >> Using existing irqfd for level is a bug.
> >> 
> >> > Note one would expect that each irqfd gets its own irq source id, since
> >> > they are all independent level sources.  The reason they don't is that
> >> > we shut them down anyway and let the sources re-trigger (it is more
> >> > accurate to say that they have no irq source id, but that would just
> >> > muddle the implementation).
> >> > 
> >> > Alex, if the conclusion is that we do need this patch, then please add a
> >> > comment explaining why we can share the source id among all irqfd users.
> > 
> > Something along the lines of
> > 
> > /* 
> >    For resample irqfds, level is a logical OR of all inputs;
> >    to support this, track state for RESAMPLE irqfds separately
> >    from userspace. We do not need to track state for each input since
> >    they are all deasserted at the same time, before resampling.
> >    */
> 
> Well the comment style is wrong.

Ouch.

> To expand a little more, irqfd only sends assert events, so assigning
> the level is equivalent to an OR.  Clearing an resampling simply builds
> the state again.
> 
> btw, there can be other irq source IDs if the lines are shared with the
> PIT or kvm assigned devices.

Nod.

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-09-05 15:13                   ` Michael S. Tsirkin
@ 2012-09-05 15:22                     ` Avi Kivity
  2012-09-05 15:28                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Avi Kivity @ 2012-09-05 15:22 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On 09/05/2012 06:13 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 05, 2012 at 05:59:46PM +0300, Avi Kivity wrote:
>> On 09/05/2012 05:51 PM, Michael S. Tsirkin wrote:
>> > On Wed, Sep 05, 2012 at 05:35:43PM +0300, Avi Kivity wrote:
>> >> On 08/22/2012 03:41 AM, Michael S. Tsirkin wrote:
>> >> >> 
>> >> >> I assumed you were pointing out the level vs edge interaction.  If we
>> >> >> call that a userspace bug, I can just drop this.  Thanks,
>> >> >> 
>> >> >> Alex
>> >> > 
>> >> > level is userspace bug I think :)
>> >> 
>> >> I don't see how it's a bug.  Suppose we have a vfio device that shares a
>> >> gsi with an emulated device.  The emulated device naturally uses
>> >> KVM_IRQ_LINE (it has no need to re-sample on ADN), while vfio naturally
>> >> has to use irqfd.
>> > 
>> > Absolutely. But vfio needs to use irqfd with the new flag.
>> > Using existing irqfd for level is a bug.
>> 
>> I see we're not reusing this irq source id for level irqfd.  But I think
>> we should, there's no need for per-gsi irq source id.
> 
> I agree. All resample irqfds are deasserted at the same time,
> tracking them separately gets us nothing.

That's not the reason.  Separate irq source ids only have meanings
within a gsi.  We could have two lines (gsi 3 isid 4) and (gsi 4 isid 4)
that can be toggled independently with no effect on the other gsi.
Within a gsi we do need a separate irq source id usually, but as 2/2
recognizes, AODNs are a special case since we clear all inputs anyway.
The end result is that all AODNs can share a single isid.

> 
>> Plus I'd like to
>> fix the theoretical bug even if it doesn't bite in practice.
>> 
> 
> I'm not sure what the bug is, for edge, and how a separate ID fixes it.
> Could you clarify?

gsi 3 is configured as edge in the ioapic.  It has (unusually) two
inputs: one driven by userspace, the other by irqfd.

cpu 0                    cpu 1
------------------------ -------------------------
irqfd: set to 1
ioapic: recognize edge
inject irq
EOI
                         KVM_IRQ_LINE: set to 1
                         ioapic: ignore
                         KVM_IRQ_LINE: set to 0
irqfd: set to 0

We had two edges with an EOI between them, but injected just on interrupt.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-09-05 15:22                     ` Avi Kivity
@ 2012-09-05 15:28                       ` Michael S. Tsirkin
  2012-09-05 15:35                         ` Avi Kivity
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2012-09-05 15:28 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On Wed, Sep 05, 2012 at 06:22:57PM +0300, Avi Kivity wrote:
> On 09/05/2012 06:13 PM, Michael S. Tsirkin wrote:
> > On Wed, Sep 05, 2012 at 05:59:46PM +0300, Avi Kivity wrote:
> >> On 09/05/2012 05:51 PM, Michael S. Tsirkin wrote:
> >> > On Wed, Sep 05, 2012 at 05:35:43PM +0300, Avi Kivity wrote:
> >> >> On 08/22/2012 03:41 AM, Michael S. Tsirkin wrote:
> >> >> >> 
> >> >> >> I assumed you were pointing out the level vs edge interaction.  If we
> >> >> >> call that a userspace bug, I can just drop this.  Thanks,
> >> >> >> 
> >> >> >> Alex
> >> >> > 
> >> >> > level is userspace bug I think :)
> >> >> 
> >> >> I don't see how it's a bug.  Suppose we have a vfio device that shares a
> >> >> gsi with an emulated device.  The emulated device naturally uses
> >> >> KVM_IRQ_LINE (it has no need to re-sample on ADN), while vfio naturally
> >> >> has to use irqfd.
> >> > 
> >> > Absolutely. But vfio needs to use irqfd with the new flag.
> >> > Using existing irqfd for level is a bug.
> >> 
> >> I see we're not reusing this irq source id for level irqfd.  But I think
> >> we should, there's no need for per-gsi irq source id.
> > 
> > I agree. All resample irqfds are deasserted at the same time,
> > tracking them separately gets us nothing.
> 
> That's not the reason.  Separate irq source ids only have meanings
> within a gsi.  We could have two lines (gsi 3 isid 4) and (gsi 4 isid 4)
> that can be toggled independently with no effect on the other gsi.
> Within a gsi we do need a separate irq source id usually, but as 2/2
> recognizes, AODNs are a special case since we clear all inputs anyway.
> The end result is that all AODNs can share a single isid.
> 
> > 
> >> Plus I'd like to
> >> fix the theoretical bug even if it doesn't bite in practice.
> >> 
> > 
> > I'm not sure what the bug is, for edge, and how a separate ID fixes it.
> > Could you clarify?
> 
> gsi 3 is configured as edge in the ioapic.  It has (unusually) two
> inputs: one driven by userspace, the other by irqfd.
> 
> cpu 0                    cpu 1
> ------------------------ -------------------------
> irqfd: set to 1
> ioapic: recognize edge
> inject irq
> EOI
>                          KVM_IRQ_LINE: set to 1
>                          ioapic: ignore
>                          KVM_IRQ_LINE: set to 0
> irqfd: set to 0
> 
> We had two edges with an EOI between them, but injected just on interrupt.

I see. Makes sense, ACK this patch.


> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd
  2012-09-05 15:28                       ` Michael S. Tsirkin
@ 2012-09-05 15:35                         ` Avi Kivity
  0 siblings, 0 replies; 35+ messages in thread
From: Avi Kivity @ 2012-09-05 15:35 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On 09/05/2012 06:28 PM, Michael S. Tsirkin wrote:

>> gsi 3 is configured as edge in the ioapic.  It has (unusually) two
>> inputs: one driven by userspace, the other by irqfd.
>> 
>> cpu 0                    cpu 1
>> ------------------------ -------------------------
>> irqfd: set to 1
>> ioapic: recognize edge
>> inject irq
>> EOI
>>                          KVM_IRQ_LINE: set to 1
>>                          ioapic: ignore
>>                          KVM_IRQ_LINE: set to 0
>> irqfd: set to 0
>> 
>> We had two edges with an EOI between them, but injected just on interrupt.
> 
> I see. Makes sense, ACK this patch.

Actually it's wrong.  The two sources are not synchronized, so there is
no way for them to know the two edges did not coalesce.  On real
hardware, after all, edge interrupts have a non-zero pulse width, and
kvm faithfully emulates this.

But this patch makes sense for level irqfd, so we might as well keep it
with a different description.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension
  2012-09-05 14:57   ` Avi Kivity
@ 2012-09-17 16:13     ` Alex Williamson
  0 siblings, 0 replies; 35+ messages in thread
From: Alex Williamson @ 2012-09-17 16:13 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mst, gleb, kvm, linux-kernel

On Wed, 2012-09-05 at 17:57 +0300, Avi Kivity wrote:
> On 08/21/2012 10:29 PM, Alex Williamson wrote:
> > For VFIO based device assignment we'd like a mechanism to allow level
> > triggered interrutps to be directly injected into KVM.  KVM_IRQFD
> > already allows this for edge triggered interrupts, but for level, we
> > need to watch for acknowledgement of the interrupt from the guest to
> > provide us a hint when to test the device and allow it to re-assert
> > if necessary.  To do this, we create a new KVM_IRQFD mode called
> > "On Ack, De-assert & Notify", or OADN.  In this mode, an interrupt
> > injection provides only a gsi assertion.  We then hook into the IRQ
> > ACK notifier, which when triggered de-asserts the gsi and notifies
> > via another eventfd.  It's then the responsibility of the user to
> > re-assert the interrupt is service is still required.
> > 
> > 
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index bf33aaa..87d7321 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -1946,6 +1946,19 @@ the guest using the specified gsi pin.  The irqfd is removed using
> >  the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> >  and kvm_irqfd.gsi.
> >  
> > +With KVM_CAP_IRQFD_OADN, KVM_IRQFD supports an "On Ack, De-assert &
> > +Notify" option that allows emulation of level-triggered interrupts.
> > +When kvm_irqfd.fd is triggered, the requested gsi is asserted and
> > +remains asserted until interaction with the irqchip indicates the
> > +VM has acknowledged the interrupt, such as an EOI.  On acknoledgement
> > +the gsi is automatically de-asserted and the user is notified via
> > +kvm_irqfd.notifyfd.  The user is then required to re-assert the
> > +interrupt if the associated device still requires service.  To enable
> > +this mode, configure the KVM_IRQFD using the KVM_IRQFD_FLAG_OADN flag
> > +and specify kvm_irqfd.notifyfd.  Note that closing kvm_irqfd.notifyfd
> > +while configured in this mode does not disable the irqfd.  The
> > +KVM_IRQFD_FLAG_OADN flag is only necessary on assignment.
> 
> Under my suggested naming, this would be called a "resampling irqfd",
> with resampling requested via kvm_irqfd.resamplefd.
> 
> > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > index 2245cfa..dfdb5b2 100644
> > --- a/virt/kvm/eventfd.c
> > +++ b/virt/kvm/eventfd.c
> > @@ -43,6 +43,23 @@
> >   * --------------------------------------------------------------------
> >   */
> >  
> > +/*
> > + * OADN irqfds (On Ack, De-assert & Notify) are a special variety of
> > + * irqfds that assert an interrupt to the irqchip on eventfd trigger,
> > + * receieve notification when userspace acknowledges the interrupt,
> > + * automatically de-asserts the irqchip level, and notifies userspace
> > + * via the oadn_eventfd.  This object helps to provide one-to-many
> > + * deassert-to-notify so we can share a single irq source ID per OADN.
> > + */
> > +struct _irqfd_oadn {
> > +	struct kvm *kvm;
> > +	int irq_source_id; /* IRQ source ID shared by these irqfds */
> > +	struct list_head irqfds; /* list of irqfds using this object */
> > +	struct kvm_irq_ack_notifier notifier; /* IRQ ACK notification */
> > +	struct kref kref; /* Race-free removal */
> > +	struct list_head list;
> > +};
> 
> 
> Why do you need per-gsi irq source IDs?  irq source ids only matter
> within a gsi.  For example KVM_IRQ_LINE shares one source ID for all
> lines (with result that userspace is forced to manage the ORing of
> shared inputs itself).

Right, but locking makes it difficult to tear down a resample irqfd
without potentially racing creation of a new one, which I tried to
explain here:

http://www.spinics.net/lists/kvm/msg78460.html

This can cause a de-assert w/o ack as we briefly have multiple resample
irqfds on the same gsi, irq source id pair.  That can dead lock a vfio
device.  Using a new irq source ID ensures that the old resample irqfd
doesn't interfere with the new one.  We count on the final clear or the
gsi assertion when releasing the irq source id, so we can't share it
among other resample irqfds on other gsis with different life cycles.

Michael has suggested re-architecting the locking around some structure,
but I'm not sure it's worth it.  AFAICT we have more irq source IDs than
we could consume if resample irqfds on the same gsi share an irq source
id.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v9 0/2] kvm: level irqfd support
  2012-08-22  8:25     ` Michael S. Tsirkin
@ 2012-09-17 18:08       ` Alex Williamson
  0 siblings, 0 replies; 35+ messages in thread
From: Alex Williamson @ 2012-09-17 18:08 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel

On Wed, 2012-08-22 at 11:25 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 21, 2012 at 07:28:15PM -0600, Alex Williamson wrote:
> > On Wed, 2012-08-22 at 03:31 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote:
> > > > Here's the much anticipated re-write of support for level irqfds.  As
> > > > Michael suggested, I've rolled the eoi/ack notification fd into
> > > > KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
> > > > be objections to associating this specifically with an EOI or an ACK,
> > > > I've name this OADN or "On Ack, De-assert & Notify".
> > > > 
> > > > Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
> > > > since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
> > > > Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
> > > > the reason is it's racy.  Objects to track OADNs are made dynamically,
> > > > we look through existing ones for a match under spinlock and setup a
> > > > new one if there's no match.  On teardown, we can remove the OADN from
> > > > the list under lock, but that same lock prevents us from de-assigning
> > > > the IRQ ACK notifier or waiting for an RCU grace period.  We must make
> > > > sure that any unused GSI is de-asserted, but the above means it's
> > > > possible that another OADN has been created for this source ID/GSI
> > > > and de-asserting the GSI could lead to breakage.
> > > 
> > > I do not see it. What breakage? Could you give an example please?
> > > 
> > > 
> > > I think what you are saying is last deassign must clear
> > > since otherwise we never will clear.
> > > I agree it is either that or delay deassign until ack.
> > > 
> > > Can it be as simple as this (after all rcu etc dances)?
> > > 	lock irqfds
> > > 	if no oadns
> > > 		set level to 0
> > > 	unlock irqfds
> > > ?
> > 
> > lock irqfds
> > remove irqfd from oadn list
> > if no oadns
> >     remove oadn
> >     set gsi 0
> > unlock
> >                                     lock irqfds
> >                                     new oadn
> >                                     unlock irqfds
> > 
> > >> EOI 
> >                                                           ack notify new oadn
> >                                                           de-assert gsi
> >                                                           notify new oadn
> > >> re-assert irqfd
> >                                                           ack notify old oadn
> >                                                           de-assert gsi
> >                                                           notify old oadn
> > 
> > synchronize_rcu
> > 
> > kvm_unregister_irq_ack_notifier
> >
> > So, because the unregister is removed from the final clear and because
> > we share an IRQ source ID there's a window where we can have two oadns
> > registered for the same GSI.  The new one will de-assert and notify
> > while the old one has an empty list to notify, but still de-asserts.  We
> > can therefore de-assert w/o notify.
> > 
> > By using a new source ID, we separate the two so users of the new oadn
> > can't race the old and we can cleanly free the old source ID,
> > de-asserting it.
> 
> Need to think about it some more but is the problem two
> ack notifiers for the same gsi?

yes

> In that case, how about we add __kvm_unregister_irq_ack_notifier
> with no locking, and do most of the above under
> kvm->irq_lock?

Converting locks makes me nervous, but I'll give it a shot.  I don't
know how easy/possible it is though.  I know in previous iterations I
tried to make something similar to irqfd use a mutex and couldn't, but I
don't remember the details.

> With one change: it is better not to call synchronize_rcu
> under irq lock, I think we can safely move it to after
> __kvm_unregister_irq_ack_notifier.

Yep, that makes the interface pretty ugly though as we then have two
separate, but dependent steps.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2012-09-17 18:08 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-21 19:28 [PATCH v9 0/2] kvm: level irqfd support Alex Williamson
2012-08-21 19:29 ` [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd Alex Williamson
2012-08-21 19:58   ` Michael S. Tsirkin
2012-08-21 20:06     ` Alex Williamson
2012-08-21 20:41       ` Michael S. Tsirkin
2012-08-21 21:14         ` Alex Williamson
2012-08-22  0:41           ` Michael S. Tsirkin
2012-08-22  1:34             ` Alex Williamson
2012-09-05 14:35             ` Avi Kivity
2012-09-05 14:51               ` Michael S. Tsirkin
2012-09-05 14:59                 ` Avi Kivity
2012-09-05 15:13                   ` Michael S. Tsirkin
2012-09-05 15:22                     ` Avi Kivity
2012-09-05 15:28                       ` Michael S. Tsirkin
2012-09-05 15:35                         ` Avi Kivity
2012-09-05 15:09                 ` Michael S. Tsirkin
2012-09-05 15:12                   ` Avi Kivity
2012-09-05 15:15                     ` Michael S. Tsirkin
2012-09-05 14:46   ` Avi Kivity
2012-09-05 15:07     ` Michael S. Tsirkin
2012-09-05 15:15       ` Avi Kivity
2012-08-21 19:29 ` [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension Alex Williamson
2012-08-21 20:37   ` Michael S. Tsirkin
2012-08-21 21:11     ` Alex Williamson
2012-08-22  0:14       ` Michael S. Tsirkin
2012-08-22  1:48         ` Alex Williamson
2012-09-05 14:57   ` Avi Kivity
2012-09-17 16:13     ` Alex Williamson
2012-08-22  0:31 ` [PATCH v9 0/2] kvm: level irqfd support Michael S. Tsirkin
2012-08-22  1:28   ` Alex Williamson
2012-08-22  8:25     ` Michael S. Tsirkin
2012-09-17 18:08       ` Alex Williamson
2012-08-22 10:10 ` Michael S. Tsirkin
2012-09-05 14:42 ` Avi Kivity
2012-09-05 14:52   ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).