linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v8 0/6] kvm: level irqfd support
@ 2012-08-10 22:37 Alex Williamson
  2012-08-10 22:37 ` [PATCH v8 1/6] kvm: Allow filtering of acked irqs Alex Williamson
                   ` (7 more replies)
  0 siblings, 8 replies; 33+ messages in thread
From: Alex Williamson @ 2012-08-10 22:37 UTC (permalink / raw)
  To: avi, mst; +Cc: gleb, kvm, linux-kernel

v8:

Trying a new approach.  Nobody seems to like the internal IRQ
source ID object and the interactions it implies between irqfd
and eoifd, so let's get rid of it.  Instead, simply expose
IRQ source IDs to userspace.  This lets the user be in charge
of freeing them or hanging onto a source ID for later use.  They
can also detach and re-attach components at will.  It also opens
up the possibility that userspace might want to use each IRQ
source ID for more than one GSI (and avoids the kernel needing
to manage that).  Per suggestions, EOIFD is now IRQ_ACKFD.

I really wanted to add a de-assert-only option to irqfd so the
irq_ackfd could be fed directly into an irqfd, but I'm dependent
on the ordering of de-assert _then_ signal an eventfd.  Keeping
that ordering doesn't seem to be possible, especially since irqfd
uses a workqueue, if I attempt to make that connection.  Thanks,

Alex

---

Alex Williamson (6):
      kvm: Add de-assert option to KVM_IRQ_ACKFD
      kvm: KVM_IRQ_ACKFD
      kvm: Add assert-only option to KVM_IRQFD
      kvm: Add IRQ source ID option to KVM_IRQFD
      kvm: Expose IRQ source IDs to userspace
      kvm: Allow filtering of acked irqs


 Documentation/virtual/kvm/api.txt |   53 ++++++
 arch/x86/kvm/Kconfig              |    1 
 arch/x86/kvm/i8254.c              |    1 
 arch/x86/kvm/i8259.c              |    8 +
 arch/x86/kvm/x86.c                |    8 +
 include/linux/kvm.h               |   39 ++++-
 include/linux/kvm_host.h          |   18 ++
 virt/kvm/Kconfig                  |    3 
 virt/kvm/assigned-dev.c           |    1 
 virt/kvm/eventfd.c                |  315 +++++++++++++++++++++++++++++++++++++
 virt/kvm/ioapic.c                 |    5 -
 virt/kvm/irq_comm.c               |   28 +++
 virt/kvm/kvm_main.c               |   26 +++
 13 files changed, 496 insertions(+), 10 deletions(-)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v8 1/6] kvm: Allow filtering of acked irqs
  2012-08-10 22:37 [PATCH v8 0/6] kvm: level irqfd support Alex Williamson
@ 2012-08-10 22:37 ` Alex Williamson
  2012-08-15 12:27   ` Michael S. Tsirkin
  2012-08-10 22:37 ` [PATCH v8 2/6] kvm: Expose IRQ source IDs to userspace Alex Williamson
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Alex Williamson @ 2012-08-10 22:37 UTC (permalink / raw)
  To: avi, mst; +Cc: gleb, kvm, linux-kernel

Registering an kvm_irq_ack_notifier with kian.irq_source_id < 0
retains existing behavior, filling in the actual irq_source_id results
in the callback only being called when the specified irq_source_id is
asserting the given gsi.

The i8254 PIT remains unfiltered because it de-asserts it's irq source
id, so it's notifier would never get called otherwise.  KVM device
assignment gets filtering as it de-asserts the GSI in it's notifier.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 arch/x86/kvm/i8254.c     |    1 +
 arch/x86/kvm/i8259.c     |    8 +++++++-
 include/linux/kvm_host.h |    4 +++-
 virt/kvm/assigned-dev.c  |    1 +
 virt/kvm/ioapic.c        |    5 ++++-
 virt/kvm/irq_comm.c      |    6 ++++--
 6 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index adba28f..2355d19 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -709,6 +709,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 	hrtimer_init(&pit_state->pit_timer.timer,
 		     CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
 	pit_state->irq_ack_notifier.gsi = 0;
+	pit_state->irq_ack_notifier.irq_source_id = -1; /* No filter */
 	pit_state->irq_ack_notifier.irq_acked = kvm_pit_ack_irq;
 	kvm_register_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier);
 	pit_state->pit_timer.reinject = true;
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index e498b18..d2175a9 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -74,9 +74,14 @@ static void pic_unlock(struct kvm_pic *s)
 
 static void pic_clear_isr(struct kvm_kpic_state *s, int irq)
 {
+	unsigned long irq_source_ids;
+
 	s->isr &= ~(1 << irq);
 	if (s != &s->pics_state->pics[0])
 		irq += 8;
+
+	irq_source_ids = s->pics_state->irq_states[irq];
+
 	/*
 	 * We are dropping lock while calling ack notifiers since ack
 	 * notifier callbacks for assigned devices call into PIC recursively.
@@ -84,7 +89,8 @@ static void pic_clear_isr(struct kvm_kpic_state *s, int irq)
 	 * it should be safe since PIC state is already updated at this stage.
 	 */
 	pic_unlock(s->pics_state);
-	kvm_notify_acked_irq(s->pics_state->kvm, SELECT_PIC(irq), irq);
+	kvm_notify_acked_irq(s->pics_state->kvm, SELECT_PIC(irq), irq,
+			     irq_source_ids);
 	pic_lock(s->pics_state);
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b70b48b..2ad3e4a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -577,6 +577,7 @@ int kvm_is_mmio_pfn(pfn_t pfn);
 
 struct kvm_irq_ack_notifier {
 	struct hlist_node link;
+	int irq_source_id;
 	unsigned gsi;
 	void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
 };
@@ -627,7 +628,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
 int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
 int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
 		int irq_source_id, int level);
-void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
+void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin,
+			  unsigned long irq_source_ids);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
 				   struct kvm_irq_ack_notifier *kian);
 void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 23a41a9..a08c9c1 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -407,6 +407,7 @@ static int assigned_device_enable_guest_intx(struct kvm *kvm,
 {
 	dev->guest_irq = irq->guest_irq;
 	dev->ack_notifier.gsi = irq->guest_irq;
+	dev->ack_notifier.irq_source_id = dev->irq_source_id;
 	return 0;
 }
 
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index ef61d52..1a9f445 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -241,10 +241,12 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
 
 	for (i = 0; i < IOAPIC_NUM_PINS; i++) {
 		union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
+		unsigned long irq_source_ids;
 
 		if (ent->fields.vector != vector)
 			continue;
 
+		irq_source_ids = ioapic->irq_states[i];
 		/*
 		 * We are dropping lock while calling ack notifiers because ack
 		 * notifier callbacks for assigned devices call into IOAPIC
@@ -254,7 +256,8 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
 		 * after ack notifier returns.
 		 */
 		spin_unlock(&ioapic->lock);
-		kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
+		kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i,
+				     irq_source_ids);
 		spin_lock(&ioapic->lock);
 
 		if (trigger_mode != IOAPIC_LEVEL_TRIG)
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 83402d7..7d75126 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -173,7 +173,8 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
 	return ret;
 }
 
-void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
+void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin,
+			  unsigned long irq_source_ids)
 {
 	struct kvm_irq_ack_notifier *kian;
 	struct hlist_node *n;
@@ -186,7 +187,8 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
 	if (gsi != -1)
 		hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list,
 					 link)
-			if (kian->gsi == gsi)
+			if (kian->gsi == gsi && (kian->irq_source_id < 0 ||
+			    test_bit(kian->irq_source_id, &irq_source_ids)))
 				kian->irq_acked(kian);
 	rcu_read_unlock();
 }


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v8 2/6] kvm: Expose IRQ source IDs to userspace
  2012-08-10 22:37 [PATCH v8 0/6] kvm: level irqfd support Alex Williamson
  2012-08-10 22:37 ` [PATCH v8 1/6] kvm: Allow filtering of acked irqs Alex Williamson
@ 2012-08-10 22:37 ` Alex Williamson
  2012-08-15 12:59   ` Michael S. Tsirkin
  2012-08-10 22:37 ` [PATCH v8 3/6] kvm: Add IRQ source ID option to KVM_IRQFD Alex Williamson
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Alex Williamson @ 2012-08-10 22:37 UTC (permalink / raw)
  To: avi, mst; +Cc: gleb, kvm, linux-kernel

Introduce KVM_IRQ_SOURCE_ID and KVM_CAP_NR_IRQ_SOURCE_ID to allow
user allocation of IRQ source IDs and querying both the capability
and the total count of IRQ source IDs.  These will later be used
by interfaces for setting up level IRQs.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 Documentation/virtual/kvm/api.txt |   20 ++++++++++++++++++++
 arch/x86/kvm/Kconfig              |    1 +
 arch/x86/kvm/x86.c                |    3 +++
 include/linux/kvm.h               |   11 +++++++++++
 include/linux/kvm_host.h          |    1 +
 virt/kvm/Kconfig                  |    3 +++
 virt/kvm/irq_comm.c               |   22 ++++++++++++++++++++++
 virt/kvm/kvm_main.c               |   16 ++++++++++++++++
 8 files changed, 77 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index bf33aaa..062cfd5 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1980,6 +1980,26 @@ return the hash table order in the parameter.  (If the guest is using
 the virtualized real-mode area (VRMA) facility, the kernel will
 re-create the VMRA HPTEs on the next KVM_RUN of any vcpu.)
 
+4.77 KVM_IRQ_SOURCE_ID
+
+Capability: KVM_CAP_NR_IRQ_SOURCE_ID
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_irq_source_id (in/out)
+Returns: 0 on success, -errno on error
+
+Allows allocating and freeing IRQ source IDs.  Each IRQ source ID
+represents a complete set of irqchip pin inputs which are logically
+OR'd with other IRQ source IDs for determining the final assertion
+level of a pin.  The flag KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN indicates
+whether the call is for an allocation or deallocation.
+kvm_irq_source_id.irq_source_id returns the allocated IRQ source ID
+on success and specifies the freed IRQ source ID on deassign.  The
+return value of KVM_CAP_NR_IRQ_SOURCE_ID indicates the total number
+of IRQ source IDs.  These IDs are also shared with KVM internal users
+(ex. KVM assigned devices, PIT, shared user ID), therefore not all IDs
+may be allocated through this interface.
+
 
 5. The kvm_run structure
 ------------------------
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index a28f338..bfd2082 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -37,6 +37,7 @@ config KVM
 	select TASK_DELAY_ACCT
 	select PERF_EVENTS
 	select HAVE_KVM_MSI
+	select HAVE_KVM_IRQ_SOURCE_ID
 	---help---
 	  Support hosting fully virtualized guest machines using hardware
 	  virtualization extensions.  You will need a fairly recent
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 42bce48..75e743e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2209,6 +2209,9 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_TSC_DEADLINE_TIMER:
 		r = boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER);
 		break;
+	case KVM_CAP_NR_IRQ_SOURCE_ID:
+		r = BITS_PER_LONG; /* kvm->arch.irq_sources_bitmap */
+		break;
 	default:
 		r = 0;
 		break;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2ce09aa..67b6b49 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_GET_SMMU_INFO 78
 #define KVM_CAP_S390_COW 79
 #define KVM_CAP_PPC_ALLOC_HTAB 80
+#define KVM_CAP_NR_IRQ_SOURCE_ID 81
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -691,6 +692,14 @@ struct kvm_irqfd {
 	__u8  pad[20];
 };
 
+#define KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN (1 << 0)
+
+struct kvm_irq_source_id {
+	__u32 flags;
+	__u32 irq_source_id;
+	__u8 pad[24];
+};
+
 struct kvm_clock_data {
 	__u64 clock;
 	__u32 flags;
@@ -831,6 +840,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_PPC_GET_SMMU_INFO	  _IOR(KVMIO,  0xa6, struct kvm_ppc_smmu_info)
 /* Available with KVM_CAP_PPC_ALLOC_HTAB */
 #define KVM_PPC_ALLOCATE_HTAB	  _IOWR(KVMIO, 0xa7, __u32)
+/* Available with KVM_CAP_IRQ_SOURCE_ID */
+#define KVM_IRQ_SOURCE_ID         _IOWR(KVMIO, 0xa8, struct kvm_irq_source_id)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2ad3e4a..ea6d7a1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -636,6 +636,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
 				   struct kvm_irq_ack_notifier *kian);
 int kvm_request_irq_source_id(struct kvm *kvm);
 void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
+int kvm_irq_source_id(struct kvm *kvm, struct kvm_irq_source_id *id);
 
 /* For vcpu->arch.iommu_flags */
 #define KVM_IOMMU_CACHE_COHERENCY	0x1
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 28694f4..b7e0d4d 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -21,3 +21,6 @@ config KVM_ASYNC_PF
 
 config HAVE_KVM_MSI
        bool
+
+config HAVE_KVM_IRQ_SOURCE_ID
+       bool
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 7d75126..44d40c9 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -254,6 +254,28 @@ unlock:
 	mutex_unlock(&kvm->irq_lock);
 }
 
+int kvm_irq_source_id(struct kvm *kvm, struct kvm_irq_source_id *id)
+{
+	int irq_source_id;
+
+	if (id->flags & ~KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN)
+		return -EINVAL;
+
+	if (id->flags & KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN) {
+		if (id->irq_source_id == KVM_USERSPACE_IRQ_SOURCE_ID)
+			return -EINVAL;
+		kvm_free_irq_source_id(kvm, id->irq_source_id);
+		return 0;
+	}
+
+	irq_source_id = kvm_request_irq_source_id(kvm);
+	if (irq_source_id < 0)
+		return irq_source_id;
+
+	id->irq_source_id = irq_source_id;
+	return 0;
+}
+
 void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq,
 				    struct kvm_irq_mask_notifier *kimn)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2468523..e120cb3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2093,6 +2093,22 @@ static long kvm_vm_ioctl(struct file *filp,
 		break;
 	}
 #endif
+#ifdef CONFIG_HAVE_KVM_IRQ_SOURCE_ID
+	case KVM_IRQ_SOURCE_ID: {
+		struct kvm_irq_source_id id;
+
+		r = -EFAULT;
+		if (copy_from_user(&id, argp, sizeof(id)))
+			goto out;
+
+		r = kvm_irq_source_id(kvm, &id);
+		if (!r && copy_to_user(argp, &id, sizeof(id))) {
+			r = -EFAULT;
+			goto out;
+		}
+		break;
+	}
+#endif
 	default:
 		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
 		if (r == -ENOTTY)


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v8 3/6] kvm: Add IRQ source ID option to KVM_IRQFD
  2012-08-10 22:37 [PATCH v8 0/6] kvm: level irqfd support Alex Williamson
  2012-08-10 22:37 ` [PATCH v8 1/6] kvm: Allow filtering of acked irqs Alex Williamson
  2012-08-10 22:37 ` [PATCH v8 2/6] kvm: Expose IRQ source IDs to userspace Alex Williamson
@ 2012-08-10 22:37 ` Alex Williamson
  2012-08-15 13:49   ` Michael S. Tsirkin
  2012-08-10 22:37 ` [PATCH v8 4/6] kvm: Add assert-only " Alex Williamson
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Alex Williamson @ 2012-08-10 22:37 UTC (permalink / raw)
  To: avi, mst; +Cc: gleb, kvm, linux-kernel

This allows specifying an IRQ source ID to be used when injecting an
interrupt.  When not specified KVM_USERSPACE_IRQ_SOURCE_ID is used.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 Documentation/virtual/kvm/api.txt |    5 +++++
 arch/x86/kvm/x86.c                |    1 +
 include/linux/kvm.h               |    6 +++++-
 virt/kvm/eventfd.c                |   14 ++++++++++----
 4 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 062cfd5..46f4b4d 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1946,6 +1946,11 @@ the guest using the specified gsi pin.  The irqfd is removed using
 the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
 and kvm_irqfd.gsi.
 
+When KVM_CAP_IRQFD_IRQ_SOURCE_ID is available, KVM_IRQFD supports the
+KVM_IRQFD_FLAG_IRQ_SOURCE_ID which can be used to specify an IRQ
+source ID (see KVM_IRQ_SOURCE_ID) to be used for the guest interrupt.
+This flag has no effect on deassignment.
+
 4.76 KVM_PPC_ALLOCATE_HTAB
 
 Capability: KVM_CAP_PPC_ALLOC_HTAB
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 75e743e..946c5bd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2174,6 +2174,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_GET_TSC_KHZ:
 	case KVM_CAP_PCI_2_3:
 	case KVM_CAP_KVMCLOCK_CTRL:
+	case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 67b6b49..deda8a9 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -619,6 +619,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_S390_COW 79
 #define KVM_CAP_PPC_ALLOC_HTAB 80
 #define KVM_CAP_NR_IRQ_SOURCE_ID 81
+#define KVM_CAP_IRQFD_IRQ_SOURCE_ID 82
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -684,12 +685,15 @@ struct kvm_xen_hvm_config {
 #endif
 
 #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
+/* Available with KVM_CAP_IRQFD_IRQ_SOURCE_ID */
+#define KVM_IRQFD_FLAG_IRQ_SOURCE_ID (1 << 1)
 
 struct kvm_irqfd {
 	__u32 fd;
 	__u32 gsi;
 	__u32 flags;
-	__u8  pad[20];
+	__u32 irq_source_id;
+	__u8  pad[16];
 };
 
 #define KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN (1 << 0)
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 7d7e2aa..30150f1 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -51,6 +51,7 @@ struct _irqfd {
 	struct kvm_kernel_irq_routing_entry __rcu *irq_entry;
 	/* Used for level IRQ fast-path */
 	int gsi;
+	int irq_source_id;
 	struct work_struct inject;
 	/* Used for setup/shutdown */
 	struct eventfd_ctx *eventfd;
@@ -67,8 +68,8 @@ irqfd_inject(struct work_struct *work)
 	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
 	struct kvm *kvm = irqfd->kvm;
 
-	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
-	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
+	kvm_set_irq(kvm, irqfd->irq_source_id, irqfd->gsi, 1);
+	kvm_set_irq(kvm, irqfd->irq_source_id, irqfd->gsi, 0);
 }
 
 /*
@@ -138,7 +139,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
 		irq = rcu_dereference(irqfd->irq_entry);
 		/* An event has been signaled, inject an interrupt */
 		if (irq)
-			kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
+			kvm_set_msi(irq, kvm, irqfd->irq_source_id, 1);
 		else
 			schedule_work(&irqfd->inject);
 		rcu_read_unlock();
@@ -213,6 +214,10 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 
 	irqfd->kvm = kvm;
 	irqfd->gsi = args->gsi;
+	if (args->flags & KVM_IRQFD_FLAG_IRQ_SOURCE_ID)
+		irqfd->irq_source_id = args->irq_source_id;
+	else
+		irqfd->irq_source_id = KVM_USERSPACE_IRQ_SOURCE_ID;
 	INIT_LIST_HEAD(&irqfd->list);
 	INIT_WORK(&irqfd->inject, irqfd_inject);
 	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
@@ -340,7 +345,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
 int
 kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
 {
-	if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN)
+	if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN |
+			    KVM_IRQFD_FLAG_IRQ_SOURCE_ID))
 		return -EINVAL;
 
 	if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v8 4/6] kvm: Add assert-only option to KVM_IRQFD
  2012-08-10 22:37 [PATCH v8 0/6] kvm: level irqfd support Alex Williamson
                   ` (2 preceding siblings ...)
  2012-08-10 22:37 ` [PATCH v8 3/6] kvm: Add IRQ source ID option to KVM_IRQFD Alex Williamson
@ 2012-08-10 22:37 ` Alex Williamson
  2012-08-10 22:37 ` [PATCH v8 5/6] kvm: KVM_IRQ_ACKFD Alex Williamson
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 33+ messages in thread
From: Alex Williamson @ 2012-08-10 22:37 UTC (permalink / raw)
  To: avi, mst; +Cc: gleb, kvm, linux-kernel

This allows specifying that an irqfd is used only to assert the
specified gsi, whereas standard behavior is to follow the assertion
with a deassertion.  This will later allow a level interrupt to be
asserted via eventfd and later de-asserted by other means.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 Documentation/virtual/kvm/api.txt |    6 ++++++
 arch/x86/kvm/x86.c                |    1 +
 include/linux/kvm.h               |    3 +++
 virt/kvm/eventfd.c                |    8 ++++++--
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 46f4b4d..17cd599 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1951,6 +1951,12 @@ KVM_IRQFD_FLAG_IRQ_SOURCE_ID which can be used to specify an IRQ
 source ID (see KVM_IRQ_SOURCE_ID) to be used for the guest interrupt.
 This flag has no effect on deassignment.
 
+When KVM_CAP_IRQFD_ASSERT_ONLY is available, KVM_IRQFD supports the
+KVM_IRQFD_FLAG_ASSERT_ONLY which specifies that an interrupt injected
+via the eventfd is only asserted.  The default behavior is to assert
+then deassert the specified gsi when the eventfd is triggered.  This
+flag has no effect on deassignment.
+
 4.76 KVM_PPC_ALLOCATE_HTAB
 
 Capability: KVM_CAP_PPC_ALLOC_HTAB
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 946c5bd..19680ed 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2175,6 +2175,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_PCI_2_3:
 	case KVM_CAP_KVMCLOCK_CTRL:
 	case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
+	case KVM_CAP_IRQFD_ASSERT_ONLY:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index deda8a9..19b1235 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -620,6 +620,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_ALLOC_HTAB 80
 #define KVM_CAP_NR_IRQ_SOURCE_ID 81
 #define KVM_CAP_IRQFD_IRQ_SOURCE_ID 82
+#define KVM_CAP_IRQFD_ASSERT_ONLY 83
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -687,6 +688,8 @@ struct kvm_xen_hvm_config {
 #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
 /* Available with KVM_CAP_IRQFD_IRQ_SOURCE_ID */
 #define KVM_IRQFD_FLAG_IRQ_SOURCE_ID (1 << 1)
+/* Available with KVM_CAP_IRQFD_ASSERT_ONLY */
+#define KVM_IRQFD_FLAG_ASSERT_ONLY (1 << 2)
 
 struct kvm_irqfd {
 	__u32 fd;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 30150f1..df41038 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -52,6 +52,7 @@ struct _irqfd {
 	/* Used for level IRQ fast-path */
 	int gsi;
 	int irq_source_id;
+	bool assert_only;
 	struct work_struct inject;
 	/* Used for setup/shutdown */
 	struct eventfd_ctx *eventfd;
@@ -69,7 +70,8 @@ irqfd_inject(struct work_struct *work)
 	struct kvm *kvm = irqfd->kvm;
 
 	kvm_set_irq(kvm, irqfd->irq_source_id, irqfd->gsi, 1);
-	kvm_set_irq(kvm, irqfd->irq_source_id, irqfd->gsi, 0);
+	if (!irqfd->assert_only)
+		kvm_set_irq(kvm, irqfd->irq_source_id, irqfd->gsi, 0);
 }
 
 /*
@@ -218,6 +220,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 		irqfd->irq_source_id = args->irq_source_id;
 	else
 		irqfd->irq_source_id = KVM_USERSPACE_IRQ_SOURCE_ID;
+	irqfd->assert_only = args->flags & KVM_IRQFD_FLAG_ASSERT_ONLY;
 	INIT_LIST_HEAD(&irqfd->list);
 	INIT_WORK(&irqfd->inject, irqfd_inject);
 	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
@@ -346,7 +349,8 @@ int
 kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
 {
 	if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN |
-			    KVM_IRQFD_FLAG_IRQ_SOURCE_ID))
+			    KVM_IRQFD_FLAG_IRQ_SOURCE_ID |
+			    KVM_IRQFD_FLAG_ASSERT_ONLY))
 		return -EINVAL;
 
 	if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v8 5/6] kvm: KVM_IRQ_ACKFD
  2012-08-10 22:37 [PATCH v8 0/6] kvm: level irqfd support Alex Williamson
                   ` (3 preceding siblings ...)
  2012-08-10 22:37 ` [PATCH v8 4/6] kvm: Add assert-only " Alex Williamson
@ 2012-08-10 22:37 ` Alex Williamson
  2012-08-15 14:05   ` Michael S. Tsirkin
  2012-08-10 22:37 ` [PATCH v8 6/6] kvm: Add de-assert option to KVM_IRQ_ACKFD Alex Williamson
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Alex Williamson @ 2012-08-10 22:37 UTC (permalink / raw)
  To: avi, mst; +Cc: gleb, kvm, linux-kernel

Enable a mechanism for IRQ ACKs to be exposed through an eventfd.  The
user can specify the GSI and optionally an IRQ source ID and have the
provided eventfd trigger whenever the irqchip resamples it's inputs,
for instance on EOI.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 Documentation/virtual/kvm/api.txt |   18 ++
 arch/x86/kvm/x86.c                |    2 
 include/linux/kvm.h               |   16 ++
 include/linux/kvm_host.h          |   13 ++
 virt/kvm/eventfd.c                |  285 +++++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c               |   10 +
 6 files changed, 344 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 17cd599..77b4837 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2011,6 +2011,24 @@ of IRQ source IDs.  These IDs are also shared with KVM internal users
 (ex. KVM assigned devices, PIT, shared user ID), therefore not all IDs
 may be allocated through this interface.
 
+4.78 KVM_IRQ_ACKFD
+
+Capability: KVM_CAP_IRQ_ACKFD
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_irq_ackfd (in)
+Returns: 0 on success, -errno on error
+
+Allows userspace notification of IRQ ACKs, or resampling of irqchip
+inputs, often associated with an EOI.  User provided kvm_irq_ackfd.fd
+and kvm_irq_ackfd.gsi are required and result in an eventfd trigger
+whenever the GSI is acknowledged.  When KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_FD
+is available, KVM_IRQ_ACKFD supports the KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID
+flag which indicates kvm_irqfd.irq_source_id is provided.  With this,
+the eventfd is only triggered when the specified IRQ source ID is
+pending.  On deassign, fd, gsi, and irq_source_id (if provided on assign)
+must be provided.
+
 
 5. The kvm_run structure
 ------------------------
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 19680ed..3d98e59 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2176,6 +2176,8 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_KVMCLOCK_CTRL:
 	case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
 	case KVM_CAP_IRQFD_ASSERT_ONLY:
+	case KVM_CAP_IRQ_ACKFD:
+	case KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 19b1235..0f53bd5 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -621,6 +621,8 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_NR_IRQ_SOURCE_ID 81
 #define KVM_CAP_IRQFD_IRQ_SOURCE_ID 82
 #define KVM_CAP_IRQFD_ASSERT_ONLY 83
+#define KVM_CAP_IRQ_ACKFD 84
+#define KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID 85
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -707,6 +709,18 @@ struct kvm_irq_source_id {
 	__u8 pad[24];
 };
 
+#define KVM_IRQ_ACKFD_FLAG_DEASSIGN (1 << 0)
+/* Available with KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID */
+#define KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID (1 << 1)
+
+struct kvm_irq_ackfd {
+	__u32 flags;
+	__u32 fd;
+	__u32 gsi;
+	__u32 irq_source_id;
+	__u8 pad[16];
+};
+
 struct kvm_clock_data {
 	__u64 clock;
 	__u32 flags;
@@ -849,6 +863,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_PPC_ALLOCATE_HTAB	  _IOWR(KVMIO, 0xa7, __u32)
 /* Available with KVM_CAP_IRQ_SOURCE_ID */
 #define KVM_IRQ_SOURCE_ID         _IOWR(KVMIO, 0xa8, struct kvm_irq_source_id)
+/* Available with KVM_CAP_IRQ_ACKFD */
+#define KVM_IRQ_ACKFD             _IOW(KVMIO,  0xa9, struct kvm_irq_ackfd)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ea6d7a1..cdc55c2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -285,6 +285,10 @@ struct kvm {
 		struct list_head  items;
 	} irqfds;
 	struct list_head ioeventfds;
+	struct {
+		spinlock_t        lock;
+		struct list_head  items;
+	} irq_ackfds;
 #endif
 	struct kvm_vm_stat stat;
 	struct kvm_arch arch;
@@ -831,6 +835,8 @@ int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
 void kvm_irqfd_release(struct kvm *kvm);
 void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
+int kvm_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args);
+void kvm_irq_ackfd_release(struct kvm *kvm);
 
 #else
 
@@ -843,6 +849,13 @@ static inline int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
 
 static inline void kvm_irqfd_release(struct kvm *kvm) {}
 
+static inline int kvm_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args)
+{
+	return -EINVAL;
+}
+
+static inline void kvm_irq_ackfd_release(struct kvm *kvm) {}
+
 #ifdef CONFIG_HAVE_KVM_IRQCHIP
 static inline void kvm_irq_routing_update(struct kvm *kvm,
 					  struct kvm_irq_routing_table *irq_rt)
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index df41038..ff5c784 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -300,6 +300,8 @@ kvm_eventfd_init(struct kvm *kvm)
 	spin_lock_init(&kvm->irqfds.lock);
 	INIT_LIST_HEAD(&kvm->irqfds.items);
 	INIT_LIST_HEAD(&kvm->ioeventfds);
+	spin_lock_init(&kvm->irq_ackfds.lock);
+	INIT_LIST_HEAD(&kvm->irq_ackfds.items);
 }
 
 /*
@@ -668,3 +670,286 @@ kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 
 	return kvm_assign_ioeventfd(kvm, args);
 }
+
+/*
+ * --------------------------------------------------------------------
+ * irq_ackfd: Expose IRQ ACKs to userspace via eventfd
+ *
+ * userspace can register to recieve eventfd notification for ACKs of
+ * interrupt lines.
+ * --------------------------------------------------------------------
+ */
+
+struct _irq_ackfd {
+	struct kvm *kvm;
+	struct eventfd_ctx *eventfd; /* signaled on ack */
+	struct kvm_irq_ack_notifier notifier;
+	/* Setup/shutdown */
+	wait_queue_t wait;
+	poll_table pt;
+	struct work_struct shutdown;
+	struct list_head list;
+};
+
+/* Called under irq_ackfds.lock */
+static void irq_ackfd_shutdown(struct work_struct *work)
+{
+	struct _irq_ackfd *ackfd = container_of(work, struct _irq_ackfd,
+						shutdown);
+	struct kvm *kvm = ackfd->kvm;
+	u64 cnt;
+
+	/*
+	 * Stop ACK signaling
+	 */
+	kvm_unregister_irq_ack_notifier(kvm, &ackfd->notifier);
+
+	/*
+	 * Synchronize with the wait-queue and unhook ourselves to prevent
+	 * further events.
+	 */
+	eventfd_ctx_remove_wait_queue(ackfd->eventfd, &ackfd->wait, &cnt);
+
+	/*
+	 * Release resources
+	 */
+	eventfd_ctx_put(ackfd->eventfd);
+	kfree(ackfd);
+}
+
+/* assumes kvm->irq_ackfds.lock is held */
+static bool irq_ackfd_is_active(struct _irq_ackfd *ackfd)
+{
+	return list_empty(&ackfd->list) ? false : true;
+}
+
+/*
+ * Mark the irq_ackfd as inactive and schedule it for removal
+ *
+ * assumes kvm->irq_ackfds.lock is held
+ */
+static void irq_ackfd_deactivate(struct _irq_ackfd *ackfd)
+{
+	BUG_ON(!irq_ackfd_is_active(ackfd));
+
+	list_del_init(&ackfd->list);
+
+	/* Borrow irqfd's workqueue, we don't need our own. */
+	queue_work(irqfd_cleanup_wq, &ackfd->shutdown);
+}
+
+/*
+ * Called with wqh->lock held and interrupts disabled
+ */
+static int irq_ackfd_wakeup(wait_queue_t *wait, unsigned mode,
+			    int sync, void *key)
+{
+	unsigned long flags = (unsigned long)key;
+
+	if (unlikely(flags & POLLHUP)) {
+		/* The eventfd is closing, detach from KVM */
+		struct _irq_ackfd *ackfd;
+		unsigned long flags;
+
+		ackfd = container_of(wait, struct _irq_ackfd, wait);
+
+		spin_lock_irqsave(&ackfd->kvm->irq_ackfds.lock, flags);
+
+		/*
+		 * We must check if someone deactivated the irq_ackfd before
+		 * we could acquire the irq_ackfds.lock since the item is
+		 * deactivated from the KVM side before it is unhooked from
+		 * the wait-queue.  If it is already deactivated, we can
+		 * simply return knowing the other side will cleanup for us.
+		 * We cannot race against the irq_ackfd going away since the
+		 * other side is required to acquire wqh->lock, which we hold
+		 */
+		if (irq_ackfd_is_active(ackfd))
+				irq_ackfd_deactivate(ackfd);
+
+		spin_unlock_irqrestore(&ackfd->kvm->irq_ackfds.lock, flags);
+	}
+
+	return 0;
+}
+
+static void irq_ackfd_ptable_queue_proc(struct file *file,
+					wait_queue_head_t *wqh,
+					poll_table *pt)
+{
+	struct _irq_ackfd *ackfd = container_of(pt, struct _irq_ackfd, pt);
+	add_wait_queue(wqh, &ackfd->wait);
+}
+
+/*
+ * This function is called as the kvm VM fd is being released. Shutdown all
+ * irq_ackfds that still remain open
+ */
+void kvm_irq_ackfd_release(struct kvm *kvm)
+{
+	struct _irq_ackfd *tmp, *ackfd;
+
+	spin_lock_irq(&kvm->irq_ackfds.lock);
+
+	list_for_each_entry_safe(ackfd, tmp, &kvm->irq_ackfds.items, list)
+		irq_ackfd_deactivate(ackfd);
+
+	spin_unlock_irq(&kvm->irq_ackfds.lock);
+
+	flush_workqueue(irqfd_cleanup_wq);
+}
+
+static void irq_ackfd_acked(struct kvm_irq_ack_notifier *notifier)
+{
+	struct _irq_ackfd *ackfd;
+
+	ackfd = container_of(notifier, struct _irq_ackfd, notifier);
+
+	eventfd_signal(ackfd->eventfd, 1);
+}
+
+static int kvm_assign_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args)
+{
+	struct file *file = NULL;
+	struct eventfd_ctx *eventfd = NULL;
+	struct _irq_ackfd *ackfd = NULL, *tmp;
+	int ret;
+	u64 cnt;
+
+	file = eventfd_fget(args->fd);
+	if (IS_ERR(file)) {
+		ret = PTR_ERR(file);
+		goto fail;
+	}
+
+	eventfd = eventfd_ctx_fileget(file);
+	if (IS_ERR(eventfd)) {
+		ret = PTR_ERR(eventfd);
+		goto fail;
+	}
+
+	ackfd = kzalloc(sizeof(*ackfd), GFP_KERNEL);
+	if (!ackfd) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	INIT_LIST_HEAD(&ackfd->list);
+	INIT_WORK(&ackfd->shutdown, irq_ackfd_shutdown);
+	ackfd->kvm = kvm;
+	ackfd->eventfd = eventfd;
+	ackfd->notifier.gsi = args->gsi;
+	if (args->flags & KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID)
+		ackfd->notifier.irq_source_id = args->irq_source_id;
+	else
+		ackfd->notifier.irq_source_id = -1;
+	ackfd->notifier.irq_acked = irq_ackfd_acked;
+
+	/*
+	 * Install our own custom wake-up handling so we are notified via
+	 * a callback whenever someone releases the underlying eventfd
+	 */
+	init_waitqueue_func_entry(&ackfd->wait, irq_ackfd_wakeup);
+	init_poll_funcptr(&ackfd->pt, irq_ackfd_ptable_queue_proc);
+
+	/*
+	 * Install the wait queue function to allow cleanup when the
+	 * eventfd is closed by the user.  This grabs the wqh lock, so
+	 * we do it out of spinlock, holding the file reference ensures
+	 * we won't see a POLLHUP until setup is complete.
+	 */
+	file->f_op->poll(file, &ackfd->pt);
+
+	spin_lock_irq(&kvm->irq_ackfds.lock);
+
+	/* Check for duplicates.  */
+	list_for_each_entry(tmp, &kvm->irq_ackfds.items, list) {
+		if (ackfd->eventfd == tmp->eventfd &&
+		    ackfd->notifier.gsi == tmp->notifier.gsi &&
+		    ackfd->notifier.irq_source_id ==
+		    tmp->notifier.irq_source_id) {
+			spin_unlock_irq(&kvm->irq_ackfds.lock);
+			ret = -EBUSY;
+			goto fail_unqueue;
+		}
+	}
+
+	list_add_tail(&ackfd->list, &kvm->irq_ackfds.items);
+
+	spin_unlock_irq(&kvm->irq_ackfds.lock);
+
+	/* This can sleep, so register after spinlock.  */
+	kvm_register_irq_ack_notifier(kvm, &ackfd->notifier);
+
+	fput(file); /* Enable POLLHUP */
+
+	return 0;
+
+fail_unqueue:
+	eventfd_ctx_remove_wait_queue(eventfd, &ackfd->wait, &cnt);
+fail:
+	if (eventfd && !IS_ERR(eventfd))
+		eventfd_ctx_put(eventfd);
+
+	if (file && !IS_ERR(file))
+		fput(file);
+
+	kfree(ackfd);
+
+	return ret;
+}
+
+static int kvm_deassign_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args)
+{
+	struct eventfd_ctx *eventfd = NULL;
+	struct _irq_ackfd *ackfd;
+	int irq_source_id = -1, ret = -ENOENT;
+
+	eventfd = eventfd_ctx_fdget(args->fd);
+	if (IS_ERR(eventfd)) {
+		ret = PTR_ERR(eventfd);
+		goto fail;
+	}
+
+	if (args->flags & KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID)
+		irq_source_id = args->irq_source_id;
+
+	spin_lock_irq(&kvm->irq_ackfds.lock);
+
+	list_for_each_entry(ackfd, &kvm->irq_ackfds.items, list) {
+		if (ackfd->eventfd == eventfd &&
+		    ackfd->notifier.gsi == args->gsi &&
+		    ackfd->notifier.irq_source_id == irq_source_id) {
+			irq_ackfd_deactivate(ackfd);
+			ret = 0;
+			break;
+		}
+	}
+
+	spin_unlock_irq(&kvm->irq_ackfds.lock);
+
+fail:
+	if (eventfd && !IS_ERR(eventfd))
+		eventfd_ctx_put(eventfd);
+
+	/*
+	 * Block until we know all outstanding shutdown jobs have completed
+	 * so that we guarantee there will not be any more ACKs signaled on
+	 * this eventfd once this deassign function returns.
+	 */
+	flush_workqueue(irqfd_cleanup_wq);
+
+	return ret;
+}
+
+int kvm_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args)
+{
+	if (args->flags & ~(KVM_IRQ_ACKFD_FLAG_DEASSIGN |
+			    KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID))
+		return -EINVAL;
+
+	if (args->flags & KVM_IRQ_ACKFD_FLAG_DEASSIGN)
+		return kvm_deassign_irq_ackfd(kvm, args);
+
+	return kvm_assign_irq_ackfd(kvm, args);
+}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e120cb3..bbe3a20 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -619,6 +619,7 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
 	struct kvm *kvm = filp->private_data;
 
 	kvm_irqfd_release(kvm);
+	kvm_irq_ackfd_release(kvm);
 
 	kvm_put_kvm(kvm);
 	return 0;
@@ -2109,6 +2110,15 @@ static long kvm_vm_ioctl(struct file *filp,
 		break;
 	}
 #endif
+	case KVM_IRQ_ACKFD: {
+		struct kvm_irq_ackfd data;
+
+		r = -EFAULT;
+		if (copy_from_user(&data, argp, sizeof(data)))
+			goto out;
+		r = kvm_irq_ackfd(kvm, &data);
+		break;
+	}
 	default:
 		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
 		if (r == -ENOTTY)


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v8 6/6] kvm: Add de-assert option to KVM_IRQ_ACKFD
  2012-08-10 22:37 [PATCH v8 0/6] kvm: level irqfd support Alex Williamson
                   ` (4 preceding siblings ...)
  2012-08-10 22:37 ` [PATCH v8 5/6] kvm: KVM_IRQ_ACKFD Alex Williamson
@ 2012-08-10 22:37 ` Alex Williamson
  2012-08-15 14:11   ` Michael S. Tsirkin
  2012-08-15 14:28 ` [PATCH v8 0/6] kvm: level irqfd support Michael S. Tsirkin
  2012-08-16 16:32 ` Avi Kivity
  7 siblings, 1 reply; 33+ messages in thread
From: Alex Williamson @ 2012-08-10 22:37 UTC (permalink / raw)
  To: avi, mst; +Cc: gleb, kvm, linux-kernel

It's likely (vfio) that one of the reasons to watch for an IRQ ACK
is to de-assert and re-enable an interrupt.  As the IRQ ACK notfier
is already watching a GSI for an IRQ source ID we can easily couple
these together.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 Documentation/virtual/kvm/api.txt |    4 ++++
 arch/x86/kvm/x86.c                |    1 +
 include/linux/kvm.h               |    3 +++
 virt/kvm/eventfd.c                |   14 +++++++++++++-
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 77b4837..128d4c3 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2029,6 +2029,10 @@ the eventfd is only triggered when the specified IRQ source ID is
 pending.  On deassign, fd, gsi, and irq_source_id (if provided on assign)
 must be provided.
 
+When KVM_CAP_IRQ_ACKFD_DEASSERT is available the flag
+KVM_IRQ_ACKFD_FLAG_IRQ_DEASSERT may be used on assignment to specify
+that the GSI should be de-asserted prior to eventfd notification.
+This flag requires an IRQ source ID to be provided as described above.
 
 5. The kvm_run structure
 ------------------------
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3d98e59..691b00d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2178,6 +2178,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_IRQFD_ASSERT_ONLY:
 	case KVM_CAP_IRQ_ACKFD:
 	case KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID:
+	case KVM_CAP_IRQ_ACKFD_DEASSERT:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 0f53bd5..331631e 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -623,6 +623,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_IRQFD_ASSERT_ONLY 83
 #define KVM_CAP_IRQ_ACKFD 84
 #define KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID 85
+#define KVM_CAP_IRQ_ACKFD_DEASSERT 86
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -712,6 +713,8 @@ struct kvm_irq_source_id {
 #define KVM_IRQ_ACKFD_FLAG_DEASSIGN (1 << 0)
 /* Available with KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID */
 #define KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID (1 << 1)
+/* Available with KVM_CAP_IRQ_ACKFD_DEASSERT */
+#define KVM_IRQ_ACKFD_FLAG_DEASSERT (1 << 2)
 
 struct kvm_irq_ackfd {
 	__u32 flags;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index ff5c784..ffc6a13 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -682,6 +682,7 @@ kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 
 struct _irq_ackfd {
 	struct kvm *kvm;
+	bool deassert; /* de-assert on ack? */
 	struct eventfd_ctx *eventfd; /* signaled on ack */
 	struct kvm_irq_ack_notifier notifier;
 	/* Setup/shutdown */
@@ -805,6 +806,10 @@ static void irq_ackfd_acked(struct kvm_irq_ack_notifier *notifier)
 
 	ackfd = container_of(notifier, struct _irq_ackfd, notifier);
 
+	if (ackfd->deassert)
+		kvm_set_irq(ackfd->kvm, ackfd->notifier.irq_source_id,
+			    ackfd->notifier.gsi, 0);
+
 	eventfd_signal(ackfd->eventfd, 1);
 }
 
@@ -845,6 +850,12 @@ static int kvm_assign_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args)
 		ackfd->notifier.irq_source_id = -1;
 	ackfd->notifier.irq_acked = irq_ackfd_acked;
 
+	ackfd->deassert = args->flags & KVM_IRQ_ACKFD_FLAG_DEASSERT;
+	if (ackfd->deassert && ackfd->notifier.irq_source_id < 0) {
+		ret = -EINVAL;
+		goto fail;
+	}
+
 	/*
 	 * Install our own custom wake-up handling so we are notified via
 	 * a callback whenever someone releases the underlying eventfd
@@ -945,7 +956,8 @@ fail:
 int kvm_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args)
 {
 	if (args->flags & ~(KVM_IRQ_ACKFD_FLAG_DEASSIGN |
-			    KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID))
+			    KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID |
+			    KVM_IRQ_ACKFD_FLAG_DEASSERT))
 		return -EINVAL;
 
 	if (args->flags & KVM_IRQ_ACKFD_FLAG_DEASSIGN)


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 1/6] kvm: Allow filtering of acked irqs
  2012-08-10 22:37 ` [PATCH v8 1/6] kvm: Allow filtering of acked irqs Alex Williamson
@ 2012-08-15 12:27   ` Michael S. Tsirkin
  2012-08-15 16:47     ` Alex Williamson
  0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2012-08-15 12:27 UTC (permalink / raw)
  To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel

On Fri, Aug 10, 2012 at 04:37:17PM -0600, Alex Williamson wrote:
> Registering an kvm_irq_ack_notifier with kian.irq_source_id < 0
> retains existing behavior, filling in the actual irq_source_id results
> in the callback only being called when the specified irq_source_id is
> asserting the given gsi.
> 
> The i8254 PIT remains unfiltered because it de-asserts it's irq source
> id, so it's notifier would never get called otherwise.  KVM device
> assignment gets filtering as it de-asserts the GSI in it's notifier.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Looks good to me. For the record, I expect this to help if
- an assigned device interrupt is shared in host
  so we use slow config cycles in the ack notifier
- said device is sharing interrupt with another device in guest
- said another device is actually driving most interrupts
For example, I think this could be tested
by booting guest with pci=nomsi.

A minor suggestions below but
nothing that needs to block this patch.

> ---
> 
>  arch/x86/kvm/i8254.c     |    1 +
>  arch/x86/kvm/i8259.c     |    8 +++++++-
>  include/linux/kvm_host.h |    4 +++-
>  virt/kvm/assigned-dev.c  |    1 +
>  virt/kvm/ioapic.c        |    5 ++++-
>  virt/kvm/irq_comm.c      |    6 ++++--
>  6 files changed, 20 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index adba28f..2355d19 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -709,6 +709,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
>  	hrtimer_init(&pit_state->pit_timer.timer,
>  		     CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
>  	pit_state->irq_ack_notifier.gsi = 0;
> +	pit_state->irq_ack_notifier.irq_source_id = -1; /* No filter */

A bit prettier would be to
#define KVM_NO_IRQ_SOURCE_ID     (-1)
and test for it explicitly.

>  	pit_state->irq_ack_notifier.irq_acked = kvm_pit_ack_irq;
>  	kvm_register_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier);
>  	pit_state->pit_timer.reinject = true;
> diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
> index e498b18..d2175a9 100644
> --- a/arch/x86/kvm/i8259.c
> +++ b/arch/x86/kvm/i8259.c
> @@ -74,9 +74,14 @@ static void pic_unlock(struct kvm_pic *s)
>  
>  static void pic_clear_isr(struct kvm_kpic_state *s, int irq)
>  {
> +	unsigned long irq_source_ids;
> +
>  	s->isr &= ~(1 << irq);
>  	if (s != &s->pics_state->pics[0])
>  		irq += 8;
> +
> +	irq_source_ids = s->pics_state->irq_states[irq];
> +
>  	/*
>  	 * We are dropping lock while calling ack notifiers since ack
>  	 * notifier callbacks for assigned devices call into PIC recursively.
> @@ -84,7 +89,8 @@ static void pic_clear_isr(struct kvm_kpic_state *s, int irq)
>  	 * it should be safe since PIC state is already updated at this stage.
>  	 */
>  	pic_unlock(s->pics_state);
> -	kvm_notify_acked_irq(s->pics_state->kvm, SELECT_PIC(irq), irq);
> +	kvm_notify_acked_irq(s->pics_state->kvm, SELECT_PIC(irq), irq,
> +			     irq_source_ids);
>  	pic_lock(s->pics_state);
>  }
>  
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index b70b48b..2ad3e4a 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -577,6 +577,7 @@ int kvm_is_mmio_pfn(pfn_t pfn);
>  
>  struct kvm_irq_ack_notifier {
>  	struct hlist_node link;
> +	int irq_source_id;
>  	unsigned gsi;
>  	void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
>  };
> @@ -627,7 +628,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
>  int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
>  int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
>  		int irq_source_id, int level);
> -void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
> +void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin,
> +			  unsigned long irq_source_ids);
>  void kvm_register_irq_ack_notifier(struct kvm *kvm,
>  				   struct kvm_irq_ack_notifier *kian);
>  void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
> diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> index 23a41a9..a08c9c1 100644
> --- a/virt/kvm/assigned-dev.c
> +++ b/virt/kvm/assigned-dev.c
> @@ -407,6 +407,7 @@ static int assigned_device_enable_guest_intx(struct kvm *kvm,
>  {
>  	dev->guest_irq = irq->guest_irq;
>  	dev->ack_notifier.gsi = irq->guest_irq;
> +	dev->ack_notifier.irq_source_id = dev->irq_source_id;
>  	return 0;
>  }
>  
> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
> index ef61d52..1a9f445 100644
> --- a/virt/kvm/ioapic.c
> +++ b/virt/kvm/ioapic.c
> @@ -241,10 +241,12 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>  
>  	for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>  		union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
> +		unsigned long irq_source_ids;
>  
>  		if (ent->fields.vector != vector)
>  			continue;
>  
> +		irq_source_ids = ioapic->irq_states[i];
>  		/*
>  		 * We are dropping lock while calling ack notifiers because ack
>  		 * notifier callbacks for assigned devices call into IOAPIC
> @@ -254,7 +256,8 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>  		 * after ack notifier returns.
>  		 */
>  		spin_unlock(&ioapic->lock);
> -		kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
> +		kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i,
> +				     irq_source_ids);
>  		spin_lock(&ioapic->lock);
>  
>  		if (trigger_mode != IOAPIC_LEVEL_TRIG)
> diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> index 83402d7..7d75126 100644
> --- a/virt/kvm/irq_comm.c
> +++ b/virt/kvm/irq_comm.c
> @@ -173,7 +173,8 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
>  	return ret;
>  }
>  
> -void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
> +void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin,
> +			  unsigned long irq_source_ids)
>  {
>  	struct kvm_irq_ack_notifier *kian;
>  	struct hlist_node *n;
> @@ -186,7 +187,8 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
>  	if (gsi != -1)
>  		hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list,
>  					 link)
> -			if (kian->gsi == gsi)
> +			if (kian->gsi == gsi && (kian->irq_source_id < 0 ||
> +			    test_bit(kian->irq_source_id, &irq_source_ids)))
>  				kian->irq_acked(kian);
>  	rcu_read_unlock();
>  }

-- 
MST

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 2/6] kvm: Expose IRQ source IDs to userspace
  2012-08-10 22:37 ` [PATCH v8 2/6] kvm: Expose IRQ source IDs to userspace Alex Williamson
@ 2012-08-15 12:59   ` Michael S. Tsirkin
  2012-08-15 17:05     ` Alex Williamson
  0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2012-08-15 12:59 UTC (permalink / raw)
  To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel

On Fri, Aug 10, 2012 at 04:37:25PM -0600, Alex Williamson wrote:
> Introduce KVM_IRQ_SOURCE_ID and KVM_CAP_NR_IRQ_SOURCE_ID to allow
> user allocation of IRQ source IDs and querying both the capability
> and the total count of IRQ source IDs.  These will later be used
> by interfaces for setting up level IRQs.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
>  Documentation/virtual/kvm/api.txt |   20 ++++++++++++++++++++
>  arch/x86/kvm/Kconfig              |    1 +
>  arch/x86/kvm/x86.c                |    3 +++
>  include/linux/kvm.h               |   11 +++++++++++
>  include/linux/kvm_host.h          |    1 +
>  virt/kvm/Kconfig                  |    3 +++
>  virt/kvm/irq_comm.c               |   22 ++++++++++++++++++++++
>  virt/kvm/kvm_main.c               |   16 ++++++++++++++++
>  8 files changed, 77 insertions(+)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index bf33aaa..062cfd5 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1980,6 +1980,26 @@ return the hash table order in the parameter.  (If the guest is using
>  the virtualized real-mode area (VRMA) facility, the kernel will
>  re-create the VMRA HPTEs on the next KVM_RUN of any vcpu.)
>  
> +4.77 KVM_IRQ_SOURCE_ID
> +
> +Capability: KVM_CAP_NR_IRQ_SOURCE_ID
> +Architectures: x86
> +Type: vm ioctl
> +Parameters: struct kvm_irq_source_id (in/out)
> +Returns: 0 on success, -errno on error
> +
> +Allows allocating and freeing IRQ source IDs.  Each IRQ source ID
> +represents a complete set of irqchip pin inputs which are logically
> +OR'd with other IRQ source IDs for determining the final assertion
> +level of a pin.  The flag KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN indicates
> +whether the call is for an allocation or deallocation.
> +kvm_irq_source_id.irq_source_id returns the allocated IRQ source ID
> +on success and specifies the freed IRQ source ID on deassign.  The
> +return value of KVM_CAP_NR_IRQ_SOURCE_ID indicates the total number
> +of IRQ source IDs.  These IDs are also shared with KVM internal users
> +(ex. KVM assigned devices, PIT, shared user ID), therefore not all IDs
> +may be allocated through this interface.
> +
>  5. The kvm_run structure
>  ------------------------
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index a28f338..bfd2082 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -37,6 +37,7 @@ config KVM
>  	select TASK_DELAY_ACCT
>  	select PERF_EVENTS
>  	select HAVE_KVM_MSI
> +	select HAVE_KVM_IRQ_SOURCE_ID
>  	---help---
>  	  Support hosting fully virtualized guest machines using hardware
>  	  virtualization extensions.  You will need a fairly recent
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 42bce48..75e743e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2209,6 +2209,9 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	case KVM_CAP_TSC_DEADLINE_TIMER:
>  		r = boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER);
>  		break;
> +	case KVM_CAP_NR_IRQ_SOURCE_ID:
> +		r = BITS_PER_LONG; /* kvm->arch.irq_sources_bitmap */
> +		break;
>  	default:
>  		r = 0;
>  		break;
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 2ce09aa..67b6b49 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_PPC_GET_SMMU_INFO 78
>  #define KVM_CAP_S390_COW 79
>  #define KVM_CAP_PPC_ALLOC_HTAB 80
> +#define KVM_CAP_NR_IRQ_SOURCE_ID 81
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> @@ -691,6 +692,14 @@ struct kvm_irqfd {
>  	__u8  pad[20];
>  };
>  
> +#define KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN (1 << 0)
> +
> +struct kvm_irq_source_id {
> +	__u32 flags;
> +	__u32 irq_source_id;
> +	__u8 pad[24];
> +};
> +
>  struct kvm_clock_data {
>  	__u64 clock;
>  	__u32 flags;
> @@ -831,6 +840,8 @@ struct kvm_s390_ucas_mapping {
>  #define KVM_PPC_GET_SMMU_INFO	  _IOR(KVMIO,  0xa6, struct kvm_ppc_smmu_info)
>  /* Available with KVM_CAP_PPC_ALLOC_HTAB */
>  #define KVM_PPC_ALLOCATE_HTAB	  _IOWR(KVMIO, 0xa7, __u32)
> +/* Available with KVM_CAP_IRQ_SOURCE_ID */
> +#define KVM_IRQ_SOURCE_ID         _IOWR(KVMIO, 0xa8, struct kvm_irq_source_id)
>  
>  /*
>   * ioctls for vcpu fds
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 2ad3e4a..ea6d7a1 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -636,6 +636,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
>  				   struct kvm_irq_ack_notifier *kian);
>  int kvm_request_irq_source_id(struct kvm *kvm);
>  void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
> +int kvm_irq_source_id(struct kvm *kvm, struct kvm_irq_source_id *id);
>  
>  /* For vcpu->arch.iommu_flags */
>  #define KVM_IOMMU_CACHE_COHERENCY	0x1
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 28694f4..b7e0d4d 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -21,3 +21,6 @@ config KVM_ASYNC_PF
>  
>  config HAVE_KVM_MSI
>         bool
> +
> +config HAVE_KVM_IRQ_SOURCE_ID
> +       bool
> diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> index 7d75126..44d40c9 100644
> --- a/virt/kvm/irq_comm.c
> +++ b/virt/kvm/irq_comm.c
> @@ -254,6 +254,28 @@ unlock:
>  	mutex_unlock(&kvm->irq_lock);
>  }
>  
> +int kvm_irq_source_id(struct kvm *kvm, struct kvm_irq_source_id *id)
> +{
> +	int irq_source_id;
> +
> +	if (id->flags & ~KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN)
> +		return -EINVAL;
> +
> +	if (id->flags & KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN) {
> +		if (id->irq_source_id == KVM_USERSPACE_IRQ_SOURCE_ID)
> +			return -EINVAL;

This validation here is probably a work around
for assert in kvm_free_irq_source_id?
It would be good to add a comment explaining this,
or just drop assert in kvm_free_irq_source_id.

Also, kvm_free_irq_source_id prints error messages
if you give it an invalid value, so
this let userspace flood kernel log with error messages.

> +		kvm_free_irq_source_id(kvm, id->irq_source_id);
> +		return 0;
> +	}
> +

Hmm. This will let buggy userspace deassign IDs such as PIT that it
never assigned. I think it will only break itself so it should be
harmless, but it would seem to work with subtle
interrupt-losing races, so maybe it would be nicer if we could
validate the ID was actually given to userspace at some point.

> +	irq_source_id = kvm_request_irq_source_id(kvm);

kvm_request_irq_source_id was not intended to be called
by userspace so it behaves a bit strangely:
- If you call this too many times you let userspace flood
  kernel log with warnings
- return value is -EFAULT which is ignored by another caller
  but passed to userspace here, arguably wrong

> +	if (irq_source_id < 0)
> +		return irq_source_id;
> +
> +	id->irq_source_id = irq_source_id;
> +	return 0;
> +}
> +

There is a minor bug here: if you call this too many times then following
PIT and assigned device allocation will fail.

This is in contrast to documentation which says that
getting source id will fail in this case,
and generally gives userspace no way to detect
what went wrong and recover: it only sees ENOMEM.

One solution I see is split the source ID space, give half to userspace
and half to internal users.

>  void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq,
>  				    struct kvm_irq_mask_notifier *kimn)
>  {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 2468523..e120cb3 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2093,6 +2093,22 @@ static long kvm_vm_ioctl(struct file *filp,
>  		break;
>  	}
>  #endif
> +#ifdef CONFIG_HAVE_KVM_IRQ_SOURCE_ID
> +	case KVM_IRQ_SOURCE_ID: {
> +		struct kvm_irq_source_id id;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&id, argp, sizeof(id)))
> +			goto out;
> +
> +		r = kvm_irq_source_id(kvm, &id);
> +		if (!r && copy_to_user(argp, &id, sizeof(id))) {
> +			r = -EFAULT;
> +			goto out;
> +		}
> +		break;
> +	}
> +#endif
>  	default:
>  		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
>  		if (r == -ENOTTY)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 3/6] kvm: Add IRQ source ID option to KVM_IRQFD
  2012-08-10 22:37 ` [PATCH v8 3/6] kvm: Add IRQ source ID option to KVM_IRQFD Alex Williamson
@ 2012-08-15 13:49   ` Michael S. Tsirkin
  2012-08-15 17:08     ` Alex Williamson
  0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2012-08-15 13:49 UTC (permalink / raw)
  To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel

On Fri, Aug 10, 2012 at 04:37:33PM -0600, Alex Williamson wrote:
> This allows specifying an IRQ source ID to be used when injecting an
> interrupt.  When not specified KVM_USERSPACE_IRQ_SOURCE_ID is used.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
>  Documentation/virtual/kvm/api.txt |    5 +++++
>  arch/x86/kvm/x86.c                |    1 +
>  include/linux/kvm.h               |    6 +++++-
>  virt/kvm/eventfd.c                |   14 ++++++++++----
>  4 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 062cfd5..46f4b4d 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1946,6 +1946,11 @@ the guest using the specified gsi pin.  The irqfd is removed using
>  the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
>  and kvm_irqfd.gsi.
>  
> +When KVM_CAP_IRQFD_IRQ_SOURCE_ID is available, KVM_IRQFD supports the
> +KVM_IRQFD_FLAG_IRQ_SOURCE_ID which can be used to specify an IRQ
> +source ID (see KVM_IRQ_SOURCE_ID) to be used for the guest interrupt.
> +This flag has no effect on deassignment.
> +
>  4.76 KVM_PPC_ALLOCATE_HTAB
>  
>  Capability: KVM_CAP_PPC_ALLOC_HTAB
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 75e743e..946c5bd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2174,6 +2174,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	case KVM_CAP_GET_TSC_KHZ:
>  	case KVM_CAP_PCI_2_3:
>  	case KVM_CAP_KVMCLOCK_CTRL:
> +	case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
>  		r = 1;
>  		break;
>  	case KVM_CAP_COALESCED_MMIO:
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 67b6b49..deda8a9 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -619,6 +619,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_S390_COW 79
>  #define KVM_CAP_PPC_ALLOC_HTAB 80
>  #define KVM_CAP_NR_IRQ_SOURCE_ID 81
> +#define KVM_CAP_IRQFD_IRQ_SOURCE_ID 82
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> @@ -684,12 +685,15 @@ struct kvm_xen_hvm_config {
>  #endif
>  
>  #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
> +/* Available with KVM_CAP_IRQFD_IRQ_SOURCE_ID */
> +#define KVM_IRQFD_FLAG_IRQ_SOURCE_ID (1 << 1)
>  
>  struct kvm_irqfd {
>  	__u32 fd;
>  	__u32 gsi;
>  	__u32 flags;
> -	__u8  pad[20];
> +	__u32 irq_source_id;
> +	__u8  pad[16];
>  };
>  
>  #define KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN (1 << 0)
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 7d7e2aa..30150f1 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -51,6 +51,7 @@ struct _irqfd {
>  	struct kvm_kernel_irq_routing_entry __rcu *irq_entry;
>  	/* Used for level IRQ fast-path */
>  	int gsi;
> +	int irq_source_id;
>  	struct work_struct inject;
>  	/* Used for setup/shutdown */
>  	struct eventfd_ctx *eventfd;
> @@ -67,8 +68,8 @@ irqfd_inject(struct work_struct *work)
>  	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
>  	struct kvm *kvm = irqfd->kvm;
>  
> -	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
> -	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> +	kvm_set_irq(kvm, irqfd->irq_source_id, irqfd->gsi, 1);
> +	kvm_set_irq(kvm, irqfd->irq_source_id, irqfd->gsi, 0);

Looks like this can lead to kernel data corruption if irq_source_id
specified it out of range?

>  }
>  
>  /*
> @@ -138,7 +139,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
>  		irq = rcu_dereference(irqfd->irq_entry);
>  		/* An event has been signaled, inject an interrupt */
>  		if (irq)
> -			kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
> +			kvm_set_msi(irq, kvm, irqfd->irq_source_id, 1);
>  		else
>  			schedule_work(&irqfd->inject);
>  		rcu_read_unlock();
> @@ -213,6 +214,10 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
>  
>  	irqfd->kvm = kvm;
>  	irqfd->gsi = args->gsi;
> +	if (args->flags & KVM_IRQFD_FLAG_IRQ_SOURCE_ID)
> +		irqfd->irq_source_id = args->irq_source_id;
> +	else
> +		irqfd->irq_source_id = KVM_USERSPACE_IRQ_SOURCE_ID;

As in the previous patch, there is no validation
that source id was previously allocated to userspace, so this makes life
harder for userspace: if it deassigns a used source ID the result is not
a clean failure but hard to find bugs.

>  	INIT_LIST_HEAD(&irqfd->list);
>  	INIT_WORK(&irqfd->inject, irqfd_inject);
>  	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
> @@ -340,7 +345,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
>  int
>  kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
>  {
> -	if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN)
> +	if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN |
> +			    KVM_IRQFD_FLAG_IRQ_SOURCE_ID))
>  		return -EINVAL;
>  
>  	if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 5/6] kvm: KVM_IRQ_ACKFD
  2012-08-10 22:37 ` [PATCH v8 5/6] kvm: KVM_IRQ_ACKFD Alex Williamson
@ 2012-08-15 14:05   ` Michael S. Tsirkin
  2012-08-15 17:17     ` Alex Williamson
  0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2012-08-15 14:05 UTC (permalink / raw)
  To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel

On Fri, Aug 10, 2012 at 04:37:48PM -0600, Alex Williamson wrote:
> Enable a mechanism for IRQ ACKs to be exposed through an eventfd.  The
> user can specify the GSI and optionally an IRQ source ID and have the
> provided eventfd trigger whenever the irqchip resamples it's inputs,
> for instance on EOI.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
>  Documentation/virtual/kvm/api.txt |   18 ++
>  arch/x86/kvm/x86.c                |    2 
>  include/linux/kvm.h               |   16 ++
>  include/linux/kvm_host.h          |   13 ++
>  virt/kvm/eventfd.c                |  285 +++++++++++++++++++++++++++++++++++++
>  virt/kvm/kvm_main.c               |   10 +
>  6 files changed, 344 insertions(+)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 17cd599..77b4837 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2011,6 +2011,24 @@ of IRQ source IDs.  These IDs are also shared with KVM internal users
>  (ex. KVM assigned devices, PIT, shared user ID), therefore not all IDs
>  may be allocated through this interface.
>  
> +4.78 KVM_IRQ_ACKFD
> +
> +Capability: KVM_CAP_IRQ_ACKFD
> +Architectures: x86
> +Type: vm ioctl
> +Parameters: struct kvm_irq_ackfd (in)
> +Returns: 0 on success, -errno on error
> +
> +Allows userspace notification of IRQ ACKs, or resampling of irqchip
> +inputs, often associated with an EOI.  User provided kvm_irq_ackfd.fd
> +and kvm_irq_ackfd.gsi are required and result in an eventfd trigger
> +whenever the GSI is acknowledged.  When KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_FD
> +is available, KVM_IRQ_ACKFD supports the KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID
> +flag which indicates kvm_irqfd.irq_source_id is provided.  With this,
> +the eventfd is only triggered when the specified IRQ source ID is
> +pending.  On deassign, fd, gsi, and irq_source_id (if provided on assign)
> +must be provided.
> +
>

This seems to imply that ACKFD can be used with edge
GSIs, but this does not actually work: with source ID
because of the filtering, and without because of PV EOI.
It seems that what we are really tracking is
level change 1->0. For edge no level -> no notification?

Or does 'resampling of irqchip inputs' imply level somehow?

>  5. The kvm_run structure
>  ------------------------
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 19680ed..3d98e59 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2176,6 +2176,8 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	case KVM_CAP_KVMCLOCK_CTRL:
>  	case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
>  	case KVM_CAP_IRQFD_ASSERT_ONLY:
> +	case KVM_CAP_IRQ_ACKFD:
> +	case KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID:
>  		r = 1;
>  		break;
>  	case KVM_CAP_COALESCED_MMIO:
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 19b1235..0f53bd5 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -621,6 +621,8 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_NR_IRQ_SOURCE_ID 81
>  #define KVM_CAP_IRQFD_IRQ_SOURCE_ID 82
>  #define KVM_CAP_IRQFD_ASSERT_ONLY 83
> +#define KVM_CAP_IRQ_ACKFD 84
> +#define KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID 85
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> @@ -707,6 +709,18 @@ struct kvm_irq_source_id {
>  	__u8 pad[24];
>  };
>  
> +#define KVM_IRQ_ACKFD_FLAG_DEASSIGN (1 << 0)
> +/* Available with KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID */
> +#define KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID (1 << 1)
> +
> +struct kvm_irq_ackfd {
> +	__u32 flags;
> +	__u32 fd;
> +	__u32 gsi;
> +	__u32 irq_source_id;
> +	__u8 pad[16];
> +};
> +
>  struct kvm_clock_data {
>  	__u64 clock;
>  	__u32 flags;
> @@ -849,6 +863,8 @@ struct kvm_s390_ucas_mapping {
>  #define KVM_PPC_ALLOCATE_HTAB	  _IOWR(KVMIO, 0xa7, __u32)
>  /* Available with KVM_CAP_IRQ_SOURCE_ID */
>  #define KVM_IRQ_SOURCE_ID         _IOWR(KVMIO, 0xa8, struct kvm_irq_source_id)
> +/* Available with KVM_CAP_IRQ_ACKFD */
> +#define KVM_IRQ_ACKFD             _IOW(KVMIO,  0xa9, struct kvm_irq_ackfd)
>  
>  /*
>   * ioctls for vcpu fds
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ea6d7a1..cdc55c2 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -285,6 +285,10 @@ struct kvm {
>  		struct list_head  items;
>  	} irqfds;
>  	struct list_head ioeventfds;
> +	struct {
> +		spinlock_t        lock;
> +		struct list_head  items;
> +	} irq_ackfds;
>  #endif
>  	struct kvm_vm_stat stat;
>  	struct kvm_arch arch;
> @@ -831,6 +835,8 @@ int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
>  void kvm_irqfd_release(struct kvm *kvm);
>  void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
>  int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
> +int kvm_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args);
> +void kvm_irq_ackfd_release(struct kvm *kvm);
>  
>  #else
>  
> @@ -843,6 +849,13 @@ static inline int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
>  
>  static inline void kvm_irqfd_release(struct kvm *kvm) {}
>  
> +static inline int kvm_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args)
> +{
> +	return -EINVAL;
> +}
> +
> +static inline void kvm_irq_ackfd_release(struct kvm *kvm) {}
> +
>  #ifdef CONFIG_HAVE_KVM_IRQCHIP
>  static inline void kvm_irq_routing_update(struct kvm *kvm,
>  					  struct kvm_irq_routing_table *irq_rt)
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index df41038..ff5c784 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -300,6 +300,8 @@ kvm_eventfd_init(struct kvm *kvm)
>  	spin_lock_init(&kvm->irqfds.lock);
>  	INIT_LIST_HEAD(&kvm->irqfds.items);
>  	INIT_LIST_HEAD(&kvm->ioeventfds);
> +	spin_lock_init(&kvm->irq_ackfds.lock);
> +	INIT_LIST_HEAD(&kvm->irq_ackfds.items);
>  }
>  
>  /*
> @@ -668,3 +670,286 @@ kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
>  
>  	return kvm_assign_ioeventfd(kvm, args);
>  }
> +
> +/*
> + * --------------------------------------------------------------------
> + * irq_ackfd: Expose IRQ ACKs to userspace via eventfd
> + *
> + * userspace can register to recieve eventfd notification for ACKs of
> + * interrupt lines.
> + * --------------------------------------------------------------------
> + */
> +
> +struct _irq_ackfd {
> +	struct kvm *kvm;
> +	struct eventfd_ctx *eventfd; /* signaled on ack */
> +	struct kvm_irq_ack_notifier notifier;
> +	/* Setup/shutdown */
> +	wait_queue_t wait;
> +	poll_table pt;
> +	struct work_struct shutdown;
> +	struct list_head list;
> +};
> +
> +/* Called under irq_ackfds.lock */
> +static void irq_ackfd_shutdown(struct work_struct *work)
> +{
> +	struct _irq_ackfd *ackfd = container_of(work, struct _irq_ackfd,
> +						shutdown);
> +	struct kvm *kvm = ackfd->kvm;
> +	u64 cnt;
> +
> +	/*
> +	 * Stop ACK signaling
> +	 */
> +	kvm_unregister_irq_ack_notifier(kvm, &ackfd->notifier);
> +
> +	/*
> +	 * Synchronize with the wait-queue and unhook ourselves to prevent
> +	 * further events.
> +	 */
> +	eventfd_ctx_remove_wait_queue(ackfd->eventfd, &ackfd->wait, &cnt);
> +
> +	/*
> +	 * Release resources
> +	 */
> +	eventfd_ctx_put(ackfd->eventfd);
> +	kfree(ackfd);
> +}
> +
> +/* assumes kvm->irq_ackfds.lock is held */
> +static bool irq_ackfd_is_active(struct _irq_ackfd *ackfd)
> +{
> +	return list_empty(&ackfd->list) ? false : true;
> +}
> +
> +/*
> + * Mark the irq_ackfd as inactive and schedule it for removal
> + *
> + * assumes kvm->irq_ackfds.lock is held
> + */
> +static void irq_ackfd_deactivate(struct _irq_ackfd *ackfd)
> +{
> +	BUG_ON(!irq_ackfd_is_active(ackfd));
> +
> +	list_del_init(&ackfd->list);
> +
> +	/* Borrow irqfd's workqueue, we don't need our own. */
> +	queue_work(irqfd_cleanup_wq, &ackfd->shutdown);
> +}
> +
> +/*
> + * Called with wqh->lock held and interrupts disabled
> + */
> +static int irq_ackfd_wakeup(wait_queue_t *wait, unsigned mode,
> +			    int sync, void *key)
> +{
> +	unsigned long flags = (unsigned long)key;
> +
> +	if (unlikely(flags & POLLHUP)) {
> +		/* The eventfd is closing, detach from KVM */
> +		struct _irq_ackfd *ackfd;
> +		unsigned long flags;
> +
> +		ackfd = container_of(wait, struct _irq_ackfd, wait);
> +
> +		spin_lock_irqsave(&ackfd->kvm->irq_ackfds.lock, flags);
> +
> +		/*
> +		 * We must check if someone deactivated the irq_ackfd before
> +		 * we could acquire the irq_ackfds.lock since the item is
> +		 * deactivated from the KVM side before it is unhooked from
> +		 * the wait-queue.  If it is already deactivated, we can
> +		 * simply return knowing the other side will cleanup for us.
> +		 * We cannot race against the irq_ackfd going away since the
> +		 * other side is required to acquire wqh->lock, which we hold
> +		 */
> +		if (irq_ackfd_is_active(ackfd))
> +				irq_ackfd_deactivate(ackfd);
> +
> +		spin_unlock_irqrestore(&ackfd->kvm->irq_ackfds.lock, flags);
> +	}
> +
> +	return 0;
> +}
> +
> +static void irq_ackfd_ptable_queue_proc(struct file *file,
> +					wait_queue_head_t *wqh,
> +					poll_table *pt)
> +{
> +	struct _irq_ackfd *ackfd = container_of(pt, struct _irq_ackfd, pt);
> +	add_wait_queue(wqh, &ackfd->wait);
> +}
> +
> +/*
> + * This function is called as the kvm VM fd is being released. Shutdown all
> + * irq_ackfds that still remain open
> + */
> +void kvm_irq_ackfd_release(struct kvm *kvm)
> +{
> +	struct _irq_ackfd *tmp, *ackfd;
> +
> +	spin_lock_irq(&kvm->irq_ackfds.lock);
> +
> +	list_for_each_entry_safe(ackfd, tmp, &kvm->irq_ackfds.items, list)
> +		irq_ackfd_deactivate(ackfd);
> +
> +	spin_unlock_irq(&kvm->irq_ackfds.lock);
> +
> +	flush_workqueue(irqfd_cleanup_wq);
> +}
> +
> +static void irq_ackfd_acked(struct kvm_irq_ack_notifier *notifier)
> +{
> +	struct _irq_ackfd *ackfd;
> +
> +	ackfd = container_of(notifier, struct _irq_ackfd, notifier);
> +
> +	eventfd_signal(ackfd->eventfd, 1);
> +}
> +
> +static int kvm_assign_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args)
> +{
> +	struct file *file = NULL;
> +	struct eventfd_ctx *eventfd = NULL;
> +	struct _irq_ackfd *ackfd = NULL, *tmp;
> +	int ret;
> +	u64 cnt;
> +
> +	file = eventfd_fget(args->fd);
> +	if (IS_ERR(file)) {
> +		ret = PTR_ERR(file);
> +		goto fail;
> +	}
> +
> +	eventfd = eventfd_ctx_fileget(file);
> +	if (IS_ERR(eventfd)) {
> +		ret = PTR_ERR(eventfd);
> +		goto fail;
> +	}
> +
> +	ackfd = kzalloc(sizeof(*ackfd), GFP_KERNEL);
> +	if (!ackfd) {
> +		ret = -ENOMEM;
> +		goto fail;
> +	}
> +
> +	INIT_LIST_HEAD(&ackfd->list);
> +	INIT_WORK(&ackfd->shutdown, irq_ackfd_shutdown);
> +	ackfd->kvm = kvm;
> +	ackfd->eventfd = eventfd;
> +	ackfd->notifier.gsi = args->gsi;
> +	if (args->flags & KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID)
> +		ackfd->notifier.irq_source_id = args->irq_source_id;
> +	else
> +		ackfd->notifier.irq_source_id = -1;
> +	ackfd->notifier.irq_acked = irq_ackfd_acked;
> +
> +	/*
> +	 * Install our own custom wake-up handling so we are notified via
> +	 * a callback whenever someone releases the underlying eventfd
> +	 */
> +	init_waitqueue_func_entry(&ackfd->wait, irq_ackfd_wakeup);
> +	init_poll_funcptr(&ackfd->pt, irq_ackfd_ptable_queue_proc);
> +
> +	/*
> +	 * Install the wait queue function to allow cleanup when the
> +	 * eventfd is closed by the user.  This grabs the wqh lock, so
> +	 * we do it out of spinlock, holding the file reference ensures
> +	 * we won't see a POLLHUP until setup is complete.
> +	 */
> +	file->f_op->poll(file, &ackfd->pt);
> +
> +	spin_lock_irq(&kvm->irq_ackfds.lock);
> +
> +	/* Check for duplicates.  */
> +	list_for_each_entry(tmp, &kvm->irq_ackfds.items, list) {
> +		if (ackfd->eventfd == tmp->eventfd &&
> +		    ackfd->notifier.gsi == tmp->notifier.gsi &&
> +		    ackfd->notifier.irq_source_id ==
> +		    tmp->notifier.irq_source_id) {
> +			spin_unlock_irq(&kvm->irq_ackfds.lock);
> +			ret = -EBUSY;
> +			goto fail_unqueue;
> +		}
> +	}
> +

So source ID is not validated at all, this means there are
4G ways for the same eventfd to be registered,
drinking up unlimited kernel memory by means of kzalloc above.

> +	list_add_tail(&ackfd->list, &kvm->irq_ackfds.items);
> +
> +	spin_unlock_irq(&kvm->irq_ackfds.lock);
> +
> +	/* This can sleep, so register after spinlock.  */
> +	kvm_register_irq_ack_notifier(kvm, &ackfd->notifier);
> +
> +	fput(file); /* Enable POLLHUP */
> +
> +	return 0;
> +
> +fail_unqueue:
> +	eventfd_ctx_remove_wait_queue(eventfd, &ackfd->wait, &cnt);
> +fail:
> +	if (eventfd && !IS_ERR(eventfd))
> +		eventfd_ctx_put(eventfd);
> +
> +	if (file && !IS_ERR(file))
> +		fput(file);
> +
> +	kfree(ackfd);
> +
> +	return ret;
> +}
> +
> +static int kvm_deassign_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args)
> +{
> +	struct eventfd_ctx *eventfd = NULL;
> +	struct _irq_ackfd *ackfd;
> +	int irq_source_id = -1, ret = -ENOENT;
> +
> +	eventfd = eventfd_ctx_fdget(args->fd);
> +	if (IS_ERR(eventfd)) {
> +		ret = PTR_ERR(eventfd);
> +		goto fail;
> +	}
> +
> +	if (args->flags & KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID)
> +		irq_source_id = args->irq_source_id;
> +
> +	spin_lock_irq(&kvm->irq_ackfds.lock);
> +
> +	list_for_each_entry(ackfd, &kvm->irq_ackfds.items, list) {
> +		if (ackfd->eventfd == eventfd &&
> +		    ackfd->notifier.gsi == args->gsi &&
> +		    ackfd->notifier.irq_source_id == irq_source_id) {
> +			irq_ackfd_deactivate(ackfd);
> +			ret = 0;
> +			break;
> +		}
> +	}
> +
> +	spin_unlock_irq(&kvm->irq_ackfds.lock);
> +
> +fail:
> +	if (eventfd && !IS_ERR(eventfd))
> +		eventfd_ctx_put(eventfd);
> +
> +	/*
> +	 * Block until we know all outstanding shutdown jobs have completed
> +	 * so that we guarantee there will not be any more ACKs signaled on
> +	 * this eventfd once this deassign function returns.
> +	 */
> +	flush_workqueue(irqfd_cleanup_wq);
> +
> +	return ret;
> +}
> +
> +int kvm_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args)
> +{
> +	if (args->flags & ~(KVM_IRQ_ACKFD_FLAG_DEASSIGN |
> +			    KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID))
> +		return -EINVAL;
> +
> +	if (args->flags & KVM_IRQ_ACKFD_FLAG_DEASSIGN)
> +		return kvm_deassign_irq_ackfd(kvm, args);
> +
> +	return kvm_assign_irq_ackfd(kvm, args);
> +}
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index e120cb3..bbe3a20 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -619,6 +619,7 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
>  	struct kvm *kvm = filp->private_data;
>  
>  	kvm_irqfd_release(kvm);
> +	kvm_irq_ackfd_release(kvm);
>  
>  	kvm_put_kvm(kvm);
>  	return 0;
> @@ -2109,6 +2110,15 @@ static long kvm_vm_ioctl(struct file *filp,
>  		break;
>  	}
>  #endif
> +	case KVM_IRQ_ACKFD: {
> +		struct kvm_irq_ackfd data;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&data, argp, sizeof(data)))
> +			goto out;
> +		r = kvm_irq_ackfd(kvm, &data);
> +		break;
> +	}
>  	default:
>  		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
>  		if (r == -ENOTTY)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 6/6] kvm: Add de-assert option to KVM_IRQ_ACKFD
  2012-08-10 22:37 ` [PATCH v8 6/6] kvm: Add de-assert option to KVM_IRQ_ACKFD Alex Williamson
@ 2012-08-15 14:11   ` Michael S. Tsirkin
  2012-08-15 17:24     ` Alex Williamson
  0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2012-08-15 14:11 UTC (permalink / raw)
  To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel

On Fri, Aug 10, 2012 at 04:37:56PM -0600, Alex Williamson wrote:
> It's likely (vfio) that one of the reasons to watch for an IRQ ACK
> is to de-assert and re-enable an interrupt.  As the IRQ ACK notfier
> is already watching a GSI for an IRQ source ID we can easily couple
> these together.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

This source id is required the only way to assert
in the 1st place is with irqfd.
So why is this an ackfd flag and not an irqfd flag?
I am guessing because this way you do not need
an extra ack notifier, but isn't this an internal
optimization leaking out to userspace?

> ---
> 
>  Documentation/virtual/kvm/api.txt |    4 ++++
>  arch/x86/kvm/x86.c                |    1 +
>  include/linux/kvm.h               |    3 +++
>  virt/kvm/eventfd.c                |   14 +++++++++++++-
>  4 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 77b4837..128d4c3 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2029,6 +2029,10 @@ the eventfd is only triggered when the specified IRQ source ID is
>  pending.  On deassign, fd, gsi, and irq_source_id (if provided on assign)
>  must be provided.
>  
> +When KVM_CAP_IRQ_ACKFD_DEASSERT is available the flag
> +KVM_IRQ_ACKFD_FLAG_IRQ_DEASSERT may be used on assignment to specify
> +that the GSI should be de-asserted prior to eventfd notification.
> +This flag requires an IRQ source ID to be provided as described above.
>  
>  5. The kvm_run structure
>  ------------------------
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3d98e59..691b00d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2178,6 +2178,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	case KVM_CAP_IRQFD_ASSERT_ONLY:
>  	case KVM_CAP_IRQ_ACKFD:
>  	case KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID:
> +	case KVM_CAP_IRQ_ACKFD_DEASSERT:
>  		r = 1;
>  		break;
>  	case KVM_CAP_COALESCED_MMIO:
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 0f53bd5..331631e 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -623,6 +623,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_IRQFD_ASSERT_ONLY 83
>  #define KVM_CAP_IRQ_ACKFD 84
>  #define KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID 85
> +#define KVM_CAP_IRQ_ACKFD_DEASSERT 86
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> @@ -712,6 +713,8 @@ struct kvm_irq_source_id {
>  #define KVM_IRQ_ACKFD_FLAG_DEASSIGN (1 << 0)
>  /* Available with KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID */
>  #define KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID (1 << 1)
> +/* Available with KVM_CAP_IRQ_ACKFD_DEASSERT */
> +#define KVM_IRQ_ACKFD_FLAG_DEASSERT (1 << 2)
>  
>  struct kvm_irq_ackfd {
>  	__u32 flags;
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index ff5c784..ffc6a13 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -682,6 +682,7 @@ kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
>  
>  struct _irq_ackfd {
>  	struct kvm *kvm;
> +	bool deassert; /* de-assert on ack? */
>  	struct eventfd_ctx *eventfd; /* signaled on ack */
>  	struct kvm_irq_ack_notifier notifier;
>  	/* Setup/shutdown */
> @@ -805,6 +806,10 @@ static void irq_ackfd_acked(struct kvm_irq_ack_notifier *notifier)
>  
>  	ackfd = container_of(notifier, struct _irq_ackfd, notifier);
>  
> +	if (ackfd->deassert)
> +		kvm_set_irq(ackfd->kvm, ackfd->notifier.irq_source_id,
> +			    ackfd->notifier.gsi, 0);
> +
>  	eventfd_signal(ackfd->eventfd, 1);
>  }
>  
> @@ -845,6 +850,12 @@ static int kvm_assign_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args)
>  		ackfd->notifier.irq_source_id = -1;
>  	ackfd->notifier.irq_acked = irq_ackfd_acked;
>  
> +	ackfd->deassert = args->flags & KVM_IRQ_ACKFD_FLAG_DEASSERT;
> +	if (ackfd->deassert && ackfd->notifier.irq_source_id < 0) {
> +		ret = -EINVAL;
> +		goto fail;
> +	}
> +
>  	/*
>  	 * Install our own custom wake-up handling so we are notified via
>  	 * a callback whenever someone releases the underlying eventfd
> @@ -945,7 +956,8 @@ fail:
>  int kvm_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args)
>  {
>  	if (args->flags & ~(KVM_IRQ_ACKFD_FLAG_DEASSIGN |
> -			    KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID))
> +			    KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID |
> +			    KVM_IRQ_ACKFD_FLAG_DEASSERT))
>  		return -EINVAL;
>  
>  	if (args->flags & KVM_IRQ_ACKFD_FLAG_DEASSIGN)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 0/6] kvm: level irqfd support
  2012-08-10 22:37 [PATCH v8 0/6] kvm: level irqfd support Alex Williamson
                   ` (5 preceding siblings ...)
  2012-08-10 22:37 ` [PATCH v8 6/6] kvm: Add de-assert option to KVM_IRQ_ACKFD Alex Williamson
@ 2012-08-15 14:28 ` Michael S. Tsirkin
  2012-08-15 17:36   ` Alex Williamson
  2012-08-16 16:32 ` Avi Kivity
  7 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2012-08-15 14:28 UTC (permalink / raw)
  To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel

On Fri, Aug 10, 2012 at 04:37:08PM -0600, Alex Williamson wrote:
> v8:
> 
> Trying a new approach.  Nobody seems to like the internal IRQ
> source ID object and the interactions it implies between irqfd
> and eoifd, so let's get rid of it.  Instead, simply expose
> IRQ source IDs to userspace.  This lets the user be in charge
> of freeing them or hanging onto a source ID for later use.

In the end it turns out source ID is an optimization for shared
interrupts, isn't it?  Can't we apply the optimization transparently to
the user?  E.g. if we have some spare source IDs, allocate them, if we
run out, use a shared source ID?

> They
> can also detach and re-attach components at will.  It also opens
> up the possibility that userspace might want to use each IRQ
> source ID for more than one GSI (and avoids the kernel needing
> to manage that).  Per suggestions, EOIFD is now IRQ_ACKFD.
> 
> I really wanted to add a de-assert-only option to irqfd so the
> irq_ackfd could be fed directly into an irqfd, but I'm dependent
> on the ordering of de-assert _then_ signal an eventfd.  Keeping
> that ordering doesn't seem to be possible, especially since irqfd
> uses a workqueue, if I attempt to make that connection.  Thanks,
> 
> Alex
> ---
> 
> Alex Williamson (6):
>       kvm: Add de-assert option to KVM_IRQ_ACKFD
>       kvm: KVM_IRQ_ACKFD
>       kvm: Add assert-only option to KVM_IRQFD
>       kvm: Add IRQ source ID option to KVM_IRQFD
>       kvm: Expose IRQ source IDs to userspace
>       kvm: Allow filtering of acked irqs
> 
> 
>  Documentation/virtual/kvm/api.txt |   53 ++++++
>  arch/x86/kvm/Kconfig              |    1 
>  arch/x86/kvm/i8254.c              |    1 
>  arch/x86/kvm/i8259.c              |    8 +
>  arch/x86/kvm/x86.c                |    8 +
>  include/linux/kvm.h               |   39 ++++-
>  include/linux/kvm_host.h          |   18 ++
>  virt/kvm/Kconfig                  |    3 
>  virt/kvm/assigned-dev.c           |    1 
>  virt/kvm/eventfd.c                |  315 +++++++++++++++++++++++++++++++++++++
>  virt/kvm/ioapic.c                 |    5 -
>  virt/kvm/irq_comm.c               |   28 +++
>  virt/kvm/kvm_main.c               |   26 +++
>  13 files changed, 496 insertions(+), 10 deletions(-)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 1/6] kvm: Allow filtering of acked irqs
  2012-08-15 12:27   ` Michael S. Tsirkin
@ 2012-08-15 16:47     ` Alex Williamson
  2012-08-15 19:24       ` Michael S. Tsirkin
  0 siblings, 1 reply; 33+ messages in thread
From: Alex Williamson @ 2012-08-15 16:47 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel

On Wed, 2012-08-15 at 15:27 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 10, 2012 at 04:37:17PM -0600, Alex Williamson wrote:
> > Registering an kvm_irq_ack_notifier with kian.irq_source_id < 0
> > retains existing behavior, filling in the actual irq_source_id results
> > in the callback only being called when the specified irq_source_id is
> > asserting the given gsi.
> > 
> > The i8254 PIT remains unfiltered because it de-asserts it's irq source
> > id, so it's notifier would never get called otherwise.  KVM device
> > assignment gets filtering as it de-asserts the GSI in it's notifier.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> 
> Looks good to me. For the record, I expect this to help if
> - an assigned device interrupt is shared in host
>   so we use slow config cycles in the ack notifier
> - said device is sharing interrupt with another device in guest
> - said another device is actually driving most interrupts
> For example, I think this could be tested
> by booting guest with pci=nomsi.

Yes, that's how I do most of my testing of legacy interrupts with
vfio-pci.  KVM assignment is smart enough not to do config access unless
the irq is marked as disabled, but it does eliminate an unnecessary
de-assert and a couple spinlocks in assignment code.

> A minor suggestions below but
> nothing that needs to block this patch.
> 
> > ---
> > 
> >  arch/x86/kvm/i8254.c     |    1 +
> >  arch/x86/kvm/i8259.c     |    8 +++++++-
> >  include/linux/kvm_host.h |    4 +++-
> >  virt/kvm/assigned-dev.c  |    1 +
> >  virt/kvm/ioapic.c        |    5 ++++-
> >  virt/kvm/irq_comm.c      |    6 ++++--
> >  6 files changed, 20 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> > index adba28f..2355d19 100644
> > --- a/arch/x86/kvm/i8254.c
> > +++ b/arch/x86/kvm/i8254.c
> > @@ -709,6 +709,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
> >  	hrtimer_init(&pit_state->pit_timer.timer,
> >  		     CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
> >  	pit_state->irq_ack_notifier.gsi = 0;
> > +	pit_state->irq_ack_notifier.irq_source_id = -1; /* No filter */
> 
> A bit prettier would be to
> #define KVM_NO_IRQ_SOURCE_ID     (-1)
> and test for it explicitly.

Ok.  I'll add a define and resend this one.  Thanks,

Alex

> >  	pit_state->irq_ack_notifier.irq_acked = kvm_pit_ack_irq;
> >  	kvm_register_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier);
> >  	pit_state->pit_timer.reinject = true;
> > diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
> > index e498b18..d2175a9 100644
> > --- a/arch/x86/kvm/i8259.c
> > +++ b/arch/x86/kvm/i8259.c
> > @@ -74,9 +74,14 @@ static void pic_unlock(struct kvm_pic *s)
> >  
> >  static void pic_clear_isr(struct kvm_kpic_state *s, int irq)
> >  {
> > +	unsigned long irq_source_ids;
> > +
> >  	s->isr &= ~(1 << irq);
> >  	if (s != &s->pics_state->pics[0])
> >  		irq += 8;
> > +
> > +	irq_source_ids = s->pics_state->irq_states[irq];
> > +
> >  	/*
> >  	 * We are dropping lock while calling ack notifiers since ack
> >  	 * notifier callbacks for assigned devices call into PIC recursively.
> > @@ -84,7 +89,8 @@ static void pic_clear_isr(struct kvm_kpic_state *s, int irq)
> >  	 * it should be safe since PIC state is already updated at this stage.
> >  	 */
> >  	pic_unlock(s->pics_state);
> > -	kvm_notify_acked_irq(s->pics_state->kvm, SELECT_PIC(irq), irq);
> > +	kvm_notify_acked_irq(s->pics_state->kvm, SELECT_PIC(irq), irq,
> > +			     irq_source_ids);
> >  	pic_lock(s->pics_state);
> >  }
> >  
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index b70b48b..2ad3e4a 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -577,6 +577,7 @@ int kvm_is_mmio_pfn(pfn_t pfn);
> >  
> >  struct kvm_irq_ack_notifier {
> >  	struct hlist_node link;
> > +	int irq_source_id;
> >  	unsigned gsi;
> >  	void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
> >  };
> > @@ -627,7 +628,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
> >  int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
> >  int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
> >  		int irq_source_id, int level);
> > -void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
> > +void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin,
> > +			  unsigned long irq_source_ids);
> >  void kvm_register_irq_ack_notifier(struct kvm *kvm,
> >  				   struct kvm_irq_ack_notifier *kian);
> >  void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
> > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> > index 23a41a9..a08c9c1 100644
> > --- a/virt/kvm/assigned-dev.c
> > +++ b/virt/kvm/assigned-dev.c
> > @@ -407,6 +407,7 @@ static int assigned_device_enable_guest_intx(struct kvm *kvm,
> >  {
> >  	dev->guest_irq = irq->guest_irq;
> >  	dev->ack_notifier.gsi = irq->guest_irq;
> > +	dev->ack_notifier.irq_source_id = dev->irq_source_id;
> >  	return 0;
> >  }
> >  
> > diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
> > index ef61d52..1a9f445 100644
> > --- a/virt/kvm/ioapic.c
> > +++ b/virt/kvm/ioapic.c
> > @@ -241,10 +241,12 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
> >  
> >  	for (i = 0; i < IOAPIC_NUM_PINS; i++) {
> >  		union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
> > +		unsigned long irq_source_ids;
> >  
> >  		if (ent->fields.vector != vector)
> >  			continue;
> >  
> > +		irq_source_ids = ioapic->irq_states[i];
> >  		/*
> >  		 * We are dropping lock while calling ack notifiers because ack
> >  		 * notifier callbacks for assigned devices call into IOAPIC
> > @@ -254,7 +256,8 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
> >  		 * after ack notifier returns.
> >  		 */
> >  		spin_unlock(&ioapic->lock);
> > -		kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
> > +		kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i,
> > +				     irq_source_ids);
> >  		spin_lock(&ioapic->lock);
> >  
> >  		if (trigger_mode != IOAPIC_LEVEL_TRIG)
> > diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> > index 83402d7..7d75126 100644
> > --- a/virt/kvm/irq_comm.c
> > +++ b/virt/kvm/irq_comm.c
> > @@ -173,7 +173,8 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
> >  	return ret;
> >  }
> >  
> > -void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
> > +void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin,
> > +			  unsigned long irq_source_ids)
> >  {
> >  	struct kvm_irq_ack_notifier *kian;
> >  	struct hlist_node *n;
> > @@ -186,7 +187,8 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
> >  	if (gsi != -1)
> >  		hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list,
> >  					 link)
> > -			if (kian->gsi == gsi)
> > +			if (kian->gsi == gsi && (kian->irq_source_id < 0 ||
> > +			    test_bit(kian->irq_source_id, &irq_source_ids)))
> >  				kian->irq_acked(kian);
> >  	rcu_read_unlock();
> >  }
> 




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 2/6] kvm: Expose IRQ source IDs to userspace
  2012-08-15 12:59   ` Michael S. Tsirkin
@ 2012-08-15 17:05     ` Alex Williamson
  0 siblings, 0 replies; 33+ messages in thread
From: Alex Williamson @ 2012-08-15 17:05 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel

On Wed, 2012-08-15 at 15:59 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 10, 2012 at 04:37:25PM -0600, Alex Williamson wrote:
> > Introduce KVM_IRQ_SOURCE_ID and KVM_CAP_NR_IRQ_SOURCE_ID to allow
> > user allocation of IRQ source IDs and querying both the capability
> > and the total count of IRQ source IDs.  These will later be used
> > by interfaces for setting up level IRQs.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> > 
> >  Documentation/virtual/kvm/api.txt |   20 ++++++++++++++++++++
> >  arch/x86/kvm/Kconfig              |    1 +
> >  arch/x86/kvm/x86.c                |    3 +++
> >  include/linux/kvm.h               |   11 +++++++++++
> >  include/linux/kvm_host.h          |    1 +
> >  virt/kvm/Kconfig                  |    3 +++
> >  virt/kvm/irq_comm.c               |   22 ++++++++++++++++++++++
> >  virt/kvm/kvm_main.c               |   16 ++++++++++++++++
> >  8 files changed, 77 insertions(+)
> > 
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index bf33aaa..062cfd5 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -1980,6 +1980,26 @@ return the hash table order in the parameter.  (If the guest is using
> >  the virtualized real-mode area (VRMA) facility, the kernel will
> >  re-create the VMRA HPTEs on the next KVM_RUN of any vcpu.)
> >  
> > +4.77 KVM_IRQ_SOURCE_ID
> > +
> > +Capability: KVM_CAP_NR_IRQ_SOURCE_ID
> > +Architectures: x86
> > +Type: vm ioctl
> > +Parameters: struct kvm_irq_source_id (in/out)
> > +Returns: 0 on success, -errno on error
> > +
> > +Allows allocating and freeing IRQ source IDs.  Each IRQ source ID
> > +represents a complete set of irqchip pin inputs which are logically
> > +OR'd with other IRQ source IDs for determining the final assertion
> > +level of a pin.  The flag KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN indicates
> > +whether the call is for an allocation or deallocation.
> > +kvm_irq_source_id.irq_source_id returns the allocated IRQ source ID
> > +on success and specifies the freed IRQ source ID on deassign.  The
> > +return value of KVM_CAP_NR_IRQ_SOURCE_ID indicates the total number
> > +of IRQ source IDs.  These IDs are also shared with KVM internal users
> > +(ex. KVM assigned devices, PIT, shared user ID), therefore not all IDs
> > +may be allocated through this interface.
> > +
> >  5. The kvm_run structure
> >  ------------------------
> > diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> > index a28f338..bfd2082 100644
> > --- a/arch/x86/kvm/Kconfig
> > +++ b/arch/x86/kvm/Kconfig
> > @@ -37,6 +37,7 @@ config KVM
> >  	select TASK_DELAY_ACCT
> >  	select PERF_EVENTS
> >  	select HAVE_KVM_MSI
> > +	select HAVE_KVM_IRQ_SOURCE_ID
> >  	---help---
> >  	  Support hosting fully virtualized guest machines using hardware
> >  	  virtualization extensions.  You will need a fairly recent
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 42bce48..75e743e 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2209,6 +2209,9 @@ int kvm_dev_ioctl_check_extension(long ext)
> >  	case KVM_CAP_TSC_DEADLINE_TIMER:
> >  		r = boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER);
> >  		break;
> > +	case KVM_CAP_NR_IRQ_SOURCE_ID:
> > +		r = BITS_PER_LONG; /* kvm->arch.irq_sources_bitmap */
> > +		break;
> >  	default:
> >  		r = 0;
> >  		break;
> > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > index 2ce09aa..67b6b49 100644
> > --- a/include/linux/kvm.h
> > +++ b/include/linux/kvm.h
> > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
> >  #define KVM_CAP_PPC_GET_SMMU_INFO 78
> >  #define KVM_CAP_S390_COW 79
> >  #define KVM_CAP_PPC_ALLOC_HTAB 80
> > +#define KVM_CAP_NR_IRQ_SOURCE_ID 81
> >  
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >  
> > @@ -691,6 +692,14 @@ struct kvm_irqfd {
> >  	__u8  pad[20];
> >  };
> >  
> > +#define KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN (1 << 0)
> > +
> > +struct kvm_irq_source_id {
> > +	__u32 flags;
> > +	__u32 irq_source_id;
> > +	__u8 pad[24];
> > +};
> > +
> >  struct kvm_clock_data {
> >  	__u64 clock;
> >  	__u32 flags;
> > @@ -831,6 +840,8 @@ struct kvm_s390_ucas_mapping {
> >  #define KVM_PPC_GET_SMMU_INFO	  _IOR(KVMIO,  0xa6, struct kvm_ppc_smmu_info)
> >  /* Available with KVM_CAP_PPC_ALLOC_HTAB */
> >  #define KVM_PPC_ALLOCATE_HTAB	  _IOWR(KVMIO, 0xa7, __u32)
> > +/* Available with KVM_CAP_IRQ_SOURCE_ID */
> > +#define KVM_IRQ_SOURCE_ID         _IOWR(KVMIO, 0xa8, struct kvm_irq_source_id)
> >  
> >  /*
> >   * ioctls for vcpu fds
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 2ad3e4a..ea6d7a1 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -636,6 +636,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
> >  				   struct kvm_irq_ack_notifier *kian);
> >  int kvm_request_irq_source_id(struct kvm *kvm);
> >  void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
> > +int kvm_irq_source_id(struct kvm *kvm, struct kvm_irq_source_id *id);
> >  
> >  /* For vcpu->arch.iommu_flags */
> >  #define KVM_IOMMU_CACHE_COHERENCY	0x1
> > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> > index 28694f4..b7e0d4d 100644
> > --- a/virt/kvm/Kconfig
> > +++ b/virt/kvm/Kconfig
> > @@ -21,3 +21,6 @@ config KVM_ASYNC_PF
> >  
> >  config HAVE_KVM_MSI
> >         bool
> > +
> > +config HAVE_KVM_IRQ_SOURCE_ID
> > +       bool
> > diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> > index 7d75126..44d40c9 100644
> > --- a/virt/kvm/irq_comm.c
> > +++ b/virt/kvm/irq_comm.c
> > @@ -254,6 +254,28 @@ unlock:
> >  	mutex_unlock(&kvm->irq_lock);
> >  }
> >  
> > +int kvm_irq_source_id(struct kvm *kvm, struct kvm_irq_source_id *id)
> > +{
> > +	int irq_source_id;
> > +
> > +	if (id->flags & ~KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN)
> > +		return -EINVAL;
> > +
> > +	if (id->flags & KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN) {
> > +		if (id->irq_source_id == KVM_USERSPACE_IRQ_SOURCE_ID)
> > +			return -EINVAL;
> 
> This validation here is probably a work around
> for assert in kvm_free_irq_source_id?
> It would be good to add a comment explaining this,
> or just drop assert in kvm_free_irq_source_id.

I think we have to keep KVM_USERSPACE_IRQ_SOURCE_ID protected, but maybe
the way to do that would be to keep a bitmap of user allocated IDs so
that userspace can only free the correct IDs.

> Also, kvm_free_irq_source_id prints error messages
> if you give it an invalid value, so
> this let userspace flood kernel log with error messages.

This would also be prevented with the bitmap I mention above.

> > +		kvm_free_irq_source_id(kvm, id->irq_source_id);
> > +		return 0;
> > +	}
> > +
> 
> Hmm. This will let buggy userspace deassign IDs such as PIT that it
> never assigned. I think it will only break itself so it should be
> harmless, but it would seem to work with subtle
> interrupt-losing races, so maybe it would be nicer if we could
> validate the ID was actually given to userspace at some point.

Yep, we need to keep a bitmap.

> > +	irq_source_id = kvm_request_irq_source_id(kvm);
> 
> kvm_request_irq_source_id was not intended to be called
> by userspace so it behaves a bit strangely:
> - If you call this too many times you let userspace flood
>   kernel log with warnings

This printk would need to go away, returning error should be sufficient.

> - return value is -EFAULT which is ignored by another caller
>   but passed to userspace here, arguably wrong

ENOSPC would be a more appropriate return from
kvm_request_irq_source_id.  I'm not seeing which user ignores the return
value, pit returns NULL and device assignment returns the error value.

> > +	if (irq_source_id < 0)
> > +		return irq_source_id;
> > +
> > +	id->irq_source_id = irq_source_id;
> > +	return 0;
> > +}
> > +
> 
> There is a minor bug here: if you call this too many times then following
> PIT and assigned device allocation will fail.
> 
> This is in contrast to documentation which says that
> getting source id will fail in this case,
> and generally gives userspace no way to detect
> what went wrong and recover: it only sees ENOMEM.

Seems like a pit bug for the return value.  I agree though, it's
difficult to debug.

> 
> One solution I see is split the source ID space, give half to userspace
> and half to internal users.

I agree that pit should have priority and maybe x86 should hard code
which source ID it makes use of like KVM_USERSPACE_IRQ_SOURCE_ID.  It
could be statically reserved.  KVM device assignment doesn't deserve
reserved IDs though, it's the same class of device as the intended
caller of these interfaces.  

Summary:
      * Need bitmap for tracking user allocated/freed IDs
      * Reserve ID for PIT
      * Evaluate return values

Thanks,

Alex

> >  void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq,
> >  				    struct kvm_irq_mask_notifier *kimn)
> >  {
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 2468523..e120cb3 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2093,6 +2093,22 @@ static long kvm_vm_ioctl(struct file *filp,
> >  		break;
> >  	}
> >  #endif
> > +#ifdef CONFIG_HAVE_KVM_IRQ_SOURCE_ID
> > +	case KVM_IRQ_SOURCE_ID: {
> > +		struct kvm_irq_source_id id;
> > +
> > +		r = -EFAULT;
> > +		if (copy_from_user(&id, argp, sizeof(id)))
> > +			goto out;
> > +
> > +		r = kvm_irq_source_id(kvm, &id);
> > +		if (!r && copy_to_user(argp, &id, sizeof(id))) {
> > +			r = -EFAULT;
> > +			goto out;
> > +		}
> > +		break;
> > +	}
> > +#endif
> >  	default:
> >  		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
> >  		if (r == -ENOTTY)




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 3/6] kvm: Add IRQ source ID option to KVM_IRQFD
  2012-08-15 13:49   ` Michael S. Tsirkin
@ 2012-08-15 17:08     ` Alex Williamson
  0 siblings, 0 replies; 33+ messages in thread
From: Alex Williamson @ 2012-08-15 17:08 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel

On Wed, 2012-08-15 at 16:49 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 10, 2012 at 04:37:33PM -0600, Alex Williamson wrote:
> > This allows specifying an IRQ source ID to be used when injecting an
> > interrupt.  When not specified KVM_USERSPACE_IRQ_SOURCE_ID is used.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> > 
> >  Documentation/virtual/kvm/api.txt |    5 +++++
> >  arch/x86/kvm/x86.c                |    1 +
> >  include/linux/kvm.h               |    6 +++++-
> >  virt/kvm/eventfd.c                |   14 ++++++++++----
> >  4 files changed, 21 insertions(+), 5 deletions(-)
> > 
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index 062cfd5..46f4b4d 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -1946,6 +1946,11 @@ the guest using the specified gsi pin.  The irqfd is removed using
> >  the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> >  and kvm_irqfd.gsi.
> >  
> > +When KVM_CAP_IRQFD_IRQ_SOURCE_ID is available, KVM_IRQFD supports the
> > +KVM_IRQFD_FLAG_IRQ_SOURCE_ID which can be used to specify an IRQ
> > +source ID (see KVM_IRQ_SOURCE_ID) to be used for the guest interrupt.
> > +This flag has no effect on deassignment.
> > +
> >  4.76 KVM_PPC_ALLOCATE_HTAB
> >  
> >  Capability: KVM_CAP_PPC_ALLOC_HTAB
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 75e743e..946c5bd 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2174,6 +2174,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> >  	case KVM_CAP_GET_TSC_KHZ:
> >  	case KVM_CAP_PCI_2_3:
> >  	case KVM_CAP_KVMCLOCK_CTRL:
> > +	case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
> >  		r = 1;
> >  		break;
> >  	case KVM_CAP_COALESCED_MMIO:
> > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > index 67b6b49..deda8a9 100644
> > --- a/include/linux/kvm.h
> > +++ b/include/linux/kvm.h
> > @@ -619,6 +619,7 @@ struct kvm_ppc_smmu_info {
> >  #define KVM_CAP_S390_COW 79
> >  #define KVM_CAP_PPC_ALLOC_HTAB 80
> >  #define KVM_CAP_NR_IRQ_SOURCE_ID 81
> > +#define KVM_CAP_IRQFD_IRQ_SOURCE_ID 82
> >  
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >  
> > @@ -684,12 +685,15 @@ struct kvm_xen_hvm_config {
> >  #endif
> >  
> >  #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
> > +/* Available with KVM_CAP_IRQFD_IRQ_SOURCE_ID */
> > +#define KVM_IRQFD_FLAG_IRQ_SOURCE_ID (1 << 1)
> >  
> >  struct kvm_irqfd {
> >  	__u32 fd;
> >  	__u32 gsi;
> >  	__u32 flags;
> > -	__u8  pad[20];
> > +	__u32 irq_source_id;
> > +	__u8  pad[16];
> >  };
> >  
> >  #define KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN (1 << 0)
> > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > index 7d7e2aa..30150f1 100644
> > --- a/virt/kvm/eventfd.c
> > +++ b/virt/kvm/eventfd.c
> > @@ -51,6 +51,7 @@ struct _irqfd {
> >  	struct kvm_kernel_irq_routing_entry __rcu *irq_entry;
> >  	/* Used for level IRQ fast-path */
> >  	int gsi;
> > +	int irq_source_id;
> >  	struct work_struct inject;
> >  	/* Used for setup/shutdown */
> >  	struct eventfd_ctx *eventfd;
> > @@ -67,8 +68,8 @@ irqfd_inject(struct work_struct *work)
> >  	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> >  	struct kvm *kvm = irqfd->kvm;
> >  
> > -	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
> > -	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > +	kvm_set_irq(kvm, irqfd->irq_source_id, irqfd->gsi, 1);
> > +	kvm_set_irq(kvm, irqfd->irq_source_id, irqfd->gsi, 0);
> 
> Looks like this can lead to kernel data corruption if irq_source_id
> specified it out of range?

Yep, we need range checking and maybe even cross check against user
allocated IDs.

> >  }
> >  
> >  /*
> > @@ -138,7 +139,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
> >  		irq = rcu_dereference(irqfd->irq_entry);
> >  		/* An event has been signaled, inject an interrupt */
> >  		if (irq)
> > -			kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
> > +			kvm_set_msi(irq, kvm, irqfd->irq_source_id, 1);
> >  		else
> >  			schedule_work(&irqfd->inject);
> >  		rcu_read_unlock();
> > @@ -213,6 +214,10 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> >  
> >  	irqfd->kvm = kvm;
> >  	irqfd->gsi = args->gsi;
> > +	if (args->flags & KVM_IRQFD_FLAG_IRQ_SOURCE_ID)
> > +		irqfd->irq_source_id = args->irq_source_id;
> > +	else
> > +		irqfd->irq_source_id = KVM_USERSPACE_IRQ_SOURCE_ID;
> 
> As in the previous patch, there is no validation
> that source id was previously allocated to userspace, so this makes life
> harder for userspace: if it deassigns a used source ID the result is not
> a clean failure but hard to find bugs.

Yep, good point.  Some trivial bitmap management should fix this.

> >  	INIT_LIST_HEAD(&irqfd->list);
> >  	INIT_WORK(&irqfd->inject, irqfd_inject);
> >  	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
> > @@ -340,7 +345,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
> >  int
> >  kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
> >  {
> > -	if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN)
> > +	if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN |
> > +			    KVM_IRQFD_FLAG_IRQ_SOURCE_ID))
> >  		return -EINVAL;
> >  
> >  	if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)

Summary:
      * user allocation bitmap needs to be cross-referenced by consumers
        of user passed source IDs.

Thanks,

Alex


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 5/6] kvm: KVM_IRQ_ACKFD
  2012-08-15 14:05   ` Michael S. Tsirkin
@ 2012-08-15 17:17     ` Alex Williamson
  0 siblings, 0 replies; 33+ messages in thread
From: Alex Williamson @ 2012-08-15 17:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel

On Wed, 2012-08-15 at 17:05 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 10, 2012 at 04:37:48PM -0600, Alex Williamson wrote:
> > Enable a mechanism for IRQ ACKs to be exposed through an eventfd.  The
> > user can specify the GSI and optionally an IRQ source ID and have the
> > provided eventfd trigger whenever the irqchip resamples it's inputs,
> > for instance on EOI.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> > 
> >  Documentation/virtual/kvm/api.txt |   18 ++
> >  arch/x86/kvm/x86.c                |    2 
> >  include/linux/kvm.h               |   16 ++
> >  include/linux/kvm_host.h          |   13 ++
> >  virt/kvm/eventfd.c                |  285 +++++++++++++++++++++++++++++++++++++
> >  virt/kvm/kvm_main.c               |   10 +
> >  6 files changed, 344 insertions(+)
> > 
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index 17cd599..77b4837 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -2011,6 +2011,24 @@ of IRQ source IDs.  These IDs are also shared with KVM internal users
> >  (ex. KVM assigned devices, PIT, shared user ID), therefore not all IDs
> >  may be allocated through this interface.
> >  
> > +4.78 KVM_IRQ_ACKFD
> > +
> > +Capability: KVM_CAP_IRQ_ACKFD
> > +Architectures: x86
> > +Type: vm ioctl
> > +Parameters: struct kvm_irq_ackfd (in)
> > +Returns: 0 on success, -errno on error
> > +
> > +Allows userspace notification of IRQ ACKs, or resampling of irqchip
> > +inputs, often associated with an EOI.  User provided kvm_irq_ackfd.fd
> > +and kvm_irq_ackfd.gsi are required and result in an eventfd trigger
> > +whenever the GSI is acknowledged.  When KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_FD
> > +is available, KVM_IRQ_ACKFD supports the KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID
> > +flag which indicates kvm_irqfd.irq_source_id is provided.  With this,
> > +the eventfd is only triggered when the specified IRQ source ID is
> > +pending.  On deassign, fd, gsi, and irq_source_id (if provided on assign)
> > +must be provided.
> > +
> >
> 
> This seems to imply that ACKFD can be used with edge
> GSIs, but this does not actually work: with source ID
> because of the filtering, and without because of PV EOI.
> It seems that what we are really tracking is
> level change 1->0. For edge no level -> no notification?
> 
> Or does 'resampling of irqchip inputs' imply level somehow?

AIUI, there's some sort of resampling for edge inputs, but I'm by no
means an expert.  In earlier versions I defined this as level-only,
undefined results for edge, but Gleb thought it might be useful for
clock drift problems.  There seem to be other complications for using it
for that, so maybe we can go back to this being defined for level-only
and undefined for edge.  I'd probably vote for a documentation-only
change vs trying to validate that a irqchip pin is programmed for level
at the time of this ioctl.

...
> > +static int kvm_assign_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args)
> > +{
> > +	struct file *file = NULL;
> > +	struct eventfd_ctx *eventfd = NULL;
> > +	struct _irq_ackfd *ackfd = NULL, *tmp;
> > +	int ret;
> > +	u64 cnt;
> > +
> > +	file = eventfd_fget(args->fd);
> > +	if (IS_ERR(file)) {
> > +		ret = PTR_ERR(file);
> > +		goto fail;
> > +	}
> > +
> > +	eventfd = eventfd_ctx_fileget(file);
> > +	if (IS_ERR(eventfd)) {
> > +		ret = PTR_ERR(eventfd);
> > +		goto fail;
> > +	}
> > +
> > +	ackfd = kzalloc(sizeof(*ackfd), GFP_KERNEL);
> > +	if (!ackfd) {
> > +		ret = -ENOMEM;
> > +		goto fail;
> > +	}
> > +
> > +	INIT_LIST_HEAD(&ackfd->list);
> > +	INIT_WORK(&ackfd->shutdown, irq_ackfd_shutdown);
> > +	ackfd->kvm = kvm;
> > +	ackfd->eventfd = eventfd;
> > +	ackfd->notifier.gsi = args->gsi;
> > +	if (args->flags & KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID)
> > +		ackfd->notifier.irq_source_id = args->irq_source_id;
> > +	else
> > +		ackfd->notifier.irq_source_id = -1;
> > +	ackfd->notifier.irq_acked = irq_ackfd_acked;
> > +
> > +	/*
> > +	 * Install our own custom wake-up handling so we are notified via
> > +	 * a callback whenever someone releases the underlying eventfd
> > +	 */
> > +	init_waitqueue_func_entry(&ackfd->wait, irq_ackfd_wakeup);
> > +	init_poll_funcptr(&ackfd->pt, irq_ackfd_ptable_queue_proc);
> > +
> > +	/*
> > +	 * Install the wait queue function to allow cleanup when the
> > +	 * eventfd is closed by the user.  This grabs the wqh lock, so
> > +	 * we do it out of spinlock, holding the file reference ensures
> > +	 * we won't see a POLLHUP until setup is complete.
> > +	 */
> > +	file->f_op->poll(file, &ackfd->pt);
> > +
> > +	spin_lock_irq(&kvm->irq_ackfds.lock);
> > +
> > +	/* Check for duplicates.  */
> > +	list_for_each_entry(tmp, &kvm->irq_ackfds.items, list) {
> > +		if (ackfd->eventfd == tmp->eventfd &&
> > +		    ackfd->notifier.gsi == tmp->notifier.gsi &&
> > +		    ackfd->notifier.irq_source_id ==
> > +		    tmp->notifier.irq_source_id) {
> > +			spin_unlock_irq(&kvm->irq_ackfds.lock);
> > +			ret = -EBUSY;
> > +			goto fail_unqueue;
> > +		}
> > +	}
> > +
> 
> So source ID is not validated at all, this means there are
> 4G ways for the same eventfd to be registered,
> drinking up unlimited kernel memory by means of kzalloc above.

Bitmap and cross-referencing mentioned in earlier patches also fix this.
Thanks,

Alex



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 6/6] kvm: Add de-assert option to KVM_IRQ_ACKFD
  2012-08-15 14:11   ` Michael S. Tsirkin
@ 2012-08-15 17:24     ` Alex Williamson
  0 siblings, 0 replies; 33+ messages in thread
From: Alex Williamson @ 2012-08-15 17:24 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel

On Wed, 2012-08-15 at 17:11 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 10, 2012 at 04:37:56PM -0600, Alex Williamson wrote:
> > It's likely (vfio) that one of the reasons to watch for an IRQ ACK
> > is to de-assert and re-enable an interrupt.  As the IRQ ACK notfier
> > is already watching a GSI for an IRQ source ID we can easily couple
> > these together.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> 
> This source id is required the only way to assert
> in the 1st place is with irqfd.

ioctl(fd, KVM_IRQ_ACKFD, kvm_irq_ackfd.{flags = 0, fd = eventfd, gsi = $GSI});
ioctl(fd, KVM_IRQ_LINE, kvm_irq_level.{irq = $GSI, level = 1);
/* eventfd notification */
ioctl(fd, KVM_IRQ_LINE, kvm_irq_level.{irq = $GSI, level = 0);

> So why is this an ackfd flag and not an irqfd flag?
> I am guessing because this way you do not need
> an extra ack notifier, but isn't this an internal
> optimization leaking out to userspace?

If irqfd were to setup it's own irq ack notifier for de-assert, we can
guarantee the ordering of de-assert vs eventfd trigger.  So, whoever
does one, needs to do both.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 0/6] kvm: level irqfd support
  2012-08-15 14:28 ` [PATCH v8 0/6] kvm: level irqfd support Michael S. Tsirkin
@ 2012-08-15 17:36   ` Alex Williamson
  2012-08-15 19:22     ` Michael S. Tsirkin
  0 siblings, 1 reply; 33+ messages in thread
From: Alex Williamson @ 2012-08-15 17:36 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel

On Wed, 2012-08-15 at 17:28 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 10, 2012 at 04:37:08PM -0600, Alex Williamson wrote:
> > v8:
> > 
> > Trying a new approach.  Nobody seems to like the internal IRQ
> > source ID object and the interactions it implies between irqfd
> > and eoifd, so let's get rid of it.  Instead, simply expose
> > IRQ source IDs to userspace.  This lets the user be in charge
> > of freeing them or hanging onto a source ID for later use.
> 
> In the end it turns out source ID is an optimization for shared
> interrupts, isn't it?  Can't we apply the optimization transparently to
> the user?  E.g. if we have some spare source IDs, allocate them, if we
> run out, use a shared source ID?

Let's think about shared source IDs a bit more.  I think it's wrong that
irqfd uses KVM_USERSPACE_IRQ_SOURCE_ID, but I'm questioning whether all
irqfd users can share a source ID.  We do not get the logical OR of all
users by putting them on the same source ID, we get "last set wins".
KVM_USERSPACE_IRQ_SOURCE_ID is used for multiple inputs because the
logical OR happens in userspace.  How would we not starve a user if we
define KVM_IRQFD_SOURCE_ID?  What am I missing?

So I'm inclined to say source IDs are a requirement for shared
interrupts.  That means the re-use scheme becomes complicated (ex. we
run out of IRQ source IDs, so we start looking for sharing by re-using a
source ID used by a different GSI).  Do we want to do that in kernel or
userspace?  This series allows userspace to deal with that complexity.
Please let me know if I'm thinking incorrectly about source ID re-use.
Thanks,

Alex


> > They
> > can also detach and re-attach components at will.  It also opens
> > up the possibility that userspace might want to use each IRQ
> > source ID for more than one GSI (and avoids the kernel needing
> > to manage that).  Per suggestions, EOIFD is now IRQ_ACKFD.
> > 
> > I really wanted to add a de-assert-only option to irqfd so the
> > irq_ackfd could be fed directly into an irqfd, but I'm dependent
> > on the ordering of de-assert _then_ signal an eventfd.  Keeping
> > that ordering doesn't seem to be possible, especially since irqfd
> > uses a workqueue, if I attempt to make that connection.  Thanks,
> > 
> > Alex
> > ---
> > 
> > Alex Williamson (6):
> >       kvm: Add de-assert option to KVM_IRQ_ACKFD
> >       kvm: KVM_IRQ_ACKFD
> >       kvm: Add assert-only option to KVM_IRQFD
> >       kvm: Add IRQ source ID option to KVM_IRQFD
> >       kvm: Expose IRQ source IDs to userspace
> >       kvm: Allow filtering of acked irqs
> > 
> > 
> >  Documentation/virtual/kvm/api.txt |   53 ++++++
> >  arch/x86/kvm/Kconfig              |    1 
> >  arch/x86/kvm/i8254.c              |    1 
> >  arch/x86/kvm/i8259.c              |    8 +
> >  arch/x86/kvm/x86.c                |    8 +
> >  include/linux/kvm.h               |   39 ++++-
> >  include/linux/kvm_host.h          |   18 ++
> >  virt/kvm/Kconfig                  |    3 
> >  virt/kvm/assigned-dev.c           |    1 
> >  virt/kvm/eventfd.c                |  315 +++++++++++++++++++++++++++++++++++++
> >  virt/kvm/ioapic.c                 |    5 -
> >  virt/kvm/irq_comm.c               |   28 +++
> >  virt/kvm/kvm_main.c               |   26 +++
> >  13 files changed, 496 insertions(+), 10 deletions(-)




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 0/6] kvm: level irqfd support
  2012-08-15 17:36   ` Alex Williamson
@ 2012-08-15 19:22     ` Michael S. Tsirkin
  2012-08-15 19:59       ` Alex Williamson
  2012-08-16 16:29       ` Avi Kivity
  0 siblings, 2 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2012-08-15 19:22 UTC (permalink / raw)
  To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel

On Wed, Aug 15, 2012 at 11:36:31AM -0600, Alex Williamson wrote:
> On Wed, 2012-08-15 at 17:28 +0300, Michael S. Tsirkin wrote:
> > On Fri, Aug 10, 2012 at 04:37:08PM -0600, Alex Williamson wrote:
> > > v8:
> > > 
> > > Trying a new approach.  Nobody seems to like the internal IRQ
> > > source ID object and the interactions it implies between irqfd
> > > and eoifd, so let's get rid of it.  Instead, simply expose
> > > IRQ source IDs to userspace.  This lets the user be in charge
> > > of freeing them or hanging onto a source ID for later use.
> > 
> > In the end it turns out source ID is an optimization for shared
> > interrupts, isn't it?  Can't we apply the optimization transparently to
> > the user?  E.g. if we have some spare source IDs, allocate them, if we
> > run out, use a shared source ID?
> 
> Let's think about shared source IDs a bit more.  I think it's wrong that
> irqfd uses KVM_USERSPACE_IRQ_SOURCE_ID, but I'm questioning whether all
> irqfd users can share a source ID.  We do not get the logical OR of all
> users by putting them on the same source ID, we get "last set wins".
> KVM_USERSPACE_IRQ_SOURCE_ID is used for multiple inputs because the
> logical OR happens in userspace.  How would we not starve a user if we
> define KVM_IRQFD_SOURCE_ID?  What am I missing?

That all irqfds are deasserted on EOI anyway.  So there's no point
to do a logical OR.


> 
> So I'm inclined to say source IDs are a requirement for shared
> interrupts.

Can yo show a specific example that breaks?
I don't think it can exist.

> That means the re-use scheme becomes complicated (ex. we
> run out of IRQ source IDs, so we start looking for sharing by re-using a
> source ID used by a different GSI).  Do we want to do that in kernel or
> userspace?  This series allows userspace to deal with that complexity.
> Please let me know if I'm thinking incorrectly about source ID re-use.
> Thanks,
> 
> Alex

I think there is a misunderstanding.
All deassert on ack irqfds can share a source ID.
This is why I am now thinking deassert on ack behaviour
should be set when irqfd is assigned.


> 
> > > They
> > > can also detach and re-attach components at will.  It also opens
> > > up the possibility that userspace might want to use each IRQ
> > > source ID for more than one GSI (and avoids the kernel needing
> > > to manage that).  Per suggestions, EOIFD is now IRQ_ACKFD.
> > > 
> > > I really wanted to add a de-assert-only option to irqfd so the
> > > irq_ackfd could be fed directly into an irqfd, but I'm dependent
> > > on the ordering of de-assert _then_ signal an eventfd.  Keeping
> > > that ordering doesn't seem to be possible, especially since irqfd
> > > uses a workqueue, if I attempt to make that connection.  Thanks,
> > > 
> > > Alex
> > > ---
> > > 
> > > Alex Williamson (6):
> > >       kvm: Add de-assert option to KVM_IRQ_ACKFD
> > >       kvm: KVM_IRQ_ACKFD
> > >       kvm: Add assert-only option to KVM_IRQFD
> > >       kvm: Add IRQ source ID option to KVM_IRQFD
> > >       kvm: Expose IRQ source IDs to userspace
> > >       kvm: Allow filtering of acked irqs
> > > 
> > > 
> > >  Documentation/virtual/kvm/api.txt |   53 ++++++
> > >  arch/x86/kvm/Kconfig              |    1 
> > >  arch/x86/kvm/i8254.c              |    1 
> > >  arch/x86/kvm/i8259.c              |    8 +
> > >  arch/x86/kvm/x86.c                |    8 +
> > >  include/linux/kvm.h               |   39 ++++-
> > >  include/linux/kvm_host.h          |   18 ++
> > >  virt/kvm/Kconfig                  |    3 
> > >  virt/kvm/assigned-dev.c           |    1 
> > >  virt/kvm/eventfd.c                |  315 +++++++++++++++++++++++++++++++++++++
> > >  virt/kvm/ioapic.c                 |    5 -
> > >  virt/kvm/irq_comm.c               |   28 +++
> > >  virt/kvm/kvm_main.c               |   26 +++
> > >  13 files changed, 496 insertions(+), 10 deletions(-)
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 1/6] kvm: Allow filtering of acked irqs
  2012-08-15 16:47     ` Alex Williamson
@ 2012-08-15 19:24       ` Michael S. Tsirkin
  0 siblings, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2012-08-15 19:24 UTC (permalink / raw)
  To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel

On Wed, Aug 15, 2012 at 10:47:38AM -0600, Alex Williamson wrote:
> On Wed, 2012-08-15 at 15:27 +0300, Michael S. Tsirkin wrote:
> > On Fri, Aug 10, 2012 at 04:37:17PM -0600, Alex Williamson wrote:
> > > Registering an kvm_irq_ack_notifier with kian.irq_source_id < 0
> > > retains existing behavior, filling in the actual irq_source_id results
> > > in the callback only being called when the specified irq_source_id is
> > > asserting the given gsi.
> > > 
> > > The i8254 PIT remains unfiltered because it de-asserts it's irq source
> > > id, so it's notifier would never get called otherwise.  KVM device
> > > assignment gets filtering as it de-asserts the GSI in it's notifier.
> > > 
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > 
> > Looks good to me. For the record, I expect this to help if
> > - an assigned device interrupt is shared in host
> >   so we use slow config cycles in the ack notifier
> > - said device is sharing interrupt with another device in guest
> > - said another device is actually driving most interrupts
> > For example, I think this could be tested
> > by booting guest with pci=nomsi.
> 
> Yes, that's how I do most of my testing of legacy interrupts with
> vfio-pci.  KVM assignment is smart enough not to do config access unless
> the irq is marked as disabled, but it does eliminate an unnecessary
> de-assert and a couple spinlocks in assignment code.

Hmm this is less than I thought. Any performance numbers?

> > A minor suggestions below but
> > nothing that needs to block this patch.
> > 
> > > ---
> > > 
> > >  arch/x86/kvm/i8254.c     |    1 +
> > >  arch/x86/kvm/i8259.c     |    8 +++++++-
> > >  include/linux/kvm_host.h |    4 +++-
> > >  virt/kvm/assigned-dev.c  |    1 +
> > >  virt/kvm/ioapic.c        |    5 ++++-
> > >  virt/kvm/irq_comm.c      |    6 ++++--
> > >  6 files changed, 20 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> > > index adba28f..2355d19 100644
> > > --- a/arch/x86/kvm/i8254.c
> > > +++ b/arch/x86/kvm/i8254.c
> > > @@ -709,6 +709,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
> > >  	hrtimer_init(&pit_state->pit_timer.timer,
> > >  		     CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
> > >  	pit_state->irq_ack_notifier.gsi = 0;
> > > +	pit_state->irq_ack_notifier.irq_source_id = -1; /* No filter */
> > 
> > A bit prettier would be to
> > #define KVM_NO_IRQ_SOURCE_ID     (-1)
> > and test for it explicitly.
> 
> Ok.  I'll add a define and resend this one.  Thanks,
> 
> Alex
> 
> > >  	pit_state->irq_ack_notifier.irq_acked = kvm_pit_ack_irq;
> > >  	kvm_register_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier);
> > >  	pit_state->pit_timer.reinject = true;
> > > diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
> > > index e498b18..d2175a9 100644
> > > --- a/arch/x86/kvm/i8259.c
> > > +++ b/arch/x86/kvm/i8259.c
> > > @@ -74,9 +74,14 @@ static void pic_unlock(struct kvm_pic *s)
> > >  
> > >  static void pic_clear_isr(struct kvm_kpic_state *s, int irq)
> > >  {
> > > +	unsigned long irq_source_ids;
> > > +
> > >  	s->isr &= ~(1 << irq);
> > >  	if (s != &s->pics_state->pics[0])
> > >  		irq += 8;
> > > +
> > > +	irq_source_ids = s->pics_state->irq_states[irq];
> > > +
> > >  	/*
> > >  	 * We are dropping lock while calling ack notifiers since ack
> > >  	 * notifier callbacks for assigned devices call into PIC recursively.
> > > @@ -84,7 +89,8 @@ static void pic_clear_isr(struct kvm_kpic_state *s, int irq)
> > >  	 * it should be safe since PIC state is already updated at this stage.
> > >  	 */
> > >  	pic_unlock(s->pics_state);
> > > -	kvm_notify_acked_irq(s->pics_state->kvm, SELECT_PIC(irq), irq);
> > > +	kvm_notify_acked_irq(s->pics_state->kvm, SELECT_PIC(irq), irq,
> > > +			     irq_source_ids);
> > >  	pic_lock(s->pics_state);
> > >  }
> > >  
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index b70b48b..2ad3e4a 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -577,6 +577,7 @@ int kvm_is_mmio_pfn(pfn_t pfn);
> > >  
> > >  struct kvm_irq_ack_notifier {
> > >  	struct hlist_node link;
> > > +	int irq_source_id;
> > >  	unsigned gsi;
> > >  	void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
> > >  };
> > > @@ -627,7 +628,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
> > >  int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
> > >  int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
> > >  		int irq_source_id, int level);
> > > -void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
> > > +void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin,
> > > +			  unsigned long irq_source_ids);
> > >  void kvm_register_irq_ack_notifier(struct kvm *kvm,
> > >  				   struct kvm_irq_ack_notifier *kian);
> > >  void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
> > > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> > > index 23a41a9..a08c9c1 100644
> > > --- a/virt/kvm/assigned-dev.c
> > > +++ b/virt/kvm/assigned-dev.c
> > > @@ -407,6 +407,7 @@ static int assigned_device_enable_guest_intx(struct kvm *kvm,
> > >  {
> > >  	dev->guest_irq = irq->guest_irq;
> > >  	dev->ack_notifier.gsi = irq->guest_irq;
> > > +	dev->ack_notifier.irq_source_id = dev->irq_source_id;
> > >  	return 0;
> > >  }
> > >  
> > > diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
> > > index ef61d52..1a9f445 100644
> > > --- a/virt/kvm/ioapic.c
> > > +++ b/virt/kvm/ioapic.c
> > > @@ -241,10 +241,12 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
> > >  
> > >  	for (i = 0; i < IOAPIC_NUM_PINS; i++) {
> > >  		union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
> > > +		unsigned long irq_source_ids;
> > >  
> > >  		if (ent->fields.vector != vector)
> > >  			continue;
> > >  
> > > +		irq_source_ids = ioapic->irq_states[i];
> > >  		/*
> > >  		 * We are dropping lock while calling ack notifiers because ack
> > >  		 * notifier callbacks for assigned devices call into IOAPIC
> > > @@ -254,7 +256,8 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
> > >  		 * after ack notifier returns.
> > >  		 */
> > >  		spin_unlock(&ioapic->lock);
> > > -		kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
> > > +		kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i,
> > > +				     irq_source_ids);
> > >  		spin_lock(&ioapic->lock);
> > >  
> > >  		if (trigger_mode != IOAPIC_LEVEL_TRIG)
> > > diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> > > index 83402d7..7d75126 100644
> > > --- a/virt/kvm/irq_comm.c
> > > +++ b/virt/kvm/irq_comm.c
> > > @@ -173,7 +173,8 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
> > >  	return ret;
> > >  }
> > >  
> > > -void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
> > > +void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin,
> > > +			  unsigned long irq_source_ids)
> > >  {
> > >  	struct kvm_irq_ack_notifier *kian;
> > >  	struct hlist_node *n;
> > > @@ -186,7 +187,8 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
> > >  	if (gsi != -1)
> > >  		hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list,
> > >  					 link)
> > > -			if (kian->gsi == gsi)
> > > +			if (kian->gsi == gsi && (kian->irq_source_id < 0 ||
> > > +			    test_bit(kian->irq_source_id, &irq_source_ids)))
> > >  				kian->irq_acked(kian);
> > >  	rcu_read_unlock();
> > >  }
> > 
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 0/6] kvm: level irqfd support
  2012-08-15 19:22     ` Michael S. Tsirkin
@ 2012-08-15 19:59       ` Alex Williamson
  2012-08-16 12:34         ` Alex Williamson
  2012-08-16 16:29       ` Avi Kivity
  1 sibling, 1 reply; 33+ messages in thread
From: Alex Williamson @ 2012-08-15 19:59 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel

On Wed, 2012-08-15 at 22:22 +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 15, 2012 at 11:36:31AM -0600, Alex Williamson wrote:
> > On Wed, 2012-08-15 at 17:28 +0300, Michael S. Tsirkin wrote:
> > > On Fri, Aug 10, 2012 at 04:37:08PM -0600, Alex Williamson wrote:
> > > > v8:
> > > > 
> > > > Trying a new approach.  Nobody seems to like the internal IRQ
> > > > source ID object and the interactions it implies between irqfd
> > > > and eoifd, so let's get rid of it.  Instead, simply expose
> > > > IRQ source IDs to userspace.  This lets the user be in charge
> > > > of freeing them or hanging onto a source ID for later use.
> > > 
> > > In the end it turns out source ID is an optimization for shared
> > > interrupts, isn't it?  Can't we apply the optimization transparently to
> > > the user?  E.g. if we have some spare source IDs, allocate them, if we
> > > run out, use a shared source ID?
> > 
> > Let's think about shared source IDs a bit more.  I think it's wrong that
> > irqfd uses KVM_USERSPACE_IRQ_SOURCE_ID, but I'm questioning whether all
> > irqfd users can share a source ID.  We do not get the logical OR of all
> > users by putting them on the same source ID, we get "last set wins".
> > KVM_USERSPACE_IRQ_SOURCE_ID is used for multiple inputs because the
> > logical OR happens in userspace.  How would we not starve a user if we
> > define KVM_IRQFD_SOURCE_ID?  What am I missing?
> 
> That all irqfds are deasserted on EOI anyway.  So there's no point
> to do a logical OR.

Ok, so the argument is:

- edge irqfds (the code now) can share a source ID because there is no
state.  Overlapping interrupt injects always cause one or more edge
triggers.
- your proposed level extension can only be asserted by the inject
eventfd and is only de-asserted by EOI, which de-asserts and notifies
all users.

What prevents an edge irqfd being registered to the same GSI as a level
irqfd, resulting in a de-assert that might result in the irr not being
seen by the guest and therefore maybe not getting an EOI? (I think this
is the same problem as why we can't use the exiting irqfd to insert a
level interrupt)

Having the de-assert only on EOI policy allows level irqfds to share a
source ID, but do they all need to share a separate source ID from edge
irqfds?

> > So I'm inclined to say source IDs are a requirement for shared
> > interrupts.
> 
> Can yo show a specific example that breaks?
> I don't think it can exist.

Only the edge vs level interaction if we define the policy above for
de-assert.

> > That means the re-use scheme becomes complicated (ex. we
> > run out of IRQ source IDs, so we start looking for sharing by re-using a
> > source ID used by a different GSI).  Do we want to do that in kernel or
> > userspace?  This series allows userspace to deal with that complexity.
> > Please let me know if I'm thinking incorrectly about source ID re-use.
> > Thanks,
> > 
> > Alex
> 
> I think there is a misunderstanding.
> All deassert on ack irqfds can share a source ID.
> This is why I am now thinking deassert on ack behaviour
> should be set when irqfd is assigned.

Maybe you were already thinking along the lines of a separate source ID
for de-assert on ack irqfds vs normal irqfds then.  I think I missed
that.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 0/6] kvm: level irqfd support
  2012-08-15 19:59       ` Alex Williamson
@ 2012-08-16 12:34         ` Alex Williamson
  2012-08-16 12:53           ` Michael S. Tsirkin
  0 siblings, 1 reply; 33+ messages in thread
From: Alex Williamson @ 2012-08-16 12:34 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel

On Wed, 2012-08-15 at 13:59 -0600, Alex Williamson wrote:
> On Wed, 2012-08-15 at 22:22 +0300, Michael S. Tsirkin wrote:
> > On Wed, Aug 15, 2012 at 11:36:31AM -0600, Alex Williamson wrote:
> > > On Wed, 2012-08-15 at 17:28 +0300, Michael S. Tsirkin wrote:
> > > > On Fri, Aug 10, 2012 at 04:37:08PM -0600, Alex Williamson wrote:
> > > > > v8:
> > > > > 
> > > > > Trying a new approach.  Nobody seems to like the internal IRQ
> > > > > source ID object and the interactions it implies between irqfd
> > > > > and eoifd, so let's get rid of it.  Instead, simply expose
> > > > > IRQ source IDs to userspace.  This lets the user be in charge
> > > > > of freeing them or hanging onto a source ID for later use.
> > > > 
> > > > In the end it turns out source ID is an optimization for shared
> > > > interrupts, isn't it?  Can't we apply the optimization transparently to
> > > > the user?  E.g. if we have some spare source IDs, allocate them, if we
> > > > run out, use a shared source ID?
> > > 
> > > Let's think about shared source IDs a bit more.  I think it's wrong that
> > > irqfd uses KVM_USERSPACE_IRQ_SOURCE_ID, but I'm questioning whether all
> > > irqfd users can share a source ID.  We do not get the logical OR of all
> > > users by putting them on the same source ID, we get "last set wins".
> > > KVM_USERSPACE_IRQ_SOURCE_ID is used for multiple inputs because the
> > > logical OR happens in userspace.  How would we not starve a user if we
> > > define KVM_IRQFD_SOURCE_ID?  What am I missing?
> > 
> > That all irqfds are deasserted on EOI anyway.  So there's no point
> > to do a logical OR.
> 
> Ok, so the argument is:
> 
> - edge irqfds (the code now) can share a source ID because there is no
> state.  Overlapping interrupt injects always cause one or more edge
> triggers.
> - your proposed level extension can only be asserted by the inject
> eventfd and is only de-asserted by EOI, which de-asserts and notifies
> all users.
> 
> What prevents an edge irqfd being registered to the same GSI as a level
> irqfd, resulting in a de-assert that might result in the irr not being
> seen by the guest and therefore maybe not getting an EOI? (I think this
> is the same problem as why we can't use the exiting irqfd to insert a
> level interrupt)
> 
> Having the de-assert only on EOI policy allows level irqfds to share a
> source ID, but do they all need to share a separate source ID from edge
> irqfds?
> 
> > > So I'm inclined to say source IDs are a requirement for shared
> > > interrupts.
> > 
> > Can yo show a specific example that breaks?
> > I don't think it can exist.
> 
> Only the edge vs level interaction if we define the policy above for
> de-assert.

Hmm, there is still a race w/ level.  If we have a number of
level-deassert-irqfds making use of the same gsi and sourceid and we
individually de-assert and notify, a re-assert could get lost if it
happens before all of the de-asserts have finished.  We either need
separate sourceids or we need to do a single de-assert followed by
multiple notifies.  Right?  Thanks,

Alex

> > > That means the re-use scheme becomes complicated (ex. we
> > > run out of IRQ source IDs, so we start looking for sharing by re-using a
> > > source ID used by a different GSI).  Do we want to do that in kernel or
> > > userspace?  This series allows userspace to deal with that complexity.
> > > Please let me know if I'm thinking incorrectly about source ID re-use.
> > > Thanks,
> > > 
> > > Alex
> > 
> > I think there is a misunderstanding.
> > All deassert on ack irqfds can share a source ID.
> > This is why I am now thinking deassert on ack behaviour
> > should be set when irqfd is assigned.
> 
> Maybe you were already thinking along the lines of a separate source ID
> for de-assert on ack irqfds vs normal irqfds then.  I think I missed
> that.  Thanks,
> 
> Alex




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 0/6] kvm: level irqfd support
  2012-08-16 12:34         ` Alex Williamson
@ 2012-08-16 12:53           ` Michael S. Tsirkin
  0 siblings, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2012-08-16 12:53 UTC (permalink / raw)
  To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel

On Thu, Aug 16, 2012 at 06:34:52AM -0600, Alex Williamson wrote:
> > > > So I'm inclined to say source IDs are a requirement for shared
> > > > interrupts.
> > > 
> > > Can yo show a specific example that breaks?
> > > I don't think it can exist.
> > 
> > Only the edge vs level interaction if we define the policy above for
> > de-assert.
> 
> Hmm, there is still a race w/ level.  If we have a number of
> level-deassert-irqfds making use of the same gsi and sourceid and we
> individually de-assert and notify, a re-assert could get lost if it
> happens before all of the de-asserts have finished.
> We either need
> separate sourceids or we need to do a single de-assert followed by
> multiple notifies.  Right?  Thanks,
> 
> Alex

Good catch, I agree, we need a single deassert.

I think I see how to implement this without reference counting and
stuff.  So we chain all auto-deassert irqfds for a given GSI together,
and have a single ack notifier.  When list becomes empty, remove the ack
notifier.

It's actually a good thing to do anyway, too many ack notifiers
would slow unrelated GSIs down.

-- 
MST

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 0/6] kvm: level irqfd support
  2012-08-15 19:22     ` Michael S. Tsirkin
  2012-08-15 19:59       ` Alex Williamson
@ 2012-08-16 16:29       ` Avi Kivity
  2012-08-16 16:36         ` Michael S. Tsirkin
  2012-08-16 16:37         ` Alex Williamson
  1 sibling, 2 replies; 33+ messages in thread
From: Avi Kivity @ 2012-08-16 16:29 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On 08/15/2012 10:22 PM, Michael S. Tsirkin wrote:
> On Wed, Aug 15, 2012 at 11:36:31AM -0600, Alex Williamson wrote:
>> On Wed, 2012-08-15 at 17:28 +0300, Michael S. Tsirkin wrote:
>> > On Fri, Aug 10, 2012 at 04:37:08PM -0600, Alex Williamson wrote:
>> > > v8:
>> > > 
>> > > Trying a new approach.  Nobody seems to like the internal IRQ
>> > > source ID object and the interactions it implies between irqfd
>> > > and eoifd, so let's get rid of it.  Instead, simply expose
>> > > IRQ source IDs to userspace.  This lets the user be in charge
>> > > of freeing them or hanging onto a source ID for later use.
>> > 
>> > In the end it turns out source ID is an optimization for shared
>> > interrupts, isn't it?  Can't we apply the optimization transparently to
>> > the user?  E.g. if we have some spare source IDs, allocate them, if we
>> > run out, use a shared source ID?
>> 
>> Let's think about shared source IDs a bit more.  I think it's wrong that
>> irqfd uses KVM_USERSPACE_IRQ_SOURCE_ID, but I'm questioning whether all
>> irqfd users can share a source ID.  We do not get the logical OR of all
>> users by putting them on the same source ID, we get "last set wins".
>> KVM_USERSPACE_IRQ_SOURCE_ID is used for multiple inputs because the
>> logical OR happens in userspace.  How would we not starve a user if we
>> define KVM_IRQFD_SOURCE_ID?  What am I missing?
> 
> That all irqfds are deasserted on EOI anyway.  So there's no point
> to do a logical OR.
> 
> 

What if a level irqfd shares a line with a KVM_IRQ_LINE ioctl?  Then an
EOI can de-assert the irqfd source, but the line is kept high by the
last KVM_IRQ_LINE invocation.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 0/6] kvm: level irqfd support
  2012-08-10 22:37 [PATCH v8 0/6] kvm: level irqfd support Alex Williamson
                   ` (6 preceding siblings ...)
  2012-08-15 14:28 ` [PATCH v8 0/6] kvm: level irqfd support Michael S. Tsirkin
@ 2012-08-16 16:32 ` Avi Kivity
  2012-08-16 16:45   ` Alex Williamson
  7 siblings, 1 reply; 33+ messages in thread
From: Avi Kivity @ 2012-08-16 16:32 UTC (permalink / raw)
  To: Alex Williamson; +Cc: mst, gleb, kvm, linux-kernel

On 08/11/2012 01:37 AM, Alex Williamson wrote:
> v8:
> 
> Trying a new approach.  Nobody seems to like the internal IRQ
> source ID object and the interactions it implies between irqfd
> and eoifd, so let's get rid of it.  Instead, simply expose
> IRQ source IDs to userspace.  This lets the user be in charge
> of freeing them or hanging onto a source ID for later use.  They
> can also detach and re-attach components at will.  It also opens
> up the possibility that userspace might want to use each IRQ
> source ID for more than one GSI (and avoids the kernel needing
> to manage that).  Per suggestions, EOIFD is now IRQ_ACKFD.
> 
> I really wanted to add a de-assert-only option to irqfd so the
> irq_ackfd could be fed directly into an irqfd, but I'm dependent
> on the ordering of de-assert _then_ signal an eventfd.  Keeping
> that ordering doesn't seem to be possible, especially since irqfd
> uses a workqueue, if I attempt to make that connection.  Thanks,

I can't say I'm happy with exposing irq source IDs.  It's true that they
correspond to a physical entity so they can't be said to be an
implementation detail, but adding more ABIs has a cost and I can't say
that I see another user for this.

Can you provide a link to the combined irqfd+ackfd implementation?  I'm
inclined now to go for the simplest solution possible.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 0/6] kvm: level irqfd support
  2012-08-16 16:29       ` Avi Kivity
@ 2012-08-16 16:36         ` Michael S. Tsirkin
  2012-08-16 16:39           ` Avi Kivity
  2012-08-16 16:37         ` Alex Williamson
  1 sibling, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2012-08-16 16:36 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On Thu, Aug 16, 2012 at 07:29:40PM +0300, Avi Kivity wrote:
> On 08/15/2012 10:22 PM, Michael S. Tsirkin wrote:
> > On Wed, Aug 15, 2012 at 11:36:31AM -0600, Alex Williamson wrote:
> >> On Wed, 2012-08-15 at 17:28 +0300, Michael S. Tsirkin wrote:
> >> > On Fri, Aug 10, 2012 at 04:37:08PM -0600, Alex Williamson wrote:
> >> > > v8:
> >> > > 
> >> > > Trying a new approach.  Nobody seems to like the internal IRQ
> >> > > source ID object and the interactions it implies between irqfd
> >> > > and eoifd, so let's get rid of it.  Instead, simply expose
> >> > > IRQ source IDs to userspace.  This lets the user be in charge
> >> > > of freeing them or hanging onto a source ID for later use.
> >> > 
> >> > In the end it turns out source ID is an optimization for shared
> >> > interrupts, isn't it?  Can't we apply the optimization transparently to
> >> > the user?  E.g. if we have some spare source IDs, allocate them, if we
> >> > run out, use a shared source ID?
> >> 
> >> Let's think about shared source IDs a bit more.  I think it's wrong that
> >> irqfd uses KVM_USERSPACE_IRQ_SOURCE_ID, but I'm questioning whether all
> >> irqfd users can share a source ID.  We do not get the logical OR of all
> >> users by putting them on the same source ID, we get "last set wins".
> >> KVM_USERSPACE_IRQ_SOURCE_ID is used for multiple inputs because the
> >> logical OR happens in userspace.  How would we not starve a user if we
> >> define KVM_IRQFD_SOURCE_ID?  What am I missing?
> > 
> > That all irqfds are deasserted on EOI anyway.  So there's no point
> > to do a logical OR.
> > 
> > 
> 
> What if a level irqfd shares a line with a KVM_IRQ_LINE ioctl?  Then an
> EOI can de-assert the irqfd source, but the line is kept high by the
> last KVM_IRQ_LINE invocation.

Exactly. So 1 ID for userspace and 1 for irqfd.

> 
> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 0/6] kvm: level irqfd support
  2012-08-16 16:29       ` Avi Kivity
  2012-08-16 16:36         ` Michael S. Tsirkin
@ 2012-08-16 16:37         ` Alex Williamson
  1 sibling, 0 replies; 33+ messages in thread
From: Alex Williamson @ 2012-08-16 16:37 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Michael S. Tsirkin, gleb, kvm, linux-kernel

On Thu, 2012-08-16 at 19:29 +0300, Avi Kivity wrote:
> On 08/15/2012 10:22 PM, Michael S. Tsirkin wrote:
> > On Wed, Aug 15, 2012 at 11:36:31AM -0600, Alex Williamson wrote:
> >> On Wed, 2012-08-15 at 17:28 +0300, Michael S. Tsirkin wrote:
> >> > On Fri, Aug 10, 2012 at 04:37:08PM -0600, Alex Williamson wrote:
> >> > > v8:
> >> > > 
> >> > > Trying a new approach.  Nobody seems to like the internal IRQ
> >> > > source ID object and the interactions it implies between irqfd
> >> > > and eoifd, so let's get rid of it.  Instead, simply expose
> >> > > IRQ source IDs to userspace.  This lets the user be in charge
> >> > > of freeing them or hanging onto a source ID for later use.
> >> > 
> >> > In the end it turns out source ID is an optimization for shared
> >> > interrupts, isn't it?  Can't we apply the optimization transparently to
> >> > the user?  E.g. if we have some spare source IDs, allocate them, if we
> >> > run out, use a shared source ID?
> >> 
> >> Let's think about shared source IDs a bit more.  I think it's wrong that
> >> irqfd uses KVM_USERSPACE_IRQ_SOURCE_ID, but I'm questioning whether all
> >> irqfd users can share a source ID.  We do not get the logical OR of all
> >> users by putting them on the same source ID, we get "last set wins".
> >> KVM_USERSPACE_IRQ_SOURCE_ID is used for multiple inputs because the
> >> logical OR happens in userspace.  How would we not starve a user if we
> >> define KVM_IRQFD_SOURCE_ID?  What am I missing?
> > 
> > That all irqfds are deasserted on EOI anyway.  So there's no point
> > to do a logical OR.
> > 
> > 
> 
> What if a level irqfd shares a line with a KVM_IRQ_LINE ioctl?  Then an
> EOI can de-assert the irqfd source, but the line is kept high by the
> last KVM_IRQ_LINE invocation.

As I understand Michael's proposal, the shared irq source id used by
level-deassert-irqfds can only be asserted via an irqfd injection and
can only be de-asserted by the ack notifier.  If we let any other
interface have access to the irq source id it breaks.  If KVM_IRQ_LINE
picks up and extension to specify the irq source id, it would have to be
prevented from accessing this one.  Thanks,

Alex




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 0/6] kvm: level irqfd support
  2012-08-16 16:36         ` Michael S. Tsirkin
@ 2012-08-16 16:39           ` Avi Kivity
  2012-08-16 16:54             ` Michael S. Tsirkin
  0 siblings, 1 reply; 33+ messages in thread
From: Avi Kivity @ 2012-08-16 16:39 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On 08/16/2012 07:36 PM, Michael S. Tsirkin wrote:

>> What if a level irqfd shares a line with a KVM_IRQ_LINE ioctl?  Then an
>> EOI can de-assert the irqfd source, but the line is kept high by the
>> last KVM_IRQ_LINE invocation.
> 
> Exactly. So 1 ID for userspace and 1 for irqfd.

Gaa, this mess belongs in userspace.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 0/6] kvm: level irqfd support
  2012-08-16 16:32 ` Avi Kivity
@ 2012-08-16 16:45   ` Alex Williamson
  0 siblings, 0 replies; 33+ messages in thread
From: Alex Williamson @ 2012-08-16 16:45 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mst, gleb, kvm, linux-kernel

On Thu, 2012-08-16 at 19:32 +0300, Avi Kivity wrote:
> On 08/11/2012 01:37 AM, Alex Williamson wrote:
> > v8:
> > 
> > Trying a new approach.  Nobody seems to like the internal IRQ
> > source ID object and the interactions it implies between irqfd
> > and eoifd, so let's get rid of it.  Instead, simply expose
> > IRQ source IDs to userspace.  This lets the user be in charge
> > of freeing them or hanging onto a source ID for later use.  They
> > can also detach and re-attach components at will.  It also opens
> > up the possibility that userspace might want to use each IRQ
> > source ID for more than one GSI (and avoids the kernel needing
> > to manage that).  Per suggestions, EOIFD is now IRQ_ACKFD.
> > 
> > I really wanted to add a de-assert-only option to irqfd so the
> > irq_ackfd could be fed directly into an irqfd, but I'm dependent
> > on the ordering of de-assert _then_ signal an eventfd.  Keeping
> > that ordering doesn't seem to be possible, especially since irqfd
> > uses a workqueue, if I attempt to make that connection.  Thanks,
> 
> I can't say I'm happy with exposing irq source IDs.  It's true that they
> correspond to a physical entity so they can't be said to be an
> implementation detail, but adding more ABIs has a cost and I can't say
> that I see another user for this.
> 
> Can you provide a link to the combined irqfd+ackfd implementation?  I'm
> inclined now to go for the simplest solution possible.

As soon as I write it :)  Keeping lists to handle the one-to-many
deassert-to-notify will notch up the complexity, but it'll be
interesting to see how it compares.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 0/6] kvm: level irqfd support
  2012-08-16 16:54             ` Michael S. Tsirkin
@ 2012-08-16 16:54               ` Avi Kivity
  2012-08-16 17:01                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 33+ messages in thread
From: Avi Kivity @ 2012-08-16 16:54 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On 08/16/2012 07:54 PM, Michael S. Tsirkin wrote:
> On Thu, Aug 16, 2012 at 07:39:35PM +0300, Avi Kivity wrote:
>> On 08/16/2012 07:36 PM, Michael S. Tsirkin wrote:
>> 
>> >> What if a level irqfd shares a line with a KVM_IRQ_LINE ioctl?  Then an
>> >> EOI can de-assert the irqfd source, but the line is kept high by the
>> >> last KVM_IRQ_LINE invocation.
>> > 
>> > Exactly. So 1 ID for userspace and 1 for irqfd.
>> 
>> Gaa, this mess belongs in userspace.
> 
> Not sure I understand what you refer to.
> 
> I meant simply
> #define KVM_IRQFD_IRQ_SOURCE_ID    1
> request it at kvm init.
> 
> As opposed to using KVM_USERSPACE_IRQ_SOURCE_ID like we do now
> for edge.
> Does this seem acceptable to you?

I meant the pic/ioapic, not this particular bit.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 0/6] kvm: level irqfd support
  2012-08-16 16:39           ` Avi Kivity
@ 2012-08-16 16:54             ` Michael S. Tsirkin
  2012-08-16 16:54               ` Avi Kivity
  0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2012-08-16 16:54 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On Thu, Aug 16, 2012 at 07:39:35PM +0300, Avi Kivity wrote:
> On 08/16/2012 07:36 PM, Michael S. Tsirkin wrote:
> 
> >> What if a level irqfd shares a line with a KVM_IRQ_LINE ioctl?  Then an
> >> EOI can de-assert the irqfd source, but the line is kept high by the
> >> last KVM_IRQ_LINE invocation.
> > 
> > Exactly. So 1 ID for userspace and 1 for irqfd.
> 
> Gaa, this mess belongs in userspace.

Not sure I understand what you refer to.

I meant simply
#define KVM_IRQFD_IRQ_SOURCE_ID    1
request it at kvm init.

As opposed to using KVM_USERSPACE_IRQ_SOURCE_ID like we do now
for edge.
Does this seem acceptable to you?

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v8 0/6] kvm: level irqfd support
  2012-08-16 16:54               ` Avi Kivity
@ 2012-08-16 17:01                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2012-08-16 17:01 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alex Williamson, gleb, kvm, linux-kernel

On Thu, Aug 16, 2012 at 07:54:04PM +0300, Avi Kivity wrote:
> On 08/16/2012 07:54 PM, Michael S. Tsirkin wrote:
> > On Thu, Aug 16, 2012 at 07:39:35PM +0300, Avi Kivity wrote:
> >> On 08/16/2012 07:36 PM, Michael S. Tsirkin wrote:
> >> 
> >> >> What if a level irqfd shares a line with a KVM_IRQ_LINE ioctl?  Then an
> >> >> EOI can de-assert the irqfd source, but the line is kept high by the
> >> >> last KVM_IRQ_LINE invocation.
> >> > 
> >> > Exactly. So 1 ID for userspace and 1 for irqfd.
> >> 
> >> Gaa, this mess belongs in userspace.
> > 
> > Not sure I understand what you refer to.
> > 
> > I meant simply
> > #define KVM_IRQFD_IRQ_SOURCE_ID    1
> > request it at kvm init.
> > 
> > As opposed to using KVM_USERSPACE_IRQ_SOURCE_ID like we do now
> > for edge.
> > Does this seem acceptable to you?
> 
> I meant the pic/ioapic, not this particular bit.

:)

> 
> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2012-08-16 18:23 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-10 22:37 [PATCH v8 0/6] kvm: level irqfd support Alex Williamson
2012-08-10 22:37 ` [PATCH v8 1/6] kvm: Allow filtering of acked irqs Alex Williamson
2012-08-15 12:27   ` Michael S. Tsirkin
2012-08-15 16:47     ` Alex Williamson
2012-08-15 19:24       ` Michael S. Tsirkin
2012-08-10 22:37 ` [PATCH v8 2/6] kvm: Expose IRQ source IDs to userspace Alex Williamson
2012-08-15 12:59   ` Michael S. Tsirkin
2012-08-15 17:05     ` Alex Williamson
2012-08-10 22:37 ` [PATCH v8 3/6] kvm: Add IRQ source ID option to KVM_IRQFD Alex Williamson
2012-08-15 13:49   ` Michael S. Tsirkin
2012-08-15 17:08     ` Alex Williamson
2012-08-10 22:37 ` [PATCH v8 4/6] kvm: Add assert-only " Alex Williamson
2012-08-10 22:37 ` [PATCH v8 5/6] kvm: KVM_IRQ_ACKFD Alex Williamson
2012-08-15 14:05   ` Michael S. Tsirkin
2012-08-15 17:17     ` Alex Williamson
2012-08-10 22:37 ` [PATCH v8 6/6] kvm: Add de-assert option to KVM_IRQ_ACKFD Alex Williamson
2012-08-15 14:11   ` Michael S. Tsirkin
2012-08-15 17:24     ` Alex Williamson
2012-08-15 14:28 ` [PATCH v8 0/6] kvm: level irqfd support Michael S. Tsirkin
2012-08-15 17:36   ` Alex Williamson
2012-08-15 19:22     ` Michael S. Tsirkin
2012-08-15 19:59       ` Alex Williamson
2012-08-16 12:34         ` Alex Williamson
2012-08-16 12:53           ` Michael S. Tsirkin
2012-08-16 16:29       ` Avi Kivity
2012-08-16 16:36         ` Michael S. Tsirkin
2012-08-16 16:39           ` Avi Kivity
2012-08-16 16:54             ` Michael S. Tsirkin
2012-08-16 16:54               ` Avi Kivity
2012-08-16 17:01                 ` Michael S. Tsirkin
2012-08-16 16:37         ` Alex Williamson
2012-08-16 16:32 ` Avi Kivity
2012-08-16 16:45   ` Alex Williamson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).