* [PATCH v9 0/2] kvm: level irqfd support @ 2012-08-21 19:28 Alex Williamson 2012-08-21 19:29 ` [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd Alex Williamson ` (4 more replies) 0 siblings, 5 replies; 35+ messages in thread From: Alex Williamson @ 2012-08-21 19:28 UTC (permalink / raw) To: avi, mst; +Cc: gleb, kvm, linux-kernel Here's the much anticipated re-write of support for level irqfds. As Michael suggested, I've rolled the eoi/ack notification fd into KVM_IRQFD as a new mode. For lack of a better name, as there seems to be objections to associating this specifically with an EOI or an ACK, I've name this OADN or "On Ack, De-assert & Notify". Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID. Unfurtunately I was not able to make 2of2 use a single IRQ source ID, the reason is it's racy. Objects to track OADNs are made dynamically, we look through existing ones for a match under spinlock and setup a new one if there's no match. On teardown, we can remove the OADN from the list under lock, but that same lock prevents us from de-assigning the IRQ ACK notifier or waiting for an RCU grace period. We must make sure that any unused GSI is de-asserted, but the above means it's possible that another OADN has been created for this source ID/GSI and de-asserting the GSI could lead to breakage. Instead each OADN object gets it's own source ID, but these are all shared by users of the same GSI. So for PCI devices, we might have up to 4 IRQ source IDs allocated. Michael had also suggested avoiding reference counting and using list_empty for this OADN object. Unfortunately, that doesn't work for similar reasons. We want to release the OADN object underlock, preventing others from re-using it on the free path, but in order to have lock-less de-assert & notify we use RCU, meaning we can't trust list_empty until after an RCU grace period, which must be done outside of spinlocks. If there are suggestions how we can handle these better, please make them, but I think this compromise is race-free and still manages to make allocation of IRQ source IDs mostly a non-issue for device assignment limits. Thanks, Alex --- Alex Williamson (2): kvm: On Ack, De-assert & Notify KVM_IRQFD extension kvm: Use a reserved IRQ source ID for irqfd Documentation/virtual/kvm/api.txt | 13 ++ arch/x86/kvm/x86.c | 4 + include/linux/kvm.h | 7 + include/linux/kvm_host.h | 2 virt/kvm/eventfd.c | 199 ++++++++++++++++++++++++++++++++++++- 5 files changed, 218 insertions(+), 7 deletions(-) ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-08-21 19:28 [PATCH v9 0/2] kvm: level irqfd support Alex Williamson @ 2012-08-21 19:29 ` Alex Williamson 2012-08-21 19:58 ` Michael S. Tsirkin 2012-09-05 14:46 ` Avi Kivity 2012-08-21 19:29 ` [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension Alex Williamson ` (3 subsequent siblings) 4 siblings, 2 replies; 35+ messages in thread From: Alex Williamson @ 2012-08-21 19:29 UTC (permalink / raw) To: avi, mst; +Cc: gleb, kvm, linux-kernel KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID which is also shared with userspace injection methods like KVM_IRQ_LINE. This can cause a conflict if an irqfd triggers on a GSI asserted through KVM_IRQ_LINE. Move irqfd to it's own reserved IRQ source ID. Add a capability for userspace to test for this fix. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> --- arch/x86/kvm/x86.c | 3 +++ include/linux/kvm.h | 1 + include/linux/kvm_host.h | 1 + virt/kvm/eventfd.c | 6 +++--- 4 files changed, 8 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 42bce48..cd98673 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2174,6 +2174,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_GET_TSC_KHZ: case KVM_CAP_PCI_2_3: case KVM_CAP_KVMCLOCK_CTRL: + case KVM_CAP_IRQFD_IRQ_SOURCE_ID: r = 1; break; case KVM_CAP_COALESCED_MMIO: @@ -6258,6 +6259,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) /* Reserve bit 0 of irq_sources_bitmap for userspace irq source */ set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); + /* Reserve bit 1 of irq_sources_bitmap for irqfd irq source */ + set_bit(KVM_IRQFD_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); raw_spin_lock_init(&kvm->arch.tsc_write_lock); diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 2ce09aa..ae66b9c 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_PPC_GET_SMMU_INFO 78 #define KVM_CAP_S390_COW 79 #define KVM_CAP_PPC_ALLOC_HTAB 80 +#define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index b70b48b..b763230 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -71,6 +71,7 @@ #define KVM_REQ_PMI 17 #define KVM_USERSPACE_IRQ_SOURCE_ID 0 +#define KVM_IRQFD_IRQ_SOURCE_ID 1 struct kvm; struct kvm_vcpu; diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 7d7e2aa..2245cfa 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -67,8 +67,8 @@ irqfd_inject(struct work_struct *work) struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); struct kvm *kvm = irqfd->kvm; - kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); - kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); + kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 1); + kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0); } /* @@ -138,7 +138,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) irq = rcu_dereference(irqfd->irq_entry); /* An event has been signaled, inject an interrupt */ if (irq) - kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1); + kvm_set_msi(irq, kvm, KVM_IRQFD_IRQ_SOURCE_ID, 1); else schedule_work(&irqfd->inject); rcu_read_unlock(); ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-08-21 19:29 ` [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd Alex Williamson @ 2012-08-21 19:58 ` Michael S. Tsirkin 2012-08-21 20:06 ` Alex Williamson 2012-09-05 14:46 ` Avi Kivity 1 sibling, 1 reply; 35+ messages in thread From: Michael S. Tsirkin @ 2012-08-21 19:58 UTC (permalink / raw) To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel On Tue, Aug 21, 2012 at 01:29:06PM -0600, Alex Williamson wrote: > KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID > which is also shared with userspace injection methods like > KVM_IRQ_LINE. This can cause a conflict if an irqfd triggers on > a GSI asserted through KVM_IRQ_LINE. What kind of conflict do you envision? Pls note level interrupts are unsupported ATM. > Move irqfd to it's own reserved IRQ source ID. Add a capability for > userspace to test for this fix. > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > --- > > arch/x86/kvm/x86.c | 3 +++ > include/linux/kvm.h | 1 + > include/linux/kvm_host.h | 1 + > virt/kvm/eventfd.c | 6 +++--- > 4 files changed, 8 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 42bce48..cd98673 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -2174,6 +2174,7 @@ int kvm_dev_ioctl_check_extension(long ext) > case KVM_CAP_GET_TSC_KHZ: > case KVM_CAP_PCI_2_3: > case KVM_CAP_KVMCLOCK_CTRL: > + case KVM_CAP_IRQFD_IRQ_SOURCE_ID: > r = 1; > break; > case KVM_CAP_COALESCED_MMIO: > @@ -6258,6 +6259,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) > > /* Reserve bit 0 of irq_sources_bitmap for userspace irq source */ > set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); > + /* Reserve bit 1 of irq_sources_bitmap for irqfd irq source */ > + set_bit(KVM_IRQFD_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); > > raw_spin_lock_init(&kvm->arch.tsc_write_lock); > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > index 2ce09aa..ae66b9c 100644 > --- a/include/linux/kvm.h > +++ b/include/linux/kvm.h > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { > #define KVM_CAP_PPC_GET_SMMU_INFO 78 > #define KVM_CAP_S390_COW 79 > #define KVM_CAP_PPC_ALLOC_HTAB 80 > +#define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81 > > #ifdef KVM_CAP_IRQ_ROUTING > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index b70b48b..b763230 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -71,6 +71,7 @@ > #define KVM_REQ_PMI 17 > > #define KVM_USERSPACE_IRQ_SOURCE_ID 0 > +#define KVM_IRQFD_IRQ_SOURCE_ID 1 > > struct kvm; > struct kvm_vcpu; Above looks fine but I'm not sure why is the below needed. This changes irqfd behaviour for edge GSIs slightly in a userspace-visible way. Maybe make it a separate patch so it can be considered on merits? > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > index 7d7e2aa..2245cfa 100644 > --- a/virt/kvm/eventfd.c > +++ b/virt/kvm/eventfd.c > @@ -67,8 +67,8 @@ irqfd_inject(struct work_struct *work) > struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); > struct kvm *kvm = irqfd->kvm; > > - kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); > - kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); > + kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 1); > + kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0); > } > > /* > @@ -138,7 +138,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) > irq = rcu_dereference(irqfd->irq_entry); > /* An event has been signaled, inject an interrupt */ > if (irq) > - kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1); > + kvm_set_msi(irq, kvm, KVM_IRQFD_IRQ_SOURCE_ID, 1); > else > schedule_work(&irqfd->inject); > rcu_read_unlock(); ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-08-21 19:58 ` Michael S. Tsirkin @ 2012-08-21 20:06 ` Alex Williamson 2012-08-21 20:41 ` Michael S. Tsirkin 0 siblings, 1 reply; 35+ messages in thread From: Alex Williamson @ 2012-08-21 20:06 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel On Tue, 2012-08-21 at 22:58 +0300, Michael S. Tsirkin wrote: > On Tue, Aug 21, 2012 at 01:29:06PM -0600, Alex Williamson wrote: > > KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID > > which is also shared with userspace injection methods like > > KVM_IRQ_LINE. This can cause a conflict if an irqfd triggers on > > a GSI asserted through KVM_IRQ_LINE. > > What kind of conflict do you envision? Pls note level interrupts are > unsupported ATM. If KVM_IRQ_LINE asserts a level interrupt and KVM_IRQFD triggers on the same GSI then the pin is no longer asserted as userspace thinks it is. Do we just chalk this up to userspace error? > > Move irqfd to it's own reserved IRQ source ID. Add a capability for > > userspace to test for this fix. > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > --- > > > > arch/x86/kvm/x86.c | 3 +++ > > include/linux/kvm.h | 1 + > > include/linux/kvm_host.h | 1 + > > virt/kvm/eventfd.c | 6 +++--- > > 4 files changed, 8 insertions(+), 3 deletions(-) > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > index 42bce48..cd98673 100644 > > --- a/arch/x86/kvm/x86.c > > +++ b/arch/x86/kvm/x86.c > > @@ -2174,6 +2174,7 @@ int kvm_dev_ioctl_check_extension(long ext) > > case KVM_CAP_GET_TSC_KHZ: > > case KVM_CAP_PCI_2_3: > > case KVM_CAP_KVMCLOCK_CTRL: > > + case KVM_CAP_IRQFD_IRQ_SOURCE_ID: > > r = 1; > > break; > > case KVM_CAP_COALESCED_MMIO: > > @@ -6258,6 +6259,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) > > > > /* Reserve bit 0 of irq_sources_bitmap for userspace irq source */ > > set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); > > + /* Reserve bit 1 of irq_sources_bitmap for irqfd irq source */ > > + set_bit(KVM_IRQFD_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); > > > > raw_spin_lock_init(&kvm->arch.tsc_write_lock); > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > > index 2ce09aa..ae66b9c 100644 > > --- a/include/linux/kvm.h > > +++ b/include/linux/kvm.h > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { > > #define KVM_CAP_PPC_GET_SMMU_INFO 78 > > #define KVM_CAP_S390_COW 79 > > #define KVM_CAP_PPC_ALLOC_HTAB 80 > > +#define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81 > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > index b70b48b..b763230 100644 > > --- a/include/linux/kvm_host.h > > +++ b/include/linux/kvm_host.h > > @@ -71,6 +71,7 @@ > > #define KVM_REQ_PMI 17 > > > > #define KVM_USERSPACE_IRQ_SOURCE_ID 0 > > +#define KVM_IRQFD_IRQ_SOURCE_ID 1 > > > > struct kvm; > > struct kvm_vcpu; > > Above looks fine but I'm not sure why is the below needed. > This changes irqfd behaviour for edge GSIs slightly > in a userspace-visible way. Maybe make it a separate patch > so it can be considered on merits? Hmm, the above does nothing without the below. I thought I was just implementing your idea that IRQFDs should all share a single IRQ source ID... why is that no longer a good idea? Thanks, Alex > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > > index 7d7e2aa..2245cfa 100644 > > --- a/virt/kvm/eventfd.c > > +++ b/virt/kvm/eventfd.c > > @@ -67,8 +67,8 @@ irqfd_inject(struct work_struct *work) > > struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); > > struct kvm *kvm = irqfd->kvm; > > > > - kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); > > - kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); > > + kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 1); > > + kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0); > > } > > > > /* > > @@ -138,7 +138,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) > > irq = rcu_dereference(irqfd->irq_entry); > > /* An event has been signaled, inject an interrupt */ > > if (irq) > > - kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1); > > + kvm_set_msi(irq, kvm, KVM_IRQFD_IRQ_SOURCE_ID, 1); > > else > > schedule_work(&irqfd->inject); > > rcu_read_unlock(); ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-08-21 20:06 ` Alex Williamson @ 2012-08-21 20:41 ` Michael S. Tsirkin 2012-08-21 21:14 ` Alex Williamson 0 siblings, 1 reply; 35+ messages in thread From: Michael S. Tsirkin @ 2012-08-21 20:41 UTC (permalink / raw) To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel On Tue, Aug 21, 2012 at 02:06:19PM -0600, Alex Williamson wrote: > On Tue, 2012-08-21 at 22:58 +0300, Michael S. Tsirkin wrote: > > On Tue, Aug 21, 2012 at 01:29:06PM -0600, Alex Williamson wrote: > > > KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID > > > which is also shared with userspace injection methods like > > > KVM_IRQ_LINE. This can cause a conflict if an irqfd triggers on > > > a GSI asserted through KVM_IRQ_LINE. > > > > What kind of conflict do you envision? Pls note level interrupts are > > unsupported ATM. > > If KVM_IRQ_LINE asserts a level interrupt and KVM_IRQFD triggers on the > same GSI then the pin is no longer asserted as userspace thinks it is. > Do we just chalk this up to userspace error? Yes: using a level GSI with current irqfd is a userspace error because you can lose interrupts anyway. Are edge GSIs affected? > > > Move irqfd to it's own reserved IRQ source ID. Add a capability for > > > userspace to test for this fix. > > > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > > --- > > > > > > arch/x86/kvm/x86.c | 3 +++ > > > include/linux/kvm.h | 1 + > > > include/linux/kvm_host.h | 1 + > > > virt/kvm/eventfd.c | 6 +++--- > > > 4 files changed, 8 insertions(+), 3 deletions(-) > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > index 42bce48..cd98673 100644 > > > --- a/arch/x86/kvm/x86.c > > > +++ b/arch/x86/kvm/x86.c > > > @@ -2174,6 +2174,7 @@ int kvm_dev_ioctl_check_extension(long ext) > > > case KVM_CAP_GET_TSC_KHZ: > > > case KVM_CAP_PCI_2_3: > > > case KVM_CAP_KVMCLOCK_CTRL: > > > + case KVM_CAP_IRQFD_IRQ_SOURCE_ID: > > > r = 1; > > > break; > > > case KVM_CAP_COALESCED_MMIO: > > > @@ -6258,6 +6259,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) > > > > > > /* Reserve bit 0 of irq_sources_bitmap for userspace irq source */ > > > set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); > > > + /* Reserve bit 1 of irq_sources_bitmap for irqfd irq source */ > > > + set_bit(KVM_IRQFD_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); > > > > > > raw_spin_lock_init(&kvm->arch.tsc_write_lock); > > > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > > > index 2ce09aa..ae66b9c 100644 > > > --- a/include/linux/kvm.h > > > +++ b/include/linux/kvm.h > > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { > > > #define KVM_CAP_PPC_GET_SMMU_INFO 78 > > > #define KVM_CAP_S390_COW 79 > > > #define KVM_CAP_PPC_ALLOC_HTAB 80 > > > +#define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81 > > > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > > index b70b48b..b763230 100644 > > > --- a/include/linux/kvm_host.h > > > +++ b/include/linux/kvm_host.h > > > @@ -71,6 +71,7 @@ > > > #define KVM_REQ_PMI 17 > > > > > > #define KVM_USERSPACE_IRQ_SOURCE_ID 0 > > > +#define KVM_IRQFD_IRQ_SOURCE_ID 1 > > > > > > struct kvm; > > > struct kvm_vcpu; > > > > Above looks fine but I'm not sure why is the below needed. > > This changes irqfd behaviour for edge GSIs slightly > > in a userspace-visible way. Maybe make it a separate patch > > so it can be considered on merits? > > Hmm, the above does nothing without the below. Yes. But you can use the above with the new irqfds you are adding. > I thought I was just > implementing your idea that IRQFDs should all share a single IRQ source > ID... Sorry I only meant for level irqfds. You are changing edge here. > why is that no longer a good idea? Thanks, > > Alex Maybe it is a good idea. I am just asking for the motivation. > > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > > > index 7d7e2aa..2245cfa 100644 > > > --- a/virt/kvm/eventfd.c > > > +++ b/virt/kvm/eventfd.c > > > @@ -67,8 +67,8 @@ irqfd_inject(struct work_struct *work) > > > struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); > > > struct kvm *kvm = irqfd->kvm; > > > > > > - kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); > > > - kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); > > > + kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 1); > > > + kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0); > > > } > > > > > > /* > > > @@ -138,7 +138,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) > > > irq = rcu_dereference(irqfd->irq_entry); > > > /* An event has been signaled, inject an interrupt */ > > > if (irq) > > > - kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1); > > > + kvm_set_msi(irq, kvm, KVM_IRQFD_IRQ_SOURCE_ID, 1); > > > else > > > schedule_work(&irqfd->inject); > > > rcu_read_unlock(); > > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-08-21 20:41 ` Michael S. Tsirkin @ 2012-08-21 21:14 ` Alex Williamson 2012-08-22 0:41 ` Michael S. Tsirkin 0 siblings, 1 reply; 35+ messages in thread From: Alex Williamson @ 2012-08-21 21:14 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel On Tue, 2012-08-21 at 23:41 +0300, Michael S. Tsirkin wrote: > On Tue, Aug 21, 2012 at 02:06:19PM -0600, Alex Williamson wrote: > > On Tue, 2012-08-21 at 22:58 +0300, Michael S. Tsirkin wrote: > > > On Tue, Aug 21, 2012 at 01:29:06PM -0600, Alex Williamson wrote: > > > > KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID > > > > which is also shared with userspace injection methods like > > > > KVM_IRQ_LINE. This can cause a conflict if an irqfd triggers on > > > > a GSI asserted through KVM_IRQ_LINE. > > > > > > What kind of conflict do you envision? Pls note level interrupts are > > > unsupported ATM. > > > > If KVM_IRQ_LINE asserts a level interrupt and KVM_IRQFD triggers on the > > same GSI then the pin is no longer asserted as userspace thinks it is. > > Do we just chalk this up to userspace error? > > Yes: using a level GSI with current irqfd is a userspace error > because you can lose interrupts anyway. > > Are edge GSIs affected? I wouldn't think so. > > > > Move irqfd to it's own reserved IRQ source ID. Add a capability for > > > > userspace to test for this fix. > > > > > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > > > --- > > > > > > > > arch/x86/kvm/x86.c | 3 +++ > > > > include/linux/kvm.h | 1 + > > > > include/linux/kvm_host.h | 1 + > > > > virt/kvm/eventfd.c | 6 +++--- > > > > 4 files changed, 8 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > > index 42bce48..cd98673 100644 > > > > --- a/arch/x86/kvm/x86.c > > > > +++ b/arch/x86/kvm/x86.c > > > > @@ -2174,6 +2174,7 @@ int kvm_dev_ioctl_check_extension(long ext) > > > > case KVM_CAP_GET_TSC_KHZ: > > > > case KVM_CAP_PCI_2_3: > > > > case KVM_CAP_KVMCLOCK_CTRL: > > > > + case KVM_CAP_IRQFD_IRQ_SOURCE_ID: > > > > r = 1; > > > > break; > > > > case KVM_CAP_COALESCED_MMIO: > > > > @@ -6258,6 +6259,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) > > > > > > > > /* Reserve bit 0 of irq_sources_bitmap for userspace irq source */ > > > > set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); > > > > + /* Reserve bit 1 of irq_sources_bitmap for irqfd irq source */ > > > > + set_bit(KVM_IRQFD_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); > > > > > > > > raw_spin_lock_init(&kvm->arch.tsc_write_lock); > > > > > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > > > > index 2ce09aa..ae66b9c 100644 > > > > --- a/include/linux/kvm.h > > > > +++ b/include/linux/kvm.h > > > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { > > > > #define KVM_CAP_PPC_GET_SMMU_INFO 78 > > > > #define KVM_CAP_S390_COW 79 > > > > #define KVM_CAP_PPC_ALLOC_HTAB 80 > > > > +#define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81 > > > > > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > > > index b70b48b..b763230 100644 > > > > --- a/include/linux/kvm_host.h > > > > +++ b/include/linux/kvm_host.h > > > > @@ -71,6 +71,7 @@ > > > > #define KVM_REQ_PMI 17 > > > > > > > > #define KVM_USERSPACE_IRQ_SOURCE_ID 0 > > > > +#define KVM_IRQFD_IRQ_SOURCE_ID 1 > > > > > > > > struct kvm; > > > > struct kvm_vcpu; > > > > > > Above looks fine but I'm not sure why is the below needed. > > > This changes irqfd behaviour for edge GSIs slightly > > > in a userspace-visible way. Maybe make it a separate patch > > > so it can be considered on merits? > > > > Hmm, the above does nothing without the below. > > Yes. But you can use the above with the new irqfds you are adding. Nope, racy. > > I thought I was just > > implementing your idea that IRQFDs should all share a single IRQ source > > ID... > > Sorry I only meant for level irqfds. You are changing edge here. Ok, I misunderstood then. > > why is that no longer a good idea? Thanks, > > > > Alex > > Maybe it is a good idea. I am just asking for the motivation. I assumed you were pointing out the level vs edge interaction. If we call that a userspace bug, I can just drop this. Thanks, Alex > > > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > > > > index 7d7e2aa..2245cfa 100644 > > > > --- a/virt/kvm/eventfd.c > > > > +++ b/virt/kvm/eventfd.c > > > > @@ -67,8 +67,8 @@ irqfd_inject(struct work_struct *work) > > > > struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); > > > > struct kvm *kvm = irqfd->kvm; > > > > > > > > - kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); > > > > - kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); > > > > + kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 1); > > > > + kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0); > > > > } > > > > > > > > /* > > > > @@ -138,7 +138,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) > > > > irq = rcu_dereference(irqfd->irq_entry); > > > > /* An event has been signaled, inject an interrupt */ > > > > if (irq) > > > > - kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1); > > > > + kvm_set_msi(irq, kvm, KVM_IRQFD_IRQ_SOURCE_ID, 1); > > > > else > > > > schedule_work(&irqfd->inject); > > > > rcu_read_unlock(); > > > > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-08-21 21:14 ` Alex Williamson @ 2012-08-22 0:41 ` Michael S. Tsirkin 2012-08-22 1:34 ` Alex Williamson 2012-09-05 14:35 ` Avi Kivity 0 siblings, 2 replies; 35+ messages in thread From: Michael S. Tsirkin @ 2012-08-22 0:41 UTC (permalink / raw) To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel On Tue, Aug 21, 2012 at 03:14:54PM -0600, Alex Williamson wrote: > On Tue, 2012-08-21 at 23:41 +0300, Michael S. Tsirkin wrote: > > On Tue, Aug 21, 2012 at 02:06:19PM -0600, Alex Williamson wrote: > > > On Tue, 2012-08-21 at 22:58 +0300, Michael S. Tsirkin wrote: > > > > On Tue, Aug 21, 2012 at 01:29:06PM -0600, Alex Williamson wrote: > > > > > KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID > > > > > which is also shared with userspace injection methods like > > > > > KVM_IRQ_LINE. This can cause a conflict if an irqfd triggers on > > > > > a GSI asserted through KVM_IRQ_LINE. > > > > > > > > What kind of conflict do you envision? Pls note level interrupts are > > > > unsupported ATM. > > > > > > If KVM_IRQ_LINE asserts a level interrupt and KVM_IRQFD triggers on the > > > same GSI then the pin is no longer asserted as userspace thinks it is. > > > Do we just chalk this up to userspace error? > > > > Yes: using a level GSI with current irqfd is a userspace error > > because you can lose interrupts anyway. > > > > Are edge GSIs affected? > > I wouldn't think so. No? If userspace does . set line to 1 . trigger irqfd . set line to 1 . trigger irqfd . set line to 1 . trigger irqfd . set line to 1 it gets 4 interrupts now With your patch it will get 1, right? > > > > > Move irqfd to it's own reserved IRQ source ID. Add a capability for > > > > > userspace to test for this fix. > > > > > > > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > > > > --- > > > > > > > > > > arch/x86/kvm/x86.c | 3 +++ > > > > > include/linux/kvm.h | 1 + > > > > > include/linux/kvm_host.h | 1 + > > > > > virt/kvm/eventfd.c | 6 +++--- > > > > > 4 files changed, 8 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > > > index 42bce48..cd98673 100644 > > > > > --- a/arch/x86/kvm/x86.c > > > > > +++ b/arch/x86/kvm/x86.c > > > > > @@ -2174,6 +2174,7 @@ int kvm_dev_ioctl_check_extension(long ext) > > > > > case KVM_CAP_GET_TSC_KHZ: > > > > > case KVM_CAP_PCI_2_3: > > > > > case KVM_CAP_KVMCLOCK_CTRL: > > > > > + case KVM_CAP_IRQFD_IRQ_SOURCE_ID: > > > > > r = 1; > > > > > break; > > > > > case KVM_CAP_COALESCED_MMIO: > > > > > @@ -6258,6 +6259,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) > > > > > > > > > > /* Reserve bit 0 of irq_sources_bitmap for userspace irq source */ > > > > > set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); > > > > > + /* Reserve bit 1 of irq_sources_bitmap for irqfd irq source */ > > > > > + set_bit(KVM_IRQFD_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); > > > > > > > > > > raw_spin_lock_init(&kvm->arch.tsc_write_lock); > > > > > > > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > > > > > index 2ce09aa..ae66b9c 100644 > > > > > --- a/include/linux/kvm.h > > > > > +++ b/include/linux/kvm.h > > > > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { > > > > > #define KVM_CAP_PPC_GET_SMMU_INFO 78 > > > > > #define KVM_CAP_S390_COW 79 > > > > > #define KVM_CAP_PPC_ALLOC_HTAB 80 > > > > > +#define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81 > > > > > > > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > > > > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > > > > index b70b48b..b763230 100644 > > > > > --- a/include/linux/kvm_host.h > > > > > +++ b/include/linux/kvm_host.h > > > > > @@ -71,6 +71,7 @@ > > > > > #define KVM_REQ_PMI 17 > > > > > > > > > > #define KVM_USERSPACE_IRQ_SOURCE_ID 0 > > > > > +#define KVM_IRQFD_IRQ_SOURCE_ID 1 > > > > > > > > > > struct kvm; > > > > > struct kvm_vcpu; > > > > > > > > Above looks fine but I'm not sure why is the below needed. > > > > This changes irqfd behaviour for edge GSIs slightly > > > > in a userspace-visible way. Maybe make it a separate patch > > > > so it can be considered on merits? > > > > > > Hmm, the above does nothing without the below. > > > > Yes. But you can use the above with the new irqfds you are adding. > > Nope, racy. > > > > I thought I was just > > > implementing your idea that IRQFDs should all share a single IRQ source > > > ID... > > > > Sorry I only meant for level irqfds. You are changing edge here. > > Ok, I misunderstood then. > > > > why is that no longer a good idea? Thanks, > > > > > > Alex > > > > Maybe it is a good idea. I am just asking for the motivation. > > I assumed you were pointing out the level vs edge interaction. If we > call that a userspace bug, I can just drop this. Thanks, > > Alex level is userspace bug I think :) > > > > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > > > > > index 7d7e2aa..2245cfa 100644 > > > > > --- a/virt/kvm/eventfd.c > > > > > +++ b/virt/kvm/eventfd.c > > > > > @@ -67,8 +67,8 @@ irqfd_inject(struct work_struct *work) > > > > > struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); > > > > > struct kvm *kvm = irqfd->kvm; > > > > > > > > > > - kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); > > > > > - kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); > > > > > + kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 1); > > > > > + kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0); > > > > > } > > > > > > > > > > /* > > > > > @@ -138,7 +138,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) > > > > > irq = rcu_dereference(irqfd->irq_entry); > > > > > /* An event has been signaled, inject an interrupt */ > > > > > if (irq) > > > > > - kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1); > > > > > + kvm_set_msi(irq, kvm, KVM_IRQFD_IRQ_SOURCE_ID, 1); > > > > > else > > > > > schedule_work(&irqfd->inject); > > > > > rcu_read_unlock(); > > > > > > > > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-08-22 0:41 ` Michael S. Tsirkin @ 2012-08-22 1:34 ` Alex Williamson 2012-09-05 14:35 ` Avi Kivity 1 sibling, 0 replies; 35+ messages in thread From: Alex Williamson @ 2012-08-22 1:34 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel On Wed, 2012-08-22 at 03:41 +0300, Michael S. Tsirkin wrote: > On Tue, Aug 21, 2012 at 03:14:54PM -0600, Alex Williamson wrote: > > On Tue, 2012-08-21 at 23:41 +0300, Michael S. Tsirkin wrote: > > > On Tue, Aug 21, 2012 at 02:06:19PM -0600, Alex Williamson wrote: > > > > On Tue, 2012-08-21 at 22:58 +0300, Michael S. Tsirkin wrote: > > > > > On Tue, Aug 21, 2012 at 01:29:06PM -0600, Alex Williamson wrote: > > > > > > KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID > > > > > > which is also shared with userspace injection methods like > > > > > > KVM_IRQ_LINE. This can cause a conflict if an irqfd triggers on > > > > > > a GSI asserted through KVM_IRQ_LINE. > > > > > > > > > > What kind of conflict do you envision? Pls note level interrupts are > > > > > unsupported ATM. > > > > > > > > If KVM_IRQ_LINE asserts a level interrupt and KVM_IRQFD triggers on the > > > > same GSI then the pin is no longer asserted as userspace thinks it is. > > > > Do we just chalk this up to userspace error? > > > > > > Yes: using a level GSI with current irqfd is a userspace error > > > because you can lose interrupts anyway. > > > > > > Are edge GSIs affected? > > > > I wouldn't think so. > > No? If userspace does > > . set line to 1 > . trigger irqfd > . set line to 1 > . trigger irqfd > . set line to 1 > . trigger irqfd > . set line to 1 > > it gets 4 interrupts now > > With your patch it will get 1, right? > > > > > > > Move irqfd to it's own reserved IRQ source ID. Add a capability for > > > > > > userspace to test for this fix. > > > > > > > > > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > > > > > --- > > > > > > > > > > > > arch/x86/kvm/x86.c | 3 +++ > > > > > > include/linux/kvm.h | 1 + > > > > > > include/linux/kvm_host.h | 1 + > > > > > > virt/kvm/eventfd.c | 6 +++--- > > > > > > 4 files changed, 8 insertions(+), 3 deletions(-) > > > > > > > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > > > > index 42bce48..cd98673 100644 > > > > > > --- a/arch/x86/kvm/x86.c > > > > > > +++ b/arch/x86/kvm/x86.c > > > > > > @@ -2174,6 +2174,7 @@ int kvm_dev_ioctl_check_extension(long ext) > > > > > > case KVM_CAP_GET_TSC_KHZ: > > > > > > case KVM_CAP_PCI_2_3: > > > > > > case KVM_CAP_KVMCLOCK_CTRL: > > > > > > + case KVM_CAP_IRQFD_IRQ_SOURCE_ID: > > > > > > r = 1; > > > > > > break; > > > > > > case KVM_CAP_COALESCED_MMIO: > > > > > > @@ -6258,6 +6259,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) > > > > > > > > > > > > /* Reserve bit 0 of irq_sources_bitmap for userspace irq source */ > > > > > > set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); > > > > > > + /* Reserve bit 1 of irq_sources_bitmap for irqfd irq source */ > > > > > > + set_bit(KVM_IRQFD_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap); > > > > > > > > > > > > raw_spin_lock_init(&kvm->arch.tsc_write_lock); > > > > > > > > > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > > > > > > index 2ce09aa..ae66b9c 100644 > > > > > > --- a/include/linux/kvm.h > > > > > > +++ b/include/linux/kvm.h > > > > > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { > > > > > > #define KVM_CAP_PPC_GET_SMMU_INFO 78 > > > > > > #define KVM_CAP_S390_COW 79 > > > > > > #define KVM_CAP_PPC_ALLOC_HTAB 80 > > > > > > +#define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81 > > > > > > > > > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > > > > > > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > > > > > index b70b48b..b763230 100644 > > > > > > --- a/include/linux/kvm_host.h > > > > > > +++ b/include/linux/kvm_host.h > > > > > > @@ -71,6 +71,7 @@ > > > > > > #define KVM_REQ_PMI 17 > > > > > > > > > > > > #define KVM_USERSPACE_IRQ_SOURCE_ID 0 > > > > > > +#define KVM_IRQFD_IRQ_SOURCE_ID 1 > > > > > > > > > > > > struct kvm; > > > > > > struct kvm_vcpu; > > > > > > > > > > Above looks fine but I'm not sure why is the below needed. > > > > > This changes irqfd behaviour for edge GSIs slightly > > > > > in a userspace-visible way. Maybe make it a separate patch > > > > > so it can be considered on merits? > > > > > > > > Hmm, the above does nothing without the below. > > > > > > Yes. But you can use the above with the new irqfds you are adding. > > > > Nope, racy. > > > > > > I thought I was just > > > > implementing your idea that IRQFDs should all share a single IRQ source > > > > ID... > > > > > > Sorry I only meant for level irqfds. You are changing edge here. > > > > Ok, I misunderstood then. > > > > > > why is that no longer a good idea? Thanks, > > > > > > > > Alex > > > > > > Maybe it is a good idea. I am just asking for the motivation. > > > > I assumed you were pointing out the level vs edge interaction. If we > > call that a userspace bug, I can just drop this. Thanks, > > > > Alex > > level is userspace bug I think :) Dropped. Thanks, Alex ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-08-22 0:41 ` Michael S. Tsirkin 2012-08-22 1:34 ` Alex Williamson @ 2012-09-05 14:35 ` Avi Kivity 2012-09-05 14:51 ` Michael S. Tsirkin 1 sibling, 1 reply; 35+ messages in thread From: Avi Kivity @ 2012-09-05 14:35 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Alex Williamson, gleb, kvm, linux-kernel On 08/22/2012 03:41 AM, Michael S. Tsirkin wrote: >> >> I assumed you were pointing out the level vs edge interaction. If we >> call that a userspace bug, I can just drop this. Thanks, >> >> Alex > > level is userspace bug I think :) I don't see how it's a bug. Suppose we have a vfio device that shares a gsi with an emulated device. The emulated device naturally uses KVM_IRQ_LINE (it has no need to re-sample on ADN), while vfio naturally has to use irqfd. Note one would expect that each irqfd gets its own irq source id, since they are all independent level sources. The reason they don't is that we shut them down anyway and let the sources re-trigger (it is more accurate to say that they have no irq source id, but that would just muddle the implementation). Alex, if the conclusion is that we do need this patch, then please add a comment explaining why we can share the source id among all irqfd users. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-09-05 14:35 ` Avi Kivity @ 2012-09-05 14:51 ` Michael S. Tsirkin 2012-09-05 14:59 ` Avi Kivity 2012-09-05 15:09 ` Michael S. Tsirkin 0 siblings, 2 replies; 35+ messages in thread From: Michael S. Tsirkin @ 2012-09-05 14:51 UTC (permalink / raw) To: Avi Kivity; +Cc: Alex Williamson, gleb, kvm, linux-kernel On Wed, Sep 05, 2012 at 05:35:43PM +0300, Avi Kivity wrote: > On 08/22/2012 03:41 AM, Michael S. Tsirkin wrote: > >> > >> I assumed you were pointing out the level vs edge interaction. If we > >> call that a userspace bug, I can just drop this. Thanks, > >> > >> Alex > > > > level is userspace bug I think :) > > I don't see how it's a bug. Suppose we have a vfio device that shares a > gsi with an emulated device. The emulated device naturally uses > KVM_IRQ_LINE (it has no need to re-sample on ADN), while vfio naturally > has to use irqfd. Absolutely. But vfio needs to use irqfd with the new flag. Using existing irqfd for level is a bug. > Note one would expect that each irqfd gets its own irq source id, since > they are all independent level sources. The reason they don't is that > we shut them down anyway and let the sources re-trigger (it is more > accurate to say that they have no irq source id, but that would just > muddle the implementation). > > Alex, if the conclusion is that we do need this patch, then please add a > comment explaining why we can share the source id among all irqfd users. > > -- > error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-09-05 14:51 ` Michael S. Tsirkin @ 2012-09-05 14:59 ` Avi Kivity 2012-09-05 15:13 ` Michael S. Tsirkin 2012-09-05 15:09 ` Michael S. Tsirkin 1 sibling, 1 reply; 35+ messages in thread From: Avi Kivity @ 2012-09-05 14:59 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Alex Williamson, gleb, kvm, linux-kernel On 09/05/2012 05:51 PM, Michael S. Tsirkin wrote: > On Wed, Sep 05, 2012 at 05:35:43PM +0300, Avi Kivity wrote: >> On 08/22/2012 03:41 AM, Michael S. Tsirkin wrote: >> >> >> >> I assumed you were pointing out the level vs edge interaction. If we >> >> call that a userspace bug, I can just drop this. Thanks, >> >> >> >> Alex >> > >> > level is userspace bug I think :) >> >> I don't see how it's a bug. Suppose we have a vfio device that shares a >> gsi with an emulated device. The emulated device naturally uses >> KVM_IRQ_LINE (it has no need to re-sample on ADN), while vfio naturally >> has to use irqfd. > > Absolutely. But vfio needs to use irqfd with the new flag. > Using existing irqfd for level is a bug. I see we're not reusing this irq source id for level irqfd. But I think we should, there's no need for per-gsi irq source id. Plus I'd like to fix the theoretical bug even if it doesn't bite in practice. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-09-05 14:59 ` Avi Kivity @ 2012-09-05 15:13 ` Michael S. Tsirkin 2012-09-05 15:22 ` Avi Kivity 0 siblings, 1 reply; 35+ messages in thread From: Michael S. Tsirkin @ 2012-09-05 15:13 UTC (permalink / raw) To: Avi Kivity; +Cc: Alex Williamson, gleb, kvm, linux-kernel On Wed, Sep 05, 2012 at 05:59:46PM +0300, Avi Kivity wrote: > On 09/05/2012 05:51 PM, Michael S. Tsirkin wrote: > > On Wed, Sep 05, 2012 at 05:35:43PM +0300, Avi Kivity wrote: > >> On 08/22/2012 03:41 AM, Michael S. Tsirkin wrote: > >> >> > >> >> I assumed you were pointing out the level vs edge interaction. If we > >> >> call that a userspace bug, I can just drop this. Thanks, > >> >> > >> >> Alex > >> > > >> > level is userspace bug I think :) > >> > >> I don't see how it's a bug. Suppose we have a vfio device that shares a > >> gsi with an emulated device. The emulated device naturally uses > >> KVM_IRQ_LINE (it has no need to re-sample on ADN), while vfio naturally > >> has to use irqfd. > > > > Absolutely. But vfio needs to use irqfd with the new flag. > > Using existing irqfd for level is a bug. > > I see we're not reusing this irq source id for level irqfd. But I think > we should, there's no need for per-gsi irq source id. I agree. All resample irqfds are deasserted at the same time, tracking them separately gets us nothing. > Plus I'd like to > fix the theoretical bug even if it doesn't bite in practice. > I'm not sure what the bug is, for edge, and how a separate ID fixes it. Could you clarify? > > -- > error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-09-05 15:13 ` Michael S. Tsirkin @ 2012-09-05 15:22 ` Avi Kivity 2012-09-05 15:28 ` Michael S. Tsirkin 0 siblings, 1 reply; 35+ messages in thread From: Avi Kivity @ 2012-09-05 15:22 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Alex Williamson, gleb, kvm, linux-kernel On 09/05/2012 06:13 PM, Michael S. Tsirkin wrote: > On Wed, Sep 05, 2012 at 05:59:46PM +0300, Avi Kivity wrote: >> On 09/05/2012 05:51 PM, Michael S. Tsirkin wrote: >> > On Wed, Sep 05, 2012 at 05:35:43PM +0300, Avi Kivity wrote: >> >> On 08/22/2012 03:41 AM, Michael S. Tsirkin wrote: >> >> >> >> >> >> I assumed you were pointing out the level vs edge interaction. If we >> >> >> call that a userspace bug, I can just drop this. Thanks, >> >> >> >> >> >> Alex >> >> > >> >> > level is userspace bug I think :) >> >> >> >> I don't see how it's a bug. Suppose we have a vfio device that shares a >> >> gsi with an emulated device. The emulated device naturally uses >> >> KVM_IRQ_LINE (it has no need to re-sample on ADN), while vfio naturally >> >> has to use irqfd. >> > >> > Absolutely. But vfio needs to use irqfd with the new flag. >> > Using existing irqfd for level is a bug. >> >> I see we're not reusing this irq source id for level irqfd. But I think >> we should, there's no need for per-gsi irq source id. > > I agree. All resample irqfds are deasserted at the same time, > tracking them separately gets us nothing. That's not the reason. Separate irq source ids only have meanings within a gsi. We could have two lines (gsi 3 isid 4) and (gsi 4 isid 4) that can be toggled independently with no effect on the other gsi. Within a gsi we do need a separate irq source id usually, but as 2/2 recognizes, AODNs are a special case since we clear all inputs anyway. The end result is that all AODNs can share a single isid. > >> Plus I'd like to >> fix the theoretical bug even if it doesn't bite in practice. >> > > I'm not sure what the bug is, for edge, and how a separate ID fixes it. > Could you clarify? gsi 3 is configured as edge in the ioapic. It has (unusually) two inputs: one driven by userspace, the other by irqfd. cpu 0 cpu 1 ------------------------ ------------------------- irqfd: set to 1 ioapic: recognize edge inject irq EOI KVM_IRQ_LINE: set to 1 ioapic: ignore KVM_IRQ_LINE: set to 0 irqfd: set to 0 We had two edges with an EOI between them, but injected just on interrupt. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-09-05 15:22 ` Avi Kivity @ 2012-09-05 15:28 ` Michael S. Tsirkin 2012-09-05 15:35 ` Avi Kivity 0 siblings, 1 reply; 35+ messages in thread From: Michael S. Tsirkin @ 2012-09-05 15:28 UTC (permalink / raw) To: Avi Kivity; +Cc: Alex Williamson, gleb, kvm, linux-kernel On Wed, Sep 05, 2012 at 06:22:57PM +0300, Avi Kivity wrote: > On 09/05/2012 06:13 PM, Michael S. Tsirkin wrote: > > On Wed, Sep 05, 2012 at 05:59:46PM +0300, Avi Kivity wrote: > >> On 09/05/2012 05:51 PM, Michael S. Tsirkin wrote: > >> > On Wed, Sep 05, 2012 at 05:35:43PM +0300, Avi Kivity wrote: > >> >> On 08/22/2012 03:41 AM, Michael S. Tsirkin wrote: > >> >> >> > >> >> >> I assumed you were pointing out the level vs edge interaction. If we > >> >> >> call that a userspace bug, I can just drop this. Thanks, > >> >> >> > >> >> >> Alex > >> >> > > >> >> > level is userspace bug I think :) > >> >> > >> >> I don't see how it's a bug. Suppose we have a vfio device that shares a > >> >> gsi with an emulated device. The emulated device naturally uses > >> >> KVM_IRQ_LINE (it has no need to re-sample on ADN), while vfio naturally > >> >> has to use irqfd. > >> > > >> > Absolutely. But vfio needs to use irqfd with the new flag. > >> > Using existing irqfd for level is a bug. > >> > >> I see we're not reusing this irq source id for level irqfd. But I think > >> we should, there's no need for per-gsi irq source id. > > > > I agree. All resample irqfds are deasserted at the same time, > > tracking them separately gets us nothing. > > That's not the reason. Separate irq source ids only have meanings > within a gsi. We could have two lines (gsi 3 isid 4) and (gsi 4 isid 4) > that can be toggled independently with no effect on the other gsi. > Within a gsi we do need a separate irq source id usually, but as 2/2 > recognizes, AODNs are a special case since we clear all inputs anyway. > The end result is that all AODNs can share a single isid. > > > > >> Plus I'd like to > >> fix the theoretical bug even if it doesn't bite in practice. > >> > > > > I'm not sure what the bug is, for edge, and how a separate ID fixes it. > > Could you clarify? > > gsi 3 is configured as edge in the ioapic. It has (unusually) two > inputs: one driven by userspace, the other by irqfd. > > cpu 0 cpu 1 > ------------------------ ------------------------- > irqfd: set to 1 > ioapic: recognize edge > inject irq > EOI > KVM_IRQ_LINE: set to 1 > ioapic: ignore > KVM_IRQ_LINE: set to 0 > irqfd: set to 0 > > We had two edges with an EOI between them, but injected just on interrupt. I see. Makes sense, ACK this patch. > -- > error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-09-05 15:28 ` Michael S. Tsirkin @ 2012-09-05 15:35 ` Avi Kivity 0 siblings, 0 replies; 35+ messages in thread From: Avi Kivity @ 2012-09-05 15:35 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Alex Williamson, gleb, kvm, linux-kernel On 09/05/2012 06:28 PM, Michael S. Tsirkin wrote: >> gsi 3 is configured as edge in the ioapic. It has (unusually) two >> inputs: one driven by userspace, the other by irqfd. >> >> cpu 0 cpu 1 >> ------------------------ ------------------------- >> irqfd: set to 1 >> ioapic: recognize edge >> inject irq >> EOI >> KVM_IRQ_LINE: set to 1 >> ioapic: ignore >> KVM_IRQ_LINE: set to 0 >> irqfd: set to 0 >> >> We had two edges with an EOI between them, but injected just on interrupt. > > I see. Makes sense, ACK this patch. Actually it's wrong. The two sources are not synchronized, so there is no way for them to know the two edges did not coalesce. On real hardware, after all, edge interrupts have a non-zero pulse width, and kvm faithfully emulates this. But this patch makes sense for level irqfd, so we might as well keep it with a different description. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-09-05 14:51 ` Michael S. Tsirkin 2012-09-05 14:59 ` Avi Kivity @ 2012-09-05 15:09 ` Michael S. Tsirkin 2012-09-05 15:12 ` Avi Kivity 1 sibling, 1 reply; 35+ messages in thread From: Michael S. Tsirkin @ 2012-09-05 15:09 UTC (permalink / raw) To: Avi Kivity; +Cc: Alex Williamson, gleb, kvm, linux-kernel On Wed, Sep 05, 2012 at 05:51:53PM +0300, Michael S. Tsirkin wrote: > On Wed, Sep 05, 2012 at 05:35:43PM +0300, Avi Kivity wrote: > > On 08/22/2012 03:41 AM, Michael S. Tsirkin wrote: > > >> > > >> I assumed you were pointing out the level vs edge interaction. If we > > >> call that a userspace bug, I can just drop this. Thanks, > > >> > > >> Alex > > > > > > level is userspace bug I think :) > > > > I don't see how it's a bug. Suppose we have a vfio device that shares a > > gsi with an emulated device. The emulated device naturally uses > > KVM_IRQ_LINE (it has no need to re-sample on ADN), while vfio naturally > > has to use irqfd. > > Absolutely. But vfio needs to use irqfd with the new flag. > Using existing irqfd for level is a bug. > > > Note one would expect that each irqfd gets its own irq source id, since > > they are all independent level sources. The reason they don't is that > > we shut them down anyway and let the sources re-trigger (it is more > > accurate to say that they have no irq source id, but that would just > > muddle the implementation). > > > > Alex, if the conclusion is that we do need this patch, then please add a > > comment explaining why we can share the source id among all irqfd users. Something along the lines of /* For resample irqfds, level is a logical OR of all inputs; to support this, track state for RESAMPLE irqfds separately from userspace. We do not need to track state for each input since they are all deasserted at the same time, before resampling. */ ? > > -- > > error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-09-05 15:09 ` Michael S. Tsirkin @ 2012-09-05 15:12 ` Avi Kivity 2012-09-05 15:15 ` Michael S. Tsirkin 0 siblings, 1 reply; 35+ messages in thread From: Avi Kivity @ 2012-09-05 15:12 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Alex Williamson, gleb, kvm, linux-kernel On 09/05/2012 06:09 PM, Michael S. Tsirkin wrote: > On Wed, Sep 05, 2012 at 05:51:53PM +0300, Michael S. Tsirkin wrote: >> On Wed, Sep 05, 2012 at 05:35:43PM +0300, Avi Kivity wrote: >> > On 08/22/2012 03:41 AM, Michael S. Tsirkin wrote: >> > >> >> > >> I assumed you were pointing out the level vs edge interaction. If we >> > >> call that a userspace bug, I can just drop this. Thanks, >> > >> >> > >> Alex >> > > >> > > level is userspace bug I think :) >> > >> > I don't see how it's a bug. Suppose we have a vfio device that shares a >> > gsi with an emulated device. The emulated device naturally uses >> > KVM_IRQ_LINE (it has no need to re-sample on ADN), while vfio naturally >> > has to use irqfd. >> >> Absolutely. But vfio needs to use irqfd with the new flag. >> Using existing irqfd for level is a bug. >> >> > Note one would expect that each irqfd gets its own irq source id, since >> > they are all independent level sources. The reason they don't is that >> > we shut them down anyway and let the sources re-trigger (it is more >> > accurate to say that they have no irq source id, but that would just >> > muddle the implementation). >> > >> > Alex, if the conclusion is that we do need this patch, then please add a >> > comment explaining why we can share the source id among all irqfd users. > > Something along the lines of > > /* > For resample irqfds, level is a logical OR of all inputs; > to support this, track state for RESAMPLE irqfds separately > from userspace. We do not need to track state for each input since > they are all deasserted at the same time, before resampling. > */ Well the comment style is wrong. To expand a little more, irqfd only sends assert events, so assigning the level is equivalent to an OR. Clearing an resampling simply builds the state again. btw, there can be other irq source IDs if the lines are shared with the PIT or kvm assigned devices. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-09-05 15:12 ` Avi Kivity @ 2012-09-05 15:15 ` Michael S. Tsirkin 0 siblings, 0 replies; 35+ messages in thread From: Michael S. Tsirkin @ 2012-09-05 15:15 UTC (permalink / raw) To: Avi Kivity; +Cc: Alex Williamson, gleb, kvm, linux-kernel On Wed, Sep 05, 2012 at 06:12:04PM +0300, Avi Kivity wrote: > On 09/05/2012 06:09 PM, Michael S. Tsirkin wrote: > > On Wed, Sep 05, 2012 at 05:51:53PM +0300, Michael S. Tsirkin wrote: > >> On Wed, Sep 05, 2012 at 05:35:43PM +0300, Avi Kivity wrote: > >> > On 08/22/2012 03:41 AM, Michael S. Tsirkin wrote: > >> > >> > >> > >> I assumed you were pointing out the level vs edge interaction. If we > >> > >> call that a userspace bug, I can just drop this. Thanks, > >> > >> > >> > >> Alex > >> > > > >> > > level is userspace bug I think :) > >> > > >> > I don't see how it's a bug. Suppose we have a vfio device that shares a > >> > gsi with an emulated device. The emulated device naturally uses > >> > KVM_IRQ_LINE (it has no need to re-sample on ADN), while vfio naturally > >> > has to use irqfd. > >> > >> Absolutely. But vfio needs to use irqfd with the new flag. > >> Using existing irqfd for level is a bug. > >> > >> > Note one would expect that each irqfd gets its own irq source id, since > >> > they are all independent level sources. The reason they don't is that > >> > we shut them down anyway and let the sources re-trigger (it is more > >> > accurate to say that they have no irq source id, but that would just > >> > muddle the implementation). > >> > > >> > Alex, if the conclusion is that we do need this patch, then please add a > >> > comment explaining why we can share the source id among all irqfd users. > > > > Something along the lines of > > > > /* > > For resample irqfds, level is a logical OR of all inputs; > > to support this, track state for RESAMPLE irqfds separately > > from userspace. We do not need to track state for each input since > > they are all deasserted at the same time, before resampling. > > */ > > Well the comment style is wrong. Ouch. > To expand a little more, irqfd only sends assert events, so assigning > the level is equivalent to an OR. Clearing an resampling simply builds > the state again. > > btw, there can be other irq source IDs if the lines are shared with the > PIT or kvm assigned devices. Nod. > -- > error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-08-21 19:29 ` [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd Alex Williamson 2012-08-21 19:58 ` Michael S. Tsirkin @ 2012-09-05 14:46 ` Avi Kivity 2012-09-05 15:07 ` Michael S. Tsirkin 1 sibling, 1 reply; 35+ messages in thread From: Avi Kivity @ 2012-09-05 14:46 UTC (permalink / raw) To: Alex Williamson; +Cc: mst, gleb, kvm, linux-kernel On 08/21/2012 10:29 PM, Alex Williamson wrote: > KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID > which is also shared with userspace injection methods like > KVM_IRQ_LINE. This can cause a conflict if an irqfd triggers on > a GSI asserted through KVM_IRQ_LINE. Move irqfd to it's own > reserved IRQ source ID. Add a capability for userspace to test > for this fix. I don't think we need a cap, rather a backport if we identify real cases where an edge gsi is shared among several devices. Otherwise it is just a theoretical bug before level irqfd is introduced. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-09-05 14:46 ` Avi Kivity @ 2012-09-05 15:07 ` Michael S. Tsirkin 2012-09-05 15:15 ` Avi Kivity 0 siblings, 1 reply; 35+ messages in thread From: Michael S. Tsirkin @ 2012-09-05 15:07 UTC (permalink / raw) To: Avi Kivity; +Cc: Alex Williamson, gleb, kvm, linux-kernel On Wed, Sep 05, 2012 at 05:46:17PM +0300, Avi Kivity wrote: > On 08/21/2012 10:29 PM, Alex Williamson wrote: > > KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID > > which is also shared with userspace injection methods like > > KVM_IRQ_LINE. This can cause a conflict if an irqfd triggers on > > a GSI asserted through KVM_IRQ_LINE. Move irqfd to it's own > > reserved IRQ source ID. Add a capability for userspace to test > > for this fix. > > I don't think we need a cap, rather a backport if we identify real cases > where an edge gsi is shared among several devices. Otherwise it is just > a theoretical bug before level irqfd is introduced. In that case, I think it's safer to preserve the "bug" as is: we are changing userspace-visible behaviour for edge interrupts otherwise. For example if userspace uses kvm_irq_line for an edge interrupt, set it to 1, previously it could then continue to send any number of interrupts with irqfd, now it can't. Basically the logical OR functionality of source IDs does not make sense for edge. How about we do if (flags&RESAMPLE) source_id = USERSPACE else source_id = IRQFD > -- > error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd 2012-09-05 15:07 ` Michael S. Tsirkin @ 2012-09-05 15:15 ` Avi Kivity 0 siblings, 0 replies; 35+ messages in thread From: Avi Kivity @ 2012-09-05 15:15 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Alex Williamson, gleb, kvm, linux-kernel On 09/05/2012 06:07 PM, Michael S. Tsirkin wrote: > On Wed, Sep 05, 2012 at 05:46:17PM +0300, Avi Kivity wrote: >> On 08/21/2012 10:29 PM, Alex Williamson wrote: >> > KVM_IRQFD currently uses the reserved KVM_USERSPACE_IRQ_SOURCE_ID >> > which is also shared with userspace injection methods like >> > KVM_IRQ_LINE. This can cause a conflict if an irqfd triggers on >> > a GSI asserted through KVM_IRQ_LINE. Move irqfd to it's own >> > reserved IRQ source ID. Add a capability for userspace to test >> > for this fix. >> >> I don't think we need a cap, rather a backport if we identify real cases >> where an edge gsi is shared among several devices. Otherwise it is just >> a theoretical bug before level irqfd is introduced. > > In that case, I think it's safer to preserve the "bug" as is: we are > changing userspace-visible behaviour for edge interrupts otherwise. > For example if userspace uses kvm_irq_line for an edge > interrupt, set it to 1, previously it could then > continue to send any number of interrupts with irqfd, > now it can't. If anyone did that, they should have reported a bug, since they surely didn't expect edges if the line was held high. > > Basically the logical OR functionality of source IDs > does not make sense for edge. Edge is only interpreted at the ioapic or pic input; the line is just a line (an open collector line that ORs anything connected to it, or an equivalent). > > How about we do > if (flags&RESAMPLE) > source_id = USERSPACE > else > source_id = IRQFD Okay if we identify something that depends on the bug, otherwise not. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension 2012-08-21 19:28 [PATCH v9 0/2] kvm: level irqfd support Alex Williamson 2012-08-21 19:29 ` [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd Alex Williamson @ 2012-08-21 19:29 ` Alex Williamson 2012-08-21 20:37 ` Michael S. Tsirkin 2012-09-05 14:57 ` Avi Kivity 2012-08-22 0:31 ` [PATCH v9 0/2] kvm: level irqfd support Michael S. Tsirkin ` (2 subsequent siblings) 4 siblings, 2 replies; 35+ messages in thread From: Alex Williamson @ 2012-08-21 19:29 UTC (permalink / raw) To: avi, mst; +Cc: gleb, kvm, linux-kernel For VFIO based device assignment we'd like a mechanism to allow level triggered interrutps to be directly injected into KVM. KVM_IRQFD already allows this for edge triggered interrupts, but for level, we need to watch for acknowledgement of the interrupt from the guest to provide us a hint when to test the device and allow it to re-assert if necessary. To do this, we create a new KVM_IRQFD mode called "On Ack, De-assert & Notify", or OADN. In this mode, an interrupt injection provides only a gsi assertion. We then hook into the IRQ ACK notifier, which when triggered de-asserts the gsi and notifies via another eventfd. It's then the responsibility of the user to re-assert the interrupt is service is still required. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> --- Documentation/virtual/kvm/api.txt | 13 ++ arch/x86/kvm/x86.c | 1 include/linux/kvm.h | 6 + include/linux/kvm_host.h | 1 virt/kvm/eventfd.c | 193 ++++++++++++++++++++++++++++++++++++- 5 files changed, 210 insertions(+), 4 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index bf33aaa..87d7321 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1946,6 +1946,19 @@ the guest using the specified gsi pin. The irqfd is removed using the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd and kvm_irqfd.gsi. +With KVM_CAP_IRQFD_OADN, KVM_IRQFD supports an "On Ack, De-assert & +Notify" option that allows emulation of level-triggered interrupts. +When kvm_irqfd.fd is triggered, the requested gsi is asserted and +remains asserted until interaction with the irqchip indicates the +VM has acknowledged the interrupt, such as an EOI. On acknoledgement +the gsi is automatically de-asserted and the user is notified via +kvm_irqfd.notifyfd. The user is then required to re-assert the +interrupt if the associated device still requires service. To enable +this mode, configure the KVM_IRQFD using the KVM_IRQFD_FLAG_OADN flag +and specify kvm_irqfd.notifyfd. Note that closing kvm_irqfd.notifyfd +while configured in this mode does not disable the irqfd. The +KVM_IRQFD_FLAG_OADN flag is only necessary on assignment. + 4.76 KVM_PPC_ALLOCATE_HTAB Capability: KVM_CAP_PPC_ALLOC_HTAB diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index cd98673..fde7b66 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2175,6 +2175,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_PCI_2_3: case KVM_CAP_KVMCLOCK_CTRL: case KVM_CAP_IRQFD_IRQ_SOURCE_ID: + case KVM_CAP_IRQFD_OADN: r = 1; break; case KVM_CAP_COALESCED_MMIO: diff --git a/include/linux/kvm.h b/include/linux/kvm.h index ae66b9c..ec0f1d8 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -619,6 +619,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_S390_COW 79 #define KVM_CAP_PPC_ALLOC_HTAB 80 #define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81 +#define KVM_CAP_IRQFD_OADN 82 #ifdef KVM_CAP_IRQ_ROUTING @@ -684,12 +685,15 @@ struct kvm_xen_hvm_config { #endif #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0) +/* Availabie with KVM_CAP_IRQFD_OADN */ +#define KVM_IRQFD_FLAG_OADN (1 << 1) struct kvm_irqfd { __u32 fd; __u32 gsi; __u32 flags; - __u8 pad[20]; + __u32 notifyfd; + __u8 pad[16]; }; struct kvm_clock_data { diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index b763230..d502d08 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -284,6 +284,7 @@ struct kvm { struct { spinlock_t lock; struct list_head items; + struct list_head oadns; } irqfds; struct list_head ioeventfds; #endif diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 2245cfa..dfdb5b2 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -43,6 +43,23 @@ * -------------------------------------------------------------------- */ +/* + * OADN irqfds (On Ack, De-assert & Notify) are a special variety of + * irqfds that assert an interrupt to the irqchip on eventfd trigger, + * receieve notification when userspace acknowledges the interrupt, + * automatically de-asserts the irqchip level, and notifies userspace + * via the oadn_eventfd. This object helps to provide one-to-many + * deassert-to-notify so we can share a single irq source ID per OADN. + */ +struct _irqfd_oadn { + struct kvm *kvm; + int irq_source_id; /* IRQ source ID shared by these irqfds */ + struct list_head irqfds; /* list of irqfds using this object */ + struct kvm_irq_ack_notifier notifier; /* IRQ ACK notification */ + struct kref kref; /* Race-free removal */ + struct list_head list; +}; + struct _irqfd { /* Used for MSI fast-path */ struct kvm *kvm; @@ -52,6 +69,10 @@ struct _irqfd { /* Used for level IRQ fast-path */ int gsi; struct work_struct inject; + /* Used for OADN (On Ack, De-assert & Notify) path */ + struct eventfd_ctx *oadn_eventfd; + struct list_head oadn_list; + struct _irqfd_oadn *oadn; /* Used for setup/shutdown */ struct eventfd_ctx *eventfd; struct list_head list; @@ -71,6 +92,51 @@ irqfd_inject(struct work_struct *work) kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0); } +static void +irqfd_oadn_inject(struct work_struct *work) +{ + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); + + kvm_set_irq(irqfd->kvm, irqfd->oadn->irq_source_id, irqfd->gsi, 1); +} + +/* + * Since OADN irqfds share an IRQ source ID, we de-assert once then + * notify all of the OADN irqfds using this GSI. We can't do multiple + * de-asserts or we risk racing with incoming re-asserts. + */ +static void +irqfd_oadn_notify(struct kvm_irq_ack_notifier *kian) +{ + struct _irqfd_oadn *oadn; + struct _irqfd *irqfd; + + oadn = container_of(kian, struct _irqfd_oadn, notifier); + + rcu_read_lock(); + + kvm_set_irq(oadn->kvm, oadn->irq_source_id, + oadn->notifier.gsi, 0); + + list_for_each_entry_rcu(irqfd, &oadn->irqfds, oadn_list) + eventfd_signal(irqfd->oadn_eventfd, 1); + + rcu_read_unlock(); +} + +/* + * Assumes kvm->irqfds.lock is held. Remove from the list of OADNs, new + * users will allocate their own, letting us de-assert this GSI on free, + * after the RCU grace period. + */ +static void +irqfd_oadn_release(struct kref *kref) +{ + struct _irqfd_oadn *oadn = container_of(kref, struct _irqfd_oadn, kref); + + list_del(&oadn->list); +} + /* * Race-free decouple logic (ordering is critical) */ @@ -93,6 +159,35 @@ irqfd_shutdown(struct work_struct *work) flush_work_sync(&irqfd->inject); /* + * OADN irqfds use an RCU list for lockless de-assert and user + * notification. Modify the list using RCU under lock and release + * the OADN object. Since we later free this irqfd, we must wait + * for an RCU grace period. If the OADN was released, we can then + * unregister the irq ack notifier, free the irq source ID (assuring + * that the GSI is de-asserted), and free the object. + */ + if (irqfd->oadn) { + struct _irqfd_oadn *oadn = irqfd->oadn; + struct kvm *kvm = oadn->kvm; + bool released; + + spin_lock_irq(&irqfd->kvm->irqfds.lock); + list_del_rcu(&irqfd->oadn_list); + released = kref_put(&oadn->kref, irqfd_oadn_release); + spin_unlock_irq(&irqfd->kvm->irqfds.lock); + + synchronize_rcu(); + + if (released) { + kvm_unregister_irq_ack_notifier(kvm, &oadn->notifier); + kvm_free_irq_source_id(kvm, oadn->irq_source_id); + kfree(oadn); + } + + eventfd_ctx_put(irqfd->oadn_eventfd); + } + + /* * It is now safe to release the object's resources */ eventfd_ctx_put(irqfd->eventfd); @@ -202,8 +297,9 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) { struct kvm_irq_routing_table *irq_rt; struct _irqfd *irqfd, *tmp; + struct _irqfd_oadn *oadn = NULL; struct file *file = NULL; - struct eventfd_ctx *eventfd = NULL; + struct eventfd_ctx *eventfd = NULL, *oadn_eventfd = NULL; int ret; unsigned int events; @@ -214,7 +310,49 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) irqfd->kvm = kvm; irqfd->gsi = args->gsi; INIT_LIST_HEAD(&irqfd->list); - INIT_WORK(&irqfd->inject, irqfd_inject); + INIT_LIST_HEAD(&irqfd->oadn_list); + + if (args->flags & KVM_IRQFD_FLAG_OADN) { + oadn_eventfd = eventfd_ctx_fdget(args->notifyfd); + if (IS_ERR(oadn_eventfd)) { + ret = PTR_ERR(oadn_eventfd); + goto fail; + } + + irqfd->oadn_eventfd = oadn_eventfd; + + /* + * We may be able to share an existing OADN, but we won't + * know that until we search under spinlock. We can't get + * an irq_source_id under spinlock, and we'd prefer not to + * do an atomic allocation, so prep an object here and free + * it if we don't need it. + */ + oadn = kzalloc(sizeof(*oadn), GFP_KERNEL); + if (!oadn) { + ret = -ENOMEM; + goto fail; + } + + oadn->kvm = kvm; + INIT_LIST_HEAD(&oadn->irqfds); + oadn->notifier.gsi = irqfd->gsi; + oadn->notifier.irq_acked = irqfd_oadn_notify; + kref_init(&oadn->kref); + INIT_LIST_HEAD(&oadn->list); + + oadn->irq_source_id = kvm_request_irq_source_id(kvm); + if (oadn->irq_source_id < 0) { + ret = oadn->irq_source_id; + goto fail; + } + + irqfd->oadn = oadn; + + INIT_WORK(&irqfd->inject, irqfd_oadn_inject); + } else + INIT_WORK(&irqfd->inject, irqfd_inject); + INIT_WORK(&irqfd->shutdown, irqfd_shutdown); file = eventfd_fget(args->fd); @@ -250,6 +388,27 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) goto fail; } + /* + * When configuring an OADN irqfd, try to re-use an existing OADN. + */ + if (oadn) { + struct _irqfd_oadn *oadn_tmp; + + list_for_each_entry(oadn_tmp, &kvm->irqfds.oadns, list) { + if (oadn_tmp->notifier.gsi == irqfd->gsi) { + irqfd->oadn = oadn_tmp; + break; + } + } + + if (irqfd->oadn != oadn) + kref_get(&irqfd->oadn->kref); + else + list_add(&oadn->list, &kvm->irqfds.oadns); + + list_add_rcu(&irqfd->oadn_list, &irqfd->oadn->irqfds); + } + irq_rt = rcu_dereference_protected(kvm->irq_routing, lockdep_is_held(&kvm->irqfds.lock)); irqfd_update(kvm, irqfd, irq_rt); @@ -267,6 +426,26 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) spin_unlock_irq(&kvm->irqfds.lock); + if (oadn) { + synchronize_rcu(); + + /* + * If we weren't able to re-use an OADN, setup the irq ack + * notifier outside of spinlock. Our interface requires + * users to be able to handle spurious de-assert & notifies, + * so trigger one here to make sure we didn't miss anything. + * Cleanup unused OADN if we share. + */ + if (irqfd->oadn == oadn) + kvm_register_irq_ack_notifier(kvm, &oadn->notifier); + else { + kvm_free_irq_source_id(kvm, oadn->irq_source_id); + kfree(oadn); + } + + irqfd_oadn_notify(&irqfd->oadn->notifier); + } + /* * do not drop the file until the irqfd is fully initialized, otherwise * we might race against the POLLHUP @@ -276,12 +455,19 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) return 0; fail: + if (oadn_eventfd && !IS_ERR(oadn_eventfd)) + eventfd_ctx_put(oadn_eventfd); + + if (oadn && oadn->irq_source_id >= 0) + kvm_free_irq_source_id(kvm, oadn->irq_source_id); + if (eventfd && !IS_ERR(eventfd)) eventfd_ctx_put(eventfd); if (!IS_ERR(file)) fput(file); + kfree(oadn); kfree(irqfd); return ret; } @@ -291,6 +477,7 @@ kvm_eventfd_init(struct kvm *kvm) { spin_lock_init(&kvm->irqfds.lock); INIT_LIST_HEAD(&kvm->irqfds.items); + INIT_LIST_HEAD(&kvm->irqfds.oadns); INIT_LIST_HEAD(&kvm->ioeventfds); } @@ -340,7 +527,7 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args) int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args) { - if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN) + if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN | KVM_IRQFD_FLAG_OADN)) return -EINVAL; if (args->flags & KVM_IRQFD_FLAG_DEASSIGN) ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension 2012-08-21 19:29 ` [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension Alex Williamson @ 2012-08-21 20:37 ` Michael S. Tsirkin 2012-08-21 21:11 ` Alex Williamson 2012-09-05 14:57 ` Avi Kivity 1 sibling, 1 reply; 35+ messages in thread From: Michael S. Tsirkin @ 2012-08-21 20:37 UTC (permalink / raw) To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel On Tue, Aug 21, 2012 at 01:29:14PM -0600, Alex Williamson wrote: > For VFIO based device assignment we'd like a mechanism to allow level > triggered interrutps to be directly injected into KVM. KVM_IRQFD > already allows this for edge triggered interrupts, but for level, we > need to watch for acknowledgement of the interrupt from the guest to > provide us a hint when to test the device and allow it to re-assert > if necessary. To do this, we create a new KVM_IRQFD mode called > "On Ack, De-assert & Notify", or OADN. In this mode, an interrupt > injection provides only a gsi assertion. We then hook into the IRQ > ACK notifier, which when triggered de-asserts the gsi and notifies > via another eventfd. It's then the responsibility of the user to > re-assert the interrupt is service is still required. > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Naming aside, looks good. I think I see some minor bugs, and I added some improvement suggestions below. Thanks! > --- > > Documentation/virtual/kvm/api.txt | 13 ++ > arch/x86/kvm/x86.c | 1 > include/linux/kvm.h | 6 + > include/linux/kvm_host.h | 1 > virt/kvm/eventfd.c | 193 ++++++++++++++++++++++++++++++++++++- > 5 files changed, 210 insertions(+), 4 deletions(-) > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt > index bf33aaa..87d7321 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -1946,6 +1946,19 @@ the guest using the specified gsi pin. The irqfd is removed using > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd > and kvm_irqfd.gsi. > > +With KVM_CAP_IRQFD_OADN, KVM_IRQFD supports an "On Ack, De-assert & > +Notify" option that allows emulation of level-triggered interrupts. > +When kvm_irqfd.fd is triggered, the requested gsi is asserted and > +remains asserted until interaction with the irqchip indicates the > +VM has acknowledged the interrupt, such as an EOI. On acknoledgement > +the gsi is automatically de-asserted and the user is notified via > +kvm_irqfd.notifyfd. The user is then required to re-assert the > +interrupt if the associated device still requires service. To enable > +this mode, configure the KVM_IRQFD using the KVM_IRQFD_FLAG_OADN flag > +and specify kvm_irqfd.notifyfd. Note that closing kvm_irqfd.notifyfd > +while configured in this mode does not disable the irqfd. The > +KVM_IRQFD_FLAG_OADN flag is only necessary on assignment. > + > 4.76 KVM_PPC_ALLOCATE_HTAB > > Capability: KVM_CAP_PPC_ALLOC_HTAB > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index cd98673..fde7b66 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -2175,6 +2175,7 @@ int kvm_dev_ioctl_check_extension(long ext) > case KVM_CAP_PCI_2_3: > case KVM_CAP_KVMCLOCK_CTRL: > case KVM_CAP_IRQFD_IRQ_SOURCE_ID: > + case KVM_CAP_IRQFD_OADN: > r = 1; > break; > case KVM_CAP_COALESCED_MMIO: > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > index ae66b9c..ec0f1d8 100644 > --- a/include/linux/kvm.h > +++ b/include/linux/kvm.h > @@ -619,6 +619,7 @@ struct kvm_ppc_smmu_info { > #define KVM_CAP_S390_COW 79 > #define KVM_CAP_PPC_ALLOC_HTAB 80 > #define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81 > +#define KVM_CAP_IRQFD_OADN 82 > > #ifdef KVM_CAP_IRQ_ROUTING > > @@ -684,12 +685,15 @@ struct kvm_xen_hvm_config { > #endif > > #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0) > +/* Availabie with KVM_CAP_IRQFD_OADN */ Need to also explain what it is. > +#define KVM_IRQFD_FLAG_OADN (1 << 1) > > struct kvm_irqfd { > __u32 fd; > __u32 gsi; > __u32 flags; > - __u8 pad[20]; > + __u32 notifyfd; Document that this is only valid with OADN flag. Might be a good idea to rename this to deassert_on_ack_notifyfd or oadn_notifyfd to avoid confusion. > + __u8 pad[16]; > }; > > struct kvm_clock_data { > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index b763230..d502d08 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -284,6 +284,7 @@ struct kvm { > struct { > spinlock_t lock; > struct list_head items; > + struct list_head oadns; > } irqfds; > struct list_head ioeventfds; > #endif > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > index 2245cfa..dfdb5b2 100644 > --- a/virt/kvm/eventfd.c > +++ b/virt/kvm/eventfd.c > @@ -43,6 +43,23 @@ > * -------------------------------------------------------------------- > */ > > +/* > + * OADN irqfds (On Ack, De-assert & Notify) are a special variety of > + * irqfds that assert an interrupt to the irqchip on eventfd trigger, > + * receieve notification when userspace acknowledges the interrupt, > + * automatically de-asserts the irqchip level, and notifies userspace > + * via the oadn_eventfd. This object helps to provide one-to-many > + * deassert-to-notify so we can share a single irq source ID per OADN. > + */ > +struct _irqfd_oadn { > + struct kvm *kvm; > + int irq_source_id; /* IRQ source ID shared by these irqfds */ > + struct list_head irqfds; /* list of irqfds using this object */ > + struct kvm_irq_ack_notifier notifier; /* IRQ ACK notification */ > + struct kref kref; /* Race-free removal */ > + struct list_head list; > +}; > + > struct _irqfd { > /* Used for MSI fast-path */ > struct kvm *kvm; > @@ -52,6 +69,10 @@ struct _irqfd { > /* Used for level IRQ fast-path */ > int gsi; > struct work_struct inject; > + /* Used for OADN (On Ack, De-assert & Notify) path */ > + struct eventfd_ctx *oadn_eventfd; > + struct list_head oadn_list; Could you document RCU and locking rules for this field please? > + struct _irqfd_oadn *oadn; > /* Used for setup/shutdown */ > struct eventfd_ctx *eventfd; > struct list_head list; > @@ -71,6 +92,51 @@ irqfd_inject(struct work_struct *work) > kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0); > } > > +static void > +irqfd_oadn_inject(struct work_struct *work) > +{ > + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); > + > + kvm_set_irq(irqfd->kvm, irqfd->oadn->irq_source_id, irqfd->gsi, 1); > +} > + > +/* > + * Since OADN irqfds share an IRQ source ID, we de-assert once then > + * notify all of the OADN irqfds using this GSI. We can't do multiple > + * de-asserts or we risk racing with incoming re-asserts. > + */ > +static void > +irqfd_oadn_notify(struct kvm_irq_ack_notifier *kian) > +{ > + struct _irqfd_oadn *oadn; > + struct _irqfd *irqfd; > + > + oadn = container_of(kian, struct _irqfd_oadn, notifier); > + > + rcu_read_lock(); > + > + kvm_set_irq(oadn->kvm, oadn->irq_source_id, > + oadn->notifier.gsi, 0); > + > + list_for_each_entry_rcu(irqfd, &oadn->irqfds, oadn_list) > + eventfd_signal(irqfd->oadn_eventfd, 1); > + > + rcu_read_unlock(); > +} > + > +/* > + * Assumes kvm->irqfds.lock is held. Remove from the list of OADNs, new > + * users will allocate their own, letting us de-assert this GSI on free, > + * after the RCU grace period. > + */ > +static void > +irqfd_oadn_release(struct kref *kref) > +{ > + struct _irqfd_oadn *oadn = container_of(kref, struct _irqfd_oadn, kref); > + > + list_del(&oadn->list); > +} > + > /* > * Race-free decouple logic (ordering is critical) > */ > @@ -93,6 +159,35 @@ irqfd_shutdown(struct work_struct *work) > flush_work_sync(&irqfd->inject); > > /* > + * OADN irqfds use an RCU list for lockless de-assert and user > + * notification. Modify the list using RCU under lock and release > + * the OADN object. Since we later free this irqfd, we must wait > + * for an RCU grace period. If the OADN was released, we can then > + * unregister the irq ack notifier, free the irq source ID (assuring > + * that the GSI is de-asserted), and free the object. > + */ > + if (irqfd->oadn) { > + struct _irqfd_oadn *oadn = irqfd->oadn; > + struct kvm *kvm = oadn->kvm; > + bool released; > + > + spin_lock_irq(&irqfd->kvm->irqfds.lock); > + list_del_rcu(&irqfd->oadn_list); > + released = kref_put(&oadn->kref, irqfd_oadn_release); > + spin_unlock_irq(&irqfd->kvm->irqfds.lock); > + > + synchronize_rcu(); > + > + if (released) { > + kvm_unregister_irq_ack_notifier(kvm, &oadn->notifier); > + kvm_free_irq_source_id(kvm, oadn->irq_source_id); > + kfree(oadn); > + } > + > + eventfd_ctx_put(irqfd->oadn_eventfd); > + } > + > + /* > * It is now safe to release the object's resources > */ > eventfd_ctx_put(irqfd->eventfd); > @@ -202,8 +297,9 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) > { > struct kvm_irq_routing_table *irq_rt; > struct _irqfd *irqfd, *tmp; > + struct _irqfd_oadn *oadn = NULL; > struct file *file = NULL; > - struct eventfd_ctx *eventfd = NULL; > + struct eventfd_ctx *eventfd = NULL, *oadn_eventfd = NULL; > int ret; > unsigned int events; > > @@ -214,7 +310,49 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) > irqfd->kvm = kvm; > irqfd->gsi = args->gsi; > INIT_LIST_HEAD(&irqfd->list); > - INIT_WORK(&irqfd->inject, irqfd_inject); > + INIT_LIST_HEAD(&irqfd->oadn_list); > + > + if (args->flags & KVM_IRQFD_FLAG_OADN) { > + oadn_eventfd = eventfd_ctx_fdget(args->notifyfd); > + if (IS_ERR(oadn_eventfd)) { > + ret = PTR_ERR(oadn_eventfd); > + goto fail; > + } > + > + irqfd->oadn_eventfd = oadn_eventfd; > + > + /* > + * We may be able to share an existing OADN, but we won't > + * know that until we search under spinlock. We can't get > + * an irq_source_id under spinlock, and we'd prefer not to > + * do an atomic allocation, so prep an object here and free > + * it if we don't need it. > + */ So if everyone uses same source ID this code will be simpler too? > + oadn = kzalloc(sizeof(*oadn), GFP_KERNEL); > + if (!oadn) { > + ret = -ENOMEM; > + goto fail; > + } > + > + oadn->kvm = kvm; > + INIT_LIST_HEAD(&oadn->irqfds); > + oadn->notifier.gsi = irqfd->gsi; > + oadn->notifier.irq_acked = irqfd_oadn_notify; > + kref_init(&oadn->kref); > + INIT_LIST_HEAD(&oadn->list); > + > + oadn->irq_source_id = kvm_request_irq_source_id(kvm); > + if (oadn->irq_source_id < 0) { > + ret = oadn->irq_source_id; > + goto fail; > + } > + This still has the problem that by allocating these OADN irqfds you can run out of source IDs for other uses. Why don't OADNs just use KVM_IRQFD_IRQ_SOURCE_ID? I thought this is why it was introduced in this patchset ... > + irqfd->oadn = oadn; > + > + INIT_WORK(&irqfd->inject, irqfd_oadn_inject); > + } else > + INIT_WORK(&irqfd->inject, irqfd_inject); > + > INIT_WORK(&irqfd->shutdown, irqfd_shutdown); > > file = eventfd_fget(args->fd); > @@ -250,6 +388,27 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) > goto fail; > } > > + /* > + * When configuring an OADN irqfd, try to re-use an existing OADN. > + */ > + if (oadn) { > + struct _irqfd_oadn *oadn_tmp; > + > + list_for_each_entry(oadn_tmp, &kvm->irqfds.oadns, list) { > + if (oadn_tmp->notifier.gsi == irqfd->gsi) { > + irqfd->oadn = oadn_tmp; > + break; > + } > + } > + > + if (irqfd->oadn != oadn) > + kref_get(&irqfd->oadn->kref); > + else > + list_add(&oadn->list, &kvm->irqfds.oadns); > + > + list_add_rcu(&irqfd->oadn_list, &irqfd->oadn->irqfds); Hang on, don't we need irqfds_lock here? > + } > + > irq_rt = rcu_dereference_protected(kvm->irq_routing, > lockdep_is_held(&kvm->irqfds.lock)); > irqfd_update(kvm, irqfd, irq_rt); > @@ -267,6 +426,26 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) > > spin_unlock_irq(&kvm->irqfds.lock); > > + if (oadn) { > + synchronize_rcu(); Seems unexpected on assign. What does this synchronize_rcu do? > + > + /* > + * If we weren't able to re-use an OADN, setup the irq ack > + * notifier outside of spinlock. Our interface requires > + * users to be able to handle spurious de-assert & notifies, > + * so trigger one here to make sure we didn't miss anything. > + * Cleanup unused OADN if we share. > + */ > + if (irqfd->oadn == oadn) > + kvm_register_irq_ack_notifier(kvm, &oadn->notifier); > + else { > + kvm_free_irq_source_id(kvm, oadn->irq_source_id); This is unfortunate - so we need a spare source ID for internal use. Thus if 1 source ID is available, then it is possible that - allocate irqfd for A - allocate irqfd for B succeeds because A uses a shared source id so it is freed here. While - allocate irqfd for B - allocate irqfd for A fails because B used the last ID and now temporary allocation above fails. > + kfree(oadn); > + } > + > + irqfd_oadn_notify(&irqfd->oadn->notifier); > + } > + > /* > * do not drop the file until the irqfd is fully initialized, otherwise > * we might race against the POLLHUP > @@ -276,12 +455,19 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) > return 0; > > fail: > + if (oadn_eventfd && !IS_ERR(oadn_eventfd)) > + eventfd_ctx_put(oadn_eventfd); > + > + if (oadn && oadn->irq_source_id >= 0) > + kvm_free_irq_source_id(kvm, oadn->irq_source_id); > + A bit cleaner to have distinct labels for failures at different stages of the setup. I know it's not written this way ATM so it's not a must to address this. > if (eventfd && !IS_ERR(eventfd)) > eventfd_ctx_put(eventfd); > > if (!IS_ERR(file)) > fput(file); > > + kfree(oadn); > kfree(irqfd); > return ret; > } > @@ -291,6 +477,7 @@ kvm_eventfd_init(struct kvm *kvm) > { > spin_lock_init(&kvm->irqfds.lock); > INIT_LIST_HEAD(&kvm->irqfds.items); > + INIT_LIST_HEAD(&kvm->irqfds.oadns); > INIT_LIST_HEAD(&kvm->ioeventfds); > } > > @@ -340,7 +527,7 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args) > int > kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args) > { > - if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN) > + if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN | KVM_IRQFD_FLAG_OADN)) > return -EINVAL; > > if (args->flags & KVM_IRQFD_FLAG_DEASSIGN) ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension 2012-08-21 20:37 ` Michael S. Tsirkin @ 2012-08-21 21:11 ` Alex Williamson 2012-08-22 0:14 ` Michael S. Tsirkin 0 siblings, 1 reply; 35+ messages in thread From: Alex Williamson @ 2012-08-21 21:11 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel On Tue, 2012-08-21 at 23:37 +0300, Michael S. Tsirkin wrote: > On Tue, Aug 21, 2012 at 01:29:14PM -0600, Alex Williamson wrote: > > For VFIO based device assignment we'd like a mechanism to allow level > > triggered interrutps to be directly injected into KVM. KVM_IRQFD > > already allows this for edge triggered interrupts, but for level, we > > need to watch for acknowledgement of the interrupt from the guest to > > provide us a hint when to test the device and allow it to re-assert > > if necessary. To do this, we create a new KVM_IRQFD mode called > > "On Ack, De-assert & Notify", or OADN. In this mode, an interrupt > > injection provides only a gsi assertion. We then hook into the IRQ > > ACK notifier, which when triggered de-asserts the gsi and notifies > > via another eventfd. It's then the responsibility of the user to > > re-assert the interrupt is service is still required. > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > Naming aside, looks good. > I think I see some minor bugs, and I added some improvement > suggestions below. > > Thanks! > > > --- > > > > Documentation/virtual/kvm/api.txt | 13 ++ > > arch/x86/kvm/x86.c | 1 > > include/linux/kvm.h | 6 + > > include/linux/kvm_host.h | 1 > > virt/kvm/eventfd.c | 193 ++++++++++++++++++++++++++++++++++++- > > 5 files changed, 210 insertions(+), 4 deletions(-) > > > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt > > index bf33aaa..87d7321 100644 > > --- a/Documentation/virtual/kvm/api.txt > > +++ b/Documentation/virtual/kvm/api.txt > > @@ -1946,6 +1946,19 @@ the guest using the specified gsi pin. The irqfd is removed using > > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd > > and kvm_irqfd.gsi. > > > > +With KVM_CAP_IRQFD_OADN, KVM_IRQFD supports an "On Ack, De-assert & > > +Notify" option that allows emulation of level-triggered interrupts. > > +When kvm_irqfd.fd is triggered, the requested gsi is asserted and > > +remains asserted until interaction with the irqchip indicates the > > +VM has acknowledged the interrupt, such as an EOI. On acknoledgement > > +the gsi is automatically de-asserted and the user is notified via > > +kvm_irqfd.notifyfd. The user is then required to re-assert the > > +interrupt if the associated device still requires service. To enable > > +this mode, configure the KVM_IRQFD using the KVM_IRQFD_FLAG_OADN flag > > +and specify kvm_irqfd.notifyfd. Note that closing kvm_irqfd.notifyfd > > +while configured in this mode does not disable the irqfd. The > > +KVM_IRQFD_FLAG_OADN flag is only necessary on assignment. > > + > > 4.76 KVM_PPC_ALLOCATE_HTAB > > > > Capability: KVM_CAP_PPC_ALLOC_HTAB > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > index cd98673..fde7b66 100644 > > --- a/arch/x86/kvm/x86.c > > +++ b/arch/x86/kvm/x86.c > > @@ -2175,6 +2175,7 @@ int kvm_dev_ioctl_check_extension(long ext) > > case KVM_CAP_PCI_2_3: > > case KVM_CAP_KVMCLOCK_CTRL: > > case KVM_CAP_IRQFD_IRQ_SOURCE_ID: > > + case KVM_CAP_IRQFD_OADN: > > r = 1; > > break; > > case KVM_CAP_COALESCED_MMIO: > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > > index ae66b9c..ec0f1d8 100644 > > --- a/include/linux/kvm.h > > +++ b/include/linux/kvm.h > > @@ -619,6 +619,7 @@ struct kvm_ppc_smmu_info { > > #define KVM_CAP_S390_COW 79 > > #define KVM_CAP_PPC_ALLOC_HTAB 80 > > #define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81 > > +#define KVM_CAP_IRQFD_OADN 82 > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > @@ -684,12 +685,15 @@ struct kvm_xen_hvm_config { > > #endif > > > > #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0) > > +/* Availabie with KVM_CAP_IRQFD_OADN */ > > Need to also explain what it is. Beyond Documentation/virtual/kvm/api.txt? I don't see much else getting documented here. Or maybe you mean /* On Ack, De-assert & Notify */ > > +#define KVM_IRQFD_FLAG_OADN (1 << 1) > > > > struct kvm_irqfd { > > __u32 fd; > > __u32 gsi; > > __u32 flags; > > - __u8 pad[20]; > > + __u32 notifyfd; > > Document that this is only valid with OADN flag. Might be a good idea > to rename this to deassert_on_ack_notifyfd or oadn_notifyfd > to avoid confusion. I'll add a /* only valid with KVM_IRQFD_FLAG_OADN */ I can change the name if you prefer, but it seems pretty clear to me how a notifyfd might relate to a "On Ack, De-assert & Notify" irqfd without pulling longer names into userspace. > > + __u8 pad[16]; > > }; > > > > struct kvm_clock_data { > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > index b763230..d502d08 100644 > > --- a/include/linux/kvm_host.h > > +++ b/include/linux/kvm_host.h > > @@ -284,6 +284,7 @@ struct kvm { > > struct { > > spinlock_t lock; > > struct list_head items; > > + struct list_head oadns; > > } irqfds; > > struct list_head ioeventfds; > > #endif > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > > index 2245cfa..dfdb5b2 100644 > > --- a/virt/kvm/eventfd.c > > +++ b/virt/kvm/eventfd.c > > @@ -43,6 +43,23 @@ > > * -------------------------------------------------------------------- > > */ > > > > +/* > > + * OADN irqfds (On Ack, De-assert & Notify) are a special variety of > > + * irqfds that assert an interrupt to the irqchip on eventfd trigger, > > + * receieve notification when userspace acknowledges the interrupt, > > + * automatically de-asserts the irqchip level, and notifies userspace > > + * via the oadn_eventfd. This object helps to provide one-to-many > > + * deassert-to-notify so we can share a single irq source ID per OADN. > > + */ > > +struct _irqfd_oadn { > > + struct kvm *kvm; > > + int irq_source_id; /* IRQ source ID shared by these irqfds */ > > + struct list_head irqfds; /* list of irqfds using this object */ > > + struct kvm_irq_ack_notifier notifier; /* IRQ ACK notification */ > > + struct kref kref; /* Race-free removal */ > > + struct list_head list; > > +}; > > + > > struct _irqfd { > > /* Used for MSI fast-path */ > > struct kvm *kvm; > > @@ -52,6 +69,10 @@ struct _irqfd { > > /* Used for level IRQ fast-path */ > > int gsi; > > struct work_struct inject; > > + /* Used for OADN (On Ack, De-assert & Notify) path */ > > + struct eventfd_ctx *oadn_eventfd; > > + struct list_head oadn_list; > > Could you document RCU and locking rules for this field please? Sure > > + struct _irqfd_oadn *oadn; > > /* Used for setup/shutdown */ > > struct eventfd_ctx *eventfd; > > struct list_head list; > > @@ -71,6 +92,51 @@ irqfd_inject(struct work_struct *work) > > kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0); > > } > > > > +static void > > +irqfd_oadn_inject(struct work_struct *work) > > +{ > > + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); > > + > > + kvm_set_irq(irqfd->kvm, irqfd->oadn->irq_source_id, irqfd->gsi, 1); > > +} > > + > > +/* > > + * Since OADN irqfds share an IRQ source ID, we de-assert once then > > + * notify all of the OADN irqfds using this GSI. We can't do multiple > > + * de-asserts or we risk racing with incoming re-asserts. > > + */ > > +static void > > +irqfd_oadn_notify(struct kvm_irq_ack_notifier *kian) > > +{ > > + struct _irqfd_oadn *oadn; > > + struct _irqfd *irqfd; > > + > > + oadn = container_of(kian, struct _irqfd_oadn, notifier); > > + > > + rcu_read_lock(); > > + > > + kvm_set_irq(oadn->kvm, oadn->irq_source_id, > > + oadn->notifier.gsi, 0); > > + > > + list_for_each_entry_rcu(irqfd, &oadn->irqfds, oadn_list) > > + eventfd_signal(irqfd->oadn_eventfd, 1); > > + > > + rcu_read_unlock(); > > +} > > + > > +/* > > + * Assumes kvm->irqfds.lock is held. Remove from the list of OADNs, new > > + * users will allocate their own, letting us de-assert this GSI on free, > > + * after the RCU grace period. > > + */ > > +static void > > +irqfd_oadn_release(struct kref *kref) > > +{ > > + struct _irqfd_oadn *oadn = container_of(kref, struct _irqfd_oadn, kref); > > + > > + list_del(&oadn->list); > > +} > > + > > /* > > * Race-free decouple logic (ordering is critical) > > */ > > @@ -93,6 +159,35 @@ irqfd_shutdown(struct work_struct *work) > > flush_work_sync(&irqfd->inject); > > > > /* > > + * OADN irqfds use an RCU list for lockless de-assert and user > > + * notification. Modify the list using RCU under lock and release > > + * the OADN object. Since we later free this irqfd, we must wait > > + * for an RCU grace period. If the OADN was released, we can then > > + * unregister the irq ack notifier, free the irq source ID (assuring > > + * that the GSI is de-asserted), and free the object. > > + */ > > + if (irqfd->oadn) { > > + struct _irqfd_oadn *oadn = irqfd->oadn; > > + struct kvm *kvm = oadn->kvm; > > + bool released; > > + > > + spin_lock_irq(&irqfd->kvm->irqfds.lock); > > + list_del_rcu(&irqfd->oadn_list); > > + released = kref_put(&oadn->kref, irqfd_oadn_release); > > + spin_unlock_irq(&irqfd->kvm->irqfds.lock); > > + > > + synchronize_rcu(); > > + > > + if (released) { > > + kvm_unregister_irq_ack_notifier(kvm, &oadn->notifier); > > + kvm_free_irq_source_id(kvm, oadn->irq_source_id); > > + kfree(oadn); > > + } > > + > > + eventfd_ctx_put(irqfd->oadn_eventfd); > > + } > > + > > + /* > > * It is now safe to release the object's resources > > */ > > eventfd_ctx_put(irqfd->eventfd); > > @@ -202,8 +297,9 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) > > { > > struct kvm_irq_routing_table *irq_rt; > > struct _irqfd *irqfd, *tmp; > > + struct _irqfd_oadn *oadn = NULL; > > struct file *file = NULL; > > - struct eventfd_ctx *eventfd = NULL; > > + struct eventfd_ctx *eventfd = NULL, *oadn_eventfd = NULL; > > int ret; > > unsigned int events; > > > > @@ -214,7 +310,49 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) > > irqfd->kvm = kvm; > > irqfd->gsi = args->gsi; > > INIT_LIST_HEAD(&irqfd->list); > > - INIT_WORK(&irqfd->inject, irqfd_inject); > > + INIT_LIST_HEAD(&irqfd->oadn_list); > > + > > + if (args->flags & KVM_IRQFD_FLAG_OADN) { > > + oadn_eventfd = eventfd_ctx_fdget(args->notifyfd); > > + if (IS_ERR(oadn_eventfd)) { > > + ret = PTR_ERR(oadn_eventfd); > > + goto fail; > > + } > > + > > + irqfd->oadn_eventfd = oadn_eventfd; > > + > > + /* > > + * We may be able to share an existing OADN, but we won't > > + * know that until we search under spinlock. We can't get > > + * an irq_source_id under spinlock, and we'd prefer not to > > + * do an atomic allocation, so prep an object here and free > > + * it if we don't need it. > > + */ > > So if everyone uses same source ID this code will be simpler too? No, everyone using the same source ID results in racy cleanup that I can't figure out how to avoid while maintaining a lockless de-assert/notify path. See patch 0/2 for details of the race. > > + oadn = kzalloc(sizeof(*oadn), GFP_KERNEL); > > + if (!oadn) { > > + ret = -ENOMEM; > > + goto fail; > > + } > > + > > + oadn->kvm = kvm; > > + INIT_LIST_HEAD(&oadn->irqfds); > > + oadn->notifier.gsi = irqfd->gsi; > > + oadn->notifier.irq_acked = irqfd_oadn_notify; > > + kref_init(&oadn->kref); > > + INIT_LIST_HEAD(&oadn->list); > > + > > + oadn->irq_source_id = kvm_request_irq_source_id(kvm); > > + if (oadn->irq_source_id < 0) { > > + ret = oadn->irq_source_id; > > + goto fail; > > + } > > + > > This still has the problem that by allocating these > OADN irqfds you can run out of source IDs for other uses. > Why don't OADNs just use KVM_IRQFD_IRQ_SOURCE_ID? > I thought this is why it was introduced in this patchset ... See patch 0/2, it's racy. We have 64 IRQ source IDs and 24 GSIs currently. This let's all users on the same GSI share an IRQ source ID. Yes, we hold onto one we may not need during setup, but locking doesn't allow us to allocate it at the point where we'd need it. > > + irqfd->oadn = oadn; > > + > > + INIT_WORK(&irqfd->inject, irqfd_oadn_inject); > > + } else > > + INIT_WORK(&irqfd->inject, irqfd_inject); > > + > > INIT_WORK(&irqfd->shutdown, irqfd_shutdown); > > > > file = eventfd_fget(args->fd); > > @@ -250,6 +388,27 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) > > goto fail; > > } > > > > + /* > > + * When configuring an OADN irqfd, try to re-use an existing OADN. > > + */ > > + if (oadn) { > > + struct _irqfd_oadn *oadn_tmp; > > + > > + list_for_each_entry(oadn_tmp, &kvm->irqfds.oadns, list) { > > + if (oadn_tmp->notifier.gsi == irqfd->gsi) { > > + irqfd->oadn = oadn_tmp; > > + break; > > + } > > + } > > + > > + if (irqfd->oadn != oadn) > > + kref_get(&irqfd->oadn->kref); > > + else > > + list_add(&oadn->list, &kvm->irqfds.oadns); > > + > > + list_add_rcu(&irqfd->oadn_list, &irqfd->oadn->irqfds); > > Hang on, don't we need irqfds_lock here? We're in kvm->irqfds.lock. > > + } > > + > > irq_rt = rcu_dereference_protected(kvm->irq_routing, > > lockdep_is_held(&kvm->irqfds.lock)); > > irqfd_update(kvm, irqfd, irq_rt); > > @@ -267,6 +426,26 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) > > > > spin_unlock_irq(&kvm->irqfds.lock); > > > > + if (oadn) { > > + synchronize_rcu(); > > Seems unexpected on assign. > What does this synchronize_rcu do? Because we force the notify below and we need the irqfd we just added to the list to be visible on the list for that. > > + > > + /* > > + * If we weren't able to re-use an OADN, setup the irq ack > > + * notifier outside of spinlock. Our interface requires > > + * users to be able to handle spurious de-assert & notifies, > > + * so trigger one here to make sure we didn't miss anything. > > + * Cleanup unused OADN if we share. > > + */ > > + if (irqfd->oadn == oadn) > > + kvm_register_irq_ack_notifier(kvm, &oadn->notifier); > > + else { > > + kvm_free_irq_source_id(kvm, oadn->irq_source_id); > > This is unfortunate - so we need a spare source ID for internal use. > Thus if 1 source ID is available, then it is possible that > > - allocate irqfd for A > - allocate irqfd for B > > succeeds because A uses a shared source id so it is freed here. > > While > > - allocate irqfd for B > - allocate irqfd for A > > fails because B used the last ID and now temporary allocation > above fails. Yep, suggestions? How many KVM_IRQFDs can we expect in-flight at one time vs how many IRQ source IDs will be used long term. VFIO assigned PCI device can only make use of up to 4 GSIs per root bridge, we only have 24 IOAPIC pins. > > + kfree(oadn); > > + } > > + > > + irqfd_oadn_notify(&irqfd->oadn->notifier); > > + } > > + > > /* > > * do not drop the file until the irqfd is fully initialized, otherwise > > * we might race against the POLLHUP > > @@ -276,12 +455,19 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) > > return 0; > > > > fail: > > + if (oadn_eventfd && !IS_ERR(oadn_eventfd)) > > + eventfd_ctx_put(oadn_eventfd); > > + > > + if (oadn && oadn->irq_source_id >= 0) > > + kvm_free_irq_source_id(kvm, oadn->irq_source_id); > > + > > A bit cleaner to have distinct labels for failures > at different stages of the setup. > I know it's not written this way ATM so it's not a must > to address this. I often find myself getting lost trying to use distinct labels. I'm just following along what this function already does, but find it to be a nice simplification to handle everything this way. Thanks, Alex > > if (eventfd && !IS_ERR(eventfd)) > > eventfd_ctx_put(eventfd); > > > > if (!IS_ERR(file)) > > fput(file); > > > > + kfree(oadn); > > kfree(irqfd); > > return ret; > > } > > @@ -291,6 +477,7 @@ kvm_eventfd_init(struct kvm *kvm) > > { > > spin_lock_init(&kvm->irqfds.lock); > > INIT_LIST_HEAD(&kvm->irqfds.items); > > + INIT_LIST_HEAD(&kvm->irqfds.oadns); > > INIT_LIST_HEAD(&kvm->ioeventfds); > > } > > > > @@ -340,7 +527,7 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args) > > int > > kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args) > > { > > - if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN) > > + if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN | KVM_IRQFD_FLAG_OADN)) > > return -EINVAL; > > > > if (args->flags & KVM_IRQFD_FLAG_DEASSIGN) > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension 2012-08-21 21:11 ` Alex Williamson @ 2012-08-22 0:14 ` Michael S. Tsirkin 2012-08-22 1:48 ` Alex Williamson 0 siblings, 1 reply; 35+ messages in thread From: Michael S. Tsirkin @ 2012-08-22 0:14 UTC (permalink / raw) To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel On Tue, Aug 21, 2012 at 03:11:41PM -0600, Alex Williamson wrote: > On Tue, 2012-08-21 at 23:37 +0300, Michael S. Tsirkin wrote: > > On Tue, Aug 21, 2012 at 01:29:14PM -0600, Alex Williamson wrote: > > > For VFIO based device assignment we'd like a mechanism to allow level > > > triggered interrutps to be directly injected into KVM. KVM_IRQFD > > > already allows this for edge triggered interrupts, but for level, we > > > need to watch for acknowledgement of the interrupt from the guest to > > > provide us a hint when to test the device and allow it to re-assert > > > if necessary. To do this, we create a new KVM_IRQFD mode called > > > "On Ack, De-assert & Notify", or OADN. In this mode, an interrupt > > > injection provides only a gsi assertion. We then hook into the IRQ > > > ACK notifier, which when triggered de-asserts the gsi and notifies > > > via another eventfd. It's then the responsibility of the user to > > > re-assert the interrupt is service is still required. > > > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > > > Naming aside, looks good. > > I think I see some minor bugs, and I added some improvement > > suggestions below. > > > > Thanks! > > > > > --- > > > > > > Documentation/virtual/kvm/api.txt | 13 ++ > > > arch/x86/kvm/x86.c | 1 > > > include/linux/kvm.h | 6 + > > > include/linux/kvm_host.h | 1 > > > virt/kvm/eventfd.c | 193 ++++++++++++++++++++++++++++++++++++- > > > 5 files changed, 210 insertions(+), 4 deletions(-) > > > > > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt > > > index bf33aaa..87d7321 100644 > > > --- a/Documentation/virtual/kvm/api.txt > > > +++ b/Documentation/virtual/kvm/api.txt > > > @@ -1946,6 +1946,19 @@ the guest using the specified gsi pin. The irqfd is removed using > > > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd > > > and kvm_irqfd.gsi. > > > > > > +With KVM_CAP_IRQFD_OADN, KVM_IRQFD supports an "On Ack, De-assert & > > > +Notify" option that allows emulation of level-triggered interrupts. > > > +When kvm_irqfd.fd is triggered, the requested gsi is asserted and > > > +remains asserted until interaction with the irqchip indicates the > > > +VM has acknowledged the interrupt, such as an EOI. On acknoledgement > > > +the gsi is automatically de-asserted and the user is notified via > > > +kvm_irqfd.notifyfd. The user is then required to re-assert the > > > +interrupt if the associated device still requires service. To enable > > > +this mode, configure the KVM_IRQFD using the KVM_IRQFD_FLAG_OADN flag > > > +and specify kvm_irqfd.notifyfd. Note that closing kvm_irqfd.notifyfd > > > +while configured in this mode does not disable the irqfd. The > > > +KVM_IRQFD_FLAG_OADN flag is only necessary on assignment. > > > + > > > 4.76 KVM_PPC_ALLOCATE_HTAB > > > > > > Capability: KVM_CAP_PPC_ALLOC_HTAB > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > index cd98673..fde7b66 100644 > > > --- a/arch/x86/kvm/x86.c > > > +++ b/arch/x86/kvm/x86.c > > > @@ -2175,6 +2175,7 @@ int kvm_dev_ioctl_check_extension(long ext) > > > case KVM_CAP_PCI_2_3: > > > case KVM_CAP_KVMCLOCK_CTRL: > > > case KVM_CAP_IRQFD_IRQ_SOURCE_ID: > > > + case KVM_CAP_IRQFD_OADN: > > > r = 1; > > > break; > > > case KVM_CAP_COALESCED_MMIO: > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > > > index ae66b9c..ec0f1d8 100644 > > > --- a/include/linux/kvm.h > > > +++ b/include/linux/kvm.h > > > @@ -619,6 +619,7 @@ struct kvm_ppc_smmu_info { > > > #define KVM_CAP_S390_COW 79 > > > #define KVM_CAP_PPC_ALLOC_HTAB 80 > > > #define KVM_CAP_IRQFD_IRQ_SOURCE_ID 81 > > > +#define KVM_CAP_IRQFD_OADN 82 > > > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > > > @@ -684,12 +685,15 @@ struct kvm_xen_hvm_config { > > > #endif > > > > > > #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0) > > > +/* Availabie with KVM_CAP_IRQFD_OADN */ > > > > Need to also explain what it is. > > Beyond Documentation/virtual/kvm/api.txt? I don't see much else getting > documented here. Or maybe you mean > > /* On Ack, De-assert & Notify */ yes > > > +#define KVM_IRQFD_FLAG_OADN (1 << 1) > > > > > > struct kvm_irqfd { > > > __u32 fd; > > > __u32 gsi; > > > __u32 flags; > > > - __u8 pad[20]; > > > + __u32 notifyfd; > > > > Document that this is only valid with OADN flag. Might be a good idea > > to rename this to deassert_on_ack_notifyfd or oadn_notifyfd > > to avoid confusion. > > I'll add a /* only valid with KVM_IRQFD_FLAG_OADN */ > > I can change the name if you prefer, but it seems pretty clear to me how > a notifyfd might relate to a "On Ack, De-assert & Notify" irqfd without > pulling longer names into userspace. Ideally you can figure out stuff without reading docs. So eschew abbreviations, make it as long and clear as possible is my advise. But it is up to Avi. > > > + __u8 pad[16]; > > > }; > > > > > > struct kvm_clock_data { > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > > index b763230..d502d08 100644 > > > --- a/include/linux/kvm_host.h > > > +++ b/include/linux/kvm_host.h > > > @@ -284,6 +284,7 @@ struct kvm { > > > struct { > > > spinlock_t lock; > > > struct list_head items; > > > + struct list_head oadns; > > > } irqfds; > > > struct list_head ioeventfds; > > > #endif > > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > > > index 2245cfa..dfdb5b2 100644 > > > --- a/virt/kvm/eventfd.c > > > +++ b/virt/kvm/eventfd.c > > > @@ -43,6 +43,23 @@ > > > * -------------------------------------------------------------------- > > > */ > > > > > > +/* > > > + * OADN irqfds (On Ack, De-assert & Notify) are a special variety of > > > + * irqfds that assert an interrupt to the irqchip on eventfd trigger, > > > + * receieve notification when userspace acknowledges the interrupt, > > > + * automatically de-asserts the irqchip level, and notifies userspace > > > + * via the oadn_eventfd. This object helps to provide one-to-many > > > + * deassert-to-notify so we can share a single irq source ID per OADN. > > > + */ > > > +struct _irqfd_oadn { > > > + struct kvm *kvm; > > > + int irq_source_id; /* IRQ source ID shared by these irqfds */ > > > + struct list_head irqfds; /* list of irqfds using this object */ > > > + struct kvm_irq_ack_notifier notifier; /* IRQ ACK notification */ > > > + struct kref kref; /* Race-free removal */ > > > + struct list_head list; > > > +}; > > > + > > > struct _irqfd { > > > /* Used for MSI fast-path */ > > > struct kvm *kvm; > > > @@ -52,6 +69,10 @@ struct _irqfd { > > > /* Used for level IRQ fast-path */ > > > int gsi; > > > struct work_struct inject; > > > + /* Used for OADN (On Ack, De-assert & Notify) path */ > > > + struct eventfd_ctx *oadn_eventfd; > > > + struct list_head oadn_list; > > > > Could you document RCU and locking rules for this field please? > > Sure > > > > + struct _irqfd_oadn *oadn; > > > /* Used for setup/shutdown */ > > > struct eventfd_ctx *eventfd; > > > struct list_head list; > > > @@ -71,6 +92,51 @@ irqfd_inject(struct work_struct *work) > > > kvm_set_irq(kvm, KVM_IRQFD_IRQ_SOURCE_ID, irqfd->gsi, 0); > > > } > > > > > > +static void > > > +irqfd_oadn_inject(struct work_struct *work) > > > +{ > > > + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); > > > + > > > + kvm_set_irq(irqfd->kvm, irqfd->oadn->irq_source_id, irqfd->gsi, 1); > > > +} > > > + > > > +/* > > > + * Since OADN irqfds share an IRQ source ID, we de-assert once then > > > + * notify all of the OADN irqfds using this GSI. We can't do multiple > > > + * de-asserts or we risk racing with incoming re-asserts. > > > + */ > > > +static void > > > +irqfd_oadn_notify(struct kvm_irq_ack_notifier *kian) > > > +{ > > > + struct _irqfd_oadn *oadn; > > > + struct _irqfd *irqfd; > > > + > > > + oadn = container_of(kian, struct _irqfd_oadn, notifier); > > > + > > > + rcu_read_lock(); > > > + > > > + kvm_set_irq(oadn->kvm, oadn->irq_source_id, > > > + oadn->notifier.gsi, 0); > > > + > > > + list_for_each_entry_rcu(irqfd, &oadn->irqfds, oadn_list) > > > + eventfd_signal(irqfd->oadn_eventfd, 1); > > > + > > > + rcu_read_unlock(); > > > +} > > > + > > > +/* > > > + * Assumes kvm->irqfds.lock is held. Remove from the list of OADNs, new > > > + * users will allocate their own, letting us de-assert this GSI on free, > > > + * after the RCU grace period. > > > + */ > > > +static void > > > +irqfd_oadn_release(struct kref *kref) > > > +{ > > > + struct _irqfd_oadn *oadn = container_of(kref, struct _irqfd_oadn, kref); > > > + > > > + list_del(&oadn->list); > > > +} > > > + > > > /* > > > * Race-free decouple logic (ordering is critical) > > > */ > > > @@ -93,6 +159,35 @@ irqfd_shutdown(struct work_struct *work) > > > flush_work_sync(&irqfd->inject); > > > > > > /* > > > + * OADN irqfds use an RCU list for lockless de-assert and user > > > + * notification. Modify the list using RCU under lock and release > > > + * the OADN object. Since we later free this irqfd, we must wait > > > + * for an RCU grace period. If the OADN was released, we can then > > > + * unregister the irq ack notifier, free the irq source ID (assuring > > > + * that the GSI is de-asserted), and free the object. > > > + */ > > > + if (irqfd->oadn) { > > > + struct _irqfd_oadn *oadn = irqfd->oadn; > > > + struct kvm *kvm = oadn->kvm; > > > + bool released; > > > + > > > + spin_lock_irq(&irqfd->kvm->irqfds.lock); > > > + list_del_rcu(&irqfd->oadn_list); > > > + released = kref_put(&oadn->kref, irqfd_oadn_release); > > > + spin_unlock_irq(&irqfd->kvm->irqfds.lock); > > > + > > > + synchronize_rcu(); > > > + > > > + if (released) { > > > + kvm_unregister_irq_ack_notifier(kvm, &oadn->notifier); > > > + kvm_free_irq_source_id(kvm, oadn->irq_source_id); > > > + kfree(oadn); > > > + } > > > + > > > + eventfd_ctx_put(irqfd->oadn_eventfd); > > > + } > > > + > > > + /* > > > * It is now safe to release the object's resources > > > */ > > > eventfd_ctx_put(irqfd->eventfd); > > > @@ -202,8 +297,9 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) > > > { > > > struct kvm_irq_routing_table *irq_rt; > > > struct _irqfd *irqfd, *tmp; > > > + struct _irqfd_oadn *oadn = NULL; > > > struct file *file = NULL; > > > - struct eventfd_ctx *eventfd = NULL; > > > + struct eventfd_ctx *eventfd = NULL, *oadn_eventfd = NULL; > > > int ret; > > > unsigned int events; > > > > > > @@ -214,7 +310,49 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) > > > irqfd->kvm = kvm; > > > irqfd->gsi = args->gsi; > > > INIT_LIST_HEAD(&irqfd->list); > > > - INIT_WORK(&irqfd->inject, irqfd_inject); > > > + INIT_LIST_HEAD(&irqfd->oadn_list); > > > + > > > + if (args->flags & KVM_IRQFD_FLAG_OADN) { > > > + oadn_eventfd = eventfd_ctx_fdget(args->notifyfd); > > > + if (IS_ERR(oadn_eventfd)) { > > > + ret = PTR_ERR(oadn_eventfd); > > > + goto fail; > > > + } > > > + > > > + irqfd->oadn_eventfd = oadn_eventfd; > > > + > > > + /* > > > + * We may be able to share an existing OADN, but we won't > > > + * know that until we search under spinlock. We can't get > > > + * an irq_source_id under spinlock, and we'd prefer not to > > > + * do an atomic allocation, so prep an object here and free > > > + * it if we don't need it. > > > + */ > > > > So if everyone uses same source ID this code will be simpler too? > > No, everyone using the same source ID results in racy cleanup that I > can't figure out how to avoid while maintaining a lockless > de-assert/notify path. See patch 0/2 for details of the race. > > > > + oadn = kzalloc(sizeof(*oadn), GFP_KERNEL); > > > + if (!oadn) { > > > + ret = -ENOMEM; > > > + goto fail; > > > + } > > > + > > > + oadn->kvm = kvm; > > > + INIT_LIST_HEAD(&oadn->irqfds); > > > + oadn->notifier.gsi = irqfd->gsi; > > > + oadn->notifier.irq_acked = irqfd_oadn_notify; > > > + kref_init(&oadn->kref); > > > + INIT_LIST_HEAD(&oadn->list); > > > + > > > + oadn->irq_source_id = kvm_request_irq_source_id(kvm); > > > + if (oadn->irq_source_id < 0) { > > > + ret = oadn->irq_source_id; > > > + goto fail; > > > + } > > > + > > > > This still has the problem that by allocating these > > OADN irqfds you can run out of source IDs for other uses. > > Why don't OADNs just use KVM_IRQFD_IRQ_SOURCE_ID? > > I thought this is why it was introduced in this patchset ... > > See patch 0/2, it's racy. We have 64 IRQ source IDs and 24 GSIs > currently. This let's all users on the same GSI share an IRQ source ID. > Yes, we hold onto one we may not need during setup, but locking doesn't > allow us to allocate it at the point where we'd need it. > > > > + irqfd->oadn = oadn; > > > + > > > + INIT_WORK(&irqfd->inject, irqfd_oadn_inject); > > > + } else > > > + INIT_WORK(&irqfd->inject, irqfd_inject); > > > + > > > INIT_WORK(&irqfd->shutdown, irqfd_shutdown); > > > > > > file = eventfd_fget(args->fd); > > > @@ -250,6 +388,27 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) > > > goto fail; > > > } > > > > > > + /* > > > + * When configuring an OADN irqfd, try to re-use an existing OADN. > > > + */ > > > + if (oadn) { > > > + struct _irqfd_oadn *oadn_tmp; > > > + > > > + list_for_each_entry(oadn_tmp, &kvm->irqfds.oadns, list) { > > > + if (oadn_tmp->notifier.gsi == irqfd->gsi) { > > > + irqfd->oadn = oadn_tmp; > > > + break; > > > + } > > > + } > > > + > > > + if (irqfd->oadn != oadn) > > > + kref_get(&irqfd->oadn->kref); > > > + else > > > + list_add(&oadn->list, &kvm->irqfds.oadns); > > > + > > > + list_add_rcu(&irqfd->oadn_list, &irqfd->oadn->irqfds); > > > > Hang on, don't we need irqfds_lock here? > > We're in kvm->irqfds.lock. so we are my bad > > > + } > > > + > > > irq_rt = rcu_dereference_protected(kvm->irq_routing, > > > lockdep_is_held(&kvm->irqfds.lock)); > > > irqfd_update(kvm, irqfd, irq_rt); > > > @@ -267,6 +426,26 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) > > > > > > spin_unlock_irq(&kvm->irqfds.lock); > > > > > > + if (oadn) { > > > + synchronize_rcu(); > > > > Seems unexpected on assign. > > What does this synchronize_rcu do? > > Because we force the notify below and we need the irqfd we just added to > the list to be visible on the list for that. I do not see it yet - maybe add a comment? > > > + > > > + /* > > > + * If we weren't able to re-use an OADN, setup the irq ack > > > + * notifier outside of spinlock. Our interface requires > > > + * users to be able to handle spurious de-assert & notifies, > > > + * so trigger one here to make sure we didn't miss anything. > > > + * Cleanup unused OADN if we share. > > > + */ > > > + if (irqfd->oadn == oadn) > > > + kvm_register_irq_ack_notifier(kvm, &oadn->notifier); > > > + else { > > > + kvm_free_irq_source_id(kvm, oadn->irq_source_id); > > > > This is unfortunate - so we need a spare source ID for internal use. > > Thus if 1 source ID is available, then it is possible that > > > > - allocate irqfd for A > > - allocate irqfd for B > > > > succeeds because A uses a shared source id so it is freed here. > > > > While > > > > - allocate irqfd for B > > - allocate irqfd for A > > > > fails because B used the last ID and now temporary allocation > > above fails. > > Yep, suggestions? How many KVM_IRQFDs can we expect in-flight at one > time vs how many IRQ source IDs will be used long term. VFIO assigned > PCI device can only make use of up to 4 GSIs per root bridge, we only > have 24 IOAPIC pins. > > > > + kfree(oadn); > > > + } > > > + > > > + irqfd_oadn_notify(&irqfd->oadn->notifier); > > > + } > > > + > > > /* > > > * do not drop the file until the irqfd is fully initialized, otherwise > > > * we might race against the POLLHUP > > > @@ -276,12 +455,19 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) > > > return 0; > > > > > > fail: > > > + if (oadn_eventfd && !IS_ERR(oadn_eventfd)) > > > + eventfd_ctx_put(oadn_eventfd); > > > + > > > + if (oadn && oadn->irq_source_id >= 0) > > > + kvm_free_irq_source_id(kvm, oadn->irq_source_id); > > > + > > > > A bit cleaner to have distinct labels for failures > > at different stages of the setup. > > I know it's not written this way ATM so it's not a must > > to address this. > > I often find myself getting lost trying to use distinct labels. I'm > just following along what this function already does, but find it to be > a nice simplification to handle everything this way. Thanks, > > Alex > > > > if (eventfd && !IS_ERR(eventfd)) > > > eventfd_ctx_put(eventfd); > > > > > > if (!IS_ERR(file)) > > > fput(file); > > > > > > + kfree(oadn); > > > kfree(irqfd); > > > return ret; > > > } > > > @@ -291,6 +477,7 @@ kvm_eventfd_init(struct kvm *kvm) > > > { > > > spin_lock_init(&kvm->irqfds.lock); > > > INIT_LIST_HEAD(&kvm->irqfds.items); > > > + INIT_LIST_HEAD(&kvm->irqfds.oadns); > > > INIT_LIST_HEAD(&kvm->ioeventfds); > > > } > > > > > > @@ -340,7 +527,7 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args) > > > int > > > kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args) > > > { > > > - if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN) > > > + if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN | KVM_IRQFD_FLAG_OADN)) > > > return -EINVAL; > > > > > > if (args->flags & KVM_IRQFD_FLAG_DEASSIGN) > > -- > > To unsubscribe from this list: send the line "unsubscribe kvm" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension 2012-08-22 0:14 ` Michael S. Tsirkin @ 2012-08-22 1:48 ` Alex Williamson 0 siblings, 0 replies; 35+ messages in thread From: Alex Williamson @ 2012-08-22 1:48 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel On Wed, 2012-08-22 at 03:14 +0300, Michael S. Tsirkin wrote: > On Tue, Aug 21, 2012 at 03:11:41PM -0600, Alex Williamson wrote: > > On Tue, 2012-08-21 at 23:37 +0300, Michael S. Tsirkin wrote: > > > On Tue, Aug 21, 2012 at 01:29:14PM -0600, Alex Williamson wrote: > > > > + } > > > > + > > > > irq_rt = rcu_dereference_protected(kvm->irq_routing, > > > > lockdep_is_held(&kvm->irqfds.lock)); > > > > irqfd_update(kvm, irqfd, irq_rt); > > > > @@ -267,6 +426,26 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) > > > > > > > > spin_unlock_irq(&kvm->irqfds.lock); > > > > > > > > + if (oadn) { > > > > + synchronize_rcu(); > > > > > > Seems unexpected on assign. > > > What does this synchronize_rcu do? > > > > Because we force the notify below and we need the irqfd we just added to > > the list to be visible on the list for that. > > I do not see it yet - maybe add a comment? It's entirely possible I'm wrong too. This was an after thought and now that I look at it again, the below irqfd_oadn_notify obviously starts after the list_add, so there shouldn't be any need to wait. Thanks, Alex ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension 2012-08-21 19:29 ` [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension Alex Williamson 2012-08-21 20:37 ` Michael S. Tsirkin @ 2012-09-05 14:57 ` Avi Kivity 2012-09-17 16:13 ` Alex Williamson 1 sibling, 1 reply; 35+ messages in thread From: Avi Kivity @ 2012-09-05 14:57 UTC (permalink / raw) To: Alex Williamson; +Cc: mst, gleb, kvm, linux-kernel On 08/21/2012 10:29 PM, Alex Williamson wrote: > For VFIO based device assignment we'd like a mechanism to allow level > triggered interrutps to be directly injected into KVM. KVM_IRQFD > already allows this for edge triggered interrupts, but for level, we > need to watch for acknowledgement of the interrupt from the guest to > provide us a hint when to test the device and allow it to re-assert > if necessary. To do this, we create a new KVM_IRQFD mode called > "On Ack, De-assert & Notify", or OADN. In this mode, an interrupt > injection provides only a gsi assertion. We then hook into the IRQ > ACK notifier, which when triggered de-asserts the gsi and notifies > via another eventfd. It's then the responsibility of the user to > re-assert the interrupt is service is still required. > > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt > index bf33aaa..87d7321 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -1946,6 +1946,19 @@ the guest using the specified gsi pin. The irqfd is removed using > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd > and kvm_irqfd.gsi. > > +With KVM_CAP_IRQFD_OADN, KVM_IRQFD supports an "On Ack, De-assert & > +Notify" option that allows emulation of level-triggered interrupts. > +When kvm_irqfd.fd is triggered, the requested gsi is asserted and > +remains asserted until interaction with the irqchip indicates the > +VM has acknowledged the interrupt, such as an EOI. On acknoledgement > +the gsi is automatically de-asserted and the user is notified via > +kvm_irqfd.notifyfd. The user is then required to re-assert the > +interrupt if the associated device still requires service. To enable > +this mode, configure the KVM_IRQFD using the KVM_IRQFD_FLAG_OADN flag > +and specify kvm_irqfd.notifyfd. Note that closing kvm_irqfd.notifyfd > +while configured in this mode does not disable the irqfd. The > +KVM_IRQFD_FLAG_OADN flag is only necessary on assignment. Under my suggested naming, this would be called a "resampling irqfd", with resampling requested via kvm_irqfd.resamplefd. > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > index 2245cfa..dfdb5b2 100644 > --- a/virt/kvm/eventfd.c > +++ b/virt/kvm/eventfd.c > @@ -43,6 +43,23 @@ > * -------------------------------------------------------------------- > */ > > +/* > + * OADN irqfds (On Ack, De-assert & Notify) are a special variety of > + * irqfds that assert an interrupt to the irqchip on eventfd trigger, > + * receieve notification when userspace acknowledges the interrupt, > + * automatically de-asserts the irqchip level, and notifies userspace > + * via the oadn_eventfd. This object helps to provide one-to-many > + * deassert-to-notify so we can share a single irq source ID per OADN. > + */ > +struct _irqfd_oadn { > + struct kvm *kvm; > + int irq_source_id; /* IRQ source ID shared by these irqfds */ > + struct list_head irqfds; /* list of irqfds using this object */ > + struct kvm_irq_ack_notifier notifier; /* IRQ ACK notification */ > + struct kref kref; /* Race-free removal */ > + struct list_head list; > +}; Why do you need per-gsi irq source IDs? irq source ids only matter within a gsi. For example KVM_IRQ_LINE shares one source ID for all lines (with result that userspace is forced to manage the ORing of shared inputs itself). -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension 2012-09-05 14:57 ` Avi Kivity @ 2012-09-17 16:13 ` Alex Williamson 0 siblings, 0 replies; 35+ messages in thread From: Alex Williamson @ 2012-09-17 16:13 UTC (permalink / raw) To: Avi Kivity; +Cc: mst, gleb, kvm, linux-kernel On Wed, 2012-09-05 at 17:57 +0300, Avi Kivity wrote: > On 08/21/2012 10:29 PM, Alex Williamson wrote: > > For VFIO based device assignment we'd like a mechanism to allow level > > triggered interrutps to be directly injected into KVM. KVM_IRQFD > > already allows this for edge triggered interrupts, but for level, we > > need to watch for acknowledgement of the interrupt from the guest to > > provide us a hint when to test the device and allow it to re-assert > > if necessary. To do this, we create a new KVM_IRQFD mode called > > "On Ack, De-assert & Notify", or OADN. In this mode, an interrupt > > injection provides only a gsi assertion. We then hook into the IRQ > > ACK notifier, which when triggered de-asserts the gsi and notifies > > via another eventfd. It's then the responsibility of the user to > > re-assert the interrupt is service is still required. > > > > > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt > > index bf33aaa..87d7321 100644 > > --- a/Documentation/virtual/kvm/api.txt > > +++ b/Documentation/virtual/kvm/api.txt > > @@ -1946,6 +1946,19 @@ the guest using the specified gsi pin. The irqfd is removed using > > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd > > and kvm_irqfd.gsi. > > > > +With KVM_CAP_IRQFD_OADN, KVM_IRQFD supports an "On Ack, De-assert & > > +Notify" option that allows emulation of level-triggered interrupts. > > +When kvm_irqfd.fd is triggered, the requested gsi is asserted and > > +remains asserted until interaction with the irqchip indicates the > > +VM has acknowledged the interrupt, such as an EOI. On acknoledgement > > +the gsi is automatically de-asserted and the user is notified via > > +kvm_irqfd.notifyfd. The user is then required to re-assert the > > +interrupt if the associated device still requires service. To enable > > +this mode, configure the KVM_IRQFD using the KVM_IRQFD_FLAG_OADN flag > > +and specify kvm_irqfd.notifyfd. Note that closing kvm_irqfd.notifyfd > > +while configured in this mode does not disable the irqfd. The > > +KVM_IRQFD_FLAG_OADN flag is only necessary on assignment. > > Under my suggested naming, this would be called a "resampling irqfd", > with resampling requested via kvm_irqfd.resamplefd. > > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > > index 2245cfa..dfdb5b2 100644 > > --- a/virt/kvm/eventfd.c > > +++ b/virt/kvm/eventfd.c > > @@ -43,6 +43,23 @@ > > * -------------------------------------------------------------------- > > */ > > > > +/* > > + * OADN irqfds (On Ack, De-assert & Notify) are a special variety of > > + * irqfds that assert an interrupt to the irqchip on eventfd trigger, > > + * receieve notification when userspace acknowledges the interrupt, > > + * automatically de-asserts the irqchip level, and notifies userspace > > + * via the oadn_eventfd. This object helps to provide one-to-many > > + * deassert-to-notify so we can share a single irq source ID per OADN. > > + */ > > +struct _irqfd_oadn { > > + struct kvm *kvm; > > + int irq_source_id; /* IRQ source ID shared by these irqfds */ > > + struct list_head irqfds; /* list of irqfds using this object */ > > + struct kvm_irq_ack_notifier notifier; /* IRQ ACK notification */ > > + struct kref kref; /* Race-free removal */ > > + struct list_head list; > > +}; > > > Why do you need per-gsi irq source IDs? irq source ids only matter > within a gsi. For example KVM_IRQ_LINE shares one source ID for all > lines (with result that userspace is forced to manage the ORing of > shared inputs itself). Right, but locking makes it difficult to tear down a resample irqfd without potentially racing creation of a new one, which I tried to explain here: http://www.spinics.net/lists/kvm/msg78460.html This can cause a de-assert w/o ack as we briefly have multiple resample irqfds on the same gsi, irq source id pair. That can dead lock a vfio device. Using a new irq source ID ensures that the old resample irqfd doesn't interfere with the new one. We count on the final clear or the gsi assertion when releasing the irq source id, so we can't share it among other resample irqfds on other gsis with different life cycles. Michael has suggested re-architecting the locking around some structure, but I'm not sure it's worth it. AFAICT we have more irq source IDs than we could consume if resample irqfds on the same gsi share an irq source id. Thanks, Alex ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 0/2] kvm: level irqfd support 2012-08-21 19:28 [PATCH v9 0/2] kvm: level irqfd support Alex Williamson 2012-08-21 19:29 ` [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd Alex Williamson 2012-08-21 19:29 ` [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension Alex Williamson @ 2012-08-22 0:31 ` Michael S. Tsirkin 2012-08-22 1:28 ` Alex Williamson 2012-08-22 10:10 ` Michael S. Tsirkin 2012-09-05 14:42 ` Avi Kivity 4 siblings, 1 reply; 35+ messages in thread From: Michael S. Tsirkin @ 2012-08-22 0:31 UTC (permalink / raw) To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote: > Here's the much anticipated re-write of support for level irqfds. As > Michael suggested, I've rolled the eoi/ack notification fd into > KVM_IRQFD as a new mode. For lack of a better name, as there seems to > be objections to associating this specifically with an EOI or an ACK, > I've name this OADN or "On Ack, De-assert & Notify". > > Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID > since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID. > Unfurtunately I was not able to make 2of2 use a single IRQ source ID, > the reason is it's racy. Objects to track OADNs are made dynamically, > we look through existing ones for a match under spinlock and setup a > new one if there's no match. On teardown, we can remove the OADN from > the list under lock, but that same lock prevents us from de-assigning > the IRQ ACK notifier or waiting for an RCU grace period. We must make > sure that any unused GSI is de-asserted, but the above means it's > possible that another OADN has been created for this source ID/GSI > and de-asserting the GSI could lead to breakage. I do not see it. What breakage? Could you give an example please? I think what you are saying is last deassign must clear since otherwise we never will clear. I agree it is either that or delay deassign until ack. Can it be as simple as this (after all rcu etc dances)? lock irqfds if no oadns set level to 0 unlock irqfds ? > Instead each OADN > object gets it's own source ID, but these are all shared by users > of the same GSI. So for PCI devices, we might have up to 4 IRQ > source IDs allocated. > > Michael had also suggested avoiding reference counting and using > list_empty for this OADN object. Unfortunately, that doesn't work > for similar reasons. We want to release the OADN object underlock, > preventing others from re-using it on the free path, but in order > to have lock-less de-assert & notify we use RCU, meaning we can't > trust list_empty until after an RCU grace period, which must be > done outside of spinlocks. confused. list empty on assign/deassing would be under lock so no need for grace periods to trust it. what am I missing? But if you like kref more that is OK too. > If there are suggestions how we can handle these better, please > make them, but I think this compromise is race-free and still > manages to make allocation of IRQ source IDs mostly a non-issue > for device assignment limits. Thanks, > > Alex > > --- > > Alex Williamson (2): > kvm: On Ack, De-assert & Notify KVM_IRQFD extension > kvm: Use a reserved IRQ source ID for irqfd > > > Documentation/virtual/kvm/api.txt | 13 ++ > arch/x86/kvm/x86.c | 4 + > include/linux/kvm.h | 7 + > include/linux/kvm_host.h | 2 > virt/kvm/eventfd.c | 199 ++++++++++++++++++++++++++++++++++++- > 5 files changed, 218 insertions(+), 7 deletions(-) ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 0/2] kvm: level irqfd support 2012-08-22 0:31 ` [PATCH v9 0/2] kvm: level irqfd support Michael S. Tsirkin @ 2012-08-22 1:28 ` Alex Williamson 2012-08-22 8:25 ` Michael S. Tsirkin 0 siblings, 1 reply; 35+ messages in thread From: Alex Williamson @ 2012-08-22 1:28 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel On Wed, 2012-08-22 at 03:31 +0300, Michael S. Tsirkin wrote: > On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote: > > Here's the much anticipated re-write of support for level irqfds. As > > Michael suggested, I've rolled the eoi/ack notification fd into > > KVM_IRQFD as a new mode. For lack of a better name, as there seems to > > be objections to associating this specifically with an EOI or an ACK, > > I've name this OADN or "On Ack, De-assert & Notify". > > > > Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID > > since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID. > > Unfurtunately I was not able to make 2of2 use a single IRQ source ID, > > the reason is it's racy. Objects to track OADNs are made dynamically, > > we look through existing ones for a match under spinlock and setup a > > new one if there's no match. On teardown, we can remove the OADN from > > the list under lock, but that same lock prevents us from de-assigning > > the IRQ ACK notifier or waiting for an RCU grace period. We must make > > sure that any unused GSI is de-asserted, but the above means it's > > possible that another OADN has been created for this source ID/GSI > > and de-asserting the GSI could lead to breakage. > > I do not see it. What breakage? Could you give an example please? > > > I think what you are saying is last deassign must clear > since otherwise we never will clear. > I agree it is either that or delay deassign until ack. > > Can it be as simple as this (after all rcu etc dances)? > lock irqfds > if no oadns > set level to 0 > unlock irqfds > ? lock irqfds remove irqfd from oadn list if no oadns remove oadn set gsi 0 unlock lock irqfds new oadn unlock irqfds >> EOI ack notify new oadn de-assert gsi notify new oadn >> re-assert irqfd ack notify old oadn de-assert gsi notify old oadn synchronize_rcu kvm_unregister_irq_ack_notifier So, because the unregister is removed from the final clear and because we share an IRQ source ID there's a window where we can have two oadns registered for the same GSI. The new one will de-assert and notify while the old one has an empty list to notify, but still de-asserts. We can therefore de-assert w/o notify. By using a new source ID, we separate the two so users of the new oadn can't race the old and we can cleanly free the old source ID, de-asserting it. > > Instead each OADN > > object gets it's own source ID, but these are all shared by users > > of the same GSI. So for PCI devices, we might have up to 4 IRQ > > source IDs allocated. > > > > Michael had also suggested avoiding reference counting and using > > list_empty for this OADN object. Unfortunately, that doesn't work > > for similar reasons. We want to release the OADN object underlock, > > preventing others from re-using it on the free path, but in order > > to have lock-less de-assert & notify we use RCU, meaning we can't > > trust list_empty until after an RCU grace period, which must be > > done outside of spinlocks. > > confused. list empty on assign/deassing would be under lock > so no need for grace periods to trust it. > what am I missing? > > But if you like kref more that is OK too. Maybe I'm misinterpreting this: include/linux/rculist.h: /** * list_del_rcu - deletes entry from list without re-initialization * @entry: the element to delete from the list. * * Note: list_empty() on entry does not return true after this, * the entry is in an undefined state. It is useful for RCU based * lockfree traversal. If I can trust list_empty on oadn->irqfds, which maybe I can re-reading it again, then we can drop the kref. Thanks, Alex > > If there are suggestions how we can handle these better, please > > make them, but I think this compromise is race-free and still > > manages to make allocation of IRQ source IDs mostly a non-issue > > for device assignment limits. Thanks, > > > > Alex > > > > --- > > > > Alex Williamson (2): > > kvm: On Ack, De-assert & Notify KVM_IRQFD extension > > kvm: Use a reserved IRQ source ID for irqfd > > > > > > Documentation/virtual/kvm/api.txt | 13 ++ > > arch/x86/kvm/x86.c | 4 + > > include/linux/kvm.h | 7 + > > include/linux/kvm_host.h | 2 > > virt/kvm/eventfd.c | 199 ++++++++++++++++++++++++++++++++++++- > > 5 files changed, 218 insertions(+), 7 deletions(-) ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 0/2] kvm: level irqfd support 2012-08-22 1:28 ` Alex Williamson @ 2012-08-22 8:25 ` Michael S. Tsirkin 2012-09-17 18:08 ` Alex Williamson 0 siblings, 1 reply; 35+ messages in thread From: Michael S. Tsirkin @ 2012-08-22 8:25 UTC (permalink / raw) To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel On Tue, Aug 21, 2012 at 07:28:15PM -0600, Alex Williamson wrote: > On Wed, 2012-08-22 at 03:31 +0300, Michael S. Tsirkin wrote: > > On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote: > > > Here's the much anticipated re-write of support for level irqfds. As > > > Michael suggested, I've rolled the eoi/ack notification fd into > > > KVM_IRQFD as a new mode. For lack of a better name, as there seems to > > > be objections to associating this specifically with an EOI or an ACK, > > > I've name this OADN or "On Ack, De-assert & Notify". > > > > > > Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID > > > since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID. > > > Unfurtunately I was not able to make 2of2 use a single IRQ source ID, > > > the reason is it's racy. Objects to track OADNs are made dynamically, > > > we look through existing ones for a match under spinlock and setup a > > > new one if there's no match. On teardown, we can remove the OADN from > > > the list under lock, but that same lock prevents us from de-assigning > > > the IRQ ACK notifier or waiting for an RCU grace period. We must make > > > sure that any unused GSI is de-asserted, but the above means it's > > > possible that another OADN has been created for this source ID/GSI > > > and de-asserting the GSI could lead to breakage. > > > > I do not see it. What breakage? Could you give an example please? > > > > > > I think what you are saying is last deassign must clear > > since otherwise we never will clear. > > I agree it is either that or delay deassign until ack. > > > > Can it be as simple as this (after all rcu etc dances)? > > lock irqfds > > if no oadns > > set level to 0 > > unlock irqfds > > ? > > lock irqfds > remove irqfd from oadn list > if no oadns > remove oadn > set gsi 0 > unlock > lock irqfds > new oadn > unlock irqfds > > >> EOI > ack notify new oadn > de-assert gsi > notify new oadn > >> re-assert irqfd > ack notify old oadn > de-assert gsi > notify old oadn > > synchronize_rcu > > kvm_unregister_irq_ack_notifier > > So, because the unregister is removed from the final clear and because > we share an IRQ source ID there's a window where we can have two oadns > registered for the same GSI. The new one will de-assert and notify > while the old one has an empty list to notify, but still de-asserts. We > can therefore de-assert w/o notify. > > By using a new source ID, we separate the two so users of the new oadn > can't race the old and we can cleanly free the old source ID, > de-asserting it. Need to think about it some more but is the problem two ack notifiers for the same gsi? In that case, how about we add __kvm_unregister_irq_ack_notifier with no locking, and do most of the above under kvm->irq_lock? With one change: it is better not to call synchronize_rcu under irq lock, I think we can safely move it to after __kvm_unregister_irq_ack_notifier. > > > Instead each OADN > > > object gets it's own source ID, but these are all shared by users > > > of the same GSI. So for PCI devices, we might have up to 4 IRQ > > > source IDs allocated. > > > > > > Michael had also suggested avoiding reference counting and using > > > list_empty for this OADN object. Unfortunately, that doesn't work > > > for similar reasons. We want to release the OADN object underlock, > > > preventing others from re-using it on the free path, but in order > > > to have lock-less de-assert & notify we use RCU, meaning we can't > > > trust list_empty until after an RCU grace period, which must be > > > done outside of spinlocks. > > > > confused. list empty on assign/deassing would be under lock > > so no need for grace periods to trust it. > > what am I missing? > > > > But if you like kref more that is OK too. > > Maybe I'm misinterpreting this: > > include/linux/rculist.h: > /** > * list_del_rcu - deletes entry from list without re-initialization > * @entry: the element to delete from the list. > * > * Note: list_empty() on entry does not return true after this, > * the entry is in an undefined state. It is useful for RCU based > * lockfree traversal. > > If I can trust list_empty on oadn->irqfds, which maybe I can re-reading > it again, then we can drop the kref. Thanks, > > Alex I think you are - *the entry you deleted* is not empty. The list itself naturally is, or so it seems from code. static inline void list_del_rcu(struct list_head *entry) { __list_del_entry(entry); entry->prev = LIST_POISON2; } No? > > > > If there are suggestions how we can handle these better, please > > > make them, but I think this compromise is race-free and still > > > manages to make allocation of IRQ source IDs mostly a non-issue > > > for device assignment limits. Thanks, > > > > > > Alex > > > > > > --- > > > > > > Alex Williamson (2): > > > kvm: On Ack, De-assert & Notify KVM_IRQFD extension > > > kvm: Use a reserved IRQ source ID for irqfd > > > > > > > > > Documentation/virtual/kvm/api.txt | 13 ++ > > > arch/x86/kvm/x86.c | 4 + > > > include/linux/kvm.h | 7 + > > > include/linux/kvm_host.h | 2 > > > virt/kvm/eventfd.c | 199 ++++++++++++++++++++++++++++++++++++- > > > 5 files changed, 218 insertions(+), 7 deletions(-) > > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 0/2] kvm: level irqfd support 2012-08-22 8:25 ` Michael S. Tsirkin @ 2012-09-17 18:08 ` Alex Williamson 0 siblings, 0 replies; 35+ messages in thread From: Alex Williamson @ 2012-09-17 18:08 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: avi, gleb, kvm, linux-kernel On Wed, 2012-08-22 at 11:25 +0300, Michael S. Tsirkin wrote: > On Tue, Aug 21, 2012 at 07:28:15PM -0600, Alex Williamson wrote: > > On Wed, 2012-08-22 at 03:31 +0300, Michael S. Tsirkin wrote: > > > On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote: > > > > Here's the much anticipated re-write of support for level irqfds. As > > > > Michael suggested, I've rolled the eoi/ack notification fd into > > > > KVM_IRQFD as a new mode. For lack of a better name, as there seems to > > > > be objections to associating this specifically with an EOI or an ACK, > > > > I've name this OADN or "On Ack, De-assert & Notify". > > > > > > > > Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID > > > > since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID. > > > > Unfurtunately I was not able to make 2of2 use a single IRQ source ID, > > > > the reason is it's racy. Objects to track OADNs are made dynamically, > > > > we look through existing ones for a match under spinlock and setup a > > > > new one if there's no match. On teardown, we can remove the OADN from > > > > the list under lock, but that same lock prevents us from de-assigning > > > > the IRQ ACK notifier or waiting for an RCU grace period. We must make > > > > sure that any unused GSI is de-asserted, but the above means it's > > > > possible that another OADN has been created for this source ID/GSI > > > > and de-asserting the GSI could lead to breakage. > > > > > > I do not see it. What breakage? Could you give an example please? > > > > > > > > > I think what you are saying is last deassign must clear > > > since otherwise we never will clear. > > > I agree it is either that or delay deassign until ack. > > > > > > Can it be as simple as this (after all rcu etc dances)? > > > lock irqfds > > > if no oadns > > > set level to 0 > > > unlock irqfds > > > ? > > > > lock irqfds > > remove irqfd from oadn list > > if no oadns > > remove oadn > > set gsi 0 > > unlock > > lock irqfds > > new oadn > > unlock irqfds > > > > >> EOI > > ack notify new oadn > > de-assert gsi > > notify new oadn > > >> re-assert irqfd > > ack notify old oadn > > de-assert gsi > > notify old oadn > > > > synchronize_rcu > > > > kvm_unregister_irq_ack_notifier > > > > So, because the unregister is removed from the final clear and because > > we share an IRQ source ID there's a window where we can have two oadns > > registered for the same GSI. The new one will de-assert and notify > > while the old one has an empty list to notify, but still de-asserts. We > > can therefore de-assert w/o notify. > > > > By using a new source ID, we separate the two so users of the new oadn > > can't race the old and we can cleanly free the old source ID, > > de-asserting it. > > Need to think about it some more but is the problem two > ack notifiers for the same gsi? yes > In that case, how about we add __kvm_unregister_irq_ack_notifier > with no locking, and do most of the above under > kvm->irq_lock? Converting locks makes me nervous, but I'll give it a shot. I don't know how easy/possible it is though. I know in previous iterations I tried to make something similar to irqfd use a mutex and couldn't, but I don't remember the details. > With one change: it is better not to call synchronize_rcu > under irq lock, I think we can safely move it to after > __kvm_unregister_irq_ack_notifier. Yep, that makes the interface pretty ugly though as we then have two separate, but dependent steps. Thanks, Alex ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 0/2] kvm: level irqfd support 2012-08-21 19:28 [PATCH v9 0/2] kvm: level irqfd support Alex Williamson ` (2 preceding siblings ...) 2012-08-22 0:31 ` [PATCH v9 0/2] kvm: level irqfd support Michael S. Tsirkin @ 2012-08-22 10:10 ` Michael S. Tsirkin 2012-09-05 14:42 ` Avi Kivity 4 siblings, 0 replies; 35+ messages in thread From: Michael S. Tsirkin @ 2012-08-22 10:10 UTC (permalink / raw) To: Alex Williamson; +Cc: avi, gleb, kvm, linux-kernel On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote: > Here's the much anticipated re-write of support for level irqfds. As > Michael suggested, I've rolled the eoi/ack notification fd into > KVM_IRQFD as a new mode. For lack of a better name, as there seems to > be objections to associating this specifically with an EOI or an ACK, > I've name this OADN or "On Ack, De-assert & Notify". Kid wouldn't let me sleep in the night so I've been thinking of better names :). I think a non abbreviated deassert_on_ack is a better name. Yes e.g. irqfd is an abbreviation too, but it is actually just a composition of irq and fd which are both standard, familiar abbreviations. If you have really set your heart on using an abbreviation, it is better to not put two vowels together. ADN (ack deasserts and notifies or auto deassert and notify) is a bit better in that respect, though it does look a bit like a miss-spelled AND which is unfortunate. But obviously, it's all not terribly important. > Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID > since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID. > Unfurtunately I was not able to make 2of2 use a single IRQ source ID, > the reason is it's racy. Objects to track OADNs are made dynamically, > we look through existing ones for a match under spinlock and setup a > new one if there's no match. On teardown, we can remove the OADN from > the list under lock, but that same lock prevents us from de-assigning > the IRQ ACK notifier or waiting for an RCU grace period. We must make > sure that any unused GSI is de-asserted, but the above means it's > possible that another OADN has been created for this source ID/GSI > and de-asserting the GSI could lead to breakage. Instead each OADN > object gets it's own source ID, but these are all shared by users > of the same GSI. So for PCI devices, we might have up to 4 IRQ > source IDs allocated. > > Michael had also suggested avoiding reference counting and using > list_empty for this OADN object. Unfortunately, that doesn't work > for similar reasons. We want to release the OADN object underlock, > preventing others from re-using it on the free path, but in order > to have lock-less de-assert & notify we use RCU, meaning we can't > trust list_empty until after an RCU grace period, which must be > done outside of spinlocks. > > If there are suggestions how we can handle these better, please > make them, but I think this compromise is race-free and still > manages to make allocation of IRQ source IDs mostly a non-issue > for device assignment limits. Thanks, > > Alex > > --- > > Alex Williamson (2): > kvm: On Ack, De-assert & Notify KVM_IRQFD extension > kvm: Use a reserved IRQ source ID for irqfd > > > Documentation/virtual/kvm/api.txt | 13 ++ > arch/x86/kvm/x86.c | 4 + > include/linux/kvm.h | 7 + > include/linux/kvm_host.h | 2 > virt/kvm/eventfd.c | 199 ++++++++++++++++++++++++++++++++++++- > 5 files changed, 218 insertions(+), 7 deletions(-) ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 0/2] kvm: level irqfd support 2012-08-21 19:28 [PATCH v9 0/2] kvm: level irqfd support Alex Williamson ` (3 preceding siblings ...) 2012-08-22 10:10 ` Michael S. Tsirkin @ 2012-09-05 14:42 ` Avi Kivity 2012-09-05 14:52 ` Michael S. Tsirkin 4 siblings, 1 reply; 35+ messages in thread From: Avi Kivity @ 2012-09-05 14:42 UTC (permalink / raw) To: Alex Williamson; +Cc: mst, gleb, kvm, linux-kernel On 08/21/2012 10:28 PM, Alex Williamson wrote: > Here's the much anticipated re-write of support for level irqfds. As > Michael suggested, I've rolled the eoi/ack notification fd into > KVM_IRQFD as a new mode. For lack of a better name, as there seems to > be objections to associating this specifically with an EOI or an ACK, > I've name this OADN or "On Ack, De-assert & Notify". Perhaps we can call it "resample", since this is what it is doing. irqfd tells kvm that the line as been asserted, resample tells the caller to check again. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v9 0/2] kvm: level irqfd support 2012-09-05 14:42 ` Avi Kivity @ 2012-09-05 14:52 ` Michael S. Tsirkin 0 siblings, 0 replies; 35+ messages in thread From: Michael S. Tsirkin @ 2012-09-05 14:52 UTC (permalink / raw) To: Avi Kivity; +Cc: Alex Williamson, gleb, kvm, linux-kernel On Wed, Sep 05, 2012 at 05:42:38PM +0300, Avi Kivity wrote: > On 08/21/2012 10:28 PM, Alex Williamson wrote: > > Here's the much anticipated re-write of support for level irqfds. As > > Michael suggested, I've rolled the eoi/ack notification fd into > > KVM_IRQFD as a new mode. For lack of a better name, as there seems to > > be objections to associating this specifically with an EOI or an ACK, > > I've name this OADN or "On Ack, De-assert & Notify". > > Perhaps we can call it "resample", since this is what it is doing. > irqfd tells kvm that the line as been asserted, resample tells the > caller to check again. Sounds good. > -- > error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2012-09-17 18:08 UTC | newest] Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-08-21 19:28 [PATCH v9 0/2] kvm: level irqfd support Alex Williamson 2012-08-21 19:29 ` [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd Alex Williamson 2012-08-21 19:58 ` Michael S. Tsirkin 2012-08-21 20:06 ` Alex Williamson 2012-08-21 20:41 ` Michael S. Tsirkin 2012-08-21 21:14 ` Alex Williamson 2012-08-22 0:41 ` Michael S. Tsirkin 2012-08-22 1:34 ` Alex Williamson 2012-09-05 14:35 ` Avi Kivity 2012-09-05 14:51 ` Michael S. Tsirkin 2012-09-05 14:59 ` Avi Kivity 2012-09-05 15:13 ` Michael S. Tsirkin 2012-09-05 15:22 ` Avi Kivity 2012-09-05 15:28 ` Michael S. Tsirkin 2012-09-05 15:35 ` Avi Kivity 2012-09-05 15:09 ` Michael S. Tsirkin 2012-09-05 15:12 ` Avi Kivity 2012-09-05 15:15 ` Michael S. Tsirkin 2012-09-05 14:46 ` Avi Kivity 2012-09-05 15:07 ` Michael S. Tsirkin 2012-09-05 15:15 ` Avi Kivity 2012-08-21 19:29 ` [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension Alex Williamson 2012-08-21 20:37 ` Michael S. Tsirkin 2012-08-21 21:11 ` Alex Williamson 2012-08-22 0:14 ` Michael S. Tsirkin 2012-08-22 1:48 ` Alex Williamson 2012-09-05 14:57 ` Avi Kivity 2012-09-17 16:13 ` Alex Williamson 2012-08-22 0:31 ` [PATCH v9 0/2] kvm: level irqfd support Michael S. Tsirkin 2012-08-22 1:28 ` Alex Williamson 2012-08-22 8:25 ` Michael S. Tsirkin 2012-09-17 18:08 ` Alex Williamson 2012-08-22 10:10 ` Michael S. Tsirkin 2012-09-05 14:42 ` Avi Kivity 2012-09-05 14:52 ` Michael S. Tsirkin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).