From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754921Ab2HVB2U (ORCPT ); Tue, 21 Aug 2012 21:28:20 -0400 Received: from mx1.redhat.com ([209.132.183.28]:2172 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752179Ab2HVB2Q (ORCPT ); Tue, 21 Aug 2012 21:28:16 -0400 Message-ID: <1345598895.29292.115.camel@ul30vt.home> Subject: Re: [PATCH v9 0/2] kvm: level irqfd support From: Alex Williamson To: "Michael S. Tsirkin" Cc: avi@redhat.com, gleb@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Date: Tue, 21 Aug 2012 19:28:15 -0600 In-Reply-To: <20120822003129.GK9027@redhat.com> References: <20120821190800.24958.74812.stgit@bling.home> <20120822003129.GK9027@redhat.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2012-08-22 at 03:31 +0300, Michael S. Tsirkin wrote: > On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote: > > Here's the much anticipated re-write of support for level irqfds. As > > Michael suggested, I've rolled the eoi/ack notification fd into > > KVM_IRQFD as a new mode. For lack of a better name, as there seems to > > be objections to associating this specifically with an EOI or an ACK, > > I've name this OADN or "On Ack, De-assert & Notify". > > > > Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID > > since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID. > > Unfurtunately I was not able to make 2of2 use a single IRQ source ID, > > the reason is it's racy. Objects to track OADNs are made dynamically, > > we look through existing ones for a match under spinlock and setup a > > new one if there's no match. On teardown, we can remove the OADN from > > the list under lock, but that same lock prevents us from de-assigning > > the IRQ ACK notifier or waiting for an RCU grace period. We must make > > sure that any unused GSI is de-asserted, but the above means it's > > possible that another OADN has been created for this source ID/GSI > > and de-asserting the GSI could lead to breakage. > > I do not see it. What breakage? Could you give an example please? > > > I think what you are saying is last deassign must clear > since otherwise we never will clear. > I agree it is either that or delay deassign until ack. > > Can it be as simple as this (after all rcu etc dances)? > lock irqfds > if no oadns > set level to 0 > unlock irqfds > ? lock irqfds remove irqfd from oadn list if no oadns remove oadn set gsi 0 unlock lock irqfds new oadn unlock irqfds >> EOI ack notify new oadn de-assert gsi notify new oadn >> re-assert irqfd ack notify old oadn de-assert gsi notify old oadn synchronize_rcu kvm_unregister_irq_ack_notifier So, because the unregister is removed from the final clear and because we share an IRQ source ID there's a window where we can have two oadns registered for the same GSI. The new one will de-assert and notify while the old one has an empty list to notify, but still de-asserts. We can therefore de-assert w/o notify. By using a new source ID, we separate the two so users of the new oadn can't race the old and we can cleanly free the old source ID, de-asserting it. > > Instead each OADN > > object gets it's own source ID, but these are all shared by users > > of the same GSI. So for PCI devices, we might have up to 4 IRQ > > source IDs allocated. > > > > Michael had also suggested avoiding reference counting and using > > list_empty for this OADN object. Unfortunately, that doesn't work > > for similar reasons. We want to release the OADN object underlock, > > preventing others from re-using it on the free path, but in order > > to have lock-less de-assert & notify we use RCU, meaning we can't > > trust list_empty until after an RCU grace period, which must be > > done outside of spinlocks. > > confused. list empty on assign/deassing would be under lock > so no need for grace periods to trust it. > what am I missing? > > But if you like kref more that is OK too. Maybe I'm misinterpreting this: include/linux/rculist.h: /** * list_del_rcu - deletes entry from list without re-initialization * @entry: the element to delete from the list. * * Note: list_empty() on entry does not return true after this, * the entry is in an undefined state. It is useful for RCU based * lockfree traversal. If I can trust list_empty on oadn->irqfds, which maybe I can re-reading it again, then we can drop the kref. Thanks, Alex > > If there are suggestions how we can handle these better, please > > make them, but I think this compromise is race-free and still > > manages to make allocation of IRQ source IDs mostly a non-issue > > for device assignment limits. Thanks, > > > > Alex > > > > --- > > > > Alex Williamson (2): > > kvm: On Ack, De-assert & Notify KVM_IRQFD extension > > kvm: Use a reserved IRQ source ID for irqfd > > > > > > Documentation/virtual/kvm/api.txt | 13 ++ > > arch/x86/kvm/x86.c | 4 + > > include/linux/kvm.h | 7 + > > include/linux/kvm_host.h | 2 > > virt/kvm/eventfd.c | 199 ++++++++++++++++++++++++++++++++++++- > > 5 files changed, 218 insertions(+), 7 deletions(-)