linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: avi@redhat.com, gleb@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v9 0/2] kvm: level irqfd support
Date: Tue, 21 Aug 2012 19:28:15 -0600	[thread overview]
Message-ID: <1345598895.29292.115.camel@ul30vt.home> (raw)
In-Reply-To: <20120822003129.GK9027@redhat.com>

On Wed, 2012-08-22 at 03:31 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote:
> > Here's the much anticipated re-write of support for level irqfds.  As
> > Michael suggested, I've rolled the eoi/ack notification fd into
> > KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
> > be objections to associating this specifically with an EOI or an ACK,
> > I've name this OADN or "On Ack, De-assert & Notify".
> > 
> > Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
> > since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
> > Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
> > the reason is it's racy.  Objects to track OADNs are made dynamically,
> > we look through existing ones for a match under spinlock and setup a
> > new one if there's no match.  On teardown, we can remove the OADN from
> > the list under lock, but that same lock prevents us from de-assigning
> > the IRQ ACK notifier or waiting for an RCU grace period.  We must make
> > sure that any unused GSI is de-asserted, but the above means it's
> > possible that another OADN has been created for this source ID/GSI
> > and de-asserting the GSI could lead to breakage.
> 
> I do not see it. What breakage? Could you give an example please?
> 
> 
> I think what you are saying is last deassign must clear
> since otherwise we never will clear.
> I agree it is either that or delay deassign until ack.
> 
> Can it be as simple as this (after all rcu etc dances)?
> 	lock irqfds
> 	if no oadns
> 		set level to 0
> 	unlock irqfds
> ?

lock irqfds
remove irqfd from oadn list
if no oadns
    remove oadn
    set gsi 0
unlock
                                    lock irqfds
                                    new oadn
                                    unlock irqfds

>> EOI 
                                                          ack notify new oadn
                                                          de-assert gsi
                                                          notify new oadn
>> re-assert irqfd
                                                          ack notify old oadn
                                                          de-assert gsi
                                                          notify old oadn

synchronize_rcu

kvm_unregister_irq_ack_notifier

So, because the unregister is removed from the final clear and because
we share an IRQ source ID there's a window where we can have two oadns
registered for the same GSI.  The new one will de-assert and notify
while the old one has an empty list to notify, but still de-asserts.  We
can therefore de-assert w/o notify.

By using a new source ID, we separate the two so users of the new oadn
can't race the old and we can cleanly free the old source ID,
de-asserting it.

> >  Instead each OADN
> > object gets it's own source ID, but these are all shared by users
> > of the same GSI.  So for PCI devices, we might have up to 4 IRQ
> > source IDs allocated.
> > 
> > Michael had also suggested avoiding reference counting and using
> > list_empty for this OADN object.  Unfortunately, that doesn't work
> > for similar reasons.  We want to release the OADN object underlock,
> > preventing others from re-using it on the free path, but in order
> > to have lock-less de-assert & notify we use RCU, meaning we can't
> > trust list_empty until after an RCU grace period, which must be
> > done outside of spinlocks.
> 
> confused. list empty on assign/deassing would be under lock
> so no need for grace periods to trust it.
> what am I missing?
> 
> But if you like kref more that is OK too.

Maybe I'm misinterpreting this:

include/linux/rculist.h:
/**
 * list_del_rcu - deletes entry from list without re-initialization
 * @entry: the element to delete from the list.
 *
 * Note: list_empty() on entry does not return true after this,
 * the entry is in an undefined state. It is useful for RCU based
 * lockfree traversal.

If I can trust list_empty on oadn->irqfds, which maybe I can re-reading
it again, then we can drop the kref.  Thanks,

Alex


> > If there are suggestions how we can handle these better, please
> > make them, but I think this compromise is race-free and still
> > manages to make allocation of IRQ source IDs mostly a non-issue
> > for device assignment limits.  Thanks,
> > 
> > Alex
> > 
> > ---
> > 
> > Alex Williamson (2):
> >       kvm: On Ack, De-assert & Notify KVM_IRQFD extension
> >       kvm: Use a reserved IRQ source ID for irqfd
> > 
> > 
> >  Documentation/virtual/kvm/api.txt |   13 ++
> >  arch/x86/kvm/x86.c                |    4 +
> >  include/linux/kvm.h               |    7 +
> >  include/linux/kvm_host.h          |    2 
> >  virt/kvm/eventfd.c                |  199 ++++++++++++++++++++++++++++++++++++-
> >  5 files changed, 218 insertions(+), 7 deletions(-)




  reply	other threads:[~2012-08-22  1:28 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-21 19:28 [PATCH v9 0/2] kvm: level irqfd support Alex Williamson
2012-08-21 19:29 ` [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd Alex Williamson
2012-08-21 19:58   ` Michael S. Tsirkin
2012-08-21 20:06     ` Alex Williamson
2012-08-21 20:41       ` Michael S. Tsirkin
2012-08-21 21:14         ` Alex Williamson
2012-08-22  0:41           ` Michael S. Tsirkin
2012-08-22  1:34             ` Alex Williamson
2012-09-05 14:35             ` Avi Kivity
2012-09-05 14:51               ` Michael S. Tsirkin
2012-09-05 14:59                 ` Avi Kivity
2012-09-05 15:13                   ` Michael S. Tsirkin
2012-09-05 15:22                     ` Avi Kivity
2012-09-05 15:28                       ` Michael S. Tsirkin
2012-09-05 15:35                         ` Avi Kivity
2012-09-05 15:09                 ` Michael S. Tsirkin
2012-09-05 15:12                   ` Avi Kivity
2012-09-05 15:15                     ` Michael S. Tsirkin
2012-09-05 14:46   ` Avi Kivity
2012-09-05 15:07     ` Michael S. Tsirkin
2012-09-05 15:15       ` Avi Kivity
2012-08-21 19:29 ` [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension Alex Williamson
2012-08-21 20:37   ` Michael S. Tsirkin
2012-08-21 21:11     ` Alex Williamson
2012-08-22  0:14       ` Michael S. Tsirkin
2012-08-22  1:48         ` Alex Williamson
2012-09-05 14:57   ` Avi Kivity
2012-09-17 16:13     ` Alex Williamson
2012-08-22  0:31 ` [PATCH v9 0/2] kvm: level irqfd support Michael S. Tsirkin
2012-08-22  1:28   ` Alex Williamson [this message]
2012-08-22  8:25     ` Michael S. Tsirkin
2012-09-17 18:08       ` Alex Williamson
2012-08-22 10:10 ` Michael S. Tsirkin
2012-09-05 14:42 ` Avi Kivity
2012-09-05 14:52   ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1345598895.29292.115.camel@ul30vt.home \
    --to=alex.williamson@redhat.com \
    --cc=avi@redhat.com \
    --cc=gleb@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).