From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755907AbbJ1SSu (ORCPT <rfc822;w@1wt.eu>);
	Wed, 28 Oct 2015 14:18:50 -0400
Received: from mx1.redhat.com ([209.132.183.28]:40666 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755391AbbJ1SSt (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Wed, 28 Oct 2015 14:18:49 -0400
Message-ID: <1446056328.8018.422.camel@redhat.com>
Subject: Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
From: Alex Williamson <alex.williamson@redhat.com>
To: Yunhong Jiang <yunhong.jiang@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>
Date: Wed, 28 Oct 2015 12:18:48 -0600
In-Reply-To: <20151028175013.GA21961@jnakajim-build>
References: <1445908801-14732-1-git-send-email-yunhong.jiang@linux.intel.com>
	 <1445917034.8018.220.camel@redhat.com>
	 <20151027063501.GA22054@jnakajim-build> <562F43F8.1040101@redhat.com>
	 <20151027212648.GA22916@jnakajim-build> <56301A87.9030907@redhat.com>
	 <20151028175013.GA21961@jnakajim-build>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2015-10-28 at 10:50 -0700, Yunhong Jiang wrote:
> On Wed, Oct 28, 2015 at 01:44:55AM +0100, Paolo Bonzini wrote:
> > 
> > 
> > On 27/10/2015 22:26, Yunhong Jiang wrote:
> > >> > On RT kernels however can you call eventfd_signal from interrupt
> > >> > context?  You cannot call spin_lock_irqsave (which can sleep) from a
> > >> > non-threaded interrupt handler, can you?  You would need a raw spin lock.
> > > Thanks for pointing this out. Yes, we can't call spin_lock_irqsave on an RT
> > > kernel. I will address this in the next patch, but I'm not sure whether using a
> > > raw_spinlock there is overkill, since eventfd_signal is used by other callers too.
> > 
> > No, I don't think you can use raw_spinlock there.  The problem is not
> > just eventfd_signal, it is especially wake_up_locked_poll.  You cannot
> > convert the whole workqueue infrastructure to use raw_spinlock.
> 
> You mean the waitqueue rather than the workqueue, right? One option is to change
> eventfd to use a simple wait queue, which is protected by a raw_spinlock. But
> using a simple waitqueue in eventfd may in fact hurt real-time latency outside
> of this scenario.
> 
> > 
> > Alex, would it make sense to use the IRQ bypass infrastructure always,
> > not just for VT-d, to do the MSI injection directly from the VFIO
> > interrupt handler and bypass the eventfd?  Basically this would add an
> > RCU-protected list of consumers matching the token to struct
> > irq_bypass_producer, and a
> > 
> > 	int (*inject)(struct irq_bypass_consumer *);
> > 
> > callback to struct irq_bypass_consumer.  If any callback returns true,
> > the eventfd is not signaled.  The KVM implementation would be like this
> > (compare with virt/kvm/eventfd.c):
> > 
> > 	/* Extracted out of irqfd_wakeup */
> > 	static int
> > 	irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd)
> > 	{
> > 		...
> > 	}
> > 
> > 	/* Extracted out of irqfd_wakeup */
> > 	static int
> > 	irqfd_wakeup_pollhup(struct kvm_kernel_irqfd *irqfd)
> > 	{
> > 		...
> > 	}
> > 
> > 	static int
> > 	irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync,
> > 		     void *key)
> > 	{
> > 		struct kvm_kernel_irqfd *irqfd = container_of(wait,
> > 			struct kvm_kernel_irqfd, wait);
> > 		unsigned long flags = (unsigned long)key;
> > 
> > 		if (flags & POLLIN)
> > 			irqfd_wakeup_pollin(irqfd);
> > 		if (flags & POLLHUP)
> > 			irqfd_wakeup_pollhup(irqfd);
> > 
> > 		return 0;
> > 	}
> > 
> > 	static int kvm_arch_irq_bypass_inject(
> > 		struct irq_bypass_consumer *cons)
> > 	{
> > 		struct kvm_kernel_irqfd *irqfd =
> > 			container_of(cons, struct kvm_kernel_irqfd,
> > 				     consumer);
> > 
> > 		return irqfd_wakeup_pollin(irqfd);
> > 	}
> > 
> This is a good idea IMHO. So for MSI interrupts,
> kvm_arch_irq_bypass_inject will be used and irqfd_wakeup will no longer be
> invoked, am I right?
> 
> I noticed the irq bypass manager is not merged yet. Is there a git branch
> for it?

It's in linux-next via the kvm.git next branch:

git://git.kernel.org/pub/scm/virt/kvm/kvm.git

Thanks,
Alex