From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755196AbbJ1ApH (ORCPT <rfc822;w@1wt.eu>);
	Tue, 27 Oct 2015 20:45:07 -0400
Received: from mail-wi0-f180.google.com ([209.85.212.180]:36302 "EHLO
	mail-wi0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755030AbbJ1ApD (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Tue, 27 Oct 2015 20:45:03 -0400
Subject: Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
To: Yunhong Jiang <yunhong.jiang@linux.intel.com>
References: <1445908801-14732-1-git-send-email-yunhong.jiang@linux.intel.com>
 <1445917034.8018.220.camel@redhat.com>
 <20151027063501.GA22054@jnakajim-build> <562F43F8.1040101@redhat.com>
 <20151027212648.GA22916@jnakajim-build>
Cc: Alex Williamson <alex.williamson@redhat.com>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org
From: Paolo Bonzini <pbonzini@redhat.com>
X-Enigmail-Draft-Status: N1110
Message-ID: <56301A87.9030907@redhat.com>
Date: Wed, 28 Oct 2015 01:44:55 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <20151027212648.GA22916@jnakajim-build>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


On 27/10/2015 22:26, Yunhong Jiang wrote:
>> > On RT kernels however can you call eventfd_signal from interrupt
>> > context?  You cannot call spin_lock_irqsave (which can sleep) from a
>> > non-threaded interrupt handler, can you?  You would need a raw spin lock.
> Thanks for pointing this out. Yes, we can't call spin_lock_irqsave on RT 
> kernel. Will do this way on next patch. But not sure if it's overkill to use 
> raw_spinlock there since the eventfd_signal is used by other caller also.

No, I don't think you can use raw_spinlock there.  The problem is not
just eventfd_signal, it is especially wake_up_locked_poll.  You cannot
convert the whole waitqueue infrastructure to use raw_spinlock.

Alex, would it make sense to use the IRQ bypass infrastructure always,
not just for VT-d, to do the MSI injection directly from the VFIO
interrupt handler and bypass the eventfd?  Basically this would add an
RCU-protected list of consumers matching the token to struct
irq_bypass_producer, and a

	int (*inject)(struct irq_bypass_consumer *);

callback to struct irq_bypass_consumer.  If any callback returns nonzero,
the eventfd is not signaled.  The KVM implementation would be like this
(compare with virt/kvm/eventfd.c):

	/* Extracted out of irqfd_wakeup */
	static int
	irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd)
	{
		...
	}

	/* Extracted out of irqfd_wakeup */
	static int
	irqfd_wakeup_pollhup(struct kvm_kernel_irqfd *irqfd)
	{
		...
	}

	static int
	irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync,
		     void *key)
	{
		struct kvm_kernel_irqfd *irqfd =
			container_of(wait, struct kvm_kernel_irqfd, wait);
		unsigned long flags = (unsigned long)key;

		if (flags & POLLIN)
			irqfd_wakeup_pollin(irqfd);
		if (flags & POLLHUP)
			irqfd_wakeup_pollhup(irqfd);

		return 0;
	}

	static int kvm_arch_irq_bypass_inject(
		struct irq_bypass_consumer *cons)
	{
		struct kvm_kernel_irqfd *irqfd =
			container_of(cons, struct kvm_kernel_irqfd,
				     consumer);

		irqfd_wakeup_pollin(irqfd);

		/* Nonzero: injected directly, do not signal the eventfd. */
		return 1;
	}
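
For illustration, the producer side of this idea might look roughly like
the userspace mock below.  It is only a sketch under stated assumptions:
the RCU-protected list is replaced by a plain singly linked list, and
producer_deliver(), mock_eventfd_signal(), the two inject_* callbacks and
run_demo() are hypothetical names, not existing kernel functions.  It also
assumes that every consumer is called and the results are ORed together.

```c
#include <stddef.h>

/* Mock of the consumer: only the fields this sketch needs. */
struct irq_bypass_consumer {
	struct irq_bypass_consumer *next;	/* stand-in for an RCU list node */
	int (*inject)(struct irq_bypass_consumer *);
};

/* Mock of the producer: holds the list of consumers matching its token. */
struct irq_bypass_producer {
	struct irq_bypass_consumer *consumers;	/* would be RCU-protected */
};

static int eventfd_signals;	/* counts fallback signals in this mock */

/* Hypothetical stand-in for the eventfd_signal() fallback path. */
static void mock_eventfd_signal(void)
{
	eventfd_signals++;
}

/*
 * Hypothetical helper for the VFIO interrupt handler: offer the interrupt
 * to every consumer, and fall back to the eventfd only if none injected it.
 */
static void producer_deliver(struct irq_bypass_producer *prod)
{
	struct irq_bypass_consumer *cons;
	int injected = 0;

	/* In the kernel this walk would run under rcu_read_lock(). */
	for (cons = prod->consumers; cons; cons = cons->next)
		if (cons->inject && cons->inject(cons))
			injected = 1;

	if (!injected)
		mock_eventfd_signal();
}

/* Consumer callback that cannot inject directly. */
static int inject_declined(struct irq_bypass_consumer *cons)
{
	(void)cons;
	return 0;
}

/* Consumer callback that injects directly. */
static int inject_done(struct irq_bypass_consumer *cons)
{
	(void)cons;
	return 1;
}

/* Delivers one interrupt to each producer; returns the fallback count. */
int run_demo(void)
{
	struct irq_bypass_consumer b = { .next = NULL, .inject = inject_done };
	struct irq_bypass_consumer a = { .next = &b, .inject = inject_declined };
	struct irq_bypass_producer with_consumers = { .consumers = &a };
	struct irq_bypass_producer without_consumers = { .consumers = NULL };

	eventfd_signals = 0;
	producer_deliver(&with_consumers);	/* b injects: no eventfd signal */
	producer_deliver(&without_consumers);	/* nobody injects: fallback */
	return eventfd_signals;
}
```

In this mock only the producer without consumers reaches the eventfd
fallback, which is the behaviour described above for the case where no
callback reports a successful injection.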

Or do you think it would be a hack?  The latency improvement might
actually be even better than what Yunhong is already reporting.

Paolo