From mboxrd@z Thu Jan  1 00:00:00 1970
From: Alex Williamson <alex.williamson@redhat.com>
Subject: Re: KVM devices assignment; PCIe AER?
Date: Wed, 27 Oct 2010 16:58:20 -0600
Message-ID: <1288220300.5129.219.camel@x201>
References: <alpine.DEB.2.00.1010260913380.22179@localhost.localdomain>
	 <20101026183733.GA17477@redhat.com>
	 <alpine.DEB.2.00.1010261321390.755@localhost.localdomain>
	 <20101026204208.GV25455@sequoia.sous-sol.org>
	 <alpine.DEB.2.00.1010261423520.4117@ubuntu.ubuntu-domain>
	 <20101026221558.GZ25455@sequoia.sous-sol.org>
	 <alpine.DEB.2.00.1010261519310.2450@localhost.localdomain>
	 <20101026230552.GA25455@sequoia.sous-sol.org>
	 <alpine.DEB.2.00.1010262001320.3242@ubuntu.ubuntu-domain>
	 <1288191276.5129.183.camel@x201>
	 <alpine.DEB.2.00.1010271052050.14208@localhost.localdomain>
	 <1288206999.5129.203.camel@x201>
	 <alpine.DEB.2.00.1010271228001.1805@ubuntu.ubuntu-domain>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: Chris Wright <chrisw@sous-sol.org>,
	"Michael S. Tsirkin" <mst@redhat.com>, kvm@vger.kernel.org
To: Etienne Martineau <etmartin101@gmail.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:9162 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752599Ab0J0W60 (ORCPT <rfc822;kvm@vger.kernel.org>);
	Wed, 27 Oct 2010 18:58:26 -0400
In-Reply-To: <alpine.DEB.2.00.1010271228001.1805@ubuntu.ubuntu-domain>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On Wed, 2010-10-27 at 14:43 -0700, Etienne Martineau wrote:
> On Wed, 27 Oct 2010, Alex Williamson wrote:
> > No, emulated devices trigger interrupts directly with qemu_set_irq.
> > irqfds are currently only used by vhost afaik, since it's being
> > interrupted externally, much like pass through devices are.
> 
> Fair enough. Thanks for the clarification.
> 
> > Sort of.  When the VFIO device triggers an interrupt, we get notified
> > via the eventfd we've registered for that interrupt.  We can then call
> > qemu_set_irq directly to raise that interrupt in the KVM kernel APIC.
> > That much works today.
> 
> Understood but performance wise this is no good for KVM right?

Right, bouncing interrupts and EOIs through qemu via eventfds is going
to add latency.  On the interrupt path we already have irqfds, which
avoid the bounce through userspace; we just need to use them.  Doing
something similar for EOIs could avoid that path too, giving us
something comparable to current device assignment.

> > The irqfd mechanism is simply a way for KVM to
> > directly consume the eventfd and raise an interrupt via a pre-setup
> > vector.  That's yet to be implemented for INTx on VFIO, but should
> > mostly be a matter of connecting existing pieces together.  It's working
> > for MSI-X.
> 
> OK, I was under the impression you already had irqfd 'connected' to KVM
> from VFIO... This is why I was asking about the nature of the changes
> in VFIO.
> 
> > When VFIO sends an interrupt, it disables the physical device from
> > generating more interrupts (this is where VFIO requires PCI 2.3
> > compliant devices for the INTx disable bit in the command register).
> > When the guest services the interrupt, we can detect this by catching
> > the EOI of the IOAPIC.  At that point, we can re-enable interrupts on
> > the device.  Wash, rinse, repeat.
> >
> > To do this in qemu, I created a callback on the ioapic where drivers can
> > register for the interrupt they care about.  Since KVM moves the ioapic
> > into the kernel, we need to extend this into KVM and have yet another
> > eventfd mechanism.  It's possible that we could have the VFIO kernel
> > module also receive this eventfd, re-enabling interrupts on the device,
> > in much the same way as above.
> 
> In the case of KVM, where are you going to catch the EOI? For some
> reason I'm under the impression that this is part of KVM. If so, then
> how are you going to 'signal' to VFIO? Cannot use eventfd here right?

KVM already has an internal IRQ ACK notifier (which is what current
device assignment uses to do the same thing), so it's just a matter of
using kvm_register_irq_ack_notifier to add a callback that signals the
eventfd.  I've got this working and will probably send out the KVM
patch this week.  For now the eventfd goes to userspace, but this is
where I imagine we could steal some of the irqfd code to make VFIO
consume the irqfd signal directly.  Thanks,

Alex
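
For reference, a toy model of the INTx cycle discussed in this thread:
mask the device when it interrupts, then unmask it when the guest's EOI
triggers an ack notifier.  This is a sketch under stated assumptions,
not the KVM or VFIO code; AckNotifiers and IntxDevice are hypothetical
names, and the registry only mimics the role that
kvm_register_irq_ack_notifier plays in the kernel.

```python
class AckNotifiers:
    """Toy registry of EOI callbacks, keyed by guest interrupt line."""

    def __init__(self):
        self._by_gsi = {}

    def register(self, gsi, callback):
        # Hypothetical analogue of registering an IRQ ACK notifier.
        self._by_gsi.setdefault(gsi, []).append(callback)

    def guest_eoi(self, gsi):
        # Called when the guest writes EOI for this line.
        for callback in self._by_gsi.get(gsi, []):
            callback()


class IntxDevice:
    """Toy device whose INTx line is masked after each interrupt."""

    def __init__(self):
        self.masked = False
        self.delivered = 0

    def hardware_asserts(self):
        # While masked, the device cannot raise another interrupt.
        if self.masked:
            return False
        self.masked = True   # models setting the INTx disable bit
        self.delivered += 1  # models signalling the interrupt eventfd
        return True

    def unmask(self):
        # Models clearing the INTx disable bit once the guest has EOI'd.
        self.masked = False
```

Usage: register dev.unmask for the guest line the device is routed to;
each guest_eoi on that line then re-enables the device, which is the
"wash, rinse, repeat" loop.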