Re: [Qemu-devel] [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

From: Alex Williamson <alex.williamson@redhat.com>
To: David Woodhouse <dwmw2@infradead.org>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	"Kay, Allen M" <allen.m.kay@intel.com>,
	Gerd Hoffmann <kraxel@redhat.com>
Cc: "igvt-g@ml01.01.org" <igvt-g@ml01.01.org>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Eduardo Habkost <ehabkost@redhat.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Cao jin <caoj.fnst@cn.fujitsu.com>,
	"vfio-users@redhat.com" <vfio-users@redhat.com>
Subject: Re: [Qemu-devel] [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks
Date: Tue, 02 Feb 2016 07:54:00 -0700
Message-ID: <1454424840.10542.18.camel@redhat.com>
In-Reply-To: <1454413825.4788.49.camel@infradead.org>

On Tue, 2016-02-02 at 11:50 +0000, David Woodhouse wrote:
> On Tue, 2016-02-02 at 06:42 +0000, Tian, Kevin wrote:
> > > From: Kay, Allen M
> > > Sent: Tuesday, February 02, 2016 8:04 AM
> > > > 
> > > > David notes in the latter commit above:
> > > > 
> > > > "We should be able to successfully assign graphics devices to guests too, as
> > > > long as the initial handling of stolen memory is reconfigured appropriately."
> > > > 
> > > > What code is supposed to be doing that reconfiguration when a device is
> > > > assigned?  Clearly we don't have it yet, making assignment of these devices
> > > > very unsafe.  It seems like vfio or IOMMU code  in the kernel needs device
> > > > specific code to clear these settings to make it safe for userspace, then
> > > > perhaps VM BIOS support to reallocate.  Is there any consistency across IGD
> > > > revisions for doing this?  Is there a spec?
> > > > Thanks,
> 
> I haven't ever successfully assigned an IGD device to a VM myself, but
> my understanding was that it *has* been done. So when the code was
> changed to prevent assignment of devices afflicted by RMRRs (except USB
> where we know it's OK), I just added the integrated graphics to that
> same exception as USB, to preserve the status quo ante.

It had been successfully done on /Xen/, not on anything that actually made
use of that exclusion, so there was no status quo to preserve.

> > I don't think stolen memory should be handled explicitly. If yes, it should be
> > listed as a RMRR region so general RMRR setup will cover it. But as Allen
> > pointed out, the whole RMRR becomes unnecessary if we target only secondary
> > device for IGD.
> 
> Perhaps the best option is *not* to have special cases in the IOMMU
> code for "those devices which can safely be assigned despite RMRRs".
> 
> Instead, let's let the device driver — or whatever — tell the IOMMU
> code when it's *stopped* the firmware from (ab)using the device's DMA
> facilities.
> 
> So when the USB code does the handoff thing to quiesce the firmware's
> access to USB and take over in the OS, it would call the IOMMU function
> to revoke the RMRR for the USB controller.
> 
> And if/when the graphics driver resets its device into a state where
> it's no longer accessing stolen memory and can be assigned to a VM, it
> can also call that 'RMRR revoke' function.
> 
> Likewise, if we teach device drivers to cancel whatever abominations
> the HP firmware tends to set up behind the OS's back on other PCI
> devices, we can cancel the RMRRs for those too.
> 
> Then the IOMMU code has a simple choice and no special cases — we can
> assign a device iff it has no active RMRR.
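
To make the proposed rule concrete, here is a rough toy model of it in
Python.  This is only a sketch of the logic described above, not kernel
code: the class, the method names, the device labels and the address
ranges are all hypothetical and chosen for illustration.

```python
# Toy model of the "assign iff no active RMRR" rule.
# Everything here is hypothetical; none of it is an existing kernel API.
class ToyIommu:
    def __init__(self):
        # device label -> set of (start, end) reserved ranges still active
        self.active_rmrrs = {}

    def add_rmrr(self, device, start, end):
        """Record a firmware-reported reserved range for a device."""
        self.active_rmrrs.setdefault(device, set()).add((start, end))

    def revoke_rmrrs(self, device):
        """Called by a driver once it has stopped firmware-initiated DMA."""
        self.active_rmrrs.pop(device, None)

    def can_assign(self, device):
        """A device is assignable only if it has no active RMRR."""
        return not self.active_rmrrs.get(device)


iommu = ToyIommu()
iommu.add_rmrr("usb", 0x1000, 0x1FFF)  # made-up range
iommu.add_rmrr("igd", 0x8000, 0xFFFF)  # made-up range

# The USB driver completes its firmware handoff and revokes the RMRR.
iommu.revoke_rmrrs("usb")

usb_ok = iommu.can_assign("usb")  # RMRR revoked by the driver
igd_ok = iommu.can_assign("igd")  # nothing has revoked it yet
nic_ok = iommu.can_assign("nic")  # never had an RMRR at all
```

In this model a device whose RMRR nobody has revoked, "igd" here, stays
unassignable until some driver makes the revoke call.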

At first glance I like it, but there's a problem: it assumes there is a
host driver for the device that will permanently release the device from
the RMRR even after the device is unbound.  Currently we don't require
that the user first bind the device to a native host driver and then
unbind it before it becomes eligible for device assignment.
In fact with GPUs we often blacklist the native driver or attach them
directly to a stub driver to avoid the host driver.  Maybe that issue
works itself out since the IOMMU won't allow access to the device
without this step, but it means that i915 needs to be better than most
graphics drivers when it comes to unbinding the device (which is not a
very high bar).  Of course, as I've shown on IGD, it's not simply a
matter of declaring the RMRR unused; some reconfiguration of the device
is necessary so that the guest driver doesn't try to start using that
same reserved range.  Thanks,

Alex