From mboxrd@z Thu Jan  1 00:00:00 1970
From: David Gibson <dwg@au1.ibm.com>
Subject: Re: kvm PCI assignment & VFIO ramblings
Date: Mon, 1 Aug 2011 12:48:46 +1000
Message-ID: <20110801024846.GA28437@yookeroo.fritz.box>
References: <1311983933.8793.42.camel@pasglop>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Cc: Alexey Kardashevskiy <aik@au1.ibm.com>, kvm@vger.kernel.org,
	Paul Mackerras <pmac@au1.ibm.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Anthony Liguori <anthony@codemonkey.ws>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Return-path: <linuxppc-dev-bounces+glppe-linuxppc-embedded-2=m.gmane.org@lists.ozlabs.org>
Content-Disposition: inline
In-Reply-To: <1311983933.8793.42.camel@pasglop>
Errors-To: linuxppc-dev-bounces+glppe-linuxppc-embedded-2=m.gmane.org@lists.ozlabs.org
Sender: linuxppc-dev-bounces+glppe-linuxppc-embedded-2=m.gmane.org@lists.ozlabs.org
List-Id: kvm.vger.kernel.org

On Sat, Jul 30, 2011 at 09:58:53AM +1000, Benjamin Herrenschmidt wrote:
[snip]
> That current hack won't work well if two devices share an iommu. Note
> that we have an additional constraint here due to our paravirt
> interfaces (specified in PAPR) which is that PE domains must have a
> common parent. Basically, pHyp makes them look like a PCIe host bridge
> per domain in the guest. I think that's a pretty good idea and qemu
> might want to do the same.
> 
> - We hack out the currently unconditional mapping of the entire guest
> space in the iommu. Something will have to be done to "decide" whether
> to do that or not ... qemu argument -> ioctl ?

Not quite.  We already require the not-yet-upstream patches which add
guest-side (emulated) IOMMU support to qemu.  The approach we're using
for passthrough (or at least will be using, once I fix up my patches
again) is to map all guest RAM into the vfio iommu if and only if
there is no guest-visible iommu advertised in the qdev.

This makes sense: if there is no iommu from the guest's perspective,
the guest will expect to see all its physical memory mapped 1:1 for
DMA.
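A minimal C sketch of that mapping policy, assuming a hypothetical list of guest RAM regions and a hypothetical host_vfio_map() stub standing in for the real vfio map call (the stub only counts mapped bytes). With no guest-visible iommu, each RAM region is mapped with iova equal to its guest physical address; otherwise nothing is mapped up front.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest RAM region (guest-physical). */
struct guest_ram_region {
    uint64_t gpa;   /* guest physical start address */
    uint64_t size;  /* length in bytes */
};

/* Hypothetical stand-in for a host vfio DMA map call; it only counts
 * mapped bytes so the policy can be checked without hardware. */
static uint64_t host_mapped_bytes;

static void host_vfio_map(uint64_t iova, uint64_t size)
{
    (void)iova;              /* a real call would also pass the host address */
    host_mapped_bytes += size;
}

/*
 * Map all guest RAM 1:1 (iova == guest physical address) if and only
 * if the guest sees no iommu.  With a guest-visible iommu, nothing is
 * mapped here; mappings arrive later via the guest's iommu interface.
 * Returns the number of regions mapped.
 */
static size_t setup_passthrough_dma(bool guest_has_iommu,
                                    const struct guest_ram_region *ram,
                                    size_t nregions)
{
    assert(ram != NULL || nregions == 0);

    if (guest_has_iommu)
        return 0;

    for (size_t i = 0; i < nregions; i++)
        host_vfio_map(ram[i].gpa, ram[i].size);
    return nregions;
}
```

Calling setup_passthrough_dma(false, ram, n) maps all n regions, while setup_passthrough_dma(true, ram, n) maps none and leaves the work to the guest iommu path.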

The hacky bit is that when there *is* a guest-visible iommu, we
assume that whatever interface the guest iommu uses is somehow wired
up to vfio map/unmap calls.  For us at the moment, this means
passthrough devices must be assigned to a special (guest) pci
domain which wires up the paravirt iommu to the vfio iommu.
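A rough sketch of that wiring, assuming a PAPR-style interface in which the guest installs one translation entry (TCE) at a time, as with the H_PUT_TCE hypercall. The 4 KiB page size, the low permission bits, the 16-page window and the names guest_put_tce(), host_iommu_map_page() and host_iommu_unmap_page() are all illustrative assumptions; the host side is a plain array rather than real vfio calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TCE_SHIFT      12                    /* assumed 4 KiB iommu pages */
#define TCE_PAGE_SIZE  (1ULL << TCE_SHIFT)
#define TCE_PERM_MASK  0x3ULL                /* assumed read/write bits */
#define WINDOW_PAGES   16                    /* tiny DMA window for the sketch */

/* Hypothetical host-side state standing in for vfio map/unmap calls:
 * one slot per iommu page in the guest's DMA window. */
static uint64_t host_map[WINDOW_PAGES];      /* iova page -> guest phys addr */
static bool     host_valid[WINDOW_PAGES];

static void host_iommu_map_page(uint64_t iova, uint64_t gpa)
{
    assert((iova >> TCE_SHIFT) < WINDOW_PAGES);
    /* A real implementation would translate gpa to a host address and
     * issue a vfio map call here. */
    host_map[iova >> TCE_SHIFT] = gpa;
    host_valid[iova >> TCE_SHIFT] = true;
}

static void host_iommu_unmap_page(uint64_t iova)
{
    assert((iova >> TCE_SHIFT) < WINDOW_PAGES);
    host_valid[iova >> TCE_SHIFT] = false;
}

/*
 * Hypothetical handler for a guest "put TCE" request: a TCE with any
 * permission bit set maps the page, a TCE with none clears it.
 * Returns 0 on success, -1 for a misaligned or out-of-window ioba.
 */
static int guest_put_tce(uint64_t ioba, uint64_t tce)
{
    if ((ioba & (TCE_PAGE_SIZE - 1)) != 0 ||
        (ioba >> TCE_SHIFT) >= WINDOW_PAGES)
        return -1;

    if (tce & TCE_PERM_MASK)
        host_iommu_map_page(ioba, tce & ~(TCE_PAGE_SIZE - 1));
    else
        host_iommu_unmap_page(ioba);
    return 0;
}
```

Keeping the guest interface this narrow (one entry per call) is what makes forwarding it to host map/unmap straightforward.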

In theory, under some circumstances, with full emulation you could
wire up an emulated guest iommu interface to a different host iommu
implementation via this mechanism.  However, that wouldn't work if the
guest and host iommus' capabilities are too different, and in any case
it would require considerable extra abstraction work on the qemu guest
iommu code.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson