From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751710AbbKJDF5 (ORCPT <rfc822;w@1wt.eu>);
	Mon, 9 Nov 2015 22:05:57 -0500
Received: from gate.crashing.org ([63.228.1.57]:59099 "EHLO gate.crashing.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750727AbbKJDFz (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Mon, 9 Nov 2015 22:05:55 -0500
Message-ID: <1447121076.31884.61.camel@kernel.crashing.org>
Subject: Re: [PATCH v4 0/6] virtio core DMA API conversion
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>, David Woodhouse <dwmw2@infradead.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "David S. Miller" <davem@davemloft.net>, sparclinux@vger.kernel.org,
        Joerg Roedel <jroedel@suse.de>,
        Christian Borntraeger <borntraeger@de.ibm.com>,
        Cornelia Huck <cornelia.huck@de.ibm.com>,
        Sebastian Ott <sebott@linux.vnet.ibm.com>,
        Paolo Bonzini <pbonzini@redhat.com>, Christoph Hellwig <hch@lst.de>,
        KVM <kvm@vger.kernel.org>, Martin Schwidefsky <schwidefsky@de.ibm.com>,
        linux-s390 <linux-s390@vger.kernel.org>,
        Linux Virtualization <virtualization@lists.linux-foundation.org>,
        "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 10 Nov 2015 13:04:36 +1100
In-Reply-To: <CALCETrX7Gkw3WrBHff=TpCFHj444E8hHcR6sAqOghQFBo5wp_A@mail.gmail.com>
References: <cover.1446162273.git.luto@kernel.org>
	 <20151109133624-mutt-send-email-mst@redhat.com>
	 <1447109937.31884.42.camel@kernel.crashing.org>
	 <CALCETrX7Gkw3WrBHff=TpCFHj444E8hHcR6sAqOghQFBo5wp_A@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.18.1 (3.18.1-1.fc23) 
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2015-11-09 at 16:46 -0800, Andy Lutomirski wrote:
> The problem here is that in some of the problematic cases the virtio
> driver may not even be loaded.  If someone runs an L1 guest with an
> IOMMU-bypassing virtio device and assigns it to L2 using vfio, then
> *boom* L1 crashes.  (Same if, say, DPDK gets used, I think.)
> 
> >
> > The only way out of this while keeping the "platform" stuff would be to
> > also bump some kind of version in the virtio config (or PCI header). I
> > have no other way to differentiate between "this is an old qemu that
> > doesn't do the 'bypass property' yet" from "this is a virtio device
> > that doesn't bypass".
> >
> > Any better idea ?
> 
> I'd suggest that, in the absence of the new DT binding, we assume that
> any PCI device with the virtio vendor ID is passthrough on powerpc.  I
> can do this in the virtio driver, but if it's in the platform code
> then vfio gets it right too (i.e. fails to load).

The problem is there isn't *a* virtio vendor ID. It's the Red Hat vendor
ID, which is used by more than just virtio, so we need to specifically
list the devices.
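
As a rough illustration of what "listing the devices" could look like,
here is a minimal sketch. It assumes the vendor ID 0x1af4 and the
0x1000-0x107f device ID range that the virtio specification reserves
for virtio PCI devices; the helper name pci_id_is_virtio() is
hypothetical and is not an existing kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Red Hat's PCI vendor ID; shared by virtio and non-virtio devices. */
#define PCI_VENDOR_ID_REDHAT_QUMRANET 0x1af4

/* Device ID range reserved for virtio devices by the virtio spec. */
#define VIRTIO_PCI_DEVICE_ID_MIN 0x1000
#define VIRTIO_PCI_DEVICE_ID_MAX 0x107f

/*
 * Hypothetical helper: returns true only when both the vendor ID and
 * the device ID identify a virtio device. Matching on the vendor ID
 * alone would also catch unrelated devices from the same vendor.
 */
static bool pci_id_is_virtio(uint16_t vendor, uint16_t device)
{
	if (vendor != PCI_VENDOR_ID_REDHAT_QUMRANET)
		return false;

	return device >= VIRTIO_PCI_DEVICE_ID_MIN &&
	       device <= VIRTIO_PCI_DEVICE_ID_MAX;
}
```

A platform quirk built on such a check would still hit the problem
described next: it cannot tell a bypassing virtio device from one that
honors the iommu.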

Additionally, that still means that once we have a virtio device that
actually uses the iommu, powerpc will not work since the "workaround"
above will kick in.

The "in the absence of the new DT binding" condition doesn't make that
much sense.

Those platforms use device trees defined since the dawn of ages by
actual Open Firmware implementations. They either have no iommu
representation in there (Macs, where the platform code hooks it all up)
or have various properties related to the iommu but no concept of
"bypass" in there.

We can *add* a new property under some circumstances that indicates a
bypass on a per-device basis; however, that doesn't completely solve it:

  - As I said above, what does the absence of that property mean? An
old qemu that bypasses on all virtio devices, or a new qemu trying to
tell you that the virtio device actually does use the iommu (or some
other environment that isn't qemu)?

  - On things like Macs, the device tree is generated by OpenBIOS, which
would need some added logic to figure that out. That means it needs to
know *via different means* that some or all virtio devices bypass the
iommu.

I thus go back to my original statement: it's a LOT easier to handle if
the device itself is self-describing, indicating whether it is set to
bypass a host iommu or not. For L1->L2, well, that wouldn't be the
first time qemu/VFIO plays tricks with the passed-through device's
configuration space...

Note that the above can be solved via some kind of compromise: the
device self-describes the ability to honor the iommu, combined with the
property (or ACPI table entry) that indicates whether or not it
actually does.

I.e., we could use the revision or ProgIf field of the config space, for
example, or something in virtio config. If it's an "old" device, we
know it always bypasses. If it's a new device, we know it only bypasses
if the corresponding property is present. I would still have to sort
out the OpenBIOS case for Macs, among others, but it's at least a
workable direction.
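
The decision rule above can be sketched as follows. This is only an
illustration under stated assumptions: the struct, its field names and
the function virtio_bypasses_iommu() are hypothetical, and how "new"
is actually signalled (revision, ProgIf or virtio config) is left
open, as it is in the discussion.

```c
#include <stdbool.h>

/*
 * Hypothetical inputs for the compromise scheme:
 *  - device_is_new: the device self-describes (by whatever mechanism
 *    ends up being chosen) that it is able to honor the iommu.
 *  - bypass_property_set: the platform (DT property or ACPI table
 *    entry) says this device bypasses the iommu.
 */
struct virtio_iommu_hints {
	bool device_is_new;
	bool bypass_property_set;
};

/* Returns true if the device should be treated as bypassing the iommu. */
static bool virtio_bypasses_iommu(const struct virtio_iommu_hints *h)
{
	/* An "old" device always bypasses; the property is irrelevant. */
	if (!h->device_is_new)
		return true;

	/* A new device bypasses only if the platform property says so. */
	return h->bypass_property_set;
}
```

The point of the scheme is that the absence of the property is no
longer ambiguous: on an old device it means nothing, while on a new
device it means the iommu is honored.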

BTW, don't you have a similar problem on x86, where qemu today claims
in ACPI that everything honors the iommu?

Unless somebody can come up with a better idea...

Cheers,
Ben.


