From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <1454017899.23148.0.camel@redhat.com>
Subject: Re: [PATCH 00/10] KVM PCIe/MSI passthrough on ARM/ARM64
From: Alex Williamson <alex.williamson@redhat.com>
To: Eric Auger <eric.auger@linaro.org>, eric.auger@st.com,
	will.deacon@arm.com, christoffer.dall@linaro.org,
	marc.zyngier@arm.com, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org
Cc: Bharat.Bhushan@freescale.com, pranav.sawargaonkar@gmail.com,
	p.fedin@samsung.com, suravee.suthikulpanit@amd.com,
	linux-kernel@vger.kernel.org, patches@linaro.org,
	iommu@lists.linux-foundation.org
Date: Thu, 28 Jan 2016 14:51:39 -0700
In-Reply-To: <1453813968-2024-1-git-send-email-eric.auger@linaro.org>
References: <1453813968-2024-1-git-send-email-eric.auger@linaro.org>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit

On Tue, 2016-01-26 at 13:12 +0000, Eric Auger wrote:
> This series addresses KVM PCIe passthrough with MSI enabled on ARM/ARM64.
> It pursues the efforts done on [1], [2], [3]. It also aims at covering the
> same need on some PowerPC platforms.
>
> On x86, all accesses to the 1MB PA region [FEE0_0000h - FEF0_0000h] are
> directed as interrupt messages: accesses to this special PA window
> directly target the APIC configuration space and not DRAM, meaning the
> downstream IOMMU is bypassed.
>
> This is not the case on the above-mentioned platforms, where MSI messages
> emitted by devices are conveyed through the IOMMU. This means an IOVA/host
> PA mapping must exist for the MSI to reach the MSI controller. The normal
> way to create IOVA bindings is to use the VFIO DMA MAP API. However, in
> this case the MSI IOVA is not mapped onto guest RAM but onto a host
> physical page (the MSI controller frame).
>
> Following first comments, the spirit of [2] is kept: the guest registers
> an IOVA range reserved for MSI mapping. When the VFIO-PCIe driver
> allocates its MSI vectors, it overwrites the MSI controller physical
> address with an IOVA allocated within the window provided by the
> userspace. This IOVA is mapped onto the MSI controller frame physical
> page.
>
> The series does not yet address the problem of telling userspace how much
> IOVA it should provision.

I'm sort of on a think-different approach today, so bear with me; how is
it that x86 can make interrupt remapping so transparent to drivers like
vfio-pci, while for ARM and ppc we seem to be stuck doing these fixups
of the physical vector ourselves, implying ugly (no offense) paths
bouncing through vfio to connect the driver and iommu backends?

We know that x86 handles MSI vectors specially, so there is some
hardware that helps the situation.  It's not just that x86 has a fixed
range for MSI, it's how it manages that range when interrupt remapping
hardware is enabled.  A device table indexed by source-ID references a
per-device table indexed by data from the MSI write itself.  So we get
much, much finer granularity, but there's still effectively an interrupt
domain per device that's being transparently managed under the covers
whenever we request an MSI vector for a device.
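To picture what that buys us, the remapped delivery is roughly the
following -- all structure, field, and function names here are made up
for illustration, this is just the shape of the lookup, not real
VT-d/AMD-Vi code:

	#include <stdint.h>

	struct irte {			/* remapping table entry */
		uint8_t  vector;	/* host vector actually delivered */
		uint32_t dest_apic;	/* destination APIC ID */
		/* present/validation bits elided */
	};

	struct dev_table_entry {
		struct irte *irt;	/* per-device remapping table */
		/* DMA translation pointers, permissions, ... */
	};

	#define IRTE_INDEX_MASK	0x7ff

	extern struct dev_table_entry dev_table[];	/* by source-ID */
	extern void deliver_interrupt(uint32_t dest_apic, uint8_t vector);

	/* What the hardware effectively does when a device write hits
	 * the MSI window: the requester's source-ID picks the device
	 * table entry, the MSI data picks the IRTE, and the IRTE -- not
	 * the raw write -- decides which host vector fires where.
	 */
	void remap_msi_write(uint16_t source_id, uint32_t msi_data)
	{
		struct dev_table_entry *dte = &dev_table[source_id];
		struct irte *irte = &dte->irt[msi_data & IRTE_INDEX_MASK];

		deliver_interrupt(irte->dest_apic, irte->vector);
	}

The device never knows any of this is happening; it just writes the
address and data it was programmed with.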
So why can't we do something more like that here?  There's no predefined
MSI vector range, so defining an interface for the user to specify one
is unavoidable.  But why shouldn't everything else be transparent?  We
could add an interface to the IOMMU API that allows us to register that
reserved range for the IOMMU domain.  IOMMU-core (or maybe interrupt
remapping) code might allocate an IOVA domain for this, just as you've
done in the type1 code here.

But rather than having any interaction with vfio-pci, why not do this at
lower levels, such that the platform interrupt vector allocation code
automatically uses one of those IOVA ranges and returns the IOVA rather
than the physical address for the PCI code to program into the device?
I think we know what needs to be done, but we're taking the approach of
managing the space ourselves and doing a fixup of the device after the
core code has done its job, when we really ought to be letting the core
code manage a space that we define and programming the device so that it
doesn't need a fixup in the vfio-pci code.  Wouldn't it be nicer if
pci_enable_msix_range() returned with the device properly programmed, or
generated an error if there's not enough reserved mapping space in the
IOMMU domain?  Can it be done?  Thanks,

Alex
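P.S. To make the IOMMU API idea above a bit more concrete, here's a
minimal sketch of the kind of interface I'm imagining.  Every name
below is hypothetical -- none of this exists today:

	/* Userspace (via vfio) hands the domain a reserved IOVA window
	 * that the kernel may use for MSI doorbell mappings.
	 */
	int iommu_domain_reserve_msi_window(struct iommu_domain *domain,
					    dma_addr_t iova_base,
					    size_t size);

	/* Called from the platform MSI composition path instead of
	 * handing the raw doorbell PA to the PCI code: allocate an
	 * IOVA from the reserved window, map it onto the doorbell
	 * page, and return the IOVA to program into the device.
	 * Returns -ENOSPC when the reserved window is exhausted.
	 */
	int iommu_domain_map_msi_doorbell(struct iommu_domain *domain,
					  phys_addr_t doorbell_pa,
					  dma_addr_t *iova_out);

With something like that underneath pci_enable_msix_range(), vfio-pci
would never need to touch the address after the fact.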