From: Alex Williamson <alex.williamson@redhat.com>
To: Eric Auger <eric.auger@linaro.org>,
	eric.auger@st.com, will.deacon@arm.com,
	christoffer.dall@linaro.org, marc.zyngier@arm.com,
	linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org
Cc: Bharat.Bhushan@freescale.com, pranav.sawargaonkar@gmail.com,
	p.fedin@samsung.com, suravee.suthikulpanit@amd.com,
	linux-kernel@vger.kernel.org, patches@linaro.org,
	iommu@lists.linux-foundation.org
Subject: Re: [PATCH 00/10] KVM PCIe/MSI passthrough on ARM/ARM64
Date: Thu, 28 Jan 2016 14:51:39 -0700	[thread overview]
Message-ID: <1454017899.23148.0.camel@redhat.com> (raw)
In-Reply-To: <1453813968-2024-1-git-send-email-eric.auger@linaro.org>

On Tue, 2016-01-26 at 13:12 +0000, Eric Auger wrote:
> This series addresses KVM PCIe passthrough with MSI enabled on ARM/ARM64.
> It builds on the efforts in [1], [2], [3], and also aims to cover the
> same need on some PowerPC platforms.
> 
> On x86, all accesses to the 1MB PA region [FEE0_0000h - FEF0_0000h] are
> decoded as interrupt messages: accesses to this special PA window target
> the APIC configuration space directly rather than DRAM, meaning the
> downstream IOMMU is bypassed.
> 
> This is not the case on the platforms mentioned above, where MSI messages
> emitted by devices are conveyed through the IOMMU. This means an
> IOVA-to-host-PA mapping must exist for the MSI to reach the MSI
> controller. The normal way to create IOVA bindings is the VFIO DMA MAP
> API; however, in this case the MSI IOVA must be mapped not onto guest RAM
> but onto a host physical page (the MSI controller frame).
> 
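For reference, that normal path looks roughly like this from userspace
(a minimal sketch; the container fd, addresses, and sizes are
placeholders):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* Standard type1 mapping: IOVA -> process VA backing guest RAM. */
    static int map_guest_ram(int container_fd, void *guest_ram,
                             uint64_t guest_pa, uint64_t ram_size)
    {
            struct vfio_iommu_type1_dma_map map = {
                    .argsz = sizeof(map),
                    .flags = VFIO_DMA_MAP_FLAG_READ |
                             VFIO_DMA_MAP_FLAG_WRITE,
                    .vaddr = (uintptr_t)guest_ram,  /* process VA */
                    .iova  = guest_pa,              /* IOVA == guest PA */
                    .size  = ram_size,
            };

            return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
    }

The MSI doorbell can't reuse this path as-is: there is no process VA
backing the MSI controller frame, which is why the series needs a
separate reserved-IOVA mechanism.
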
> Following the first round of comments, the spirit of [2] is kept: the
> guest registers an IOVA range reserved for MSI mapping. When the
> VFIO-PCIe driver allocates its MSI vectors, it overwrites the MSI
> controller physical address with an IOVA allocated from the window
> provided by userspace. This IOVA is mapped onto the physical page of
> the MSI controller frame.
> 
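Concretely, that fixup amounts to something like this in the vfio-pci
MSI setup path (my sketch of the idea, not the series' actual code;
alloc_and_map_reserved_iova() is an invented helper):

    #include <linux/iommu.h>
    #include <linux/msi.h>

    static int vfio_pci_remap_msi_doorbell(struct iommu_domain *domain,
                                           struct msi_msg *msg)
    {
            u64 doorbell = ((u64)msg->address_hi << 32) |
                           msg->address_lo;
            dma_addr_t iova;
            int ret;

            /* Carve an IOVA out of the userspace-registered window
             * and map it onto the doorbell page. */
            ret = alloc_and_map_reserved_iova(domain, doorbell, &iova);
            if (ret)
                    return ret;

            /* Program the device with the IOVA, not the PA. */
            msg->address_lo = lower_32_bits(iova);
            msg->address_hi = upper_32_bits(iova);
            return 0;
    }
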
> The series does not yet address the problem of telling userspace how
> much IOVA it should provision.

I'm sort of on a think-different approach today, so bear with me: how is
it that x86 can make interrupt remapping so transparent to drivers like
vfio-pci, while for ARM and ppc we seem to be stuck doing these fixups
of the physical vector ourselves, implying ugly (no offense) paths
bouncing through vfio to connect the driver and iommu backends?

We know that x86 handles MSI vectors specially, so there is some
hardware that helps the situation.  It's not just that x86 has a fixed
range for MSI; it's how it manages that range when interrupt remapping
hardware is enabled.  A device table indexed by source-ID references a
per-device table indexed by data from the MSI write itself.  So we get
much, much finer granularity, but there's still effectively an interrupt
domain per device being transparently managed under the covers whenever
we request an MSI vector for a device.
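
To make that concrete, the lookup works roughly like this (an
illustrative model, closest to the AMD IOMMU layout; these are not the
real driver structures):

    #include <stdint.h>
    #include <stddef.h>

    struct irte {                   /* one remapped interrupt */
            uint32_t dest_apic_id;
            uint8_t  vector;
    };

    struct dev_table_entry {        /* indexed by PCI source-ID (BDF) */
            struct irte *int_table; /* per-device interrupt table */
            size_t int_table_len;
    };

    static struct dev_table_entry dev_table[1 << 16]; /* one per BDF */

    /* What the hardware effectively does on an MSI write. */
    static struct irte *remap_msi(uint16_t source_id, uint32_t msi_data)
    {
            struct dev_table_entry *dte = &dev_table[source_id];

            if (!dte->int_table || msi_data >= dte->int_table_len)
                    return NULL;    /* unmapped interrupt -> fault */
            return &dte->int_table[msi_data];
    }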

So why can't we do something more like that here?  There's no predefined
MSI vector range, so defining an interface for the user to specify one
is unavoidable.  But why shouldn't everything else be transparent?  We
could add an interface to the IOMMU API that lets us register that
reserved range for the IOMMU domain.  IOMMU-core (or maybe interrupt
remapping) code might allocate an IOVA domain for this, just as you've
done in the type1 code here.  But rather than having any interaction
with vfio-pci, why not do this at a lower level, such that the platform
interrupt vector allocation code automatically uses one of those IOVA
ranges and returns the IOVA, rather than the physical address, for the
PCI code to program into the device?  I think we know what needs to be
done, but we're taking the approach of managing the space ourselves and
fixing up the device after the core code has done its job, when we
really ought to be letting the core code manage a space that we define
and program the device so that it never needs a fixup in the vfio-pci
code.  Wouldn't it be nicer if pci_enable_msix_range() returned with the
device properly programmed, or generated an error if there's not enough
reserved mapping space in the IOMMU domain?  Can it be done?
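
To sketch what I mean (purely hypothetical -- none of these functions
exist; the names are invented to show where the pieces would live):

    /* Register the userspace-chosen MSI window with the domain,
     * once, when the domain is set up. */
    int iommu_domain_reserve_msi_range(struct iommu_domain *domain,
                                       dma_addr_t base, size_t size);

    /* Called from the core interrupt allocation path rather than
     * from vfio-pci: carve an IOVA out of the reserved range, map
     * it onto the doorbell page, and return the address the PCI
     * core should program into the device. */
    dma_addr_t iommu_msi_map_doorbell(struct iommu_domain *domain,
                                      phys_addr_t doorbell_pa);

With that in place the fixup disappears: the message is composed with
the IOVA from the start, and allocation fails cleanly when the reserved
range is exhausted.  Thanks,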

Alex

Thread overview: 99+ messages

2016-01-26 13:12 [PATCH 00/10] KVM PCIe/MSI passthrough on ARM/ARM64 Eric Auger
2016-01-26 13:12 ` [PATCH 01/10] iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute Eric Auger
2016-01-26 13:12 ` [PATCH 02/10] vfio: expose MSI mapping requirement through VFIO_IOMMU_GET_INFO Eric Auger
2016-01-26 13:12 ` [PATCH 03/10] vfio_iommu_type1: add reserved binding RB tree management Eric Auger
2016-01-26 13:12 ` [PATCH 04/10] vfio: introduce VFIO_IOVA_RESERVED vfio_dma type Eric Auger
2016-01-26 13:12 ` [PATCH 05/10] vfio/type1: attach a reserved iova domain to vfio_domain Eric Auger
2016-01-26 13:12 ` [PATCH 06/10] vfio: introduce vfio_group_alloc_map_/unmap_free_reserved_iova Eric Auger
2016-01-26 16:17   ` kbuild test robot
2016-01-26 16:37     ` Eric Auger
2016-01-26 13:12 ` [PATCH 07/10] vfio: pci: cache the vfio_group in vfio_pci_device Eric Auger
2016-01-26 13:12 ` [PATCH 08/10] vfio: introduce vfio_group_require_msi_mapping Eric Auger
2016-01-26 13:12 ` [PATCH 09/10] vfio-pci: create an iommu mapping for msi address Eric Auger
2016-01-26 14:43   ` kbuild test robot
2016-01-26 15:14     ` Eric Auger
2016-01-26 13:12 ` [PATCH 10/10] vfio: allow the user to register reserved iova range for MSI mapping Eric Auger
2016-01-26 16:42   ` kbuild test robot
2016-01-26 18:32   ` kbuild test robot
2016-01-26 17:25 ` [PATCH 00/10] KVM PCIe/MSI passthrough on ARM/ARM64 Pavel Fedin
2016-01-27  8:52   ` Eric Auger
2016-01-28  7:13     ` Pavel Fedin
2016-01-28  9:50       ` Eric Auger
2016-01-28 21:51 ` Alex Williamson [this message]
2016-01-29 14:35   ` Eric Auger
2016-01-29 19:33     ` Alex Williamson
2016-01-29 21:25       ` Eric Auger
2016-02-01 14:03         ` Will Deacon
2016-02-03 12:50           ` Christoffer Dall
2016-02-03 13:10             ` Will Deacon
2016-02-03 15:36               ` Christoffer Dall
2016-02-05 17:32                 ` ARM PCI/MSI KVM passthrough with GICv2M Eric Auger
2016-02-05 18:17                   ` Alex Williamson
2016-02-08  9:48                     ` Christoffer Dall
2016-02-08 13:27                       ` Eric Auger
