linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH kernel 0/3 REPOST] vfio-pci: Add support for mmapping MSI-X table
@ 2017-06-15  5:48 Alexey Kardashevskiy
  2017-06-15  5:48 ` [PATCH kernel 1/3] PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag Alexey Kardashevskiy
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2017-06-15  5:48 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, kvm, linux-pci, linux-kernel, Yongji Xie,
	Alex Williamson, Benjamin Herrenschmidt, Gavin Shan,
	Paul Mackerras, David Gibson


Here is a patchset which Yongji was working on before
leaving IBM LTC. Since we still want to have this functionality
in the kernel (DPDK is the first user), here is a rebase
on the current upstream.


Current vfio-pci implementation disallows to mmap the page
containing MSI-X table in case that users can write directly
to MSI-X table and generate an incorrect MSIs.

However, this will cause some performance issue when there
are some critical device registers in the same page as the
MSI-X table. We have to handle the mmio access to these
registers in QEMU emulation rather than in guest.

To solve this issue, this series allows to expose MSI-X table
to userspace when hardware enables the capability of interrupt
remapping which can ensure that a given PCI device can only
shoot the MSIs assigned for it. And we introduce a new bus_flags
PCI_BUS_FLAGS_MSI_REMAP to test this capability on PCI side
for different archs.

The patch 3 are based on the proposed patchset[1].

Changelog
v3:
- rebased on the current upstream

v2:
- Make the commit log more clear
- Replace pci_bus_check_msi_remapping() with pci_bus_msi_isolated()
  so that we could clearly know what the function does
- Set PCI_BUS_FLAGS_MSI_REMAP in pci_create_root_bus() instead
  of iommu_bus_notifier()
- Reserve VFIO_REGION_INFO_FLAG_CAPS when we allow to mmap MSI-X
  table so that we can know whether we allow to mmap MSI-X table
  in QEMU

[1] https://www.mail-archive.com/linux-kernel%40vger.kernel.org/msg1138820.html


This is based on sha1
63f700aab4c1 Linus Torvalds "Merge tag 'xtensa-20170612' of git://github.com/jcmvbkbc/linux-xtensa".

Please comment. Thanks.



Yongji Xie (3):
  PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag
  pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge
  vfio-pci: Allow to expose MSI-X table to userspace if interrupt
    remapping is enabled

 include/linux/pci.h                       |  1 +
 arch/powerpc/platforms/powernv/pci-ioda.c |  8 ++++++++
 drivers/vfio/pci/vfio_pci.c               | 18 +++++++++++++++---
 drivers/vfio/pci/vfio_pci_rdwr.c          |  3 ++-
 4 files changed, 26 insertions(+), 4 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH kernel 1/3] PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag
  2017-06-15  5:48 [PATCH kernel 0/3 REPOST] vfio-pci: Add support for mmapping MSI-X table Alexey Kardashevskiy
@ 2017-06-15  5:48 ` Alexey Kardashevskiy
  2017-06-15  5:48 ` [PATCH kernel 2/3] pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge Alexey Kardashevskiy
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2017-06-15  5:48 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, kvm, linux-pci, linux-kernel, Yongji Xie,
	Alex Williamson, Benjamin Herrenschmidt, Gavin Shan,
	Paul Mackerras, David Gibson, Yongji Xie

From: Yongji Xie <elohimes@gmail.com>

We introduce a new pci_bus_flags, PCI_BUS_FLAGS_MSI_REMAP
which indicates interrupts of all devices on the bus are
managed by the hardware enabling IRQ remapping(intel naming).
When the capability is enabled, a given PCI device can only
shoot the MSIs assigned for it. In other words, the hardware
can protect system from invalid MSIs of the device by checking
the target address and data when there is something wrong
with MSI part in device or device driver.

There is a existing flag for this capability in the IOMMU space:

enum iommu_cap {
	IOMMU_CAP_CACHE_COHERENCY,
--->	IOMMU_CAP_INTR_REMAP,
	IOMMU_CAP_NOEXEC,
};

and Eric also posted a patchset [1] to abstract it on MSI
controller side for ARM. But it would make sense to have a
more common flag like PCI_BUS_FLAGS_MSI_REMAP so that we can
use a universal flag to test this capability on PCI side for
different archs.

With this flag enabled, we can easily know whether it's safe
to expose MSI-X tables of PCI BARs to userspace. Some usespace
drivers such as VFIO may benefit from this.

[1] https://www.mail-archive.com/linux-kernel%40vger.kernel.org/msg1138820.html

Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 include/linux/pci.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8039f9f0ca05..2c6dbb3dd0da 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -200,6 +200,7 @@ enum pci_bus_flags {
 	PCI_BUS_FLAGS_NO_MSI	= (__force pci_bus_flags_t) 1,
 	PCI_BUS_FLAGS_NO_MMRBC	= (__force pci_bus_flags_t) 2,
 	PCI_BUS_FLAGS_NO_AERSID	= (__force pci_bus_flags_t) 4,
+	PCI_BUS_FLAGS_MSI_REMAP = (__force pci_bus_flags_t) 8,
 };
 
 /* These values come from the PCI Express Spec */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH kernel 2/3] pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge
  2017-06-15  5:48 [PATCH kernel 0/3 REPOST] vfio-pci: Add support for mmapping MSI-X table Alexey Kardashevskiy
  2017-06-15  5:48 ` [PATCH kernel 1/3] PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag Alexey Kardashevskiy
@ 2017-06-15  5:48 ` Alexey Kardashevskiy
  2017-06-15  9:25   ` Michael Ellerman
  2017-06-15  5:48 ` [PATCH kernel 3/3] vfio-pci: Allow to expose MSI-X table to userspace if interrupt remapping is enabled Alexey Kardashevskiy
  2017-06-22 21:11 ` [PATCH kernel 0/3 REPOST] vfio-pci: Add support for mmapping MSI-X table Alex Williamson
  3 siblings, 1 reply; 12+ messages in thread
From: Alexey Kardashevskiy @ 2017-06-15  5:48 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, kvm, linux-pci, linux-kernel, Yongji Xie,
	Alex Williamson, Benjamin Herrenschmidt, Gavin Shan,
	Paul Mackerras, David Gibson, Yongji Xie

From: Yongji Xie <elohimes@gmail.com>

Any IODA host bridge have the capability of IRQ remapping.
So we set PCI_BUS_FLAGS_MSI_REMAP when this kind of host birdge
is detected.

Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 283caf1070c9..b6bda1918273 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3177,6 +3177,12 @@ static void pnv_pci_ioda_fixup(void)
 #endif
 }
 
+int pnv_pci_ioda_root_bridge_prepare(struct pci_host_bridge *bridge)
+{
+	bridge->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
+	return 0;
+}
+
 /*
  * Returns the alignment for I/O or memory windows for P2P
  * bridges. That actually depends on how PEs are segmented.
@@ -3861,6 +3867,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	 */
 	ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;
 
+	ppc_md.pcibios_root_bridge_prepare = pnv_pci_ioda_root_bridge_prepare;
+
 	if (phb->type == PNV_PHB_NPU) {
 		hose->controller_ops = pnv_npu_ioda_controller_ops;
 	} else {
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH kernel 3/3] vfio-pci: Allow to expose MSI-X table to userspace if interrupt remapping is enabled
  2017-06-15  5:48 [PATCH kernel 0/3 REPOST] vfio-pci: Add support for mmapping MSI-X table Alexey Kardashevskiy
  2017-06-15  5:48 ` [PATCH kernel 1/3] PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag Alexey Kardashevskiy
  2017-06-15  5:48 ` [PATCH kernel 2/3] pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge Alexey Kardashevskiy
@ 2017-06-15  5:48 ` Alexey Kardashevskiy
  2017-06-22 21:11 ` [PATCH kernel 0/3 REPOST] vfio-pci: Add support for mmapping MSI-X table Alex Williamson
  3 siblings, 0 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2017-06-15  5:48 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, kvm, linux-pci, linux-kernel, Yongji Xie,
	Alex Williamson, Benjamin Herrenschmidt, Gavin Shan,
	Paul Mackerras, David Gibson, Yongji Xie

From: Yongji Xie <elohimes@gmail.com>

This patch tries to expose MSI-X tables to userspace if hardware
enables interrupt remapping which can ensure that a given PCI
device can only shoot the MSIs assigned for it. So we could
never worry that userspace driver can hurt other devices by
writing to the exposed MSI-X table directly.

Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 drivers/vfio/pci/vfio_pci.c      | 18 +++++++++++++++---
 drivers/vfio/pci/vfio_pci_rdwr.c |  3 ++-
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 324c52e3a1a4..700e9d04dab5 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -564,8 +564,12 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
 
 	end = pci_resource_len(vdev->pdev, vdev->msix_bar);
 
-	/* If MSI-X table is aligned to the start or end, only one area */
-	if (((vdev->msix_offset & PAGE_MASK) == 0) ||
+	/*
+	 * If MSI-X table is allowed to mmap because of the capability
+	 * of IRQ remapping or aligned to the start or end, only one area
+	 */
+	if ((vdev->pdev->bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP) ||
+	    ((vdev->msix_offset & PAGE_MASK) == 0) ||
 	    (PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) >= end))
 		nr_areas = 1;
 
@@ -577,6 +581,12 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
 
 	sparse->nr_areas = nr_areas;
 
+	if (vdev->pdev->bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP) {
+		sparse->areas[i].offset = 0;
+		sparse->areas[i].size = end;
+		goto out;
+	}
+
 	if (vdev->msix_offset & PAGE_MASK) {
 		sparse->areas[i].offset = 0;
 		sparse->areas[i].size = vdev->msix_offset & PAGE_MASK;
@@ -590,6 +600,7 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
 		i++;
 	}
 
+out:
 	ret = vfio_info_add_capability(caps, VFIO_REGION_INFO_CAP_SPARSE_MMAP,
 				       sparse);
 	kfree(sparse);
@@ -1115,7 +1126,8 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 	if (req_start + req_len > phys_len)
 		return -EINVAL;
 
-	if (index == vdev->msix_bar) {
+	if (index == vdev->msix_bar &&
+		!(pdev->bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP)) {
 		/*
 		 * Disallow mmaps overlapping the MSI-X table; users don't
 		 * get to touch this directly.  We could find somewhere
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 357243d76f10..5378f2c3ac8e 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -164,7 +164,8 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
 	} else
 		io = vdev->barmap[bar];
 
-	if (bar == vdev->msix_bar) {
+	if (bar == vdev->msix_bar &&
+		!(pdev->bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP)) {
 		x_start = vdev->msix_offset;
 		x_end = vdev->msix_offset + vdev->msix_size;
 	}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH kernel 2/3] pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge
  2017-06-15  5:48 ` [PATCH kernel 2/3] pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge Alexey Kardashevskiy
@ 2017-06-15  9:25   ` Michael Ellerman
  2017-06-15 10:56     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Ellerman @ 2017-06-15  9:25 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev
  Cc: kvm, Alexey Kardashevskiy, linux-pci, linux-kernel, Gavin Shan,
	Yongji Xie, Alex Williamson, Paul Mackerras, Yongji Xie,
	David Gibson

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> From: Yongji Xie <elohimes@gmail.com>
>
> Any IODA host bridge have the capability of IRQ remapping.
> So we set PCI_BUS_FLAGS_MSI_REMAP when this kind of host birdge
> is detected.

Where's the code that actually enforces this property?

It would be good to have a comment in pnv_pci_ioda_root_bridge_prepare()
(probably), pointing to that code, so that we can remember the
relationship between the two.

cheers

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH kernel 2/3] pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge
  2017-06-15  9:25   ` Michael Ellerman
@ 2017-06-15 10:56     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2017-06-15 10:56 UTC (permalink / raw)
  To: Michael Ellerman, Alexey Kardashevskiy, linuxppc-dev
  Cc: kvm, linux-pci, linux-kernel, Gavin Shan, Yongji Xie,
	Alex Williamson, Paul Mackerras, Yongji Xie, David Gibson

On Thu, 2017-06-15 at 19:25 +1000, Michael Ellerman wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
> > From: Yongji Xie <elohimes@gmail.com>
> > 
> > Any IODA host bridge have the capability of IRQ remapping.
> > So we set PCI_BUS_FLAGS_MSI_REMAP when this kind of host birdge
> > is detected.
> 
> Where's the code that actually enforces this property?
> 
> It would be good to have a comment in pnv_pci_ioda_root_bridge_prepare()
> (probably), pointing to that code, so that we can remember the
> relationship between the two.

Actually it's not so much remapping as:

 - The bridge can enforce that the interrupt is allowed for a given
partition

 - Because the interrupts are handled via the hypervisor, the latter
can do the remapping.

But the effect is the same, so yes we want the flag. On P9 with XIVE
exploitation we also have HW remapping on top.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH kernel 0/3 REPOST] vfio-pci: Add support for mmapping MSI-X table
  2017-06-15  5:48 [PATCH kernel 0/3 REPOST] vfio-pci: Add support for mmapping MSI-X table Alexey Kardashevskiy
                   ` (2 preceding siblings ...)
  2017-06-15  5:48 ` [PATCH kernel 3/3] vfio-pci: Allow to expose MSI-X table to userspace if interrupt remapping is enabled Alexey Kardashevskiy
@ 2017-06-22 21:11 ` Alex Williamson
  2017-06-23  5:06   ` Alexey Kardashevskiy
  3 siblings, 1 reply; 12+ messages in thread
From: Alex Williamson @ 2017-06-22 21:11 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm, linux-pci, linux-kernel, Yongji Xie,
	Benjamin Herrenschmidt, Gavin Shan, Paul Mackerras, David Gibson

On Thu, 15 Jun 2017 15:48:42 +1000
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> Here is a patchset which Yongji was working on before
> leaving IBM LTC. Since we still want to have this functionality
> in the kernel (DPDK is the first user), here is a rebase
> on the current upstream.
> 
> 
> Current vfio-pci implementation disallows to mmap the page
> containing MSI-X table in case that users can write directly
> to MSI-X table and generate an incorrect MSIs.
> 
> However, this will cause some performance issue when there
> are some critical device registers in the same page as the
> MSI-X table. We have to handle the mmio access to these
> registers in QEMU emulation rather than in guest.
> 
> To solve this issue, this series allows to expose MSI-X table
> to userspace when hardware enables the capability of interrupt
> remapping which can ensure that a given PCI device can only
> shoot the MSIs assigned for it. And we introduce a new bus_flags
> PCI_BUS_FLAGS_MSI_REMAP to test this capability on PCI side
> for different archs.
> 
> The patch 3 are based on the proposed patchset[1].
> 
> Changelog
> v3:
> - rebased on the current upstream

There's something not forthcoming here, the last version I see from
Yongji is this one:

https://lists.linuxfoundation.org/pipermail/iommu/2016-June/017245.html

Which was a 6-patch series where patches 2-4 tried to apply
PCI_BUS_FLAGS_MSI_REMAP for cases that supported other platforms.  That
doesn't exist here, so it's not simply a rebase.  Patch 1/ seems to
equate this new flag to the IOMMU capability IOMMU_CAP_INTR_REMAP, but
nothing is done here to match them together.  That patch also mentions
the work Eric has done for similar features on ARM, but again those
patches are dropped.  It seems like an incomplete feature now.  Thanks,

Alex

> v2:
> - Make the commit log more clear
> - Replace pci_bus_check_msi_remapping() with pci_bus_msi_isolated()
>   so that we could clearly know what the function does
> - Set PCI_BUS_FLAGS_MSI_REMAP in pci_create_root_bus() instead
>   of iommu_bus_notifier()
> - Reserve VFIO_REGION_INFO_FLAG_CAPS when we allow to mmap MSI-X
>   table so that we can know whether we allow to mmap MSI-X table
>   in QEMU
> 
> [1] https://www.mail-archive.com/linux-kernel%40vger.kernel.org/msg1138820.html
> 
> 
> This is based on sha1
> 63f700aab4c1 Linus Torvalds "Merge tag 'xtensa-20170612' of git://github.com/jcmvbkbc/linux-xtensa".
> 
> Please comment. Thanks.
> 
> 
> 
> Yongji Xie (3):
>   PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag
>   pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge
>   vfio-pci: Allow to expose MSI-X table to userspace if interrupt
>     remapping is enabled
> 
>  include/linux/pci.h                       |  1 +
>  arch/powerpc/platforms/powernv/pci-ioda.c |  8 ++++++++
>  drivers/vfio/pci/vfio_pci.c               | 18 +++++++++++++++---
>  drivers/vfio/pci/vfio_pci_rdwr.c          |  3 ++-
>  4 files changed, 26 insertions(+), 4 deletions(-)
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH kernel 0/3 REPOST] vfio-pci: Add support for mmapping MSI-X table
  2017-06-22 21:11 ` [PATCH kernel 0/3 REPOST] vfio-pci: Add support for mmapping MSI-X table Alex Williamson
@ 2017-06-23  5:06   ` Alexey Kardashevskiy
  2017-06-23 15:17     ` Alex Williamson
  0 siblings, 1 reply; 12+ messages in thread
From: Alexey Kardashevskiy @ 2017-06-23  5:06 UTC (permalink / raw)
  To: Alex Williamson
  Cc: linuxppc-dev, kvm, linux-pci, linux-kernel, Yongji Xie,
	Benjamin Herrenschmidt, Gavin Shan, Paul Mackerras, David Gibson

On 23/06/17 07:11, Alex Williamson wrote:
> On Thu, 15 Jun 2017 15:48:42 +1000
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> Here is a patchset which Yongji was working on before
>> leaving IBM LTC. Since we still want to have this functionality
>> in the kernel (DPDK is the first user), here is a rebase
>> on the current upstream.
>>
>>
>> Current vfio-pci implementation disallows to mmap the page
>> containing MSI-X table in case that users can write directly
>> to MSI-X table and generate an incorrect MSIs.
>>
>> However, this will cause some performance issue when there
>> are some critical device registers in the same page as the
>> MSI-X table. We have to handle the mmio access to these
>> registers in QEMU emulation rather than in guest.
>>
>> To solve this issue, this series allows to expose MSI-X table
>> to userspace when hardware enables the capability of interrupt
>> remapping which can ensure that a given PCI device can only
>> shoot the MSIs assigned for it. And we introduce a new bus_flags
>> PCI_BUS_FLAGS_MSI_REMAP to test this capability on PCI side
>> for different archs.
>>
>> The patch 3 are based on the proposed patchset[1].
>>
>> Changelog
>> v3:
>> - rebased on the current upstream
> 
> There's something not forthcoming here, the last version I see from
> Yongji is this one:
> 
> https://lists.linuxfoundation.org/pipermail/iommu/2016-June/017245.html
> 
> Which was a 6-patch series where patches 2-4 tried to apply
> PCI_BUS_FLAGS_MSI_REMAP for cases that supported other platforms.  That
> doesn't exist here, so it's not simply a rebase.  Patch 1/ seems to
> equate this new flag to the IOMMU capability IOMMU_CAP_INTR_REMAP, but
> nothing is done here to match them together.  That patch also mentions
> the work Eric has done for similar features on ARM, but again those
> patches are dropped.  It seems like an incomplete feature now.  Thanks,


Thanks! I suspected this is not the latest but could not find anything
better than we use internally for tests, and I could not reach Yongji for
comments whether this was the latest update.

As I am reading the patches, I notice that the "msi remap" term is used all
over the place. While this remapping capability may be the case for x86/arm
(and therefore the IOMMU_CAP_INTR_REMAP flag makes sense), powernv does not
do remapping but provides hardware isolation. When we are allowing MSIX BAR
mapping to the userspace - the isolation is what we really care about. Will
it make sense to rename PCI_BUS_FLAGS_MSI_REMAP to
PCI_BUS_FLAGS_MSI_ISOLATED ?

Another thing - the patchset enables PCI_BUS_FLAGS_MSI_REMAP when IOMMU
just advertises IOMMU_CAP_INTR_REMAP, not necessarily uses it, should the
patchset actually look at something like irq_remapping_enabled in
drivers/iommu/amd_iommu.c instead?



> 
> Alex
> 
>> v2:
>> - Make the commit log more clear
>> - Replace pci_bus_check_msi_remapping() with pci_bus_msi_isolated()
>>   so that we could clearly know what the function does
>> - Set PCI_BUS_FLAGS_MSI_REMAP in pci_create_root_bus() instead
>>   of iommu_bus_notifier()
>> - Reserve VFIO_REGION_INFO_FLAG_CAPS when we allow to mmap MSI-X
>>   table so that we can know whether we allow to mmap MSI-X table
>>   in QEMU
>>
>> [1] https://www.mail-archive.com/linux-kernel%40vger.kernel.org/msg1138820.html
>>
>>
>> This is based on sha1
>> 63f700aab4c1 Linus Torvalds "Merge tag 'xtensa-20170612' of git://github.com/jcmvbkbc/linux-xtensa".
>>
>> Please comment. Thanks.
>>
>>
>>
>> Yongji Xie (3):
>>   PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag
>>   pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge
>>   vfio-pci: Allow to expose MSI-X table to userspace if interrupt
>>     remapping is enabled
>>
>>  include/linux/pci.h                       |  1 +
>>  arch/powerpc/platforms/powernv/pci-ioda.c |  8 ++++++++
>>  drivers/vfio/pci/vfio_pci.c               | 18 +++++++++++++++---
>>  drivers/vfio/pci/vfio_pci_rdwr.c          |  3 ++-
>>  4 files changed, 26 insertions(+), 4 deletions(-)
>>
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH kernel 0/3 REPOST] vfio-pci: Add support for mmapping MSI-X table
  2017-06-23  5:06   ` Alexey Kardashevskiy
@ 2017-06-23 15:17     ` Alex Williamson
  2017-06-28  7:27       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Williamson @ 2017-06-23 15:17 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm, linux-pci, linux-kernel, Yongji Xie,
	Benjamin Herrenschmidt, Gavin Shan, Paul Mackerras, David Gibson

On Fri, 23 Jun 2017 15:06:37 +1000
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 23/06/17 07:11, Alex Williamson wrote:
> > On Thu, 15 Jun 2017 15:48:42 +1000
> > Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >   
> >> Here is a patchset which Yongji was working on before
> >> leaving IBM LTC. Since we still want to have this functionality
> >> in the kernel (DPDK is the first user), here is a rebase
> >> on the current upstream.
> >>
> >>
> >> Current vfio-pci implementation disallows to mmap the page
> >> containing MSI-X table in case that users can write directly
> >> to MSI-X table and generate an incorrect MSIs.
> >>
> >> However, this will cause some performance issue when there
> >> are some critical device registers in the same page as the
> >> MSI-X table. We have to handle the mmio access to these
> >> registers in QEMU emulation rather than in guest.
> >>
> >> To solve this issue, this series allows to expose MSI-X table
> >> to userspace when hardware enables the capability of interrupt
> >> remapping which can ensure that a given PCI device can only
> >> shoot the MSIs assigned for it. And we introduce a new bus_flags
> >> PCI_BUS_FLAGS_MSI_REMAP to test this capability on PCI side
> >> for different archs.
> >>
> >> The patch 3 are based on the proposed patchset[1].
> >>
> >> Changelog
> >> v3:
> >> - rebased on the current upstream  
> > 
> > There's something not forthcoming here, the last version I see from
> > Yongji is this one:
> > 
> > https://lists.linuxfoundation.org/pipermail/iommu/2016-June/017245.html
> > 
> > Which was a 6-patch series where patches 2-4 tried to apply
> > PCI_BUS_FLAGS_MSI_REMAP for cases that supported other platforms.  That
> > doesn't exist here, so it's not simply a rebase.  Patch 1/ seems to
> > equate this new flag to the IOMMU capability IOMMU_CAP_INTR_REMAP, but
> > nothing is done here to match them together.  That patch also mentions
> > the work Eric has done for similar features on ARM, but again those
> > patches are dropped.  It seems like an incomplete feature now.  Thanks,  
> 
> 
> Thanks! I suspected this is not the latest but could not find anything
> better than we use internally for tests, and I could not reach Yongji for
> comments whether this was the latest update.
> 
> As I am reading the patches, I notice that the "msi remap" term is used all
> over the place. While this remapping capability may be the case for x86/arm
> (and therefore the IOMMU_CAP_INTR_REMAP flag makes sense), powernv does not
> do remapping but provides hardware isolation. When we are allowing MSIX BAR
> mapping to the userspace - the isolation is what we really care about. Will
> it make sense to rename PCI_BUS_FLAGS_MSI_REMAP to
> PCI_BUS_FLAGS_MSI_ISOLATED ?

I don't have a strong opinion either way, so long as it's fully
described what the flag indicates.

> Another thing - the patchset enables PCI_BUS_FLAGS_MSI_REMAP when IOMMU
> just advertises IOMMU_CAP_INTR_REMAP, not necessarily uses it, should the
> patchset actually look at something like irq_remapping_enabled in
> drivers/iommu/amd_iommu.c instead?

Interrupt remapping being enabled is implicit in IOMMU_CAP_INTR_REMAP,
neither intel or amd iommu export the capability unless enabled.
Nobody cares if it's supported but not enabled.  Thanks,

Alex

> >> v2:
> >> - Make the commit log more clear
> >> - Replace pci_bus_check_msi_remapping() with pci_bus_msi_isolated()
> >>   so that we could clearly know what the function does
> >> - Set PCI_BUS_FLAGS_MSI_REMAP in pci_create_root_bus() instead
> >>   of iommu_bus_notifier()
> >> - Reserve VFIO_REGION_INFO_FLAG_CAPS when we allow to mmap MSI-X
> >>   table so that we can know whether we allow to mmap MSI-X table
> >>   in QEMU
> >>
> >> [1] https://www.mail-archive.com/linux-kernel%40vger.kernel.org/msg1138820.html
> >>
> >>
> >> This is based on sha1
> >> 63f700aab4c1 Linus Torvalds "Merge tag 'xtensa-20170612' of git://github.com/jcmvbkbc/linux-xtensa".
> >>
> >> Please comment. Thanks.
> >>
> >>
> >>
> >> Yongji Xie (3):
> >>   PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag
> >>   pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge
> >>   vfio-pci: Allow to expose MSI-X table to userspace if interrupt
> >>     remapping is enabled
> >>
> >>  include/linux/pci.h                       |  1 +
> >>  arch/powerpc/platforms/powernv/pci-ioda.c |  8 ++++++++
> >>  drivers/vfio/pci/vfio_pci.c               | 18 +++++++++++++++---
> >>  drivers/vfio/pci/vfio_pci_rdwr.c          |  3 ++-
> >>  4 files changed, 26 insertions(+), 4 deletions(-)
> >>  
> >   
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH kernel 0/3 REPOST] vfio-pci: Add support for mmapping MSI-X table
  2017-06-23 15:17     ` Alex Williamson
@ 2017-06-28  7:27       ` Alexey Kardashevskiy
  2017-06-29 20:06         ` Alex Williamson
  0 siblings, 1 reply; 12+ messages in thread
From: Alexey Kardashevskiy @ 2017-06-28  7:27 UTC (permalink / raw)
  To: Alex Williamson
  Cc: linuxppc-dev, kvm, linux-pci, linux-kernel, Yongji Xie,
	Benjamin Herrenschmidt, Gavin Shan, Paul Mackerras, David Gibson

On 24/06/17 01:17, Alex Williamson wrote:
> On Fri, 23 Jun 2017 15:06:37 +1000
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> On 23/06/17 07:11, Alex Williamson wrote:
>>> On Thu, 15 Jun 2017 15:48:42 +1000
>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>   
>>>> Here is a patchset which Yongji was working on before
>>>> leaving IBM LTC. Since we still want to have this functionality
>>>> in the kernel (DPDK is the first user), here is a rebase
>>>> on the current upstream.
>>>>
>>>>
>>>> Current vfio-pci implementation disallows to mmap the page
>>>> containing MSI-X table in case that users can write directly
>>>> to MSI-X table and generate an incorrect MSIs.
>>>>
>>>> However, this will cause some performance issue when there
>>>> are some critical device registers in the same page as the
>>>> MSI-X table. We have to handle the mmio access to these
>>>> registers in QEMU emulation rather than in guest.
>>>>
>>>> To solve this issue, this series allows to expose MSI-X table
>>>> to userspace when hardware enables the capability of interrupt
>>>> remapping which can ensure that a given PCI device can only
>>>> shoot the MSIs assigned for it. And we introduce a new bus_flags
>>>> PCI_BUS_FLAGS_MSI_REMAP to test this capability on PCI side
>>>> for different archs.
>>>>
>>>> The patch 3 are based on the proposed patchset[1].
>>>>
>>>> Changelog
>>>> v3:
>>>> - rebased on the current upstream  
>>>
>>> There's something not forthcoming here, the last version I see from
>>> Yongji is this one:
>>>
>>> https://lists.linuxfoundation.org/pipermail/iommu/2016-June/017245.html
>>>
>>> Which was a 6-patch series where patches 2-4 tried to apply
>>> PCI_BUS_FLAGS_MSI_REMAP for cases that supported other platforms.  That
>>> doesn't exist here, so it's not simply a rebase.  Patch 1/ seems to
>>> equate this new flag to the IOMMU capability IOMMU_CAP_INTR_REMAP, but
>>> nothing is done here to match them together.  That patch also mentions
>>> the work Eric has done for similar features on ARM, but again those
>>> patches are dropped.  It seems like an incomplete feature now.  Thanks,  
>>
>>
>> Thanks! I suspected this is not the latest but could not find anything
>> better than we use internally for tests, and I could not reach Yongji for
>> comments whether this was the latest update.
>>
>> As I am reading the patches, I notice that the "msi remap" term is used all
>> over the place. While this remapping capability may be the case for x86/arm
>> (and therefore the IOMMU_CAP_INTR_REMAP flag makes sense), powernv does not
>> do remapping but provides hardware isolation. When we are allowing MSIX BAR
>> mapping to the userspace - the isolation is what we really care about. Will
>> it make sense to rename PCI_BUS_FLAGS_MSI_REMAP to
>> PCI_BUS_FLAGS_MSI_ISOLATED ?
> 
> I don't have a strong opinion either way, so long as it's fully
> described what the flag indicates.
> 
>> Another thing - the patchset enables PCI_BUS_FLAGS_MSI_REMAP when IOMMU
>> just advertises IOMMU_CAP_INTR_REMAP, not necessarily uses it, should the
>> patchset actually look at something like irq_remapping_enabled in
>> drivers/iommu/amd_iommu.c instead?
> 
> Interrupt remapping being enabled is implicit in IOMMU_CAP_INTR_REMAP,
> neither intel or amd iommu export the capability unless enabled.
> Nobody cares if it's supported but not enabled.  Thanks,


As I am reading the current drivers/vfio/vfio_iommu_type1.c, it feels like
MSIX BAR mappings can always be allowed for the type1 IOMMU as
vfio_iommu_type1_attach_group() performs this check:

msi_remap = resv_msi ? irq_domain_check_msi_remap() :
    iommu_capable(bus, IOMMU_CAP_INTR_REMAP);

and simply does not proceed if MSI remap is not supported. Is that correct
or I miss something here? Thanks.




-- 
Alexey

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH kernel 0/3 REPOST] vfio-pci: Add support for mmapping MSI-X table
  2017-06-28  7:27       ` Alexey Kardashevskiy
@ 2017-06-29 20:06         ` Alex Williamson
  0 siblings, 0 replies; 12+ messages in thread
From: Alex Williamson @ 2017-06-29 20:06 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm, linux-pci, linux-kernel, Yongji Xie,
	Benjamin Herrenschmidt, Gavin Shan, Paul Mackerras, David Gibson

On Wed, 28 Jun 2017 17:27:32 +1000
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 24/06/17 01:17, Alex Williamson wrote:
> > On Fri, 23 Jun 2017 15:06:37 +1000
> > Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >   
> >> On 23/06/17 07:11, Alex Williamson wrote:  
> >>> On Thu, 15 Jun 2017 15:48:42 +1000
> >>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >>>     
> >>>> Here is a patchset which Yongji was working on before
> >>>> leaving IBM LTC. Since we still want to have this functionality
> >>>> in the kernel (DPDK is the first user), here is a rebase
> >>>> on the current upstream.
> >>>>
> >>>>
> >>>> Current vfio-pci implementation disallows to mmap the page
> >>>> containing MSI-X table in case that users can write directly
> >>>> to MSI-X table and generate an incorrect MSIs.
> >>>>
> >>>> However, this will cause some performance issue when there
> >>>> are some critical device registers in the same page as the
> >>>> MSI-X table. We have to handle the mmio access to these
> >>>> registers in QEMU emulation rather than in guest.
> >>>>
> >>>> To solve this issue, this series allows to expose MSI-X table
> >>>> to userspace when hardware enables the capability of interrupt
> >>>> remapping which can ensure that a given PCI device can only
> >>>> shoot the MSIs assigned for it. And we introduce a new bus_flags
> >>>> PCI_BUS_FLAGS_MSI_REMAP to test this capability on PCI side
> >>>> for different archs.
> >>>>
> >>>> The patch 3 are based on the proposed patchset[1].
> >>>>
> >>>> Changelog
> >>>> v3:
> >>>> - rebased on the current upstream    
> >>>
> >>> There's something not forthcoming here, the last version I see from
> >>> Yongji is this one:
> >>>
> >>> https://lists.linuxfoundation.org/pipermail/iommu/2016-June/017245.html
> >>>
> >>> Which was a 6-patch series where patches 2-4 tried to apply
> >>> PCI_BUS_FLAGS_MSI_REMAP for cases that supported other platforms.  That
> >>> doesn't exist here, so it's not simply a rebase.  Patch 1/ seems to
> >>> equate this new flag to the IOMMU capability IOMMU_CAP_INTR_REMAP, but
> >>> nothing is done here to match them together.  That patch also mentions
> >>> the work Eric has done for similar features on ARM, but again those
> >>> patches are dropped.  It seems like an incomplete feature now.  Thanks,    
> >>
> >>
> >> Thanks! I suspected this is not the latest but could not find anything
> >> better than we use internally for tests, and I could not reach Yongji for
> >> comments whether this was the latest update.
> >>
> >> As I am reading the patches, I notice that the "msi remap" term is used all
> >> over the place. While this remapping capability may be the case for x86/arm
> >> (and therefore the IOMMU_CAP_INTR_REMAP flag makes sense), powernv does not
> >> do remapping but provides hardware isolation. When we are allowing MSIX BAR
> >> mapping to the userspace - the isolation is what we really care about. Will
> >> it make sense to rename PCI_BUS_FLAGS_MSI_REMAP to
> >> PCI_BUS_FLAGS_MSI_ISOLATED ?  
> > 
> > I don't have a strong opinion either way, so long as it's fully
> > described what the flag indicates.
> >   
> >> Another thing - the patchset enables PCI_BUS_FLAGS_MSI_REMAP when IOMMU
> >> just advertises IOMMU_CAP_INTR_REMAP, not necessarily uses it, should the
> >> patchset actually look at something like irq_remapping_enabled in
> >> drivers/iommu/amd_iommu.c instead?  
> > 
> > Interrupt remapping being enabled is implicit in IOMMU_CAP_INTR_REMAP,
> > neither intel or amd iommu export the capability unless enabled.
> > Nobody cares if it's supported but not enabled.  Thanks,  
> 
> 
> As I am reading the current drivers/vfio/vfio_iommu_type1.c, it feels like
> MSIX BAR mappings can always be allowed for the type1 IOMMU as
> vfio_iommu_type1_attach_group() performs this check:
> 
> msi_remap = resv_msi ? irq_domain_check_msi_remap() :
>     iommu_capable(bus, IOMMU_CAP_INTR_REMAP);
> 
> and simply does not proceed if MSI remap is not supported. Is that correct
> or I miss something here? Thanks.

The MSI code in type1 has absolutely nothing to do with BAR mappings.
That's looking at how MSI is handled by the IOMMU, whether it needs a
reserved mapping area and whether MSI writes have source ID
validation.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH kernel 2/3] pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge
  2017-06-15  5:06 [PATCH kernel 0/3] " Alexey Kardashevskiy
@ 2017-06-15  5:06 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2017-06-15  5:06 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, kvm, linux-pci, linux-kernel, Yongji Xie,
	Alex Williamson, Benjamin Herrenschmidt, Gavin Shan,
	Paul Mackerras, David Gibson, Yongji Xie, Paul Mackerras

From: Yongji Xie <xyjxie@linux.vnet.ibm.com>

Any IODA host bridge have the capability of IRQ remapping.
So we set PCI_BUS_FLAGS_MSI_REMAP when this kind of host birdge
is detected.

Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 283caf1070c9..b6bda1918273 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3177,6 +3177,12 @@ static void pnv_pci_ioda_fixup(void)
 #endif
 }
 
+int pnv_pci_ioda_root_bridge_prepare(struct pci_host_bridge *bridge)
+{
+	bridge->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
+	return 0;
+}
+
 /*
  * Returns the alignment for I/O or memory windows for P2P
  * bridges. That actually depends on how PEs are segmented.
@@ -3861,6 +3867,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	 */
 	ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;
 
+	ppc_md.pcibios_root_bridge_prepare = pnv_pci_ioda_root_bridge_prepare;
+
 	if (phb->type == PNV_PHB_NPU) {
 		hose->controller_ops = pnv_npu_ioda_controller_ops;
 	} else {
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-06-29 20:06 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-15  5:48 [PATCH kernel 0/3 REPOST] vfio-pci: Add support for mmapping MSI-X table Alexey Kardashevskiy
2017-06-15  5:48 ` [PATCH kernel 1/3] PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag Alexey Kardashevskiy
2017-06-15  5:48 ` [PATCH kernel 2/3] pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge Alexey Kardashevskiy
2017-06-15  9:25   ` Michael Ellerman
2017-06-15 10:56     ` Benjamin Herrenschmidt
2017-06-15  5:48 ` [PATCH kernel 3/3] vfio-pci: Allow to expose MSI-X table to userspace if interrupt remapping is enabled Alexey Kardashevskiy
2017-06-22 21:11 ` [PATCH kernel 0/3 REPOST] vfio-pci: Add support for mmapping MSI-X table Alex Williamson
2017-06-23  5:06   ` Alexey Kardashevskiy
2017-06-23 15:17     ` Alex Williamson
2017-06-28  7:27       ` Alexey Kardashevskiy
2017-06-29 20:06         ` Alex Williamson
  -- strict thread matches above, loose matches on Subject: below --
2017-06-15  5:06 [PATCH kernel 0/3] " Alexey Kardashevskiy
2017-06-15  5:06 ` [PATCH kernel 2/3] pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge Alexey Kardashevskiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).