All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v3 0/5] vfio-pci: Allow to mmap sub-page MMIO BARs and MSI-X table on PPC64 platform
@ 2016-01-15  7:06 Yongji Xie
  2016-01-15  7:06 ` [RFC PATCH v3 1/5] PCI: Add support for enforcing all MMIO BARs to be page aligned Yongji Xie
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Yongji Xie @ 2016-01-15  7:06 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: bhelgaas, corbet, aik, alex.williamson, benh, paulus, mpe,
	warrier, zhong, nikunj, Yongji Xie

Current vfio-pci implementation disallows to mmap
sub-page(size < PAGE_SIZE) MMIO BARs and MSI-X table. This is because
sub-page BARs' mmio page may be shared with other BARs and MSI-X table
should not be accessed directly from the guest for security reasons.

But these will easily cause some performance issues for mmio accesses
in guest when vfio passthrough sub-page BARs or BARs containing MSI-X
table on PPC64 platform. This is because PAGE_SIZE is 64KB by default
on PPC64 platform and the big page may easily hit the sub-page MMIO
BARs' unmmapping and cause the unmmaping of the mmio page which
MSI-X table locate in, which lead to mmio emulation in host.

For sub-page MMIO BARs' unmmapping, this patchset adds a kernel
parameter for PCI resource allocator to enforce the alignment of all
MMIO BARs to be at least PAGE_SZIE and make it enabled by default on
PPC64 platform so that sub-page BAR's mmio page will not be shared
with other BARs. Then we can mmap sub-page MMIO BARs in vfio-pci driver
with this parameter enabled.

For MSI-X table's unmmapping, we think MSI-X table is safe to access
directly from userspace if PCI host bridge support filtering of MSIs
which can ensure that a given pci device can only shoot the MSIs
assigned for it. So we add a pci_host_bridge attribute to indicate
if this PCI host bridge supports filtering of MSIs. Then we can mmap
MSI-X table with this attribute set.

With this patchset applied, we can get almost 100% improvement on
performance for mmio accesses when we passthrough sub-page BARs to guest
in our test.

The two vfio related patches(patch 2 and patch 5) are based on the
proposed patchset[1].

Changelog v3: 
- Rebase on new linux kernel mainline with the patchset[1] applied.
- Add a function to check whether PCI BARs'mmio page is shared with
  other BARs.
- Add a host bridge attribute to indicate PCI host bridge support
  filtering of MSIs.
- Use the new host bridge attribute to check if MSI-X table can
  be mmapped instead of CONFIG_EEH.
- Remove Kconfig option VFIO_PCI_MMAP_MSIX

Changelog v2:
- Rebase on v4.4-rc6 with the patchset[1] applied.
- Use kernel parameter to enforce all MMIO BARs to be page aligned
  on PCI core code instead of doing it on PPC64 arch code.
- Remove flags: VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED
                VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP
- Add a Kconfig option to support for mmapping MSI-X table.

[1] https://lkml.org/lkml/2015/11/23/748

Yongji Xie (5):
  PCI: Add support for enforcing all MMIO BARs to be page aligned
  vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive
  PCI: Add host bridge attribute to indicate filtering of MSIs is supported
  powerpc/powernv/pci-ioda: Enable msi_filtered bit for any IODA host bridge
  vfio-pci: Allow to mmap MSI-X table if host bridge supports filtering of MSIs

 Documentation/kernel-parameters.txt       |    5 +++++
 arch/powerpc/include/asm/pci.h            |   11 +++++++++
 arch/powerpc/platforms/powernv/pci-ioda.c |    6 +++++
 drivers/pci/host-bridge.c                 |    6 +++++
 drivers/pci/pci.c                         |   35 +++++++++++++++++++++++++++++
 drivers/pci/pci.h                         |    8 ++++++-
 drivers/vfio/pci/vfio_pci.c               |   13 ++++++++---
 include/linux/pci.h                       |    7 ++++++
 8 files changed, 87 insertions(+), 4 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH v3 1/5] PCI: Add support for enforcing all MMIO BARs to be page aligned
  2016-01-15  7:06 [RFC PATCH v3 0/5] vfio-pci: Allow to mmap sub-page MMIO BARs and MSI-X table on PPC64 platform Yongji Xie
@ 2016-01-15  7:06 ` Yongji Xie
  2016-01-28 22:46   ` Alex Williamson
  2016-01-15  7:06 ` [RFC PATCH v3 2/5] vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive Yongji Xie
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Yongji Xie @ 2016-01-15  7:06 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: bhelgaas, corbet, aik, alex.williamson, benh, paulus, mpe,
	warrier, zhong, nikunj, Yongji Xie

When vfio passthrough a PCI device of which MMIO BARs
are smaller than PAGE_SIZE, guest will not handle the
mmio accesses to the BARs which leads to mmio emulations
in host.

This is because vfio will not allow to passthrough one
BAR's mmio page which may be shared with other BARs.

To solve this performance issue, this patch adds a kernel
parameter "pci=resource_page_aligned=on" to enforce
the alignment of all MMIO BARs to be at least PAGE_SIZE,
so that one BAR's mmio page would not be shared with other
BARs. We can also disable it through kernel parameter
"pci=resource_page_aligned=off".

For the default value of the parameter, we think it should be
arch-independent, so we add a macro
HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED to change it. And we
define this macro to enable this parameter by default on PPC64
platform which can easily hit this performance issue because
its PAGE_SIZE is 64KB.

Note that the kernel parameter won't works if kernel doesn't do
resources reallocation.

Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
---
 Documentation/kernel-parameters.txt |    5 +++++
 arch/powerpc/include/asm/pci.h      |   11 +++++++++++
 drivers/pci/pci.c                   |   35 +++++++++++++++++++++++++++++++++++
 drivers/pci/pci.h                   |    8 +++++++-
 include/linux/pci.h                 |    4 ++++
 5 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 742f69d..3f2a7c9 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2857,6 +2857,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 				PAGE_SIZE is used as alignment.
 				PCI-PCI bridge can be specified, if resource
 				windows need to be expanded.
+		resource_page_aligned=	Enable/disable enforcing the alignment
+				of all PCI devices' memory resources to be
+				at least PAGE_SIZE if resources reallocation
+				is done by kernel.
+				Format: { "on" | "off" }
 		ecrc=		Enable/disable PCIe ECRC (transaction layer
 				end-to-end CRC checking).
 				bios: Use BIOS/firmware settings. This is the
diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index 3453bd8..2d2b3ef 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -136,6 +136,17 @@ extern pgprot_t	pci_phys_mem_access_prot(struct file *file,
 					 unsigned long pfn,
 					 unsigned long size,
 					 pgprot_t prot);
+#ifdef CONFIG_PPC64
+
+/* For PPC64, We enforce all PCI MMIO BARs to be page aligned
+ * by default. This would be helpful to improve performance
+ * when we passthrough a PCI device of which BARs are smaller
+ * than PAGE_SIZE(64KB). And we can use kernel parameter
+ * "pci=resource_page_aligned=off" to disable it.
+ */
+#define HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED	1
+
+#endif
 
 #define HAVE_ARCH_PCI_RESOURCE_TO_USER
 extern void pci_resource_to_user(const struct pci_dev *dev, int bar,
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 314db8c..7b21238 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -99,6 +99,9 @@ u8 pci_cache_line_size;
  */
 unsigned int pcibios_max_latency = 255;
 
+bool pci_resources_page_aligned =
+	IS_ENABLED(HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED);
+
 /* If set, the PCIe ARI capability will not be used. */
 static bool pcie_ari_disabled;
 
@@ -4746,6 +4749,35 @@ static ssize_t pci_resource_alignment_store(struct bus_type *bus,
 BUS_ATTR(resource_alignment, 0644, pci_resource_alignment_show,
 					pci_resource_alignment_store);
 
+static void pci_resources_get_page_aligned(char *str)
+{
+	if (!strncmp(str, "off", 3))
+		pci_resources_page_aligned = false;
+	else if (!strncmp(str, "on", 2))
+		pci_resources_page_aligned = true;
+}
+
+/*
+ * This function checks whether PCI BARs' mmio page will be shared
+ * with other BARs.
+ */
+bool pci_resources_share_page(struct pci_dev *dev, int resno)
+{
+	struct resource *res = dev->resource + resno;
+
+	if (resource_size(res) >= PAGE_SIZE)
+		return false;
+	if (pci_resources_page_aligned && !(res->start & ~PAGE_MASK) &&
+	    res->flags & IORESOURCE_MEM) {
+		if (res->sibling)
+			return (res->sibling->start & ~PAGE_MASK);
+		else
+			return false;
+	}
+	return true;
+}
+EXPORT_SYMBOL_GPL(pci_resources_share_page);
+
 static int __init pci_resource_alignment_sysfs_init(void)
 {
 	return bus_create_file(&pci_bus_type,
@@ -4859,6 +4891,9 @@ static int __init pci_setup(char *str)
 			} else if (!strncmp(str, "resource_alignment=", 19)) {
 				pci_set_resource_alignment_param(str + 19,
 							strlen(str + 19));
+			} else if (!strncmp(str, "resource_page_aligned=",
+				   22)) {
+				pci_resources_get_page_aligned(str + 22);
 			} else if (!strncmp(str, "ecrc=", 5)) {
 				pcie_ecrc_get_policy(str + 5);
 			} else if (!strncmp(str, "hpiosize=", 9)) {
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d390fc1..b9b333d 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -312,11 +312,17 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
 #ifdef CONFIG_PCI_IOV
 	int resno = res - dev->resource;
 
-	if (resno >= PCI_IOV_RESOURCES && resno <= PCI_IOV_RESOURCE_END)
+	if (resno >= PCI_IOV_RESOURCES && resno <= PCI_IOV_RESOURCE_END) {
+		if (pci_resources_page_aligned && res->flags & IORESOURCE_MEM)
+			return PAGE_ALIGN(pci_sriov_resource_alignment(dev,
+					  resno));
 		return pci_sriov_resource_alignment(dev, resno);
+	}
 #endif
 	if (dev->class >> 8  == PCI_CLASS_BRIDGE_CARDBUS)
 		return pci_cardbus_resource_alignment(res);
+	if (pci_resources_page_aligned && res->flags & IORESOURCE_MEM)
+		return PAGE_ALIGN(resource_alignment(res));
 	return resource_alignment(res);
 }
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 6ae25aa..b640d65 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1530,6 +1530,10 @@ static inline int pci_get_new_domain_nr(void) { return -ENOSYS; }
 	 (pci_resource_end((dev), (bar)) -		\
 	  pci_resource_start((dev), (bar)) + 1))
 
+extern bool pci_resources_page_aligned;
+
+bool pci_resources_share_page(struct pci_dev *dev, int resno);
+
 /* Similar to the helpers above, these manipulate per-pci_dev
  * driver-specific data.  They are really just a wrapper around
  * the generic device structure functions of these calls.
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH v3 2/5] vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive
  2016-01-15  7:06 [RFC PATCH v3 0/5] vfio-pci: Allow to mmap sub-page MMIO BARs and MSI-X table on PPC64 platform Yongji Xie
  2016-01-15  7:06 ` [RFC PATCH v3 1/5] PCI: Add support for enforcing all MMIO BARs to be page aligned Yongji Xie
@ 2016-01-15  7:06 ` Yongji Xie
  2016-01-15  7:06 ` [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported Yongji Xie
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Yongji Xie @ 2016-01-15  7:06 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: bhelgaas, corbet, aik, alex.williamson, benh, paulus, mpe,
	warrier, zhong, nikunj, Yongji Xie

Current vfio-pci implementation disallows to mmap
sub-page(size < PAGE_SIZE) MMIO BARs because these BARs' mmio
page may be shared with other BARs.

But we should allow to mmap these sub-page MMIO BARs if PCI
resource allocator can make sure these BARs' mmio page will
not be shared with other BARs.

Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
---
 drivers/vfio/pci/vfio_pci.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 7e9f497..11fd0f0 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -552,7 +552,8 @@ static long vfio_pci_ioctl(void *device_data,
 				     VFIO_REGION_INFO_FLAG_WRITE;
 			if (IS_ENABLED(CONFIG_VFIO_PCI_MMAP) &&
 			    pci_resource_flags(pdev, info.index) &
-			    IORESOURCE_MEM && info.size >= PAGE_SIZE) {
+			    IORESOURCE_MEM && !pci_resources_share_page(pdev,
+			    info.index)) {
 				info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
 				if (info.index == vdev->msix_bar) {
 					ret = msix_sparse_mmap_cap(vdev, &caps);
@@ -954,6 +955,10 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 		return -EINVAL;
 
 	phys_len = pci_resource_len(pdev, index);
+
+	if (!pci_resources_share_page(pdev, index))
+		phys_len = PAGE_ALIGN(phys_len);
+
 	req_len = vma->vm_end - vma->vm_start;
 	pgoff = vma->vm_pgoff &
 		((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported
  2016-01-15  7:06 [RFC PATCH v3 0/5] vfio-pci: Allow to mmap sub-page MMIO BARs and MSI-X table on PPC64 platform Yongji Xie
  2016-01-15  7:06 ` [RFC PATCH v3 1/5] PCI: Add support for enforcing all MMIO BARs to be page aligned Yongji Xie
  2016-01-15  7:06 ` [RFC PATCH v3 2/5] vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive Yongji Xie
@ 2016-01-15  7:06 ` Yongji Xie
  2016-01-15 17:24     ` David Laight
  2016-01-28 22:46   ` Alex Williamson
  2016-01-15  7:06 ` [RFC PATCH v3 4/5] powerpc/powernv/pci-ioda: Enable msi_filtered bit for any IODA host bridge Yongji Xie
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 22+ messages in thread
From: Yongji Xie @ 2016-01-15  7:06 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: bhelgaas, corbet, aik, alex.williamson, benh, paulus, mpe,
	warrier, zhong, nikunj, Yongji Xie

MSI-X tables are not allowed to be mmapped in vfio-pci
driver in case that user get to touch this directly.
This will cause some performance issues when when PCI
adapters have critical registers in the same page as
the MSI-X table.

However, some kind of PCI host bridge such as IODA bridge
on Power support filtering of MSIs, which can ensure that a
given pci device can only shoot the MSIs assigned for it.
So we think it's safe to expose the MSI-X table to userspace
if filtering of MSIs is supported because the exposed MSI-X
table can't be used to do harm to other memory space.

To support this case, this patch adds a pci_host_bridge
attribute to indicate if this PCI host bridge supports
filtering of MSIs.

Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
---
 drivers/pci/host-bridge.c |    6 ++++++
 include/linux/pci.h       |    3 +++
 2 files changed, 9 insertions(+)

diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
index 5f4a2e0..c029267 100644
--- a/drivers/pci/host-bridge.c
+++ b/drivers/pci/host-bridge.c
@@ -96,3 +96,9 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
 	res->end = region->end + offset;
 }
 EXPORT_SYMBOL(pcibios_bus_to_resource);
+
+bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev)
+{
+	return pci_find_host_bridge(pdev->bus)->msi_filtered;
+}
+EXPORT_SYMBOL_GPL(pci_host_bridge_msi_filtered_enabled);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index b640d65..b952b78 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -412,6 +412,7 @@ struct pci_host_bridge {
 	void (*release_fn)(struct pci_host_bridge *);
 	void *release_data;
 	unsigned int ignore_reset_delay:1;	/* for entire hierarchy */
+	unsigned int msi_filtered:1;	/* support filtering of MSIs */
 	/* Resource alignment requirements */
 	resource_size_t (*align_resource)(struct pci_dev *dev,
 			const struct resource *res,
@@ -430,6 +431,8 @@ void pci_set_host_bridge_release(struct pci_host_bridge *bridge,
 
 int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge);
 
+bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev);
+
 /*
  * The first PCI_BRIDGE_RESOURCE_NUM PCI bus resources (those that correspond
  * to P2P or CardBus bridge windows) go in a table.  Additional ones (for
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH v3 4/5] powerpc/powernv/pci-ioda: Enable msi_filtered bit for any IODA host bridge
  2016-01-15  7:06 [RFC PATCH v3 0/5] vfio-pci: Allow to mmap sub-page MMIO BARs and MSI-X table on PPC64 platform Yongji Xie
                   ` (2 preceding siblings ...)
  2016-01-15  7:06 ` [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported Yongji Xie
@ 2016-01-15  7:06 ` Yongji Xie
  2016-01-15  7:06 ` [RFC PATCH v3 5/5] vfio-pci: Allow to mmap MSI-X table if host bridge supports filtering of MSIs Yongji Xie
  2016-01-28 10:01 ` [RFC PATCH v3 0/5] vfio-pci: Allow to mmap sub-page MMIO BARs and MSI-X table on PPC64 platform Yongji Xie
  5 siblings, 0 replies; 22+ messages in thread
From: Yongji Xie @ 2016-01-15  7:06 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: bhelgaas, corbet, aik, alex.williamson, benh, paulus, mpe,
	warrier, zhong, nikunj, Yongji Xie

This patch enables msi_filtered bit for any IODA host bridge.

Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 414fd1a..689f0dc 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2934,6 +2934,11 @@ static void pnv_pci_ioda_fixup(void)
 #endif
 }
 
+static void pnv_pci_ioda_fixup_phb(struct pci_controller *hose)
+{
+	to_pci_host_bridge(hose->bus->bridge)->msi_filtered = 1;
+}
+
 /*
  * Returns the alignment for I/O or memory windows for P2P
  * bridges. That actually depends on how PEs are segmented.
@@ -3201,6 +3206,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 	 * the child P2P bridges) can form individual PE.
 	 */
 	ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;
+	ppc_md.pcibios_fixup_phb = pnv_pci_ioda_fixup_phb;
 	hose->controller_ops = pnv_pci_ioda_controller_ops;
 
 #ifdef CONFIG_PCI_IOV
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH v3 5/5] vfio-pci: Allow to mmap MSI-X table if host bridge supports filtering of MSIs
  2016-01-15  7:06 [RFC PATCH v3 0/5] vfio-pci: Allow to mmap sub-page MMIO BARs and MSI-X table on PPC64 platform Yongji Xie
                   ` (3 preceding siblings ...)
  2016-01-15  7:06 ` [RFC PATCH v3 4/5] powerpc/powernv/pci-ioda: Enable msi_filtered bit for any IODA host bridge Yongji Xie
@ 2016-01-15  7:06 ` Yongji Xie
  2016-01-28 22:46   ` Alex Williamson
  2016-01-28 10:01 ` [RFC PATCH v3 0/5] vfio-pci: Allow to mmap sub-page MMIO BARs and MSI-X table on PPC64 platform Yongji Xie
  5 siblings, 1 reply; 22+ messages in thread
From: Yongji Xie @ 2016-01-15  7:06 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: bhelgaas, corbet, aik, alex.williamson, benh, paulus, mpe,
	warrier, zhong, nikunj, Yongji Xie

Current vfio-pci implementation disallows to mmap MSI-X
table in case that user get to touch this directly.

But we should allow to mmap these MSI-X tables if the PCI
host bridge supports filtering of MSIs.

Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
---
 drivers/vfio/pci/vfio_pci.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 11fd0f0..4d68f6a 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -555,7 +555,8 @@ static long vfio_pci_ioctl(void *device_data,
 			    IORESOURCE_MEM && !pci_resources_share_page(pdev,
 			    info.index)) {
 				info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
-				if (info.index == vdev->msix_bar) {
+				if (!pci_host_bridge_msi_filtered_enabled(pdev) &&
+				    info.index == vdev->msix_bar) {
 					ret = msix_sparse_mmap_cap(vdev, &caps);
 					if (ret)
 						return ret;
@@ -967,7 +968,8 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 	if (phys_len < PAGE_SIZE || req_start + req_len > phys_len)
 		return -EINVAL;
 
-	if (index == vdev->msix_bar) {
+	if (!pci_host_bridge_msi_filtered_enabled(pdev) &&
+	    index == vdev->msix_bar) {
 		/*
 		 * Disallow mmaps overlapping the MSI-X table; users don't
 		 * get to touch this directly.  We could find somewhere
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* RE: [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported
  2016-01-15  7:06 ` [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported Yongji Xie
@ 2016-01-15 17:24     ` David Laight
  2016-01-28 22:46   ` Alex Williamson
  1 sibling, 0 replies; 22+ messages in thread
From: David Laight @ 2016-01-15 17:24 UTC (permalink / raw)
  To: 'Yongji Xie',
	kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: nikunj, zhong, corbet, aik, warrier, alex.williamson, paulus, bhelgaas

From: Yongji Xie
> Sent: 15 January 2016 07:06
>
> MSI-X tables are not allowed to be mmapped in vfio-pci
> driver in case that user get to touch this directly.
> This will cause some performance issues when when PCI
> adapters have critical registers in the same page as
> the MSI-X table.
...
If the driver wants to generate an incorrect MSI-X interrupt
it can do so by requesting the device do a normal memory transfer
to the target address area that raises MSI-X interrupts.
So disabling writes to the MSI-X table (and pending bit array)
areas only raises the bar very slightly.
A device may also give the driver write access to the MSI-X
table through other addresses.

This seems to make disallowing the mapping of the MSI-X table
rather pointless.

I've also dumped out the MSI-X table (during development) to
check that the values are being written there correctly.

	David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported
@ 2016-01-15 17:24     ` David Laight
  0 siblings, 0 replies; 22+ messages in thread
From: David Laight @ 2016-01-15 17:24 UTC (permalink / raw)
  To: 'Yongji Xie',
	kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: nikunj, zhong, corbet, aik, warrier, alex.williamson, paulus, bhelgaas

RnJvbTogWW9uZ2ppIFhpZQ0KPiBTZW50OiAxNSBKYW51YXJ5IDIwMTYgMDc6MDYNCj4NCj4gTVNJ
LVggdGFibGVzIGFyZSBub3QgYWxsb3dlZCB0byBiZSBtbWFwcGVkIGluIHZmaW8tcGNpDQo+IGRy
aXZlciBpbiBjYXNlIHRoYXQgdXNlciBnZXQgdG8gdG91Y2ggdGhpcyBkaXJlY3RseS4NCj4gVGhp
cyB3aWxsIGNhdXNlIHNvbWUgcGVyZm9ybWFuY2UgaXNzdWVzIHdoZW4gd2hlbiBQQ0kNCj4gYWRh
cHRlcnMgaGF2ZSBjcml0aWNhbCByZWdpc3RlcnMgaW4gdGhlIHNhbWUgcGFnZSBhcw0KPiB0aGUg
TVNJLVggdGFibGUuDQouLi4NCklmIHRoZSBkcml2ZXIgd2FudHMgdG8gZ2VuZXJhdGUgYW4gaW5j
b3JyZWN0IE1TSS1YIGludGVycnVwdA0KaXQgY2FuIGRvIHNvIGJ5IHJlcXVlc3RpbmcgdGhlIGRl
dmljZSBkbyBhIG5vcm1hbCBtZW1vcnkgdHJhbnNmZXINCnRvIHRoZSB0YXJnZXQgYWRkcmVzcyBh
cmVhIHRoYXQgcmFpc2VzIE1TSS1YIGludGVycnVwdHMuDQpTbyBkaXNhYmxpbmcgd3JpdGVzIHRv
IHRoZSBNU0ktWCB0YWJsZSAoYW5kIHBlbmRpbmcgYml0IGFycmF5KQ0KYXJlYXMgb25seSByYWlz
ZXMgdGhlIGJhciB2ZXJ5IHNsaWdodGx5Lg0KQSBkZXZpY2UgbWF5IGFsc28gZ2l2ZSB0aGUgZHJp
dmVyIHdyaXRlIGFjY2VzcyB0byB0aGUgTVNJLVgNCnRhYmxlIHRocm91Z2ggb3RoZXIgYWRkcmVz
c2VzLg0KDQpUaGlzIHNlZW1zIHRvIG1ha2UgZGlzYWxsb3dpbmcgdGhlIG1hcHBpbmcgb2YgdGhl
IE1TSS1YIHRhYmxlDQpyYXRoZXIgcG9pbnRsZXNzLg0KDQpJJ3ZlIGFsc28gZHVtcGVkIG91dCB0
aGUgTVNJLVggdGFibGUgKGR1cmluZyBkZXZlbG9wbWVudCkgdG8NCmNoZWNrIHRoYXQgdGhlIHZh
bHVlcyBhcmUgYmVpbmcgd3JpdHRlbiB0aGVyZSBjb3JyZWN0bHkuDQoNCglEYXZpZA0KDQo=

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported
  2016-01-15 17:24     ` David Laight
  (?)
@ 2016-01-20  9:41     ` Yongji Xie
  -1 siblings, 0 replies; 22+ messages in thread
From: Yongji Xie @ 2016-01-20  9:41 UTC (permalink / raw)
  To: David Laight, kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: nikunj, zhong, corbet, aik, warrier, alex.williamson, paulus, bhelgaas

On 2016/1/16 1:24, David Laight wrote:
> From: Yongji Xie
>> Sent: 15 January 2016 07:06
>>
>> MSI-X tables are not allowed to be mmapped in vfio-pci
>> driver in case that user get to touch this directly.
>> This will cause some performance issues when when PCI
>> adapters have critical registers in the same page as
>> the MSI-X table.
> ...
> If the driver wants to generate an incorrect MSI-X interrupt
> it can do so by requesting the device do a normal memory transfer
> to the target address area that raises MSI-X interrupts.

IOMMUs supporting interrupt remapping can prevent this case.

> So disabling writes to the MSI-X table (and pending bit array)
> areas only raises the bar very slightly.
> A device may also give the driver write access to the MSI-X
> table through other addresses.
>
> This seems to make disallowing the mapping of the MSI-X table
> rather pointless.

If we allow the mapping of the MSI-X table, it seems the guest
kernels of some architectures can write invalid data to MSI-X table
when device drivers initialize MSI-X interrupts.

Regards,
Yongji Xie

> I've also dumped out the MSI-X table (during development) to
> check that the values are being written there correctly.
>
> 	David
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 0/5] vfio-pci: Allow to mmap sub-page MMIO BARs and MSI-X table on PPC64 platform
  2016-01-15  7:06 [RFC PATCH v3 0/5] vfio-pci: Allow to mmap sub-page MMIO BARs and MSI-X table on PPC64 platform Yongji Xie
                   ` (4 preceding siblings ...)
  2016-01-15  7:06 ` [RFC PATCH v3 5/5] vfio-pci: Allow to mmap MSI-X table if host bridge supports filtering of MSIs Yongji Xie
@ 2016-01-28 10:01 ` Yongji Xie
  5 siblings, 0 replies; 22+ messages in thread
From: Yongji Xie @ 2016-01-28 10:01 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: bhelgaas, corbet, aik, alex.williamson, benh, paulus, mpe,
	warrier, zhong, nikunj, David Laight

Ping...

Alex, any comment?

Regards,
Yongji Xie

On 2016/1/15 15:06, Yongji Xie wrote:
> Current vfio-pci implementation disallows to mmap
> sub-page(size < PAGE_SIZE) MMIO BARs and MSI-X table. This is because
> sub-page BARs' mmio page may be shared with other BARs and MSI-X table
> should not be accessed directly from the guest for security reasons.
>
> But these will easily cause some performance issues for mmio accesses
> in guest when vfio passthrough sub-page BARs or BARs containing MSI-X
> table on PPC64 platform. This is because PAGE_SIZE is 64KB by default
> on PPC64 platform and the big page may easily hit the sub-page MMIO
> BARs' unmmapping and cause the unmmaping of the mmio page which
> MSI-X table locate in, which lead to mmio emulation in host.
>
> For sub-page MMIO BARs' unmmapping, this patchset adds a kernel
> parameter for PCI resource allocator to enforce the alignment of all
> MMIO BARs to be at least PAGE_SZIE and make it enabled by default on
> PPC64 platform so that sub-page BAR's mmio page will not be shared
> with other BARs. Then we can mmap sub-page MMIO BARs in vfio-pci driver
> with this parameter enabled.
>
> For MSI-X table's unmmapping, we think MSI-X table is safe to access
> directly from userspace if PCI host bridge support filtering of MSIs
> which can ensure that a given pci device can only shoot the MSIs
> assigned for it. So we add a pci_host_bridge attribute to indicate
> if this PCI host bridge supports filtering of MSIs. Then we can mmap
> MSI-X table with this attribute set.
>
> With this patchset applied, we can get almost 100% improvement on
> performance for mmio accesses when we passthrough sub-page BARs to guest
> in our test.
>
> The two vfio related patches(patch 2 and patch 5) are based on the
> proposed patchset[1].
>
> Changelog v3:
> - Rebase on new linux kernel mainline with the patchset[1] applied.
> - Add a function to check whether PCI BARs'mmio page is shared with
>    other BARs.
> - Add a host bridge attribute to indicate PCI host bridge support
>    filtering of MSIs.
> - Use the new host bridge attribute to check if MSI-X table can
>    be mmapped instead of CONFIG_EEH.
> - Remove Kconfig option VFIO_PCI_MMAP_MSIX
>
> Changelog v2:
> - Rebase on v4.4-rc6 with the patchset[1] applied.
> - Use kernel parameter to enforce all MMIO BARs to be page aligned
>    on PCI core code instead of doing it on PPC64 arch code.
> - Remove flags: VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED
>                  VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP
> - Add a Kconfig option to support for mmapping MSI-X table.
>
> [1] https://lkml.org/lkml/2015/11/23/748
>
> Yongji Xie (5):
>    PCI: Add support for enforcing all MMIO BARs to be page aligned
>    vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive
>    PCI: Add host bridge attribute to indicate filtering of MSIs is supported
>    powerpc/powernv/pci-ioda: Enable msi_filtered bit for any IODA host bridge
>    vfio-pci: Allow to mmap MSI-X table if host bridge supports filtering of MSIs
>
>   Documentation/kernel-parameters.txt       |    5 +++++
>   arch/powerpc/include/asm/pci.h            |   11 +++++++++
>   arch/powerpc/platforms/powernv/pci-ioda.c |    6 +++++
>   drivers/pci/host-bridge.c                 |    6 +++++
>   drivers/pci/pci.c                         |   35 +++++++++++++++++++++++++++++
>   drivers/pci/pci.h                         |    8 ++++++-
>   drivers/vfio/pci/vfio_pci.c               |   13 ++++++++---
>   include/linux/pci.h                       |    7 ++++++
>   8 files changed, 87 insertions(+), 4 deletions(-)
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 1/5] PCI: Add support for enforcing all MMIO BARs to be page aligned
  2016-01-15  7:06 ` [RFC PATCH v3 1/5] PCI: Add support for enforcing all MMIO BARs to be page aligned Yongji Xie
@ 2016-01-28 22:46   ` Alex Williamson
  2016-01-29 10:37     ` Yongji Xie
  0 siblings, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2016-01-28 22:46 UTC (permalink / raw)
  To: Yongji Xie, kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: bhelgaas, corbet, aik, benh, paulus, mpe, warrier, zhong, nikunj

On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> When vfio passthrough a PCI device of which MMIO BARs
> are smaller than PAGE_SIZE, guest will not handle the
> mmio accesses to the BARs which leads to mmio emulations
> in host.
> 
> This is because vfio will not allow to passthrough one
> BAR's mmio page which may be shared with other BARs.
> 
> To solve this performance issue, this patch adds a kernel
> parameter "pci=resource_page_aligned=on" to enforce
> the alignment of all MMIO BARs to be at least PAGE_SIZE,
> so that one BAR's mmio page would not be shared with other
> BARs. We can also disable it through kernel parameter
> "pci=resource_page_aligned=off".
> 
> For the default value of the parameter, we think it should be
> arch-independent, so we add a macro
> HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED to change it. And we
> define this macro to enable this parameter by default on PPC64
> platform which can easily hit this performance issue because
> its PAGE_SIZE is 64KB.
> 
> Note that the kernel parameter won't works if kernel doesn't do
> resources reallocation.

And where do you account for this so that we know whether it's really in
effect?

> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
> ---
>  Documentation/kernel-parameters.txt |    5 +++++
>  arch/powerpc/include/asm/pci.h      |   11 +++++++++++
>  drivers/pci/pci.c                   |   35 +++++++++++++++++++++++++++++++++++
>  drivers/pci/pci.h                   |    8 +++++++-
>  include/linux/pci.h                 |    4 ++++
>  5 files changed, 62 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 742f69d..3f2a7c9 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2857,6 +2857,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  				PAGE_SIZE is used as alignment.
>  				PCI-PCI bridge can be specified, if resource
>  				windows need to be expanded.
> +		resource_page_aligned=	Enable/disable enforcing the alignment
> +				of all PCI devices' memory resources to be
> +				at least PAGE_SIZE if resources reallocation
> +				is done by kernel.
> +				Format: { "on" | "off" }
>  		ecrc=		Enable/disable PCIe ECRC (transaction layer
>  				end-to-end CRC checking).
>  				bios: Use BIOS/firmware settings. This is the
> diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
> index 3453bd8..2d2b3ef 100644
> --- a/arch/powerpc/include/asm/pci.h
> +++ b/arch/powerpc/include/asm/pci.h
> @@ -136,6 +136,17 @@ extern pgprot_t	pci_phys_mem_access_prot(struct file *file,
>  					 unsigned long pfn,
>  					 unsigned long size,
>  					 pgprot_t prot);
> +#ifdef CONFIG_PPC64
> +
> +/* For PPC64, We enforce all PCI MMIO BARs to be page aligned
> + * by default. This would be helpful to improve performance
> + * when we passthrough a PCI device of which BARs are smaller
> + * than PAGE_SIZE(64KB). And we can use kernel parameter
> + * "pci=resource_page_aligned=off" to disable it.
> + */
> +#define HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED	1
> +
> +#endif
>  
>  #define HAVE_ARCH_PCI_RESOURCE_TO_USER
>  extern void pci_resource_to_user(const struct pci_dev *dev, int bar,
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 314db8c..7b21238 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -99,6 +99,9 @@ u8 pci_cache_line_size;
>   */
>  unsigned int pcibios_max_latency = 255;
>  
> +bool pci_resources_page_aligned =
> +	IS_ENABLED(HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED);

I don't think this is proper use of IS_ENABLED, which seems to be
targeted at CONFIG_ type options.  You could define this as that in an
arch Kconfig.

> +
>  /* If set, the PCIe ARI capability will not be used. */
>  static bool pcie_ari_disabled;
>  
> @@ -4746,6 +4749,35 @@ static ssize_t pci_resource_alignment_store(struct bus_type *bus,
>  BUS_ATTR(resource_alignment, 0644, pci_resource_alignment_show,
>  					pci_resource_alignment_store);
>  
> +static void pci_resources_get_page_aligned(char *str)
> +{
> +	if (!strncmp(str, "off", 3))
> +		pci_resources_page_aligned = false;
> +	else if (!strncmp(str, "on", 2))
> +		pci_resources_page_aligned = true;
> +}

"get"?

> +
> +/*
> + * This function checks whether PCI BARs' mmio page will be shared
> + * with other BARs.
> + */
> +bool pci_resources_share_page(struct pci_dev *dev, int resno)
> +{
> +	struct resource *res = dev->resource + resno;
> +
> +	if (resource_size(res) >= PAGE_SIZE)
> +		return false;
> +	if (pci_resources_page_aligned && !(res->start & ~PAGE_MASK) &&
> +	    res->flags & IORESOURCE_MEM) {
> +		if (res->sibling)
> +			return (res->sibling->start & ~PAGE_MASK);
> +		else
> +			return false;
> +	}
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(pci_resources_share_page);
> +
>  static int __init pci_resource_alignment_sysfs_init(void)
>  {
>  	return bus_create_file(&pci_bus_type,
> @@ -4859,6 +4891,9 @@ static int __init pci_setup(char *str)
>  			} else if (!strncmp(str, "resource_alignment=", 19)) {
>  				pci_set_resource_alignment_param(str + 19,
>  							strlen(str + 19));
> +			} else if (!strncmp(str, "resource_page_aligned=",
> +				   22)) {
> +				pci_resources_get_page_aligned(str + 22);

Doesn't this seem rather redundant with the option right above it,
resource_alignment=?  Why not modify that to support syntax where all
devices get the same alignment?


>  			} else if (!strncmp(str, "ecrc=", 5)) {
>  				pcie_ecrc_get_policy(str + 5);
>  			} else if (!strncmp(str, "hpiosize=", 9)) {
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index d390fc1..b9b333d 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -312,11 +312,17 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
>  #ifdef CONFIG_PCI_IOV
>  	int resno = res - dev->resource;
>  
> -	if (resno >= PCI_IOV_RESOURCES && resno <= PCI_IOV_RESOURCE_END)
> +	if (resno >= PCI_IOV_RESOURCES && resno <= PCI_IOV_RESOURCE_END) {
> +		if (pci_resources_page_aligned && res->flags & IORESOURCE_MEM)
> +			return PAGE_ALIGN(pci_sriov_resource_alignment(dev,
> +					  resno));
>  		return pci_sriov_resource_alignment(dev, resno);
> +	}
>  #endif
>  	if (dev->class >> 8  == PCI_CLASS_BRIDGE_CARDBUS)
>  		return pci_cardbus_resource_alignment(res);
> +	if (pci_resources_page_aligned && res->flags & IORESOURCE_MEM)
> +		return PAGE_ALIGN(resource_alignment(res));
>  	return resource_alignment(res);
>  }


Since we already have resource_alignment=, shouldn't we already have the
code in place to re-align?

>  
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 6ae25aa..b640d65 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1530,6 +1530,10 @@ static inline int pci_get_new_domain_nr(void) { return -ENOSYS; }
>  	 (pci_resource_end((dev), (bar)) -		\
>  	  pci_resource_start((dev), (bar)) + 1))
>  
> +extern bool pci_resources_page_aligned;
> +
> +bool pci_resources_share_page(struct pci_dev *dev, int resno);
> +
>  /* Similar to the helpers above, these manipulate per-pci_dev
>   * driver-specific data.  They are really just a wrapper around
>   * the generic device structure functions of these calls.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported
  2016-01-15  7:06 ` [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported Yongji Xie
  2016-01-15 17:24     ` David Laight
@ 2016-01-28 22:46   ` Alex Williamson
  2016-01-29 10:40     ` Yongji Xie
  1 sibling, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2016-01-28 22:46 UTC (permalink / raw)
  To: Yongji Xie, kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: bhelgaas, corbet, aik, benh, paulus, mpe, warrier, zhong, nikunj

On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> MSI-X tables are not allowed to be mmapped in vfio-pci
> driver in case that user get to touch this directly.
> This will cause some performance issues when when PCI
> adapters have critical registers in the same page as
> the MSI-X table.
> 
> However, some kind of PCI host bridge such as IODA bridge
> on Power support filtering of MSIs, which can ensure that a
> given pci device can only shoot the MSIs assigned for it.
> So we think it's safe to expose the MSI-X table to userspace
> if filtering of MSIs is supported because the exposed MSI-X
> table can't be used to do harm to other memory space.
> 
> To support this case, this patch adds a pci_host_bridge
> attribute to indicate if this PCI host bridge supports
> filtering of MSIs.
> 
> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
> ---
>  drivers/pci/host-bridge.c |    6 ++++++
>  include/linux/pci.h       |    3 +++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
> index 5f4a2e0..c029267 100644
> --- a/drivers/pci/host-bridge.c
> +++ b/drivers/pci/host-bridge.c
> @@ -96,3 +96,9 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
>  	res->end = region->end + offset;
>  }
>  EXPORT_SYMBOL(pcibios_bus_to_resource);
> +
> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev)
> +{
> +	return pci_find_host_bridge(pdev->bus)->msi_filtered;
> +}
> +EXPORT_SYMBOL_GPL(pci_host_bridge_msi_filtered_enabled);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index b640d65..b952b78 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -412,6 +412,7 @@ struct pci_host_bridge {
>  	void (*release_fn)(struct pci_host_bridge *);
>  	void *release_data;
>  	unsigned int ignore_reset_delay:1;	/* for entire hierarchy */
> +	unsigned int msi_filtered:1;	/* support filtering of MSIs */
>  	/* Resource alignment requirements */
>  	resource_size_t (*align_resource)(struct pci_dev *dev,
>  			const struct resource *res,
> @@ -430,6 +431,8 @@ void pci_set_host_bridge_release(struct pci_host_bridge *bridge,
>  
>  int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge);
>  
> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev);
> +
>  /*
>   * The first PCI_BRIDGE_RESOURCE_NUM PCI bus resources (those that correspond
>   * to P2P or CardBus bridge windows) go in a table.  Additional ones (for

Don't we already have a flag for this in the IOMMU space?

enum iommu_cap {
        IOMMU_CAP_CACHE_COHERENCY,      /* IOMMU can enforce cache coherent DMA
                                           transactions */
--->    IOMMU_CAP_INTR_REMAP,           /* IOMMU supports interrupt isolation */
        IOMMU_CAP_NOEXEC,               /* IOMMU_NOEXEC flag */
};

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 5/5] vfio-pci: Allow to mmap MSI-X table if host bridge supports filtering of MSIs
  2016-01-15  7:06 ` [RFC PATCH v3 5/5] vfio-pci: Allow to mmap MSI-X table if host bridge supports filtering of MSIs Yongji Xie
@ 2016-01-28 22:46   ` Alex Williamson
  2016-01-29 10:42     ` Yongji Xie
  0 siblings, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2016-01-28 22:46 UTC (permalink / raw)
  To: Yongji Xie, kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: bhelgaas, corbet, aik, benh, paulus, mpe, warrier, zhong, nikunj

On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> Current vfio-pci implementation disallows to mmap MSI-X
> table in case that user get to touch this directly.
> 
> But we should allow to mmap these MSI-X tables if the PCI
> host bridge supports filtering of MSIs.
> 
> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
> ---
>  drivers/vfio/pci/vfio_pci.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 11fd0f0..4d68f6a 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -555,7 +555,8 @@ static long vfio_pci_ioctl(void *device_data,
>  			    IORESOURCE_MEM && !pci_resources_share_page(pdev,
>  			    info.index)) {
>  				info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
> -				if (info.index == vdev->msix_bar) {
> +				if (!pci_host_bridge_msi_filtered_enabled(pdev) &&
> +				    info.index == vdev->msix_bar) {
>  					ret = msix_sparse_mmap_cap(vdev, &caps);
>  					if (ret)
>  						return ret;
> @@ -967,7 +968,8 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
>  	if (phys_len < PAGE_SIZE || req_start + req_len > phys_len)
>  		return -EINVAL;
>  
> -	if (index == vdev->msix_bar) {
> +	if (!pci_host_bridge_msi_filtered_enabled(pdev) &&
> +	    index == vdev->msix_bar) {
>  		/*
>  		 * Disallow mmaps overlapping the MSI-X table; users don't
>  		 * get to touch this directly.  We could find somewhere

What about read()/write() access, why would we allow mmap() but not
those?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 1/5] PCI: Add support for enforcing all MMIO BARs to be page aligned
  2016-01-28 22:46   ` Alex Williamson
@ 2016-01-29 10:37     ` Yongji Xie
  2016-01-29 19:01       ` Alex Williamson
  0 siblings, 1 reply; 22+ messages in thread
From: Yongji Xie @ 2016-01-29 10:37 UTC (permalink / raw)
  To: Alex Williamson, kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: bhelgaas, corbet, aik, benh, paulus, mpe, warrier, zhong, nikunj

On 2016/1/29 6:46, Alex Williamson wrote:
> On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
>> When vfio passthrough a PCI device of which MMIO BARs
>> are smaller than PAGE_SIZE, guest will not handle the
>> mmio accesses to the BARs which leads to mmio emulations
>> in host.
>>   
>> This is because vfio will not allow to passthrough one
>> BAR's mmio page which may be shared with other BARs.
>>   
>> To solve this performance issue, this patch adds a kernel
>> parameter "pci=resource_page_aligned=on" to enforce
>> the alignment of all MMIO BARs to be at least PAGE_SIZE,
>> so that one BAR's mmio page would not be shared with other
>> BARs. We can also disable it through kernel parameter
>> "pci=resource_page_aligned=off".
>>   
>> For the default value of the parameter, we think it should be
>> arch-independent, so we add a macro
>> HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED to change it. And we
>> define this macro to enable this parameter by default on PPC64
>> platform which can easily hit this performance issue because
>> its PAGE_SIZE is 64KB.
>>   
>> Note that the kernel parameter won't works if kernel doesn't do
>> resources reallocation.
> And where do you account for this so that we know whether it's really in
> effect?

We can check the flag PCI_PROBE_ONLY to know whether kernel do
resources reallocation. Then we know if the kernel parameter is really
in effect.

enum {
     /* Force re-assigning all resources (ignore firmware
      * setup completely)
      */
     PCI_REASSIGN_ALL_RSRC    = 0x00000001,

     /* Re-assign all bus numbers */
     PCI_REASSIGN_ALL_BUS    = 0x00000002,

     /* Do not try to assign, just use existing setup */
--->    PCI_PROBE_ONLY        = 0x00000004,

And I will add this to commit log.

>> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
>> ---
>>   Documentation/kernel-parameters.txt |    5 +++++
>>   arch/powerpc/include/asm/pci.h      |   11 +++++++++++
>>   drivers/pci/pci.c                   |   35 +++++++++++++++++++++++++++++++++++
>>   drivers/pci/pci.h                   |    8 +++++++-
>>   include/linux/pci.h                 |    4 ++++
>>   5 files changed, 62 insertions(+), 1 deletion(-)
>>   
>> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
>> index 742f69d..3f2a7c9 100644
>> --- a/Documentation/kernel-parameters.txt
>> +++ b/Documentation/kernel-parameters.txt
>> @@ -2857,6 +2857,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>>   				PAGE_SIZE is used as alignment.
>>   				PCI-PCI bridge can be specified, if resource
>>   				windows need to be expanded.
>> +		resource_page_aligned=	Enable/disable enforcing the alignment
>> +				of all PCI devices' memory resources to be
>> +				at least PAGE_SIZE if resources reallocation
>> +				is done by kernel.
>> +				Format: { "on" | "off" }
>>   		ecrc=		Enable/disable PCIe ECRC (transaction layer
>>   				end-to-end CRC checking).
>>   				bios: Use BIOS/firmware settings. This is the
>> diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
>> index 3453bd8..2d2b3ef 100644
>> --- a/arch/powerpc/include/asm/pci.h
>> +++ b/arch/powerpc/include/asm/pci.h
>> @@ -136,6 +136,17 @@ extern pgprot_t	pci_phys_mem_access_prot(struct file *file,
>>   					 unsigned long pfn,
>>   					 unsigned long size,
>>   					 pgprot_t prot);
>> +#ifdef CONFIG_PPC64
>> +
>> +/* For PPC64, We enforce all PCI MMIO BARs to be page aligned
>> + * by default. This would be helpful to improve performance
>> + * when we passthrough a PCI device of which BARs are smaller
>> + * than PAGE_SIZE(64KB). And we can use kernel parameter
>> + * "pci=resource_page_aligned=off" to disable it.
>> + */
>> +#define HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED	1
>> +
>> +#endif
>>   
>>   #define HAVE_ARCH_PCI_RESOURCE_TO_USER
>>   extern void pci_resource_to_user(const struct pci_dev *dev, int bar,
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 314db8c..7b21238 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -99,6 +99,9 @@ u8 pci_cache_line_size;
>>    */
>>   unsigned int pcibios_max_latency = 255;
>>   
>> +bool pci_resources_page_aligned =
>> +	IS_ENABLED(HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED);
> I don't think this is proper use of IS_ENABLED, which seems to be
> targeted at CONFIG_ type options.  You could define this as that in an
> arch Kconfig.

Is it better that we define this as a pci Kconfig and select it in arch 
Kconfig?

>> +
>>   /* If set, the PCIe ARI capability will not be used. */
>>   static bool pcie_ari_disabled;
>>   
>> @@ -4746,6 +4749,35 @@ static ssize_t pci_resource_alignment_store(struct bus_type *bus,
>>   BUS_ATTR(resource_alignment, 0644, pci_resource_alignment_show,
>>   					pci_resource_alignment_store);
>>   
>> +static void pci_resources_get_page_aligned(char *str)
>> +{
>> +	if (!strncmp(str, "off", 3))
>> +		pci_resources_page_aligned = false;
>> +	else if (!strncmp(str, "on", 2))
>> +		pci_resources_page_aligned = true;
>> +}
> "get"?

How about pci_resources_parse_page_aligned_param()?

>> +
>> +/*
>> + * This function checks whether PCI BARs' mmio page will be shared
>> + * with other BARs.
>> + */
>> +bool pci_resources_share_page(struct pci_dev *dev, int resno)
>> +{
>> +	struct resource *res = dev->resource + resno;
>> +
>> +	if (resource_size(res) >= PAGE_SIZE)
>> +		return false;
>> +	if (pci_resources_page_aligned && !(res->start & ~PAGE_MASK) &&
>> +	    res->flags & IORESOURCE_MEM) {
>> +		if (res->sibling)
>> +			return (res->sibling->start & ~PAGE_MASK);
>> +		else
>> +			return false;
>> +	}
>> +	return true;
>> +}
>> +EXPORT_SYMBOL_GPL(pci_resources_share_page);
>> +
>>   static int __init pci_resource_alignment_sysfs_init(void)
>>   {
>>   	return bus_create_file(&pci_bus_type,
>> @@ -4859,6 +4891,9 @@ static int __init pci_setup(char *str)
>>   			} else if (!strncmp(str, "resource_alignment=", 19)) {
>>   				pci_set_resource_alignment_param(str + 19,
>>   							strlen(str + 19));
>> +			} else if (!strncmp(str, "resource_page_aligned=",
>> +				   22)) {
>> +				pci_resources_get_page_aligned(str + 22);
> Doesn't this seem rather redundant with the option right above it,
> resource_alignment=?  Why not modify that to support syntax where all
> devices get the same alignment?
>

The kernel option will be used to do two things.

Firstly, the option can be used to enable all devices to be page aligned.

Secondly, we can use the option to disable it when the Kconfig option
mentioned above enable all devices to be page aligned by default.

We can easily modify this option "resource_alignment=" to do the first
thing. But I didn't find a proper way to modify it to do the second thing.

>>   			} else if (!strncmp(str, "ecrc=", 5)) {
>>   				pcie_ecrc_get_policy(str + 5);
>>   			} else if (!strncmp(str, "hpiosize=", 9)) {
>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>> index d390fc1..b9b333d 100644
>> --- a/drivers/pci/pci.h
>> +++ b/drivers/pci/pci.h
>> @@ -312,11 +312,17 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
>>   #ifdef CONFIG_PCI_IOV
>>   	int resno = res - dev->resource;
>>   
>> -	if (resno >= PCI_IOV_RESOURCES && resno <= PCI_IOV_RESOURCE_END)
>> +	if (resno >= PCI_IOV_RESOURCES && resno <= PCI_IOV_RESOURCE_END) {
>> +		if (pci_resources_page_aligned && res->flags & IORESOURCE_MEM)
>> +			return PAGE_ALIGN(pci_sriov_resource_alignment(dev,
>> +					  resno));
>>   		return pci_sriov_resource_alignment(dev, resno);
>> +	}
>>   #endif
>>   	if (dev->class >> 8  == PCI_CLASS_BRIDGE_CARDBUS)
>>   		return pci_cardbus_resource_alignment(res);
>> +	if (pci_resources_page_aligned && res->flags & IORESOURCE_MEM)
>> +		return PAGE_ALIGN(resource_alignment(res));
>>   	return resource_alignment(res);
>>   }
>
> Since we already have resource_alignment=, shouldn't we already have the
> code in place to re-align?

Yes, this code can do the re-aligning. But we can't reuse the code because
it re-align device's bars by changing their sizes, which can potentially 
break
some drivers.

I'm thinking if we can use IORESOURCE_STARTALIGN for this. Thanks.

Regards,
Yongji Xie

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported
  2016-01-28 22:46   ` Alex Williamson
@ 2016-01-29 10:40     ` Yongji Xie
  2016-01-29 19:05       ` Alex Williamson
  0 siblings, 1 reply; 22+ messages in thread
From: Yongji Xie @ 2016-01-29 10:40 UTC (permalink / raw)
  To: Alex Williamson, kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: bhelgaas, corbet, aik, benh, paulus, mpe, warrier, zhong, nikunj

On 2016/1/29 6:46, Alex Williamson wrote:
> On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
>> MSI-X tables are not allowed to be mmapped in vfio-pci
>> driver in case that user get to touch this directly.
>> This will cause some performance issues when when PCI
>> adapters have critical registers in the same page as
>> the MSI-X table.
>>   
>> However, some kind of PCI host bridge such as IODA bridge
>> on Power support filtering of MSIs, which can ensure that a
>> given pci device can only shoot the MSIs assigned for it.
>> So we think it's safe to expose the MSI-X table to userspace
>> if filtering of MSIs is supported because the exposed MSI-X
>> table can't be used to do harm to other memory space.
>>   
>> To support this case, this patch adds a pci_host_bridge
>> attribute to indicate if this PCI host bridge supports
>> filtering of MSIs.
>>   
>> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
>> ---
>>   drivers/pci/host-bridge.c |    6 ++++++
>>   include/linux/pci.h       |    3 +++
>>   2 files changed, 9 insertions(+)
>>   
>> diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
>> index 5f4a2e0..c029267 100644
>> --- a/drivers/pci/host-bridge.c
>> +++ b/drivers/pci/host-bridge.c
>> @@ -96,3 +96,9 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
>>   	res->end = region->end + offset;
>>   }
>>   EXPORT_SYMBOL(pcibios_bus_to_resource);
>> +
>> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev)
>> +{
>> +	return pci_find_host_bridge(pdev->bus)->msi_filtered;
>> +}
>> +EXPORT_SYMBOL_GPL(pci_host_bridge_msi_filtered_enabled);
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index b640d65..b952b78 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -412,6 +412,7 @@ struct pci_host_bridge {
>>   	void (*release_fn)(struct pci_host_bridge *);
>>   	void *release_data;
>>   	unsigned int ignore_reset_delay:1;	/* for entire hierarchy */
>> +	unsigned int msi_filtered:1;	/* support filtering of MSIs */
>>   	/* Resource alignment requirements */
>>   	resource_size_t (*align_resource)(struct pci_dev *dev,
>>   			const struct resource *res,
>> @@ -430,6 +431,8 @@ void pci_set_host_bridge_release(struct pci_host_bridge *bridge,
>>   
>>   int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge);
>>   
>> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev);
>> +
>>   /*
>>    * The first PCI_BRIDGE_RESOURCE_NUM PCI bus resources (those that correspond
>>    * to P2P or CardBus bridge windows) go in a table.  Additional ones (for
> Don't we already have a flag for this in the IOMMU space?
>
> enum iommu_cap {
>          IOMMU_CAP_CACHE_COHERENCY,      /* IOMMU can enforce cache coherent DMA
>                                             transactions */
> --->    IOMMU_CAP_INTR_REMAP,           /* IOMMU supports interrupt isolation */
>          IOMMU_CAP_NOEXEC,               /* IOMMU_NOEXEC flag */
> };
>

I saw this flag had been enabled in x86 and ARM arch.

I'm not sure whether we can mmap MSI-X table in those archs. I just 
verify it on PPC64 arch.

Regards.
Yongji Xie

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 5/5] vfio-pci: Allow to mmap MSI-X table if host bridge supports filtering of MSIs
  2016-01-28 22:46   ` Alex Williamson
@ 2016-01-29 10:42     ` Yongji Xie
  0 siblings, 0 replies; 22+ messages in thread
From: Yongji Xie @ 2016-01-29 10:42 UTC (permalink / raw)
  To: Alex Williamson, kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: bhelgaas, corbet, aik, benh, paulus, mpe, warrier, zhong, nikunj

On 2016/1/29 6:46, Alex Williamson wrote:
> On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
>> Current vfio-pci implementation disallows to mmap MSI-X
>> table in case that user get to touch this directly.
>>   
>> But we should allow to mmap these MSI-X tables if the PCI
>> host bridge supports filtering of MSIs.
>>   
>> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
>> ---
>>   drivers/vfio/pci/vfio_pci.c |    6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>   
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 11fd0f0..4d68f6a 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -555,7 +555,8 @@ static long vfio_pci_ioctl(void *device_data,
>>   			    IORESOURCE_MEM && !pci_resources_share_page(pdev,
>>   			    info.index)) {
>>   				info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
>> -				if (info.index == vdev->msix_bar) {
>> +				if (!pci_host_bridge_msi_filtered_enabled(pdev) &&
>> +				    info.index == vdev->msix_bar) {
>>   					ret = msix_sparse_mmap_cap(vdev, &caps);
>>   					if (ret)
>>   						return ret;
>> @@ -967,7 +968,8 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
>>   	if (phys_len < PAGE_SIZE || req_start + req_len > phys_len)
>>   		return -EINVAL;
>>   
>> -	if (index == vdev->msix_bar) {
>> +	if (!pci_host_bridge_msi_filtered_enabled(pdev) &&
>> +	    index == vdev->msix_bar) {
>>   		/*
>>   		 * Disallow mmaps overlapping the MSI-X table; users don't
>>   		 * get to touch this directly.  We could find somewhere
> What about read()/write() access, why would we allow mmap() but not
> those?
>

Yes, you are right! I miss the MSI-X table check in vfio_pci_bar_rw().
I will fix it in next version. Thanks.

Regards,
Yongji Xie

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 1/5] PCI: Add support for enforcing all MMIO BARs to be page aligned
  2016-01-29 10:37     ` Yongji Xie
@ 2016-01-29 19:01       ` Alex Williamson
  2016-02-01  8:50           ` Yongji Xie
  0 siblings, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2016-01-29 19:01 UTC (permalink / raw)
  To: Yongji Xie, kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: bhelgaas, corbet, aik, benh, paulus, mpe, warrier, zhong, nikunj

On Fri, 2016-01-29 at 18:37 +0800, Yongji Xie wrote:
> On 2016/1/29 6:46, Alex Williamson wrote:
> > On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> > > When vfio passthrough a PCI device of which MMIO BARs
> > > are smaller than PAGE_SIZE, guest will not handle the
> > > mmio accesses to the BARs which leads to mmio emulations
> > > in host.
> > >   
> > > This is because vfio will not allow to passthrough one
> > > BAR's mmio page which may be shared with other BARs.
> > >   
> > > To solve this performance issue, this patch adds a kernel
> > > parameter "pci=resource_page_aligned=on" to enforce
> > > the alignment of all MMIO BARs to be at least PAGE_SIZE,
> > > so that one BAR's mmio page would not be shared with other
> > > BARs. We can also disable it through kernel parameter
> > > "pci=resource_page_aligned=off".
> > >   
> > > For the default value of the parameter, we think it should be
> > > arch-independent, so we add a macro
> > > HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED to change it. And we
> > > define this macro to enable this parameter by default on PPC64
> > > platform which can easily hit this performance issue because
> > > its PAGE_SIZE is 64KB.
> > >   
> > > Note that the kernel parameter won't works if kernel doesn't do
> > > resources reallocation.
> > And where do you account for this so that we know whether it's really in
> > effect?
> 
> We can check the flag PCI_PROBE_ONLY to know whether kernel do
> resources reallocation. Then we know if the kernel parameter is really
> in effect.
> 
> enum {
>      /* Force re-assigning all resources (ignore firmware
>       * setup completely)
>       */
>      PCI_REASSIGN_ALL_RSRC    = 0x00000001,
> 
>      /* Re-assign all bus numbers */
>      PCI_REASSIGN_ALL_BUS    = 0x00000002,
> 
>      /* Do not try to assign, just use existing setup */
> --->    PCI_PROBE_ONLY        = 0x00000004,
> 
> And I will add this to commit log.

We need more than a commit log entry for this, what's the purpose of the
pci_resources_share_page() function if we don't know if this is in
effect?

> > > Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
> > > ---
> > >   Documentation/kernel-parameters.txt |    5 +++++
> > >   arch/powerpc/include/asm/pci.h      |   11 +++++++++++
> > >   drivers/pci/pci.c                   |   35 +++++++++++++++++++++++++++++++++++
> > >   drivers/pci/pci.h                   |    8 +++++++-
> > >   include/linux/pci.h                 |    4 ++++
> > >   5 files changed, 62 insertions(+), 1 deletion(-)
> > >   
> > > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> > > index 742f69d..3f2a7c9 100644
> > > --- a/Documentation/kernel-parameters.txt
> > > +++ b/Documentation/kernel-parameters.txt
> > > @@ -2857,6 +2857,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> > >   				PAGE_SIZE is used as alignment.
> > >   				PCI-PCI bridge can be specified, if resource
> > >   				windows need to be expanded.
> > > +		resource_page_aligned=	Enable/disable enforcing the alignment
> > > +				of all PCI devices' memory resources to be
> > > +				at least PAGE_SIZE if resources reallocation
> > > +				is done by kernel.
> > > +				Format: { "on" | "off" }
> > >   		ecrc=		Enable/disable PCIe ECRC (transaction layer
> > >   				end-to-end CRC checking).
> > >   				bios: Use BIOS/firmware settings. This is the
> > > diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
> > > index 3453bd8..2d2b3ef 100644
> > > --- a/arch/powerpc/include/asm/pci.h
> > > +++ b/arch/powerpc/include/asm/pci.h
> > > @@ -136,6 +136,17 @@ extern pgprot_t	pci_phys_mem_access_prot(struct file *file,
> > >   					 unsigned long pfn,
> > >   					 unsigned long size,
> > >   					 pgprot_t prot);
> > > +#ifdef CONFIG_PPC64
> > > +
> > > +/* For PPC64, We enforce all PCI MMIO BARs to be page aligned
> > > + * by default. This would be helpful to improve performance
> > > + * when we passthrough a PCI device of which BARs are smaller
> > > + * than PAGE_SIZE(64KB). And we can use kernel parameter
> > > + * "pci=resource_page_aligned=off" to disable it.
> > > + */
> > > +#define HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED	1
> > > +
> > > +#endif
> > >   
> > >   #define HAVE_ARCH_PCI_RESOURCE_TO_USER
> > >   extern void pci_resource_to_user(const struct pci_dev *dev, int bar,
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 314db8c..7b21238 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -99,6 +99,9 @@ u8 pci_cache_line_size;
> > >    */
> > >   unsigned int pcibios_max_latency = 255;
> > >   
> > > +bool pci_resources_page_aligned =
> > > +	IS_ENABLED(HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED);
> > I don't think this is proper use of IS_ENABLED, which seems to be
> > targeted at CONFIG_ type options.  You could define this as that in an
> > arch Kconfig.
> 
> Is it better that we define this as a pci Kconfig and select it in arch 
> Kconfig?

If you want to use IS_ENABLE here, I would think so.

> > > +
> > >   /* If set, the PCIe ARI capability will not be used. */
> > >   static bool pcie_ari_disabled;
> > >   
> > > @@ -4746,6 +4749,35 @@ static ssize_t pci_resource_alignment_store(struct bus_type *bus,
> > >   BUS_ATTR(resource_alignment, 0644, pci_resource_alignment_show,
> > >   					pci_resource_alignment_store);
> > >   
> > > +static void pci_resources_get_page_aligned(char *str)
> > > +{
> > > +	if (!strncmp(str, "off", 3))
> > > +		pci_resources_page_aligned = false;
> > > +	else if (!strncmp(str, "on", 2))
> > > +		pci_resources_page_aligned = true;
> > > +}
> > "get"?
> 
> How about pci_resources_parse_page_aligned_param()?

Better.

> > > +
> > > +/*
> > > + * This function checks whether PCI BARs' mmio page will be shared
> > > + * with other BARs.
> > > + */
> > > +bool pci_resources_share_page(struct pci_dev *dev, int resno)
> > > +{
> > > +	struct resource *res = dev->resource + resno;
> > > +
> > > +	if (resource_size(res) >= PAGE_SIZE)
> > > +		return false;
> > > +	if (pci_resources_page_aligned && !(res->start & ~PAGE_MASK) &&
> > > +	    res->flags & IORESOURCE_MEM) {
> > > +		if (res->sibling)
> > > +			return (res->sibling->start & ~PAGE_MASK);
> > > +		else
> > > +			return false;
> > > +	}
> > > +	return true;
> > > +}
> > > +EXPORT_SYMBOL_GPL(pci_resources_share_page);
> > > +
> > >   static int __init pci_resource_alignment_sysfs_init(void)
> > >   {
> > >   	return bus_create_file(&pci_bus_type,
> > > @@ -4859,6 +4891,9 @@ static int __init pci_setup(char *str)
> > >   			} else if (!strncmp(str, "resource_alignment=", 19)) {
> > >   				pci_set_resource_alignment_param(str + 19,
> > >   							strlen(str + 19));
> > > +			} else if (!strncmp(str, "resource_page_aligned=",
> > > +				   22)) {
> > > +				pci_resources_get_page_aligned(str + 22);
> > Doesn't this seem rather redundant with the option right above it,
> > resource_alignment=?  Why not modify that to support syntax where all
> > devices get the same alignment?
> > 
> 
> The kernel option will be used to do two things.
> 
> Firstly, the option can be used to enable all devices to be page aligned.
> 
> Secondly, we can use the option to disable it when the Kconfig option
> mentioned above enable all devices to be page aligned by default.
> 
> We can easily modify this option "resource_alignment=" to do the first
> thing. But I didn't find a proper way to modify it to do the second thing.

You could allow an arch specified default which is overridden by
specifying a resource_alignment= value.  Then you need a way to disable
it, which you could simply do by making
pci_set_resource_alignment_param() able to parse something like
resource_alignment=off.

> > >   			} else if (!strncmp(str, "ecrc=", 5)) {
> > >   				pcie_ecrc_get_policy(str + 5);
> > >   			} else if (!strncmp(str, "hpiosize=", 9)) {
> > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > index d390fc1..b9b333d 100644
> > > --- a/drivers/pci/pci.h
> > > +++ b/drivers/pci/pci.h
> > > @@ -312,11 +312,17 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
> > >   #ifdef CONFIG_PCI_IOV
> > >   	int resno = res - dev->resource;
> > >   
> > > -	if (resno >= PCI_IOV_RESOURCES && resno <= PCI_IOV_RESOURCE_END)
> > > +	if (resno >= PCI_IOV_RESOURCES && resno <= PCI_IOV_RESOURCE_END) {
> > > +		if (pci_resources_page_aligned && res->flags & IORESOURCE_MEM)
> > > +			return PAGE_ALIGN(pci_sriov_resource_alignment(dev,
> > > +					  resno));
> > >   		return pci_sriov_resource_alignment(dev, resno);
> > > +	}
> > >   #endif
> > >   	if (dev->class >> 8  == PCI_CLASS_BRIDGE_CARDBUS)
> > >   		return pci_cardbus_resource_alignment(res);
> > > +	if (pci_resources_page_aligned && res->flags & IORESOURCE_MEM)
> > > +		return PAGE_ALIGN(resource_alignment(res));
> > >   	return resource_alignment(res);
> > >   }
> > 
> > Since we already have resource_alignment=, shouldn't we already have the
> > code in place to re-align?
> 
> Yes, this code can do the re-aligning. But we can't reuse the code because
> it re-align device's bars by changing their sizes, which can potentially 
> break
> some drivers.
> 
> I'm thinking if we can use IORESOURCE_STARTALIGN for this. Thanks.

Shouldn't we fix resource_alignment= then to make it behave in a more
compatible way then?  resource_alignment=64k,resource_resize=off?
Thanks,

Alex

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported
  2016-01-29 10:40     ` Yongji Xie
@ 2016-01-29 19:05       ` Alex Williamson
  2016-02-01  9:13           ` Yongji Xie
  0 siblings, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2016-01-29 19:05 UTC (permalink / raw)
  To: Yongji Xie
  Cc: kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc, bhelgaas,
	corbet, aik, benh, paulus, mpe, warrier, zhong, nikunj



----- Original Message -----
> On 2016/1/29 6:46, Alex Williamson wrote:
> > On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> >> MSI-X tables are not allowed to be mmapped in vfio-pci
> >> driver in case that user get to touch this directly.
> >> This will cause some performance issues when when PCI
> >> adapters have critical registers in the same page as
> >> the MSI-X table.
> >>   
> >> However, some kind of PCI host bridge such as IODA bridge
> >> on Power support filtering of MSIs, which can ensure that a
> >> given pci device can only shoot the MSIs assigned for it.
> >> So we think it's safe to expose the MSI-X table to userspace
> >> if filtering of MSIs is supported because the exposed MSI-X
> >> table can't be used to do harm to other memory space.
> >>   
> >> To support this case, this patch adds a pci_host_bridge
> >> attribute to indicate if this PCI host bridge supports
> >> filtering of MSIs.
> >>   
> >> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
> >> ---
> >>   drivers/pci/host-bridge.c |    6 ++++++
> >>   include/linux/pci.h       |    3 +++
> >>   2 files changed, 9 insertions(+)
> >>   
> >> diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
> >> index 5f4a2e0..c029267 100644
> >> --- a/drivers/pci/host-bridge.c
> >> +++ b/drivers/pci/host-bridge.c
> >> @@ -96,3 +96,9 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct
> >> resource *res,
> >>   	res->end = region->end + offset;
> >>   }
> >>   EXPORT_SYMBOL(pcibios_bus_to_resource);
> >> +
> >> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev)
> >> +{
> >> +	return pci_find_host_bridge(pdev->bus)->msi_filtered;
> >> +}
> >> +EXPORT_SYMBOL_GPL(pci_host_bridge_msi_filtered_enabled);
> >> diff --git a/include/linux/pci.h b/include/linux/pci.h
> >> index b640d65..b952b78 100644
> >> --- a/include/linux/pci.h
> >> +++ b/include/linux/pci.h
> >> @@ -412,6 +412,7 @@ struct pci_host_bridge {
> >>   	void (*release_fn)(struct pci_host_bridge *);
> >>   	void *release_data;
> >>   	unsigned int ignore_reset_delay:1;	/* for entire hierarchy */
> >> +	unsigned int msi_filtered:1;	/* support filtering of MSIs */
> >>   	/* Resource alignment requirements */
> >>   	resource_size_t (*align_resource)(struct pci_dev *dev,
> >>   			const struct resource *res,
> >> @@ -430,6 +431,8 @@ void pci_set_host_bridge_release(struct
> >> pci_host_bridge *bridge,
> >>   
> >>   int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge);
> >>   
> >> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev);
> >> +
> >>   /*
> >>    * The first PCI_BRIDGE_RESOURCE_NUM PCI bus resources (those that
> >>    correspond
> >>    * to P2P or CardBus bridge windows) go in a table.  Additional ones
> >>    (for
> > Don't we already have a flag for this in the IOMMU space?
> >
> > enum iommu_cap {
> >          IOMMU_CAP_CACHE_COHERENCY,      /* IOMMU can enforce cache
> >          coherent DMA
> >                                             transactions */
> > --->    IOMMU_CAP_INTR_REMAP,           /* IOMMU supports interrupt
> > isolation */
> >          IOMMU_CAP_NOEXEC,               /* IOMMU_NOEXEC flag */
> > };
> >
> 
> I saw this flag had been enabled in x86 and ARM arch.
> 
> I'm not sure whether we can mmap MSI-X table in those archs. I just
> verify it on PPC64 arch.

Unfortunately that's not a very good excuse for creating an alternate implementation.  When x86 implements interrupt remapping, we get fine grained isolation of MSI vectors and we've always taken this flag to mean that the system is isolated from devices that may perform DoS attacks with MSI writes.  I'm not entirely sure whether ARM really provides that degree of isolation, but they would be incorrect is exposing the capability if they do not.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 1/5] PCI: Add support for enforcing all MMIO BARs to be page aligned
  2016-01-29 19:01       ` Alex Williamson
@ 2016-02-01  8:50           ` Yongji Xie
  0 siblings, 0 replies; 22+ messages in thread
From: Yongji Xie @ 2016-02-01  8:50 UTC (permalink / raw)
  To: Alex Williamson, kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: bhelgaas, corbet, aik, benh, paulus, mpe, warrier, zhong, nikunj

On 2016/1/30 3:01, Alex Williamson wrote:
> On Fri, 2016-01-29 at 18:37 +0800, Yongji Xie wrote:
>> On 2016/1/29 6:46, Alex Williamson wrote:
>>> On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
>>>> When vfio passthrough a PCI device of which MMIO BARs
>>>> are smaller than PAGE_SIZE, guest will not handle the
>>>> mmio accesses to the BARs which leads to mmio emulations
>>>> in host.
>>>>    
>>>> This is because vfio will not allow to passthrough one
>>>> BAR's mmio page which may be shared with other BARs.
>>>>    
>>>> To solve this performance issue, this patch adds a kernel
>>>> parameter "pci=resource_page_aligned=on" to enforce
>>>> the alignment of all MMIO BARs to be at least PAGE_SIZE,
>>>> so that one BAR's mmio page would not be shared with other
>>>> BARs. We can also disable it through kernel parameter
>>>> "pci=resource_page_aligned=off".
>>>>    
>>>> For the default value of the parameter, we think it should be
>>>> arch-independent, so we add a macro
>>>> HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED to change it. And we
>>>> define this macro to enable this parameter by default on PPC64
>>>> platform which can easily hit this performance issue because
>>>> its PAGE_SIZE is 64KB.
>>>>    
>>>> Note that the kernel parameter won't works if kernel doesn't do
>>>> resources reallocation.
>>> And where do you account for this so that we know whether it's really in
>>> effect?
>>   
>> We can check the flag PCI_PROBE_ONLY to know whether kernel do
>> resources reallocation. Then we know if the kernel parameter is really
>> in effect.
>>   
>> enum {
>>       /* Force re-assigning all resources (ignore firmware
>>        * setup completely)
>>        */
>>       PCI_REASSIGN_ALL_RSRC    = 0x00000001,
>>   
>>       /* Re-assign all bus numbers */
>>       PCI_REASSIGN_ALL_BUS    = 0x00000002,
>>   
>>       /* Do not try to assign, just use existing setup */
>> --->    PCI_PROBE_ONLY        = 0x00000004,
>>   
>> And I will add this to commit log.
> We need more than a commit log entry for this, what's the purpose of the
> pci_resources_share_page() function if we don't know if this is in
> effect?

It seems the parameter will be always in effect if we reuse the 
re-aligning code
of parameter "resource_alignment=" in pci_reassigndev_resource_alignment().

>>>> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
>>>> ---
>>>>    Documentation/kernel-parameters.txt |    5 +++++
>>>>    arch/powerpc/include/asm/pci.h      |   11 +++++++++++
>>>>    drivers/pci/pci.c                   |   35 +++++++++++++++++++++++++++++++++++
>>>>    drivers/pci/pci.h                   |    8 +++++++-
>>>>    include/linux/pci.h                 |    4 ++++
>>>>    5 files changed, 62 insertions(+), 1 deletion(-)
>>>>    
>>>> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
>>>> index 742f69d..3f2a7c9 100644
>>>> --- a/Documentation/kernel-parameters.txt
>>>> +++ b/Documentation/kernel-parameters.txt
>>>> @@ -2857,6 +2857,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>>>>    				PAGE_SIZE is used as alignment.
>>>>    				PCI-PCI bridge can be specified, if resource
>>>>    				windows need to be expanded.
>>>> +		resource_page_aligned=	Enable/disable enforcing the alignment
>>>> +				of all PCI devices' memory resources to be
>>>> +				at least PAGE_SIZE if resources reallocation
>>>> +				is done by kernel.
>>>> +				Format: { "on" | "off" }
>>>>    		ecrc=		Enable/disable PCIe ECRC (transaction layer
>>>>    				end-to-end CRC checking).
>>>>    				bios: Use BIOS/firmware settings. This is the
>>>> diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
>>>> index 3453bd8..2d2b3ef 100644
>>>> --- a/arch/powerpc/include/asm/pci.h
>>>> +++ b/arch/powerpc/include/asm/pci.h
>>>> @@ -136,6 +136,17 @@ extern pgprot_t	pci_phys_mem_access_prot(struct file *file,
>>>>    					 unsigned long pfn,
>>>>    					 unsigned long size,
>>>>    					 pgprot_t prot);
>>>> +#ifdef CONFIG_PPC64
>>>> +
>>>> +/* For PPC64, We enforce all PCI MMIO BARs to be page aligned
>>>> + * by default. This would be helpful to improve performance
>>>> + * when we passthrough a PCI device of which BARs are smaller
>>>> + * than PAGE_SIZE(64KB). And we can use kernel parameter
>>>> + * "pci=resource_page_aligned=off" to disable it.
>>>> + */
>>>> +#define HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED	1
>>>> +
>>>> +#endif
>>>>    
>>>>    #define HAVE_ARCH_PCI_RESOURCE_TO_USER
>>>>    extern void pci_resource_to_user(const struct pci_dev *dev, int bar,
>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>> index 314db8c..7b21238 100644
>>>> --- a/drivers/pci/pci.c
>>>> +++ b/drivers/pci/pci.c
>>>> @@ -99,6 +99,9 @@ u8 pci_cache_line_size;
>>>>     */
>>>>    unsigned int pcibios_max_latency = 255;
>>>>    
>>>> +bool pci_resources_page_aligned =
>>>> +	IS_ENABLED(HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED);
>>> I don't think this is proper use of IS_ENABLED, which seems to be
>>> targeted at CONFIG_ type options.  You could define this as that in an
>>> arch Kconfig.
>>   
>> Is it better that we define this as a pci Kconfig and select it in arch
>> Kconfig?
> If you want to use IS_ENABLE here, I would think so.

Actually, I'm not sure if it's necessary to add a Kconfig option for it.

I prefer to do it like previous version:

#ifdef HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED
bool pci_resources_page_aligned = true;
#else
bool pci_resources_page_aligned;
#endif

>>>> +
>>>>    /* If set, the PCIe ARI capability will not be used. */
>>>>    static bool pcie_ari_disabled;
>>>>    
>>>> @@ -4746,6 +4749,35 @@ static ssize_t pci_resource_alignment_store(struct bus_type *bus,
>>>>    BUS_ATTR(resource_alignment, 0644, pci_resource_alignment_show,
>>>>    					pci_resource_alignment_store);
>>>>    
>>>> +static void pci_resources_get_page_aligned(char *str)
>>>> +{
>>>> +	if (!strncmp(str, "off", 3))
>>>> +		pci_resources_page_aligned = false;
>>>> +	else if (!strncmp(str, "on", 2))
>>>> +		pci_resources_page_aligned = true;
>>>> +}
>>> "get"?
>>   
>> How about pci_resources_parse_page_aligned_param()?
> Better.
>
>>>> +
>>>> +/*
>>>> + * This function checks whether PCI BARs' mmio page will be shared
>>>> + * with other BARs.
>>>> + */
>>>> +bool pci_resources_share_page(struct pci_dev *dev, int resno)
>>>> +{
>>>> +	struct resource *res = dev->resource + resno;
>>>> +
>>>> +	if (resource_size(res) >= PAGE_SIZE)
>>>> +		return false;
>>>> +	if (pci_resources_page_aligned && !(res->start & ~PAGE_MASK) &&
>>>> +	    res->flags & IORESOURCE_MEM) {
>>>> +		if (res->sibling)
>>>> +			return (res->sibling->start & ~PAGE_MASK);
>>>> +		else
>>>> +			return false;
>>>> +	}
>>>> +	return true;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(pci_resources_share_page);
>>>> +
>>>>    static int __init pci_resource_alignment_sysfs_init(void)
>>>>    {
>>>>    	return bus_create_file(&pci_bus_type,
>>>> @@ -4859,6 +4891,9 @@ static int __init pci_setup(char *str)
>>>>    			} else if (!strncmp(str, "resource_alignment=", 19)) {
>>>>    				pci_set_resource_alignment_param(str + 19,
>>>>    							strlen(str + 19));
>>>> +			} else if (!strncmp(str, "resource_page_aligned=",
>>>> +				   22)) {
>>>> +				pci_resources_get_page_aligned(str + 22);
>>> Doesn't this seem rather redundant with the option right above it,
>>> resource_alignment=?  Why not modify that to support syntax where all
>>> devices get the same alignment?
>>>   
>>   
>> The kernel option will be used to do two things.
>>   
>> Firstly, the option can be used to enable all devices to be page aligned.
>>   
>> Secondly, we can use the option to disable it when the Kconfig option
>> mentioned above enable all devices to be page aligned by default.
>>   
>> We can easily modify this option "resource_alignment=" to do the first
>> thing. But I didn't find a proper way to modify it to do the second thing.
> You could allow an arch specified default which is overridden by
> specifying a resource_alignment= value.  Then you need a way to disable
> it, which you could simply do by making
> pci_set_resource_alignment_param() able to parse something like
> resource_alignment=off.

We just want to enforce the alignment of all MMIO BARs to be page
aligned in this patch. And both the arch specified default value and
pci_resources_page_aligned are something like *on/off* enforcing the
alignment of resources to be page aligned.

So I think it's better to add a parameter whose format is *on/off*.

If we reuse "resource_alignment=", we need an additional translation
from "resource_alignment="(format: [<order of align>@]...) to
pci_resources_page_aligned(format: true/false).

And "resource_alignment=" is always used to specify alignments of devices.
Would it be misunderstanding to add some syntax like
"resource_alignment=off"?

>>>>    			} else if (!strncmp(str, "ecrc=", 5)) {
>>>>    				pcie_ecrc_get_policy(str + 5);
>>>>    			} else if (!strncmp(str, "hpiosize=", 9)) {
>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>>>> index d390fc1..b9b333d 100644
>>>> --- a/drivers/pci/pci.h
>>>> +++ b/drivers/pci/pci.h
>>>> @@ -312,11 +312,17 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
>>>>    #ifdef CONFIG_PCI_IOV
>>>>    	int resno = res - dev->resource;
>>>>    
>>>> -	if (resno >= PCI_IOV_RESOURCES && resno <= PCI_IOV_RESOURCE_END)
>>>> +	if (resno >= PCI_IOV_RESOURCES && resno <= PCI_IOV_RESOURCE_END) {
>>>> +		if (pci_resources_page_aligned && res->flags & IORESOURCE_MEM)
>>>> +			return PAGE_ALIGN(pci_sriov_resource_alignment(dev,
>>>> +					  resno));
>>>>    		return pci_sriov_resource_alignment(dev, resno);
>>>> +	}
>>>>    #endif
>>>>    	if (dev->class >> 8  == PCI_CLASS_BRIDGE_CARDBUS)
>>>>    		return pci_cardbus_resource_alignment(res);
>>>> +	if (pci_resources_page_aligned && res->flags & IORESOURCE_MEM)
>>>> +		return PAGE_ALIGN(resource_alignment(res));
>>>>    	return resource_alignment(res);
>>>>    }
>>>   
>>> Since we already have resource_alignment=, shouldn't we already have the
>>> code in place to re-align?
>>   
>> Yes, this code can do the re-aligning. But we can't reuse the code because
>> it re-align device's bars by changing their sizes, which can potentially
>> break
>> some drivers.
>>   
>> I'm thinking if we can use IORESOURCE_STARTALIGN for this. Thanks.
> Shouldn't we fix resource_alignment= then to make it behave in a more
> compatible way then?  resource_alignment=64k,resource_resize=off?
> Thanks,
>
> Alex
>

Good point! We can add something like "resource_resize=off" to
"resource_alignment=". Then we can reuse the re-aligning code. Thanks.

Regards,
Yongji Xie

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 1/5] PCI: Add support for enforcing all MMIO BARs to be page aligned
@ 2016-02-01  8:50           ` Yongji Xie
  0 siblings, 0 replies; 22+ messages in thread
From: Yongji Xie @ 2016-02-01  8:50 UTC (permalink / raw)
  To: Alex Williamson, kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc
  Cc: nikunj, zhong, corbet, aik, warrier, paulus, bhelgaas

On 2016/1/30 3:01, Alex Williamson wrote:
> On Fri, 2016-01-29 at 18:37 +0800, Yongji Xie wrote:
>> On 2016/1/29 6:46, Alex Williamson wrote:
>>> On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
>>>> When vfio passthrough a PCI device of which MMIO BARs
>>>> are smaller than PAGE_SIZE, guest will not handle the
>>>> mmio accesses to the BARs which leads to mmio emulations
>>>> in host.
>>>>    
>>>> This is because vfio will not allow to passthrough one
>>>> BAR's mmio page which may be shared with other BARs.
>>>>    
>>>> To solve this performance issue, this patch adds a kernel
>>>> parameter "pci=resource_page_aligned=on" to enforce
>>>> the alignment of all MMIO BARs to be at least PAGE_SIZE,
>>>> so that one BAR's mmio page would not be shared with other
>>>> BARs. We can also disable it through kernel parameter
>>>> "pci=resource_page_aligned=off".
>>>>    
>>>> For the default value of the parameter, we think it should be
>>>> arch-independent, so we add a macro
>>>> HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED to change it. And we
>>>> define this macro to enable this parameter by default on PPC64
>>>> platform which can easily hit this performance issue because
>>>> its PAGE_SIZE is 64KB.
>>>>    
>>>> Note that the kernel parameter won't works if kernel doesn't do
>>>> resources reallocation.
>>> And where do you account for this so that we know whether it's really in
>>> effect?
>>   
>> We can check the flag PCI_PROBE_ONLY to know whether kernel do
>> resources reallocation. Then we know if the kernel parameter is really
>> in effect.
>>   
>> enum {
>>       /* Force re-assigning all resources (ignore firmware
>>        * setup completely)
>>        */
>>       PCI_REASSIGN_ALL_RSRC    = 0x00000001,
>>   
>>       /* Re-assign all bus numbers */
>>       PCI_REASSIGN_ALL_BUS    = 0x00000002,
>>   
>>       /* Do not try to assign, just use existing setup */
>> --->    PCI_PROBE_ONLY        = 0x00000004,
>>   
>> And I will add this to commit log.
> We need more than a commit log entry for this, what's the purpose of the
> pci_resources_share_page() function if we don't know if this is in
> effect?

It seems the parameter will be always in effect if we reuse the 
re-aligning code
of parameter "resource_alignment=" in pci_reassigndev_resource_alignment().

>>>> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
>>>> ---
>>>>    Documentation/kernel-parameters.txt |    5 +++++
>>>>    arch/powerpc/include/asm/pci.h      |   11 +++++++++++
>>>>    drivers/pci/pci.c                   |   35 +++++++++++++++++++++++++++++++++++
>>>>    drivers/pci/pci.h                   |    8 +++++++-
>>>>    include/linux/pci.h                 |    4 ++++
>>>>    5 files changed, 62 insertions(+), 1 deletion(-)
>>>>    
>>>> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
>>>> index 742f69d..3f2a7c9 100644
>>>> --- a/Documentation/kernel-parameters.txt
>>>> +++ b/Documentation/kernel-parameters.txt
>>>> @@ -2857,6 +2857,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>>>>    				PAGE_SIZE is used as alignment.
>>>>    				PCI-PCI bridge can be specified, if resource
>>>>    				windows need to be expanded.
>>>> +		resource_page_aligned=	Enable/disable enforcing the alignment
>>>> +				of all PCI devices' memory resources to be
>>>> +				at least PAGE_SIZE if resources reallocation
>>>> +				is done by kernel.
>>>> +				Format: { "on" | "off" }
>>>>    		ecrc=		Enable/disable PCIe ECRC (transaction layer
>>>>    				end-to-end CRC checking).
>>>>    				bios: Use BIOS/firmware settings. This is the
>>>> diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
>>>> index 3453bd8..2d2b3ef 100644
>>>> --- a/arch/powerpc/include/asm/pci.h
>>>> +++ b/arch/powerpc/include/asm/pci.h
>>>> @@ -136,6 +136,17 @@ extern pgprot_t	pci_phys_mem_access_prot(struct file *file,
>>>>    					 unsigned long pfn,
>>>>    					 unsigned long size,
>>>>    					 pgprot_t prot);
>>>> +#ifdef CONFIG_PPC64
>>>> +
>>>> +/* For PPC64, We enforce all PCI MMIO BARs to be page aligned
>>>> + * by default. This would be helpful to improve performance
>>>> + * when we passthrough a PCI device of which BARs are smaller
>>>> + * than PAGE_SIZE(64KB). And we can use kernel parameter
>>>> + * "pci=resource_page_aligned=off" to disable it.
>>>> + */
>>>> +#define HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED	1
>>>> +
>>>> +#endif
>>>>    
>>>>    #define HAVE_ARCH_PCI_RESOURCE_TO_USER
>>>>    extern void pci_resource_to_user(const struct pci_dev *dev, int bar,
>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>> index 314db8c..7b21238 100644
>>>> --- a/drivers/pci/pci.c
>>>> +++ b/drivers/pci/pci.c
>>>> @@ -99,6 +99,9 @@ u8 pci_cache_line_size;
>>>>     */
>>>>    unsigned int pcibios_max_latency = 255;
>>>>    
>>>> +bool pci_resources_page_aligned =
>>>> +	IS_ENABLED(HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED);
>>> I don't think this is proper use of IS_ENABLED, which seems to be
>>> targeted at CONFIG_ type options.  You could define this as that in an
>>> arch Kconfig.
>>   
>> Is it better that we define this as a pci Kconfig and select it in arch
>> Kconfig?
> If you want to use IS_ENABLE here, I would think so.

Actually, I'm not sure if it's necessary to add a Kconfig option for it.

I prefer to do it like previous version:

#ifdef HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED
bool pci_resources_page_aligned = true;
#else
bool pci_resources_page_aligned;
#endif

>>>> +
>>>>    /* If set, the PCIe ARI capability will not be used. */
>>>>    static bool pcie_ari_disabled;
>>>>    
>>>> @@ -4746,6 +4749,35 @@ static ssize_t pci_resource_alignment_store(struct bus_type *bus,
>>>>    BUS_ATTR(resource_alignment, 0644, pci_resource_alignment_show,
>>>>    					pci_resource_alignment_store);
>>>>    
>>>> +static void pci_resources_get_page_aligned(char *str)
>>>> +{
>>>> +	if (!strncmp(str, "off", 3))
>>>> +		pci_resources_page_aligned = false;
>>>> +	else if (!strncmp(str, "on", 2))
>>>> +		pci_resources_page_aligned = true;
>>>> +}
>>> "get"?
>>   
>> How about pci_resources_parse_page_aligned_param()?
> Better.
>
>>>> +
>>>> +/*
>>>> + * This function checks whether PCI BARs' mmio page will be shared
>>>> + * with other BARs.
>>>> + */
>>>> +bool pci_resources_share_page(struct pci_dev *dev, int resno)
>>>> +{
>>>> +	struct resource *res = dev->resource + resno;
>>>> +
>>>> +	if (resource_size(res) >= PAGE_SIZE)
>>>> +		return false;
>>>> +	if (pci_resources_page_aligned && !(res->start & ~PAGE_MASK) &&
>>>> +	    res->flags & IORESOURCE_MEM) {
>>>> +		if (res->sibling)
>>>> +			return (res->sibling->start & ~PAGE_MASK);
>>>> +		else
>>>> +			return false;
>>>> +	}
>>>> +	return true;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(pci_resources_share_page);
>>>> +
>>>>    static int __init pci_resource_alignment_sysfs_init(void)
>>>>    {
>>>>    	return bus_create_file(&pci_bus_type,
>>>> @@ -4859,6 +4891,9 @@ static int __init pci_setup(char *str)
>>>>    			} else if (!strncmp(str, "resource_alignment=", 19)) {
>>>>    				pci_set_resource_alignment_param(str + 19,
>>>>    							strlen(str + 19));
>>>> +			} else if (!strncmp(str, "resource_page_aligned=",
>>>> +				   22)) {
>>>> +				pci_resources_get_page_aligned(str + 22);
>>> Doesn't this seem rather redundant with the option right above it,
>>> resource_alignment=?  Why not modify that to support syntax where all
>>> devices get the same alignment?
>>>   
>>   
>> The kernel option will be used to do two things.
>>   
>> Firstly, the option can be used to enable all devices to be page aligned.
>>   
>> Secondly, we can use the option to disable it when the Kconfig option
>> mentioned above enable all devices to be page aligned by default.
>>   
>> We can easily modify this option "resource_alignment=" to do the first
>> thing. But I didn't find a proper way to modify it to do the second thing.
> You could allow an arch specified default which is overridden by
> specifying a resource_alignment= value.  Then you need a way to disable
> it, which you could simply do by making
> pci_set_resource_alignment_param() able to parse something like
> resource_alignment=off.

We just want to enforce the alignment of all MMIO BARs to be page
aligned in this patch. And both the arch specified default value and
pci_resources_page_aligned are something like *on/off* enforcing the
alignment of resources to be page aligned.

So I think it's better to add a parameter whose format is *on/off*.

If we reuse "resource_alignment=", we need an additional translation
from "resource_alignment="(format: [<order of align>@]...) to
pci_resources_page_aligned(format: true/false).

And "resource_alignment=" is always used to specify alignments of devices.
Would it be misunderstanding to add some syntax like
"resource_alignment=off"?

>>>>    			} else if (!strncmp(str, "ecrc=", 5)) {
>>>>    				pcie_ecrc_get_policy(str + 5);
>>>>    			} else if (!strncmp(str, "hpiosize=", 9)) {
>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>>>> index d390fc1..b9b333d 100644
>>>> --- a/drivers/pci/pci.h
>>>> +++ b/drivers/pci/pci.h
>>>> @@ -312,11 +312,17 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
>>>>    #ifdef CONFIG_PCI_IOV
>>>>    	int resno = res - dev->resource;
>>>>    
>>>> -	if (resno >= PCI_IOV_RESOURCES && resno <= PCI_IOV_RESOURCE_END)
>>>> +	if (resno >= PCI_IOV_RESOURCES && resno <= PCI_IOV_RESOURCE_END) {
>>>> +		if (pci_resources_page_aligned && res->flags & IORESOURCE_MEM)
>>>> +			return PAGE_ALIGN(pci_sriov_resource_alignment(dev,
>>>> +					  resno));
>>>>    		return pci_sriov_resource_alignment(dev, resno);
>>>> +	}
>>>>    #endif
>>>>    	if (dev->class >> 8  == PCI_CLASS_BRIDGE_CARDBUS)
>>>>    		return pci_cardbus_resource_alignment(res);
>>>> +	if (pci_resources_page_aligned && res->flags & IORESOURCE_MEM)
>>>> +		return PAGE_ALIGN(resource_alignment(res));
>>>>    	return resource_alignment(res);
>>>>    }
>>>   
>>> Since we already have resource_alignment=, shouldn't we already have the
>>> code in place to re-align?
>>   
>> Yes, this code can do the re-aligning. But we can't reuse the code because
>> it re-align device's bars by changing their sizes, which can potentially
>> break
>> some drivers.
>>   
>> I'm thinking if we can use IORESOURCE_STARTALIGN for this. Thanks.
> Shouldn't we fix resource_alignment= then to make it behave in a more
> compatible way then?  resource_alignment=64k,resource_resize=off?
> Thanks,
>
> Alex
>

Good point! We can add something like "resource_resize=off" to
"resource_alignment=". Then we can reuse the re-aligning code. Thanks.

Regards,
Yongji Xie

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported
  2016-01-29 19:05       ` Alex Williamson
@ 2016-02-01  9:13           ` Yongji Xie
  0 siblings, 0 replies; 22+ messages in thread
From: Yongji Xie @ 2016-02-01  9:13 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, linux-kernel, linux-pci, linuxppc-dev, linux-doc, bhelgaas,
	corbet, aik, benh, paulus, mpe, warrier, zhong, nikunj

On 2016/1/30 3:05, Alex Williamson wrote:
>
> ----- Original Message -----
>> On 2016/1/29 6:46, Alex Williamson wrote:
>>> On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
>>>> MSI-X tables are not allowed to be mmapped in vfio-pci
>>>> driver in case that user get to touch this directly.
>>>> This will cause some performance issues when when PCI
>>>> adapters have critical registers in the same page as
>>>> the MSI-X table.
>>>>    
>>>> However, some kind of PCI host bridge such as IODA bridge
>>>> on Power support filtering of MSIs, which can ensure that a
>>>> given pci device can only shoot the MSIs assigned for it.
>>>> So we think it's safe to expose the MSI-X table to userspace
>>>> if filtering of MSIs is supported because the exposed MSI-X
>>>> table can't be used to do harm to other memory space.
>>>>    
>>>> To support this case, this patch adds a pci_host_bridge
>>>> attribute to indicate if this PCI host bridge supports
>>>> filtering of MSIs.
>>>>    
>>>> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
>>>> ---
>>>>    drivers/pci/host-bridge.c |    6 ++++++
>>>>    include/linux/pci.h       |    3 +++
>>>>    2 files changed, 9 insertions(+)
>>>>    
>>>> diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
>>>> index 5f4a2e0..c029267 100644
>>>> --- a/drivers/pci/host-bridge.c
>>>> +++ b/drivers/pci/host-bridge.c
>>>> @@ -96,3 +96,9 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct
>>>> resource *res,
>>>>    	res->end = region->end + offset;
>>>>    }
>>>>    EXPORT_SYMBOL(pcibios_bus_to_resource);
>>>> +
>>>> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev)
>>>> +{
>>>> +	return pci_find_host_bridge(pdev->bus)->msi_filtered;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(pci_host_bridge_msi_filtered_enabled);
>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>> index b640d65..b952b78 100644
>>>> --- a/include/linux/pci.h
>>>> +++ b/include/linux/pci.h
>>>> @@ -412,6 +412,7 @@ struct pci_host_bridge {
>>>>    	void (*release_fn)(struct pci_host_bridge *);
>>>>    	void *release_data;
>>>>    	unsigned int ignore_reset_delay:1;	/* for entire hierarchy */
>>>> +	unsigned int msi_filtered:1;	/* support filtering of MSIs */
>>>>    	/* Resource alignment requirements */
>>>>    	resource_size_t (*align_resource)(struct pci_dev *dev,
>>>>    			const struct resource *res,
>>>> @@ -430,6 +431,8 @@ void pci_set_host_bridge_release(struct
>>>> pci_host_bridge *bridge,
>>>>    
>>>>    int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge);
>>>>    
>>>> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev);
>>>> +
>>>>    /*
>>>>     * The first PCI_BRIDGE_RESOURCE_NUM PCI bus resources (those that
>>>>     correspond
>>>>     * to P2P or CardBus bridge windows) go in a table.  Additional ones
>>>>     (for
>>> Don't we already have a flag for this in the IOMMU space?
>>>
>>> enum iommu_cap {
>>>           IOMMU_CAP_CACHE_COHERENCY,      /* IOMMU can enforce cache
>>>           coherent DMA
>>>                                              transactions */
>>> --->    IOMMU_CAP_INTR_REMAP,           /* IOMMU supports interrupt
>>> isolation */
>>>           IOMMU_CAP_NOEXEC,               /* IOMMU_NOEXEC flag */
>>> };
>>>
>> I saw this flag had been enabled in x86 and ARM arch.
>>
>> I'm not sure whether we can mmap MSI-X table in those archs. I just
>> verify it on PPC64 arch.
> Unfortunately that's not a very good excuse for creating an alternate implementation.  When x86 implements interrupt remapping, we get fine grained isolation of MSI vectors and we've always taken this flag to mean that the system is isolated from devices that may perform DoS attacks with MSI writes.  I'm not entirely sure whether ARM really provides that degree of isolation, but they would be incorrect is exposing the capability if they do not.  Thanks,
>
> Alex
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

OK, I will use this flag in next version. Thanks.

Regards,
Yongji Xie

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported
@ 2016-02-01  9:13           ` Yongji Xie
  0 siblings, 0 replies; 22+ messages in thread
From: Yongji Xie @ 2016-02-01  9:13 UTC (permalink / raw)
  To: Alex Williamson
  Cc: nikunj, zhong, kvm, linux-doc, aik, linux-pci, corbet,
	linux-kernel, warrier, paulus, bhelgaas, linuxppc-dev

On 2016/1/30 3:05, Alex Williamson wrote:
>
> ----- Original Message -----
>> On 2016/1/29 6:46, Alex Williamson wrote:
>>> On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
>>>> MSI-X tables are not allowed to be mmapped in vfio-pci
>>>> driver in case that user get to touch this directly.
>>>> This will cause some performance issues when when PCI
>>>> adapters have critical registers in the same page as
>>>> the MSI-X table.
>>>>    
>>>> However, some kind of PCI host bridge such as IODA bridge
>>>> on Power support filtering of MSIs, which can ensure that a
>>>> given pci device can only shoot the MSIs assigned for it.
>>>> So we think it's safe to expose the MSI-X table to userspace
>>>> if filtering of MSIs is supported because the exposed MSI-X
>>>> table can't be used to do harm to other memory space.
>>>>    
>>>> To support this case, this patch adds a pci_host_bridge
>>>> attribute to indicate if this PCI host bridge supports
>>>> filtering of MSIs.
>>>>    
>>>> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
>>>> ---
>>>>    drivers/pci/host-bridge.c |    6 ++++++
>>>>    include/linux/pci.h       |    3 +++
>>>>    2 files changed, 9 insertions(+)
>>>>    
>>>> diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
>>>> index 5f4a2e0..c029267 100644
>>>> --- a/drivers/pci/host-bridge.c
>>>> +++ b/drivers/pci/host-bridge.c
>>>> @@ -96,3 +96,9 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct
>>>> resource *res,
>>>>    	res->end = region->end + offset;
>>>>    }
>>>>    EXPORT_SYMBOL(pcibios_bus_to_resource);
>>>> +
>>>> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev)
>>>> +{
>>>> +	return pci_find_host_bridge(pdev->bus)->msi_filtered;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(pci_host_bridge_msi_filtered_enabled);
>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>> index b640d65..b952b78 100644
>>>> --- a/include/linux/pci.h
>>>> +++ b/include/linux/pci.h
>>>> @@ -412,6 +412,7 @@ struct pci_host_bridge {
>>>>    	void (*release_fn)(struct pci_host_bridge *);
>>>>    	void *release_data;
>>>>    	unsigned int ignore_reset_delay:1;	/* for entire hierarchy */
>>>> +	unsigned int msi_filtered:1;	/* support filtering of MSIs */
>>>>    	/* Resource alignment requirements */
>>>>    	resource_size_t (*align_resource)(struct pci_dev *dev,
>>>>    			const struct resource *res,
>>>> @@ -430,6 +431,8 @@ void pci_set_host_bridge_release(struct
>>>> pci_host_bridge *bridge,
>>>>    
>>>>    int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge);
>>>>    
>>>> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev);
>>>> +
>>>>    /*
>>>>     * The first PCI_BRIDGE_RESOURCE_NUM PCI bus resources (those that
>>>>     correspond
>>>>     * to P2P or CardBus bridge windows) go in a table.  Additional ones
>>>>     (for
>>> Don't we already have a flag for this in the IOMMU space?
>>>
>>> enum iommu_cap {
>>>           IOMMU_CAP_CACHE_COHERENCY,      /* IOMMU can enforce cache
>>>           coherent DMA
>>>                                              transactions */
>>> --->    IOMMU_CAP_INTR_REMAP,           /* IOMMU supports interrupt
>>> isolation */
>>>           IOMMU_CAP_NOEXEC,               /* IOMMU_NOEXEC flag */
>>> };
>>>
>> I saw this flag had been enabled in x86 and ARM arch.
>>
>> I'm not sure whether we can mmap MSI-X table in those archs. I just
>> verify it on PPC64 arch.
> Unfortunately that's not a very good excuse for creating an alternate implementation.  When x86 implements interrupt remapping, we get fine grained isolation of MSI vectors and we've always taken this flag to mean that the system is isolated from devices that may perform DoS attacks with MSI writes.  I'm not entirely sure whether ARM really provides that degree of isolation, but they would be incorrect is exposing the capability if they do not.  Thanks,
>
> Alex
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

OK, I will use this flag in next version. Thanks.

Regards,
Yongji Xie

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-02-01  9:14 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-15  7:06 [RFC PATCH v3 0/5] vfio-pci: Allow to mmap sub-page MMIO BARs and MSI-X table on PPC64 platform Yongji Xie
2016-01-15  7:06 ` [RFC PATCH v3 1/5] PCI: Add support for enforcing all MMIO BARs to be page aligned Yongji Xie
2016-01-28 22:46   ` Alex Williamson
2016-01-29 10:37     ` Yongji Xie
2016-01-29 19:01       ` Alex Williamson
2016-02-01  8:50         ` Yongji Xie
2016-02-01  8:50           ` Yongji Xie
2016-01-15  7:06 ` [RFC PATCH v3 2/5] vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive Yongji Xie
2016-01-15  7:06 ` [RFC PATCH v3 3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported Yongji Xie
2016-01-15 17:24   ` David Laight
2016-01-15 17:24     ` David Laight
2016-01-20  9:41     ` Yongji Xie
2016-01-28 22:46   ` Alex Williamson
2016-01-29 10:40     ` Yongji Xie
2016-01-29 19:05       ` Alex Williamson
2016-02-01  9:13         ` Yongji Xie
2016-02-01  9:13           ` Yongji Xie
2016-01-15  7:06 ` [RFC PATCH v3 4/5] powerpc/powernv/pci-ioda: Enable msi_filtered bit for any IODA host bridge Yongji Xie
2016-01-15  7:06 ` [RFC PATCH v3 5/5] vfio-pci: Allow to mmap MSI-X table if host bridge supports filtering of MSIs Yongji Xie
2016-01-28 22:46   ` Alex Williamson
2016-01-29 10:42     ` Yongji Xie
2016-01-28 10:01 ` [RFC PATCH v3 0/5] vfio-pci: Allow to mmap sub-page MMIO BARs and MSI-X table on PPC64 platform Yongji Xie

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.