All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v10 0/4] PCI: Introduce a way to enforce all MMIO BARs not to share PAGE_SIZE
@ 2017-04-10 11:58 Yongji Xie
  2017-04-10 11:58 ` [PATCH v10 1/4] PCI: A fix for caculating bridge window's size and alignment Yongji Xie
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Yongji Xie @ 2017-04-10 11:58 UTC (permalink / raw)
  To: bhelgaas
  Cc: linux-pci, linuxppc-dev, alex.williamson, gwshan, aik, benh, mpe,
	paulus, zhong

This series introduces a way for PCI resource allocator to force
MMIO BARs not to share PAGE_SIZE. This would make sense to VFIO
driver. Because current VFIO implementation disallows to mmap
sub-page(size < PAGE_SIZE) MMIO BARs which may share the same page
with other BARs for security reasons. Thus, we have to handle mmio
access to these BARs in QEMU emulation rather than in guest which
will cause some performance loss.

In our solution, we try to make use of the existing code path of
resource_alignment kernel parameter and add a macro to set default
alignment for it. Thus we can define this macro by default on some
archs which may easily hit the performance issue because of their
64K page.

In this series, patch 1 fixes a bug related to bridge window
size/alignment caculating; patch 2,3 add support for setting default
alignment of all MMIO BAR.

Changelog v10:
- Introduce an arch-specific function to set default alignment
  for all PCI devices instead of using macro
- Fix some minor comment issues
- Code style improvements

Changelog v9:
- Add a patch to fix for caculating bridge window's size and alignment
- Remove an unrelated patch
- Rework the patch that fix bug that reassign resources's alignment by
  changing its size

Changelog v8:
- Rebased against v4.10-rc4
- Rework the patch 2
- Change the commit log of patch 1

Changelog v7:
- Rebased against v4.9-rc2
- Drop two merged patches
- Rework the patch which fix a bug that resources's size is changed when
  using resource_alignment
- Add a patch that fix a bug for IOV BARs when using resource_alignment

Changelog v6:
- Remove the option "noresize@" of resource_alignment

Changelog v5:
- Rebased against v4.8-rc6
- Drop the patch that forbidding disable memory decoding in
  pci_reassigndev_resource_alignment()

Changelog v4:
- Rebased against v4.8-rc1
- Drop one irrelevant patch
- Drop the patch that adding wildcard to resource_alignment to enforce
  the alignment of all MMIO BARs to be at least PAGE_SIZE
- Change the format of option "noresize" of resource_alignment
- Code style improvements

Changelog v3:
- Ignore enforced alignment to fixed BARs
- Fix issue that disabling memory decoding when reassigning the alignment
- Only enable default alignment on PowerNV platform

Changelog v2:
- Ignore enforced alignment to VF BARs on pci_reassigndev_resource_alignment()

Yongji Xie (4):
  PCI: A fix for caculating bridge window's size and alignment
  PCI: Add pcibios_default_alignment() for arch-specific alignment control
  powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned
  PCI: Don't extend device's size when using default alignment for all devices

 arch/powerpc/include/asm/machdep.h        |    2 ++
 arch/powerpc/kernel/pci-common.c          |    8 ++++++
 arch/powerpc/platforms/powernv/pci-ioda.c |    7 +++++
 drivers/pci/pci.c                         |   44 +++++++++++++++++++++--------
 drivers/pci/setup-bus.c                   |    4 +--
 5 files changed, 51 insertions(+), 14 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v10 1/4] PCI: A fix for caculating bridge window's size and alignment
  2017-04-10 11:58 [PATCH v10 0/4] PCI: Introduce a way to enforce all MMIO BARs not to share PAGE_SIZE Yongji Xie
@ 2017-04-10 11:58 ` Yongji Xie
  2017-04-10 11:58 ` [PATCH v10 2/4] PCI: Add pcibios_default_alignment() for arch-specific alignment control Yongji Xie
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Yongji Xie @ 2017-04-10 11:58 UTC (permalink / raw)
  To: bhelgaas
  Cc: linux-pci, linuxppc-dev, alex.williamson, gwshan, aik, benh, mpe,
	paulus, zhong

In case that one device's alignment is greater than its size,
we may get an incorrect size and alignment for its bus's memory
window in pbus_size_mem(). This patch fixes this case.

Signed-off-by: Yongji Xie <elohimes@gmail.com>
---
 drivers/pci/setup-bus.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index cb389277..958da7d 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1066,10 +1066,10 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 				r->flags = 0;
 				continue;
 			}
-			size += r_size;
+			size += max(r_size, align);
 			/* Exclude ranges with size > align from
 			   calculation of the alignment. */
-			if (r_size == align)
+			if (r_size <= align)
 				aligns[order] += align;
 			if (order > max_order)
 				max_order = order;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v10 2/4] PCI: Add pcibios_default_alignment() for arch-specific alignment control
  2017-04-10 11:58 [PATCH v10 0/4] PCI: Introduce a way to enforce all MMIO BARs not to share PAGE_SIZE Yongji Xie
  2017-04-10 11:58 ` [PATCH v10 1/4] PCI: A fix for caculating bridge window's size and alignment Yongji Xie
@ 2017-04-10 11:58 ` Yongji Xie
  2017-04-10 11:58 ` [PATCH v10 3/4] powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned Yongji Xie
  2017-04-10 11:58 ` [PATCH v10 4/4] PCI: Don't extend device's size when using default alignment for all devices Yongji Xie
  3 siblings, 0 replies; 14+ messages in thread
From: Yongji Xie @ 2017-04-10 11:58 UTC (permalink / raw)
  To: bhelgaas
  Cc: linux-pci, linuxppc-dev, alex.williamson, gwshan, aik, benh, mpe,
	paulus, zhong

When VFIO passes through a PCI device to a guest, it does not allow
the guest to mmap BARs that are smaller than PAGE_SIZE unless it
can reserve the rest of the page (see vfio_pci_probe_mmaps()). This
is because a page might contain several small BARs for unrelated
devices and a guest should not be able to access all of them.

VFIO emulates guest accesses to non-mappable BARs, which is functional
but slow. On systems with large page sizes, e.g., PowerNV with 64K pages,
BARs are more likely to share a page and performance is more likely to
be a problem.

Add a weak function to set default alignment for all PCI devices.
An arch can override it to force the PCI core to place memory BARs on
their own pages.

Signed-off-by: Yongji Xie <elohimes@gmail.com>
---
 drivers/pci/pci.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 7904d02..02f1255 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4947,6 +4947,11 @@ void pci_ignore_hotplug(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_GPL(pci_ignore_hotplug);
 
+resource_size_t __weak pcibios_default_alignment(struct pci_dev *dev)
+{
+	return 0;
+}
+
 #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
 static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
 static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -4962,14 +4967,15 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
 {
 	int seg, bus, slot, func, align_order, count;
 	unsigned short vendor, device, subsystem_vendor, subsystem_device;
-	resource_size_t align = 0;
+	resource_size_t align = pcibios_default_alignment(dev);
 	char *p;
 
 	spin_lock(&resource_alignment_lock);
 	p = resource_alignment_param;
-	if (!*p)
+	if (!*p && !align)
 		goto out;
 	if (pci_has_flag(PCI_PROBE_ONLY)) {
+		align = 0;
 		pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
 		goto out;
 	}
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v10 3/4] powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned
  2017-04-10 11:58 [PATCH v10 0/4] PCI: Introduce a way to enforce all MMIO BARs not to share PAGE_SIZE Yongji Xie
  2017-04-10 11:58 ` [PATCH v10 1/4] PCI: A fix for caculating bridge window's size and alignment Yongji Xie
  2017-04-10 11:58 ` [PATCH v10 2/4] PCI: Add pcibios_default_alignment() for arch-specific alignment control Yongji Xie
@ 2017-04-10 11:58 ` Yongji Xie
  2017-04-14 15:58   ` Bjorn Helgaas
  2017-04-10 11:58 ` [PATCH v10 4/4] PCI: Don't extend device's size when using default alignment for all devices Yongji Xie
  3 siblings, 1 reply; 14+ messages in thread
From: Yongji Xie @ 2017-04-10 11:58 UTC (permalink / raw)
  To: bhelgaas
  Cc: linux-pci, linuxppc-dev, alex.williamson, gwshan, aik, benh, mpe,
	paulus, zhong

This overrides pcibios_default_alignment() to set default alignment
to PAGE_SIZE for all PCI devices on PowerNV platform. Thus sub-page
BARs would not share a page and could be mapped into guest when VFIO
passthrough them.

Signed-off-by: Yongji Xie <elohimes@gmail.com>
---
 arch/powerpc/include/asm/machdep.h        |    2 ++
 arch/powerpc/kernel/pci-common.c          |    8 ++++++++
 arch/powerpc/platforms/powernv/pci-ioda.c |    7 +++++++
 3 files changed, 17 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 5011b69..a82c192 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -173,6 +173,8 @@ struct machdep_calls {
 	/* Called after scan and before resource survey */
 	void (*pcibios_fixup_phb)(struct pci_controller *hose);
 
+	resource_size_t (*pcibios_default_alignment)(struct pci_dev *);
+
 #ifdef CONFIG_PCI_IOV
 	void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
 	resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index ffda24a..ceda574 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -233,6 +233,14 @@ void pcibios_reset_secondary_bus(struct pci_dev *dev)
 	pci_reset_secondary_bus(dev);
 }
 
+resource_size_t pcibios_default_alignment(struct pci_dev *pdev)
+{
+	if (ppc_md.pcibios_default_alignment)
+		return ppc_md.pcibios_default_alignment(pdev);
+
+	return 0;
+}
+
 #ifdef CONFIG_PCI_IOV
 resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
 {
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index e367382..354c852 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3297,6 +3297,11 @@ static void pnv_pci_setup_bridge(struct pci_bus *bus, unsigned long type)
 	}
 }
 
+static resource_size_t pnv_pci_default_alignment(struct pci_dev *pdev)
+{
+	return PAGE_SIZE;
+}
+
 #ifdef CONFIG_PCI_IOV
 static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
 						      int resno)
@@ -3830,6 +3835,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 		hose->controller_ops = pnv_pci_ioda_controller_ops;
 	}
 
+	ppc_md.pcibios_default_alignment = pnv_pci_default_alignment;
+
 #ifdef CONFIG_PCI_IOV
 	ppc_md.pcibios_fixup_sriov = pnv_pci_ioda_fixup_iov_resources;
 	ppc_md.pcibios_iov_resource_alignment = pnv_pci_iov_resource_alignment;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v10 4/4] PCI: Don't extend device's size when using default alignment for all devices
  2017-04-10 11:58 [PATCH v10 0/4] PCI: Introduce a way to enforce all MMIO BARs not to share PAGE_SIZE Yongji Xie
                   ` (2 preceding siblings ...)
  2017-04-10 11:58 ` [PATCH v10 3/4] powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned Yongji Xie
@ 2017-04-10 11:58 ` Yongji Xie
  2017-04-14 22:54   ` Bjorn Helgaas
  3 siblings, 1 reply; 14+ messages in thread
From: Yongji Xie @ 2017-04-10 11:58 UTC (permalink / raw)
  To: bhelgaas
  Cc: linux-pci, linuxppc-dev, alex.williamson, gwshan, aik, benh, mpe,
	paulus, zhong

Currently we reassign the alignment by extending resources' size in
pci_reassigndev_resource_alignment(). This could potentially break
some drivers when the driver uses the size to locate register
whose length is related to the size. Some examples as below:

- misc/Hpilo.c:
off = pci_resource_len(pdev, bar) - 0x2000;

- net/ethernet/chelsio/cxgb4/cxgb4_uld.h:
(pci_resource_len((pdev), 2) - roundup_pow_of_two((vres)->ocq.size))

- infiniband/hw/nes/Nes_hw.c:
num_pds = pci_resource_len(nesdev->pcidev, BAR_1) >> PAGE_SHIFT;

This risk could be easily prevented before because we only had one way
(kernel parameter resource_alignment) to touch those codes. And even
some users may be happy to see the extended size.

But now we introduce pcibios_default_alignment() to set default alignment
for all PCI devices which would also touch those codes. It would be hard
to prevent the risk in this case. So this patch tries to use
START_ALIGNMENT to identify the resource's alignment without extending
the size when the alignment reassigning is caused by the default alignment.

Signed-off-by: Yongji Xie <elohimes@gmail.com>
---
 drivers/pci/pci.c |   34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 02f1255..358366e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4959,11 +4959,13 @@ resource_size_t __weak pcibios_default_alignment(struct pci_dev *dev)
 /**
  * pci_specified_resource_alignment - get resource alignment specified by user.
  * @dev: the PCI device to get
+ * @resize: whether or not to change resources' size when reassigning alignment
  *
  * RETURNS: Resource alignment if it is specified.
  *          Zero if it is not specified.
  */
-static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
+static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
+							bool *resize)
 {
 	int seg, bus, slot, func, align_order, count;
 	unsigned short vendor, device, subsystem_vendor, subsystem_device;
@@ -5005,6 +5007,7 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
 				(!device || (device == dev->device)) &&
 				(!subsystem_vendor || (subsystem_vendor == dev->subsystem_vendor)) &&
 				(!subsystem_device || (subsystem_device == dev->subsystem_device))) {
+				*resize = true;
 				if (align_order == -1)
 					align = PAGE_SIZE;
 				else
@@ -5030,6 +5033,7 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
 				bus == dev->bus->number &&
 				slot == PCI_SLOT(dev->devfn) &&
 				func == PCI_FUNC(dev->devfn)) {
+				*resize = true;
 				if (align_order == -1)
 					align = PAGE_SIZE;
 				else
@@ -5062,6 +5066,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
 	struct resource *r;
 	resource_size_t align, size;
 	u16 command;
+	bool resize = false;
 
 	/*
 	 * VF BARs are read-only zero according to SR-IOV spec r1.1, sec
@@ -5073,7 +5078,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
 		return;
 
 	/* check if specified PCI is target device to reassign */
-	align = pci_specified_resource_alignment(dev);
+	align = pci_specified_resource_alignment(dev, &resize);
 	if (!align)
 		return;
 
@@ -5101,15 +5106,24 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
 		}
 
 		size = resource_size(r);
-		if (size < align) {
-			size = align;
-			dev_info(&dev->dev,
-				"Rounding up size of resource #%d to %#llx.\n",
-				i, (unsigned long long)size);
+		if (resize) {
+			if (size < align) {
+				size = align;
+				dev_info(&dev->dev,
+					"Rounding up size of resource #%d to %#llx.\n",
+					i, (unsigned long long)size);
+			}
+			r->flags |= IORESOURCE_UNSET;
+			r->start = 0;
+		} else {
+			if (size < align) {
+				r->flags &= ~IORESOURCE_SIZEALIGN;
+				r->flags |= IORESOURCE_STARTALIGN |
+							IORESOURCE_UNSET;
+				r->start = align;
+			}
 		}
-		r->flags |= IORESOURCE_UNSET;
-		r->end = size - 1;
-		r->start = 0;
+		r->end = r->start + size - 1;
 	}
 	/* Need to disable bridge's resource window,
 	 * to enable the kernel to reassign new resource
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 3/4] powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned
  2017-04-10 11:58 ` [PATCH v10 3/4] powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned Yongji Xie
@ 2017-04-14 15:58   ` Bjorn Helgaas
  2017-04-14 21:52     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 14+ messages in thread
From: Bjorn Helgaas @ 2017-04-14 15:58 UTC (permalink / raw)
  To: Yongji Xie
  Cc: bhelgaas, linux-pci, linuxppc-dev, alex.williamson, gwshan, aik,
	benh, mpe, paulus, zhong

On Mon, Apr 10, 2017 at 07:58:13PM +0800, Yongji Xie wrote:
> This overrides pcibios_default_alignment() to set default alignment
> to PAGE_SIZE for all PCI devices on PowerNV platform. Thus sub-page
> BARs would not share a page and could be mapped into guest when VFIO
> passthrough them.

Thanks for doing this.  This looks like a reasonable strategy to me,
but it would be good to get a powerpc ack for it.

> Signed-off-by: Yongji Xie <elohimes@gmail.com>
> ---
>  arch/powerpc/include/asm/machdep.h        |    2 ++
>  arch/powerpc/kernel/pci-common.c          |    8 ++++++++
>  arch/powerpc/platforms/powernv/pci-ioda.c |    7 +++++++
>  3 files changed, 17 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
> index 5011b69..a82c192 100644
> --- a/arch/powerpc/include/asm/machdep.h
> +++ b/arch/powerpc/include/asm/machdep.h
> @@ -173,6 +173,8 @@ struct machdep_calls {
>  	/* Called after scan and before resource survey */
>  	void (*pcibios_fixup_phb)(struct pci_controller *hose);
>  
> +	resource_size_t (*pcibios_default_alignment)(struct pci_dev *);
> +
>  #ifdef CONFIG_PCI_IOV
>  	void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
>  	resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int resno);
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index ffda24a..ceda574 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -233,6 +233,14 @@ void pcibios_reset_secondary_bus(struct pci_dev *dev)
>  	pci_reset_secondary_bus(dev);
>  }
>  
> +resource_size_t pcibios_default_alignment(struct pci_dev *pdev)
> +{
> +	if (ppc_md.pcibios_default_alignment)
> +		return ppc_md.pcibios_default_alignment(pdev);
> +
> +	return 0;
> +}
> +
>  #ifdef CONFIG_PCI_IOV
>  resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev, int resno)
>  {
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index e367382..354c852 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -3297,6 +3297,11 @@ static void pnv_pci_setup_bridge(struct pci_bus *bus, unsigned long type)
>  	}
>  }
>  
> +static resource_size_t pnv_pci_default_alignment(struct pci_dev *pdev)
> +{
> +	return PAGE_SIZE;
> +}
> +
>  #ifdef CONFIG_PCI_IOV
>  static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
>  						      int resno)
> @@ -3830,6 +3835,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>  		hose->controller_ops = pnv_pci_ioda_controller_ops;
>  	}
>  
> +	ppc_md.pcibios_default_alignment = pnv_pci_default_alignment;
> +
>  #ifdef CONFIG_PCI_IOV
>  	ppc_md.pcibios_fixup_sriov = pnv_pci_ioda_fixup_iov_resources;
>  	ppc_md.pcibios_iov_resource_alignment = pnv_pci_iov_resource_alignment;
> -- 
> 1.7.9.5
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 3/4] powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned
  2017-04-14 15:58   ` Bjorn Helgaas
@ 2017-04-14 21:52     ` Benjamin Herrenschmidt
  2017-04-15 16:36       ` Bjorn Helgaas
  0 siblings, 1 reply; 14+ messages in thread
From: Benjamin Herrenschmidt @ 2017-04-14 21:52 UTC (permalink / raw)
  To: Bjorn Helgaas, Yongji Xie
  Cc: bhelgaas, linux-pci, linuxppc-dev, alex.williamson, gwshan, aik,
	mpe, paulus, zhong

On Fri, 2017-04-14 at 10:58 -0500, Bjorn Helgaas wrote:
> On Mon, Apr 10, 2017 at 07:58:13PM +0800, Yongji Xie wrote:
> > This overrides pcibios_default_alignment() to set default alignment
> > to PAGE_SIZE for all PCI devices on PowerNV platform. Thus sub-page
> > BARs would not share a page and could be mapped into guest when
> > VFIO
> > passthrough them.
> 
> Thanks for doing this.  This looks like a reasonable strategy to me,
> but it would be good to get a powerpc ack for it.

I agree in principle. I'm surprised that PowerPC is the only one
interested here though, what about other platforms who want to use
KVM and PCI pass-through and use Linux to assign BARs ?

Cheers,
Ben.

> > Signed-off-by: Yongji Xie <elohimes@gmail.com>
> > ---
> >  arch/powerpc/include/asm/machdep.h        |    2 ++
> >  arch/powerpc/kernel/pci-common.c          |    8 ++++++++
> >  arch/powerpc/platforms/powernv/pci-ioda.c |    7 +++++++
> >  3 files changed, 17 insertions(+)
> > 
> > diff --git a/arch/powerpc/include/asm/machdep.h
> > b/arch/powerpc/include/asm/machdep.h
> > index 5011b69..a82c192 100644
> > --- a/arch/powerpc/include/asm/machdep.h
> > +++ b/arch/powerpc/include/asm/machdep.h
> > @@ -173,6 +173,8 @@ struct machdep_calls {
> >  	/* Called after scan and before resource survey */
> >  	void (*pcibios_fixup_phb)(struct pci_controller *hose);
> >  
> > +	resource_size_t (*pcibios_default_alignment)(struct
> > pci_dev *);
> > +
> >  #ifdef CONFIG_PCI_IOV
> >  	void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
> >  	resource_size_t (*pcibios_iov_resource_alignment)(struct
> > pci_dev *, int resno);
> > diff --git a/arch/powerpc/kernel/pci-common.c
> > b/arch/powerpc/kernel/pci-common.c
> > index ffda24a..ceda574 100644
> > --- a/arch/powerpc/kernel/pci-common.c
> > +++ b/arch/powerpc/kernel/pci-common.c
> > @@ -233,6 +233,14 @@ void pcibios_reset_secondary_bus(struct
> > pci_dev *dev)
> >  	pci_reset_secondary_bus(dev);
> >  }
> >  
> > +resource_size_t pcibios_default_alignment(struct pci_dev *pdev)
> > +{
> > +	if (ppc_md.pcibios_default_alignment)
> > +		return ppc_md.pcibios_default_alignment(pdev);
> > +
> > +	return 0;
> > +}
> > +
> >  #ifdef CONFIG_PCI_IOV
> >  resource_size_t pcibios_iov_resource_alignment(struct pci_dev
> > *pdev, int resno)
> >  {
> > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c
> > b/arch/powerpc/platforms/powernv/pci-ioda.c
> > index e367382..354c852 100644
> > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > @@ -3297,6 +3297,11 @@ static void pnv_pci_setup_bridge(struct
> > pci_bus *bus, unsigned long type)
> >  	}
> >  }
> >  
> > +static resource_size_t pnv_pci_default_alignment(struct pci_dev
> > *pdev)
> > +{
> > +	return PAGE_SIZE;
> > +}
> > +
> >  #ifdef CONFIG_PCI_IOV
> >  static resource_size_t pnv_pci_iov_resource_alignment(struct
> > pci_dev *pdev,
> >  						      int resno)
> > @@ -3830,6 +3835,8 @@ static void __init
> > pnv_pci_init_ioda_phb(struct device_node *np,
> >  		hose->controller_ops =
> > pnv_pci_ioda_controller_ops;
> >  	}
> >  
> > +	ppc_md.pcibios_default_alignment =
> > pnv_pci_default_alignment;
> > +
> >  #ifdef CONFIG_PCI_IOV
> >  	ppc_md.pcibios_fixup_sriov =
> > pnv_pci_ioda_fixup_iov_resources;
> >  	ppc_md.pcibios_iov_resource_alignment =
> > pnv_pci_iov_resource_alignment;
> > -- 
> > 1.7.9.5
> > 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 4/4] PCI: Don't extend device's size when using default alignment for all devices
  2017-04-10 11:58 ` [PATCH v10 4/4] PCI: Don't extend device's size when using default alignment for all devices Yongji Xie
@ 2017-04-14 22:54   ` Bjorn Helgaas
  2017-04-17  6:27     ` Yongji Xie
  2017-04-17  6:59     ` Yongji Xie
  0 siblings, 2 replies; 14+ messages in thread
From: Bjorn Helgaas @ 2017-04-14 22:54 UTC (permalink / raw)
  To: Yongji Xie
  Cc: bhelgaas, linux-pci, linuxppc-dev, alex.williamson, gwshan, aik,
	benh, mpe, paulus, zhong

On Mon, Apr 10, 2017 at 07:58:14PM +0800, Yongji Xie wrote:
> Currently we reassign the alignment by extending resources' size in
> pci_reassigndev_resource_alignment(). This could potentially break
> some drivers when the driver uses the size to locate register
> whose length is related to the size. Some examples as below:
> 
> - misc/Hpilo.c:

If you're going to quote filenames, they need to match the actual
files in the tree.  In this case, "hpilo.c", not "Hpilo.c".  Also
"Nes_hw.c" below is incorrect.

> off = pci_resource_len(pdev, bar) - 0x2000;
> 
> - net/ethernet/chelsio/cxgb4/cxgb4_uld.h:
> (pci_resource_len((pdev), 2) - roundup_pow_of_two((vres)->ocq.size))
> 
> - infiniband/hw/nes/Nes_hw.c:
> num_pds = pci_resource_len(nesdev->pcidev, BAR_1) >> PAGE_SHIFT;

This BAR is apparently at least a page in size already, so this only
breaks if we request alignment of more than a page size.

> This risk could be easily prevented before because we only had one way
> (kernel parameter resource_alignment) to touch those codes. And even
> some users may be happy to see the extended size.

Currently, pci_reassigndev_resource_alignment() extends the size of
some resources when we use "pci=resource_alignment=..."  That
certainly breaks places like the ones above.

What do you mean by "some users may be happy to see the extended
size"?  We're increasing a resource size to something larger than what
the BAR actually responds to.  I don't see how that can ever be an
*improvement*.  I can see how most drivers won't care, and a few (like
the ones you cite above) will break.  But I don't see how any driver
could be *helped* by us making the resource larger.

> But now we introduce pcibios_default_alignment() to set default alignment
> for all PCI devices which would also touch those codes. It would be hard
> to prevent the risk in this case. So this patch tries to use
> START_ALIGNMENT to identify the resource's alignment without extending
> the size when the alignment reassigning is caused by the default alignment.
> 
> Signed-off-by: Yongji Xie <elohimes@gmail.com>
> ---
>  drivers/pci/pci.c |   34 ++++++++++++++++++++++++----------
>  1 file changed, 24 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 02f1255..358366e 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4959,11 +4959,13 @@ resource_size_t __weak pcibios_default_alignment(struct pci_dev *dev)
>  /**
>   * pci_specified_resource_alignment - get resource alignment specified by user.
>   * @dev: the PCI device to get
> + * @resize: whether or not to change resources' size when reassigning alignment
>   *
>   * RETURNS: Resource alignment if it is specified.
>   *          Zero if it is not specified.
>   */
> -static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
> +static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
> +							bool *resize)
>  {
>  	int seg, bus, slot, func, align_order, count;
>  	unsigned short vendor, device, subsystem_vendor, subsystem_device;
> @@ -5005,6 +5007,7 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
>  				(!device || (device == dev->device)) &&
>  				(!subsystem_vendor || (subsystem_vendor == dev->subsystem_vendor)) &&
>  				(!subsystem_device || (subsystem_device == dev->subsystem_device))) {
> +				*resize = true;
>  				if (align_order == -1)
>  					align = PAGE_SIZE;
>  				else
> @@ -5030,6 +5033,7 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
>  				bus == dev->bus->number &&
>  				slot == PCI_SLOT(dev->devfn) &&
>  				func == PCI_FUNC(dev->devfn)) {
> +				*resize = true;
>  				if (align_order == -1)
>  					align = PAGE_SIZE;
>  				else
> @@ -5062,6 +5066,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
>  	struct resource *r;
>  	resource_size_t align, size;
>  	u16 command;
> +	bool resize = false;
>  
>  	/*
>  	 * VF BARs are read-only zero according to SR-IOV spec r1.1, sec
> @@ -5073,7 +5078,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
>  		return;
>  
>  	/* check if specified PCI is target device to reassign */
> -	align = pci_specified_resource_alignment(dev);
> +	align = pci_specified_resource_alignment(dev, &resize);
>  	if (!align)
>  		return;
>  
> @@ -5101,15 +5106,24 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
>  		}
>  
>  		size = resource_size(r);
> -		if (size < align) {
> -			size = align;
> -			dev_info(&dev->dev,
> -				"Rounding up size of resource #%d to %#llx.\n",
> -				i, (unsigned long long)size);
> +		if (resize) {
> +			if (size < align) {
> +				size = align;
> +				dev_info(&dev->dev,
> +					"Rounding up size of resource #%d to %#llx.\n",
> +					i, (unsigned long long)size);
> +			}
> +			r->flags |= IORESOURCE_UNSET;
> +			r->start = 0;
> +		} else {
> +			if (size < align) {
> +				r->flags &= ~IORESOURCE_SIZEALIGN;
> +				r->flags |= IORESOURCE_STARTALIGN |
> +							IORESOURCE_UNSET;
> +				r->start = align;
> +			}

I'm still not happy about the fact that we now have these two cases:

  1) If we increase resource alignment because of the
     "pci=resource_alignment=" parameter, we increase the resource
     size, and

  2) If we increase resource alignment because of
     pcibios_default_alignment(), we use IORESOURCE_STARTALIGN and do
     not increase the resource size.

This makes the code complicated and hard to maintain in the future.
We talked about this earlier [1], but I'm not sure I've figured out
the reason.  Here's my guess:

Let's assume we have a 4K BAR and we're requesting alignment to 64K.
In case 1, we're only changing the alignment for specified devices.
If we used IORESOURCE_STARTALIGN, we would align that 4K BAR of the
specified devices to 64K, but BARs of other devices could still be
assigned in that same 64K page.  Therefore, we must increase the size
to prevent other BARs in the page, even though this might break some
drivers.

But in case 2, we're changing the alignment for *all* devices.  In
this case, we *can* use IORESOURCE_STARTALIGN because we're going to
align *every* BAR of every device to 64K, so no other BARs will be put
in the same page.  And since we didn't change the resource size, we
avoid the problem of breaking drivers.

If that's the reasoning, I'm not 100% sure that the complexity of this
code is worth it.  It took me half a day to come up with this theory.
If we do have to go this route, we *must* include comments along this
line to make it easier next time.

[1] https://lkml.kernel.org/r/20160929115422.GA31048@localhost

>  		}
> -		r->flags |= IORESOURCE_UNSET;
> -		r->end = size - 1;
> -		r->start = 0;
> +		r->end = r->start + size - 1;
>  	}
>  	/* Need to disable bridge's resource window,
>  	 * to enable the kernel to reassign new resource
> -- 
> 1.7.9.5
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 3/4] powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned
  2017-04-14 21:52     ` Benjamin Herrenschmidt
@ 2017-04-15 16:36       ` Bjorn Helgaas
  2017-04-15 22:06         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 14+ messages in thread
From: Bjorn Helgaas @ 2017-04-15 16:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Bjorn Helgaas, Yongji Xie, linux-pci, linuxppc-dev,
	alex.williamson, Gavin Shan, Alexey Kardashevskiy,
	Michael Ellerman, paulus, Li Zhong

On Fri, Apr 14, 2017 at 4:52 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Fri, 2017-04-14 at 10:58 -0500, Bjorn Helgaas wrote:
>> On Mon, Apr 10, 2017 at 07:58:13PM +0800, Yongji Xie wrote:
>> > This overrides pcibios_default_alignment() to set default alignment
>> > to PAGE_SIZE for all PCI devices on PowerNV platform. Thus sub-page
>> > BARs would not share a page and could be mapped into guest when
>> > VFIO
>> > passthrough them.
>>
>> Thanks for doing this.  This looks like a reasonable strategy to me,
>> but it would be good to get a powerpc ack for it.
>
> I agree in principle. I'm surprised that PowerPC is the only one
> interested here though, what about other platforms who want to use
> KVM and PCI pass-through and use Linux to assign BARs ?

If I understand correctly, the problem is with BARs smaller than a
page, and this happens more on PowerPC because larger page sizes are
more common there.

Bjorn

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 3/4] powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned
  2017-04-15 16:36       ` Bjorn Helgaas
@ 2017-04-15 22:06         ` Benjamin Herrenschmidt
  2017-04-17 15:51           ` Bjorn Helgaas
  0 siblings, 1 reply; 14+ messages in thread
From: Benjamin Herrenschmidt @ 2017-04-15 22:06 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Yongji Xie, linux-pci, linuxppc-dev,
	alex.williamson, Gavin Shan, Alexey Kardashevskiy,
	Michael Ellerman, paulus, Li Zhong

On Sat, 2017-04-15 at 11:36 -0500, Bjorn Helgaas wrote:
> > I agree in principle. I'm surprised that PowerPC is the only one
> > interested here though, what about other platforms who want to use
> > KVM and PCI pass-through and use Linux to assign BARs ?
> 
> If I understand correctly, the problem is with BARs smaller than a
> page, and this happens more on PowerPC because larger page sizes are
> more common there.

Yes, it happens "more". That doesn't mean it doesn't happen at all on
others :-) Anyway, I'm not objecting, just surprised.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 4/4] PCI: Don't extend device's size when using default alignment for all devices
  2017-04-14 22:54   ` Bjorn Helgaas
@ 2017-04-17  6:27     ` Yongji Xie
  2017-04-17 20:33       ` Bjorn Helgaas
  2017-04-17  6:59     ` Yongji Xie
  1 sibling, 1 reply; 14+ messages in thread
From: Yongji Xie @ 2017-04-17  6:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, linux-pci, linuxppc-dev, alex.williamson, gwshan,
	aik, benh, mpe, paulus, zhong

[-- Attachment #1: Type: text/plain, Size: 9139 bytes --]

2017-04-15 6:54 GMT+08:00 Bjorn Helgaas <helgaas@kernel.org>:

> On Mon, Apr 10, 2017 at 07:58:14PM +0800, Yongji Xie wrote:
> > Currently we reassign the alignment by extending resources' size in
> > pci_reassigndev_resource_alignment(). This could potentially break
> > some drivers when the driver uses the size to locate register
> > whose length is related to the size. Some examples as below:
> >
> > - misc/Hpilo.c:
>
> If you're going to quote filenames, they need to match the actual
> files in the tree.  In this case, "hpilo.c", not "Hpilo.c".  Also
> "Nes_hw.c" below is incorrect.
>
>
Will fix it.


> > off = pci_resource_len(pdev, bar) - 0x2000;
> >
> > - net/ethernet/chelsio/cxgb4/cxgb4_uld.h:
> > (pci_resource_len((pdev), 2) - roundup_pow_of_two((vres)->ocq.size))
> >
> > - infiniband/hw/nes/Nes_hw.c:
> > num_pds = pci_resource_len(nesdev->pcidev, BAR_1) >> PAGE_SHIFT;
>
> This BAR is apparently at least a page in size already, so this only
> breaks if we request alignment of more than a page size.
>
>
Oh, yes. I'll remove this one.

> This risk could be easily prevented before because we only had one way
> > (kernel parameter resource_alignment) to touch those codes. And even
> > some users may be happy to see the extended size.
>
> Currently, pci_reassigndev_resource_alignment() extends the size of
> some resources when we use "pci=resource_alignment=..."  That
> certainly breaks places like the ones above.
>
> What do you mean by "some users may be happy to see the extended
> size"?  We're increasing a resource size to something larger than what
> the BAR actually responds to.  I don't see how that can ever be an
> *improvement*.  I can see how most drivers won't care, and a few (like
> the ones you cite above) will break.  But I don't see how any driver
> could be *helped* by us making the resource larger.
>
>
>From commit 32a9a68 (PCI: allow assignment of memory resources with a
specified alignment), it seems that "pci=resource_alignment" was introduced
because we want to use PCI pass-through for some page-unaligned devices.

So there might be some cases that users use "pci=resource_alignment" to
extend the BAR's size then passthrough them. That's why I say "some users
may be happy to see the extended size". We are not able to passthrough
the device unless we extend its BARs to PAGE_SIZE.


> > But now we introduce pcibios_default_alignment() to set default alignment
> > for all PCI devices which would also touch those codes. It would be hard
> > to prevent the risk in this case. So this patch tries to use
> > START_ALIGNMENT to identify the resource's alignment without extending
> > the size when the alignment reassigning is caused by the default
> alignment.
> >
> > Signed-off-by: Yongji Xie <elohimes@gmail.com>
> > ---
> >  drivers/pci/pci.c |   34 ++++++++++++++++++++++++----------
> >  1 file changed, 24 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 02f1255..358366e 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -4959,11 +4959,13 @@ resource_size_t __weak
> pcibios_default_alignment(struct pci_dev *dev)
> >  /**
> >   * pci_specified_resource_alignment - get resource alignment specified
> by user.
> >   * @dev: the PCI device to get
> > + * @resize: whether or not to change resources' size when reassigning
> alignment
> >   *
> >   * RETURNS: Resource alignment if it is specified.
> >   *          Zero if it is not specified.
> >   */
> > -static resource_size_t pci_specified_resource_alignment(struct pci_dev
> *dev)
> > +static resource_size_t pci_specified_resource_alignment(struct pci_dev
> *dev,
> > +                                                     bool *resize)
> >  {
> >       int seg, bus, slot, func, align_order, count;
> >       unsigned short vendor, device, subsystem_vendor, subsystem_device;
> > @@ -5005,6 +5007,7 @@ static resource_size_t
> pci_specified_resource_alignment(struct pci_dev *dev)
> >                               (!device || (device == dev->device)) &&
> >                               (!subsystem_vendor || (subsystem_vendor ==
> dev->subsystem_vendor)) &&
> >                               (!subsystem_device || (subsystem_device ==
> dev->subsystem_device))) {
> > +                             *resize = true;
> >                               if (align_order == -1)
> >                                       align = PAGE_SIZE;
> >                               else
> > @@ -5030,6 +5033,7 @@ static resource_size_t
> pci_specified_resource_alignment(struct pci_dev *dev)
> >                               bus == dev->bus->number &&
> >                               slot == PCI_SLOT(dev->devfn) &&
> >                               func == PCI_FUNC(dev->devfn)) {
> > +                             *resize = true;
> >                               if (align_order == -1)
> >                                       align = PAGE_SIZE;
> >                               else
> > @@ -5062,6 +5066,7 @@ void pci_reassigndev_resource_alignment(struct
> pci_dev *dev)
> >       struct resource *r;
> >       resource_size_t align, size;
> >       u16 command;
> > +     bool resize = false;
> >
> >       /*
> >        * VF BARs are read-only zero according to SR-IOV spec r1.1, sec
> > @@ -5073,7 +5078,7 @@ void pci_reassigndev_resource_alignment(struct
> pci_dev *dev)
> >               return;
> >
> >       /* check if specified PCI is target device to reassign */
> > -     align = pci_specified_resource_alignment(dev);
> > +     align = pci_specified_resource_alignment(dev, &resize);
> >       if (!align)
> >               return;
> >
> > @@ -5101,15 +5106,24 @@ void pci_reassigndev_resource_alignment(struct
> pci_dev *dev)
> >               }
> >
> >               size = resource_size(r);
> > -             if (size < align) {
> > -                     size = align;
> > -                     dev_info(&dev->dev,
> > -                             "Rounding up size of resource #%d to
> %#llx.\n",
> > -                             i, (unsigned long long)size);
> > +             if (resize) {
> > +                     if (size < align) {
> > +                             size = align;
> > +                             dev_info(&dev->dev,
> > +                                     "Rounding up size of resource #%d
> to %#llx.\n",
> > +                                     i, (unsigned long long)size);
> > +                     }
> > +                     r->flags |= IORESOURCE_UNSET;
> > +                     r->start = 0;
> > +             } else {
> > +                     if (size < align) {
> > +                             r->flags &= ~IORESOURCE_SIZEALIGN;
> > +                             r->flags |= IORESOURCE_STARTALIGN |
> > +                                                     IORESOURCE_UNSET;
> > +                             r->start = align;
> > +                     }
>
> I'm still not happy about the fact that we now have these two cases:
>
>   1) If we increase resource alignment because of the
>      "pci=resource_alignment=" parameter, we increase the resource
>      size, and
>
>   2) If we increase resource alignment because of
>      pcibios_default_alignment(), we use IORESOURCE_STARTALIGN and do
>      not increase the resource size.
>
> This makes the code complicated and hard to maintain in the future.
> We talked about this earlier [1], but I'm not sure I've figured out
> the reason.  Here's my guess:
>
> Let's assume we have a 4K BAR and we're requesting alignment to 64K.
> In case 1, we're only changing the alignment for specified devices.
> If we used IORESOURCE_STARTALIGN, we would align that 4K BAR of the
> specified devices to 64K, but BARs of other devices could still be
> assigned in that same 64K page.  Therefore, we must increase the size
> to prevent other BARs in the page, even though this might break some
> drivers.
>
> But in case 2, we're changing the alignment for *all* devices.  In
> this case, we *can* use IORESOURCE_STARTALIGN because we're going to
> align *every* BAR of every device to 64K, so no other BARs will be put
> in the same page.  And since we didn't change the resource size, we
> avoid the problem of breaking drivers.
>
> If that's the reasoning, I'm not 100% sure that the complexity of this
> code is worth it.  It took me half a day to come up with this theory.
> If we do have to go this route, we *must* include comments along this
> line to make it easier next time.
>
>
Your guess is exactly correct. We have two problems when we need to change
one BAR's alignment:

1) If we extend the BAR's size, we may break the driver

2) If we only change the alignment without extending size, we have no way to
prevent other BARs being assigned into the same page.

I failed to come up with a way to use only one way to satisfy the two cases.
So I handle the two cases separately like those codes. But as you said, this
makes the code hard to maintain. I think what I can do next is to add some
comments to make the codes easier to read and maintain. Of course, if
anybody have a better idea, I'll be happy to see it.

Thanks,
Yongji

[-- Attachment #2: Type: text/html, Size: 13453 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 4/4] PCI: Don't extend device's size when using default alignment for all devices
  2017-04-14 22:54   ` Bjorn Helgaas
  2017-04-17  6:27     ` Yongji Xie
@ 2017-04-17  6:59     ` Yongji Xie
  1 sibling, 0 replies; 14+ messages in thread
From: Yongji Xie @ 2017-04-17  6:59 UTC (permalink / raw)
  To: linux-pci

On 15 April 2017 at 06:54, Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Mon, Apr 10, 2017 at 07:58:14PM +0800, Yongji Xie wrote:
> > Currently we reassign the alignment by extending resources' size in
> > pci_reassigndev_resource_alignment(). This could potentially break
> > some drivers when the driver uses the size to locate register
> > whose length is related to the size. Some examples as below:
> >
> > - misc/Hpilo.c:
>
> If you're going to quote filenames, they need to match the actual
> files in the tree.  In this case, "hpilo.c", not "Hpilo.c".  Also
> "Nes_hw.c" below is incorrect.
>

Will fix it.

> > off = pci_resource_len(pdev, bar) - 0x2000;
> >
> > - net/ethernet/chelsio/cxgb4/cxgb4_uld.h:
> > (pci_resource_len((pdev), 2) - roundup_pow_of_two((vres)->ocq.size))
> >
> > - infiniband/hw/nes/Nes_hw.c:
> > num_pds = pci_resource_len(nesdev->pcidev, BAR_1) >> PAGE_SHIFT;
>
> This BAR is apparently at least a page in size already, so this only
> breaks if we request alignment of more than a page size.
>

Oh, yes. I'll remove this one.

> > This risk could be easily prevented before because we only had one way
> > (kernel parameter resource_alignment) to touch those codes. And even
> > some users may be happy to see the extended size.
>
> Currently, pci_reassigndev_resource_alignment() extends the size of
> some resources when we use "pci=resource_alignment=..."  That
> certainly breaks places like the ones above.
>
> What do you mean by "some users may be happy to see the extended
> size"?  We're increasing a resource size to something larger than what
> the BAR actually responds to.  I don't see how that can ever be an
> *improvement*.  I can see how most drivers won't care, and a few (like
> the ones you cite above) will break.  But I don't see how any driver
> could be *helped* by us making the resource larger.
>

>From commit 32a9a68 (PCI: allow assignment of memory resources with a
specified alignment), it seems that "pci=resource_alignment" was introduced
because we want to use PCI pass-through for some page-unaligned devices.

So there might be some cases that users use "pci=resource_alignment" to
extend the BAR's size then passthrough them. That's why I say "some users
may be happy to see the extended size". We are not able to passthrough
the device unless we extend its BARs to PAGE_SIZE.

> > But now we introduce pcibios_default_alignment() to set default alignment
> > for all PCI devices which would also touch those codes. It would be hard
> > to prevent the risk in this case. So this patch tries to use
> > START_ALIGNMENT to identify the resource's alignment without extending
> > the size when the alignment reassigning is caused by the default alignment.
> >
> > Signed-off-by: Yongji Xie <elohimes@gmail.com>
> > ---
> >  drivers/pci/pci.c |   34 ++++++++++++++++++++++++----------
> >  1 file changed, 24 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 02f1255..358366e 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -4959,11 +4959,13 @@ resource_size_t __weak pcibios_default_alignment(struct pci_dev *dev)
> >  /**
> >   * pci_specified_resource_alignment - get resource alignment specified by user.
> >   * @dev: the PCI device to get
> > + * @resize: whether or not to change resources' size when reassigning alignment
> >   *
> >   * RETURNS: Resource alignment if it is specified.
> >   *          Zero if it is not specified.
> >   */
> > -static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
> > +static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
> > +                                                     bool *resize)
> >  {
> >       int seg, bus, slot, func, align_order, count;
> >       unsigned short vendor, device, subsystem_vendor, subsystem_device;
> > @@ -5005,6 +5007,7 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
> >                               (!device || (device == dev->device)) &&
> >                               (!subsystem_vendor || (subsystem_vendor == dev->subsystem_vendor)) &&
> >                               (!subsystem_device || (subsystem_device == dev->subsystem_device))) {
> > +                             *resize = true;
> >                               if (align_order == -1)
> >                                       align = PAGE_SIZE;
> >                               else
> > @@ -5030,6 +5033,7 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
> >                               bus == dev->bus->number &&
> >                               slot == PCI_SLOT(dev->devfn) &&
> >                               func == PCI_FUNC(dev->devfn)) {
> > +                             *resize = true;
> >                               if (align_order == -1)
> >                                       align = PAGE_SIZE;
> >                               else
> > @@ -5062,6 +5066,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
> >       struct resource *r;
> >       resource_size_t align, size;
> >       u16 command;
> > +     bool resize = false;
> >
> >       /*
> >        * VF BARs are read-only zero according to SR-IOV spec r1.1, sec
> > @@ -5073,7 +5078,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
> >               return;
> >
> >       /* check if specified PCI is target device to reassign */
> > -     align = pci_specified_resource_alignment(dev);
> > +     align = pci_specified_resource_alignment(dev, &resize);
> >       if (!align)
> >               return;
> >
> > @@ -5101,15 +5106,24 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
> >               }
> >
> >               size = resource_size(r);
> > -             if (size < align) {
> > -                     size = align;
> > -                     dev_info(&dev->dev,
> > -                             "Rounding up size of resource #%d to %#llx.\n",
> > -                             i, (unsigned long long)size);
> > +             if (resize) {
> > +                     if (size < align) {
> > +                             size = align;
> > +                             dev_info(&dev->dev,
> > +                                     "Rounding up size of resource #%d to %#llx.\n",
> > +                                     i, (unsigned long long)size);
> > +                     }
> > +                     r->flags |= IORESOURCE_UNSET;
> > +                     r->start = 0;
> > +             } else {
> > +                     if (size < align) {
> > +                             r->flags &= ~IORESOURCE_SIZEALIGN;
> > +                             r->flags |= IORESOURCE_STARTALIGN |
> > +                                                     IORESOURCE_UNSET;
> > +                             r->start = align;
> > +                     }
>
> I'm still not happy about the fact that we now have these two cases:
>
>   1) If we increase resource alignment because of the
>      "pci=resource_alignment=" parameter, we increase the resource
>      size, and
>
>   2) If we increase resource alignment because of
>      pcibios_default_alignment(), we use IORESOURCE_STARTALIGN and do
>      not increase the resource size.
>
> This makes the code complicated and hard to maintain in the future.
> We talked about this earlier [1], but I'm not sure I've figured out
> the reason.  Here's my guess:
>
> Let's assume we have a 4K BAR and we're requesting alignment to 64K.
> In case 1, we're only changing the alignment for specified devices.
> If we used IORESOURCE_STARTALIGN, we would align that 4K BAR of the
> specified devices to 64K, but BARs of other devices could still be
> assigned in that same 64K page.  Therefore, we must increase the size
> to prevent other BARs in the page, even though this might break some
> drivers.
>
> But in case 2, we're changing the alignment for *all* devices.  In
> this case, we *can* use IORESOURCE_STARTALIGN because we're going to
> align *every* BAR of every device to 64K, so no other BARs will be put
> in the same page.  And since we didn't change the resource size, we
> avoid the problem of breaking drivers.
>
> If that's the reasoning, I'm not 100% sure that the complexity of this
> code is worth it.  It took me half a day to come up with this theory.
> If we do have to go this route, we *must* include comments along this
> line to make it easier next time.
>

Your guess is exactly correct. We have two problems when we need to change
one BAR's alignment:

1) If we extend the BAR's size, we may break the driver

2) If we only change the alignment without extending size, we have no way to
prevent other BARs being assigned into the same page.

I failed to come up with a way to use only one way to satisfy the two cases.
So I handle the two cases separately like those codes. But as you said, this
makes the code hard to maintain. I think what I can do next is to add some
comments to make the codes easier to read and maintain. Of course, if
anybody have a better idea, I'll be happy to see it.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 3/4] powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned
  2017-04-15 22:06         ` Benjamin Herrenschmidt
@ 2017-04-17 15:51           ` Bjorn Helgaas
  0 siblings, 0 replies; 14+ messages in thread
From: Bjorn Helgaas @ 2017-04-17 15:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Bjorn Helgaas, Yongji Xie, linux-pci, linuxppc-dev,
	alex.williamson, Gavin Shan, Alexey Kardashevskiy,
	Michael Ellerman, paulus, Li Zhong

On Sat, Apr 15, 2017 at 5:06 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Sat, 2017-04-15 at 11:36 -0500, Bjorn Helgaas wrote:
>> > I agree in principle. I'm surprised that PowerPC is the only one
>> > interested here though, what about other platforms who want to use
>> > KVM and PCI pass-through and use Linux to assign BARs ?
>>
>> If I understand correctly, the problem is with BARs smaller than a
>> page, and this happens more on PowerPC because larger page sizes are
>> more common there.
>
> Yes, it happens "more". That doesn't mean it doesn't happen at all on
> others :-) Anyway, I'm not objecting, just surprised.

Yeah, I agree.  Maybe the devices interesting for pass-through tend to
have BARs of least 4KB?  Those would never be a problem on x86.  But I
have absolutely no data either way.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 4/4] PCI: Don't extend device's size when using default alignment for all devices
  2017-04-17  6:27     ` Yongji Xie
@ 2017-04-17 20:33       ` Bjorn Helgaas
  0 siblings, 0 replies; 14+ messages in thread
From: Bjorn Helgaas @ 2017-04-17 20:33 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Bjorn Helgaas, linux-pci, linuxppc-dev, alex.williamson,
	Gavin Shan, Alexey Kardashevskiy, Benjamin Herrenschmidt,
	Michael Ellerman, paulus, Li Zhong

On Mon, Apr 17, 2017 at 1:27 AM, Yongji Xie <elohimes@gmail.com> wrote:
> 2017-04-15 6:54 GMT+08:00 Bjorn Helgaas <helgaas@kernel.org>:
>>
>> On Mon, Apr 10, 2017 at 07:58:14PM +0800, Yongji Xie wrote:
>> > Currently we reassign the alignment by extending resources' size in
>> > pci_reassigndev_resource_alignment(). This could potentially break
>> > some drivers when the driver uses the size to locate register
>> > whose length is related to the size. Some examples as below:
>> >
>> > - misc/Hpilo.c:
>>
>> If you're going to quote filenames, they need to match the actual
>> files in the tree.  In this case, "hpilo.c", not "Hpilo.c".  Also
>> "Nes_hw.c" below is incorrect.
>>
>
> Will fix it.
>
>>
>> > off = pci_resource_len(pdev, bar) - 0x2000;
>> >
>> > - net/ethernet/chelsio/cxgb4/cxgb4_uld.h:
>> > (pci_resource_len((pdev), 2) - roundup_pow_of_two((vres)->ocq.size))
>> >
>> > - infiniband/hw/nes/Nes_hw.c:
>> > num_pds = pci_resource_len(nesdev->pcidev, BAR_1) >> PAGE_SHIFT;
>>
>> This BAR is apparently at least a page in size already, so this only
>> breaks if we request alignment of more than a page size.
>>
>
> Oh, yes. I'll remove this one.
>
>> > This risk could be easily prevented before because we only had one way
>> > (kernel parameter resource_alignment) to touch those codes. And even
>> > some users may be happy to see the extended size.
>>
>> Currently, pci_reassigndev_resource_alignment() extends the size of
>> some resources when we use "pci=resource_alignment=..."  That
>> certainly breaks places like the ones above.
>>
>> What do you mean by "some users may be happy to see the extended
>> size"?  We're increasing a resource size to something larger than what
>> the BAR actually responds to.  I don't see how that can ever be an
>> *improvement*.  I can see how most drivers won't care, and a few (like
>> the ones you cite above) will break.  But I don't see how any driver
>> could be *helped* by us making the resource larger.
>>
>
> From commit 32a9a68 (PCI: allow assignment of memory resources with a
> specified alignment), it seems that "pci=resource_alignment" was introduced
> because we want to use PCI pass-through for some page-unaligned devices.
>
> So there might be some cases that users use "pci=resource_alignment" to
> extend the BAR's size then passthrough them. That's why I say "some users
> may be happy to see the extended size". We are not able to passthrough
> the device unless we extend its BARs to PAGE_SIZE.
>
>>
>> > But now we introduce pcibios_default_alignment() to set default
>> > alignment
>> > for all PCI devices which would also touch those codes. It would be hard
>> > to prevent the risk in this case. So this patch tries to use
>> > START_ALIGNMENT to identify the resource's alignment without extending
>> > the size when the alignment reassigning is caused by the default
>> > alignment.
>> >
>> > Signed-off-by: Yongji Xie <elohimes@gmail.com>
>> > ---
>> >  drivers/pci/pci.c |   34 ++++++++++++++++++++++++----------
>> >  1 file changed, 24 insertions(+), 10 deletions(-)
>> >
>> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> > index 02f1255..358366e 100644
>> > --- a/drivers/pci/pci.c
>> > +++ b/drivers/pci/pci.c
>> > @@ -4959,11 +4959,13 @@ resource_size_t __weak
>> > pcibios_default_alignment(struct pci_dev *dev)
>> >  /**
>> >   * pci_specified_resource_alignment - get resource alignment specified
>> > by user.
>> >   * @dev: the PCI device to get
>> > + * @resize: whether or not to change resources' size when reassigning
>> > alignment
>> >   *
>> >   * RETURNS: Resource alignment if it is specified.
>> >   *          Zero if it is not specified.
>> >   */
>> > -static resource_size_t pci_specified_resource_alignment(struct pci_dev
>> > *dev)
>> > +static resource_size_t pci_specified_resource_alignment(struct pci_dev
>> > *dev,
>> > +                                                     bool *resize)
>> >  {
>> >       int seg, bus, slot, func, align_order, count;
>> >       unsigned short vendor, device, subsystem_vendor, subsystem_device;
>> > @@ -5005,6 +5007,7 @@ static resource_size_t
>> > pci_specified_resource_alignment(struct pci_dev *dev)
>> >                               (!device || (device == dev->device)) &&
>> >                               (!subsystem_vendor || (subsystem_vendor ==
>> > dev->subsystem_vendor)) &&
>> >                               (!subsystem_device || (subsystem_device ==
>> > dev->subsystem_device))) {
>> > +                             *resize = true;
>> >                               if (align_order == -1)
>> >                                       align = PAGE_SIZE;
>> >                               else
>> > @@ -5030,6 +5033,7 @@ static resource_size_t
>> > pci_specified_resource_alignment(struct pci_dev *dev)
>> >                               bus == dev->bus->number &&
>> >                               slot == PCI_SLOT(dev->devfn) &&
>> >                               func == PCI_FUNC(dev->devfn)) {
>> > +                             *resize = true;
>> >                               if (align_order == -1)
>> >                                       align = PAGE_SIZE;
>> >                               else
>> > @@ -5062,6 +5066,7 @@ void pci_reassigndev_resource_alignment(struct
>> > pci_dev *dev)
>> >       struct resource *r;
>> >       resource_size_t align, size;
>> >       u16 command;
>> > +     bool resize = false;
>> >
>> >       /*
>> >        * VF BARs are read-only zero according to SR-IOV spec r1.1, sec
>> > @@ -5073,7 +5078,7 @@ void pci_reassigndev_resource_alignment(struct
>> > pci_dev *dev)
>> >               return;
>> >
>> >       /* check if specified PCI is target device to reassign */
>> > -     align = pci_specified_resource_alignment(dev);
>> > +     align = pci_specified_resource_alignment(dev, &resize);
>> >       if (!align)
>> >               return;
>> >
>> > @@ -5101,15 +5106,24 @@ void pci_reassigndev_resource_alignment(struct
>> > pci_dev *dev)
>> >               }
>> >
>> >               size = resource_size(r);
>> > -             if (size < align) {
>> > -                     size = align;
>> > -                     dev_info(&dev->dev,
>> > -                             "Rounding up size of resource #%d to
>> > %#llx.\n",
>> > -                             i, (unsigned long long)size);
>> > +             if (resize) {
>> > +                     if (size < align) {
>> > +                             size = align;
>> > +                             dev_info(&dev->dev,
>> > +                                     "Rounding up size of resource #%d
>> > to %#llx.\n",
>> > +                                     i, (unsigned long long)size);
>> > +                     }
>> > +                     r->flags |= IORESOURCE_UNSET;
>> > +                     r->start = 0;
>> > +             } else {
>> > +                     if (size < align) {
>> > +                             r->flags &= ~IORESOURCE_SIZEALIGN;
>> > +                             r->flags |= IORESOURCE_STARTALIGN |
>> > +                                                     IORESOURCE_UNSET;
>> > +                             r->start = align;
>> > +                     }
>>
>> I'm still not happy about the fact that we now have these two cases:
>>
>>   1) If we increase resource alignment because of the
>>      "pci=resource_alignment=" parameter, we increase the resource
>>      size, and
>>
>>   2) If we increase resource alignment because of
>>      pcibios_default_alignment(), we use IORESOURCE_STARTALIGN and do
>>      not increase the resource size.
>>
>> This makes the code complicated and hard to maintain in the future.
>> We talked about this earlier [1], but I'm not sure I've figured out
>> the reason.  Here's my guess:
>>
>> Let's assume we have a 4K BAR and we're requesting alignment to 64K.
>> In case 1, we're only changing the alignment for specified devices.
>> If we used IORESOURCE_STARTALIGN, we would align that 4K BAR of the
>> specified devices to 64K, but BARs of other devices could still be
>> assigned in that same 64K page.  Therefore, we must increase the size
>> to prevent other BARs in the page, even though this might break some
>> drivers.
>>
>> But in case 2, we're changing the alignment for *all* devices.  In
>> this case, we *can* use IORESOURCE_STARTALIGN because we're going to
>> align *every* BAR of every device to 64K, so no other BARs will be put
>> in the same page.  And since we didn't change the resource size, we
>> avoid the problem of breaking drivers.
>>
>> If that's the reasoning, I'm not 100% sure that the complexity of this
>> code is worth it.  It took me half a day to come up with this theory.
>> If we do have to go this route, we *must* include comments along this
>> line to make it easier next time.
>>
>
> Your guess is exactly correct. We have two problems when we need to change
> one BAR's alignment:
>
> 1) If we extend the BAR's size, we may break the driver
>
> 2) If we only change the alignment without extending size, we have no way to
> prevent other BARs being assigned into the same page.
>
> I failed to come up with a way to use only one way to satisfy the two cases.
> So I handle the two cases separately like those codes. But as you said, this
> makes the code hard to maintain. I think what I can do next is to add some
> comments to make the codes easier to read and maintain. Of course, if
> anybody have a better idea, I'll be happy to see it.

I don't see a better way either.  Let me try to add some comments.
I'll post a patch and you can see what you think.

Bjorn

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-04-17 20:33 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-10 11:58 [PATCH v10 0/4] PCI: Introduce a way to enforce all MMIO BARs not to share PAGE_SIZE Yongji Xie
2017-04-10 11:58 ` [PATCH v10 1/4] PCI: A fix for caculating bridge window's size and alignment Yongji Xie
2017-04-10 11:58 ` [PATCH v10 2/4] PCI: Add pcibios_default_alignment() for arch-specific alignment control Yongji Xie
2017-04-10 11:58 ` [PATCH v10 3/4] powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned Yongji Xie
2017-04-14 15:58   ` Bjorn Helgaas
2017-04-14 21:52     ` Benjamin Herrenschmidt
2017-04-15 16:36       ` Bjorn Helgaas
2017-04-15 22:06         ` Benjamin Herrenschmidt
2017-04-17 15:51           ` Bjorn Helgaas
2017-04-10 11:58 ` [PATCH v10 4/4] PCI: Don't extend device's size when using default alignment for all devices Yongji Xie
2017-04-14 22:54   ` Bjorn Helgaas
2017-04-17  6:27     ` Yongji Xie
2017-04-17 20:33       ` Bjorn Helgaas
2017-04-17  6:59     ` Yongji Xie

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.