* [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7
@ 2016-04-08  0:15 Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 01/60] PCI: Fix iomem_is_exclusive() checking in pci_mmap_resource() Yinghai Lu
                   ` (60 more replies)
  0 siblings, 61 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

Hi Bjorn,

After 5b28541552ef ("PCI: Restrict 64-bit prefetchable bridge windows
to 64-bit resources"), we received several reports of resource
allocation failures. We tried to fix them by clipping resources, and
found more problems along the way.

One is a realloc failure with two graphics cards above 4G.
One is from sparc, which has problems with clipping because we don't
parse mem64 for it.

Another report is that PCI remove/rescan does not work on some setups
where the BIOS tends to allocate a small bus size.

This patchset enhances resource allocation to address those problems.

patch 1-11: parse MEM64 for sparc and other system with OF

patch 12-16: MMIO64 allocation enhancement
        treat non-pref mmio64 if parent bridges are all pcie.
        restore old pref allocation logic if hostbridge does not support mmio64.

patch 17-19: FIXED resource handling during realloc
        don't realloc resource if device firmware does not support bar change.

patch 20-23: bridge MMIO allocation with hotplug and last try.
        treat optional as required on first try when hotplug.
        MMIO size set to 0 for last try during realloc

patch 24-56: enhancement for mmio resource allocation:
        optimize bus mmio alignment calculation.
        optimize bus mmio optional alignment calculation.
        add support for alt size to prefer small bus size to small bus alignment.
        treat ROM bar as optional resource.
        during allocation, will pick up best fit resource, and allocate near end.

patch 57: add pci=assign_pref_bars to clear and assign pref bars.

patch 58-59: don't clear resource when allocation fails

patch 60: don't try io port allocation if root bus does not have io port.

I put latest copy at:
  git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-v4.7-next

That is rebased on v4.6-rc2+.

Yijing and Wei Yang both tested the branch; the patchset fixes the rescan
problem and works well on powerpc setups.

Khalid Aziz tested on some sparc platforms that have a different offset to
the mem space start or that support 64-bit mmio.

-v11: replace pci_find_root_bus_resource with pci_find_bus_resource
      add two patches from Bjorn for checking with pci_mmap.
      refreshed to current linus tree v4.6-rc2+

I hope at least patches 1-16 can get into v4.7.
Patches 3-16 have been in Oracle UEK4 for a while, to boot on some sparc
platforms, as they have 64-bit mmio above 4G.

Thanks

Yinghai

Bjorn Helgaas (2):
  PCI: Fix iomem_is_exclusive() checking in pci_mmap_resource()
  alpha/PCI: Only check iomem_is_exclusive() for IORESOURCE_MEM, not IORESOURCE_IO

Yinghai Lu (58):
  PCI: Add pci_find_bus_resource()
  sparc/PCI: Use correct offset for bus address to resource
  sparc/PCI: Reserve legacy mmio after PCI mmio
  sparc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing
  sparc/PCI: Keep resource idx order with bridge register number
  PCI: Kill wrong quirk about M7101
  powerpc/PCI: Keep resource idx order with bridge register number
  powerpc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing
  OF/PCI: Add IORESOURCE_MEM_64 for 64-bit resource
  PCI: Check pref compatible bit for mem64 resource of PCIe device
  PCI: Only treat non-pref mmio64 as pref if all bridges have MEM_64
  PCI: Add has_mem64 for struct host_bridge
  PCI: Only treat non-pref mmio64 as pref if host bridge has mmio64
  PCI: Restore pref MMIO allocation logic for host bridge without mmio64
  PCI: Don't release fixed resource for realloc
  PCI: Claim fixed resource during remove/rescan path
  PCI: Set resource to FIXED for LSI devices
  PCI: Separate realloc list checking after allocation
  PCI: Treat optional as required in first try for bridge rescan
  PCI: Get new realloc size for bridge for last try
  PCI: Don't release sibling bridge resources during hotplug
  PCI: Cleanup res_to_dev_res() printout
  PCI: Reuse res_to_dev_res() in reassign_resources_sorted()
  PCI: Use correct align for optional only resources during sorting
  PCI: Optimize bus min_align/size calculation during sizing
  PCI: Optimize bus align/size calculation for optional during sizing
  PCI: Don't add too much optional size for hotplug bridge MMIO
  PCI: Reorder resources list for required/optional resources
  PCI: Remove duplicated code for resource sorting
  PCI: Rename pdev_sort_resources() to pdev_assign_resources_prepare()
  PCI: Treat ROM resource as optional during realloc
  PCI: Add debug printout during releasing partial assigned resources
  PCI: Simplify res reference using in __assign_resources_sorted()
  PCI: Add __add_to_list()
  PCI: Cache window alignment value during bus sizing
  PCI: Check if resource is allocated before trying to assign one
  PCI: Separate out save_resources()/restore_resources()
  PCI: Move comment to pci_need_to_release()
  PCI: Separate required+optional assigning to another function
  PCI: Skip required+optional if there is no optional
  PCI: Move saved required resource list out of required+optional assigning
  PCI: Add alt_size ressource allocation support
  PCI: Add support for more than two alt_size entries under same bridge
  PCI: Fix size calculation with old_size on rescan path
  PCI: Don't add too much optional size for hotplug bridge io
  PCI: Move ISA io port align out of calculate_iosize()
  PCI: Don't add too much io port for hotplug bridge with old size
  PCI: Unify calculate_size() for io port and MMIO
  PCI: Allow bridge optional only io port resource required size to be 0
  PCI: Unify skip_ioresource_align()
  PCI: Kill macro checking for bus io port sizing
  resources: Make allocate_resource() return best fit resource
  PCI, x86: Allocate from high in available window for MMIO
  PCI: Add debug print out for min_align and alt_size
  PCI, x86: Add pci=assign_pref_bars to reallocate pref BARs
  PCI: Introduce resource_disabled()
  PCI: Don't set flags to 0 when assign resource fail
  PCI: Only try to assign io port only for root bus that support it

 arch/alpha/kernel/pci-sysfs.c             |    4 +-
 arch/alpha/kernel/pci.c                   |    2 +-
 arch/ia64/pci/pci.c                       |    4 +-
 arch/microblaze/pci/pci-common.c          |   15 +-
 arch/mn10300/unit-asb2305/pci-asb2305.c   |    4 +-
 arch/mn10300/unit-asb2305/pci.c           |    4 +-
 arch/powerpc/kernel/pci-common.c          |   27 +-
 arch/powerpc/kernel/pci_of_scan.c         |   12 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |   12 +-
 arch/s390/pci/pci.c                       |    2 +-
 arch/sparc/kernel/of_device_32.c          |    5 +-
 arch/sparc/kernel/of_device_64.c          |    5 +-
 arch/sparc/kernel/pci.c                   |   65 +-
 arch/sparc/kernel/pci_common.c            |   91 +-
 arch/sparc/kernel/pci_impl.h              |    5 +
 arch/x86/include/asm/pci_x86.h            |    2 +-
 arch/x86/pci/common.c                     |    7 +-
 arch/x86/pci/i386.c                       |   85 +-
 arch/xtensa/kernel/pci.c                  |    4 +-
 drivers/iommu/intel-iommu.c               |    3 +-
 drivers/of/address.c                      |    4 +-
 drivers/pci/bus.c                         |    6 +-
 drivers/pci/host/pcie-rcar.c              |    2 +-
 drivers/pci/hotplug/acpiphp_glue.c        |    1 +
 drivers/pci/iov.c                         |    2 +-
 drivers/pci/pci-sysfs.c                   |    7 +-
 drivers/pci/pci.c                         |   31 +-
 drivers/pci/pci.h                         |    4 +
 drivers/pci/probe.c                       |   48 +-
 drivers/pci/quirks.c                      |   55 +-
 drivers/pci/rom.c                         |    2 +-
 drivers/pci/setup-bus.c                   | 1316 +++++++++++++++++++++--------
 drivers/pci/setup-res.c                   |   18 +-
 include/linux/ioport.h                    |    6 +-
 include/linux/pci.h                       |   10 +
 kernel/resource.c                         |  104 ++-
 36 files changed, 1427 insertions(+), 547 deletions(-)

-- 
1.8.4.5


* [PATCH v11 01/60] PCI: Fix iomem_is_exclusive() checking in pci_mmap_resource()
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 02/60] alpha/PCI: Only check iomem_is_exclusive() for IORESOURCE_MEM, not IORESOURCE_IO Yinghai Lu
                   ` (59 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Arjan van de Ven

From: Bjorn Helgaas <bhelgaas@google.com>

iomem_is_exclusive() requires a CPU physical address, but on some arches we
supplied a PCI bus address instead.

On most arches, pci_resource_to_user(res) returns "res->start", which is a
CPU physical address.  But on microblaze, mips, powerpc, and sparc, it
returns the PCI bus address corresponding to "res->start".

The result is that pci_mmap_resource() may fail when it shouldn't (if the
bus address happens to match an existing resource), or it may succeed when
it should fail (if the resource is exclusive but the bus address doesn't
match it).

Call iomem_is_exclusive() with "res->start", which is always a CPU physical
address, not the result of pci_resource_to_user().
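
As a rough illustration of why the address space matters here, the
following is a toy model, not the kernel code. The names toy_is_exclusive()
and toy_cpu_to_bus() and the address values are hypothetical, chosen only
to show that a lookup keyed on CPU physical addresses gives the wrong
answer when it is handed a bus address:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an exclusive iomem region (CPU physical addresses). */
struct excl_region {
	uint64_t start;
	uint64_t end;
};

/* Toy exclusivity lookup: only meaningful when given a CPU physical address. */
static bool toy_is_exclusive(const struct excl_region *r, uint64_t cpu_addr)
{
	return cpu_addr >= r->start && cpu_addr <= r->end;
}

/* Assumed model of a host bridge with an offset: bus = CPU address - offset. */
static uint64_t toy_cpu_to_bus(uint64_t cpu_addr, uint64_t offset)
{
	assert(cpu_addr >= offset);
	return cpu_addr - offset;
}
```

With a non-zero offset, the bus address of an exclusive BAR falls outside
the exclusive region, so a check made on the bus address would wrongly let
the mmap through.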

Fixes: e8de1481fd71 ("resource: allow MMIO exclusivity for device drivers")
Suggested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/pci/pci-sysfs.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index e982010..cbb13be 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1008,6 +1008,9 @@ static int pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
 	if (i >= PCI_ROM_RESOURCE)
 		return -ENODEV;
 
+	if (res->flags & IORESOURCE_MEM && iomem_is_exclusive(res->start))
+		return -EINVAL;
+
 	if (!pci_mmap_fits(pdev, i, vma, PCI_MMAP_SYSFS)) {
 		WARN(1, "process \"%s\" tried to map 0x%08lx bytes at page 0x%08lx on %s BAR %d (start 0x%16Lx, size 0x%16Lx)\n",
 			current->comm, vma->vm_end-vma->vm_start, vma->vm_pgoff,
@@ -1024,10 +1027,6 @@ static int pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
 	pci_resource_to_user(pdev, i, res, &start, &end);
 	vma->vm_pgoff += start >> PAGE_SHIFT;
 	mmap_type = res->flags & IORESOURCE_MEM ? pci_mmap_mem : pci_mmap_io;
-
-	if (res->flags & IORESOURCE_MEM && iomem_is_exclusive(start))
-		return -EINVAL;
-
 	return pci_mmap_page_range(pdev, vma, mmap_type, write_combine);
 }
 
-- 
1.8.4.5


* [PATCH v11 02/60] alpha/PCI: Only check iomem_is_exclusive() for IORESOURCE_MEM, not IORESOURCE_IO
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 01/60] PCI: Fix iomem_is_exclusive() checking in pci_mmap_resource() Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-25 21:01   ` Bjorn Helgaas
  2016-04-08  0:15 ` [PATCH v11 03/60] PCI: Add pci_find_bus_resource() Yinghai Lu
                   ` (58 subsequent siblings)
  60 siblings, 1 reply; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Ivan Kokshaysky

From: Bjorn Helgaas <bhelgaas@google.com>

The alpha pci_mmap_resource() is used for both IORESOURCE_MEM and
IORESOURCE_IO resources, but iomem_is_exclusive() is only applicable for
IORESOURCE_MEM.

Call iomem_is_exclusive() only for IORESOURCE_MEM resources, and do it
earlier to match the generic version of pci_mmap_resource().

Fixes: 10a0ef39fbd1 ("PCI/alpha: pci sysfs resources")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
---
 arch/alpha/kernel/pci-sysfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/kernel/pci-sysfs.c b/arch/alpha/kernel/pci-sysfs.c
index 99e8d47..92c0d46 100644
--- a/arch/alpha/kernel/pci-sysfs.c
+++ b/arch/alpha/kernel/pci-sysfs.c
@@ -77,10 +77,10 @@ static int pci_mmap_resource(struct kobject *kobj,
 	if (i >= PCI_ROM_RESOURCE)
 		return -ENODEV;
 
-	if (!__pci_mmap_fits(pdev, i, vma, sparse))
+	if (res->flags & IORESOURCE_MEM && iomem_is_exclusive(res->start))
 		return -EINVAL;
 
-	if (iomem_is_exclusive(res->start))
+	if (!__pci_mmap_fits(pdev, i, vma, sparse))
 		return -EINVAL;
 
 	pcibios_resource_to_bus(pdev->bus, &bar, res);
-- 
1.8.4.5


* [PATCH v11 03/60] PCI: Add pci_find_bus_resource()
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 01/60] PCI: Fix iomem_is_exclusive() checking in pci_mmap_resource() Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 02/60] alpha/PCI: Only check iomem_is_exclusive() for IORESOURCE_MEM, not IORESOURCE_IO Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource Yinghai Lu
                   ` (57 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

Add pci_find_bus_resource() to return the bus resource that contains the
input resource.

In some cases we only have a bus instead of a dev.
It is the same as pci_find_parent_resource(), but takes a bus as input.
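
The split can be pictured with a simplified model. This is only a sketch:
the toy_* types and the containment rule below are assumptions made for
illustration, and the real function has more detailed matching than is
modelled here. The point is that the device variant becomes a thin wrapper
around the bus variant:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IO  0x100u	/* hypothetical flag values for this sketch */
#define TOY_MEM 0x200u

struct toy_res { uint64_t start, end; unsigned int flags; };
struct toy_bus { const struct toy_res *windows; size_t nr_windows; };
struct toy_dev { const struct toy_bus *bus; };

/* A window matches if it has the same type and fully contains the resource. */
static bool toy_contains(const struct toy_res *outer, const struct toy_res *inner)
{
	return outer->flags == inner->flags &&
	       outer->start <= inner->start && inner->end <= outer->end;
}

/* Bus-based lookup: usable when the caller has no device at hand. */
static const struct toy_res *toy_find_bus_resource(const struct toy_bus *bus,
						   const struct toy_res *res)
{
	assert(bus && res);
	for (size_t i = 0; i < bus->nr_windows; i++)
		if (toy_contains(&bus->windows[i], res))
			return &bus->windows[i];
	return NULL;
}

/* Device-based lookup simply forwards to the bus-based one. */
static const struct toy_res *toy_find_parent_resource(const struct toy_dev *dev,
						      const struct toy_res *res)
{
	return toy_find_bus_resource(dev->bus, res);
}
```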

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/pci.c   | 27 ++++++++++++++++-----------
 include/linux/pci.h |  2 ++
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 25e0327..313dea5 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -414,18 +414,9 @@ int pci_find_ht_capability(struct pci_dev *dev, int ht_cap)
 }
 EXPORT_SYMBOL_GPL(pci_find_ht_capability);
 
-/**
- * pci_find_parent_resource - return resource region of parent bus of given region
- * @dev: PCI device structure contains resources to be searched
- * @res: child resource record for which parent is sought
- *
- *  For given resource region of given device, return the resource
- *  region of parent bus the given region is contained in.
- */
-struct resource *pci_find_parent_resource(const struct pci_dev *dev,
-					  struct resource *res)
+struct resource *pci_find_bus_resource(const struct pci_bus *bus,
+					struct resource *res)
 {
-	const struct pci_bus *bus = dev->bus;
 	struct resource *r;
 	int i;
 
@@ -455,6 +446,20 @@ struct resource *pci_find_parent_resource(const struct pci_dev *dev,
 	}
 	return NULL;
 }
+
+/**
+ * pci_find_parent_resource - return resource region of parent bus of given region
+ * @dev: PCI device structure contains resources to be searched
+ * @res: child resource record for which parent is sought
+ *
+ *  For given resource region of given device, return the resource
+ *  region of parent bus the given region is contained in.
+ */
+struct resource *pci_find_parent_resource(const struct pci_dev *dev,
+					  struct resource *res)
+{
+	return pci_find_bus_resource(dev->bus, res);
+}
 EXPORT_SYMBOL(pci_find_parent_resource);
 
 /**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 004b813..795b4c7 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -807,6 +807,8 @@ void pcibios_resource_to_bus(struct pci_bus *bus, struct pci_bus_region *region,
 			     struct resource *res);
 void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
 			     struct pci_bus_region *region);
+struct resource *pci_find_bus_resource(const struct pci_bus *bus,
+					struct resource *res);
 void pcibios_scan_specific_bus(int busn);
 struct pci_bus *pci_find_bus(int domain, int busnr);
 void pci_bus_add_devices(const struct pci_bus *bus);
-- 
1.8.4.5


* [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (2 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 03/60] PCI: Add pci_find_bus_resource() Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-22 20:49   ` Bjorn Helgaas
  2016-04-08  0:15 ` [PATCH v11 05/60] sparc/PCI: Reserve legacy mmio after PCI mmio Yinghai Lu
                   ` (56 subsequent siblings)
  60 siblings, 1 reply; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

After we added 64-bit mmio parsing, we got some "no compatible bridge
window" warnings on another new model that supports 64-bit resources.

It turns out that we cannot use mem_space.start as the 64-bit mem space
offset, i.e. mem_space.start != offset on that system.

Use child_phys_addr to calculate the exact offset and record the offset
in pbm.

After the patch we get the correct offsets:

/pci@305: PCI IO [io  0x2007e00000000-0x2007e0fffffff] offset 2007e00000000
/pci@305: PCI MEM [mem 0x2000000100000-0x200007effffff] offset 2000000000000
/pci@305: PCI MEM64 [mem 0x2000100000000-0x2000dffffffff] offset 2000000000000
...
pci_sun4v f02ae7f8: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x2007e00000000-0x2007e0fffffff] (bus address [0x0000-0xfffffff])
pci_bus 0000:00: root bus resource [mem 0x2000000100000-0x200007effffff] (bus address [0x00100000-0x7effffff])
pci_bus 0000:00: root bus resource [mem 0x2000100000000-0x2000dffffffff] (bus address [0x100000000-0xdffffffff])
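
The arithmetic behind those offsets can be checked with a small standalone
sketch. It mirrors the "a - region_a" calculation in the patch, but the
toy_range struct and helper names are hypothetical, and the hypervisor
masking of parent_phys_hi is left out:

```c
#include <assert.h>
#include <stdint.h>

/* Subset of a ranges entry, split into 32-bit cells as in the OF property. */
struct toy_range {
	uint32_t child_phys_mid, child_phys_lo;		/* PCI bus address */
	uint32_t parent_phys_hi, parent_phys_lo;	/* CPU physical address */
};

static uint64_t join32(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* offset = CPU physical address - PCI bus address */
static uint64_t toy_range_offset(const struct toy_range *pr)
{
	uint64_t cpu = join32(pr->parent_phys_hi, pr->parent_phys_lo);
	uint64_t bus = join32(pr->child_phys_mid, pr->child_phys_lo);

	assert(cpu >= bus);
	return cpu - bus;
}
```

For the MEM range above, CPU 0x2000000100000 minus bus 0x00100000 gives
0x2000000000000, which is not equal to mem_space.start (0x2000000100000).
That is the case the old code got wrong.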

-v3: put back mem64_offset, as we found T4 has mem_offset != mem64_offset
     check overlapping between mem64_space and mem_space.

-v5: use pcibios_bus_to_region() requested by Bjorn.

-v6: use pci_find_bus_resource().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
---
 arch/sparc/kernel/pci.c        | 54 ++++++++++++++++++++----------------------
 arch/sparc/kernel/pci_common.c | 32 ++++++++++++++++++-------
 arch/sparc/kernel/pci_impl.h   |  4 ++++
 3 files changed, 54 insertions(+), 36 deletions(-)

diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index badf095..4606dc1 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -654,12 +654,12 @@ struct pci_bus *pci_scan_one_pbm(struct pci_pbm_info *pbm,
 	printk("PCI: Scanning PBM %s\n", node->full_name);
 
 	pci_add_resource_offset(&resources, &pbm->io_space,
-				pbm->io_space.start);
+				pbm->io_offset);
 	pci_add_resource_offset(&resources, &pbm->mem_space,
-				pbm->mem_space.start);
+				pbm->mem_offset);
 	if (pbm->mem64_space.flags)
 		pci_add_resource_offset(&resources, &pbm->mem64_space,
-					pbm->mem_space.start);
+					pbm->mem64_offset);
 	pbm->busn.start = pbm->pci_first_busno;
 	pbm->busn.end	= pbm->pci_last_busno;
 	pbm->busn.flags	= IORESOURCE_BUS;
@@ -733,30 +733,32 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
 static int __pci_mmap_make_offset_bus(struct pci_dev *pdev, struct vm_area_struct *vma,
 				      enum pci_mmap_state mmap_state)
 {
-	struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller;
-	unsigned long space_size, user_offset, user_size;
-
-	if (mmap_state == pci_mmap_io) {
-		space_size = resource_size(&pbm->io_space);
-	} else {
-		space_size = resource_size(&pbm->mem_space);
-	}
+	unsigned long user_offset, user_size;
+	struct resource res, *root_bus_res;
+	struct pci_bus_region region;
+	struct pci_bus *bus;
 
 	/* Make sure the request is in range. */
 	user_offset = vma->vm_pgoff << PAGE_SHIFT;
 	user_size = vma->vm_end - vma->vm_start;
 
-	if (user_offset >= space_size ||
-	    (user_offset + user_size) > space_size)
+	region.start = user_offset;
+	region.end = user_offset + user_size - 1;
+	memset(&res, 0, sizeof(res));
+	if (mmap_state == pci_mmap_io)
+		res.flags = IORESOURCE_IO;
+	else
+		res.flags = IORESOURCE_MEM;
+
+	pcibios_bus_to_resource(pdev->bus, &res, &region);
+	bus = pdev->bus;
+	while (bus->parent)
+		bus = bus->parent;
+	root_bus_res = pci_find_bus_resource(bus, &res);
+	if (!root_bus_res)
 		return -EINVAL;
 
-	if (mmap_state == pci_mmap_io) {
-		vma->vm_pgoff = (pbm->io_space.start +
-				 user_offset) >> PAGE_SHIFT;
-	} else {
-		vma->vm_pgoff = (pbm->mem_space.start +
-				 user_offset) >> PAGE_SHIFT;
-	}
+	vma->vm_pgoff = res.start >> PAGE_SHIFT;
 
 	return 0;
 }
@@ -977,16 +979,12 @@ void pci_resource_to_user(const struct pci_dev *pdev, int bar,
 			  const struct resource *rp, resource_size_t *start,
 			  resource_size_t *end)
 {
-	struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller;
-	unsigned long offset;
+	struct pci_bus_region region;
 
-	if (rp->flags & IORESOURCE_IO)
-		offset = pbm->io_space.start;
-	else
-		offset = pbm->mem_space.start;
+	pcibios_resource_to_bus(pdev->bus, &region, (struct resource *)rp);
 
-	*start = rp->start - offset;
-	*end = rp->end - offset;
+	*start = region.start;
+	*end = region.end;
 }
 
 void pcibios_set_master(struct pci_dev *dev)
diff --git a/arch/sparc/kernel/pci_common.c b/arch/sparc/kernel/pci_common.c
index 33524c1..76998f8 100644
--- a/arch/sparc/kernel/pci_common.c
+++ b/arch/sparc/kernel/pci_common.c
@@ -410,13 +410,16 @@ void pci_determine_mem_io_space(struct pci_pbm_info *pbm)
 
 	for (i = 0; i < num_pbm_ranges; i++) {
 		const struct linux_prom_pci_ranges *pr = &pbm_ranges[i];
-		unsigned long a, size;
+		unsigned long a, size, region_a;
 		u32 parent_phys_hi, parent_phys_lo;
+		u32 child_phys_mid, child_phys_lo;
 		u32 size_hi, size_lo;
 		int type;
 
 		parent_phys_hi = pr->parent_phys_hi;
 		parent_phys_lo = pr->parent_phys_lo;
+		child_phys_mid = pr->child_phys_mid;
+		child_phys_lo = pr->child_phys_lo;
 		if (tlb_type == hypervisor)
 			parent_phys_hi &= 0x0fffffff;
 
@@ -426,6 +429,8 @@ void pci_determine_mem_io_space(struct pci_pbm_info *pbm)
 		type = (pr->child_phys_hi >> 24) & 0x3;
 		a = (((unsigned long)parent_phys_hi << 32UL) |
 		     ((unsigned long)parent_phys_lo  <<  0UL));
+		region_a = (((unsigned long)child_phys_mid << 32UL) |
+		     ((unsigned long)child_phys_lo  <<  0UL));
 		size = (((unsigned long)size_hi << 32UL) |
 			((unsigned long)size_lo  <<  0UL));
 
@@ -440,6 +445,7 @@ void pci_determine_mem_io_space(struct pci_pbm_info *pbm)
 			pbm->io_space.start = a;
 			pbm->io_space.end = a + size - 1UL;
 			pbm->io_space.flags = IORESOURCE_IO;
+			pbm->io_offset = a - region_a;
 			saw_io = 1;
 			break;
 
@@ -448,6 +454,7 @@ void pci_determine_mem_io_space(struct pci_pbm_info *pbm)
 			pbm->mem_space.start = a;
 			pbm->mem_space.end = a + size - 1UL;
 			pbm->mem_space.flags = IORESOURCE_MEM;
+			pbm->mem_offset = a - region_a;
 			saw_mem = 1;
 			break;
 
@@ -456,6 +463,7 @@ void pci_determine_mem_io_space(struct pci_pbm_info *pbm)
 			pbm->mem64_space.start = a;
 			pbm->mem64_space.end = a + size - 1UL;
 			pbm->mem64_space.flags = IORESOURCE_MEM;
+			pbm->mem64_offset = a - region_a;
 			saw_mem = 1;
 			break;
 
@@ -471,14 +479,22 @@ void pci_determine_mem_io_space(struct pci_pbm_info *pbm)
 		prom_halt();
 	}
 
-	printk("%s: PCI IO[%llx] MEM[%llx]",
-	       pbm->name,
-	       pbm->io_space.start,
-	       pbm->mem_space.start);
+	if (pbm->io_space.flags)
+		printk("%s: PCI IO %pR offset %llx\n",
+		       pbm->name, &pbm->io_space, pbm->io_offset);
+	if (pbm->mem_space.flags)
+		printk("%s: PCI MEM %pR offset %llx\n",
+		       pbm->name, &pbm->mem_space, pbm->mem_offset);
+	if (pbm->mem64_space.flags && pbm->mem_space.flags) {
+		if (pbm->mem64_space.start <= pbm->mem_space.end)
+			pbm->mem64_space.start = pbm->mem_space.end + 1;
+		if (pbm->mem64_space.start > pbm->mem64_space.end)
+			pbm->mem64_space.flags = 0;
+	}
+
 	if (pbm->mem64_space.flags)
-		printk(" MEM64[%llx]",
-		       pbm->mem64_space.start);
-	printk("\n");
+		printk("%s: PCI MEM64 %pR offset %llx\n",
+		       pbm->name, &pbm->mem64_space, pbm->mem64_offset);
 
 	pbm->io_space.name = pbm->mem_space.name = pbm->name;
 	pbm->mem64_space.name = pbm->name;
diff --git a/arch/sparc/kernel/pci_impl.h b/arch/sparc/kernel/pci_impl.h
index 37222ca..2853af7 100644
--- a/arch/sparc/kernel/pci_impl.h
+++ b/arch/sparc/kernel/pci_impl.h
@@ -99,6 +99,10 @@ struct pci_pbm_info {
 	struct resource			mem_space;
 	struct resource			mem64_space;
 	struct resource			busn;
+	/* offset */
+	resource_size_t			io_offset;
+	resource_size_t			mem_offset;
+	resource_size_t			mem64_offset;
 
 	/* Base of PCI Config space, can be per-PBM or shared. */
 	unsigned long			config_space;
-- 
1.8.4.5


* [PATCH v11 05/60] sparc/PCI: Reserve legacy mmio after PCI mmio
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (3 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 06/60] sparc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing Yinghai Lu
                   ` (55 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

On one system we found a bunch of resource claim failures from PCI devices:
pci_sun4v f02b894c: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x2007e00000000-0x2007e0fffffff] (bus address [0x0000-0xfffffff])
pci_bus 0000:00: root bus resource [mem 0x2000000000000-0x200007effffff] (bus address [0x00000000-0x7effffff])
pci_bus 0000:00: root bus resource [mem 0x2000100000000-0x20007ffffffff] (bus address [0x100000000-0x7ffffffff])
...
PCI: Claiming 0000:00:02.0: Resource 14: 0002000000000000..00020000004fffff [200]
pci 0000:00:02.0: can't claim BAR 14 [mem 0x2000000000000-0x20000004fffff]: address conflict with Video RAM area [??? 0x20000000a0000-0x20000000bffff flags 0x80000000]
pci 0000:02:00.0: can't claim BAR 0 [mem 0x2000000000000-0x20000000fffff]: no compatible bridge window
PCI: Claiming 0000:02:00.0: Resource 3: 0002000000100000..0002000000103fff [200]
pci 0000:02:00.0: can't claim BAR 3 [mem 0x2000000100000-0x2000000103fff]: no compatible bridge window
PCI: Claiming 0000:02:00.1: Resource 0: 0002000000200000..00020000002fffff [200]
pci 0000:02:00.1: can't claim BAR 0 [mem 0x2000000200000-0x20000002fffff]: no compatible bridge window
PCI: Claiming 0000:02:00.1: Resource 3: 0002000000104000..0002000000107fff [200]
pci 0000:02:00.1: can't claim BAR 3 [mem 0x2000000104000-0x2000000107fff]: no compatible bridge window
PCI: Claiming 0000:02:00.2: Resource 0: 0002000000300000..00020000003fffff [200]
pci 0000:02:00.2: can't claim BAR 0 [mem 0x2000000300000-0x20000003fffff]: no compatible bridge window
PCI: Claiming 0000:02:00.2: Resource 3: 0002000000108000..000200000010bfff [200]
pci 0000:02:00.2: can't claim BAR 3 [mem 0x2000000108000-0x200000010bfff]: no compatible bridge window
PCI: Claiming 0000:02:00.3: Resource 0: 0002000000400000..00020000004fffff [200]
pci 0000:02:00.3: can't claim BAR 0 [mem 0x2000000400000-0x20000004fffff]: no compatible bridge window
PCI: Claiming 0000:02:00.3: Resource 3: 000200000010c000..000200000010ffff [200]
pci 0000:02:00.3: can't claim BAR 3 [mem 0x200000010c000-0x200000010ffff]: no compatible bridge window

The bridge 00:02.0 resource does not get reserved because the Video RAM
area takes the position earlier, and the reservations for all the children
resources below it then fail.

Move the Video RAM area reservation down to after the PCI mmio gets
reserved, so that those regions are left for the PCI driver to use.
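
The ordering effect can be reproduced with a toy sibling list. This is a
sketch only: toy_request() is a hypothetical, much simplified stand-in for
a conflict-reporting resource request, and the two ranges are taken from
the log above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_CHILDREN 8

struct toy_res { uint64_t start, end; const char *name; };
struct toy_parent { struct toy_res children[TOY_MAX_CHILDREN]; size_t nr; };

static bool toy_overlaps(const struct toy_res *a, const struct toy_res *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/* Returns the conflicting sibling, or NULL after adding the new child. */
static const struct toy_res *toy_request(struct toy_parent *p,
					 struct toy_res new_res)
{
	for (size_t i = 0; i < p->nr; i++)
		if (toy_overlaps(&p->children[i], &new_res))
			return &p->children[i];

	assert(p->nr < TOY_MAX_CHILDREN);
	p->children[p->nr++] = new_res;
	return NULL;
}
```

Whichever of the two overlapping ranges is requested first wins. Requesting
the bridge window first lets the bridge and its children claim their BARs,
and the later Video RAM request is the one that gets rejected.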

-v5: merge simplify one and use pcibios_bus_to_resource()

-v6: use pci_find_bus_resource()

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
---
 arch/sparc/kernel/pci.c        |  1 +
 arch/sparc/kernel/pci_common.c | 59 ++++++++++++++++++++++--------------------
 arch/sparc/kernel/pci_impl.h   |  1 +
 3 files changed, 33 insertions(+), 28 deletions(-)

diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index 4606dc1..9c6daad 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -677,6 +677,7 @@ struct pci_bus *pci_scan_one_pbm(struct pci_pbm_info *pbm,
 	pci_bus_register_of_sysfs(bus);
 
 	pci_claim_bus_resources(bus);
+	pci_register_legacy_regions(bus);
 	pci_bus_add_devices(bus);
 	return bus;
 }
diff --git a/arch/sparc/kernel/pci_common.c b/arch/sparc/kernel/pci_common.c
index 76998f8..1ebc7ff 100644
--- a/arch/sparc/kernel/pci_common.c
+++ b/arch/sparc/kernel/pci_common.c
@@ -328,41 +328,46 @@ void pci_get_pbm_props(struct pci_pbm_info *pbm)
 	}
 }
 
-static void pci_register_legacy_regions(struct resource *io_res,
-					struct resource *mem_res)
+static void pci_register_region(struct pci_bus *bus, const char *name,
+				resource_size_t rstart, resource_size_t size)
 {
-	struct resource *p;
+	struct resource *res, *conflict, *bus_res;
+	struct pci_bus_region region;
 
-	/* VGA Video RAM. */
-	p = kzalloc(sizeof(*p), GFP_KERNEL);
-	if (!p)
+	res = kzalloc(sizeof(*res), GFP_KERNEL);
+	if (!res)
 		return;
 
-	p->name = "Video RAM area";
-	p->start = mem_res->start + 0xa0000UL;
-	p->end = p->start + 0x1ffffUL;
-	p->flags = IORESOURCE_BUSY;
-	request_resource(mem_res, p);
+	res->flags = IORESOURCE_MEM;
 
-	p = kzalloc(sizeof(*p), GFP_KERNEL);
-	if (!p)
+	region.start = rstart;
+	region.end = rstart + size - 1UL;
+	pcibios_bus_to_resource(bus, res, &region);
+	bus_res = pci_find_bus_resource(bus, res);
+	if (!bus_res) {
+		kfree(res);
 		return;
+	}
+
+	res->name = name;
+	res->flags |= IORESOURCE_BUSY;
+	conflict = request_resource_conflict(bus_res, res);
+	if (conflict) {
+		dev_printk(KERN_DEBUG, &bus->dev,
+			" can't claim %s %pR: address conflict with %s %pR\n",
+			res->name, res, conflict->name, conflict);
+		kfree(res);
+	}
+}
 
-	p->name = "System ROM";
-	p->start = mem_res->start + 0xf0000UL;
-	p->end = p->start + 0xffffUL;
-	p->flags = IORESOURCE_BUSY;
-	request_resource(mem_res, p);
+void pci_register_legacy_regions(struct pci_bus *bus)
+{
+	/* VGA Video RAM. */
+	pci_register_region(bus, "Video RAM area", 0xa0000UL, 0x20000UL);
 
-	p = kzalloc(sizeof(*p), GFP_KERNEL);
-	if (!p)
-		return;
+	pci_register_region(bus, "System ROM",     0xf0000UL, 0x10000UL);
 
-	p->name = "Video ROM";
-	p->start = mem_res->start + 0xc0000UL;
-	p->end = p->start + 0x7fffUL;
-	p->flags = IORESOURCE_BUSY;
-	request_resource(mem_res, p);
+	pci_register_region(bus, "Video ROM",      0xc0000UL,  0x8000UL);
 }
 
 static void pci_register_iommu_region(struct pci_pbm_info *pbm)
@@ -504,8 +509,6 @@ void pci_determine_mem_io_space(struct pci_pbm_info *pbm)
 	if (pbm->mem64_space.flags)
 		request_resource(&iomem_resource, &pbm->mem64_space);
 
-	pci_register_legacy_regions(&pbm->io_space,
-				    &pbm->mem_space);
 	pci_register_iommu_region(pbm);
 }
 
diff --git a/arch/sparc/kernel/pci_impl.h b/arch/sparc/kernel/pci_impl.h
index 2853af7..ff8f5e1 100644
--- a/arch/sparc/kernel/pci_impl.h
+++ b/arch/sparc/kernel/pci_impl.h
@@ -167,6 +167,7 @@ void pci_get_pbm_props(struct pci_pbm_info *pbm);
 struct pci_bus *pci_scan_one_pbm(struct pci_pbm_info *pbm,
 				 struct device *parent);
 void pci_determine_mem_io_space(struct pci_pbm_info *pbm);
+void pci_register_legacy_regions(struct pci_bus *bus);
 
 /* Error reporting support. */
 void pci_scan_for_target_abort(struct pci_pbm_info *, struct pci_bus *);
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v11 06/60] sparc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (4 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 05/60] sparc/PCI: Reserve legacy mmio after PCI mmio Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 07/60] sparc/PCI: Keep resource idx order with bridge register number Yinghai Lu
                   ` (54 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu, sparclinux

When setting the PREF bit for a device resource under a bridge's 64-bit pref
window, we need to make sure we only set PREF for 64-bit resources.

So this patch sets IORESOURCE_MEM_64 for 64-bit resources during OF device
resource flags parsing.
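
As a rough illustration only (not part of the patch), the flag decoding after
this change can be sketched standalone as below. The SK_* constants and the
name sketch_pci_get_flags() are hypothetical stand-ins, and the
(w >> 24) & 0x03 space-code extraction is assumed from the OF PCI address
format rather than shown in the hunks.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel's IORESOURCE_* bits (assumed values). */
#define SK_IORESOURCE_IO       0x00000100UL
#define SK_IORESOURCE_MEM      0x00000200UL
#define SK_IORESOURCE_PREFETCH 0x00002000UL
#define SK_IORESOURCE_MEM_64   0x00100000UL

/* Hypothetical helper: decode the first OF PCI address cell into flags. */
unsigned long sketch_pci_get_flags(uint32_t w)
{
	unsigned long flags = 0;

	switch ((w >> 24) & 0x03) {	/* space code, assumed bit position */
	case 0x01:
		flags |= SK_IORESOURCE_IO;
		break;
	case 0x02: /* 32 bits */
		flags |= SK_IORESOURCE_MEM;
		break;
	case 0x03: /* 64 bits: now also tagged MEM_64 */
		flags |= SK_IORESOURCE_MEM | SK_IORESOURCE_MEM_64;
		break;
	}
	if (w & 0x40000000)	/* prefetchable bit */
		flags |= SK_IORESOURCE_PREFETCH;
	return flags;
}
```

With this, a 64-bit memory cell yields both MEM and MEM_64, while a 32-bit
memory cell still yields MEM only.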

Link: https://bugzilla.kernel.org/show_bug.cgi?id=96261
Link: https://bugzilla.kernel.org/show_bug.cgi?id=96241
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
---
 arch/sparc/kernel/of_device_32.c | 5 +++--
 arch/sparc/kernel/of_device_64.c | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/kernel/of_device_32.c b/arch/sparc/kernel/of_device_32.c
index 185aa96..3e9f273 100644
--- a/arch/sparc/kernel/of_device_32.c
+++ b/arch/sparc/kernel/of_device_32.c
@@ -83,11 +83,12 @@ static unsigned long of_bus_pci_get_flags(const u32 *addr, unsigned long flags)
 	case 0x01:
 		flags |= IORESOURCE_IO;
 		break;
-
 	case 0x02: /* 32 bits */
-	case 0x03: /* 64 bits */
 		flags |= IORESOURCE_MEM;
 		break;
+	case 0x03: /* 64 bits */
+		flags |= IORESOURCE_MEM | IORESOURCE_MEM_64;
+		break;
 	}
 	if (w & 0x40000000)
 		flags |= IORESOURCE_PREFETCH;
diff --git a/arch/sparc/kernel/of_device_64.c b/arch/sparc/kernel/of_device_64.c
index 7bbdc26..defee61 100644
--- a/arch/sparc/kernel/of_device_64.c
+++ b/arch/sparc/kernel/of_device_64.c
@@ -146,11 +146,12 @@ static unsigned long of_bus_pci_get_flags(const u32 *addr, unsigned long flags)
 	case 0x01:
 		flags |= IORESOURCE_IO;
 		break;
-
 	case 0x02: /* 32 bits */
-	case 0x03: /* 64 bits */
 		flags |= IORESOURCE_MEM;
 		break;
+	case 0x03: /* 64 bits */
+		flags |= IORESOURCE_MEM | IORESOURCE_MEM_64;
+		break;
 	}
 	if (w & 0x40000000)
 		flags |= IORESOURCE_PREFETCH;
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v11 07/60] sparc/PCI: Keep resource idx order with bridge register number
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (5 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 06/60] sparc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 08/60] PCI: Kill wrong quirk about M7101 Yinghai Lu
                   ` (53 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

On one system we found a strange "no compatible bridge window" warning:

PCI: Claiming 0000:00:01.0: Resource 14: 0002000100000000..000200010fffffff [10220c]
PCI: Claiming 0000:01:00.0: Resource 1: 0002000100000000..000200010000ffff [100214]
pci 0000:01:00.0: can't claim BAR 1 [mem 0x2000100000000-0x200010000ffff 64bit]: no compatible bridge window

even though we already had pref_compat support that adds an extra pref bit
for the device resource.

It turns out that pci_resource_compatible()/pci_up_path_over_pref_mem64()
only check the resource against bridge pref mmio register idx 15, while
of_scan_pci_bridge() had put the resource into mmio register idx 14
because the bridge does not have an mmio resource.

We have already fixed pci_up_path_over_pref_mem64() to check all bus resources.

At the same time, this patch makes the resources follow a consistent order,
like other archs or like what pci_read_bridge_bases() produces directly,
even when non-pref mmio is missing or the firmware reports ranges out of order.

Just reserve i = 1 for non-pref mmio, and i = 2 for pref mmio.
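
As a rough illustration only (not part of the patch), the slot selection for
memory ranges can be sketched as below. The SK_* constants,
sketch_pick_window() and its parameters are hypothetical; window_flags[k]
models the flags word of bus->resource[k], with 0 meaning "not yet used".

```c
#include <assert.h>

/* Illustrative stand-ins for the kernel's IORESOURCE_* bits (assumed values). */
#define SK_IORESOURCE_MEM      0x00000200UL
#define SK_IORESOURCE_PREFETCH 0x00002000UL

/*
 * Hypothetical helper: pick the bridge window index for one memory range.
 * Index 1 is kept for non-pref mmio and index 2 for pref mmio; anything
 * else falls back to the next free index, starting at 3.
 * Returns -1 when no window is left.
 */
int sketch_pick_window(unsigned long range_flags,
		       const unsigned long window_flags[],
		       int nr_windows, int *next_free)
{
	if ((range_flags & SK_IORESOURCE_PREFETCH) && !window_flags[2])
		return 2;

	if (((range_flags & (SK_IORESOURCE_MEM | SK_IORESOURCE_PREFETCH)) ==
	     SK_IORESOURCE_MEM) && !window_flags[1])
		return 1;

	if (*next_free >= nr_windows)
		return -1;

	return (*next_free)++;
}
```

A pref range reported first still lands in index 2, so a later non-pref range
can take index 1 regardless of firmware ordering.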

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
---
 arch/sparc/kernel/pci.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index 9c6daad..9415abc 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -472,7 +472,7 @@ static void of_scan_pci_bridge(struct pci_pbm_info *pbm,
 		pci_read_bridge_bases(bus);
 		goto after_ranges;
 	}
-	i = 1;
+	i = 3;
 	for (; len >= 32; len -= 32, ranges += 8) {
 		u64 start;
 
@@ -504,6 +504,12 @@ static void of_scan_pci_bridge(struct pci_pbm_info *pbm,
 				       " for bridge %s\n", node->full_name);
 				continue;
 			}
+		} else if ((flags & IORESOURCE_PREFETCH) &&
+			   !bus->resource[2]->flags) {
+			res = bus->resource[2];
+		} else if (((flags & (IORESOURCE_MEM | IORESOURCE_PREFETCH)) ==
+			    IORESOURCE_MEM) && !bus->resource[1]->flags) {
+			res = bus->resource[1];
 		} else {
 			if (i >= PCI_NUM_RESOURCES - PCI_BRIDGE_RESOURCES) {
 				printk(KERN_ERR "PCI: too many memory ranges"
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v11 08/60] PCI: Kill wrong quirk about M7101
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (6 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 07/60] sparc/PCI: Keep resource idx order with bridge register number Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 09/60] powerpc/PCI: Keep resource idx order with bridge register number Yinghai Lu
                   ` (52 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu, Meelis Roos

Meelis reported that the qla2000 driver does not get loaded on one sparc system.

schizo f00732d0: PCI host bridge to bus 0001:00
pci_bus 0001:00: root bus resource [io  0x7fe01000000-0x7fe01ffffff] (bus address [0x0000-0xffffff])
pci 0001:00:06.0: quirk: [io  0x7fe01000800-0x7fe0100083f] claimed by ali7101 ACPI
pci 0001:00:06.0: quirk: [io  0x7fe01000600-0x7fe0100061f] claimed by ali7101 SMB
pci 0001:00:07.0: can't claim BAR 0 [io  0x7fe01000000-0x7fe0100ffff]: address conflict with 0001:00:06.0 [io  0x7fe01000600-0x7fe0100061f]

So the quirk for M7101 claims the io range early.

According to the M7101 section of the M1543 datasheet, pages 103/104,
	http://www.versalogic.com/Support/Downloads/pdf/ali1543.pdf
registers 0xe0 and 0xe2 do not contain address info for acpi/smb.

We cannot find out how that code got there. But per Linus,
we should remove the quirk, as it does not match the datasheet.

Kill the wrong quirk.

Link: http://kodu.ut.ee/~mroos/dm/dm.v240
Link: http://kodu.ut.ee/~mroos/dm/dm.sb100
Reported-by: Meelis Roos <mroos@linux.ee>
Cc: Meelis Roos <mroos@linux.ee>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/pci/quirks.c | 18 ------------------
 1 file changed, 18 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 8e67802..21d545d 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -445,24 +445,6 @@ static void quirk_amd_nl_class(struct pci_dev *pdev)
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_NL_USB,
 		quirk_amd_nl_class);
 
-/*
- * Let's make the southbridge information explicit instead
- * of having to worry about people probing the ACPI areas,
- * for example.. (Yes, it happens, and if you read the wrong
- * ACPI register it will put the machine to sleep with no
- * way of waking it up again. Bummer).
- *
- * ALI M7101: Two IO regions pointed to by words at
- *	0xE0 (64 bytes of ACPI registers)
- *	0xE2 (32 bytes of SMB registers)
- */
-static void quirk_ali7101_acpi(struct pci_dev *dev)
-{
-	quirk_io_region(dev, 0xE0, 64, PCI_BRIDGE_RESOURCES, "ali7101 ACPI");
-	quirk_io_region(dev, 0xE2, 32, PCI_BRIDGE_RESOURCES+1, "ali7101 SMB");
-}
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AL,	PCI_DEVICE_ID_AL_M7101,		quirk_ali7101_acpi);
-
 static void piix4_io_quirk(struct pci_dev *dev, const char *name, unsigned int port, unsigned int enable)
 {
 	u32 devres;
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v11 09/60] powerpc/PCI: Keep resource idx order with bridge register number
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (7 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 08/60] PCI: Kill wrong quirk about M7101 Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 10/60] powerpc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing Yinghai Lu
                   ` (51 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

Same as the sparc version.

Make the resources follow a consistent order,
like other archs or like what pci_read_bridge_bases() produces directly,
even when non-pref mmio is missing or the firmware reports ranges out of order.

Just reserve i = 1 for non-pref mmio, and i = 2 for pref mmio.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/powerpc/kernel/pci_of_scan.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index 526ac67..719f225 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -252,7 +252,7 @@ void of_scan_pci_bridge(struct pci_dev *dev)
 		bus->resource[i] = res;
 		++res;
 	}
-	i = 1;
+	i = 3;
 	for (; len >= 32; len -= 32, ranges += 8) {
 		flags = pci_parse_of_flags(of_read_number(ranges, 1), 1);
 		size = of_read_number(&ranges[6], 2);
@@ -265,6 +265,12 @@ void of_scan_pci_bridge(struct pci_dev *dev)
 				       " for bridge %s\n", node->full_name);
 				continue;
 			}
+		} else if ((flags & IORESOURCE_PREFETCH) &&
+			   !bus->resource[2]->flags) {
+			res = bus->resource[2];
+		} else if (((flags & (IORESOURCE_MEM | IORESOURCE_PREFETCH)) ==
+			    IORESOURCE_MEM) && !bus->resource[1]->flags) {
+			res = bus->resource[1];
 		} else {
 			if (i >= PCI_NUM_RESOURCES - PCI_BRIDGE_RESOURCES) {
 				printk(KERN_ERR "PCI: too many memory ranges"
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v11 10/60] powerpc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (8 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 09/60] powerpc/PCI: Keep resource idx order with bridge register number Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 11/60] OF/PCI: Add IORESOURCE_MEM_64 for 64-bit resource Yinghai Lu
                   ` (50 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu, Paul Mackerras, Michael Ellerman, Gavin Shan,
	Anton Blanchard, linuxppc-dev

When setting the PREF bit for a device resource under a bridge's 64-bit pref
window, we need to make sure we only set PREF for 64-bit resources.

This patch sets IORESOURCE_MEM_64 for 64-bit resources during OF device
resource flags parsing.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=96261
Link: https://bugzilla.kernel.org/show_bug.cgi?id=96241
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: Yijing Wang <wangyijing@huawei.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/kernel/pci_of_scan.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index 719f225..476b8ac5 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -44,8 +44,10 @@ static unsigned int pci_parse_of_flags(u32 addr0, int bridge)
 
 	if (addr0 & 0x02000000) {
 		flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
-		flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
 		flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
+		if (addr0 & 0x01000000)
+			flags |= IORESOURCE_MEM_64
+				 | PCI_BASE_ADDRESS_MEM_TYPE_64;
 		if (addr0 & 0x40000000)
 			flags |= IORESOURCE_PREFETCH
 				 | PCI_BASE_ADDRESS_MEM_PREFETCH;
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v11 11/60] OF/PCI: Add IORESOURCE_MEM_64 for 64-bit resource
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (9 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 10/60] powerpc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 12/60] PCI: Check pref compatible bit for mem64 resource of PCIe device Yinghai Lu
                   ` (49 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu, Grant Likely, Rob Herring, devicetree

When setting the PREF bit for a device resource under a bridge's 64-bit pref
window, we need to make sure we only set PREF for 64-bit resources.

This patch sets IORESOURCE_MEM_64 for 64-bit resources during OF device
resource flags parsing.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=96261
Link: https://bugzilla.kernel.org/show_bug.cgi?id=96241
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
---
 drivers/of/address.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 91a469d..3b09261 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -128,9 +128,11 @@ static unsigned int of_bus_pci_get_flags(const __be32 *addr)
 		flags |= IORESOURCE_IO;
 		break;
 	case 0x02: /* 32 bits */
-	case 0x03: /* 64 bits */
 		flags |= IORESOURCE_MEM;
 		break;
+	case 0x03: /* 64 bits */
+		flags |= IORESOURCE_MEM | IORESOURCE_MEM_64;
+		break;
 	}
 	if (w & 0x40000000)
 		flags |= IORESOURCE_PREFETCH;
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v11 12/60] PCI: Check pref compatible bit for mem64 resource of PCIe device
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (10 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 11/60] OF/PCI: Add IORESOURCE_MEM_64 for 64-bit resource Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 13/60] PCI: Only treat non-pref mmio64 as pref if all bridges have MEM_64 Yinghai Lu
                   ` (48 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

We still get a "no compatible bridge window" warning on sparc T5-8
after adding support for 64-bit resource parsing for the root bus.

 PCI: scan_bus[/pci@300/pci@1/pci@0/pci@6] bus no 8
 PCI: Claiming 0000:00:01.0: Resource 15: 0000800100000000..00008004afffffff [220c]
 PCI: Claiming 0000:01:00.0: Resource 15: 0000800100000000..00008004afffffff [220c]
 PCI: Claiming 0000:02:04.0: Resource 15: 0000800100000000..000080012fffffff [220c]
 PCI: Claiming 0000:03:00.0: Resource 15: 0000800100000000..000080012fffffff [220c]
 PCI: Claiming 0000:04:06.0: Resource 14: 0000800100000000..000080010fffffff [220c]
 PCI: Claiming 0000:05:00.0: Resource 0: 0000800100000000..0000800100001fff [204]
 pci 0000:05:00.0: can't claim BAR 0 [mem 0x800100000000-0x800100001fff]: no compatible bridge window

All of the bridges' 64-bit resources have the pref bit set, but the device
resource does not, so we cannot find a parent for the device resource:
non-pref mmio cannot be placed under pref mmio.

According to the PCIe spec errata
https://www.pcisig.com/specifications/pciexpress/base2/PCIe_Base_r2.1_Errata_08Jun10.pdf
page 13, in some cases it is ok to mark such a resource as pref.

Mark the device if the entire path from the host to the adapter is over PCI
Express. Then set the pref compatible bit during claim/sizing/assign for
64-bit mem resources on that PCIe device.
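
As a rough illustration only (not part of the patch), the rule can be modelled
standalone as below. The sk_* types and helpers are hypothetical
simplifications of struct pci_bus / struct pci_dev, and the SK_* constants are
stand-ins for the kernel's IORESOURCE_* bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the kernel's IORESOURCE_* bits (assumed values). */
#define SK_IORESOURCE_MEM      0x00000200UL
#define SK_IORESOURCE_PREFETCH 0x00002000UL
#define SK_IORESOURCE_MEM_64   0x00100000UL

/* Hypothetical simplified bus: parent == NULL marks the root bus. */
struct sk_bus {
	struct sk_bus *parent;
	bool bridge_is_pcie;	/* bridge leading to this bus is PCIe */
};

/* Hypothetical simplified device. */
struct sk_dev {
	struct sk_bus *bus;
	bool is_pcie;
};

/* True when every bridge between this bus and the root is PCIe. */
bool sk_up_path_over_pcie(const struct sk_bus *bus)
{
	for (; bus->parent; bus = bus->parent)
		if (!bus->bridge_is_pcie)
			return false;
	return true;
}

/* Return the flags to use for claim/sizing/assign of one resource. */
unsigned long sk_pref_compatible(const struct sk_dev *dev, unsigned long flags)
{
	bool all_pcie = dev->is_pcie && sk_up_path_over_pcie(dev->bus);

	if (flags & SK_IORESOURCE_PREFETCH)
		return flags;

	if ((flags & SK_IORESOURCE_MEM) &&
	    (flags & SK_IORESOURCE_MEM_64) && all_pcie)
		return flags | SK_IORESOURCE_PREFETCH;

	return flags;
}
```

Only 64-bit non-pref memory on an all-PCIe path gains the pref bit; 32-bit
memory, or any path with a non-PCIe bridge, keeps its original flags.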

-v2: set pref for mmio64 when the whole path is PCI Express, per David Miller.
-v3: don't set pref directly; change to UNDER_PREF, set PREF before sizing
     and assigning resources, and clear PREF afterwards. Requested by BenH.
-v4: use the on_all_pcie_path device flag instead.
-v6: update after the pci_find_bus_resource() change.

Fixes: commit d63e2e1f3df9 ("sparc/PCI: Clip bridge windows to fit in upstream windows")
Link: http://lkml.kernel.org/r/CAE9FiQU1gJY1LYrxs+ma5LCTEEe4xmtjRG0aXJ9K_Tsu+m9Wuw@mail.gmail.com
Reported-by: David Ahern <david.ahern@oracle.com>
Tested-by: David Ahern <david.ahern@oracle.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=81431
Tested-by: TJ <linux@iam.tj>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
---
 arch/sparc/kernel/pci.c        |  2 +-
 arch/sparc/kernel/pci_common.c |  2 +-
 drivers/pci/pci.c              |  8 +++++---
 drivers/pci/pci.h              |  2 ++
 drivers/pci/probe.c            | 33 +++++++++++++++++++++++++++++++++
 drivers/pci/setup-bus.c        | 23 +++++++++++++++++++----
 drivers/pci/setup-res.c        |  4 ++++
 include/linux/pci.h            |  3 ++-
 8 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index 9415abc..e46e739 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -761,7 +761,7 @@ static int __pci_mmap_make_offset_bus(struct pci_dev *pdev, struct vm_area_struc
 	bus = pdev->bus;
 	while (bus->parent)
 		bus = bus->parent;
-	root_bus_res = pci_find_bus_resource(bus, &res);
+	root_bus_res = pci_find_bus_resource(bus, &res, res.flags);
 	if (!root_bus_res)
 		return -EINVAL;
 
diff --git a/arch/sparc/kernel/pci_common.c b/arch/sparc/kernel/pci_common.c
index 1ebc7ff..6f206a1 100644
--- a/arch/sparc/kernel/pci_common.c
+++ b/arch/sparc/kernel/pci_common.c
@@ -343,7 +343,7 @@ static void pci_register_region(struct pci_bus *bus, const char *name,
 	region.start = rstart;
 	region.end = rstart + size - 1UL;
 	pcibios_bus_to_resource(bus, res, &region);
-	bus_res = pci_find_bus_resource(bus, res);
+	bus_res = pci_find_bus_resource(bus, res, res->flags);
 	if (!bus_res) {
 		kfree(res);
 		return;
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 313dea5..bd72df3 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -415,7 +415,7 @@ int pci_find_ht_capability(struct pci_dev *dev, int ht_cap)
 EXPORT_SYMBOL_GPL(pci_find_ht_capability);
 
 struct resource *pci_find_bus_resource(const struct pci_bus *bus,
-					struct resource *res)
+					struct resource *res, int flags)
 {
 	struct resource *r;
 	int i;
@@ -430,7 +430,7 @@ struct resource *pci_find_bus_resource(const struct pci_bus *bus,
 			 * not, the allocator made a mistake.
 			 */
 			if (r->flags & IORESOURCE_PREFETCH &&
-			    !(res->flags & IORESOURCE_PREFETCH))
+			    !(flags & IORESOURCE_PREFETCH))
 				return NULL;
 
 			/*
@@ -458,7 +458,9 @@ struct resource *pci_find_bus_resource(const struct pci_bus *bus,
 struct resource *pci_find_parent_resource(const struct pci_dev *dev,
 					  struct resource *res)
 {
-	return pci_find_bus_resource(dev->bus, res);
+	int flags = pci_resource_pref_compatible(dev, res);
+
+	return pci_find_bus_resource(dev->bus, res, flags);
 }
 EXPORT_SYMBOL(pci_find_parent_resource);
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d0fb934..90e6e3e 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -335,4 +335,6 @@ static inline int pci_dev_specific_reset(struct pci_dev *dev, int probe)
 }
 #endif
 
+int pci_resource_pref_compatible(const struct pci_dev *dev,
+				 struct resource *res);
 #endif /* DRIVERS_PCI_H */
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 8004f67..48e6f29 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1731,6 +1731,36 @@ static void pci_dma_configure(struct pci_dev *dev)
 	pci_put_host_bridge_device(bridge);
 }
 
+static bool pci_up_path_over_pcie(struct pci_bus *bus)
+{
+	if (pci_is_root_bus(bus))
+		return true;
+
+	if (bus->self && !pci_is_pcie(bus->self))
+		return false;
+
+	return pci_up_path_over_pcie(bus->parent);
+}
+
+/*
+ * According to
+ * https://www.pcisig.com/specifications/pciexpress/base2/PCIe_Base_r2.1_Errata_08Jun10.pdf
+ * page 13, system firmware could put some 64bit non-pref under 64bit pref,
+ * in some cases.
+ * Let's mark if the entire path from the host to the adapter is over PCI
+ * Express. Later we will use that to compute the pref compatible bit.
+ */
+static void pci_set_on_all_pcie_path(struct pci_dev *dev)
+{
+	if (!pci_is_pcie(dev))
+		return;
+
+	if (!pci_up_path_over_pcie(dev->bus))
+		return;
+
+	dev->on_all_pcie_path = 1;
+}
+
 void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 {
 	int ret;
@@ -1761,6 +1791,9 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 	/* Initialize various capabilities */
 	pci_init_capabilities(dev);
 
+	/* After pcie_cap is assigned */
+	pci_set_on_all_pcie_path(dev);
+
 	/*
 	 * Add the device to our list of discovered devices
 	 * and the bus list for fixup functions, etc.
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 55641a3..b3b1565 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -738,6 +738,20 @@ int pci_claim_bridge_resource(struct pci_dev *bridge, int i)
 	return -EINVAL;
 }
 
+int pci_resource_pref_compatible(const struct pci_dev *dev,
+				 struct resource *res)
+{
+	if (res->flags & IORESOURCE_PREFETCH)
+		return res->flags;
+
+	if ((res->flags & IORESOURCE_MEM) &&
+	    (res->flags & IORESOURCE_MEM_64) &&
+	    dev->on_all_pcie_path)
+		return res->flags | IORESOURCE_PREFETCH;
+
+	return res->flags;
+}
+
 /* Check whether the bridge supports optional I/O and
    prefetchable memory ranges. If not, the respective
    base/limit registers must be read-only and read as 0. */
@@ -1035,11 +1049,12 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 			struct resource *r = &dev->resource[i];
 			resource_size_t r_size;
+			int flags = pci_resource_pref_compatible(dev, r);
 
-			if (r->parent || (r->flags & IORESOURCE_PCI_FIXED) ||
-			    ((r->flags & mask) != type &&
-			     (r->flags & mask) != type2 &&
-			     (r->flags & mask) != type3))
+			if (r->parent || (flags & IORESOURCE_PCI_FIXED) ||
+			    ((flags & mask) != type &&
+			     (flags & mask) != type2 &&
+			     (flags & mask) != type3))
 				continue;
 			r_size = resource_size(r);
 #ifdef CONFIG_PCI_IOV
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 66c4d8f..f741fed 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -257,15 +257,19 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 static int _pci_assign_resource(struct pci_dev *dev, int resno,
 				resource_size_t size, resource_size_t min_align)
 {
+	struct resource *res = dev->resource + resno;
+	int old_flags = res->flags;
 	struct pci_bus *bus;
 	int ret;
 
+	res->flags = pci_resource_pref_compatible(dev, res);
 	bus = dev->bus;
 	while ((ret = __pci_assign_resource(bus, dev, resno, size, min_align))) {
 		if (!bus->parent || !bus->self->transparent)
 			break;
 		bus = bus->parent;
 	}
+	res->flags = old_flags;
 
 	return ret;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 795b4c7..1527735 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -305,6 +305,7 @@ struct pci_dev {
 						   powered on/off by the
 						   corresponding bridge */
 	unsigned int	ignore_hotplug:1;	/* Ignore hotplug events */
+	unsigned int	on_all_pcie_path:1;	/* up to host-bridge all pcie */
 	unsigned int	d3_delay;	/* D3->D0 transition time in ms */
 	unsigned int	d3cold_delay;	/* D3cold->D0 transition time in ms */
 
@@ -808,7 +809,7 @@ void pcibios_resource_to_bus(struct pci_bus *bus, struct pci_bus_region *region,
 void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
 			     struct pci_bus_region *region);
 struct resource *pci_find_bus_resource(const struct pci_bus *bus,
-					struct resource *res);
+					struct resource *res, int flags);
 void pcibios_scan_specific_bus(int busn);
 struct pci_bus *pci_find_bus(int domain, int busnr);
 void pci_bus_add_devices(const struct pci_bus *bus);
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v11 13/60] PCI: Only treat non-pref mmio64 as pref if all bridges have MEM_64
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (11 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 12/60] PCI: Check pref compatible bit for mem64 resource of PCIe device Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 14/60] PCI: Add has_mem64 for struct host_bridge Yinghai Lu
                   ` (47 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

If any bridge on the path up to the root only has 32-bit pref mmio, we don't
need to treat the device's non-pref mmio64 as pref mmio64.

We need to call pci_bridge_check_ranges() earlier. A parent bridge's pref mmio
BAR may not have been allocated by the BIOS, in which case its res flags are
still 0; we need them set correctly before we check them for
child device resources.
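
As a rough illustration only (not part of the patch), the up-path check can be
modelled standalone as below. struct sk_bridge_bus and sk_up_path_over_mem64()
are hypothetical simplifications; window_flags models the flags of the bus
resources of each bridge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the kernel's IORESOURCE_* bits (assumed values). */
#define SK_IORESOURCE_MEM      0x00000200UL
#define SK_IORESOURCE_PREFETCH 0x00002000UL
#define SK_IORESOURCE_MEM_64   0x00100000UL

/* Hypothetical simplified bus: parent == NULL marks the root bus. */
struct sk_bridge_bus {
	const struct sk_bridge_bus *parent;
	const unsigned long *window_flags;	/* flags of each bus resource */
	size_t nr_windows;
};

/* True when every bridge up to the root has at least one MEM_64 window. */
bool sk_up_path_over_mem64(const struct sk_bridge_bus *bus)
{
	for (; bus->parent; bus = bus->parent) {
		bool found = false;
		size_t i;

		for (i = 0; i < bus->nr_windows; i++) {
			if (bus->window_flags[i] & SK_IORESOURCE_MEM_64) {
				found = true;
				break;
			}
		}
		if (!found)
			return false;
	}
	return true;
}
```

A single bridge with only a 32-bit pref window anywhere on the path is enough
to make the check fail for every device below it.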

-v2: check all bus resources instead of just res[15].

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
---
 drivers/pci/setup-bus.c | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index b3b1565..ffb1941 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -738,6 +738,29 @@ int pci_claim_bridge_resource(struct pci_dev *bridge, int i)
 	return -EINVAL;
 }
 
+static bool pci_up_path_over_pref_mem64(struct pci_bus *bus)
+{
+	if (pci_is_root_bus(bus))
+		return true;
+
+	if (bus->self) {
+		int i;
+		bool found = false;
+		struct resource *res;
+
+		pci_bus_for_each_resource(bus, res, i)
+			if (res->flags & IORESOURCE_MEM_64) {
+				found = true;
+				break;
+			}
+
+		if (!found)
+			return false;
+	}
+
+	return pci_up_path_over_pref_mem64(bus->parent);
+}
+
 int pci_resource_pref_compatible(const struct pci_dev *dev,
 				 struct resource *res)
 {
@@ -746,7 +769,8 @@ int pci_resource_pref_compatible(const struct pci_dev *dev,
 
 	if ((res->flags & IORESOURCE_MEM) &&
 	    (res->flags & IORESOURCE_MEM_64) &&
-	    dev->on_all_pcie_path)
+	    dev->on_all_pcie_path &&
+	    pci_up_path_over_pref_mem64(dev->bus))
 		return res->flags | IORESOURCE_PREFETCH;
 
 	return res->flags;
@@ -1239,6 +1263,10 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 	struct resource *b_res;
 	int ret;
 
+	if (!pci_is_root_bus(bus) &&
+	    (bus->self->class >> 8) == PCI_CLASS_BRIDGE_PCI)
+		pci_bridge_check_ranges(bus);
+
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		struct pci_bus *b = dev->subordinate;
 		if (!b)
@@ -1266,7 +1294,6 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 		break;
 
 	case PCI_CLASS_BRIDGE_PCI:
-		pci_bridge_check_ranges(bus);
 		if (bus->self->is_hotplug_bridge) {
 			additional_io_size  = pci_hotplug_io_size;
 			additional_mem_size = pci_hotplug_mem_size;
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v11 14/60] PCI: Add has_mem64 for struct host_bridge
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (12 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 13/60] PCI: Only treat non-pref mmio64 as pref if all bridges have MEM_64 Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 15/60] PCI: Only treat non-pref mmio64 as pref if host bridge has mmio64 Yinghai Lu
                   ` (46 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

Add has_mem64 to struct pci_host_bridge. It is not set on a root bus that
does not support mmio64 above 4G.

We will use that info in the next two patches:
1. Don't treat non-pref mmio64 as pref mmio, so it will not be put
   under the bridge's pref range when rescanning the devices.
2. Keep pref mmio64 and pref mmio32 under the bridge pref BAR.
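
As a rough illustration only (not part of the patch), the two checks applied
to each root bus memory window can be written standalone as below. The helper
names are hypothetical; start, end and offset stand for res->start, res->end
and the window's CPU-to-bus offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Window ends above 4G in bus address space: host bridge has mmio64. */
bool sk_window_reaches_above_4g(uint64_t end, uint64_t offset)
{
	return (end - offset) > 0xffffffffULL;
}

/* Window starts above 4G in bus address space: tag it as MEM_64. */
bool sk_window_entirely_above_4g(uint64_t start, uint64_t offset)
{
	return (start - offset) > 0xffffffffULL;
}
```

The comparison is done on bus addresses, so a window at a high CPU address
with a large offset can still count as 32-bit.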

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
---
 drivers/pci/probe.c | 7 +++++++
 include/linux/pci.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 48e6f29..6b079f4 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2230,6 +2230,13 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
 		} else
 			bus_addr[0] = '\0';
 		dev_info(&b->dev, "root bus resource %pR%s\n", res, bus_addr);
+
+		if (resource_type(res) == IORESOURCE_MEM) {
+			if ((res->end - offset) > 0xffffffff)
+				bridge->has_mem64 = 1;
+			if ((res->start - offset) > 0xffffffff)
+				res->flags |= IORESOURCE_MEM_64;
+		}
 	}
 
 	down_write(&pci_bus_sem);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1527735..979be25 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -408,6 +408,7 @@ struct pci_host_bridge {
 	void (*release_fn)(struct pci_host_bridge *);
 	void *release_data;
 	unsigned int ignore_reset_delay:1;	/* for entire hierarchy */
+	unsigned int has_mem64:1;
 	/* Resource alignment requirements */
 	resource_size_t (*align_resource)(struct pci_dev *dev,
 			const struct resource *res,
-- 
1.8.4.5

* [PATCH v11 15/60] PCI: Only treat non-pref mmio64 as pref if host bridge has mmio64
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (13 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 14/60] PCI: Add has_mem64 for struct host_bridge Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 16/60] PCI: Restore pref MMIO allocation logic for host bridge without mmio64 Yinghai Lu
                   ` (45 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

If the host bridge does not have mmio64 above 4G, we don't need to
treat device non-pref mmio64 as pref mmio64.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
---
 drivers/pci/setup-bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index ffb1941..9404032 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -741,7 +741,7 @@ int pci_claim_bridge_resource(struct pci_dev *bridge, int i)
 static bool pci_up_path_over_pref_mem64(struct pci_bus *bus)
 {
 	if (pci_is_root_bus(bus))
-		return true;
+		return to_pci_host_bridge(bus->bridge)->has_mem64;
 
 	if (bus->self) {
 		int i;
-- 
1.8.4.5

* [PATCH v11 16/60] PCI: Restore pref MMIO allocation logic for host bridge without mmio64
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (14 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 15/60] PCI: Only treat non-pref mmio64 as pref if host bridge has mmio64 Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 17/60] PCI: Don't release fixed resource for realloc Yinghai Lu
                   ` (44 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

Since 5b2854155 (PCI: Restrict 64-bit prefetchable bridge windows to 64-bit
resources), the logic for pref mmio allocation has changed:
when the bridge pref window supports mmio64, we only put children pref
resources that support mmio64 into it, and put children pref mmio32
into the bridge's non-pref mmio32 window.

That can leave the bridge pref BAR unused when that pref BAR is mmio64
and the children resources are only mmio32.
It can also cause allocation failure when the non-pref mmio32 window is
not big enough for those children pref mmio32 resources.

That makes no sense when the host bridge does not have 64-bit mmio
above 4G at all.

Restore the old logic: when the host bridge does not have has_mem64 set,
put children pref mmio64 and pref mmio32 all under the bridge's pref BARs.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
---
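Note (illustration only, not part of the patch): the effect of dropping
IORESOURCE_MEM_64 from the type mask can be seen in a small user-space
model. The sk_* names and flag values are made up; the XOR test is meant to
mirror the "type_mask must match" check done when picking a bus window, and
is written here as an assumption about that check.

```c
#include <stdbool.h>

#define SK_PREFETCH 0x1u	/* hypothetical stand-in for IORESOURCE_PREFETCH */
#define SK_MEM_64   0x2u	/* hypothetical stand-in for IORESOURCE_MEM_64 */

/* Type mask used by the first allocation attempt. */
static unsigned int sk_first_try_mask(bool host_has_mem64)
{
	unsigned int mmio64 = host_has_mem64 ? SK_MEM_64 : 0;

	return SK_PREFETCH | mmio64;
}

/* A window is eligible only if it agrees with the resource on every masked bit. */
static bool sk_window_matches(unsigned int res_flags, unsigned int win_flags,
			      unsigned int mask)
{
	return ((res_flags ^ win_flags) & mask) == 0;
}
```

With a pref mmio32 child and a 64-bit pref bridge window, the window is
rejected while the host bridge has mmio64 and accepted once it does not,
which is the old behaviour this patch restores.
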
 drivers/pci/bus.c       |  4 +++-
 drivers/pci/setup-bus.c | 13 +++++++++----
 drivers/pci/setup-res.c |  9 ++++++---
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 6c9f546..200fdac 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -204,8 +204,10 @@ int pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 {
 #ifdef CONFIG_PCI_BUS_ADDR_T_64BIT
 	int rc;
+	unsigned long mmio64 = pci_find_host_bridge(bus)->has_mem64 ?
+				IORESOURCE_MEM_64 : 0;
 
-	if (res->flags & IORESOURCE_MEM_64) {
+	if (res->flags & mmio64) {
 		rc = pci_bus_alloc_from_region(bus, res, size, align, min,
 					       type_mask, alignf, alignf_data,
 					       &pci_high);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 9404032..0845a57 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1311,7 +1311,8 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 		b_res = &bus->self->resource[PCI_BRIDGE_RESOURCES];
 		mask = IORESOURCE_MEM;
 		prefmask = IORESOURCE_MEM | IORESOURCE_PREFETCH;
-		if (b_res[2].flags & IORESOURCE_MEM_64) {
+		if ((b_res[2].flags & IORESOURCE_MEM_64) &&
+		    pci_find_host_bridge(bus)->has_mem64) {
 			prefmask |= IORESOURCE_MEM_64;
 			ret = pbus_size_mem(bus, prefmask, prefmask,
 				  prefmask, prefmask,
@@ -1513,17 +1514,21 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
 	 *	  io port.
 	 *     2. if there is non pref mmio assign fail, release bridge
 	 *	  nonpref mmio.
-	 *     3. if there is 64bit pref mmio assign fail, and bridge pref
+	 *     3. if there is pref mmio assign fail, and host bridge does
+	 *	  not have 64bit mmio, release bridge pref mmio.
+	 *     4. if there is 64bit pref mmio assign fail, and bridge pref
 	 *	  is 64bit, release bridge pref mmio.
-	 *     4. if there is pref mmio assign fail, and bridge pref is
+	 *     5. if there is pref mmio assign fail, and bridge pref is
 	 *	  32bit mmio, release bridge pref mmio
-	 *     5. if there is pref mmio assign fail, and bridge pref is not
+	 *     6. if there is pref mmio assign fail, and bridge pref is not
 	 *	  assigned, release bridge nonpref mmio.
 	 */
 	if (type & IORESOURCE_IO)
 		idx = 0;
 	else if (!(type & IORESOURCE_PREFETCH))
 		idx = 1;
+	else if (!pci_find_host_bridge(bus)->has_mem64)
+		idx = 2;
 	else if ((type & IORESOURCE_MEM_64) &&
 		 (b_res[2].flags & IORESOURCE_MEM_64))
 		idx = 2;
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index f741fed..59271ee 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -212,6 +212,8 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 	struct resource *res = dev->resource + resno;
 	resource_size_t min;
 	int ret;
+	unsigned long mmio64 = pci_find_host_bridge(bus)->has_mem64 ?
+				IORESOURCE_MEM_64 : 0;
 
 	min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM;
 
@@ -223,7 +225,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 	 * things differently than they were sized, not everything will fit.
 	 */
 	ret = pci_bus_alloc_resource(bus, res, size, align, min,
-				     IORESOURCE_PREFETCH | IORESOURCE_MEM_64,
+				     IORESOURCE_PREFETCH | mmio64,
 				     pcibios_align_resource, dev);
 	if (ret == 0)
 		return 0;
@@ -232,7 +234,8 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 	 * If the prefetchable window is only 32 bits wide, we can put
 	 * 64-bit prefetchable resources in it.
 	 */
-	if ((res->flags & (IORESOURCE_PREFETCH | IORESOURCE_MEM_64)) ==
+	if (mmio64 &&
+	    (res->flags & (IORESOURCE_PREFETCH | IORESOURCE_MEM_64)) ==
 	     (IORESOURCE_PREFETCH | IORESOURCE_MEM_64)) {
 		ret = pci_bus_alloc_resource(bus, res, size, align, min,
 					     IORESOURCE_PREFETCH,
@@ -247,7 +250,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 	 * non-prefetchable, the first call already tried the only possibility
 	 * so we don't need to try again.
 	 */
-	if (res->flags & (IORESOURCE_PREFETCH | IORESOURCE_MEM_64))
+	if (res->flags & (IORESOURCE_PREFETCH | mmio64))
 		ret = pci_bus_alloc_resource(bus, res, size, align, min, 0,
 					     pcibios_align_resource, dev);
 
-- 
1.8.4.5

* [PATCH v11 17/60] PCI: Don't release fixed resource for realloc
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (15 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 16/60] PCI: Restore pref MMIO allocation logic for host bridge without mmio64 Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 18/60] PCI: Claim fixed resource during remove/rescan path Yinghai Lu
                   ` (43 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu, stable

We should not release a bridge resource if there are fixed resources
under it; otherwise the firmware on the child devices would stop working.

Reported-by: Paul Johnson <pjay@nwtrail.com>
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=92351
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: stable@vger.kernel.org
---
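Note (illustration only, not part of the patch): the recursive check behind
the new release_child_resources() return value can be sketched in user space
as below. The sk_* names and the simplified struct are made up, and the
resource_lock handling is left out on purpose.

```c
#include <stdbool.h>
#include <stddef.h>

#define SK_FIXED 0x1u	/* hypothetical stand-in for IORESOURCE_PCI_FIXED */

struct sk_res {
	unsigned int flags;
	struct sk_res *child;	/* first child */
	struct sk_res *sibling;	/* next resource at the same level */
};

/* Return true if any descendant of r carries the fixed flag. */
static bool sk_has_fixed_child(const struct sk_res *r)
{
	const struct sk_res *p;

	for (p = r->child; p; p = p->sibling) {
		if (p->flags & SK_FIXED)
			return true;
		if (sk_has_fixed_child(p))
			return true;
	}

	return false;
}

/* Detach all children unless one is fixed; return true if they were released. */
static bool sk_release_children(struct sk_res *r)
{
	if (sk_has_fixed_child(r))
		return false;

	r->child = NULL;
	return true;
}
```

The caller can then skip releasing the bridge window itself whenever the
helper reports that nothing was released.
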
 drivers/pci/setup-bus.c |  6 ++++--
 include/linux/ioport.h  |  2 +-
 kernel/resource.c       | 28 ++++++++++++++++++++++++++--
 3 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 0845a57..815d2de 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1540,14 +1540,16 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
 
 	r = &b_res[idx];
 
-	if (!r->parent)
+	if (!r->parent || r->flags & IORESOURCE_PCI_FIXED)
 		return;
 
 	/*
 	 * if there are children under that, we should release them
 	 *  all
 	 */
-	release_child_resources(r);
+	if (!release_child_resources(r))
+		return;
+
 	if (!release_resource(r)) {
 		type = old_flags = r->flags & type_mask;
 		dev_printk(KERN_DEBUG, &dev->dev, "resource %d %pR released\n",
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 0b65543..9053ac9 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -165,7 +165,7 @@ extern struct resource iomem_resource;
 extern struct resource *request_resource_conflict(struct resource *root, struct resource *new);
 extern int request_resource(struct resource *root, struct resource *new);
 extern int release_resource(struct resource *new);
-void release_child_resources(struct resource *new);
+bool release_child_resources(struct resource *new);
 extern void reserve_region_with_split(struct resource *root,
 			     resource_size_t start, resource_size_t end,
 			     const char *name);
diff --git a/kernel/resource.c b/kernel/resource.c
index 2e78ead..c5dbe02 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -285,11 +285,35 @@ static void __release_child_resources(struct resource *r)
 	}
 }
 
-void release_child_resources(struct resource *r)
+static bool __has_fixed_child_resources(struct resource *r)
 {
+	struct resource *p;
+
+	p = r->child;
+	while (p) {
+		if (p->flags & IORESOURCE_PCI_FIXED)
+			return true;
+
+		if (__has_fixed_child_resources(p))
+			return true;
+
+		p = p->sibling;
+	}
+
+	return false;
+}
+
+bool release_child_resources(struct resource *r)
+{
+	bool fixed;
+
 	write_lock(&resource_lock);
-	__release_child_resources(r);
+	fixed = __has_fixed_child_resources(r);
+	if (!fixed)
+		__release_child_resources(r);
 	write_unlock(&resource_lock);
+
+	return !fixed;
 }
 
 /**
-- 
1.8.4.5

* [PATCH v11 18/60] PCI: Claim fixed resource during remove/rescan path
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (16 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 17/60] PCI: Don't release fixed resource for realloc Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 19/60] PCI: Set resource to FIXED for LSI devices Yinghai Lu
                   ` (42 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

During driver loading, the kernel checks whether resources are reserved,
so we need to make sure resources get reserved before pci_bus_add().

On the remove/rescan path, those fixed resources are left unreserved.
In that path we don't call pcibios_resource_survey() before
pci_assign_unassigned_bus_resources(); that is intentional, so that
rescan gets new resources. We do need rescan to allocate resources for
the device while ignoring the resources allocated by the BIOS.

But fixed resources are not allocated via
pci_assign_unassigned_bus_resources(), so we need to reserve them
explicitly.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/quirks.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 21d545d..d11af7f 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -335,6 +335,23 @@ static void quirk_s3_64M(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_S3,	PCI_DEVICE_ID_S3_868,		quirk_s3_64M);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_S3,	PCI_DEVICE_ID_S3_968,		quirk_s3_64M);
 
+/* for pci remove and rescan */
+static void quirk_allocate_fixed(struct pci_dev *dev)
+{
+	int i;
+	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+		struct resource *r = &dev->resource[i];
+
+		if (r->parent ||
+		    !(r->flags & IORESOURCE_PCI_FIXED) ||
+		    !(r->flags & (IORESOURCE_IO | IORESOURCE_MEM)))
+			continue;
+
+		pci_claim_resource(dev, i);
+	}
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID,	PCI_ANY_ID,	quirk_allocate_fixed);
+
 static void quirk_io(struct pci_dev *dev, int pos, unsigned size,
 		     const char *name)
 {
-- 
1.8.4.5

* [PATCH v11 19/60] PCI: Set resource to FIXED for LSI devices
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (17 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 18/60] PCI: Claim fixed resource during remove/rescan path Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 20/60] PCI: Separate realloc list checking after allocation Yinghai Lu
                   ` (41 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu, stable

LSI HBA firmware stops responding to PCI reads from the host if the PCI
core ever changes the device BAR values.

Set their resources to FIXED so that realloc skips them.

v2: check if start is 0.

Reported-by: Paul Johnson <pjay@nwtrail.com>
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=92351
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: stable@vger.kernel.org
---
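Note (illustration only, not part of the patch): the marking rule of the
quirk, "skip BARs that are unassigned or unused, and do nothing when the
user forced realloc", can be modelled in user space as below. The sk_* names
and flag value are made up, and the sketch assumes six standard BARs.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_NUM_BARS 6		/* assumption: six standard BARs */
#define SK_FIXED    0x10u	/* hypothetical stand-in for IORESOURCE_PCI_FIXED */

struct sk_bar {
	uint64_t start;		/* 0 means firmware left it unassigned */
	unsigned int flags;	/* 0 means the BAR is not implemented */
};

/* Mark every assigned BAR as fixed; return how many were marked. */
static int sk_mark_fixed(struct sk_bar bars[SK_NUM_BARS], bool user_realloc)
{
	int i, marked = 0;

	if (user_realloc)
		return 0;

	for (i = 0; i < SK_NUM_BARS; i++) {
		if (!bars[i].start || !bars[i].flags)
			continue;
		bars[i].flags |= SK_FIXED;
		marked++;
	}

	return marked;
}
```

BARs with a zero start stay movable, so the kernel can still assign them.
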
 drivers/pci/pci.h       |  1 +
 drivers/pci/quirks.c    | 20 ++++++++++++++++++++
 drivers/pci/setup-bus.c |  4 ++++
 3 files changed, 25 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 90e6e3e..0ac4229 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -170,6 +170,7 @@ static inline void pci_msix_clear_and_set_ctrl(struct pci_dev *dev, u16 clear, u
 }
 
 void pci_realloc_get_opt(char *);
+bool pci_realloc_user_enabled(void);
 
 static inline int pci_no_d1d2(struct pci_dev *dev)
 {
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index d11af7f..a7cd617 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -335,6 +335,26 @@ static void quirk_s3_64M(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_S3,	PCI_DEVICE_ID_S3_868,		quirk_s3_64M);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_S3,	PCI_DEVICE_ID_S3_968,		quirk_s3_64M);
 
+/*
+ * LSI device firmware does not tolerate BAR changes
+ */
+static void quirk_bar_fixed(struct pci_dev *dev)
+{
+	int i;
+
+	if (pci_realloc_user_enabled())
+		return;
+
+	for (i = 0; i <= PCI_STD_RESOURCE_END; i++) {
+		struct resource *r = &dev->resource[i];
+
+		if (!r->start || !r->flags)
+			continue;
+		r->flags |= IORESOURCE_PCI_FIXED;
+	}
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC,	PCI_ANY_ID,	quirk_bar_fixed);
+
 /* for pci remove and rescan */
 static void quirk_allocate_fixed(struct pci_dev *dev)
 {
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 815d2de..6385cf7 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1677,6 +1677,10 @@ void __init pci_realloc_get_opt(char *str)
 	else if (!strncmp(str, "on", 2))
 		pci_realloc_enable = user_enabled;
 }
+bool pci_realloc_user_enabled(void)
+{
+	return pci_realloc_enable == user_enabled;
+}
 static bool pci_realloc_enabled(enum enable_type enable)
 {
 	return enable >= user_enabled;
-- 
1.8.4.5

* [PATCH v11 20/60] PCI: Separate realloc list checking after allocation
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (18 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 19/60] PCI: Set resource to FIXED for LSI devices Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 21/60] PCI: Treat optional as required in first try for bridge rescan Yinghai Lu
                   ` (40 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu, Rafael J. Wysocki, Len Brown, linux-acpi

The realloc list must be empty after allocation, and we check that.

Move the realloc list check into a separate function.

Add the check that was missing in the acpiphp driver.

-v2: change from BUG_ON to WARN_ON according to Rafael.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-acpi@vger.kernel.org
---
 drivers/pci/hotplug/acpiphp_glue.c |  1 +
 drivers/pci/pci.h                  |  1 +
 drivers/pci/setup-bus.c            | 12 +++++++++---
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index fa49f91..c35983a 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -507,6 +507,7 @@ static void enable_slot(struct acpiphp_slot *slot)
 		}
 	}
 	__pci_bus_assign_resources(bus, &add_list, NULL);
+	pci_bus_check_realloc(&add_list);
 
 	acpiphp_sanitize_bus(bus);
 	pcie_bus_configure_settings(bus);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 0ac4229..0bb8eee 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -238,6 +238,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus,
 void __pci_bus_assign_resources(const struct pci_bus *bus,
 				struct list_head *realloc_head,
 				struct list_head *fail_head);
+void pci_bus_check_realloc(struct list_head *realloc_head);
 bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
 
 void pci_reassigndev_resource_alignment(struct pci_dev *dev);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 6385cf7..7d58f3f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -279,6 +279,12 @@ out:
 	}
 }
 
+void pci_bus_check_realloc(struct list_head *realloc_head)
+{
+	if (WARN_ON(!list_empty(realloc_head)))
+		free_list(realloc_head);
+}
+
 /**
  * assign_requested_resources_sorted() - satisfy resource requests
  *
@@ -1776,7 +1782,7 @@ again:
 	/* Depth last, allocate resources and update the hardware. */
 	__pci_bus_assign_resources(bus, add_list, &fail_head);
 	if (add_list)
-		BUG_ON(!list_empty(add_list));
+		pci_bus_check_realloc(add_list);
 	tried_times++;
 
 	/* any device complain? */
@@ -1851,7 +1857,7 @@ void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
 again:
 	__pci_bus_size_bridges(parent, &add_list);
 	__pci_bridge_assign_resources(bridge, &add_list, &fail_head);
-	BUG_ON(!list_empty(&add_list));
+	pci_bus_check_realloc(&add_list);
 	tried_times++;
 
 	if (list_empty(&fail_head))
@@ -1910,6 +1916,6 @@ void pci_assign_unassigned_bus_resources(struct pci_bus *bus)
 							 &add_list);
 	up_read(&pci_bus_sem);
 	__pci_bus_assign_resources(bus, &add_list, NULL);
-	BUG_ON(!list_empty(&add_list));
+	pci_bus_check_realloc(&add_list);
 }
 EXPORT_SYMBOL_GPL(pci_assign_unassigned_bus_resources);
-- 
1.8.4.5

* [PATCH v11 21/60] PCI: Treat optional as required in first try for bridge rescan
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (19 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 20/60] PCI: Separate realloc list checking after allocation Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 22/60] PCI: Get new realloc size for bridge for last try Yinghai Lu
                   ` (39 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

When rescanning a bridge/bus whose children were removed before, we should
treat optional as required, just as is done for the root bus at boot time
since 19aa7ee432ce (PCI: make re-allocation try harder by reassigning
ranges higher in the heirarchy).

The reason: the "allocate required, then expand to optional" path does
not put failed resources on the fail list, so the required info is lost
before the next try.

So we use the following approach:
1. First try and any following try before the last one:
   We don't keep a realloc list, so every optional is treated as required.
   Allocate for required+optional and put failures on the fail list.
   The size info (including must and optional separately) is then kept
   for the next try.
2. Last try:
   a: try to allocate required+optional to see if everything gets allocated.
   b: try to allocate required, then expand to optional.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
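Note (illustration only, not part of the patch): the try schedule reduces to
two small predicates. The sk_* names are made up; the comparisons follow the
diff, where tried_times counts completed tries and pci_try_num is 2.

```c
#include <stdbool.h>

/*
 * Before a try starts: should it pass a realloc list? Only the last try
 * does; earlier tries size optional resources as if they were required.
 */
static bool sk_use_add_list(int tried_times, int pci_try_num)
{
	return tried_times + 1 == pci_try_num;
}

/* After a failed try: have we used up all the tries? */
static bool sk_give_up(int tried_times, int pci_try_num)
{
	return tried_times >= pci_try_num;
}
```

With pci_try_num == 2, the first try runs without a realloc list and the
second (last) try runs with one.
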
 drivers/pci/setup-bus.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 7d58f3f..3dc4ac9 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1845,25 +1845,34 @@ void __init pci_assign_unassigned_resources(void)
 void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
 {
 	struct pci_bus *parent = bridge->subordinate;
-	LIST_HEAD(add_list); /* list of resources that
+	LIST_HEAD(realloc_head); /* list of resources that
 					want additional resources */
+	struct list_head *add_list = NULL;
 	int tried_times = 0;
 	LIST_HEAD(fail_head);
 	struct pci_dev_resource *fail_res;
 	int retval;
 	unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
 				  IORESOURCE_PREFETCH | IORESOURCE_MEM_64;
+	int pci_try_num = 2;
 
 again:
-	__pci_bus_size_bridges(parent, &add_list);
-	__pci_bridge_assign_resources(bridge, &add_list, &fail_head);
-	pci_bus_check_realloc(&add_list);
+	/*
+	 * The last try uses add_list; earlier tries treat good-to-have as
+	 * must-have, so the parent bridge resource can be reallocated.
+	 */
+	if (tried_times + 1 == pci_try_num)
+		add_list = &realloc_head;
+	__pci_bus_size_bridges(parent, add_list);
+	__pci_bridge_assign_resources(bridge, add_list, &fail_head);
+	if (add_list)
+		pci_bus_check_realloc(add_list);
 	tried_times++;
 
 	if (list_empty(&fail_head))
 		goto enable_all;
 
-	if (tried_times >= 2) {
+	if (tried_times >= pci_try_num) {
 		/* still fail, don't need to try more */
 		free_list(&fail_head);
 		goto enable_all;
-- 
1.8.4.5

* [PATCH v11 22/60] PCI: Get new realloc size for bridge for last try
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (20 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 21/60] PCI: Treat optional as required in first try for bridge rescan Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 23/60] PCI: Don't release sibling bridge resources during hotplug Yinghai Lu
                   ` (38 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

The current realloc path does not shrink a bridge resource, because
pbus_size_mem() checks against the old size.

That causes a problem: when "required+optional" resource allocation fails,
the cached bridge resource size prevents the "required" resources from
being allocated in a smaller bridge resource.

Clear the old resource size on the last try, or on the third and later
tries.

-v3: for last or third time and later.
     change reset_bridge_resource_size to static according to Fengguang.
-v4: don't clear size for bridge's normal resources.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=81431
Tested-by: TJ <linux@iam.tj>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
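Note (illustration only, not part of the patch): setting start to 0 and end
to start - 1 is a way to encode a zero-sized resource, because the size is
computed as end - start + 1 and wraps to 0. The sk_* names are made up, and
a 64-bit address type is an assumption of this sketch.

```c
#include <stdint.h>

struct sk_res {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Size of an inclusive range; wraps to 0 for the "empty" encoding. */
static uint64_t sk_size(const struct sk_res *r)
{
	return r->end - r->start + 1;
}

/* Forget the cached size so the next sizing pass starts from scratch. */
static void sk_reset(struct sk_res *r)
{
	r->start = 0;
	r->end = r->start - 1;
}
```

After the reset, a later sizing pass no longer sees the old, larger size.
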
 drivers/pci/setup-bus.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 3dc4ac9..d3a39b7 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1738,6 +1738,17 @@ static enum enable_type pci_realloc_detect(struct pci_bus *bus,
 }
 #endif
 
+static void reset_bridge_resource_size(struct pci_dev *dev,
+				       struct resource *res)
+{
+	int idx = res - &dev->resource[0];
+
+	if (idx >= PCI_BRIDGE_RESOURCES && idx <= PCI_BRIDGE_RESOURCE_END) {
+		res->start = 0;
+		res->end = res->start - 1;
+	}
+}
+
 /*
  * first try will not touch pci bridge res
  * second and later try will clear small leaf bridge res
@@ -1822,8 +1833,13 @@ again:
 		res->start = fail_res->start;
 		res->end = fail_res->end;
 		res->flags = fail_res->flags;
-		if (fail_res->dev->subordinate)
+		if (fail_res->dev->subordinate) {
 			res->flags = 0;
+			/* last try, or third try and later */
+			if (tried_times + 1 == pci_try_num ||
+			    tried_times + 1 > 2)
+				reset_bridge_resource_size(fail_res->dev, res);
+		}
 	}
 	free_list(&fail_head);
 
@@ -1897,8 +1913,11 @@ again:
 		res->start = fail_res->start;
 		res->end = fail_res->end;
 		res->flags = fail_res->flags;
-		if (fail_res->dev->subordinate)
+		if (fail_res->dev->subordinate) {
 			res->flags = 0;
+			/* last time */
+			reset_bridge_resource_size(fail_res->dev, res);
+		}
 	}
 	free_list(&fail_head);
 
-- 
1.8.4.5

* [PATCH v11 23/60] PCI: Don't release sibling bridge resources during hotplug
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (21 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 22/60] PCI: Get new realloc size for bridge for last try Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 24/60] PCI: Cleanup res_to_dev_res() printout Yinghai Lu
                   ` (37 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

On the hotplug path, we cannot touch sibling bridges that are outside
of the slot.

That could happen when the BIOS does not assign some bridge BARs and
the kernel later cannot assign resources to them in the first try.

Check whether the failed device is the parent bridge; if so, use its
subordinate bus instead of the parent bus.

Reported-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index d3a39b7..4674e6b 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1901,10 +1901,16 @@ again:
 	 * Try to release leaf bridge's resources that doesn't fit resource of
 	 * child device under that bridge
 	 */
-	list_for_each_entry(fail_res, &fail_head, list)
-		pci_bus_release_bridge_resources(fail_res->dev->bus,
+	list_for_each_entry(fail_res, &fail_head, list) {
+		struct pci_bus *bus = fail_res->dev->bus;
+
+		if (fail_res->dev == bridge)
+			bus = bridge->subordinate;
+
+		pci_bus_release_bridge_resources(bus,
 						 fail_res->flags & type_mask,
 						 whole_subtree);
+	}
 
 	/* restore size and flags */
 	list_for_each_entry(fail_res, &fail_head, list) {
-- 
1.8.4.5

* [PATCH v11 24/60] PCI: Cleanup res_to_dev_res() printout
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (22 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 23/60] PCI: Don't release sibling bridge resources during hotplug Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 25/60] PCI: Reuse res_to_dev_res() in reassign_resources_sorted() Yinghai Lu
                   ` (36 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

Now get_res_add_size() and get_res_add_align() both produce the same
printout from res_to_dev_res(), which is confusing.

Move the debug message printout out of res_to_dev_res(),
so that we can later reuse res_to_dev_res() in other functions.

-v2: does not print out when add_size or min_align is 0
-v3: change to %#llx according to Bjorn.


Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 34 ++++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 4674e6b..dd33234 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -103,19 +103,9 @@ static struct pci_dev_resource *res_to_dev_res(struct list_head *head,
 {
 	struct pci_dev_resource *dev_res;
 
-	list_for_each_entry(dev_res, head, list) {
-		if (dev_res->res == res) {
-			int idx = res - &dev_res->dev->resource[0];
-
-			dev_printk(KERN_DEBUG, &dev_res->dev->dev,
-				 "res[%d]=%pR res_to_dev_res add_size %llx min_align %llx\n",
-				 idx, dev_res->res,
-				 (unsigned long long)dev_res->add_size,
-				 (unsigned long long)dev_res->min_align);
-
+	list_for_each_entry(dev_res, head, list)
+		if (dev_res->res == res)
 			return dev_res;
-		}
-	}
 
 	return NULL;
 }
@@ -126,7 +116,15 @@ static resource_size_t get_res_add_size(struct list_head *head,
 	struct pci_dev_resource *dev_res;
 
 	dev_res = res_to_dev_res(head, res);
-	return dev_res ? dev_res->add_size : 0;
+	if (!dev_res || !dev_res->add_size)
+		return 0;
+
+	dev_printk(KERN_DEBUG, &dev_res->dev->dev,
+		   "BAR %d: %pR get_res_add_size add_size   %#llx\n",
+		   (int)(res - &dev_res->dev->resource[0]),
+		   res, (unsigned long long)dev_res->add_size);
+
+	return dev_res->add_size;
 }
 
 static resource_size_t get_res_add_align(struct list_head *head,
@@ -135,7 +133,15 @@ static resource_size_t get_res_add_align(struct list_head *head,
 	struct pci_dev_resource *dev_res;
 
 	dev_res = res_to_dev_res(head, res);
-	return dev_res ? dev_res->min_align : 0;
+	if (!dev_res || !dev_res->min_align)
+		return 0;
+
+	dev_printk(KERN_DEBUG, &dev_res->dev->dev,
+		   "BAR %d: %pR get_res_add_align min_align %#llx\n",
+		   (int)(res - &dev_res->dev->resource[0]),
+		   res, (unsigned long long)dev_res->min_align);
+
+	return dev_res->min_align;
 }
 
 
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v11 25/60] PCI: Reuse res_to_dev_res() in reassign_resources_sorted()
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (23 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 24/60] PCI: Cleanup res_to_dev_res() printout Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 26/60] PCI: Use correct align for optional only resources during sorting Yinghai Lu
                   ` (35 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

res_to_dev_res() no longer prints a debug message, so we can reuse it
in reassign_resources_sorted() without a confusing printout.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index dd33234..fc30f80 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -239,26 +239,17 @@ static void reassign_resources_sorted(struct list_head *realloc_head,
 {
 	struct resource *res;
 	struct pci_dev_resource *add_res, *tmp;
-	struct pci_dev_resource *dev_res;
 	resource_size_t add_size, align;
 	int idx;
 
 	list_for_each_entry_safe(add_res, tmp, realloc_head, list) {
-		bool found_match = false;
-
 		res = add_res->res;
 		/* skip resource that has been reset */
 		if (!res->flags)
 			goto out;
 
 		/* skip this resource if not found in head list */
-		list_for_each_entry(dev_res, head, list) {
-			if (dev_res->res == res) {
-				found_match = true;
-				break;
-			}
-		}
-		if (!found_match)/* just skip */
+		if (!res_to_dev_res(head, res))
 			continue;
 
 		idx = res - &add_res->dev->resource[0];
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v11 26/60] PCI: Use correct align for optional only resources during sorting
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (24 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 25/60] PCI: Reuse res_to_dev_res() in reassign_resources_sorted() Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 27/60] PCI: Optimize bus min_align/size calculation during sizing Yinghai Lu
                   ` (34 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

During sorting before assignment, we only put resources with non-zero
alignment in the sorted list, so optional-only resources (required size
is 0 and only an optional part exists) need a correct alignment too.

SRIOV BARs are treated as optional resources and we read their alignment
from the device every time, so they are ok.
Hotplug bridge resources use STARTALIGN, so they are ok with size 0
as long as they have a correct start.

Later we want to treat the ROM BAR as an optional resource. It uses
SIZEALIGN, so with size 0 its alignment (align = size) will be 0. We need
another way to get the alignment for it.

In that case we can use the alignment of the optional part instead; that
is also ok for the SRIOV path and the hotplug bridge resource path.

To do that, pass the realloc list from the sizing stage to the sorting
stage, look up the entry in the realloc list and calculate the alignment
from that entry.
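
The fallback can be sketched in userspace as follows. This is only an
illustration under simplifying assumptions: struct res, the RES_* flag
values and res_align() are hypothetical and model just the "SIZEALIGN
means align = size, STARTALIGN means align = start" rule, not the full
pci_resource_alignment(). Unlike the patch, the sketch works on a copy
instead of modifying and restoring the resource.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RES_SIZEALIGN  0x1u	/* hypothetical flag values for this sketch */
#define RES_STARTALIGN 0x2u

struct res {
	uint64_t start, end;	/* inclusive; end == start - 1 means empty */
	unsigned int flags;
};

static uint64_t res_size(const struct res *r)
{
	return r->end - r->start + 1;
}

/* Simplified alignment rule: SIZEALIGN -> size, STARTALIGN -> start. */
static uint64_t res_align(const struct res *r)
{
	if (r->flags & RES_SIZEALIGN)
		return res_size(r);
	if (r->flags & RES_STARTALIGN)
		return r->start;
	return 0;
}

/*
 * If the required part alone gives no alignment, evaluate the alignment
 * as if the optional part (add_size, min_align) were already applied.
 */
static uint64_t res_align_with_optional(const struct res *r,
					uint64_t add_size, uint64_t min_align)
{
	uint64_t align;
	struct res tmp;

	assert(r != NULL);
	align = res_align(r);
	if (align || !add_size)
		return align;

	tmp = *r;
	tmp.end += add_size;
	if (tmp.flags & RES_STARTALIGN) {
		uint64_t size = res_size(&tmp);

		tmp.start = min_align;
		tmp.end = tmp.start + size - 1;
	}
	return res_align(&tmp);
}
```

For an empty SIZEALIGN resource with a 64K optional part this yields 64K
instead of 0, so the resource is no longer dropped from the sorted list.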

Link: https://bugzilla.kernel.org/show_bug.cgi?id=81431
Reported-by: TJ <linux@iam.tj>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 49 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index fc30f80..544f518 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -144,9 +144,42 @@ static resource_size_t get_res_add_align(struct list_head *head,
 	return dev_res->min_align;
 }
 
+static resource_size_t __pci_resource_alignment(
+				struct pci_dev *dev,
+				struct resource *r,
+				struct list_head *realloc_head)
+{
+	resource_size_t r_align = pci_resource_alignment(dev, r);
+	resource_size_t orig_start, orig_end;
+	struct pci_dev_resource *dev_res;
+
+	if (r_align || !realloc_head)
+		return r_align;
+
+	dev_res = res_to_dev_res(realloc_head, r);
+	if (!dev_res || !dev_res->add_size)
+		return r_align;
+
+	orig_start = r->start;
+	orig_end = r->end;
+	r->end += dev_res->add_size;
+	if ((r->flags & IORESOURCE_STARTALIGN)) {
+		resource_size_t r_size = resource_size(r);
+
+		r->start = dev_res->min_align;
+		r->end = r->start + r_size - 1;
+	}
+	r_align = pci_resource_alignment(dev, r);
+	r->start = orig_start;
+	r->end = orig_end;
+
+	return r_align;
+}
 
 /* Sort resources by alignment */
-static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
+static void pdev_sort_resources(struct pci_dev *dev,
+				 struct list_head *realloc_head,
+				 struct list_head *head)
 {
 	int i;
 
@@ -164,7 +197,7 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 		if (!(r->flags) || r->parent)
 			continue;
 
-		r_align = pci_resource_alignment(dev, r);
+		r_align = __pci_resource_alignment(dev, r, realloc_head);
 		if (!r_align) {
 			dev_warn(&dev->dev, "BAR %d: %pR has bogus alignment\n",
 				 i, r);
@@ -182,8 +215,9 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 		list_for_each_entry(dev_res, head, list) {
 			resource_size_t align;
 
-			align = pci_resource_alignment(dev_res->dev,
-							 dev_res->res);
+			align = __pci_resource_alignment(dev_res->dev,
+							 dev_res->res,
+							 realloc_head);
 
 			if (r_align > align) {
 				n = &dev_res->list;
@@ -196,6 +230,7 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 }
 
 static void __dev_sort_resources(struct pci_dev *dev,
+				 struct list_head *realloc_head,
 				 struct list_head *head)
 {
 	u16 class = dev->class >> 8;
@@ -212,7 +247,7 @@ static void __dev_sort_resources(struct pci_dev *dev,
 			return;
 	}
 
-	pdev_sort_resources(dev, head);
+	pdev_sort_resources(dev, realloc_head, head);
 }
 
 static inline void reset_resource(struct resource *res)
@@ -506,7 +541,7 @@ static void pdev_assign_resources_sorted(struct pci_dev *dev,
 {
 	LIST_HEAD(head);
 
-	__dev_sort_resources(dev, &head);
+	__dev_sort_resources(dev, add_head, &head);
 	__assign_resources_sorted(&head, add_head, fail_head);
 
 }
@@ -519,7 +554,7 @@ static void pbus_assign_resources_sorted(const struct pci_bus *bus,
 	LIST_HEAD(head);
 
 	list_for_each_entry(dev, &bus->devices, bus_list)
-		__dev_sort_resources(dev, &head);
+		__dev_sort_resources(dev, realloc_head, &head);
 
 	__assign_resources_sorted(&head, realloc_head, fail_head);
 }
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v11 27/60] PCI: Optimize bus min_align/size calculation during sizing
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (25 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 26/60] PCI: Use correct align for optional only resources during sorting Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 28/60] PCI: Optimize bus align/size calculation for optional " Yinghai Lu
                   ` (33 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

During the bus MMIO resource sizing stage, current code tries to get the
alignment as small as possible and uses it to align the size to get the final
size. But it does not handle resources whose size is bigger than their
alignment in an optimal way; the kernel only uses the max alignment for them.

For example:
 When we have resources with align/size: 1M/2M, 512M/512M,
 current code will have bus resource min_align/size: 512M/1024M,
 but the optimal value is 256M/768M, as we can fit them into
 [256M,1024M) or [512M,1280M) instead of [512M,1536M).

 0M        256M        512M       768M       1024M      1280M
 |----------|-----------|----------|----------|----------|----------|
when we have [256M,1024M)
            |---------------------------------|
            |-2M-|      |---512M--------------|
when we have [512M,1280M)
                        |--------------------------------|
                        |---512M--------------|-2M-|

This matters for the following cases, where the resource size is bigger
than the resource alignment:
1. SRIOV BARs.
2. PCI bridges with children that need several MMIO ranges bigger than 1M.

We can keep trying to allocate the children's resources from the range
[offset, offset + aligned_size), with offset aligned to half of min_align.
If that succeeds, we can use that half min_align as the new min_align.
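
The halving search can be tried out in userspace with the sketch below.
It is an illustration only: struct req, place(), fits() and
min_window_align() are hypothetical names, and a simple first-fit
placement stands in for allocate_resource(). The requests are assumed to
be sorted by descending alignment already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RES 16
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((uint64_t)(a) - 1))

struct req { uint64_t align, size; };	/* one child resource request */
struct range { uint64_t start, end; };	/* allocated [start, end) */

/* First-fit one request into [lo, hi), avoiding ranges in used[]. */
static bool place(struct range *used, size_t *n, uint64_t lo, uint64_t hi,
		  struct req r)
{
	uint64_t pos = ALIGN_UP(lo, r.align);
	bool moved = true;

	while (moved) {
		moved = false;
		for (size_t i = 0; i < *n; i++) {
			if (pos < used[i].end && pos + r.size > used[i].start) {
				/* overlap: jump past this range, realigned */
				pos = ALIGN_UP(used[i].end, r.align);
				moved = true;
			}
		}
	}
	if (pos + r.size > hi)
		return false;
	used[(*n)++] = (struct range){ pos, pos + r.size };
	return true;
}

/* Can all requests be placed in [start, start + size)? */
static bool fits(const struct req *reqs, size_t n, uint64_t start,
		 uint64_t size)
{
	struct range used[MAX_RES];
	size_t cnt = 0;

	for (size_t i = 0; i < n; i++)
		if (!place(used, &cnt, start, start + size, reqs[i]))
			return false;
	return true;
}

/* Keep halving the window alignment while every new start offset fits. */
static uint64_t min_window_align(const struct req *reqs, size_t n,
				 uint64_t max_align, uint64_t sum,
				 uint64_t align_low)
{
	uint64_t good = max_align;

	assert(n <= MAX_RES);
	if (n <= 1 || max_align <= align_low)
		return max_align > align_low ? max_align : align_low;

	while (good > align_low) {
		uint64_t half = good >> 1;
		uint64_t size = ALIGN_UP(sum, half);

		for (uint64_t s = half; s < max_align; s += half) {
			if (!(s & (good - 1)))
				continue;	/* covered by the previous align */
			if (!fits(reqs, n, s, size))
				return good;
		}
		good = half;
	}
	return good;
}
```

For the 1M/2M plus 512M/512M example (sum 514M, lower bound 1M) the
sketch stops at 256M: the window [256M,1024M) fits both requests, but
with 128M alignment the window starting at 128M cannot hold the 512M
aligned request. Aligning 514M up to 256M gives the 768M size.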

After this patch, we get:
 align/size: 1M/2M, 2M/4M, 4M/8M, 8M/16M
 new min_align/min_size: 4M/32M, and original is 8M/32M

 align/size: 1M/2M, 2M/4M, 4M/8M
 new min_align/min_size: 2M/14M, and original is 4M/16M

 align/size: 1M/2M, 512M/512M
 new min_align/min_size: 256M/768M, and original is 512M/1024M

Real result from a system with one PCIe card that has
four functions that support SRIOV:
 children resources with align/size:
   00800000/00800000, 00800000/00800000, 00800000/00800000,
   00800000/00800000, 00010000/00200000, 00010000/00200000,
   00010000/00200000, 00010000/00200000, 00008000/00008000,
   00008000/00008000, 00008000/00008000, 00008000/00008000,
   00004000/00080000, 00004000/00080000, 00004000/00080000,
   00004000/00080000
for the bridge:
With the original code we have min_align/min_size: 00400000/02c00000,
and with this patch we have min_align/min_size: 00100000/02b00000.
So min_align is 1M instead of 4M, and the size is smaller as well.

-v2: Need to check more offset with every min_alignment.
-v3: skip r_size <= 1 for optional only bridge resources.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=81431
Reported-by: TJ <linux@iam.tj>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 195 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 157 insertions(+), 38 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 544f518..3051bb7 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -29,6 +29,34 @@
 
 unsigned int pci_flags;
 
+static inline bool is_before(resource_size_t align1, resource_size_t size1,
+			     resource_size_t align2, resource_size_t size2)
+{
+	resource_size_t size1_left, size2_left;
+
+	/* big align is before small align */
+	if (align1 > align2)
+		return true;
+
+	/*
+	 * for same align:
+	 *   aligned is before not aligned
+	 *   for not aligned, big remainder is before small remainder
+	 */
+	if (align1 == align2) {
+		size1_left = size1 & (align1 - 1);
+		if (!size1_left)
+			size1_left = align1;
+		size2_left = size2 & (align2 - 1);
+		if (!size2_left)
+			size2_left = align2;
+		if (size1_left > size2_left)
+			return true;
+	}
+
+	return false;
+}
+
 struct pci_dev_resource {
 	struct list_head list;
 	struct resource *res;
@@ -1041,26 +1069,125 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 	}
 }
 
-static inline resource_size_t calculate_mem_align(resource_size_t *aligns,
-						  int max_order)
+struct align_test_res {
+	struct list_head list;
+	struct resource res;
+	resource_size_t size;
+	resource_size_t align;
+};
+
+static void free_align_test_list(struct list_head *head)
 {
-	resource_size_t align = 0;
-	resource_size_t min_align = 0;
-	int order;
+	struct align_test_res *p, *tmp;
 
-	for (order = 0; order <= max_order; order++) {
-		resource_size_t align1 = 1;
+	list_for_each_entry_safe(p, tmp, head, list) {
+		list_del(&p->list);
+		kfree(p);
+	}
+}
 
-		align1 <<= (order + 20);
+static int add_to_align_test_list(struct list_head *head,
+				  resource_size_t align, resource_size_t size)
+{
+	struct align_test_res *tmp;
+
+	tmp = kzalloc(sizeof(*tmp), GFP_KERNEL);
+	if (!tmp)
+		return -ENOMEM;
+
+	tmp->align = align;
+	tmp->size = size;
+
+	list_add_tail(&tmp->list, head);
+
+	return 0;
+}
+
+static void sort_align_test(struct list_head *head)
+{
+	struct align_test_res *res1, *tmp_res, *res2;
 
-		if (!align)
-			min_align = align1;
-		else if (ALIGN(align + min_align, min_align) < align1)
-			min_align = align1 >> 1;
-		align += aligns[order];
+	list_for_each_entry_safe(res1, tmp_res, head, list) {
+		/* reorder it */
+		list_for_each_entry(res2, head, list) {
+			if (res2 == res1)
+				break;
+
+			if (is_before(res1->align, res1->size,
+				      res2->align, res2->size)) {
+				list_move_tail(&res1->list, &res2->list);
+				break;
+			}
+		}
+	}
+}
+
+static bool is_align_size_good(struct list_head *head,
+			resource_size_t min_align, resource_size_t size,
+			resource_size_t start)
+{
+	struct align_test_res *p;
+	struct resource root;
+
+	memset(&root, 0, sizeof(root));
+	root.start = start;
+	root.end = start + size - 1;
+
+	list_for_each_entry(p, head, list)
+		memset(&p->res, 0, sizeof(p->res));
+
+	list_for_each_entry(p, head, list)
+		if (allocate_resource(&root, &p->res, p->size,
+				0, (resource_size_t)-1ULL,
+				p->align, NULL, NULL))
+			return false;
+
+	return true;
+}
+
+static resource_size_t calculate_mem_align(struct list_head *head,
+				resource_size_t max_align, resource_size_t size,
+				resource_size_t align_low)
+{
+	struct align_test_res *p;
+	resource_size_t min_align, good_align, aligned_size, start;
+	int count = 0;
+
+	if (max_align <= align_low) {
+		good_align = align_low;
+		goto out;
 	}
 
-	return min_align;
+	good_align = max_align;
+
+	list_for_each_entry(p, head, list)
+		count++;
+
+	if (count <= 1)
+		goto out;
+
+	sort_align_test(head);
+
+	do {
+		/* check if we can use smaller align */
+		min_align = good_align >> 1;
+		aligned_size = ALIGN(size, min_align);
+
+		/* need to make sure every offset work */
+		for (start = min_align; start < max_align; start += min_align) {
+			/* checked already with last align ? */
+			if (!(start & (good_align - 1)))
+				continue;
+
+			if (!is_align_size_good(head, min_align, aligned_size,
+					       start))
+				goto out;
+		}
+		good_align = min_align;
+	} while (min_align > align_low);
+
+out:
+	return good_align;
 }
 
 /**
@@ -1090,19 +1217,17 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 {
 	struct pci_dev *dev;
 	resource_size_t min_align, align, size, size0, size1;
-	resource_size_t aligns[18];	/* Alignments from 1Mb to 128Gb */
-	int order, max_order;
+	resource_size_t max_align = 0;
 	struct resource *b_res = find_free_bus_resource(bus,
 					mask | IORESOURCE_PREFETCH, type);
 	resource_size_t children_add_size = 0;
 	resource_size_t children_add_align = 0;
 	resource_size_t add_align = 0;
+	LIST_HEAD(align_test_list);
 
 	if (!b_res)
 		return -ENOSPC;
 
-	memset(aligns, 0, sizeof(aligns));
-	max_order = 0;
 	size = 0;
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
@@ -1130,29 +1255,20 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 				continue;
 			}
 #endif
-			/*
-			 * aligns[0] is for 1MB (since bridge memory
-			 * windows are always at least 1MB aligned), so
-			 * keep "order" from being negative for smaller
-			 * resources.
-			 */
 			align = pci_resource_alignment(dev, r);
-			order = __ffs(align) - 20;
-			if (order < 0)
-				order = 0;
-			if (order >= ARRAY_SIZE(aligns)) {
+			if (align > (1ULL<<37)) { /*128 Gb*/
 				dev_warn(&dev->dev, "disabling BAR %d: %pR (bad alignment %#llx)\n",
-					 i, r, (unsigned long long) align);
+					i, r, (unsigned long long) align);
 				r->flags = 0;
 				continue;
 			}
+
+			if (r_size > 1)
+				add_to_align_test_list(&align_test_list,
+							align, r_size);
 			size += r_size;
-			/* Exclude ranges with size > align from
-			   calculation of the alignment. */
-			if (r_size == align)
-				aligns[order] += align;
-			if (order > max_order)
-				max_order = order;
+			if (align > max_align)
+				max_align = align;
 
 			if (realloc_head) {
 				children_add_size += get_res_add_size(realloc_head, r);
@@ -1162,9 +1278,12 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 		}
 	}
 
-	min_align = calculate_mem_align(aligns, max_order);
-	min_align = max(min_align, window_alignment(bus, b_res->flags));
-	size0 = calculate_memsize(size, min_size, 0, resource_size(b_res), min_align);
+	max_align = max(max_align, window_alignment(bus, b_res->flags));
+	min_align = calculate_mem_align(&align_test_list, max_align, size,
+					window_alignment(bus, b_res->flags));
+	size0 = calculate_memsize(size, min_size, 0,
+				  resource_size(b_res), min_align);
+	free_align_test_list(&align_test_list);
 	add_align = max(min_align, add_align);
 	if (children_add_size > add_size)
 		add_size = children_add_size;
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v11 28/60] PCI: Optimize bus align/size calculation for optional during sizing
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (26 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 27/60] PCI: Optimize bus min_align/size calculation during sizing Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 29/60] PCI: Don't add too much optional size for hotplug bridge MMIO Yinghai Lu
                   ` (32 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

Current add_align always uses the max alignment, which makes
required+optional allocate more than needed in some cases.

Now that we have the new calculate_mem_align(), we can use it for the
add_align calculation.

This needs a separate list for the required+optional align/size info.

After that we get a smaller add_align/size, and have a better chance
to get required+optional allocated successfully.

The result for a bridge that has an Intel 4x10G card installed:

 pci 0000:20:03.2: bridge window [mem 0x00000000-0x000fffff 64bit pref]
	to [bus 2a-31] calculate_mem for required
 align/size:
   00800000/00800000, 00800000/00800000, 00800000/00800000,
   00800000/00800000, 00008000/00008000, 00008000/00008000,
   00008000/00008000, 00008000/00008000
 original min_align/min_size: 00400000/02400000
 new min_align/min_size: 00400000/02400000

 pci 0000:20:03.2: bridge window [mem 0x00000000-0x000fffff 64bit pref]
	to [bus 2a-31] calculate_mem for required+optional
 align/size:
   00800000/00800000, 00800000/00800000, 00800000/00800000,
   00800000/00800000, 00010000/00200000, 00010000/00200000,
   00010000/00200000, 00010000/00200000, 00008000/00008000,
   00008000/00008000, 00008000/00008000, 00008000/00008000,
   00004000/00080000, 00004000/00080000, 00004000/00080000,
   00004000/00080000
 original code min_align/min_size: 00800000/03000000
 new min_align/min_size: 00100000/02b00000

So required align/size is 0x400000/0x2400000, and the
new required+optional align/size is 0x100000/0x2b00000. That is much better
than the original required+optional align/size of 0x800000/0x3000000,
and min_align is even smaller than for required only.
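
The sizes quoted above follow from summing the listed requests and
rounding up to the chosen alignment. The sketch below only redoes that
arithmetic in userspace; align_up(), sum_sizes() and the two tables are
illustrative names, and the alignments are taken from the log rather
than computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct req { uint64_t align, size; };

/* Round size up to a power-of-two alignment. */
static uint64_t align_up(uint64_t size, uint64_t align)
{
	assert(align && !(align & (align - 1)));
	return (size + align - 1) & ~(align - 1);
}

static uint64_t sum_sizes(const struct req *r, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += r[i].size;
	return sum;
}

/* Requests from the "required" list in the log. */
static const struct req required[] = {
	{ 0x800000, 0x800000 }, { 0x800000, 0x800000 },
	{ 0x800000, 0x800000 }, { 0x800000, 0x800000 },
	{ 0x008000, 0x008000 }, { 0x008000, 0x008000 },
	{ 0x008000, 0x008000 }, { 0x008000, 0x008000 },
};

/* Entries that appear only in the "required+optional" list. */
static const struct req optional[] = {
	{ 0x010000, 0x200000 }, { 0x010000, 0x200000 },
	{ 0x010000, 0x200000 }, { 0x010000, 0x200000 },
	{ 0x004000, 0x080000 }, { 0x004000, 0x080000 },
	{ 0x004000, 0x080000 }, { 0x004000, 0x080000 },
};
```

The required sum is 0x2020000, which rounds up to 0x2400000 at 4M
alignment. The required+optional sum is 0x2a20000, which rounds up to
0x2b00000 at 1M alignment but to 0x3000000 at 8M alignment.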


-v2: remove not used size1 in calculate_memsize


Link: https://bugzilla.kernel.org/show_bug.cgi?id=81431
Reported-by: TJ <linux@iam.tj>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 82 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 51 insertions(+), 31 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 3051bb7..12fd6d9 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -943,7 +943,6 @@ static resource_size_t calculate_iosize(resource_size_t size,
 
 static resource_size_t calculate_memsize(resource_size_t size,
 		resource_size_t min_size,
-		resource_size_t size1,
 		resource_size_t old_size,
 		resource_size_t align)
 {
@@ -953,7 +952,7 @@ static resource_size_t calculate_memsize(resource_size_t size,
 		old_size = 0;
 	if (size < old_size)
 		size = old_size;
-	size = ALIGN(size + size1, align);
+	size = ALIGN(size, align);
 	return size;
 }
 
@@ -1216,26 +1215,23 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 			 struct list_head *realloc_head)
 {
 	struct pci_dev *dev;
-	resource_size_t min_align, align, size, size0, size1;
-	resource_size_t max_align = 0;
+	resource_size_t min_align = 0, min_add_align = 0;
+	resource_size_t max_align = 0, max_add_align = 0;
+	resource_size_t size = 0, size0 = 0, size1 = 0, sum_add_size = 0;
 	struct resource *b_res = find_free_bus_resource(bus,
 					mask | IORESOURCE_PREFETCH, type);
-	resource_size_t children_add_size = 0;
-	resource_size_t children_add_align = 0;
-	resource_size_t add_align = 0;
 	LIST_HEAD(align_test_list);
+	LIST_HEAD(align_test_add_list);
 
 	if (!b_res)
 		return -ENOSPC;
 
-	size = 0;
-
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		int i;
 
 		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 			struct resource *r = &dev->resource[i];
-			resource_size_t r_size;
+			resource_size_t r_size, align;
 			int flags = pci_resource_pref_compatible(dev, r);
 
 			if (r->parent || (flags & IORESOURCE_PCI_FIXED) ||
@@ -1243,19 +1239,23 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 			     (flags & mask) != type2 &&
 			     (flags & mask) != type3))
 				continue;
+
 			r_size = resource_size(r);
+			align = pci_resource_alignment(dev, r);
 #ifdef CONFIG_PCI_IOV
 			/* put SRIOV requested res to the optional list */
 			if (realloc_head && i >= PCI_IOV_RESOURCES &&
 					i <= PCI_IOV_RESOURCE_END) {
-				add_align = max(pci_resource_alignment(dev, r), add_align);
+				add_to_align_test_list(&align_test_add_list,
+							align, r_size);
 				r->end = r->start - 1;
 				add_to_list(realloc_head, dev, r, r_size, 0/* don't care */);
-				children_add_size += r_size;
+				sum_add_size += r_size;
+				if (align > max_add_align)
+					max_add_align = align;
 				continue;
 			}
 #endif
-			align = pci_resource_alignment(dev, r);
 			if (align > (1ULL<<37)) { /*128 Gb*/
 				dev_warn(&dev->dev, "disabling BAR %d: %pR (bad alignment %#llx)\n",
 					i, r, (unsigned long long) align);
@@ -1263,33 +1263,52 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 				continue;
 			}
 
-			if (r_size > 1)
+			if (r_size > 1) {
 				add_to_align_test_list(&align_test_list,
 							align, r_size);
-			size += r_size;
-			if (align > max_align)
-				max_align = align;
+				size += r_size;
+				if (align > max_align)
+					max_align = align;
+			}
 
 			if (realloc_head) {
-				children_add_size += get_res_add_size(realloc_head, r);
-				children_add_align = get_res_add_align(realloc_head, r);
-				add_align = max(add_align, children_add_align);
+				resource_size_t add_r_size, add_align;
+
+				add_r_size = get_res_add_size(realloc_head, r);
+				add_align = get_res_add_align(realloc_head, r);
+				/* no add on ? */
+				if (add_align < align)
+					add_align = align;
+				add_to_align_test_list(&align_test_add_list,
+							add_align,
+							r_size + add_r_size);
+				sum_add_size += r_size + add_r_size;
+				if (add_align > max_add_align)
+					max_add_align = add_align;
 			}
 		}
 	}
 
 	max_align = max(max_align, window_alignment(bus, b_res->flags));
-	min_align = calculate_mem_align(&align_test_list, max_align, size,
-					window_alignment(bus, b_res->flags));
-	size0 = calculate_memsize(size, min_size, 0,
+	if (size || min_size) {
+		min_align = calculate_mem_align(&align_test_list, max_align,
+				 size, window_alignment(bus, b_res->flags));
+		size0 = calculate_memsize(size, min_size,
 				  resource_size(b_res), min_align);
+	}
 	free_align_test_list(&align_test_list);
-	add_align = max(min_align, add_align);
-	if (children_add_size > add_size)
-		add_size = children_add_size;
-	size1 = (!realloc_head || (realloc_head && !add_size)) ? size0 :
-		calculate_memsize(size, min_size, add_size,
-				resource_size(b_res), add_align);
+
+	if ((sum_add_size - size) < add_size)
+		sum_add_size = size + add_size;
+	if (sum_add_size > size && realloc_head) {
+		min_add_align = calculate_mem_align(&align_test_add_list,
+					max_add_align, sum_add_size,
+					window_alignment(bus, b_res->flags));
+		size1 = calculate_memsize(sum_add_size, min_size,
+				 resource_size(b_res), min_add_align);
+	}
+	free_align_test_list(&align_test_add_list);
+
 	if (!size0 && !size1) {
 		if (b_res->start || b_res->end)
 			dev_info(&bus->self->dev, "disabling bridge window %pR to %pR (unused)\n",
@@ -1301,11 +1320,12 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	b_res->end = size0 + min_align - 1;
 	b_res->flags |= IORESOURCE_STARTALIGN;
 	if (size1 > size0 && realloc_head) {
-		add_to_list(realloc_head, bus->self, b_res, size1-size0, add_align);
+		add_to_list(realloc_head, bus->self, b_res, size1 - size0,
+				min_add_align);
 		dev_printk(KERN_DEBUG, &bus->self->dev, "bridge window %pR to %pR add_size %llx add_align %llx\n",
 			   b_res, &bus->busn_res,
 			   (unsigned long long) (size1 - size0),
-			   (unsigned long long) add_align);
+			   (unsigned long long) min_add_align);
 	}
 	return 0;
 }
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v11 29/60] PCI: Don't add too much optional size for hotplug bridge MMIO
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (27 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 28/60] PCI: Optimize bus align/size calculation for optional " Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 30/60] PCI: Reorder resources list for required/optional resources Yinghai Lu
                   ` (31 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

Current code always adds 2M for hotplug bridge MMIO, even when
there is already a child device under it.

For example:
	40:03.0 --- 43:00.0 --- 44:02.0 -+- 45:00.0
					 \- 45:00.1

44:02.0 needs 1M as required size for 45:00.0 and 45:00.1.
When we calculate add_size for 44:02.0, we pass 2M as the additional
size for the hotplug bridge, so the total is 3M.

That is different from the code before the optional support changes,
and even from current code when it treats optional as required by
not passing the realloc list. There we only need 2M in total.

The optional size should be 1M, and the total size should be 2M.

Change to comparing required+optional against min_sum_size to
get a smaller optional size.
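
The difference between the two rules can be sketched as below, ignoring
the alignment rounding done by calculate_memsize(). optional_old() and
optional_new() are hypothetical helper names that only model how the
optional part is derived from the required size, the children's optional
size and the hotplug size.

```c
#include <assert.h>
#include <stdint.h>

/* Old rule: the optional part is at least the full hotplug size. */
static uint64_t optional_old(uint64_t children_optional,
			     uint64_t hotplug_size)
{
	return children_optional > hotplug_size ? children_optional
						: hotplug_size;
}

/*
 * New rule: required + optional together must reach min_sum_size, so
 * the required part already counts towards the hotplug size.
 */
static uint64_t optional_new(uint64_t required, uint64_t children_optional,
			     uint64_t min_sum_size)
{
	uint64_t sum = required + children_optional;

	assert(sum >= required);	/* no overflow in this sketch */
	if (sum < min_sum_size)
		sum = min_sum_size;
	return sum - required;
}
```

With 1M required, no optional size from the children and a 2M hotplug
size, the old rule gives 2M optional (3M total) and the new rule gives
1M optional (2M total), matching the 44:02.0 example.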

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 12fd6d9..f6b7b8f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1198,7 +1198,6 @@ out:
  * @type2: second match type
  * @type3: third match type
  * @min_size : the minimum memory window that must to be allocated
- * @add_size : additional optional memory window
  * @realloc_head : track the additional memory window on this list
  *
  * Calculate the size of the bus and minimal alignment which
@@ -1211,10 +1210,11 @@ out:
 static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 			 unsigned long type, unsigned long type2,
 			 unsigned long type3,
-			 resource_size_t min_size, resource_size_t add_size,
+			 resource_size_t min_size,
 			 struct list_head *realloc_head)
 {
 	struct pci_dev *dev;
+	resource_size_t min_sum_size = 0;
 	resource_size_t min_align = 0, min_add_align = 0;
 	resource_size_t max_align = 0, max_add_align = 0;
 	resource_size_t size = 0, size0 = 0, size1 = 0, sum_add_size = 0;
@@ -1226,6 +1226,11 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	if (!b_res)
 		return -ENOSPC;
 
+	if (realloc_head) {
+		min_sum_size = min_size;
+		min_size = 0;
+	}
+
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		int i;
 
@@ -1298,8 +1303,8 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	}
 	free_align_test_list(&align_test_list);
 
-	if ((sum_add_size - size) < add_size)
-		sum_add_size = size + add_size;
+	if (sum_add_size < min_sum_size)
+		sum_add_size = min_sum_size;
 	if (sum_add_size > size && realloc_head) {
 		min_add_align = calculate_mem_align(&align_test_add_list,
 					max_add_align, sum_add_size,
@@ -1436,7 +1441,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 {
 	struct pci_dev *dev;
 	unsigned long mask, prefmask, type2 = 0, type3 = 0;
-	resource_size_t additional_mem_size = 0, additional_io_size = 0;
+	resource_size_t min_mem_size = 0, additional_io_size = 0;
 	struct resource *b_res;
 	int ret;
 
@@ -1473,7 +1478,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 	case PCI_CLASS_BRIDGE_PCI:
 		if (bus->self->is_hotplug_bridge) {
 			additional_io_size  = pci_hotplug_io_size;
-			additional_mem_size = pci_hotplug_mem_size;
+			min_mem_size = pci_hotplug_mem_size;
 		}
 		/* Fall through */
 	default:
@@ -1493,8 +1498,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 			prefmask |= IORESOURCE_MEM_64;
 			ret = pbus_size_mem(bus, prefmask, prefmask,
 				  prefmask, prefmask,
-				  realloc_head ? 0 : additional_mem_size,
-				  additional_mem_size, realloc_head);
+				  min_mem_size, realloc_head);
 
 			/*
 			 * If successful, all non-prefetchable resources
@@ -1517,8 +1521,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 			prefmask &= ~IORESOURCE_MEM_64;
 			ret = pbus_size_mem(bus, prefmask, prefmask,
 					 prefmask, prefmask,
-					 realloc_head ? 0 : additional_mem_size,
-					 additional_mem_size, realloc_head);
+					 min_mem_size, realloc_head);
 
 			/*
 			 * If successful, only non-prefetchable resources
@@ -1527,7 +1530,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 			if (ret == 0)
 				mask = prefmask;
 			else
-				additional_mem_size += additional_mem_size;
+				min_mem_size += min_mem_size;
 
 			type2 = type3 = IORESOURCE_MEM;
 		}
@@ -1548,8 +1551,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 		 * window.
 		 */
 		pbus_size_mem(bus, mask, IORESOURCE_MEM, type2, type3,
-				realloc_head ? 0 : additional_mem_size,
-				additional_mem_size, realloc_head);
+				min_mem_size, realloc_head);
 		break;
 	}
 }
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v11 30/60] PCI: Reorder resources list for required/optional resources
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (28 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 29/60] PCI: Don't add too much optional size for hotplug bridge MMIO Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 31/60] PCI: Remove duplicated code for resource sorting Yinghai Lu
                   ` (30 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

We try to allocate required+optional resources first, before falling
back to allocating required only and then expanding with optional.

First we update the size and alignment for the required+optional
resources. After that we reorder them by the new alignment, but
currently we only do that for STARTALIGN ones.

For SIZEALIGN resources, adding add_size back also changes the
alignment, so they need to be re-sorted like STARTALIGN resources.

We also need to restore the sort order after we restore the resources
to required only, when required+optional fails to allocate for all.

So move the reordering code out of the loop into a separate function,
and call it twice accordingly.
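
For illustration only, the re-sorting step can be sketched in userspace
C as below. This is not the kernel code: struct fake_res, before() and
resort() are hypothetical names, a plain array stands in for the
pci_dev_resource list, and the ordering rule (larger alignment first,
then larger size) is an assumption about what is_before() does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of the resource list. */
struct fake_res {
	unsigned long long align;	/* current alignment */
	unsigned long long size;	/* current size */
};

/* Assumed ordering: bigger alignment first, then bigger size. */
static int before(const struct fake_res *a, const struct fake_res *b)
{
	if (a->align != b->align)
		return a->align > b->align;
	return a->size > b->size;
}

/* Stable insertion sort; call it again whenever align/size change. */
static void resort(struct fake_res *r, size_t n)
{
	size_t i, j;

	assert(r || n == 0);
	for (i = 1; i < n; i++) {
		struct fake_res cur = r[i];

		/* Shift entries that must come after cur one slot down. */
		for (j = i; j > 0 && before(&cur, &r[j - 1]); j--)
			r[j] = r[j - 1];
		r[j] = cur;
	}
}
```

Calling resort() once after the optional sizes are added, and once more
after the required-only values are restored, corresponds to the two
sort_resources() calls this patch adds.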

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 62 +++++++++++++++++++++++++++++--------------------
 1 file changed, 37 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index f6b7b8f..aed62cc 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -285,6 +285,31 @@ static inline void reset_resource(struct resource *res)
 	res->flags = 0;
 }
 
+static void sort_resources(struct list_head *head)
+{
+	struct pci_dev_resource *res1, *tmp_res, *res2;
+
+	list_for_each_entry_safe(res1, tmp_res, head, list) {
+		resource_size_t align1, size1, align2, size2;
+
+		align1 = pci_resource_alignment(res1->dev, res1->res);
+		size1 = resource_size(res1->res);
+
+		/* reorder it */
+		list_for_each_entry(res2, head, list) {
+			if (res2 == res1)
+				break;
+
+			align2 = pci_resource_alignment(res2->dev, res2->res);
+			size2 = resource_size(res2->res);
+			if (is_before(align1, size1, align2, size2)) {
+				list_move_tail(&res1->list, &res2->list);
+				break;
+			}
+		}
+	}
+}
+
 /**
  * reassign_resources_sorted() - satisfy any additional resource requests
  *
@@ -453,9 +478,9 @@ static void __assign_resources_sorted(struct list_head *head,
 	LIST_HEAD(save_head);
 	LIST_HEAD(local_fail_head);
 	struct pci_dev_resource *save_res;
-	struct pci_dev_resource *dev_res, *tmp_res, *dev_res2;
+	struct pci_dev_resource *dev_res, *tmp_res;
 	unsigned long fail_type;
-	resource_size_t add_align, align;
+	resource_size_t add_align;
 
 	/* Check if optional add_size is there */
 	if (!realloc_head || list_empty(realloc_head))
@@ -470,47 +495,32 @@ static void __assign_resources_sorted(struct list_head *head,
 	}
 
 	/* Update res in head list with add_size in realloc_head list */
-	list_for_each_entry_safe(dev_res, tmp_res, head, list) {
+	list_for_each_entry(dev_res, head, list) {
 		dev_res->res->end += get_res_add_size(realloc_head,
 							dev_res->res);
 
 		/*
 		 * There are two kinds of additional resources in the list:
-		 * 1. bridge resource  -- IORESOURCE_STARTALIGN
-		 * 2. SR-IOV resource   -- IORESOURCE_SIZEALIGN
-		 * Here just fix the additional alignment for bridge
+		 * 1. bridge resource with IORESOURCE_STARTALIGN
+		 *    need to update start to change alignment
+		 * 2. resource with IORESOURCE_SIZEALIGN
+		 *    update size above already change alignment.
 		 */
 		if (!(dev_res->res->flags & IORESOURCE_STARTALIGN))
 			continue;
 
 		add_align = get_res_add_align(realloc_head, dev_res->res);
 
-		/*
-		 * The "head" list is sorted by the alignment to make sure
-		 * resources with bigger alignment will be assigned first.
-		 * After we change the alignment of a dev_res in "head" list,
-		 * we need to reorder the list by alignment to make it
-		 * consistent.
-		 */
-		if (add_align > dev_res->res->start) {
+		if (add_align) {
 			resource_size_t r_size = resource_size(dev_res->res);
 
 			dev_res->res->start = add_align;
 			dev_res->res->end = add_align + r_size - 1;
-
-			list_for_each_entry(dev_res2, head, list) {
-				align = pci_resource_alignment(dev_res2->dev,
-							       dev_res2->res);
-				if (add_align > align) {
-					list_move_tail(&dev_res->list,
-						       &dev_res2->list);
-					break;
-				}
-			}
 		}
-
 	}
 
+	sort_resources(head);
+
 	/* Try updated head list with add_size added */
 	assign_requested_resources_sorted(head, &local_fail_head);
 
@@ -552,6 +562,8 @@ static void __assign_resources_sorted(struct list_head *head,
 	}
 	free_list(&save_head);
 
+	sort_resources(head);
+
 requested_and_reassign:
 	/* Satisfy the must-have resource requests */
 	assign_requested_resources_sorted(head, fail_head);
-- 
1.8.4.5


* [PATCH v11 31/60] PCI: Remove duplicated code for resource sorting
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (29 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 30/60] PCI: Reorder resources list for required/optional resources Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 32/60] PCI: Rename pdev_sort_resources() to pdev_assign_resources_prepare() Yinghai Lu
                   ` (29 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

Now sort_resources() and pdev_sort_resources() both have sorting
code.

As we are going to call sort_resources() from several places later for
alt_size support, remove the related code from pdev_sort_resources().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 22 +++-------------------
 1 file changed, 3 insertions(+), 19 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index aed62cc..0e3be10 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -213,9 +213,8 @@ static void pdev_sort_resources(struct pci_dev *dev,
 
 	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 		struct resource *r;
-		struct pci_dev_resource *dev_res, *tmp;
+		struct pci_dev_resource *tmp;
 		resource_size_t r_align;
-		struct list_head *n;
 
 		r = &dev->resource[i];
 
@@ -238,22 +237,7 @@ static void pdev_sort_resources(struct pci_dev *dev,
 		tmp->res = r;
 		tmp->dev = dev;
 
-		/* fallback is smallest one or list is empty*/
-		n = head;
-		list_for_each_entry(dev_res, head, list) {
-			resource_size_t align;
-
-			align = __pci_resource_alignment(dev_res->dev,
-							 dev_res->res,
-							 realloc_head);
-
-			if (r_align > align) {
-				n = &dev_res->list;
-				break;
-			}
-		}
-		/* Insert it just before n*/
-		list_add_tail(&tmp->list, n);
+		list_add_tail(&tmp->list, head);
 	}
 }
 
@@ -562,9 +546,9 @@ static void __assign_resources_sorted(struct list_head *head,
 	}
 	free_list(&save_head);
 
+requested_and_reassign:
 	sort_resources(head);
 
-requested_and_reassign:
 	/* Satisfy the must-have resource requests */
 	assign_requested_resources_sorted(head, fail_head);
 
-- 
1.8.4.5


* [PATCH v11 32/60] PCI: Rename pdev_sort_resources() to pdev_assign_resources_prepare()
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (30 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 31/60] PCI: Remove duplicated code for resource sorting Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 33/60] PCI: Treat ROM resource as optional during realloc Yinghai Lu
                   ` (28 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

pdev_sort_resources() and __dev_sort_resources() used to check device
resources and put the ones that need to be assigned on one list in
sorted order.

Now we don't do sorting in those functions anymore, so rename them to
pdev_assign_resources_prepare() and __dev_assign_resources_prepare().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 0e3be10..9cd0411 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -204,8 +204,8 @@ static resource_size_t __pci_resource_alignment(
 	return r_align;
 }
 
-/* Sort resources by alignment */
-static void pdev_sort_resources(struct pci_dev *dev,
+/* check resources and save to the list */
+static void pdev_assign_resources_prepare(struct pci_dev *dev,
 				 struct list_head *realloc_head,
 				 struct list_head *head)
 {
@@ -241,7 +241,7 @@ static void pdev_sort_resources(struct pci_dev *dev,
 	}
 }
 
-static void __dev_sort_resources(struct pci_dev *dev,
+static void __dev_assign_resources_prepare(struct pci_dev *dev,
 				 struct list_head *realloc_head,
 				 struct list_head *head)
 {
@@ -259,7 +259,7 @@ static void __dev_sort_resources(struct pci_dev *dev,
 			return;
 	}
 
-	pdev_sort_resources(dev, realloc_head, head);
+	pdev_assign_resources_prepare(dev, realloc_head, head);
 }
 
 static inline void reset_resource(struct resource *res)
@@ -565,7 +565,7 @@ static void pdev_assign_resources_sorted(struct pci_dev *dev,
 {
 	LIST_HEAD(head);
 
-	__dev_sort_resources(dev, add_head, &head);
+	__dev_assign_resources_prepare(dev, add_head, &head);
 	__assign_resources_sorted(&head, add_head, fail_head);
 
 }
@@ -578,7 +578,7 @@ static void pbus_assign_resources_sorted(const struct pci_bus *bus,
 	LIST_HEAD(head);
 
 	list_for_each_entry(dev, &bus->devices, bus_list)
-		__dev_sort_resources(dev, realloc_head, &head);
+		__dev_assign_resources_prepare(dev, realloc_head, &head);
 
 	__assign_resources_sorted(&head, realloc_head, fail_head);
 }
-- 
1.8.4.5


* [PATCH v11 33/60] PCI: Treat ROM resource as optional during realloc
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (31 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 32/60] PCI: Rename pdev_sort_resources() to pdev_assign_resources_prepare() Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 34/60] PCI: Add debug printout during releasing partial assigned resources Yinghai Lu
                   ` (27 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

Currently on the realloc path, we just ignore ROM resources if we cannot
assign them on the first try.

Treat ROM resources as optional resources: try to allocate them together
with the required ones, and if that fails, go with the required
resources only and try to allocate the ROM resources a second time in
the expand path.
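
As a rough userspace sketch of the classification this patch introduces,
the check below mirrors is_optional() with the resource indices spelled
out. The FAKE_ names are hypothetical and the numeric values are
assumptions (six standard BARs, then the ROM BAR, then six SR-IOV BARs);
the real enum lives in include/linux/pci.h. SR-IOV support is assumed to
be enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical copies of the resource indices; values are assumed. */
enum {
	FAKE_STD_RESOURCE_END = 5,	/* BAR 0..5 */
	FAKE_ROM_RESOURCE = 6,
	FAKE_IOV_RESOURCES = 7,
	FAKE_IOV_RESOURCE_END = 12,
	FAKE_NUM_RESOURCES		/* bridge windows omitted here */
};

/* Same test as is_optional(), with SR-IOV support assumed enabled. */
static bool fake_is_optional(int i)
{
	assert(i >= 0 && i < FAKE_NUM_RESOURCES);

	if (i == FAKE_ROM_RESOURCE)
		return true;

	return i >= FAKE_IOV_RESOURCES && i <= FAKE_IOV_RESOURCE_END;
}
```

Anything this check accepts goes on the realloc list with its full size
as add_size, so a failure to place it does not block required BARs.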

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 9cd0411..f9e6a00 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -376,18 +376,10 @@ static void assign_requested_resources_sorted(struct list_head *head,
 		idx = res - &dev_res->dev->resource[0];
 		if (resource_size(res) &&
 		    pci_assign_resource(dev_res->dev, idx)) {
-			if (fail_head) {
-				/*
-				 * if the failed res is for ROM BAR, and it will
-				 * be enabled later, don't add it to the list
-				 */
-				if (!((idx == PCI_ROM_RESOURCE) &&
-				      (!(res->flags & IORESOURCE_ROM_ENABLE))))
-					add_to_list(fail_head,
-						    dev_res->dev, res,
-						    0 /* don't care */,
-						    0 /* don't care */);
-			}
+			if (fail_head)
+				add_to_list(fail_head, dev_res->dev, res,
+					    0 /* don't care */,
+					    0 /* don't care */);
 			reset_resource(res);
 		}
 	}
@@ -1185,6 +1177,19 @@ out:
 	return good_align;
 }
 
+static inline bool is_optional(int i)
+{
+
+	if (i == PCI_ROM_RESOURCE)
+		return true;
+
+#ifdef CONFIG_PCI_IOV
+	if (i >= PCI_IOV_RESOURCES && i <= PCI_IOV_RESOURCE_END)
+		return true;
+#endif
+
+	return false;
+}
 /**
  * pbus_size_mem() - size the memory window of a given bus
  *
@@ -1243,10 +1248,8 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 
 			r_size = resource_size(r);
 			align = pci_resource_alignment(dev, r);
-#ifdef CONFIG_PCI_IOV
-			/* put SRIOV requested res to the optional list */
-			if (realloc_head && i >= PCI_IOV_RESOURCES &&
-					i <= PCI_IOV_RESOURCE_END) {
+			/* put SRIOV/ROM res to realloc list */
+			if (realloc_head && is_optional(i)) {
 				add_to_align_test_list(&align_test_add_list,
 							align, r_size);
 				r->end = r->start - 1;
@@ -1256,7 +1259,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 					max_add_align = align;
 				continue;
 			}
-#endif
+
 			if (align > (1ULL<<37)) { /*128 Gb*/
 				dev_warn(&dev->dev, "disabling BAR %d: %pR (bad alignment %#llx)\n",
 					i, r, (unsigned long long) align);
-- 
1.8.4.5


* [PATCH v11 34/60] PCI: Add debug printout during releasing partial assigned resources
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (32 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 33/60] PCI: Treat ROM resource as optional during realloc Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 35/60] PCI: Simplify res reference using in __assign_resources_sorted() Yinghai Lu
                   ` (26 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

We try to assign required+optional first, and we only accept the result
if all resources get allocated. Otherwise we release the assigned
resources in the list, and try to assign required only and then expand
to optional.

We have to do that to make sure any required resource has priority over
any optional one.

When that happens, we only print out "assigned" info, which is confusing
as it looks like the same range is assigned to two peer resources at the
same time.

Add a printout for releasing so we have the whole picture in the debug
messages.
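
The two-pass policy described above can be sketched in userspace C as
follows. This is a simplified model, not the kernel allocator: struct
fake_req, try_assign() and assign_two_pass() are hypothetical, and a
single counter of free space stands in for the parent window, with no
alignment handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request: a required size plus an optional extra. */
struct fake_req {
	unsigned long required;
	unsigned long optional;
	unsigned long assigned;	/* 0 means not assigned */
};

/* Give each request `required` (+ `optional` if with_opt) from `space`. */
static bool try_assign(struct fake_req *r, size_t n, unsigned long space,
		       bool with_opt, unsigned long *left)
{
	bool ok = true;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long want = r[i].required +
				     (with_opt ? r[i].optional : 0);

		if (want <= space) {
			r[i].assigned = want;
			space -= want;
		} else {
			r[i].assigned = 0;
			ok = false;
		}
	}
	*left = space;
	return ok;
}

/* Returns true when the full required+optional pass was accepted. */
static bool assign_two_pass(struct fake_req *r, size_t n, unsigned long space)
{
	unsigned long left;
	size_t i;

	assert(r && n > 0);
	if (try_assign(r, n, space, true, &left))
		return true;

	/* Release everything, then required only, then expand. */
	for (i = 0; i < n; i++)
		r[i].assigned = 0;
	try_assign(r, n, space, false, &left);
	for (i = 0; i < n; i++) {
		if (r[i].assigned && r[i].optional <= left) {
			r[i].assigned += r[i].optional;
			left -= r[i].optional;
		}
	}
	return false;
}
```

With 8 units of space and requests of 4+4 and 4+0, the first pass gives
all 8 units to the first request and starves the second. The release
step is the point where a range changes hands, which is what the new
"released" message makes visible in the log.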

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index f9e6a00..b22eb5f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -525,9 +525,17 @@ static void __assign_resources_sorted(struct list_head *head,
 
 	free_list(&local_fail_head);
 	/* Release assigned resource */
-	list_for_each_entry(dev_res, head, list)
-		if (dev_res->res->parent)
-			release_resource(dev_res->res);
+	list_for_each_entry(dev_res, head, list) {
+		struct resource *res = dev_res->res;
+
+		if (res->parent) {
+			dev_printk(KERN_DEBUG, &dev_res->dev->dev,
+				   "BAR %d: released %pR\n",
+				   (int)(res - &dev_res->dev->resource[0]),
+				   res);
+			release_resource(res);
+		}
+	}
 	/* Restore start/end/flags from saved list */
 	list_for_each_entry(save_res, &save_head, list) {
 		struct resource *res = save_res->res;
-- 
1.8.4.5


* [PATCH v11 35/60] PCI: Simplify res reference using in __assign_resources_sorted()
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (33 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 34/60] PCI: Add debug printout during releasing partial assigned resources Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 36/60] PCI: Add __add_to_list() Yinghai Lu
                   ` (25 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

There are several dev_res->res references. To make the code more
readable, use a local res variable instead of dev_res->res directly.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index b22eb5f..7865e44 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -457,6 +457,7 @@ static void __assign_resources_sorted(struct list_head *head,
 	struct pci_dev_resource *dev_res, *tmp_res;
 	unsigned long fail_type;
 	resource_size_t add_align;
+	struct resource *res;
 
 	/* Check if optional add_size is there */
 	if (!realloc_head || list_empty(realloc_head))
@@ -472,8 +473,8 @@ static void __assign_resources_sorted(struct list_head *head,
 
 	/* Update res in head list with add_size in realloc_head list */
 	list_for_each_entry(dev_res, head, list) {
-		dev_res->res->end += get_res_add_size(realloc_head,
-							dev_res->res);
+		res = dev_res->res;
+		res->end += get_res_add_size(realloc_head, res);
 
 		/*
 		 * There are two kinds of additional resources in the list:
@@ -482,16 +483,16 @@ static void __assign_resources_sorted(struct list_head *head,
 		 * 2. resource with IORESOURCE_SIZEALIGN
 		 *    update size above already change alignment.
 		 */
-		if (!(dev_res->res->flags & IORESOURCE_STARTALIGN))
+		if (!(res->flags & IORESOURCE_STARTALIGN))
 			continue;
 
-		add_align = get_res_add_align(realloc_head, dev_res->res);
+		add_align = get_res_add_align(realloc_head, res);
 
 		if (add_align) {
-			resource_size_t r_size = resource_size(dev_res->res);
+			resource_size_t r_size = resource_size(res);
 
-			dev_res->res->start = add_align;
-			dev_res->res->end = add_align + r_size - 1;
+			res->start = add_align;
+			res->end = add_align + r_size - 1;
 		}
 	}
 
@@ -513,21 +514,21 @@ static void __assign_resources_sorted(struct list_head *head,
 	/* check failed type */
 	fail_type = pci_fail_res_type_mask(&local_fail_head);
 	/* remove not need to be released assigned res from head list etc */
-	list_for_each_entry_safe(dev_res, tmp_res, head, list)
-		if (dev_res->res->parent &&
-		    !pci_need_to_release(fail_type, dev_res->res)) {
+	list_for_each_entry_safe(dev_res, tmp_res, head, list) {
+		res = dev_res->res;
+		if (res->parent && !pci_need_to_release(fail_type, res)) {
 			/* remove it from realloc_head list */
-			remove_from_list(realloc_head, dev_res->res);
-			remove_from_list(&save_head, dev_res->res);
+			remove_from_list(realloc_head, res);
+			remove_from_list(&save_head, res);
 			list_del(&dev_res->list);
 			kfree(dev_res);
 		}
+	}
 
 	free_list(&local_fail_head);
 	/* Release assigned resource */
 	list_for_each_entry(dev_res, head, list) {
-		struct resource *res = dev_res->res;
-
+		res = dev_res->res;
 		if (res->parent) {
 			dev_printk(KERN_DEBUG, &dev_res->dev->dev,
 				   "BAR %d: released %pR\n",
@@ -538,8 +539,7 @@ static void __assign_resources_sorted(struct list_head *head,
 	}
 	/* Restore start/end/flags from saved list */
 	list_for_each_entry(save_res, &save_head, list) {
-		struct resource *res = save_res->res;
-
+		res = save_res->res;
 		res->start = save_res->start;
 		res->end = save_res->end;
 		res->flags = save_res->flags;
-- 
1.8.4.5


* [PATCH v11 36/60] PCI: Add __add_to_list()
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (34 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 35/60] PCI: Simplify res reference using in __assign_resources_sorted() Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 37/60] PCI: Cache window alignment value during bus sizing Yinghai Lu
                   ` (24 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

For alt_size support, we will add more entries to the realloc list.

Add a new __add_to_list() that takes alt_size and alt_align.

And simplify add_to_list() so it no longer takes the add/alt inputs.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 51 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 31 insertions(+), 20 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 7865e44..efa6d4e 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -65,6 +65,8 @@ struct pci_dev_resource {
 	resource_size_t end;
 	resource_size_t add_size;
 	resource_size_t min_align;
+	resource_size_t alt_size;
+	resource_size_t alt_align;
 	unsigned long flags;
 };
 
@@ -87,15 +89,16 @@ static void free_list(struct list_head *head)
  * @add_size:	additional size to be optionally added
  *              to the resource
  */
-static int add_to_list(struct list_head *head,
+static int __add_to_list(struct list_head *head,
 		 struct pci_dev *dev, struct resource *res,
-		 resource_size_t add_size, resource_size_t min_align)
+		 resource_size_t add_size, resource_size_t min_align,
+		 resource_size_t alt_size, resource_size_t alt_align)
 {
 	struct pci_dev_resource *tmp;
 
 	tmp = kzalloc(sizeof(*tmp), GFP_KERNEL);
 	if (!tmp) {
-		pr_warn("add_to_list: kmalloc() failed!\n");
+		pr_warn("__add_to_list: kmalloc() failed!\n");
 		return -ENOMEM;
 	}
 
@@ -106,12 +109,20 @@ static int add_to_list(struct list_head *head,
 	tmp->flags = res->flags;
 	tmp->add_size = add_size;
 	tmp->min_align = min_align;
+	tmp->alt_size = alt_size;
+	tmp->alt_align = alt_align;
 
 	list_add(&tmp->list, head);
 
 	return 0;
 }
 
+static int add_to_list(struct list_head *head,
+		 struct pci_dev *dev, struct resource *res)
+{
+	return __add_to_list(head, dev, res, 0, 0, 0, 0);
+}
+
 static void remove_from_list(struct list_head *head,
 				 struct resource *res)
 {
@@ -377,9 +388,7 @@ static void assign_requested_resources_sorted(struct list_head *head,
 		if (resource_size(res) &&
 		    pci_assign_resource(dev_res->dev, idx)) {
 			if (fail_head)
-				add_to_list(fail_head, dev_res->dev, res,
-					    0 /* don't care */,
-					    0 /* don't care */);
+				add_to_list(fail_head, dev_res->dev, res);
 			reset_resource(res);
 		}
 	}
@@ -465,7 +474,7 @@ static void __assign_resources_sorted(struct list_head *head,
 
 	/* Save original start, end, flags etc at first */
 	list_for_each_entry(dev_res, head, list) {
-		if (add_to_list(&save_head, dev_res->dev, dev_res->res, 0, 0)) {
+		if (add_to_list(&save_head, dev_res->dev, dev_res->res)) {
 			free_list(&save_head);
 			goto requested_and_reassign;
 		}
@@ -1056,8 +1065,8 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 	b_res->end = b_res->start + size0 - 1;
 	b_res->flags |= IORESOURCE_STARTALIGN;
 	if (size1 > size0 && realloc_head) {
-		add_to_list(realloc_head, bus->self, b_res, size1-size0,
-			    min_align);
+		__add_to_list(realloc_head, bus->self, b_res,
+			      size1 - size0, min_align, 0, 0);
 		dev_printk(KERN_DEBUG, &bus->self->dev, "bridge window %pR to %pR add_size %llx\n",
 			   b_res, &bus->busn_res,
 			   (unsigned long long)size1-size0);
@@ -1261,7 +1270,8 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 				add_to_align_test_list(&align_test_add_list,
 							align, r_size);
 				r->end = r->start - 1;
-				add_to_list(realloc_head, dev, r, r_size, 0/* don't care */);
+				__add_to_list(realloc_head, dev, r,
+					      r_size, align, 0, 0);
 				sum_add_size += r_size;
 				if (align > max_add_align)
 					max_add_align = align;
@@ -1332,8 +1342,8 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	b_res->end = size0 + min_align - 1;
 	b_res->flags |= IORESOURCE_STARTALIGN;
 	if (size1 > size0 && realloc_head) {
-		add_to_list(realloc_head, bus->self, b_res, size1 - size0,
-				min_add_align);
+		__add_to_list(realloc_head, bus->self, b_res, size1 - size0,
+				min_add_align, 0, 0);
 		dev_printk(KERN_DEBUG, &bus->self->dev, "bridge window %pR to %pR add_size %llx add_align %llx\n",
 			   b_res, &bus->busn_res,
 			   (unsigned long long) (size1 - size0),
@@ -1370,8 +1380,8 @@ static void pci_bus_size_cardbus(struct pci_bus *bus,
 	b_res[0].flags |= IORESOURCE_IO | IORESOURCE_STARTALIGN;
 	if (realloc_head) {
 		b_res[0].end -= pci_cardbus_io_size;
-		add_to_list(realloc_head, bridge, b_res, pci_cardbus_io_size,
-				pci_cardbus_io_size);
+		__add_to_list(realloc_head, bridge, b_res,
+			      pci_cardbus_io_size, pci_cardbus_io_size, 0, 0);
 	}
 
 handle_b_res_1:
@@ -1382,8 +1392,8 @@ handle_b_res_1:
 	b_res[1].flags |= IORESOURCE_IO | IORESOURCE_STARTALIGN;
 	if (realloc_head) {
 		b_res[1].end -= pci_cardbus_io_size;
-		add_to_list(realloc_head, bridge, b_res+1, pci_cardbus_io_size,
-				 pci_cardbus_io_size);
+		__add_to_list(realloc_head, bridge, b_res + 1,
+			      pci_cardbus_io_size, pci_cardbus_io_size, 0, 0);
 	}
 
 handle_b_res_2:
@@ -1420,8 +1430,9 @@ handle_b_res_2:
 				  IORESOURCE_STARTALIGN;
 		if (realloc_head) {
 			b_res[2].end -= pci_cardbus_mem_size;
-			add_to_list(realloc_head, bridge, b_res+2,
-				 pci_cardbus_mem_size, pci_cardbus_mem_size);
+			__add_to_list(realloc_head, bridge, b_res + 2,
+				pci_cardbus_mem_size, pci_cardbus_mem_size,
+				0, 0);
 		}
 
 		/* reduce that to half */
@@ -1436,8 +1447,8 @@ handle_b_res_3:
 	b_res[3].flags |= IORESOURCE_MEM | IORESOURCE_STARTALIGN;
 	if (realloc_head) {
 		b_res[3].end -= b_res_3_size;
-		add_to_list(realloc_head, bridge, b_res+3, b_res_3_size,
-				 pci_cardbus_mem_size);
+		__add_to_list(realloc_head, bridge, b_res + 3,
+				b_res_3_size, pci_cardbus_mem_size, 0, 0);
 	}
 
 handle_done:
-- 
1.8.4.5


* [PATCH v11 37/60] PCI: Cache window alignment value during bus sizing
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (35 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 36/60] PCI: Add __add_to_list() Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 38/60] PCI: Check if resource is allocated before trying to assign one Yinghai Lu
                   ` (23 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

There are several calls to window_alignment(), and we will have more
for alt_size support. Cache the value instead of fetching it each time.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index efa6d4e..4d2898d 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1240,6 +1240,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 					mask | IORESOURCE_PREFETCH, type);
 	LIST_HEAD(align_test_list);
 	LIST_HEAD(align_test_add_list);
+	resource_size_t window_align;
 
 	if (!b_res)
 		return -ENOSPC;
@@ -1249,6 +1250,8 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 		min_size = 0;
 	}
 
+	window_align = window_alignment(bus, b_res->flags);
+
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		int i;
 
@@ -1311,10 +1314,10 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 		}
 	}
 
-	max_align = max(max_align, window_alignment(bus, b_res->flags));
+	max_align = max(max_align, window_align);
 	if (size || min_size) {
 		min_align = calculate_mem_align(&align_test_list, max_align,
-				 size, window_alignment(bus, b_res->flags));
+						size, window_align);
 		size0 = calculate_memsize(size, min_size,
 				  resource_size(b_res), min_align);
 	}
@@ -1325,7 +1328,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	if (sum_add_size > size && realloc_head) {
 		min_add_align = calculate_mem_align(&align_test_add_list,
 					max_add_align, sum_add_size,
-					window_alignment(bus, b_res->flags));
+					window_align);
 		size1 = calculate_memsize(sum_add_size, min_size,
 				 resource_size(b_res), min_add_align);
 	}
-- 
1.8.4.5


* [PATCH v11 38/60] PCI: Check if resource is allocated before trying to assign one
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (36 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 37/60] PCI: Cache window alignment value during bus sizing Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 39/60] PCI: Separate out save_resources()/restore_resources() Yinghai Lu
                   ` (22 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

In the following alt_size support, we will call pci_assign_resource()
several times on one resource list, and some resources could have been
assigned already.

Skip already allocated resources in the list, as pci_assign_resource()
can only handle resources that are not assigned yet.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 4d2898d..b5529cc 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -385,7 +385,7 @@ static void assign_requested_resources_sorted(struct list_head *head,
 	list_for_each_entry(dev_res, head, list) {
 		res = dev_res->res;
 		idx = res - &dev_res->dev->resource[0];
-		if (resource_size(res) &&
+		if (!res->parent && resource_size(res) &&
 		    pci_assign_resource(dev_res->dev, idx)) {
 			if (fail_head)
 				add_to_list(fail_head, dev_res->dev, res);
-- 
1.8.4.5


* [PATCH v11 39/60] PCI: Separate out save_resources()/restore_resources()
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (37 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 38/60] PCI: Check if resource is allocated before trying to assign one Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 40/60] PCI: Move comment to pci_need_to_release() Yinghai Lu
                   ` (21 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

alt_size support needs to save and restore resources several times.
Split out save_resources() and restore_resource() as helpers so that
later patches do not have to duplicate that code.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 49 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index b5529cc..1571245 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -435,6 +435,29 @@ static bool pci_need_to_release(unsigned long mask, struct resource *res)
 	return false;	/* should not get here */
 }
 
+static bool save_resources(struct list_head *head,
+			   struct list_head *save_head)
+{
+	struct pci_dev_resource *dev_res;
+
+	/* Save original start, end, flags etc at first */
+	list_for_each_entry(dev_res, head, list)
+		if (add_to_list(save_head, dev_res->dev, dev_res->res)) {
+			free_list(save_head);
+			return false;
+		}
+
+	return true;
+}
+
+static void restore_resource(struct pci_dev_resource *save_res,
+			     struct resource *res)
+{
+	res->start = save_res->start;
+	res->end = save_res->end;
+	res->flags = save_res->flags;
+}
+
 static void __assign_resources_sorted(struct list_head *head,
 				 struct list_head *realloc_head,
 				 struct list_head *fail_head)
@@ -472,13 +495,8 @@ static void __assign_resources_sorted(struct list_head *head,
 	if (!realloc_head || list_empty(realloc_head))
 		goto requested_and_reassign;
 
-	/* Save original start, end, flags etc at first */
-	list_for_each_entry(dev_res, head, list) {
-		if (add_to_list(&save_head, dev_res->dev, dev_res->res)) {
-			free_list(&save_head);
-			goto requested_and_reassign;
-		}
-	}
+	if (!save_resources(head, &save_head))
+		goto requested_and_reassign;
 
 	/* Update res in head list with add_size in realloc_head list */
 	list_for_each_entry(dev_res, head, list) {
@@ -547,12 +565,9 @@ static void __assign_resources_sorted(struct list_head *head,
 		}
 	}
 	/* Restore start/end/flags from saved list */
-	list_for_each_entry(save_res, &save_head, list) {
-		res = save_res->res;
-		res->start = save_res->start;
-		res->end = save_res->end;
-		res->flags = save_res->flags;
-	}
+	list_for_each_entry(save_res, &save_head, list)
+		restore_resource(save_res, save_res->res);
+
 	free_list(&save_head);
 
 requested_and_reassign:
@@ -2024,9 +2039,7 @@ again:
 	list_for_each_entry(fail_res, &fail_head, list) {
 		struct resource *res = fail_res->res;
 
-		res->start = fail_res->start;
-		res->end = fail_res->end;
-		res->flags = fail_res->flags;
+		restore_resource(fail_res, res);
 		if (fail_res->dev->subordinate) {
 			res->flags = 0;
 			/* last or third times and later */
@@ -2110,9 +2123,7 @@ again:
 	list_for_each_entry(fail_res, &fail_head, list) {
 		struct resource *res = fail_res->res;
 
-		res->start = fail_res->start;
-		res->end = fail_res->end;
-		res->flags = fail_res->flags;
+		restore_resource(fail_res, res);
 		if (fail_res->dev->subordinate) {
 			res->flags = 0;
 			/* last time */
-- 
1.8.4.5


* [PATCH v11 40/60] PCI: Move comment to pci_need_to_release()
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (38 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 39/60] PCI: Separate out save_resources()/restore_resources() Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 41/60] PCI: Separate required+optional assigning to another function Yinghai Lu
                   ` (20 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

Move the comment from the caller into pci_need_to_release(), as alt_size
support will add another caller.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 1571245..b4eb37d 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -414,6 +414,20 @@ static unsigned long pci_fail_res_type_mask(struct list_head *fail_head)
 
 static bool pci_need_to_release(unsigned long mask, struct resource *res)
 {
+	/*
+	 * Separate three resource type checking if we need to release
+	 * assigned resource.
+	 *	1. if there is io port assign fail, will release assigned
+	 *	   io port.
+	 *	2. if there is pref mmio assign fail, release assigned
+	 *	   pref mmio.
+	 *	   if assigned pref mmio's parent is non-pref mmio and there
+	 *	   is non-pref mmio assign fail, will release that assigned
+	 *	   pref mmio.
+	 *	3. if there is non-pref mmio assign fail or pref mmio
+	 *	   assigned fail, will release assigned non-pref mmio.
+	 */
+
 	if (res->flags & IORESOURCE_IO)
 		return !!(mask & IORESOURCE_IO);
 
@@ -470,19 +484,8 @@ static void __assign_resources_sorted(struct list_head *head,
 	 *  if could do that, could get out early.
 	 *  if could not do that, we still try to assign requested at first,
 	 *    then try to reassign add_size for some resources.
-	 *
-	 * Separate three resource type checking if we need to release
-	 * assigned resource after requested + add_size try.
-	 *	1. if there is io port assign fail, will release assigned
-	 *	   io port.
-	 *	2. if there is pref mmio assign fail, release assigned
-	 *	   pref mmio.
-	 *	   if assigned pref mmio's parent is non-pref mmio and there
-	 *	   is non-pref mmio assign fail, will release that assigned
-	 *	   pref mmio.
-	 *	3. if there is non-pref mmio assign fail or pref mmio
-	 *	   assigned fail, will release assigned non-pref mmio.
 	 */
+
 	LIST_HEAD(save_head);
 	LIST_HEAD(local_fail_head);
 	struct pci_dev_resource *save_res;
-- 
1.8.4.5


* [PATCH v11 41/60] PCI: Separate required+optional assigning to another function
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (39 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 40/60] PCI: Move comment to pci_need_to_release() Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 42/60] PCI: Skip required+optional if there is no optional Yinghai Lu
                   ` (19 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

__assign_resources_sorted() would get too big if alt_size support were
added to it directly.  Split the required+optional assignment code out
into its own function.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 47 +++++++++++++++++++++++++++--------------------
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index b4eb37d..a4f53ec 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -472,20 +472,9 @@ static void restore_resource(struct pci_dev_resource *save_res,
 	res->flags = save_res->flags;
 }
 
-static void __assign_resources_sorted(struct list_head *head,
-				 struct list_head *realloc_head,
-				 struct list_head *fail_head)
+static bool __assign_resources_required_optional_sorted(struct list_head *head,
+				 struct list_head *realloc_head)
 {
-	/*
-	 * Should not assign requested resources at first.
-	 *   they could be adjacent, so later reassign can not reallocate
-	 *   them one by one in parent resource window.
-	 * Try to assign requested + add_size at beginning
-	 *  if could do that, could get out early.
-	 *  if could not do that, we still try to assign requested at first,
-	 *    then try to reassign add_size for some resources.
-	 */
-
 	LIST_HEAD(save_head);
 	LIST_HEAD(local_fail_head);
 	struct pci_dev_resource *save_res;
@@ -494,12 +483,8 @@ static void __assign_resources_sorted(struct list_head *head,
 	resource_size_t add_align;
 	struct resource *res;
 
-	/* Check if optional add_size is there */
-	if (!realloc_head || list_empty(realloc_head))
-		goto requested_and_reassign;
-
 	if (!save_resources(head, &save_head))
-		goto requested_and_reassign;
+		return false;
 
 	/* Update res in head list with add_size in realloc_head list */
 	list_for_each_entry(dev_res, head, list) {
@@ -538,7 +523,8 @@ static void __assign_resources_sorted(struct list_head *head,
 			remove_from_list(realloc_head, dev_res->res);
 		free_list(&save_head);
 		free_list(head);
-		return;
+
+		return true;
 	}
 
 	/* check failed type */
@@ -573,7 +559,28 @@ static void __assign_resources_sorted(struct list_head *head,
 
 	free_list(&save_head);
 
-requested_and_reassign:
+	return false;
+}
+
+static void __assign_resources_sorted(struct list_head *head,
+				 struct list_head *realloc_head,
+				 struct list_head *fail_head)
+{
+	/*
+	 * Should not assign required resources at first.
+	 *   they could be adjacent, so later reassign can not reallocate
+	 *   them one by one in parent resource window.
+	 * Try to assign required + optional at beginning
+	 *  if could do that, could get out early.
+	 *  if could not do that, we still try to assign required at first,
+	 *    then try to reassign add_size for some resources.
+	 */
+
+	/* Check required+optional add */
+	if (realloc_head && !list_empty(realloc_head) &&
+	    __assign_resources_required_optional_sorted(head, realloc_head))
+		return;
+
 	sort_resources(head);
 
 	/* Satisfy the must-have resource requests */
-- 
1.8.4.5


* [PATCH v11 42/60] PCI: Skip required+optional if there is no optional
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (40 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 41/60] PCI: Separate required+optional assigning to another function Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 43/60] PCI: Move saved required resource list out of required+optional assigning Yinghai Lu
                   ` (18 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

If the bridge does not support hotplug and has no child with SR-IOV
support, there are no optional resources. In that case, return early and
skip the required+optional allocation attempt.

Also, in the loop that updates each resource with its optional add_size,
skip resources whose add_size is 0.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index a4f53ec..373f76f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -449,6 +449,24 @@ static bool pci_need_to_release(unsigned long mask, struct resource *res)
 	return false;	/* should not get here */
 }
 
+static bool has_addon(struct list_head *head,
+			struct list_head *realloc_head)
+{
+	int add_count = 0;
+	struct pci_dev_resource *dev_res, *tmp_res;
+
+	/* check if we have add really */
+	list_for_each_entry(dev_res, head, list) {
+		tmp_res = res_to_dev_res(realloc_head, dev_res->res);
+		if (!tmp_res || !tmp_res->add_size)
+			continue;
+
+		add_count++;
+	}
+
+	return add_count != 0;
+}
+
 static bool save_resources(struct list_head *head,
 			   struct list_head *save_head)
 {
@@ -480,16 +498,24 @@ static bool __assign_resources_required_optional_sorted(struct list_head *head,
 	struct pci_dev_resource *save_res;
 	struct pci_dev_resource *dev_res, *tmp_res;
 	unsigned long fail_type;
-	resource_size_t add_align;
+	resource_size_t add_align, add_size;
 	struct resource *res;
 
+	if (!has_addon(head, realloc_head))
+		return false;
+
 	if (!save_resources(head, &save_head))
 		return false;
 
 	/* Update res in head list with add_size in realloc_head list */
 	list_for_each_entry(dev_res, head, list) {
 		res = dev_res->res;
-		res->end += get_res_add_size(realloc_head, res);
+		add_size = get_res_add_size(realloc_head, res);
+
+		if (!add_size)
+			continue;
+
+		res->end += add_size;
 
 		/*
 		 * There are two kinds of additional resources in the list:
@@ -577,7 +603,7 @@ static void __assign_resources_sorted(struct list_head *head,
 	 */
 
 	/* Check required+optional add */
-	if (realloc_head && !list_empty(realloc_head) &&
+	if (realloc_head &&
 	    __assign_resources_required_optional_sorted(head, realloc_head))
 		return;
 
-- 
1.8.4.5


* [PATCH v11 43/60] PCI: Move saved required resource list out of required+optional assigning
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (41 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 42/60] PCI: Skip required+optional if there is no optional Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 44/60] PCI: Add alt_size resource allocation support Yinghai Lu
                   ` (17 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

alt_size support needs to share the saved list of required resources, so
move that list out of the required+optional assignment function and into
its caller.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 373f76f..6c58b4a 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -455,6 +455,9 @@ static bool has_addon(struct list_head *head,
 	int add_count = 0;
 	struct pci_dev_resource *dev_res, *tmp_res;
 
+	if (!realloc_head)
+		return false;
+
 	/* check if we have add really */
 	list_for_each_entry(dev_res, head, list) {
 		tmp_res = res_to_dev_res(realloc_head, dev_res->res);
@@ -491,9 +494,9 @@ static void restore_resource(struct pci_dev_resource *save_res,
 }
 
 static bool __assign_resources_required_optional_sorted(struct list_head *head,
+				 struct list_head *save_head,
 				 struct list_head *realloc_head)
 {
-	LIST_HEAD(save_head);
 	LIST_HEAD(local_fail_head);
 	struct pci_dev_resource *save_res;
 	struct pci_dev_resource *dev_res, *tmp_res;
@@ -501,12 +504,6 @@ static bool __assign_resources_required_optional_sorted(struct list_head *head,
 	resource_size_t add_align, add_size;
 	struct resource *res;
 
-	if (!has_addon(head, realloc_head))
-		return false;
-
-	if (!save_resources(head, &save_head))
-		return false;
-
 	/* Update res in head list with add_size in realloc_head list */
 	list_for_each_entry(dev_res, head, list) {
 		res = dev_res->res;
@@ -547,7 +544,6 @@ static bool __assign_resources_required_optional_sorted(struct list_head *head,
 		/* Remove head list from realloc_head list */
 		list_for_each_entry(dev_res, head, list)
 			remove_from_list(realloc_head, dev_res->res);
-		free_list(&save_head);
 		free_list(head);
 
 		return true;
@@ -561,7 +557,7 @@ static bool __assign_resources_required_optional_sorted(struct list_head *head,
 		if (res->parent && !pci_need_to_release(fail_type, res)) {
 			/* remove it from realloc_head list */
 			remove_from_list(realloc_head, res);
-			remove_from_list(&save_head, res);
+			remove_from_list(save_head, res);
 			list_del(&dev_res->list);
 			kfree(dev_res);
 		}
@@ -580,11 +576,9 @@ static bool __assign_resources_required_optional_sorted(struct list_head *head,
 		}
 	}
 	/* Restore start/end/flags from saved list */
-	list_for_each_entry(save_res, &save_head, list)
+	list_for_each_entry(save_res, save_head, list)
 		restore_resource(save_res, save_res->res);
 
-	free_list(&save_head);
-
 	return false;
 }
 
@@ -602,16 +596,24 @@ static void __assign_resources_sorted(struct list_head *head,
 	 *    then try to reassign add_size for some resources.
 	 */
 
+	LIST_HEAD(save_head);
+
 	/* Check required+optional add */
-	if (realloc_head &&
-	    __assign_resources_required_optional_sorted(head, realloc_head))
+	if (has_addon(head, realloc_head) &&
+	    save_resources(head, &save_head) &&
+	    __assign_resources_required_optional_sorted(head, &save_head,
+					       realloc_head)) {
+		free_list(&save_head);
 		return;
+	}
 
 	sort_resources(head);
 
 	/* Satisfy the must-have resource requests */
 	assign_requested_resources_sorted(head, fail_head);
 
+	free_list(&save_head);
+
 	/* Try to satisfy any additional optional resource
 		requests */
 	if (realloc_head)
-- 
1.8.4.5


* [PATCH v11 44/60] PCI: Add alt_size resource allocation support
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (42 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 43/60] PCI: Move saved required resource list out of required+optional assigning Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:56   ` Linus Torvalds
  2016-04-08  0:15 ` [PATCH v11 45/60] PCI: Add support for more than two alt_size entries under same bridge Yinghai Lu
                   ` (16 subsequent siblings)
  60 siblings, 1 reply; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

On systems with several PCIe switches, the BIOS allocates very tight
resources to the bridge windows, and they are not aligned to min_align
the way the kernel allocation code expects.

For example:
  02:03.0---0c:00.0---0d:04.0---18:00.0

  18:00.0 needs 0x10000000 and 0x00010000.
The BIOS only allocates 0x10100000 to 0d:04.0 and the bridges above it.
After 0c:00.0 is removed via /sys/bus/pci/devices/0000:0c:00.0/remove,
a rescan via /sys/bus/pci/rescan cannot allocate 0x18000000 to 0c:00.0,
which is what the current min_align approach requires.

Another example:
  00:1c.0---02:00.0---03:01.0---04:00.0---05:19.0---06:00.0

  06:00.0 needs 0x4000000 and 0x800000.
The BIOS only allocates 0x4800000 to 05:19.0 and 04:00.0.
When 05:19.0 is removed via /sys/bus/pci/devices/0000:05:19.0/remove,
a rescan via /sys/bus/pci/rescan fails:
 pci 0000:05:19.0: BAR 14: no space for [mem size 0x06000000]
 pci 0000:05:19.0: BAR 14: failed to assign [mem size 0x06000000]
 pci 0000:06:00.0: BAR 2: no space for [mem size 0x04000000 64bit]
 pci 0000:06:00.0: BAR 2: failed to assign [mem size 0x04000000 64bit]
 pci 0000:06:00.0: BAR 0: no space for [mem size 0x00800000]
 pci 0000:06:00.0: BAR 0: failed to assign [mem size 0x00800000]
The current code tries to use align 0x2000000 and size 0x6000000, but the
parent bridge only has 0x4800000.

Introduce alt_align/alt_size and store them in the realloc list alongside
the addon info. They are tried after allocation with min_align/min_size
fails.

alt_align is max_align, and alt_size is the total size aligned to the
bridge's minimum window alignment.

On my test setup:
  00:1c.7---61:00.0---62:00.0

  62:00.0 needs 0x800000 and 0x20000, and 00:1c.7 only has 9M allocated
for MMIO. With this patch we get:

 pci 0000:61:00.0: bridge window [mem 0x00400000-0x00ffffff] to [bus 62]
   add_size 0 add_align 0 alt_size 900000 alt_align 800000
   req_size c00000 req_align 400000
 pci 0000:61:00.0: BAR 14: no space for [mem size 0x00c00000]
 pci 0000:61:00.0: BAR 14: failed to assign [mem size 0x00c00000]
 pci 0000:61:00.0: BAR 14: assigned [mem 0xdf000000-0xdf8fffff]
 pci 0000:62:00.0: BAR 0: assigned [mem 0xdf000000-0xdf7fffff pref]
 pci 0000:62:00.0: BAR 1: assigned [mem 0xdf800000-0xdf81ffff]
 pci 0000:61:00.0: PCI bridge to [bus 62]
 pci 0000:61:00.0:   bridge window [io  0x6000-0x6fff]
 pci 0000:61:00.0:   bridge window [mem 0xdf000000-0xdf8fffff]
 pci 0000:00:1c.7: PCI bridge to [bus 61-68]
 pci 0000:00:1c.7:   bridge window [io  0x6000-0x6fff]
 pci 0000:00:1c.7:   bridge window [mem 0xdf000000-0xdf8fffff]

So for 61:00.0 the first try with 12M fails, and the second try with the
9M alt_size succeeds. 62:00.0 then gets its resources allocated
correctly too.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=100451
Reported-by: Yijing Wang <wangyijing@huawei.com>
Tested-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 203 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 191 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 6c58b4a..f3e9873 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -322,7 +322,7 @@ static void reassign_resources_sorted(struct list_head *realloc_head,
 {
 	struct resource *res;
 	struct pci_dev_resource *add_res, *tmp;
-	resource_size_t add_size, align;
+	resource_size_t add_size, align, r_size;
 	int idx;
 
 	list_for_each_entry_safe(add_res, tmp, realloc_head, list) {
@@ -338,12 +338,23 @@ static void reassign_resources_sorted(struct list_head *realloc_head,
 		idx = res - &add_res->dev->resource[0];
 		add_size = add_res->add_size;
 		align = add_res->min_align;
-		if (!resource_size(res)) {
+		if (!add_size || !align) /* alt_size only */
+			goto out;
+
+		r_size = resource_size(res);
+		if (!r_size) {
 			res->start = align;
 			res->end = res->start + add_size - 1;
 			if (pci_assign_resource(add_res->dev, idx))
 				reset_resource(res);
 		} else {
+			/* could just assigned with alt, add difference ? */
+			resource_size_t size;
+
+			size = add_res->end - add_res->start + 1;
+			if (r_size < size)
+				add_size += size - r_size;
+
 			res->flags |= add_res->flags &
 				 (IORESOURCE_STARTALIGN|IORESOURCE_SIZEALIGN);
 			if (pci_reassign_resource(add_res->dev, idx,
@@ -582,6 +593,104 @@ static bool __assign_resources_required_optional_sorted(struct list_head *head,
 	return false;
 }
 
+static bool __has_alt(struct list_head *head,
+		    struct list_head *realloc_head)
+{
+	int alt_count = 0;
+	struct pci_dev_resource *dev_res, *alt_res;
+
+	if (!realloc_head)
+		return false;
+
+	/* check if we have alt really */
+	list_for_each_entry(dev_res, head, list) {
+		alt_res = res_to_dev_res(realloc_head, dev_res->res);
+		if (!alt_res || !alt_res->alt_size)
+			continue;
+
+		alt_count++;
+	}
+
+	if (!alt_count)
+		return false;
+
+	return true;
+}
+
+static void __assign_resources_alt_sorted(struct list_head *head,
+				 struct list_head *save_head,
+				 struct list_head *realloc_head,
+				 struct list_head *local_fail_head)
+{
+	LIST_HEAD(local_alt_fail_head);
+	struct pci_dev_resource *dev_res;
+	struct pci_dev_resource *alt_res, *fail_res, *save_res;
+	unsigned long fail_type;
+	struct resource *res;
+
+	/* check failed type */
+	fail_type = pci_fail_res_type_mask(local_fail_head);
+	/* release resources of the same type that failed */
+	list_for_each_entry(dev_res, head, list) {
+		res = dev_res->res;
+		if (res->parent) {
+			if (!pci_need_to_release(fail_type, res))
+				continue;
+
+			/*
+			 * have to use saved info, as resource that does not
+			 * have addon/alt is not in realloc list.
+			 */
+			save_res = res_to_dev_res(save_head, res);
+			if (!save_res)
+				continue;
+
+			dev_printk(KERN_DEBUG, &dev_res->dev->dev,
+				   "BAR %d: released %pR\n",
+				   (int)(res - &dev_res->dev->resource[0]),
+				   res);
+			release_resource(dev_res->res);
+			restore_resource(save_res, res);
+		} else {
+			/* restore fail one */
+			fail_res = res_to_dev_res(local_fail_head, res);
+			if (fail_res) {
+				restore_resource(fail_res, res);
+				remove_from_list(local_fail_head, res);
+			}
+		}
+
+		alt_res = res_to_dev_res(realloc_head, res);
+		if (!alt_res || !alt_res->alt_size)
+			continue;
+
+		/* change res to alt */
+		if (res->flags & IORESOURCE_STARTALIGN)
+			res->start = alt_res->alt_align;
+		else
+			res->start = 0;
+		res->end = res->start + alt_res->alt_size - 1;
+	}
+
+	sort_resources(head);
+	/* Satisfy the alt resource requests */
+	assign_requested_resources_sorted(head, &local_alt_fail_head);
+
+	/* update local fail list */
+	list_for_each_entry(fail_res, &local_alt_fail_head, list) {
+		res = fail_res->res;
+		dev_res = res_to_dev_res(realloc_head, res);
+		/* change res back to required */
+		if (dev_res && dev_res->alt_size)
+			restore_resource(dev_res, res);
+
+		if (!res_to_dev_res(local_fail_head, res))
+			add_to_list(local_fail_head, fail_res->dev, res);
+		reset_resource(res);
+	}
+	free_list(&local_alt_fail_head);
+}
+
 static void __assign_resources_sorted(struct list_head *head,
 				 struct list_head *realloc_head,
 				 struct list_head *fail_head)
@@ -597,6 +706,8 @@ static void __assign_resources_sorted(struct list_head *head,
 	 */
 
 	LIST_HEAD(save_head);
+	LIST_HEAD(local_fail_head);
+	bool has_alt;
 
 	/* Check required+optional add */
 	if (has_addon(head, realloc_head) &&
@@ -609,15 +720,29 @@ static void __assign_resources_sorted(struct list_head *head,
 
 	sort_resources(head);
 
+	has_alt = __has_alt(head, realloc_head);
+	if (has_alt && list_empty(&save_head))
+		save_resources(head, &save_head);
+
 	/* Satisfy the must-have resource requests */
-	assign_requested_resources_sorted(head, fail_head);
+	assign_requested_resources_sorted(head, &local_fail_head);
+
+	if (has_alt && !list_empty(&local_fail_head) && !list_empty(&save_head))
+		__assign_resources_alt_sorted(head, &save_head,
+					      realloc_head,
+					      &local_fail_head);
 
 	free_list(&save_head);
 
-	/* Try to satisfy any additional optional resource
-		requests */
+	/* Try to satisfy any additional optional resource requests */
 	if (realloc_head)
 		reassign_resources_sorted(realloc_head, head);
+
+	if (fail_head)
+		list_splice_tail(&local_fail_head, fail_head);
+	else
+		free_list(&local_fail_head);
+
 	free_list(head);
 }
 
@@ -1293,6 +1418,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 					mask | IORESOURCE_PREFETCH, type);
 	LIST_HEAD(align_test_list);
 	LIST_HEAD(align_test_add_list);
+	resource_size_t alt_size = 0, alt_align = 0;
 	resource_size_t window_align;
 
 	if (!b_res)
@@ -1351,6 +1477,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 
 			if (realloc_head) {
 				resource_size_t add_r_size, add_align;
+				struct pci_dev_resource *dev_res;
 
 				add_r_size = get_res_add_size(realloc_head, r);
 				add_align = get_res_add_align(realloc_head, r);
@@ -1363,6 +1490,17 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 				sum_add_size += r_size + add_r_size;
 				if (add_align > max_add_align)
 					max_add_align = add_align;
+
+				dev_res = res_to_dev_res(realloc_head, r);
+				if (dev_res && dev_res->alt_size) {
+					alt_size += dev_res->alt_size;
+					if (alt_align < dev_res->alt_align)
+						alt_align = dev_res->alt_align;
+				} else if (r_size > 1) {
+					alt_size += r_size;
+					if (alt_align < align)
+						alt_align = align;
+				}
 			}
 		}
 	}
@@ -1376,6 +1514,17 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	}
 	free_align_test_list(&align_test_list);
 
+	if (size0 && realloc_head) {
+		alt_align = max(alt_align, window_align);
+		alt_size = calculate_memsize(alt_size, min_size,
+					     0, window_align);
+		/* required is better ? */
+		if (alt_size >= size0) {
+			alt_align = 0;
+			alt_size = 0;
+		}
+	}
+
 	if (sum_add_size < min_sum_size)
 		sum_add_size = min_sum_size;
 	if (sum_add_size > size && realloc_head) {
@@ -1397,13 +1546,43 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	b_res->start = min_align;
 	b_res->end = size0 + min_align - 1;
 	b_res->flags |= IORESOURCE_STARTALIGN;
-	if (size1 > size0 && realloc_head) {
-		__add_to_list(realloc_head, bus->self, b_res, size1 - size0,
-				min_add_align, 0, 0);
-		dev_printk(KERN_DEBUG, &bus->self->dev, "bridge window %pR to %pR add_size %llx add_align %llx\n",
-			   b_res, &bus->busn_res,
-			   (unsigned long long) (size1 - size0),
-			   (unsigned long long) min_add_align);
+	if (realloc_head) {
+		resource_size_t final_add_size = 0;
+
+		if (size1 > size0)
+			final_add_size = size1 - size0;
+		else
+			min_add_align = 0;
+
+		/*
+		 * realloc list include three type entries
+		 * 1. optional only:
+		 *    add_size != 0, alt_size == 0, req_size == 0
+		 * 2. required only with smaller alt_size.
+		 *    add_size == 0, alt_size != 0, req_size > alt_size
+		 * 3. required + optional:
+		 *    add_size != 0, alt_size < req_size, req_size != 0
+		 *
+		 * So there is no req_size != 0, and alt_size == req_size.
+		 * in that case, we already set alt_size = 0.
+		 *
+		 * req_align/req_size is not stored directly, and we
+		 * have dev_res start/end/flags instead.
+		 */
+		if (final_add_size || alt_size) {
+			__add_to_list(realloc_head, bus->self, b_res,
+				      final_add_size, min_add_align,
+				      alt_size, alt_align);
+			dev_printk(KERN_DEBUG, &bus->self->dev,
+				   "bridge window %pR to %pR add_size %llx add_align %llx alt_size %llx alt_align %llx req_size %llx req_align %llx\n",
+				   b_res, &bus->busn_res,
+				   (unsigned long long)final_add_size,
+				   (unsigned long long)min_add_align,
+				   (unsigned long long)alt_size,
+				   (unsigned long long)alt_align,
+				   (unsigned long long)size0,
+				   (unsigned long long)min_align);
+		}
 	}
 	return 0;
 }
-- 
1.8.4.5


* [PATCH v11 45/60] PCI: Add support for more than two alt_size entries under same bridge
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (43 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 44/60] PCI: Add alt_size resource allocation support Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:15 ` [PATCH v11 46/60] PCI: Fix size calculation with old_size on rescan path Yinghai Lu
                   ` (15 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

When there are two bridges under a parent bridge and each child bridge
has an alt_size, the parent's alt_size must be increased so that it can
fit all of the alt entries.

This patch first picks a size that is large enough, then keeps reducing
it and retrying to find the minimum value for alt_size.

For example, take two bridges:
  one has 8M/8M and 1M/1M children resources.
  one has 4M/4M and 1M/1M children resources.

The child bridges then have alt_align/alt_size of 8M/9M and 4M/5M.
Before this patch, the parent bridge alt_align/alt_size is 8M/14M, which
is wrong, as it cannot fit the two alt entries at all.
With this patch, the parent bridge alt_align/alt_size is 8M/17M.
               8M            16M   20M   24M
  |------------|-------------|-----|-----|
               8M                          25M
               |---------------------------|
                               17M
               |---9M----------|   |-5M----|


At the same time, the child bridges' required align/size are 4M/12M and
2M/6M, and the parent bridge's required align/size is 4M/20M.

So in the end, we use 8M/17M as the parent bridge alt_align/alt_size.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=100451
Reported-by: Yijing Wang <wangyijing@huawei.com>
Tested-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index f3e9873..88557b9 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1372,6 +1372,47 @@ out:
 	return good_align;
 }
 
+static resource_size_t calculate_mem_alt_size(struct list_head *head,
+				resource_size_t max_align, resource_size_t size,
+				resource_size_t align_low)
+{
+	struct align_test_res *p;
+	resource_size_t tmp;
+	resource_size_t good_size, bad_size;
+	int count = 0, order;
+
+	good_size = ALIGN(size, align_low);
+
+	list_for_each_entry(p, head, list)
+		count++;
+
+	if (count <= 1)
+		goto out;
+
+	sort_align_test(head);
+
+	tmp = max(size, max_align);
+	order = __fls(count);
+	if ((1ULL << order) < count)
+		order++;
+	good_size = ALIGN((tmp << order), align_low);
+	bad_size = ALIGN(size, align_low) - align_low;
+	size = good_size;
+	while (size > bad_size) {
+		/* check if align/size fit all entries */
+		if (is_align_size_good(head, max_align, size, 0))
+			good_size = size;
+		else
+			bad_size = size;
+
+		size = bad_size + ((good_size - bad_size) >> 1);
+		size = round_down(size, align_low);
+	}
+
+out:
+	return good_size;
+}
+
 static inline bool is_optional(int i)
 {
 
@@ -1418,6 +1459,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 					mask | IORESOURCE_PREFETCH, type);
 	LIST_HEAD(align_test_list);
 	LIST_HEAD(align_test_add_list);
+	LIST_HEAD(align_test_alt_list);
 	resource_size_t alt_size = 0, alt_align = 0;
 	resource_size_t window_align;
 
@@ -1493,10 +1535,17 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 
 				dev_res = res_to_dev_res(realloc_head, r);
 				if (dev_res && dev_res->alt_size) {
+					add_to_align_test_list(
+						&align_test_alt_list,
+						dev_res->alt_align,
+						dev_res->alt_size);
 					alt_size += dev_res->alt_size;
 					if (alt_align < dev_res->alt_align)
 						alt_align = dev_res->alt_align;
 				} else if (r_size > 1) {
+					add_to_align_test_list(
+						&align_test_alt_list,
+						align, r_size);
 					alt_size += r_size;
 					if (alt_align < align)
 						alt_align = align;
@@ -1516,14 +1565,17 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 
 	if (size0 && realloc_head) {
 		alt_align = max(alt_align, window_align);
-		alt_size = calculate_memsize(alt_size, min_size,
-					     0, window_align);
+		/* need to increase size to fit more alt */
+		alt_size = calculate_mem_alt_size(&align_test_alt_list,
+						  alt_align, alt_size,
+						  window_align);
 		/* required is better ? */
 		if (alt_size >= size0) {
 			alt_align = 0;
 			alt_size = 0;
 		}
 	}
+	free_align_test_list(&align_test_alt_list);
 
 	if (sum_add_size < min_sum_size)
 		sum_add_size = min_sum_size;
-- 
1.8.4.5


* [PATCH v11 46/60] PCI: Fix size calculation with old_size on rescan path
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (44 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 45/60] PCI: Add support for more than two alt_size entries under same bridge Yinghai Lu
@ 2016-04-08  0:15 ` Yinghai Lu
  2016-04-08  0:16 ` [PATCH v11 47/60] PCI: Don't add too much optional size for hotplug bridge io Yinghai Lu
                   ` (14 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:15 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

On the boot path, we don't pass realloc at first and treat all optional
resources as required. In some cases, required plus optional can end up
with a smaller size/align than required only.

For example, 04:00.0 has child bridges 05:03.0 and 05:04.0.
The pref layout after the boot path looks like the following:

pci 0000:04:00.0: BAR 9: assigned [mem 0x84000000-0x9fffffff 64bit pref]
pci 0000:05:04.0: BAR 9: assigned [mem 0x88000000-0x9fffffff 64bit pref]
pci 0000:05:03.0: BAR 9: assigned [mem 0x84000000-0x841fffff 64bit pref]
pci 0000:05:03.0: PCI bridge to [bus 08-0f]
pci 0000:05:03.0:   bridge window [mem 0x84000000-0x841fffff 64bit pref]
pci 0000:05:04.0: PCI bridge to [bus 10]
pci 0000:05:04.0:   bridge window [mem 0x88000000-0x9fffffff 64bit pref]
pci 0000:04:00.0: PCI bridge to [bus 05-10]
pci 0000:04:00.0:   bridge window [mem 0x84000000-0x9fffffff 64bit pref]

So the old size for 04:00.0 in rescan would be 0x1c000000, and the align
is 0x4000000.

During remove and rescan:

pci 0000:05:03.0: bridge window [mem 0x00000000-0xffffffffffffffff 64bit pref] to [bus 08-0f] add_size 200000 add_align 100000 alt_size 0 alt_align 0 must_size 0 must_align 0
pci 0000:05:03.0: bridge window [mem 0x00000000-0xffffffffffffffff] to [bus 08-0f] add_size 200000 add_align 100000 alt_size 0 alt_align 0 must_size 0 must_align 0
pci 0000:05:04.0: bridge window [mem 0x08000000-0x1fffffff 64bit pref] to [bus 10] add_size 0 add_align 0 alt_size 10100000 alt_align 10000000 must_size 18000000 must_align 8000000
pci 0000:05:03.0: BAR 9: [mem 0x00000000-0xffffffffffffffff 64bit pref] get_res_add_size add_size   200000
pci 0000:05:03.0: BAR 9: [mem 0x00000000-0xffffffffffffffff 64bit pref] get_res_add_align min_align 100000
pci 0000:04:00.0: bridge window [mem 0x08000000-0x27ffffff 64bit pref] to [bus 05-10] add_size 0 add_align 0 alt_size 10100000 alt_align 10000000 must_size 20000000 must_align 8000000

The old size 0x1c000000 gets aligned up to 0x20000000 as size0, with
0x1c000000 as size1. So 04:00.0 ends up with a big must size and no
optional size anymore.

So don't align the old size; then size0 and size1 are the same,
and we use the smaller add_align as the must align.

After the patch, rescan works properly.
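
The effect of the reordering can be shown with a small userspace sketch.
memsize_before() and memsize_after() are hypothetical names that mirror
calculate_memsize() before and after this patch; the input size of
0x18000000 is taken from the must_size of 05:04.0 in the log above and is
only meant as an illustration.

```c
#include <stdint.h>

static uint64_t align_up(uint64_t v, uint64_t a)
{
	return (v + a - 1) & ~(a - 1);
}

/* Before: old_size is applied first, so it gets aligned up as well. */
static uint64_t memsize_before(uint64_t size, uint64_t min_size,
			       uint64_t old_size, uint64_t align)
{
	if (size < min_size)
		size = min_size;
	if (old_size == 1)
		old_size = 0;
	if (size < old_size)
		size = old_size;
	return align_up(size, align);
}

/* After: only the computed size is aligned, old_size is kept as is. */
static uint64_t memsize_after(uint64_t size, uint64_t min_size,
			      uint64_t old_size, uint64_t align)
{
	if (size < min_size)
		size = min_size;
	if (old_size == 1)
		old_size = 0;
	size = align_up(size, align);
	if (size < old_size)
		size = old_size;
	return size;
}
```

With old_size 0x1c000000 and align 0x8000000, the first variant returns
0x20000000 and the second returns 0x1c000000.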

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 88557b9..d2712d8 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1133,9 +1133,9 @@ static resource_size_t calculate_memsize(resource_size_t size,
 		size = min_size;
 	if (old_size == 1)
 		old_size = 0;
+	size = ALIGN(size, align);
 	if (size < old_size)
 		size = old_size;
-	size = ALIGN(size, align);
 	return size;
 }
 
@@ -1595,6 +1595,17 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 		b_res->flags = 0;
 		return 0;
 	}
+
+	/*
+	 * It happens when boot path is not passing realloc
+	 * and later rescan is passing realloc.
+	 * The old value from boot path is bigger, and calculate_size will
+	 * use old value as size0 and size1, and also have
+	 * chance optional align is smaller than must only align.
+	 */
+	if(size0 == size1 && min_align > min_add_align)
+		min_align = min_add_align;
+
 	b_res->start = min_align;
 	b_res->end = size0 + min_align - 1;
 	b_res->flags |= IORESOURCE_STARTALIGN;
-- 
1.8.4.5


* [PATCH v11 47/60] PCI: Don't add too much optional size for hotplug bridge io
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (45 preceding siblings ...)
  2016-04-08  0:15 ` [PATCH v11 46/60] PCI: Fix size calculation with old_size on rescan path Yinghai Lu
@ 2016-04-08  0:16 ` Yinghai Lu
  2016-04-08  0:16 ` [PATCH v11 48/60] PCI: Move ISA io port align out of calculate_iosize() Yinghai Lu
                   ` (13 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:16 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

This is the io port counterpart of the MMIO patch (PCI: Don't add too
much optional size for hotplug bridge MMIO).

It compares required+optional with min_sum_size, so we end up with a
smaller optional size.
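
As a rough sketch of the difference, ignoring alignment and old size:
io_total_before() and io_total_after() are hypothetical helpers, not
functions from setup-bus.c, and "required" stands for size + size1.

```c
#include <stdint.h>

/* Before: the hotplug size is always added on top of the required size. */
static uint64_t io_total_before(uint64_t required, uint64_t children_add,
				uint64_t hotplug_size)
{
	uint64_t add = children_add > hotplug_size ? children_add
						    : hotplug_size;

	return required + add;
}

/* After: the hotplug size only acts as a floor for required+optional. */
static uint64_t io_total_after(uint64_t required, uint64_t children_add,
			       uint64_t min_sum_size)
{
	uint64_t sum = required + children_add;

	return sum < min_sum_size ? min_sum_size : sum;
}
```

For a bridge that already requires 0x1000 and has a hotplug size of 0x100,
the first variant asks for 0x1100 before alignment, the second for 0x1000.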

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index d2712d8..11a4c1d 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1176,7 +1176,6 @@ static resource_size_t window_alignment(struct pci_bus *bus,
  *
  * @bus : the bus
  * @min_size : the minimum io window that must to be allocated
- * @add_size : additional optional io window
  * @realloc_head : track the additional io window on this list
  *
  * Sizing the IO windows of the PCI-PCI bridge is trivial,
@@ -1185,9 +1184,11 @@ static resource_size_t window_alignment(struct pci_bus *bus,
  * We must be careful with the ISA aliasing though.
  */
 static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
-		resource_size_t add_size, struct list_head *realloc_head)
+			 struct list_head *realloc_head)
 {
 	struct pci_dev *dev;
+	resource_size_t min_sum_size = 0;
+	resource_size_t sum_add_size;
 	struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO,
 							IORESOURCE_IO);
 	resource_size_t size = 0, size0 = 0, size1 = 0;
@@ -1197,6 +1198,11 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 	if (!b_res)
 		return;
 
+	if (realloc_head) {
+		min_sum_size = min_size;
+		min_size = 0;
+	}
+
 	min_align = window_alignment(bus, IORESOURCE_IO);
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		int i;
@@ -1226,10 +1232,11 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 
 	size0 = calculate_iosize(size, min_size, size1,
 			resource_size(b_res), min_align);
-	if (children_add_size > add_size)
-		add_size = children_add_size;
-	size1 = (!realloc_head || (realloc_head && !add_size)) ? size0 :
-		calculate_iosize(size, min_size, add_size + size1,
+	sum_add_size = children_add_size + size + size1;
+	if (sum_add_size < min_sum_size)
+		sum_add_size = min_sum_size;
+	size1 = !realloc_head ? size0 :
+		calculate_iosize(size, min_size, sum_add_size - size,
 			resource_size(b_res), min_align);
 	if (!size0 && !size1) {
 		if (b_res->start || b_res->end)
@@ -1757,7 +1764,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 {
 	struct pci_dev *dev;
 	unsigned long mask, prefmask, type2 = 0, type3 = 0;
-	resource_size_t min_mem_size = 0, additional_io_size = 0;
+	resource_size_t min_mem_size = 0, min_io_size = 0;
 	struct resource *b_res;
 	int ret;
 
@@ -1793,13 +1800,12 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 
 	case PCI_CLASS_BRIDGE_PCI:
 		if (bus->self->is_hotplug_bridge) {
-			additional_io_size  = pci_hotplug_io_size;
+			min_io_size  = pci_hotplug_io_size;
 			min_mem_size = pci_hotplug_mem_size;
 		}
 		/* Fall through */
 	default:
-		pbus_size_io(bus, realloc_head ? 0 : additional_io_size,
-			     additional_io_size, realloc_head);
+		pbus_size_io(bus, min_io_size, realloc_head);
 
 		/*
 		 * If there's a 64-bit prefetchable MMIO window, compute
-- 
1.8.4.5


* [PATCH v11 48/60] PCI: Move ISA io port align out of calculate_iosize()
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (46 preceding siblings ...)
  2016-04-08  0:16 ` [PATCH v11 47/60] PCI: Don't add too much optional size for hotplug bridge io Yinghai Lu
@ 2016-04-08  0:16 ` Yinghai Lu
  2016-04-08  0:16 ` [PATCH v11 49/60] PCI: Don't add too much io port for hotplug bridge with old size Yinghai Lu
                   ` (12 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:16 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

We need to move the ISA io port alignment out of calculate_iosize(),
so that we can unify calculate_iosize() and calculate_memsize() later.

That extra alignment or offset is a workaround for ISA devices:
when a bridge has several child devices, and every device has several
io port resources with resource size < 0x400, we need to check the
size and add extra size to make sure bits 8/9 are zero.

We also need to apply the same check on the optional size path.
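
To make the size adjustment concrete, here is the same expression as a
standalone function. isa_expand() is a hypothetical name; the body is the
expression used in size_aligned_for_isa() in the diff below.

```c
#include <stdint.h>

/*
 * Keep the low 0x100 bytes as they are and spread every further 0x100
 * chunk over 0x400 of io port space, so that only addresses with
 * bits 8/9 clear need to be used.
 */
static uint64_t isa_expand(uint64_t size)
{
	return (size & 0xff) + ((size & ~0xffULL) << 2);
}
```

For instance, 0x80 stays 0x80, 0x100 becomes 0x400, and 0x180 becomes
0x480.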

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 39 +++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 11a4c1d..c202854 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1113,11 +1113,6 @@ static resource_size_t calculate_iosize(resource_size_t size,
 		size = min_size;
 	if (old_size == 1)
 		old_size = 0;
-	/* To be fixed in 2.5: we should have sort of HAVE_ISA
-	   flag in the struct pci_bus. */
-#if defined(CONFIG_ISA) || defined(CONFIG_EISA)
-	size = (size & 0xff) + ((size & ~0xffUL) << 2);
-#endif
 	size = ALIGN(size + size1, align);
 	if (size < old_size)
 		size = old_size;
@@ -1171,6 +1166,18 @@ static resource_size_t window_alignment(struct pci_bus *bus,
 	return max(align, arch_align);
 }
 
+static resource_size_t size_aligned_for_isa(resource_size_t size)
+{
+	/*
+	 * To be fixed in 2.5: we should have sort of HAVE_ISA
+	 *  flag in the struct pci_bus.
+	 */
+#if defined(CONFIG_ISA) || defined(CONFIG_EISA)
+	size = (size & 0xff) + ((size & ~0xffUL) << 2);
+#endif
+	return size;
+}
+
 /**
  * pbus_size_io() - size the io window of a given bus
  *
@@ -1188,11 +1195,10 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 {
 	struct pci_dev *dev;
 	resource_size_t min_sum_size = 0;
-	resource_size_t sum_add_size;
 	struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO,
 							IORESOURCE_IO);
 	resource_size_t size = 0, size0 = 0, size1 = 0;
-	resource_size_t children_add_size = 0;
+	resource_size_t sum_add_size = 0, sum_add_size1 = 0;
 	resource_size_t min_align, align;
 
 	if (!b_res)
@@ -1209,7 +1215,7 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 
 		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 			struct resource *r = &dev->resource[i];
-			unsigned long r_size;
+			unsigned long r_size, r_add_size;
 
 			if (r->parent || !(r->flags & IORESOURCE_IO))
 				continue;
@@ -1225,18 +1231,27 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 			if (align > min_align)
 				min_align = align;
 
-			if (realloc_head)
-				children_add_size += get_res_add_size(realloc_head, r);
+			if (realloc_head) {
+				r_add_size = get_res_add_size(realloc_head, r);
+				r_add_size += r_size;
+				if (r_add_size < 0x400)
+					/* Might be re-aligned for ISA */
+					sum_add_size += r_add_size;
+				else
+					sum_add_size1 += r_add_size;
+			}
 		}
 	}
 
+	size = size_aligned_for_isa(size);
 	size0 = calculate_iosize(size, min_size, size1,
 			resource_size(b_res), min_align);
-	sum_add_size = children_add_size + size + size1;
+	sum_add_size = size_aligned_for_isa(sum_add_size);
+	sum_add_size += sum_add_size1;
 	if (sum_add_size < min_sum_size)
 		sum_add_size = min_sum_size;
 	size1 = !realloc_head ? size0 :
-		calculate_iosize(size, min_size, sum_add_size - size,
+		calculate_iosize(sum_add_size, min_size, 0,
 			resource_size(b_res), min_align);
 	if (!size0 && !size1) {
 		if (b_res->start || b_res->end)
-- 
1.8.4.5


* [PATCH v11 49/60] PCI: Don't add too much io port for hotplug bridge with old size
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (47 preceding siblings ...)
  2016-04-08  0:16 ` [PATCH v11 48/60] PCI: Move ISA io port align out of calculate_iosize() Yinghai Lu
@ 2016-04-08  0:16 ` Yinghai Lu
  2016-04-08  0:16 ` [PATCH v11 50/60] PCI: Unify calculate_size() for io port and MMIO Yinghai Lu
                   ` (11 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:16 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

Currently we add too much io port for hotplug bridges.
For example, when a hotplug bridge has two child bridges and every
child bridge needs 0x1000, size1 will be 0x2000 and size will be 0.
The min_size for the hotplug bridge is 0x100. With the old version of
calculate_iosize(), we get 0x3000 as the final size, because size alone
is compared with min_size first. That is not right; we should get 0x2000.

We can compare size+size1 with min_size for io port, and just add size1
to size instead of passing an extra size1 into calculate_iosize().
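
The numbers above can be reproduced with a small sketch. iosize_before()
and iosize_after() are hypothetical names mirroring calculate_iosize()
before and after this patch; the 0x1000 alignment is an assumption (the
usual 4K bridge io window granularity) and old_size is taken as 0.

```c
#include <stdint.h>

static uint64_t align_up(uint64_t v, uint64_t a)
{
	return (v + a - 1) & ~(a - 1);
}

/* Before: min_size is compared with size only, then size1 is added. */
static uint64_t iosize_before(uint64_t size, uint64_t min_size,
			      uint64_t size1, uint64_t old_size,
			      uint64_t align)
{
	if (size < min_size)
		size = min_size;
	if (old_size == 1)
		old_size = 0;
	size = align_up(size + size1, align);
	if (size < old_size)
		size = old_size;
	return size;
}

/* After: the caller passes size + size1, which is compared as a whole. */
static uint64_t iosize_after(uint64_t size, uint64_t min_size,
			     uint64_t old_size, uint64_t align)
{
	if (size < min_size)
		size = min_size;
	if (old_size == 1)
		old_size = 0;
	size = align_up(size, align);
	if (size < old_size)
		size = old_size;
	return size;
}
```

With size 0, size1 0x2000 and min_size 0x100, the first variant gives
0x3000 and the second gives 0x2000.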

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index c202854..930dcbd 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1105,7 +1105,6 @@ static struct resource *find_free_bus_resource(struct pci_bus *bus,
 
 static resource_size_t calculate_iosize(resource_size_t size,
 		resource_size_t min_size,
-		resource_size_t size1,
 		resource_size_t old_size,
 		resource_size_t align)
 {
@@ -1113,7 +1112,7 @@ static resource_size_t calculate_iosize(resource_size_t size,
 		size = min_size;
 	if (old_size == 1)
 		old_size = 0;
-	size = ALIGN(size + size1, align);
+	size = ALIGN(size, align);
 	if (size < old_size)
 		size = old_size;
 	return size;
@@ -1244,14 +1243,15 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 	}
 
 	size = size_aligned_for_isa(size);
-	size0 = calculate_iosize(size, min_size, size1,
+	size += size1;
+	size0 = calculate_iosize(size, min_size,
 			resource_size(b_res), min_align);
 	sum_add_size = size_aligned_for_isa(sum_add_size);
 	sum_add_size += sum_add_size1;
 	if (sum_add_size < min_sum_size)
 		sum_add_size = min_sum_size;
 	size1 = !realloc_head ? size0 :
-		calculate_iosize(sum_add_size, min_size, 0,
+		calculate_iosize(sum_add_size, min_size,
 			resource_size(b_res), min_align);
 	if (!size0 && !size1) {
 		if (b_res->start || b_res->end)
-- 
1.8.4.5


* [PATCH v11 50/60] PCI: Unify calculate_size() for io port and MMIO
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (48 preceding siblings ...)
  2016-04-08  0:16 ` [PATCH v11 49/60] PCI: Don't add too much io port for hotplug bridge with old size Yinghai Lu
@ 2016-04-08  0:16 ` Yinghai Lu
  2016-04-08  0:16 ` [PATCH v11 51/60] PCI: Allow bridge optional only io port resource required size to be 0 Yinghai Lu
                   ` (10 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:16 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

Now calculate_memsize() and calculate_iosize() are the same.

Merge them into calculate_size().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 25 +++++--------------------
 1 file changed, 5 insertions(+), 20 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 930dcbd..b071035 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1103,22 +1103,7 @@ static struct resource *find_free_bus_resource(struct pci_bus *bus,
 	return NULL;
 }
 
-static resource_size_t calculate_iosize(resource_size_t size,
-		resource_size_t min_size,
-		resource_size_t old_size,
-		resource_size_t align)
-{
-	if (size < min_size)
-		size = min_size;
-	if (old_size == 1)
-		old_size = 0;
-	size = ALIGN(size, align);
-	if (size < old_size)
-		size = old_size;
-	return size;
-}
-
-static resource_size_t calculate_memsize(resource_size_t size,
+static resource_size_t calculate_size(resource_size_t size,
 		resource_size_t min_size,
 		resource_size_t old_size,
 		resource_size_t align)
@@ -1244,14 +1229,14 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 
 	size = size_aligned_for_isa(size);
 	size += size1;
-	size0 = calculate_iosize(size, min_size,
+	size0 = calculate_size(size, min_size,
 			resource_size(b_res), min_align);
 	sum_add_size = size_aligned_for_isa(sum_add_size);
 	sum_add_size += sum_add_size1;
 	if (sum_add_size < min_sum_size)
 		sum_add_size = min_sum_size;
 	size1 = !realloc_head ? size0 :
-		calculate_iosize(sum_add_size, min_size,
+		calculate_size(sum_add_size, min_size,
 			resource_size(b_res), min_align);
 	if (!size0 && !size1) {
 		if (b_res->start || b_res->end)
@@ -1580,7 +1565,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	if (size || min_size) {
 		min_align = calculate_mem_align(&align_test_list, max_align,
 						size, window_align);
-		size0 = calculate_memsize(size, min_size,
+		size0 = calculate_size(size, min_size,
 				  resource_size(b_res), min_align);
 	}
 	free_align_test_list(&align_test_list);
@@ -1605,7 +1590,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 		min_add_align = calculate_mem_align(&align_test_add_list,
 					max_add_align, sum_add_size,
 					window_align);
-		size1 = calculate_memsize(sum_add_size, min_size,
+		size1 = calculate_size(sum_add_size, min_size,
 				 resource_size(b_res), min_add_align);
 	}
 	free_align_test_list(&align_test_add_list);
-- 
1.8.4.5


* [PATCH v11 51/60] PCI: Allow bridge optional only io port resource required size to be 0
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (49 preceding siblings ...)
  2016-04-08  0:16 ` [PATCH v11 50/60] PCI: Unify calculate_size() for io port and MMIO Yinghai Lu
@ 2016-04-08  0:16 ` Yinghai Lu
  2016-04-08  0:16 ` [PATCH v11 52/60] PCI: Unify skip_ioresource_align() Yinghai Lu
                   ` (9 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:16 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

When there is no child device under a non-hotplug bridge, we can use 0
as the required size instead of using the old size as the required size.

That saves some io port range for other bridges: the BIOS may do a
partial assignment, and we want to use those unused io port ranges.

When there is a child device, size will not be 0.
When the bridge supports hotplug, min_size will not be 0.
In both cases the old size is still honored as the required size.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index b071035..28dfd8e 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1229,8 +1229,9 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 
 	size = size_aligned_for_isa(size);
 	size += size1;
-	size0 = calculate_size(size, min_size,
-			resource_size(b_res), min_align);
+	if (size || min_size)
+		size0 = calculate_size(size, min_size,
+					resource_size(b_res), min_align);
 	sum_add_size = size_aligned_for_isa(sum_add_size);
 	sum_add_size += sum_add_size1;
 	if (sum_add_size < min_sum_size)
@@ -1246,7 +1247,7 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 		return;
 	}
 
-	b_res->start = min_align;
+	b_res->start = size0 ? min_align : 0;
 	b_res->end = b_res->start + size0 - 1;
 	b_res->flags |= IORESOURCE_STARTALIGN;
 	if (size1 > size0 && realloc_head) {
-- 
1.8.4.5


* [PATCH v11 52/60] PCI: Unify skip_ioresource_align()
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (50 preceding siblings ...)
  2016-04-08  0:16 ` [PATCH v11 51/60] PCI: Allow bridge optional only io port resource required size to be 0 Yinghai Lu
@ 2016-04-08  0:16 ` Yinghai Lu
  2016-04-08  0:16 ` [PATCH v11 53/60] PCI: Kill macro checking for bus io port sizing Yinghai Lu
                   ` (8 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:16 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu, Michal Simek, Paul Mackerras, Michael Ellerman,
	Arnd Bergmann, linuxppc-dev, linux-arch

There is a powerpc generic version and an x86 local version of
skip_isa_ioresource_align().

Move the powerpc version to setup-bus.c, and kill the x86 local version.

Also kill the dummy version in microblaze.

Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arch@vger.kernel.org
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/kernel/pci-common.c | 11 +----------
 arch/x86/include/asm/pci_x86.h   |  1 -
 arch/x86/pci/common.c            |  4 ++--
 arch/x86/pci/i386.c              | 11 +----------
 drivers/pci/setup-bus.c          |  9 +++++++++
 include/linux/pci.h              |  2 ++
 6 files changed, 15 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 0f7a60f..2a7f4fd 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1053,15 +1053,6 @@ void pci_fixup_cardbus(struct pci_bus *bus)
 	pcibios_setup_bus_devices(bus);
 }
 
-
-static int skip_isa_ioresource_align(struct pci_dev *dev)
-{
-	if (pci_has_flag(PCI_CAN_SKIP_ISA_ALIGN) &&
-	    !(dev->bus->bridge_ctl & PCI_BRIDGE_CTL_ISA))
-		return 1;
-	return 0;
-}
-
 /*
  * We need to avoid collisions with `mirrored' VGA ports
  * and other strange ISA hardware, so we always want the
@@ -1082,7 +1073,7 @@ resource_size_t pcibios_align_resource(void *data, const struct resource *res,
 	resource_size_t start = res->start;
 
 	if (res->flags & IORESOURCE_IO) {
-		if (skip_isa_ioresource_align(dev))
+		if (skip_isa_ioresource_align(dev->bus))
 			return start;
 		if (start & 0x300)
 			start = (start + 0x3ff) & ~0x3ff;
diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index d08eacd2..d1f919e 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -28,7 +28,6 @@ do {						\
 #define PCI_ASSIGN_ROMS		0x1000
 #define PCI_BIOS_IRQ_SCAN	0x2000
 #define PCI_ASSIGN_ALL_BUSSES	0x4000
-#define PCI_CAN_SKIP_ISA_ALIGN	0x8000
 #define PCI_USE__CRS		0x10000
 #define PCI_CHECK_ENABLE_AMD_MMCONF	0x20000
 #define PCI_HAS_IO_ECS		0x40000
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 381a43c..09a16b7 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -82,7 +82,7 @@ DEFINE_RAW_SPINLOCK(pci_config_lock);
 
 static int __init can_skip_ioresource_align(const struct dmi_system_id *d)
 {
-	pci_probe |= PCI_CAN_SKIP_ISA_ALIGN;
+	pci_add_flags(PCI_CAN_SKIP_ISA_ALIGN);
 	printk(KERN_INFO "PCI: %s detected, can skip ISA alignment\n", d->ident);
 	return 0;
 }
@@ -618,7 +618,7 @@ char *__init pcibios_setup(char *str)
 		pci_routeirq = 1;
 		return NULL;
 	} else if (!strcmp(str, "skip_isa_align")) {
-		pci_probe |= PCI_CAN_SKIP_ISA_ALIGN;
+		pci_add_flags(PCI_CAN_SKIP_ISA_ALIGN);
 		return NULL;
 	} else if (!strcmp(str, "noioapicquirk")) {
 		noioapicquirk = 1;
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 0a9f2ca..cf296f5 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -128,15 +128,6 @@ static void __init pcibios_fw_addr_list_del(void)
 	pcibios_fw_addr_done = true;
 }
 
-static int
-skip_isa_ioresource_align(struct pci_dev *dev) {
-
-	if ((pci_probe & PCI_CAN_SKIP_ISA_ALIGN) &&
-	    !(dev->bus->bridge_ctl & PCI_BRIDGE_CTL_ISA))
-		return 1;
-	return 0;
-}
-
 /*
  * We need to avoid collisions with `mirrored' VGA ports
  * and other strange ISA hardware, so we always want the
@@ -158,7 +149,7 @@ pcibios_align_resource(void *data, const struct resource *res,
 	resource_size_t start = res->start;
 
 	if (res->flags & IORESOURCE_IO) {
-		if (skip_isa_ioresource_align(dev))
+		if (skip_isa_ioresource_align(dev->bus))
 			return start;
 		if (start & 0x300)
 			start = (start + 0x3ff) & ~0x3ff;
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 28dfd8e..5ba4bf5 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1150,6 +1150,15 @@ static resource_size_t window_alignment(struct pci_bus *bus,
 	return max(align, arch_align);
 }
 
+int skip_isa_ioresource_align(struct pci_bus *bus)
+{
+	if (pci_has_flag(PCI_CAN_SKIP_ISA_ALIGN) &&
+	    !(bus->bridge_ctl & PCI_BRIDGE_CTL_ISA))
+		return 1;
+
+	return 0;
+}
+
 static resource_size_t size_aligned_for_isa(resource_size_t size)
 {
 	/*
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 979be25..d7b1ceb 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -765,6 +765,8 @@ static inline void pci_add_flags(int flags) { pci_flags |= flags; }
 static inline void pci_clear_flags(int flags) { pci_flags &= ~flags; }
 static inline int pci_has_flag(int flag) { return pci_flags & flag; }
 
+int skip_isa_ioresource_align(struct pci_bus *bus);
+
 void pcie_bus_configure_settings(struct pci_bus *bus);
 
 enum pcie_bus_config_types {
-- 
1.8.4.5


* [PATCH v11 53/60] PCI: Kill macro checking for bus io port sizing
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (51 preceding siblings ...)
  2016-04-08  0:16 ` [PATCH v11 52/60] PCI: Unify skip_ioresource_align() Yinghai Lu
@ 2016-04-08  0:16 ` Yinghai Lu
  2016-04-08  0:16 ` [PATCH v11 54/60] resources: Make allocate_resource() return best fit resource Yinghai Lu
                   ` (7 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:16 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

We can use the new generic skip_isa_ioresource_align() instead of the
macro check, and then kill the macro.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 5ba4bf5..65a41e7 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1159,15 +1159,12 @@ int skip_isa_ioresource_align(struct pci_bus *bus)
 	return 0;
 }
 
-static resource_size_t size_aligned_for_isa(resource_size_t size)
+static resource_size_t size_aligned_for_isa(resource_size_t size,
+					    struct pci_bus *bus)
 {
-	/*
-	 * To be fixed in 2.5: we should have sort of HAVE_ISA
-	 *  flag in the struct pci_bus.
-	 */
-#if defined(CONFIG_ISA) || defined(CONFIG_EISA)
-	size = (size & 0xff) + ((size & ~0xffUL) << 2);
-#endif
+	if (!skip_isa_ioresource_align(bus))
+		size = (size & 0xff) + ((size & ~0xffUL) << 2);
+
 	return size;
 }
 
@@ -1236,12 +1233,12 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 		}
 	}
 
-	size = size_aligned_for_isa(size);
+	size = size_aligned_for_isa(size, bus);
 	size += size1;
 	if (size || min_size)
 		size0 = calculate_size(size, min_size,
 					resource_size(b_res), min_align);
-	sum_add_size = size_aligned_for_isa(sum_add_size);
+	sum_add_size = size_aligned_for_isa(sum_add_size, bus);
 	sum_add_size += sum_add_size1;
 	if (sum_add_size < min_sum_size)
 		sum_add_size = min_sum_size;
-- 
1.8.4.5


* [PATCH v11 54/60] resources: Make allocate_resource() return best fit resource
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (52 preceding siblings ...)
  2016-04-08  0:16 ` [PATCH v11 53/60] PCI: Kill macro checking for bus io port sizing Yinghai Lu
@ 2016-04-08  0:16 ` Yinghai Lu
  2016-04-08  0:16 ` [PATCH v11 55/60] PCI, x86: Allocate from high in available window for MMIO Yinghai Lu
                   ` (6 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:16 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

The current code just allocates from the first available window.

Instead, find all suitable empty slots and pick the smallest one. That
saves the big slots for larger requests that come later, for example when
several PCI bridges sit under one parent bridge, some of them already
assigned by the BIOS, and the kernel has to assign the rest.

For example, we have two free windows
[0xc0000000, 0xd0000000) and [0xe0000000, 0xe1000000)

and we try to allocate a resource of size 0x200000.

With this patch we first reserve both [0xc0000000, 0xd0000000) and
[0xe0000000, 0xe1000000) as candidates, then pick the smaller one,
[0xe0000000, 0xe1000000), to allocate the 0x200000 resource from.
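
The selection rule can be sketched in plain userspace C. This is only an
illustration under stated assumptions: struct window, pick_best_fit() and
example_windows are hypothetical names, not kernel API, and alignment and
locking are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open window [start, end); not the kernel's struct resource. */
struct window {
	uint64_t start;
	uint64_t end;
};

/*
 * Return the index of the smallest window that can hold `size` bytes,
 * or -1 if none fits. Alignment is ignored to keep the sketch short.
 */
static int pick_best_fit(const struct window *w, size_t n, uint64_t size)
{
	int best = -1;
	uint64_t best_len = UINT64_MAX;

	assert(w != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t len = w[i].end - w[i].start;

		/* Keep the tightest window that is still large enough. */
		if (len >= size && len < best_len) {
			best_len = len;
			best = (int)i;
		}
	}
	return best;
}

/* The two free windows from the example in this commit message. */
static const struct window example_windows[] = {
	{ 0xc0000000ULL, 0xd0000000ULL },	/* 0x10000000 bytes */
	{ 0xe0000000ULL, 0xe1000000ULL },	/* 0x01000000 bytes */
};
```

For a 0x200000 request, pick_best_fit(example_windows, 2, 0x200000)
selects index 1, the smaller window, and leaves the big one free for
larger requests.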

-v2: updated after __allocate_resource change, and add field in constraint
        instead of passing it directly.
-v3: Use best fit instead of just fit according to Bjorn.
-v4: fix the warning found by Huang Ying.


Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 kernel/resource.c | 76 ++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 61 insertions(+), 15 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index c5dbe02..d91ebc5 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -611,7 +611,7 @@ static void resource_clip(struct resource *res, resource_size_t min,
  * alignment constraints
  */
 static int __find_resource(struct resource *root, struct resource *old,
-			 struct resource *new,
+			 struct resource *new, struct resource *availx,
 			 resource_size_t  size,
 			 struct resource_constraint *constraint)
 {
@@ -651,6 +651,11 @@ static int __find_resource(struct resource *root, struct resource *old,
 			if (resource_contains(&avail, &alloc)) {
 				new->start = alloc.start;
 				new->end = alloc.end;
+				if (availx) {
+					availx->start = avail.start;
+					availx->end = avail.end;
+					availx->flags = avail.flags;
+				}
 				return 0;
 			}
 		}
@@ -665,16 +670,6 @@ next:		if (!this || this->end == root->end)
 	return -EBUSY;
 }
 
-/*
- * Find empty slot in the resource tree given range and alignment.
- */
-static int find_resource(struct resource *root, struct resource *new,
-			resource_size_t size,
-			struct resource_constraint  *constraint)
-{
-	return  __find_resource(root, NULL, new, size, constraint);
-}
-
 /**
  * reallocate_resource - allocate a slot in the resource tree given range & alignment.
  *	The resource will be relocated if the new size cannot be reallocated in the
@@ -694,8 +689,8 @@ static int reallocate_resource(struct resource *root, struct resource *old,
 	struct resource *conflict;
 
 	write_lock(&resource_lock);
-
-	if ((err = __find_resource(root, old, &new, newsize, constraint)))
+	err = __find_resource(root, old, &new, NULL, newsize, constraint);
+	if (err)
 		goto out;
 
 	if (resource_contains(&new, old)) {
@@ -723,10 +718,16 @@ out:
 	return err;
 }
 
+struct good_resource {
+	struct list_head list;
+	struct resource avail;
+	struct resource new;
+};
 
 /**
  * allocate_resource - allocate empty slot in the resource tree given range & alignment.
- * 	The resource will be reallocated with a new size if it was already allocated
+ *	The resource will be reallocated with a new size if it was already
+ *	allocated
  * @root: root resource descriptor
  * @new: resource descriptor desired by caller
  * @size: requested resource region size
@@ -747,6 +748,9 @@ int allocate_resource(struct resource *root, struct resource *new,
 {
 	int err;
 	struct resource_constraint constraint;
+	LIST_HEAD(head);
+	struct good_resource *good, *tmp;
+	resource_size_t avail_size = (resource_size_t)-1ULL;
 
 	if (!alignf)
 		alignf = simple_align_resource;
@@ -763,11 +767,53 @@ int allocate_resource(struct resource *root, struct resource *new,
 		return reallocate_resource(root, new, size, &constraint);
 	}
 
+	/* find all suitable ones and add to the list */
+	for (;;) {
+		good = kzalloc(sizeof(*good), GFP_KERNEL);
+		if (!good) {
+			err = -ENOMEM;
+			break;
+		}
+
+		good->new.start = new->start;
+		good->new.end = new->end;
+		good->new.flags = new->flags;
+
+		write_lock(&resource_lock);
+		err = __find_resource(root, NULL, &good->new, &good->avail,
+					size, &constraint);
+		if (err || __request_resource(root, &good->avail)) {
+			err = -EBUSY;
+			kfree(good);
+			write_unlock(&resource_lock);
+			break;
+		}
+		write_unlock(&resource_lock);
+
+		list_add(&good->list, &head);
+	}
+
+	/* pick up the smallest one */
 	write_lock(&resource_lock);
-	err = find_resource(root, new, size, &constraint);
+	list_for_each_entry(good, &head, list) {
+		if (resource_size(&good->avail) < avail_size) {
+			avail_size = resource_size(&good->avail);
+			new->start = good->new.start;
+			new->end = good->new.end;
+			err = 0;
+		}
+		__release_resource(&good->avail, false);
+	}
 	if (err >= 0 && __request_resource(root, new))
 		err = -EBUSY;
 	write_unlock(&resource_lock);
+
+	/* delete the list */
+	list_for_each_entry_safe(good, tmp, &head, list) {
+		list_del(&good->list);
+		kfree(good);
+	}
+
 	return err;
 }
 
-- 
1.8.4.5


* [PATCH v11 55/60] PCI, x86: Allocate from high in available window for MMIO
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (53 preceding siblings ...)
  2016-04-08  0:16 ` [PATCH v11 54/60] resources: Make allocate_resource() return best fit resource Yinghai Lu
@ 2016-04-08  0:16 ` Yinghai Lu
  2016-04-08  0:16 ` [PATCH v11 56/60] PCI: Add debug print out for min_align and alt_size Yinghai Lu
                   ` (5 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:16 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

The current code uses the aligned start of the available window, which
can waste a start address with big alignment on a small resource.

Instead, align to the end of the available window. That leaves the start,
with its big alignment, for others: for example, the second try for pref
MMIO after the first try has already assigned non-pref.
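
The end-aligned placement can be sketched as a standalone userspace
function. It mirrors the arithmetic of pcibios_align_end_resource() in the
diff below, but align_to_end() and its parameter names are hypothetical,
and the sketch assumes a power-of-two alignment and an inclusive window
end.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Highest aligned start such that [start, start + size) ends at or below
 * the inclusive window end. Falls back to win_start when the aligned
 * candidate would not lie above it.
 */
static uint64_t align_to_end(uint64_t win_start, uint64_t win_end,
			     uint64_t size, uint64_t align)
{
	uint64_t start = win_start;

	/* Assumption of this sketch: align is a non-zero power of two. */
	assert(align != 0 && (align & (align - 1)) == 0);

	if (win_end + 1 > size) {
		/* round_down(win_end + 1 - size, align) */
		uint64_t new_start = (win_end + 1 - size) & ~(align - 1);

		if (new_start > start)
			start = new_start;
	}
	return start;
}
```

With the bridge window [mem 0xb0000000-0xb24fffff] from the log below, a
0x200000 request aligned to 0x100000 gives
align_to_end(0xb0000000, 0xb24fffff, 0x200000, 0x100000) == 0xb2300000,
which matches the "after the patch" assignment of BAR 8 on 0000:0a:00.0.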

pci tree:
-[0000:00]-+-00.0
           +-1c.0-[01-10]--+-00.0-[02-10]--+-01.0-[03]----00.0  PLX Technology, Inc. Device 87b1
           |               |               +-02.0-[04-09]--+-00.0-[05-09]--+-01.0-[06]----00.0  PLX Technology, Inc. Device 87b1
           |               |               |               |               +-02.0-[07]----00.0  Broadcom Corporation Device 8650
           |               |               |               |               +-03.0-[08]--
           |               |               |               |               \-04.0-[09]----00.0  Altera Corporation Device 0201
           |               |               |               +-00.1  PLX Technology, Inc. Device 87d0
           |               |               |               +-00.2  PLX Technology, Inc. Device 87d0
           |               |               |               +-00.3  PLX Technology, Inc. Device 87d0
           |               |               |               \-00.4  PLX Technology, Inc. Device 87d0
           |               |               +-03.0-[0a-0f]--+-00.0-[0b-0f]--+-01.0-[0c]----00.0  PLX Technology, Inc. Device 87b1
           |               |               |               |               +-02.0-[0d]----00.0  Broadcom Corporation Device 8650
           |               |               |               |               +-03.0-[0e]--
           |               |               |               |               \-04.0-[0f]----00.0  Altera Corporation Device 0201
           |               |               |               +-00.1  PLX Technology, Inc. Device 87d0
           |               |               |               +-00.2  PLX Technology, Inc. Device 87d0
           |               |               |               +-00.3  PLX Technology, Inc. Device 87d0
           |               |               |               \-00.4  PLX Technology, Inc. Device 87d0
           |               |               \-04.0-[10]--
           |               +-00.1  PLX Technology, Inc. Device 87d0
           |               +-00.2  PLX Technology, Inc. Device 87d0
           |               +-00.3  PLX Technology, Inc. Device 87d0
           |               \-00.4  PLX Technology, Inc. Device 87d0
           +-1c.3-[11]----00.0

hotplug device under 0000:02:03.0

before the patch:

pci 0000:0a:00.0: BAR 9: no space for [mem size 0x03000000 64bit pref]
pci 0000:0a:00.0: BAR 9: failed to assign [mem size 0x03000000 64bit pref]
pci 0000:0a:00.0: BAR 8: assigned [mem 0xb0000000-0xb01fffff]  **************
pci 0000:0a:00.0: BAR 0: assigned [mem 0xb0200000-0xb023ffff]
pci 0000:0a:00.1: BAR 0: assigned [mem 0xb0240000-0xb0241fff]
pci 0000:0a:00.2: BAR 0: assigned [mem 0xb0242000-0xb0243fff]
pci 0000:0a:00.3: BAR 0: assigned [mem 0xb0244000-0xb0245fff]
pci 0000:0a:00.4: BAR 0: assigned [mem 0xb0246000-0xb0247fff]
pci 0000:0b:04.0: BAR 9: no space for [mem size 0x03000000 64bit pref]
pci 0000:0b:04.0: BAR 9: failed to assign [mem size 0x03000000 64bit pref]
pci 0000:0b:01.0: BAR 8: assigned [mem 0xb0000000-0xb00fffff]
pci 0000:0b:02.0: BAR 8: assigned [mem 0xb0100000-0xb01fffff]
pci 0000:0c:00.0: BAR 0: assigned [mem 0xb0000000-0xb003ffff]
pci 0000:0b:01.0: PCI bridge to [bus 0c]
pci 0000:0b:01.0:   bridge window [mem 0xb0000000-0xb00fffff]
pci 0000:0d:00.0: BAR 0: assigned [mem 0xb0100000-0xb013ffff 64bit]
pci 0000:0b:02.0: PCI bridge to [bus 0d]
pci 0000:0b:02.0:   bridge window [mem 0xb0100000-0xb01fffff]
pci 0000:0b:03.0: PCI bridge to [bus 0e]
pci 0000:0f:00.0: BAR 0: no space for [mem size 0x02000000 64bit pref]
pci 0000:0f:00.0: BAR 0: failed to assign [mem size 0x02000000 64bit pref]
pci 0000:0f:00.0: BAR 2: no space for [mem size 0x00010000 64bit pref]
pci 0000:0f:00.0: BAR 2: failed to assign [mem size 0x00010000 64bit pref]
pci 0000:0b:04.0: PCI bridge to [bus 0f]
pci 0000:0a:00.0: PCI bridge to [bus 0b-0f]
pci 0000:0a:00.0:   bridge window [mem 0xb0000000-0xb01fffff]
pcieport 0000:02:03.0: PCI bridge to [bus 0a-0f]
pcieport 0000:02:03.0:   bridge window [io  0x2000-0x2fff]
pcieport 0000:02:03.0:   bridge window [mem 0xb0000000-0xb24fffff]
pcieport 0000:02:03.0:   bridge window [mem 0x80200000-0x803fffff 64bit pref]
PCI: No. 2 try to assign unassigned res
pcieport 0000:02:03.0: resource 9 [mem 0x80200000-0x803fffff 64bit pref] released
pcieport 0000:02:03.0: PCI bridge to [bus 0a-0f]
pcieport 0000:02:03.0: BAR 9: no space for [mem size 0x03000000 64bit pref]
pcieport 0000:02:03.0: BAR 9: failed to assign [mem size 0x03000000 64bit pref]
pcieport 0000:02:03.0: BAR 9: no space for [mem size 0x02100000 64bit pref]
pcieport 0000:02:03.0: BAR 9: failed to assign [mem size 0x02100000 64bit pref]
pci 0000:0a:00.0: BAR 9: no space for [mem size 0x03000000 64bit pref]
pci 0000:0a:00.0: BAR 9: failed to assign [mem size 0x03000000 64bit pref]
pci 0000:0a:00.0: BAR 9: no space for [mem size 0x02100000 64bit pref]   **************
pci 0000:0a:00.0: BAR 9: failed to assign [mem size 0x02100000 64bit pref]
pci 0000:0b:04.0: BAR 9: no space for [mem size 0x03000000 64bit pref]
pci 0000:0b:04.0: BAR 9: failed to assign [mem size 0x03000000 64bit pref]
pci 0000:0b:04.0: BAR 9: no space for [mem size 0x02100000 64bit pref]   **************
pci 0000:0b:04.0: BAR 9: failed to assign [mem size 0x02100000 64bit pref]
pci 0000:0b:01.0: PCI bridge to [bus 0c]
pci 0000:0b:01.0:   bridge window [mem 0xb0000000-0xb00fffff]
pci 0000:0b:02.0: PCI bridge to [bus 0d]
pci 0000:0b:02.0:   bridge window [mem 0xb0100000-0xb01fffff]
pci 0000:0b:03.0: PCI bridge to [bus 0e]
pci 0000:0f:00.0: BAR 0: no space for [mem size 0x02000000 64bit pref]
pci 0000:0f:00.0: BAR 0: failed to assign [mem size 0x02000000 64bit pref]
pci 0000:0f:00.0: BAR 2: no space for [mem size 0x00010000 64bit pref]
pci 0000:0f:00.0: BAR 2: failed to assign [mem size 0x00010000 64bit pref]
pci 0000:0b:04.0: PCI bridge to [bus 0f]
pci 0000:0a:00.0: PCI bridge to [bus 0b-0f]
pci 0000:0a:00.0:   bridge window [mem 0xb0000000-0xb01fffff]
pcieport 0000:02:03.0: PCI bridge to [bus 0a-0f]
pcieport 0000:02:03.0:   bridge window [io  0x2000-0x2fff]
pcieport 0000:02:03.0:   bridge window [mem 0xb0000000-0xb24fffff]


after the patch:

pci 0000:0a:00.0: BAR 9: no space for [mem size 0x03000000 64bit pref]
pci 0000:0a:00.0: BAR 9: failed to assign [mem size 0x03000000 64bit pref]
pci 0000:0a:00.0: BAR 8: assigned [mem 0xb2300000-0xb24fffff]   *************
pci 0000:0a:00.0: BAR 0: assigned [mem 0xb22c0000-0xb22fffff]
pci 0000:0a:00.1: BAR 0: assigned [mem 0xb22be000-0xb22bffff]
pci 0000:0a:00.2: BAR 0: assigned [mem 0xb22bc000-0xb22bdfff]
pci 0000:0a:00.3: BAR 0: assigned [mem 0xb22ba000-0xb22bbfff]
pci 0000:0a:00.4: BAR 0: assigned [mem 0xb22b8000-0xb22b9fff]
pci 0000:0b:04.0: BAR 9: no space for [mem size 0x03000000 64bit pref]
pci 0000:0b:04.0: BAR 9: failed to assign [mem size 0x03000000 64bit pref]
pci 0000:0b:01.0: BAR 8: assigned [mem 0xb2400000-0xb24fffff]
pci 0000:0b:02.0: BAR 8: assigned [mem 0xb2300000-0xb23fffff]
pci 0000:0c:00.0: BAR 0: assigned [mem 0xb24c0000-0xb24fffff]
pci 0000:0b:01.0: PCI bridge to [bus 0c]
pci 0000:0b:01.0:   bridge window [mem 0xb2400000-0xb24fffff]
pci 0000:0d:00.0: BAR 0: assigned [mem 0xb23c0000-0xb23fffff 64bit]
pci 0000:0b:02.0: PCI bridge to [bus 0d]
pci 0000:0b:02.0:   bridge window [mem 0xb2300000-0xb23fffff]
pci 0000:0b:03.0: PCI bridge to [bus 0e]
pci 0000:0f:00.0: BAR 0: no space for [mem size 0x02000000 64bit pref]
pci 0000:0f:00.0: BAR 0: failed to assign [mem size 0x02000000 64bit pref]
pci 0000:0f:00.0: BAR 2: no space for [mem size 0x00010000 64bit pref]
pci 0000:0f:00.0: BAR 2: failed to assign [mem size 0x00010000 64bit pref]
pci 0000:0b:04.0: PCI bridge to [bus 0f]
pci 0000:0a:00.0: PCI bridge to [bus 0b-0f]
pci 0000:0a:00.0:   bridge window [mem 0xb2300000-0xb24fffff]
pcieport 0000:02:03.0: PCI bridge to [bus 0a-0f]
pcieport 0000:02:03.0:   bridge window [io  0x2000-0x2fff]
pcieport 0000:02:03.0:   bridge window [mem 0xb0000000-0xb24fffff]
pcieport 0000:02:03.0:   bridge window [mem 0x9fc00000-0x9fdfffff 64bit pref]
PCI: No. 2 try to assign unassigned res
pcieport 0000:02:03.0: resource 9 [mem 0x9fc00000-0x9fdfffff 64bit pref] released
pcieport 0000:02:03.0: PCI bridge to [bus 0a-0f]
pcieport 0000:02:03.0: BAR 9: no space for [mem size 0x03000000 64bit pref]
pcieport 0000:02:03.0: BAR 9: failed to assign [mem size 0x03000000 64bit pref]
pcieport 0000:02:03.0: BAR 9: no space for [mem size 0x02100000 64bit pref]
pcieport 0000:02:03.0: BAR 9: failed to assign [mem size 0x02100000 64bit pref]
pci 0000:0a:00.0: BAR 9: no space for [mem size 0x03000000 64bit pref]
pci 0000:0a:00.0: BAR 9: failed to assign [mem size 0x03000000 64bit pref]
pci 0000:0a:00.0: BAR 9: assigned [mem 0xb0000000-0xb20fffff 64bit pref]  *********
pci 0000:0b:04.0: BAR 9: no space for [mem size 0x03000000 64bit pref]
pci 0000:0b:04.0: BAR 9: failed to assign [mem size 0x03000000 64bit pref]
pci 0000:0b:04.0: BAR 9: assigned [mem 0xb0000000-0xb20fffff 64bit pref]  *********
pci 0000:0b:01.0: PCI bridge to [bus 0c]
pci 0000:0b:01.0:   bridge window [mem 0xb2400000-0xb24fffff]
pci 0000:0b:02.0: PCI bridge to [bus 0d]
pci 0000:0b:02.0:   bridge window [mem 0xb2300000-0xb23fffff]
pci 0000:0b:03.0: PCI bridge to [bus 0e]
pci 0000:0f:00.0: BAR 0: assigned [mem 0xb0000000-0xb1ffffff 64bit pref]   ********
pci 0000:0f:00.0: BAR 2: assigned [mem 0xb20f0000-0xb20fffff 64bit pref]   ********
pci 0000:0b:04.0: PCI bridge to [bus 0f]
pci 0000:0b:04.0:   bridge window [mem 0xb0000000-0xb20fffff 64bit pref]
pci 0000:0a:00.0: PCI bridge to [bus 0b-0f]
pci 0000:0a:00.0:   bridge window [mem 0xb2300000-0xb24fffff]
pci 0000:0a:00.0:   bridge window [mem 0xb0000000-0xb20fffff 64bit pref]
pcieport 0000:02:03.0: PCI bridge to [bus 0a-0f]
pcieport 0000:02:03.0:   bridge window [io  0x2000-0x2fff]
pcieport 0000:02:03.0:   bridge window [mem 0xb0000000-0xb24fffff]

So we allocate high for 0a:00.0 etc., and leave the low range starting at
0xb0000000 for 0b:04.0 and 0f:00.0.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/pci/i386.c     | 20 ++++++++++++++++++++
 drivers/pci/setup-bus.c | 11 ++++++++++-
 include/linux/pci.h     |  3 +++
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index cf296f5..6121ef3 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -128,6 +128,24 @@ static void __init pcibios_fw_addr_list_del(void)
 	pcibios_fw_addr_done = true;
 }
 
+resource_size_t
+pcibios_align_end_resource(void *data, const struct resource *res,
+			resource_size_t size, resource_size_t align)
+{
+	resource_size_t start = res->start;
+
+	/* Take near end */
+	if (res->end + 1 > size) {
+		resource_size_t new_start;
+
+		new_start = round_down(res->end + 1 - size, align);
+		if (new_start > start)
+			start = new_start;
+	}
+
+	return start;
+}
+
 /*
  * We need to avoid collisions with `mirrored' VGA ports
  * and other strange ISA hardware, so we always want the
@@ -154,6 +172,8 @@ pcibios_align_resource(void *data, const struct resource *res,
 		if (start & 0x300)
 			start = (start + 0x3ff) & ~0x3ff;
 	} else if (res->flags & IORESOURCE_MEM) {
+		start = pcibios_align_end_resource(data, res, size, align);
+
 		/* The low 1MB range is reserved for ISA cards */
 		if (start < BIOS_END)
 			start = BIOS_END;
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 65a41e7..c282b86 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1318,6 +1318,15 @@ static void sort_align_test(struct list_head *head)
 	}
 }
 
+resource_size_t __weak pcibios_align_end_resource(void *data,
+					  const struct resource *res,
+					  resource_size_t size,
+					  resource_size_t align)
+{
+	/* default is not aligned to end */
+	return res->start;
+}
+
 static bool is_align_size_good(struct list_head *head,
 			resource_size_t min_align, resource_size_t size,
 			resource_size_t start)
@@ -1335,7 +1344,7 @@ static bool is_align_size_good(struct list_head *head,
 	list_for_each_entry(p, head, list)
 		if (allocate_resource(&root, &p->res, p->size,
 				0, (resource_size_t)-1ULL,
-				p->align, NULL, NULL))
+				p->align, pcibios_align_end_resource, NULL))
 			return false;
 
 	return true;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index d7b1ceb..41d06ce 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -800,6 +800,9 @@ char *pcibios_setup(char *str);
 resource_size_t pcibios_align_resource(void *, const struct resource *,
 				resource_size_t,
 				resource_size_t);
+resource_size_t pcibios_align_end_resource(void *, const struct resource *,
+				resource_size_t,
+				resource_size_t);
 void pcibios_update_irq(struct pci_dev *, int irq);
 
 /* Weak but can be overriden by arch */
-- 
1.8.4.5


* [PATCH v11 56/60] PCI: Add debug print out for min_align and alt_size
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (54 preceding siblings ...)
  2016-04-08  0:16 ` [PATCH v11 55/60] PCI, x86: Allocate from high in available window for MMIO Yinghai Lu
@ 2016-04-08  0:16 ` Yinghai Lu
  2016-04-08  0:16 ` [PATCH v11 57/60] PCI, x86: Add pci=assign_pref_bars to reallocate pref BARs Yinghai Lu
                   ` (4 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:16 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

Print the align/size of all children together with the resulting
align/size.

We can print the device name at the same time as the min_align/alt_size
calculation.

So we can drop the debug print out from get_res_add_size() and
get_res_add_align().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 76 ++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 56 insertions(+), 20 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index c282b86..bd74349 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -158,11 +158,6 @@ static resource_size_t get_res_add_size(struct list_head *head,
 	if (!dev_res || !dev_res->add_size)
 		return 0;
 
-	dev_printk(KERN_DEBUG, &dev_res->dev->dev,
-		   "BAR %d: %pR get_res_add_size add_size   %#llx\n",
-		   (int)(res - &dev_res->dev->resource[0]),
-		   res, (unsigned long long)dev_res->add_size);
-
 	return dev_res->add_size;
 }
 
@@ -175,11 +170,6 @@ static resource_size_t get_res_add_align(struct list_head *head,
 	if (!dev_res || !dev_res->min_align)
 		return 0;
 
-	dev_printk(KERN_DEBUG, &dev_res->dev->dev,
-		   "BAR %d: %pR get_res_add_align min_align %#llx\n",
-		   (int)(res - &dev_res->dev->resource[0]),
-		   res, (unsigned long long)dev_res->min_align);
-
 	return dev_res->min_align;
 }
 
@@ -1270,6 +1260,8 @@ struct align_test_res {
 	struct resource res;
 	resource_size_t size;
 	resource_size_t align;
+	struct device *dev;
+	int idx;
 };
 
 static void free_align_test_list(struct list_head *head)
@@ -1283,7 +1275,8 @@ static void free_align_test_list(struct list_head *head)
 }
 
 static int add_to_align_test_list(struct list_head *head,
-				  resource_size_t align, resource_size_t size)
+				  resource_size_t align, resource_size_t size,
+				  struct device *dev, int idx)
 {
 	struct align_test_res *tmp;
 
@@ -1293,6 +1286,8 @@ static int add_to_align_test_list(struct list_head *head,
 
 	tmp->align = align;
 	tmp->size = size;
+	tmp->dev = dev;
+	tmp->idx = idx;
 
 	list_add_tail(&tmp->list, head);
 
@@ -1358,6 +1353,19 @@ static resource_size_t calculate_mem_align(struct list_head *head,
 	resource_size_t min_align, good_align, aligned_size, start;
 	int count = 0;
 
+	list_for_each_entry(p, head, list)
+		count++;
+
+	printk(KERN_DEBUG "  ===========BEGIN===calculate_mem_align========\n");
+	if (count) {
+		printk(KERN_DEBUG "  align/size:\n");
+		list_for_each_entry(p, head, list)
+			dev_printk(KERN_DEBUG, p->dev,
+				   "BAR %d:     %08llx/%08llx\n", p->idx,
+				   (unsigned long long)p->align,
+				   (unsigned long long)p->size);
+	}
+
 	if (max_align <= align_low) {
 		good_align = align_low;
 		goto out;
@@ -1365,9 +1373,6 @@ static resource_size_t calculate_mem_align(struct list_head *head,
 
 	good_align = max_align;
 
-	list_for_each_entry(p, head, list)
-		count++;
-
 	if (count <= 1)
 		goto out;
 
@@ -1392,6 +1397,11 @@ static resource_size_t calculate_mem_align(struct list_head *head,
 	} while (min_align > align_low);
 
 out:
+	printk(KERN_DEBUG "      min_align/aligned_size: %08llx/%08llx\n",
+			(unsigned long long)good_align,
+			(unsigned long long)ALIGN(size, good_align));
+	printk(KERN_DEBUG "  ===========END===calculate_mem_align==========\n");
+
 	return good_align;
 }
 
@@ -1409,6 +1419,16 @@ static resource_size_t calculate_mem_alt_size(struct list_head *head,
 	list_for_each_entry(p, head, list)
 		count++;
 
+	printk(KERN_DEBUG "  ===========BEGIN===calculate_mem_alt_size=====\n");
+	if (count) {
+		printk(KERN_DEBUG "  align/size:\n");
+		list_for_each_entry(p, head, list)
+			dev_printk(KERN_DEBUG, p->dev,
+				   "BAR %d:     %08llx/%08llx\n", p->idx,
+				   (unsigned long long)p->align,
+				   (unsigned long long)p->size);
+	}
+
 	if (count <= 1)
 		goto out;
 
@@ -1433,6 +1453,11 @@ static resource_size_t calculate_mem_alt_size(struct list_head *head,
 	}
 
 out:
+	printk(KERN_DEBUG "   alt_align/alt_size: %08llx/%08llx\n",
+			(unsigned long long)max_align,
+			(unsigned long long)good_size);
+	printk(KERN_DEBUG "  ===========END===calculate_mem_alt_size=======\n");
+
 	return good_size;
 }
 
@@ -1515,7 +1540,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 			/* put SRIOV/ROM res to realloc list */
 			if (realloc_head && is_optional(i)) {
 				add_to_align_test_list(&align_test_add_list,
-							align, r_size);
+						align, r_size, &dev->dev, i);
 				r->end = r->start - 1;
 				__add_to_list(realloc_head, dev, r,
 					      r_size, align, 0, 0);
@@ -1534,7 +1559,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 
 			if (r_size > 1) {
 				add_to_align_test_list(&align_test_list,
-							align, r_size);
+						align, r_size, &dev->dev, i);
 				size += r_size;
 				if (align > max_align)
 					max_align = align;
@@ -1551,7 +1576,8 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 					add_align = align;
 				add_to_align_test_list(&align_test_add_list,
 							add_align,
-							r_size + add_r_size);
+							r_size + add_r_size,
+							&dev->dev, i);
 				sum_add_size += r_size + add_r_size;
 				if (add_align > max_add_align)
 					max_add_align = add_align;
@@ -1561,14 +1587,14 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 					add_to_align_test_list(
 						&align_test_alt_list,
 						dev_res->alt_align,
-						dev_res->alt_size);
+						dev_res->alt_size, &dev->dev, i);
 					alt_size += dev_res->alt_size;
 					if (alt_align < dev_res->alt_align)
 						alt_align = dev_res->alt_align;
 				} else if (r_size > 1) {
 					add_to_align_test_list(
 						&align_test_alt_list,
-						align, r_size);
+						align, r_size, &dev->dev, i);
 					alt_size += r_size;
 					if (alt_align < align)
 						alt_align = align;
@@ -1579,6 +1605,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 
 	max_align = max(max_align, window_align);
 	if (size || min_size) {
+		dev_printk(KERN_DEBUG, &bus->self->dev,
+			   "BAR %d: bridge window %pR to %pR calculate_mem for MUST\n",
+			   (int)(b_res - &bus->self->resource[0]), b_res, &bus->busn_res);
 		min_align = calculate_mem_align(&align_test_list, max_align,
 						size, window_align);
 		size0 = calculate_size(size, min_size,
@@ -1588,6 +1617,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 
 	if (size0 && realloc_head) {
 		alt_align = max(alt_align, window_align);
+		dev_printk(KERN_DEBUG, &bus->self->dev,
+			   "BAR %d: bridge window %pR to %pR calculate_mem for ALT\n",
+			   (int)(b_res - &bus->self->resource[0]), b_res, &bus->busn_res);
 		/* need to increase size to fit more alt */
 		alt_size = calculate_mem_alt_size(&align_test_alt_list,
 						  alt_align, alt_size,
@@ -1603,6 +1635,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	if (sum_add_size < min_sum_size)
 		sum_add_size = min_sum_size;
 	if (sum_add_size > size && realloc_head) {
+		dev_printk(KERN_DEBUG, &bus->self->dev,
+			   "BAR %d: bridge window %pR to %pR calculate_mem for ADD\n",
+			   (int)(b_res - &bus->self->resource[0]), b_res, &bus->busn_res);
 		min_add_align = calculate_mem_align(&align_test_add_list,
 					max_add_align, sum_add_size,
 					window_align);
@@ -1660,7 +1695,8 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 				      final_add_size, min_add_align,
 				      alt_size, alt_align);
 			dev_printk(KERN_DEBUG, &bus->self->dev,
-				   "bridge window %pR to %pR add_size %llx add_align %llx alt_size %llx alt_align %llx req_size %llx req_align %llx\n",
+				   "BAR %d: bridge window %pR to %pR add_size %llx add_align %llx alt_size %llx alt_align %llx req_size %llx req_align %llx\n",
+				   (int)(b_res - &bus->self->resource[0]),
 				   b_res, &bus->busn_res,
 				   (unsigned long long)final_add_size,
 				   (unsigned long long)min_add_align,
-- 
1.8.4.5


* [PATCH v11 57/60] PCI, x86: Add pci=assign_pref_bars to reallocate pref BARs
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (55 preceding siblings ...)
  2016-04-08  0:16 ` [PATCH v11 56/60] PCI: Add debug print out for min_align and alt_size Yinghai Lu
@ 2016-04-08  0:16 ` Yinghai Lu
  2016-04-08  0:16 ` [PATCH v11 58/60] PCI: Introduce resource_disabled() Yinghai Lu
                   ` (3 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:16 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

Some BIOSes tend to allocate pref MMIO under non-pref MMIO, or allocate
64bit pref MMIO below 4G.

Add pci=assign_pref_bars to clear the pref BARs and allocate resources to
them again. That way we can reallocate pref mmio64 above 4G, and pref
under the bridges' pref BARs.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/pci_x86.h |  1 +
 arch/x86/pci/common.c          |  3 +++
 arch/x86/pci/i386.c            | 56 ++++++++++++++++++++++++++----------------
 3 files changed, 39 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index d1f919e..6a1a97e 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -34,6 +34,7 @@ do {						\
 #define PCI_NOASSIGN_ROMS	0x80000
 #define PCI_ROOT_NO_CRS		0x100000
 #define PCI_NOASSIGN_BARS	0x200000
+#define PCI_ASSIGN_PREF_BARS	0x400000
 
 extern unsigned int pci_probe;
 extern unsigned long pirq_table_addr;
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 09a16b7..b40b4a5 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -605,6 +605,9 @@ char *__init pcibios_setup(char *str)
 	} else if (!strcmp(str, "assign-busses")) {
 		pci_probe |= PCI_ASSIGN_ALL_BUSSES;
 		return NULL;
+	} else if (!strcmp(str, "assign_pref_bars")) {
+		pci_probe |= PCI_ASSIGN_PREF_BARS;
+		return NULL;
 	} else if (!strcmp(str, "use_crs")) {
 		pci_probe |= PCI_USE__CRS;
 		return NULL;
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 6121ef3..2df7723 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -227,16 +227,25 @@ static void pcibios_allocate_bridge_resources(struct pci_dev *dev)
 			continue;
 		if (r->parent)	/* Already allocated */
 			continue;
-		if (!r->start || pci_claim_bridge_resource(dev, idx) < 0) {
-			/*
-			 * Something is wrong with the region.
-			 * Invalidate the resource to prevent
-			 * child resource allocations in this
-			 * range.
-			 */
-			r->start = r->end = 0;
-			r->flags = 0;
-		}
+
+		if ((r->flags & IORESOURCE_PREFETCH) &&
+		    (pci_probe & PCI_ASSIGN_PREF_BARS))
+			goto clear;
+
+		if (!r->start)
+			goto clear;
+
+		if (pci_claim_bridge_resource(dev, idx) == 0)
+			continue;
+
+clear:
+		/*
+		 * Something is wrong with the region.
+		 * Invalidate the resource to prevent
+		 * child resource allocations in this range.
+		 */
+		r->start = r->end = 0;
+		r->flags = 0;
 	}
 }
 
@@ -282,21 +291,26 @@ static void pcibios_allocate_dev_resources(struct pci_dev *dev, int pass)
 			else
 				disabled = !(command & PCI_COMMAND_MEMORY);
 			if (pass == disabled) {
+				if ((r->flags & IORESOURCE_PREFETCH) &&
+				    (pci_probe & PCI_ASSIGN_PREF_BARS))
+					goto clear;
+
 				dev_dbg(&dev->dev,
 					"BAR %d: reserving %pr (d=%d, p=%d)\n",
 					idx, r, disabled, pass);
-				if (pci_claim_resource(dev, idx) < 0) {
-					if (r->flags & IORESOURCE_PCI_FIXED) {
-						dev_info(&dev->dev, "BAR %d %pR is immovable\n",
-							 idx, r);
-					} else {
-						/* We'll assign a new address later */
-						pcibios_save_fw_addr(dev,
-								idx, r->start);
-						r->end -= r->start;
-						r->start = 0;
-					}
+				if (pci_claim_resource(dev, idx) == 0)
+					continue;
+				if (r->flags & IORESOURCE_PCI_FIXED) {
+					dev_info(&dev->dev, "BAR %d %pR is immovable\n",
+						 idx, r);
+					continue;
 				}
+
+clear:
+				/* We'll assign a new address later */
+				pcibios_save_fw_addr(dev, idx, r->start);
+				r->end -= r->start;
+				r->start = 0;
 			}
 		}
 	if (!pass) {
-- 
1.8.4.5

* [PATCH v11 58/60] PCI: Introduce resource_disabled()
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (56 preceding siblings ...)
  2016-04-08  0:16 ` [PATCH v11 57/60] PCI, x86: Add pci=assign_pref_bars to reallocate pref BARs Yinghai Lu
@ 2016-04-08  0:16 ` Yinghai Lu
  2016-04-08  0:16 ` [PATCH v11 59/60] PCI: Don't set flags to 0 when assign resource fail Yinghai Lu
                   ` (2 subsequent siblings)
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:16 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu, linux-alpha, linux-ia64, linux-am33-list,
	linuxppc-dev, linux-s390, sparclinux, linux-xtensa, iommu,
	linux-sh

Current code uses !flags to tell whether a resource is disabled, and we
are going to set IORESOURCE_DISABLED instead of clearing resource flags.

Convert all !flags checks to the helper function resource_disabled().
resource_disabled() checks both !flags and IORESOURCE_DISABLED.
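
As a rough illustration of the conversion, here is a minimal user-space
sketch, not the kernel code itself. struct resource is reduced to the
fields used here, the flag values are local stand-ins meant to mirror
include/linux/ioport.h, and should_skip() is a hypothetical caller made
up for this example.

```c
#include <stdbool.h>

/* Stand-in flag values for illustration; the real ones are in ioport.h. */
#define IORESOURCE_MEM		0x00000200UL
#define IORESOURCE_DISABLED	0x10000000UL

/* Reduced stand-in for the kernel's struct resource. */
struct resource {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
	struct resource *parent;
};

/* Same test as the helper: no flags at all, or explicitly disabled. */
static inline bool resource_disabled(struct resource *r)
{
	return !r->flags || (r->flags & IORESOURCE_DISABLED);
}

/* Hypothetical caller: the skip condition used by the claim loops. */
static bool should_skip(struct resource *r)
{
	return r->parent || !r->start || resource_disabled(r);
}
```

With this shape, a resource whose flags were cleared and a resource that
keeps its type bits but carries IORESOURCE_DISABLED are both skipped by
the same check.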

Cc: linux-alpha@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-am33-list@redhat.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: iommu@lists.linux-foundation.org
Cc: linux-sh@vger.kernel.org
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/alpha/kernel/pci.c                   |  2 +-
 arch/ia64/pci/pci.c                       |  4 ++--
 arch/microblaze/pci/pci-common.c          | 15 ++++++++-------
 arch/mn10300/unit-asb2305/pci-asb2305.c   |  4 ++--
 arch/mn10300/unit-asb2305/pci.c           |  4 ++--
 arch/powerpc/kernel/pci-common.c          | 16 +++++++++-------
 arch/powerpc/platforms/powernv/pci-ioda.c | 12 ++++++------
 arch/s390/pci/pci.c                       |  2 +-
 arch/sparc/kernel/pci.c                   |  2 +-
 arch/x86/pci/i386.c                       |  4 ++--
 arch/xtensa/kernel/pci.c                  |  4 ++--
 drivers/iommu/intel-iommu.c               |  3 ++-
 drivers/pci/host/pcie-rcar.c              |  2 +-
 drivers/pci/iov.c                         |  2 +-
 drivers/pci/probe.c                       |  2 +-
 drivers/pci/quirks.c                      |  4 ++--
 drivers/pci/rom.c                         |  2 +-
 drivers/pci/setup-bus.c                   |  8 ++++----
 drivers/pci/setup-res.c                   |  2 +-
 include/linux/ioport.h                    |  4 ++++
 20 files changed, 53 insertions(+), 45 deletions(-)

diff --git a/arch/alpha/kernel/pci.c b/arch/alpha/kernel/pci.c
index 5f387ee..c89c8ef 100644
--- a/arch/alpha/kernel/pci.c
+++ b/arch/alpha/kernel/pci.c
@@ -282,7 +282,7 @@ pcibios_claim_one_bus(struct pci_bus *b)
 		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 			struct resource *r = &dev->resource[i];
 
-			if (r->parent || !r->start || !r->flags)
+			if (r->parent || !r->start || resource_disabled(r))
 				continue;
 			if (pci_has_flag(PCI_PROBE_ONLY) ||
 			    (r->flags & IORESOURCE_PCI_FIXED)) {
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 8f6ac2f..f00373f 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -333,7 +333,7 @@ void pcibios_fixup_device_resources(struct pci_dev *dev)
 	for (idx = 0; idx < PCI_BRIDGE_RESOURCES; idx++) {
 		struct resource *r = &dev->resource[idx];
 
-		if (!r->flags || r->parent || !r->start)
+		if (resource_disabled(r) || r->parent || !r->start)
 			continue;
 
 		pci_claim_resource(dev, idx);
@@ -351,7 +351,7 @@ static void pcibios_fixup_bridge_resources(struct pci_dev *dev)
 	for (idx = PCI_BRIDGE_RESOURCES; idx < PCI_NUM_RESOURCES; idx++) {
 		struct resource *r = &dev->resource[idx];
 
-		if (!r->flags || r->parent || !r->start)
+		if (resource_disabled(r) || r->parent || !r->start)
 			continue;
 
 		pci_claim_bridge_resource(dev, idx);
diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 35654be..4cc5ed0 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -694,7 +694,7 @@ static void pcibios_fixup_resources(struct pci_dev *dev)
 	}
 	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
 		struct resource *res = dev->resource + i;
-		if (!res->flags)
+		if (resource_disabled(res))
 			continue;
 		if (res->start == 0) {
 			pr_debug("PCI:%s Resource %d %016llx-%016llx [%x]",
@@ -795,7 +795,7 @@ static void pcibios_fixup_bridge(struct pci_bus *bus)
 	pci_bus_for_each_resource(bus, res, i) {
 		if (!res)
 			continue;
-		if (!res->flags)
+		if (resource_disabled(res))
 			continue;
 		if (i >= 3 && bus->self->transparent)
 			continue;
@@ -964,7 +964,7 @@ static void pcibios_allocate_bus_resources(struct pci_bus *bus)
 		 pci_domain_nr(bus), bus->number);
 
 	pci_bus_for_each_resource(bus, res, i) {
-		if (!res || !res->flags
+		if (!res || resource_disabled(res)
 		    || res->start > res->end || res->parent)
 			continue;
 		if (bus->parent == NULL)
@@ -1066,7 +1066,8 @@ static void __init pcibios_allocate_resources(int pass)
 			r = &dev->resource[idx];
 			if (r->parent)		/* Already allocated */
 				continue;
-			if (!r->flags || (r->flags & IORESOURCE_UNSET))
+			if (resource_disabled(r) ||
+			    (r->flags & IORESOURCE_UNSET))
 				continue;	/* Not assigned at all */
 			/* We only allocate ROMs on pass 1 just in case they
 			 * have been screwed up by firmware
@@ -1197,7 +1198,7 @@ void pcibios_claim_one_bus(struct pci_bus *bus)
 		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 			struct resource *r = &dev->resource[i];
 
-			if (r->parent || !r->start || !r->flags)
+			if (r->parent || !r->start || resource_disabled(r))
 				continue;
 
 			pr_debug("PCI: Claiming %s: ", pci_name(dev));
@@ -1257,7 +1258,7 @@ static void pcibios_setup_phb_resources(struct pci_controller *hose,
 	res->start = (res->start + io_offset) & 0xffffffffu;
 	res->end = (res->end + io_offset) & 0xffffffffu;
 
-	if (!res->flags) {
+	if (resource_disabled(res)) {
 		pr_warn("PCI: I/O resource not set for host ");
 		pr_cont("bridge %s (domain %d)\n",
 			hose->dn->full_name, hose->global_number);
@@ -1277,7 +1278,7 @@ static void pcibios_setup_phb_resources(struct pci_controller *hose,
 	/* Hookup PHB Memory resources */
 	for (i = 0; i < 3; ++i) {
 		res = &hose->mem_resources[i];
-		if (!res->flags) {
+		if (resource_disabled(res)) {
 			if (i > 0)
 				continue;
 			pr_err("PCI: Memory resource 0 not set for ");
diff --git a/arch/mn10300/unit-asb2305/pci-asb2305.c b/arch/mn10300/unit-asb2305/pci-asb2305.c
index b7ab837..7e70e51 100644
--- a/arch/mn10300/unit-asb2305/pci-asb2305.c
+++ b/arch/mn10300/unit-asb2305/pci-asb2305.c
@@ -103,7 +103,7 @@ static void __init pcibios_allocate_bus_resources(struct list_head *bus_list)
 			     idx < PCI_NUM_RESOURCES;
 			     idx++) {
 				r = &dev->resource[idx];
-				if (!r->flags)
+				if (resource_disabled(r))
 					continue;
 				if (!r->start ||
 				    pci_claim_bridge_resource(dev, idx) < 0) {
@@ -188,7 +188,7 @@ static int __init pcibios_assign_resources(void)
 	   addresses. */
 	for_each_pci_dev(dev) {
 		r = &dev->resource[PCI_ROM_RESOURCE];
-		if (!r->flags || !r->start)
+		if (resource_disabled(r) || !r->start)
 			continue;
 		if (pci_claim_resource(dev, PCI_ROM_RESOURCE) < 0) {
 			r->end -= r->start;
diff --git a/arch/mn10300/unit-asb2305/pci.c b/arch/mn10300/unit-asb2305/pci.c
index 3dfe2d3..ad77b18 100644
--- a/arch/mn10300/unit-asb2305/pci.c
+++ b/arch/mn10300/unit-asb2305/pci.c
@@ -291,7 +291,7 @@ static void pcibios_fixup_device_resources(struct pci_dev *dev)
 	for (idx = 0; idx < PCI_BRIDGE_RESOURCES; idx++) {
 		struct resource *r = &dev->resource[idx];
 
-		if (!r->flags || r->parent || !r->start)
+		if (resource_disabled(r) || r->parent || !r->start)
 			continue;
 
 		pci_claim_resource(dev, idx);
@@ -308,7 +308,7 @@ static void pcibios_fixup_bridge_resources(struct pci_dev *dev)
 	for (idx = PCI_BRIDGE_RESOURCES; idx < PCI_NUM_RESOURCES; idx++) {
 		struct resource *r = &dev->resource[idx];
 
-		if (!r->flags || r->parent || !r->start)
+		if (resource_disabled(r) || r->parent || !r->start)
 			continue;
 
 		pci_claim_bridge_resource(dev, idx);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 2a7f4fd..94d11f9 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -811,7 +811,7 @@ static void pcibios_fixup_resources(struct pci_dev *dev)
 	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
 		struct resource *res = dev->resource + i;
 		struct pci_bus_region reg;
-		if (!res->flags)
+		if (resource_disabled(res))
 			continue;
 
 		/* If we're going to re-assign everything, we mark all resources
@@ -920,7 +920,7 @@ static void pcibios_fixup_bridge(struct pci_bus *bus)
 	struct pci_dev *dev = bus->self;
 
 	pci_bus_for_each_resource(bus, res, i) {
-		if (!res || !res->flags)
+		if (!res || resource_disabled(res))
 			continue;
 		if (i >= 3 && bus->self->transparent)
 			continue;
@@ -1161,7 +1161,8 @@ static void pcibios_allocate_bus_resources(struct pci_bus *bus)
 		 pci_domain_nr(bus), bus->number);
 
 	pci_bus_for_each_resource(bus, res, i) {
-		if (!res || !res->flags || res->start > res->end || res->parent)
+		if (!res || resource_disabled(res) ||
+		    res->start > res->end || res->parent)
 			continue;
 
 		/* If the resource was left unset at this point, we clear it */
@@ -1256,7 +1257,8 @@ static void __init pcibios_allocate_resources(int pass)
 			r = &dev->resource[idx];
 			if (r->parent)		/* Already allocated */
 				continue;
-			if (!r->flags || (r->flags & IORESOURCE_UNSET))
+			if (resource_disabled(r) ||
+			    (r->flags & IORESOURCE_UNSET))
 				continue;	/* Not assigned at all */
 			/* We only allocate ROMs on pass 1 just in case they
 			 * have been screwed up by firmware
@@ -1394,7 +1396,7 @@ void pcibios_claim_one_bus(struct pci_bus *bus)
 		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 			struct resource *r = &dev->resource[i];
 
-			if (r->parent || !r->start || !r->flags)
+			if (r->parent || !r->start || resource_disabled(r))
 				continue;
 
 			pr_debug("PCI: Claiming %s: Resource %d: %pR\n",
@@ -1475,7 +1477,7 @@ static void pcibios_setup_phb_resources(struct pci_controller *hose,
 	/* Hookup PHB IO resource */
 	res = &hose->io_resource;
 
-	if (!res->flags) {
+	if (resource_disabled(res)) {
 		pr_info("PCI: I/O resource not set for host"
 		       " bridge %s (domain %d)\n",
 		       hose->dn->full_name, hose->global_number);
@@ -1490,7 +1492,7 @@ static void pcibios_setup_phb_resources(struct pci_controller *hose,
 	/* Hookup PHB Memory resources */
 	for (i = 0; i < 3; ++i) {
 		res = &hose->mem_resources[i];
-		if (!res->flags) {
+		if (resource_disabled(res)) {
 			if (i == 0)
 				printk(KERN_ERR "PCI: Memory resource 0 not set for "
 				       "host bridge %s (domain %d)\n",
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index c5baaf3..e621a68 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -869,7 +869,7 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
 	num_vfs = pdn->num_vfs;
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		res = &dev->resource[i + PCI_IOV_RESOURCES];
-		if (!res->flags || !res->parent)
+		if (resource_disabled(res) || !res->parent)
 			continue;
 
 		/*
@@ -897,7 +897,7 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
 	 */
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		res = &dev->resource[i + PCI_IOV_RESOURCES];
-		if (!res->flags || !res->parent)
+		if (resource_disabled(res) || !res->parent)
 			continue;
 
 		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
@@ -1260,7 +1260,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
 
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
-		if (!res->flags || !res->parent)
+		if (resource_disabled(res) || !res->parent)
 			continue;
 
 		for (j = 0; j < m64_bars; j++) {
@@ -2863,7 +2863,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
-		if (!res->flags || res->parent)
+		if (resource_disabled(res) || res->parent)
 			continue;
 		if (!pnv_pci_is_mem_pref_64(res->flags)) {
 			dev_warn(&pdev->dev, "Don't support SR-IOV with"
@@ -2899,7 +2899,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 
 	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 		res = &pdev->resource[i + PCI_IOV_RESOURCES];
-		if (!res->flags || res->parent)
+		if (resource_disabled(res) || res->parent)
 			continue;
 
 		size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
@@ -2951,7 +2951,7 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
 	BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
 
 	pci_bus_for_each_resource(pe->pbus, res, i) {
-		if (!res || !res->flags ||
+		if (!res || resource_disabled(res) ||
 		    res->start > res->end)
 			continue;
 
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 871af75..79e89fb 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -646,7 +646,7 @@ int pcibios_add_device(struct pci_dev *pdev)
 
 	for (i = 0; i < PCI_BAR_COUNT; i++) {
 		res = &pdev->resource[i];
-		if (res->parent || !res->flags)
+		if (res->parent || resource_disabled(res))
 			continue;
 		pci_claim_resource(pdev, i);
 	}
diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index e46e739..fa82e8e8 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -631,7 +631,7 @@ static void pci_claim_bus_resources(struct pci_bus *bus)
 		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 			struct resource *r = &dev->resource[i];
 
-			if (r->parent || !r->start || !r->flags)
+			if (r->parent || !r->start || resource_disabled(r))
 				continue;
 
 			if (ofpci_verbose)
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 2df7723..dcd5beb 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -223,7 +223,7 @@ static void pcibios_allocate_bridge_resources(struct pci_dev *dev)
 
 	for (idx = PCI_BRIDGE_RESOURCES; idx < PCI_NUM_RESOURCES; idx++) {
 		r = &dev->resource[idx];
-		if (!r->flags)
+		if (resource_disabled(r))
 			continue;
 		if (r->parent)	/* Already allocated */
 			continue;
@@ -352,7 +352,7 @@ static void pcibios_allocate_dev_rom_resource(struct pci_dev *dev)
 	 * addresses.
 	 */
 	r = &dev->resource[PCI_ROM_RESOURCE];
-	if (!r->flags || !r->start)
+	if (resource_disabled(r) || !r->start)
 		return;
 	if (r->parent) /* Already allocated */
 		return;
diff --git a/arch/xtensa/kernel/pci.c b/arch/xtensa/kernel/pci.c
index b848cc3..f34d061 100644
--- a/arch/xtensa/kernel/pci.c
+++ b/arch/xtensa/kernel/pci.c
@@ -142,7 +142,7 @@ static void __init pci_controller_apertures(struct pci_controller *pci_ctrl,
 
 	io_offset = (unsigned long)pci_ctrl->io_space.base;
 	res = &pci_ctrl->io_resource;
-	if (!res->flags) {
+	if (resource_disabled(res)) {
 		if (io_offset)
 			printk (KERN_ERR "I/O resource not set for host"
 				" bridge %d\n", pci_ctrl->index);
@@ -156,7 +156,7 @@ static void __init pci_controller_apertures(struct pci_controller *pci_ctrl,
 
 	for (i = 0; i < 3; i++) {
 		res = &pci_ctrl->mem_resources[i];
-		if (!res->flags) {
+		if (resource_disabled(res)) {
 			if (i > 0)
 				continue;
 			printk(KERN_ERR "Memory resource not set for "
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a2e1b7f..1b8a7f6 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1825,7 +1825,8 @@ static int dmar_init_reserved_ranges(void)
 
 		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 			r = &pdev->resource[i];
-			if (!r->flags || !(r->flags & IORESOURCE_MEM))
+			if (resource_disabled(r) ||
+			    !(r->flags & IORESOURCE_MEM))
 				continue;
 			iova = reserve_iova(&reserved_iova_list,
 					    IOVA_PFN(r->start),
diff --git a/drivers/pci/host/pcie-rcar.c b/drivers/pci/host/pcie-rcar.c
index 3509218..716efd5 100644
--- a/drivers/pci/host/pcie-rcar.c
+++ b/drivers/pci/host/pcie-rcar.c
@@ -361,7 +361,7 @@ static int rcar_pcie_setup(struct list_head *resource, struct rcar_pcie *pci)
 	resource_list_for_each_entry(win, &pci->resources) {
 		struct resource *res = win->res;
 
-		if (!res->flags)
+		if (resource_disabled(res))
 			continue;
 
 		switch (resource_type(res)) {
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 2194b44..a275c98 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -434,7 +434,7 @@ found:
 		else
 			bar64 = __pci_read_base(dev, pci_bar_unknown, res,
 						pos + PCI_SRIOV_BAR + i * 4);
-		if (!res->flags)
+		if (resource_disabled(res))
 			continue;
 		if (resource_size(res) & (PAGE_SIZE - 1)) {
 			rc = -EIO;
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 6b079f4..52ddc45 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2308,7 +2308,7 @@ void pci_bus_release_busn_res(struct pci_bus *b)
 	struct resource *res = &b->busn_res;
 	int ret;
 
-	if (!res->flags || !res->parent)
+	if (resource_disabled(res) || !res->parent)
 		return;
 
 	ret = release_resource(res);
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index a7cd617..61f87d5 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -348,7 +348,7 @@ static void quirk_bar_fixed(struct pci_dev *dev)
 	for (i = 0; i < PCI_STD_RESOURCE_END; i++) {
 		struct resource *r = &dev->resource[i];
 
-		if (!r->start || !r->flags)
+		if (!r->start || resource_disabled(r))
 			continue;
 		r->flags |= IORESOURCE_PCI_FIXED;
 	}
@@ -362,7 +362,7 @@ static void quirk_allocate_fixed(struct pci_dev *dev)
 	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 		struct resource *r = &dev->resource[i];
 
-		if (r->parent ||
+		if (r->parent || resource_disabled(r) ||
 		    !(r->flags & IORESOURCE_PCI_FIXED) ||
 		    !(r->flags & (IORESOURCE_IO | IORESOURCE_MEM)))
 			continue;
diff --git a/drivers/pci/rom.c b/drivers/pci/rom.c
index 06663d3..cb40e8b 100644
--- a/drivers/pci/rom.c
+++ b/drivers/pci/rom.c
@@ -28,7 +28,7 @@ int pci_enable_rom(struct pci_dev *pdev)
 	struct pci_bus_region region;
 	u32 rom_addr;
 
-	if (!res->flags)
+	if (resource_disabled(res))
 		return -1;
 
 	/* Nothing to enable if we're using a shadow copy in RAM */
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index bd74349..c77c204 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -222,7 +222,7 @@ static void pdev_assign_resources_prepare(struct pci_dev *dev,
 		if (r->flags & IORESOURCE_PCI_FIXED)
 			continue;
 
-		if (!(r->flags) || r->parent)
+		if (resource_disabled(r) || r->parent)
 			continue;
 
 		r_align = __pci_resource_alignment(dev, r, realloc_head);
@@ -318,7 +318,7 @@ static void reassign_resources_sorted(struct list_head *realloc_head,
 	list_for_each_entry_safe(add_res, tmp, realloc_head, list) {
 		res = add_res->res;
 		/* skip resource that has been reset */
-		if (!res->flags)
+		if (resource_disabled(res))
 			goto out;
 
 		/* skip this resource if not found in head list */
@@ -2167,7 +2167,7 @@ static void pci_bus_dump_res(struct pci_bus *bus)
 	int i;
 
 	pci_bus_for_each_resource(bus, res, i) {
-		if (!res || !res->end || !res->flags)
+		if (!res || !res->end || resource_disabled(res))
 			continue;
 
 		dev_printk(KERN_DEBUG, &bus->dev, "resource %d %pR\n", i, res);
@@ -2250,7 +2250,7 @@ static int iov_resources_unassigned(struct pci_dev *dev, void *data)
 		struct pci_bus_region region;
 
 		/* Not assigned or rejected by kernel? */
-		if (!r->flags)
+		if (resource_disabled(r))
 			continue;
 
 		pcibios_resource_to_bus(dev->bus, &region, r);
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 59271ee..2f901f7 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -45,7 +45,7 @@ void pci_update_resource(struct pci_dev *dev, int resno)
 	 * Ignore resources for unimplemented BARs and unused resource slots
 	 * for 64 bit BARs.
 	 */
-	if (!res->flags)
+	if (resource_disabled(res))
 		return;
 
 	if (res->flags & IORESOURCE_UNSET)
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 9053ac9..b388127 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -208,6 +208,10 @@ static inline bool resource_contains(struct resource *r1, struct resource *r2)
 	return r1->start <= r2->start && r1->end >= r2->end;
 }
 
+static inline bool resource_disabled(struct resource *r)
+{
+	return !r->flags || (r->flags & IORESOURCE_DISABLED);
+}
 
 /* Convenience shorthand with allocation */
 #define request_region(start,n,name)		__request_region(&ioport_resource, (start), (n), (name), 0)
-- 
1.8.4.5

* [PATCH v11 59/60] PCI: Don't set flags to 0 when assign resource fail
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (57 preceding siblings ...)
  2016-04-08  0:16 ` [PATCH v11 58/60] PCI: Introduce resource_disabled() Yinghai Lu
@ 2016-04-08  0:16 ` Yinghai Lu
  2016-04-08  0:16 ` [PATCH v11 60/60] PCI: Only try to assign io port only for root bus that support it Yinghai Lu
  2016-04-08  0:51 ` [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Linus Torvalds
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:16 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

We want to keep the resource flags instead of clearing them after
resource allocation fails.

Set IORESOURCE_UNSET | IORESOURCE_DISABLED in flags instead.

-v2: add missing UNSET for _alt restore to required path.
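
To show what keeping the flags buys, here is a small user-space sketch
under stated assumptions: struct resource is a reduced stand-in, the
flag values are local copies for illustration, and
mark_resource_failed() is a hypothetical name for the
"flags |= IORESOURCE_UNSET | IORESOURCE_DISABLED" form used in the diff.

```c
#include <stdbool.h>

/* Stand-in flag values for illustration; the real ones are in ioport.h. */
#define IORESOURCE_MEM		0x00000200UL
#define IORESOURCE_PREFETCH	0x00002000UL
#define IORESOURCE_DISABLED	0x10000000UL
#define IORESOURCE_UNSET	0x20000000UL

/* Reduced stand-in for the kernel's struct resource. */
struct resource {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

/* Old behaviour: the address and the resource type are both lost. */
static void reset_resource(struct resource *res)
{
	res->start = 0;
	res->end = 0;
	res->flags = 0;
}

/* New behaviour (hypothetical helper name): type bits survive. */
static void mark_resource_failed(struct resource *res)
{
	res->flags |= IORESOURCE_UNSET | IORESOURCE_DISABLED;
}

static bool resource_disabled(struct resource *r)
{
	return !r->flags || (r->flags & IORESOURCE_DISABLED);
}
```

Both variants make resource_disabled() return true, but only the second
one still tells a later pass that the resource was a prefetchable
memory resource.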

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/bus.c       |  2 +-
 drivers/pci/setup-bus.c | 45 +++++++++++++++++++++++----------------------
 drivers/pci/setup-res.c |  3 ++-
 3 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 200fdac..f357fb8 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -142,7 +142,7 @@ static int pci_bus_alloc_from_region(struct pci_bus *bus, struct resource *res,
 	pci_bus_for_each_resource(bus, r, i) {
 		resource_size_t min_used = min;
 
-		if (!r)
+		if (!r || resource_disabled(r))
 			continue;
 
 		/* type_mask must match */
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index c77c204..d07ba87 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -263,13 +263,6 @@ static void __dev_assign_resources_prepare(struct pci_dev *dev,
 	pdev_assign_resources_prepare(dev, realloc_head, head);
 }
 
-static inline void reset_resource(struct resource *res)
-{
-	res->start = 0;
-	res->end = 0;
-	res->flags = 0;
-}
-
 static void sort_resources(struct list_head *head)
 {
 	struct pci_dev_resource *res1, *tmp_res, *res2;
@@ -336,7 +329,7 @@ static void reassign_resources_sorted(struct list_head *realloc_head,
 			res->start = align;
 			res->end = res->start + add_size - 1;
 			if (pci_assign_resource(add_res->dev, idx))
-				reset_resource(res);
+				res->flags |= IORESOURCE_DISABLED;
 		} else {
 			/* could just assigned with alt, add difference ? */
 			resource_size_t size;
@@ -390,7 +383,7 @@ static void assign_requested_resources_sorted(struct list_head *head,
 		    pci_assign_resource(dev_res->dev, idx)) {
 			if (fail_head)
 				add_to_list(fail_head, dev_res->dev, res);
-			reset_resource(res);
+			res->flags |= IORESOURCE_DISABLED;
 		}
 	}
 }
@@ -676,7 +669,7 @@ static void __assign_resources_alt_sorted(struct list_head *head,
 
 		if (!res_to_dev_res(local_fail_head, res))
 			add_to_list(local_fail_head, fail_res->dev, res);
-		reset_resource(res);
+		res->flags |= IORESOURCE_UNSET | IORESOURCE_DISABLED;
 	}
 	free_list(&local_alt_fail_head);
 }
@@ -842,7 +835,7 @@ static void pci_setup_bridge_io(struct pci_dev *bridge)
 	/* Set up the top and bottom of the PCI I/O segment for this bus. */
 	res = &bridge->resource[PCI_BRIDGE_RESOURCES + 0];
 	pcibios_resource_to_bus(bridge->bus, &region, res);
-	if (res->flags & IORESOURCE_IO) {
+	if ((res->flags & IORESOURCE_IO) && !(res->flags & IORESOURCE_UNSET)) {
 		pci_read_config_word(bridge, PCI_IO_BASE, &l);
 		io_base_lo = (region.start >> 8) & io_mask;
 		io_limit_lo = (region.end >> 8) & io_mask;
@@ -872,7 +865,8 @@ static void pci_setup_bridge_mmio(struct pci_dev *bridge)
 	/* Set up the top and bottom of the PCI Memory segment for this bus. */
 	res = &bridge->resource[PCI_BRIDGE_RESOURCES + 1];
 	pcibios_resource_to_bus(bridge->bus, &region, res);
-	if (res->flags & IORESOURCE_MEM) {
+	if ((res->flags & IORESOURCE_MEM) &&
+	    !(res->flags & IORESOURCE_UNSET)) {
 		l = (region.start >> 16) & 0xfff0;
 		l |= region.end & 0xfff00000;
 		dev_info(&bridge->dev, "  bridge window %pR\n", res);
@@ -897,7 +891,8 @@ static void pci_setup_bridge_mmio_pref(struct pci_dev *bridge)
 	bu = lu = 0;
 	res = &bridge->resource[PCI_BRIDGE_RESOURCES + 2];
 	pcibios_resource_to_bus(bridge->bus, &region, res);
-	if (res->flags & IORESOURCE_PREFETCH) {
+	if ((res->flags & IORESOURCE_PREFETCH) &&
+	    !(res->flags & IORESOURCE_UNSET)) {
 		l = (region.start >> 16) & 0xfff0;
 		l |= region.end & 0xfff00000;
 		if (res->flags & IORESOURCE_MEM_64) {
@@ -1027,6 +1022,7 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
 
 	b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
 	b_res[1].flags |= IORESOURCE_MEM;
+	b_res[1].flags &= ~IORESOURCE_DISABLED;
 
 	pci_read_config_word(bridge, PCI_IO_BASE, &io);
 	if (!io) {
@@ -1034,8 +1030,10 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
 		pci_read_config_word(bridge, PCI_IO_BASE, &io);
 		pci_write_config_word(bridge, PCI_IO_BASE, 0x0);
 	}
-	if (io)
+	if (io) {
 		b_res[0].flags |= IORESOURCE_IO;
+		b_res[0].flags &= ~IORESOURCE_DISABLED;
+	}
 
 	/*  DECchip 21050 pass 2 errata: the bridge may miss an address
 	    disconnect boundary by one PCI data phase.
@@ -1052,6 +1050,7 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
 	}
 	if (pmem) {
 		b_res[2].flags |= IORESOURCE_MEM | IORESOURCE_PREFETCH;
+		b_res[2].flags &= ~IORESOURCE_DISABLED;
 		if ((pmem & PCI_PREF_RANGE_TYPE_MASK) ==
 		    PCI_PREF_RANGE_TYPE_64) {
 			b_res[2].flags |= IORESOURCE_MEM_64;
@@ -1197,8 +1196,10 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 			struct resource *r = &dev->resource[i];
 			unsigned long r_size, r_add_size;
 
-			if (r->parent || !(r->flags & IORESOURCE_IO))
+			if (r->parent || !(r->flags & IORESOURCE_IO) ||
+			    resource_disabled(r))
 				continue;
+
 			r_size = resource_size(r);
 
 			if (r_size < 0x400)
@@ -1239,7 +1240,7 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 		if (b_res->start || b_res->end)
 			dev_info(&bus->self->dev, "disabling bridge window %pR to %pR (unused)\n",
 				 b_res, &bus->busn_res);
-		b_res->flags = 0;
+		b_res->flags |= IORESOURCE_UNSET | IORESOURCE_DISABLED;
 		return;
 	}
 
@@ -1532,7 +1533,8 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 			if (r->parent || (flags & IORESOURCE_PCI_FIXED) ||
 			    ((flags & mask) != type &&
 			     (flags & mask) != type2 &&
-			     (flags & mask) != type3))
+			     (flags & mask) != type3) ||
+			    resource_disabled(r))
 				continue;
 
 			r_size = resource_size(r);
@@ -1553,7 +1555,8 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 			if (align > (1ULL<<37)) { /*128 Gb*/
 				dev_warn(&dev->dev, "disabling BAR %d: %pR (bad alignment %#llx)\n",
 					i, r, (unsigned long long) align);
-				r->flags = 0;
+				r->flags |= IORESOURCE_UNSET |
+					    IORESOURCE_DISABLED;
 				continue;
 			}
 
@@ -1650,7 +1653,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 		if (b_res->start || b_res->end)
 			dev_info(&bus->self->dev, "disabling bridge window %pR to %pR (unused)\n",
 				 b_res, &bus->busn_res);
-		b_res->flags = 0;
+		b_res->flags |= IORESOURCE_UNSET | IORESOURCE_DISABLED;
 		return 0;
 	}
 
@@ -2110,7 +2113,7 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
 		/* keep the old size */
 		r->end = resource_size(r) - 1;
 		r->start = 0;
-		r->flags = 0;
+		r->flags |= IORESOURCE_UNSET | IORESOURCE_DISABLED;
 
 		/* avoiding touch the one without PREF */
 		if (type & IORESOURCE_PREFETCH)
@@ -2379,7 +2382,6 @@ again:
 
 		restore_resource(fail_res, res);
 		if (fail_res->dev->subordinate) {
-			res->flags = 0;
 			/* last or third times and later */
 			if (tried_times + 1 == pci_try_num ||
 			    tried_times + 1 > 2)
@@ -2463,7 +2465,6 @@ again:
 
 		restore_resource(fail_res, res);
 		if (fail_res->dev->subordinate) {
-			res->flags = 0;
 			/* last time */
 			reset_bridge_resource_size(fail_res->dev, res);
 		}
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 2f901f7..a3e683d 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -377,7 +377,8 @@ int pci_enable_resources(struct pci_dev *dev, int mask)
 
 		r = &dev->resource[i];
 
-		if (!(r->flags & (IORESOURCE_IO | IORESOURCE_MEM)))
+		if (!(r->flags & (IORESOURCE_IO | IORESOURCE_MEM)) ||
+		    resource_disabled(r))
 			continue;
 		if ((i == PCI_ROM_RESOURCE) &&
 				(!(r->flags & IORESOURCE_ROM_ENABLE)))
-- 
1.8.4.5

* [PATCH v11 60/60] PCI: Only try to assign io port only for root bus that support it
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (58 preceding siblings ...)
  2016-04-08  0:16 ` [PATCH v11 59/60] PCI: Don't set flags to 0 when assign resource fail Yinghai Lu
@ 2016-04-08  0:16 ` Yinghai Lu
  2016-04-08  0:51 ` [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Linus Torvalds
  60 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  0:16 UTC (permalink / raw)
  To: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci, linux-kernel,
	Yinghai Lu

The PCI subsystem always assumes that I/O is supported on the root bus
and tries to assign an I/O window to each child bus even if that is not
the case.

The use case is an Intel 8-socket system that has 8 root buses, where
the last two root buses do not have I/O port resources from _CRS.

Check whether the root bus supports I/O, and later, during sizing and
assigning, check that flag and skip those resources.
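
The two steps can be sketched in user space roughly as follows. This is
an illustration only: struct host_bridge is a reduced stand-in for
struct pci_host_bridge, the flag values are local copies, and
scan_root_windows() and skip_io_assign() are hypothetical names that do
not exist in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag values for illustration; the real ones are in ioport.h. */
#define IORESOURCE_TYPE_BITS	0x00001f00UL
#define IORESOURCE_IO		0x00000100UL
#define IORESOURCE_MEM		0x00000200UL

/* Reduced stand-in for the kernel's struct resource. */
struct resource {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

/* Reduced stand-in for struct pci_host_bridge. */
struct host_bridge {
	unsigned int has_ioport:1;
};

static unsigned long resource_type(const struct resource *res)
{
	return res->flags & IORESOURCE_TYPE_BITS;
}

/* Step 1: record whether any root bus window is an I/O port window. */
static void scan_root_windows(struct host_bridge *bridge,
			      const struct resource *windows, size_t n)
{
	size_t i;

	bridge->has_ioport = 0;
	for (i = 0; i < n; i++)
		if (resource_type(&windows[i]) == IORESOURCE_IO)
			bridge->has_ioport = 1;
}

/* Step 2: skip I/O resources below a host bridge without I/O ports. */
static bool skip_io_assign(const struct host_bridge *bridge,
			   const struct resource *r)
{
	return (r->flags & IORESOURCE_IO) && !bridge->has_ioport;
}
```

Memory resources are never affected by the check; only I/O resources
under a root bus that got no I/O window are left out of sizing and
assignment.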

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/probe.c     | 6 ++++++
 drivers/pci/setup-bus.c | 9 +++++++++
 include/linux/pci.h     | 1 +
 3 files changed, 16 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 52ddc45..6f0488c 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -345,6 +345,9 @@ static void pci_read_bridge_io(struct pci_bus *child)
 	struct pci_bus_region region;
 	struct resource *res;
 
+	if (!pci_find_host_bridge(child)->has_ioport)
+		return;
+
 	io_mask = PCI_IO_RANGE_MASK;
 	io_granularity = 0x1000;
 	if (dev->io_window_1k) {
@@ -2231,6 +2234,9 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
 			bus_addr[0] = '\0';
 		dev_info(&b->dev, "root bus resource %pR%s\n", res, bus_addr);
 
+		if (resource_type(res) == IORESOURCE_IO)
+			bridge->has_ioport = 1;
+
 		if (resource_type(res) == IORESOURCE_MEM) {
 			if ((res->end - offset) > 0xffffffff)
 				bridge->has_mem64 = 1;
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index d07ba87..076b5db 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -225,6 +225,10 @@ static void pdev_assign_resources_prepare(struct pci_dev *dev,
 		if (resource_disabled(r) || r->parent)
 			continue;
 
+		if ((r->flags & IORESOURCE_IO) &&
+		    !pci_find_host_bridge(dev->bus)->has_ioport)
+			continue;
+
 		r_align = __pci_resource_alignment(dev, r, realloc_head);
 		if (!r_align) {
 			dev_warn(&dev->dev, "BAR %d: %pR has bogus alignment\n",
@@ -1188,6 +1192,11 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 		min_size = 0;
 	}
 
+	if (!pci_find_host_bridge(bus)->has_ioport) {
+		b_res->flags |= IORESOURCE_UNSET | IORESOURCE_DISABLED;
+		return;
+	}
+
 	min_align = window_alignment(bus, IORESOURCE_IO);
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		int i;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 41d06ce..463094a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -409,6 +409,7 @@ struct pci_host_bridge {
 	void *release_data;
 	unsigned int ignore_reset_delay:1;	/* for entire hierarchy */
 	unsigned int has_mem64:1;
+	unsigned int has_ioport:1;
 	/* Resource alignment requirements */
 	resource_size_t (*align_resource)(struct pci_dev *dev,
 			const struct resource *res,
-- 
1.8.4.5


* Re: [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7
  2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
                   ` (59 preceding siblings ...)
  2016-04-08  0:16 ` [PATCH v11 60/60] PCI: Only try to assign io port only for root bus that support it Yinghai Lu
@ 2016-04-08  0:51 ` Linus Torvalds
  2016-04-09  5:29   ` Yinghai Lu
  60 siblings, 1 reply; 86+ messages in thread
From: Linus Torvalds @ 2016-04-08  0:51 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Wei Yang,
	TJ, Yijing Wang, Khalid Aziz, linux-pci,
	Linux Kernel Mailing List

On Thu, Apr 7, 2016 at 5:15 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>
> After 5b28541552ef (PCI: Restrict 64-bit prefetchable bridge windows
> to 64-bit resources), we have several reports on resource allocation
> failure, and we try to fix the problem with resource clip, and find
> more problems.

Quite frankly, that commit 5b28541552ef is two years old by now, and
went into 3.16.

So whatever problems it caused are kind of water under the bridge.

It worries me a *lot* when there is then a 60-patch series to fix
those alleged problems, because my natural worry ends up being that
the series is as likely to introduce new issues as it is to fix
something that clearly people have been living with for a while now.

I'm not saying that this series is bad, but I *am* saying that at this
point, I'd much rather see this be done as much smaller series, and
merged slowly and carefully. I'm not seeing a lot of people reviewing
the code, but even *with* code review, things like "let's start
allocating from the top of the resource" tends to make me really
really nervous. Because those kinds of patches tend to show problems
even if the code itself was perfectly bug-free, just because it
changes some layout issue and hits some hardware weakness or
undocumented resource allocation issue.

Using tricks like a __weak pcibios_align_end_resource() to only
trigger on one single architecture (despite the naming) makes those
things even subtler.

So please, try to split this up sanely, and let's merge it in sane
pieces. I see that you have that M7101 quirk removal randomly in the
middle of this series, for example, and it doesn't seem to be the only
such random patch. That's entirely independent of all the other
patches in the series (and I thought I acked it already, but
whatever).

Put another way: this is less of a real targeted series, and more of a
random collection of patches. Very few people have the background to
review them, and basically nobody has the ability to test them
(although _individual_ parts of it are obviously testable).

I'd really like to see the misc per-device ones separated, for
example. Same for the pure cleanup ones that obviously don't change
any actual semantics. There's a number of those there. And then the
ones that do change real and generic pci allocation code need to be
done in smaller batches so that you don't have "everything changes at
once".

I tried to scan the patches, and I didn't find anything actually
_wrong_. Several looked like "that's an obvious improvement". But
several looked fairly complex, and all _together_ just looked pretty
scary.

              Linus


* Re: [PATCH v11 44/60] PCI: Add alt_size ressource allocation support
  2016-04-08  0:15 ` [PATCH v11 44/60] PCI: Add alt_size ressource allocation support Yinghai Lu
@ 2016-04-08  0:56   ` Linus Torvalds
  2016-04-08  5:50     ` Yinghai Lu
  2016-04-08  6:24     ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 86+ messages in thread
From: Linus Torvalds @ 2016-04-08  0:56 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Wei Yang,
	TJ, Yijing Wang, Khalid Aziz, linux-pci,
	Linux Kernel Mailing List

On Thu, Apr 7, 2016 at 5:15 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On system with several pcie switches, BIOS allocate very tight resources
> to the bridge bar, and it is not aligned to min_align as kernel allocation
> code.

Ok, this came in after I already replied to the other ones.

I'm not excited about the whole "alternate alignment".

Maybe the kernel should just accept the smaller alignment. If the
minimum alignment we use is bigger than necessary, then we're just
wrong about it, and perhaps we should just use the smaller alignment
that the bios used.

So instead of adding this notion of alternate alignment, maybe we
should just not be so uptight about our own minimum alignment
requirements?

                  Linus


* Re: [PATCH v11 44/60] PCI: Add alt_size ressource allocation support
  2016-04-08  0:56   ` Linus Torvalds
@ 2016-04-08  5:50     ` Yinghai Lu
  2016-04-08  6:24     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-08  5:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Wei Yang,
	TJ, Yijing Wang, Khalid Aziz, linux-pci,
	Linux Kernel Mailing List

On Thu, Apr 7, 2016 at 5:56 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> I'm not excited about the whole "alternate alignment".
>
> Maybe the kernel should just accept the smaller alignment. If the
> minimum alignment we use is bigger than necessary, then we're just
> wrong about it, and perhaps we should just use the smaller alignment
> that the bios used.

Looks like I did not make this clear in the change log.

The current kernel sizing code searches for the smallest alignment, so
I call it the min_align scheme.

Some BIOSes do it differently: they use the smallest size instead, which
can require a bigger alignment than min_align. I call that the alt_size
scheme.

PROs for min_align: it can be used to calculate the upper parent bridges
easily and safely.
CONs for min_align: it can produce a bigger required size.

PROs for alt_size: it searches for a smaller size, which matters
especially for MMIO space under 4G.
CONs for alt_size: it can need a much bigger alignment, and the upper
bridge alt_size has to be calculated carefully. We ended up using trial
and error to search for the upper bridge alt_size.

The current min_align code has another problem: it cannot handle a block
whose size is bigger than its alignment, and generates a wrong or too
big align/size for it.

In the patch set, we
1. fix the min_align scheme to use trial and error to find the right
   min_align even if a block's size is bigger than its alignment.
2. add the alt_size scheme, which searches for an alt_size/alt_align pair
   with a smaller size:
   a. compare that with min_align/min_size; if alt_size is not smaller
      than min_size, just drop alt_size, otherwise record it.
   b. later, if allocation with min_align fails, retry with alt_size.

So we keep the old way as the default, and only handle some extra
corner cases like:
1. The BIOS uses a small-size, big-alignment allocation, and the kernel
   is doing PCI device remove or rescan.
2. There are a couple of layers of bridges where the min_align scheme
   wastes space, and MMIO under 4G is tight.
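
To make the size-versus-alignment trade-off concrete, here is a toy
user-space model. It is not the kernel's sizing code: struct block,
align_up() and window_span() are hypothetical names, and the simple
in-order placement is an assumption made only for illustration. It
places each block at the first suitably aligned address at or after the
previous block and reports how many bytes the window needs from its
base.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one child resource; not a kernel structure. */
struct block {
	uint64_t size;	/* bytes */
	uint64_t align;	/* bytes, must be a power of two */
};

/* Round addr up to the next multiple of align (align is a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Place the blocks in array order, each at the first aligned address at
 * or after the end of the previous block, starting from base.  Returns
 * the number of bytes from base to the end of the last block.
 */
static uint64_t window_span(uint64_t base, const struct block *blk, size_t n)
{
	uint64_t cur = base;
	size_t i;

	for (i = 0; i < n; i++)
		cur = align_up(cur, blk[i].align) + blk[i].size;

	return cur - base;
}
```

With one 64M block and one 1M block, a window base that is itself 64M
aligned (say 0x40000000) needs a 65M span, while a base that is only 1M
aligned (say 0x40100000) needs 128M in this model, because the 64M block
has to skip forward to its own alignment. That is the same direction as
the trade-off described above: a bigger alignment can buy a smaller
size.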

Please let me know if this description is clear enough; otherwise I
will extract sample output from the patches and post it here.

Thanks

Yinghai


* Re: [PATCH v11 44/60] PCI: Add alt_size ressource allocation support
  2016-04-08  0:56   ` Linus Torvalds
  2016-04-08  5:50     ` Yinghai Lu
@ 2016-04-08  6:24     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 86+ messages in thread
From: Benjamin Herrenschmidt @ 2016-04-08  6:24 UTC (permalink / raw)
  To: Linus Torvalds, Yinghai Lu
  Cc: Bjorn Helgaas, David Miller, Wei Yang, TJ, Yijing Wang,
	Khalid Aziz, linux-pci, Linux Kernel Mailing List

On Thu, 2016-04-07 at 17:56 -0700, Linus Torvalds wrote:
> Maybe the kernel should just accept the smaller alignment. If the
> minimum alignment we use is bigger than necessary, then we're just
> wrong about it, and perhaps we should just use the smaller alignment
> that the bios used.
> 
> So instead of adding this notion of alternate alignment, maybe we
> should just not be so uptight about our own minimum alignment
> requirements?

On the other hand it's nice to align things at page boundaries when
these resources can possibly be mapped by user space or KVM guests...

Cheers,
Ben.


* Re: [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7
  2016-04-08  0:51 ` [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Linus Torvalds
@ 2016-04-09  5:29   ` Yinghai Lu
  0 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-04-09  5:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt, Wei Yang,
	TJ, Yijing Wang, Khalid Aziz, linux-pci,
	Linux Kernel Mailing List

On Thu, Apr 7, 2016 at 5:51 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So please, try to split this up sanely, and let's merge it in sane
> pieces. I see that you have that M7101 quirk removal randomly in the
> middle of this series, for example, and it doesn't seem to be the only
> such random patch. That's entirely independent of all the other
> patches in the series (and I thought I acked it already, but
> whatever).

OK, I will split them into smaller series.

Bjorn,

Can you review patches 1-16 and put them into pci-next first?

patch 1-11: parse MEM64 for sparc and other system with OF

patch 12-16: MMIO64 allocation enhancement
        treat non-pref mmio64 if parent bridges are all pcie.
        restore old pref allocation logic if hostbridge does not support mmio64.

Bjorn Helgaas (2):
  PCI: Fix iomem_is_exclusive() checking in pci_mmap_resource()
  alpha/PCI: Only check iomem_is_exclusive() for IORESOURCE_MEM, not
IORESOURCE_IO

Yinghai Lu (58):
  PCI: Add pci_find_bus_resource()
  sparc/PCI: Use correct offset for bus address to resource
  sparc/PCI: Reserve legacy mmio after PCI mmio
  sparc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing
  sparc/PCI: Keep resource idx order with bridge register number
  PCI: Kill wrong quirk about M7101
  powerpc/PCI: Keep resource idx order with bridge register number
  powerpc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing
  OF/PCI: Add IORESOURCE_MEM_64 for 64-bit resource
  PCI: Check pref compatible bit for mem64 resource of PCIe device
  PCI: Only treat non-pref mmio64 as pref if all bridges have MEM_64
  PCI: Add has_mem64 for struct host_bridge
  PCI: Only treat non-pref mmio64 as pref if host bridge has mmio64
  PCI: Restore pref MMIO allocation logic for host bridge without mmio64

I will sort out the others later.

Thanks

Yinghai


* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-04-08  0:15 ` [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource Yinghai Lu
@ 2016-04-22 20:49   ` Bjorn Helgaas
  2016-04-28  4:55     ` Yinghai Lu
  0 siblings, 1 reply; 86+ messages in thread
From: Bjorn Helgaas @ 2016-04-22 20:49 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt,
	Linus Torvalds, Wei Yang, TJ, Yijing Wang, Khalid Aziz,
	linux-pci, linux-kernel, Michael Ellerman

[+cc Ben, Michael]

On Thu, Apr 07, 2016 at 05:15:17PM -0700, Yinghai Lu wrote:
> After we added 64bit mmio parsing, we got some "no compatible bridge window"
> warnings on another new model that supports 64bit resources.
> 
> It turns out that we can not use mem_space.start as the 64bit mem space
> offset, i.e. mem_space.start != offset.
> 
> Use child_phys_addr to calculate the exact offset and record the offset in
> pbm.
> 
> After the patch we get the correct offset.
> 
> /pci@305: PCI IO [io  0x2007e00000000-0x2007e0fffffff] offset 2007e00000000
> /pci@305: PCI MEM [mem 0x2000000100000-0x200007effffff] offset 2000000000000
> /pci@305: PCI MEM64 [mem 0x2000100000000-0x2000dffffffff] offset 2000000000000
> ...
> pci_sun4v f02ae7f8: PCI host bridge to bus 0000:00
> pci_bus 0000:00: root bus resource [io  0x2007e00000000-0x2007e0fffffff] (bus address [0x0000-0xfffffff])
> pci_bus 0000:00: root bus resource [mem 0x2000000100000-0x200007effffff] (bus address [0x00100000-0x7effffff])
> pci_bus 0000:00: root bus resource [mem 0x2000100000000-0x2000dffffffff] (bus address [0x100000000-0xdffffffff])
> ...

> @@ -733,30 +733,32 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
>  static int __pci_mmap_make_offset_bus(struct pci_dev *pdev, struct vm_area_struct *vma,
>  				      enum pci_mmap_state mmap_state)
>  {
> -	struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller;
> -	unsigned long space_size, user_offset, user_size;
> -
> -	if (mmap_state == pci_mmap_io) {
> -		space_size = resource_size(&pbm->io_space);
> -	} else {
> -		space_size = resource_size(&pbm->mem_space);
> -	}
> +	unsigned long user_offset, user_size;
> +	struct resource res, *root_bus_res;
> +	struct pci_bus_region region;
> +	struct pci_bus *bus;
>  
>  	/* Make sure the request is in range. */
>  	user_offset = vma->vm_pgoff << PAGE_SHIFT;
>  	user_size = vma->vm_end - vma->vm_start;
>  
> -	if (user_offset >= space_size ||
> -	    (user_offset + user_size) > space_size)
> +	region.start = user_offset;
> +	region.end = user_offset + user_size - 1;
> +	memset(&res, 0, sizeof(res));
> +	if (mmap_state == pci_mmap_io)
> +		res.flags = IORESOURCE_IO;
> +	else
> +		res.flags = IORESOURCE_MEM;
> +
> +	pcibios_bus_to_resource(pdev->bus, &res, &region);
> +	bus = pdev->bus;
> +	while (bus->parent)
> +		bus = bus->parent;
> +	root_bus_res = pci_find_bus_resource(bus, &res);
> +	if (!root_bus_res)
>  		return -EINVAL;
>  
> -	if (mmap_state == pci_mmap_io) {
> -		vma->vm_pgoff = (pbm->io_space.start +
> -				 user_offset) >> PAGE_SHIFT;
> -	} else {
> -		vma->vm_pgoff = (pbm->mem_space.start +
> -				 user_offset) >> PAGE_SHIFT;
> -	}
> +	vma->vm_pgoff = res.start >> PAGE_SHIFT;
>  
>  	return 0;
>  }

I'm kind of confused here.  There are two ways to mmap PCI BARs:

  /proc/bus/pci/00/02.0 (proc_bus_pci_mmap()):
    all BARs in one file; MEM/IO determined by ioctl()
    mmap offset is a CPU physical address in the PCI resource

  /sys/devices/pci0000:00/0000:00:02.0/resource0 (pci_mmap_resource()):
    one file per BAR; MEM/IO determined by BAR type
    mmap offset is between 0 and BAR size

Both proc_bus_pci_mmap() and pci_mmap_resource() validate the
requested area with pci_mmap_fits() before calling pci_mmap_page_range().
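
As a rough illustration of that validation, here is a user-space model
of the range check in pci_mmap_fits().  The function name mmap_fits(),
the enum and the 4 KiB PAGE_SHIFT are assumptions for the sketch; the
comparison itself follows the kernel code in drivers/pci/pci-sysfs.c.

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Stand-in for the kernel's enum pci_mmap_api. */
enum mmap_api { MMAP_SYSFS, MMAP_PROCFS };

/*
 * res_start/res_len describe the BAR as a CPU physical range.
 * pgoff and nr_pages describe the requested mapping, in pages.
 * Returns 1 if the request lies inside the BAR, 0 otherwise.
 */
static int mmap_fits(uint64_t res_start, uint64_t res_len,
		     uint64_t pgoff, uint64_t nr_pages, enum mmap_api api)
{
	uint64_t size, pci_start;

	if (res_len == 0)
		return 0;

	/* BAR size in pages, rounded up. */
	size = ((res_len - 1) >> PAGE_SHIFT) + 1;

	/* procfs offsets start at the resource; sysfs offsets start at 0. */
	pci_start = (api == MMAP_PROCFS) ? res_start >> PAGE_SHIFT : 0;

	return pgoff >= pci_start && pgoff < pci_start + size &&
	       pgoff + nr_pages <= pci_start + size;
}
```

For a 64 KiB BAR at 0xfe000000, a sysfs request with page offset 0 and
16 pages fits, while the same page offset through procfs does not; the
procfs request has to use page offset 0xfe000 instead.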

In the proc_bus_pci_mmap() path, the offset in vma->vm_pgoff must be
within the pdev->resource[], so the user must be supplying a CPU
physical address (not an address obtained from pci_resource_to_user()).
That vma->vm_pgoff is passed unchanged to pci_mmap_page_range().

In the pci_mmap_resource() path, vma->vm_pgoff must be between 0 and
the BAR size.  Then we add in the pci_resource_to_user() information
before passing it to pci_mmap_page_range().  The comment in
pci_mmap_resource() says pci_mmap_page_range() expects a "user
visible" address, but I don't really believe that based on how
proc_bus_pci_mmap() works.

Do both proc_bus_pci_mmap() and pci_mmap_resource() work on sparc?
It looks like they call pci_mmap_page_range() with different
assumptions, so I don't see how they can both work.

In any case, pci_mmap_page_range() on sparc converts that
vma->vm_pgoff back to a CPU physical address, so there's an awful lot
of work going on in the pci_mmap_resource() path to convert the mmap
offset to a "user" address and then convert it back again.
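
The round trip can be sketched in user space.  This is only a model of
the offset arithmetic in the sparc code the patch removes; the helper
names and the 8 KiB PAGE_SHIFT are assumptions, and space_start stands
in for pbm->mem_space.start.

```c
#include <stdint.h>

#define PAGE_SHIFT 13	/* assumption: 8 KiB pages, as on sparc64 */

/*
 * sysfs side: pci_mmap_resource() adds the "user" start of the BAR
 * (resource start minus the host bridge offset) to the BAR-relative
 * page offset supplied by the caller.
 */
static uint64_t sysfs_to_user_pgoff(uint64_t pgoff, uint64_t res_start,
				    uint64_t space_start)
{
	uint64_t user_start = res_start - space_start;

	return pgoff + (user_start >> PAGE_SHIFT);
}

/*
 * arch side: the offset is added back, so the result is again a CPU
 * physical page frame number.
 */
static uint64_t user_to_cpu_pgoff(uint64_t user_pgoff, uint64_t space_start)
{
	return ((user_pgoff << PAGE_SHIFT) + space_start) >> PAGE_SHIFT;
}
```

With the MEM window from the log above (resource start 0x2000000100000,
offset 0x2000000000000) and a hypothetical BAR at the start of that
window, page 2 of the BAR becomes user page 0x82 and then comes back as
(0x2000000100000 >> PAGE_SHIFT) + 2, which is what adding vm_pgoff to
the resource start would have produced directly.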

There's also quite a bit of validation done in the arch code (in
__pci_mmap_make_offset() and __pci_mmap_make_offset_bus()) that looks
partly redundant with pci_mmap_fits() and not necessarily
arch-specific.

As far as I can see, pci_mmap_resource() doesn't need to have any
connection at all with pci_resource_to_user().  All it needs is the
pdev->resource[] (which has the CPU physical address of the BAR) and
vma->vm_pgoff (the offset into the BAR).

I don't think pci_mmap_resource() should call pci_resource_to_user(),
and I think pci_mmap_page_range() should expect a normal VMA that
contains a valid CPU PFN in vm_pgoff (or a valid CPU I/O port number,
in the case of an I/O port mmap).

The original pci_resource_to_user() was added for powerpc by
2311b1f2bbd3 ("[PATCH] PCI: fix-pci-mmap-on-ppc-and-ppc64.patch"),
but I couldn't find the linux-pci discussion it mentions.

> @@ -977,16 +979,12 @@ void pci_resource_to_user(const struct pci_dev *pdev, int bar,
>  			  const struct resource *rp, resource_size_t *start,
>  			  resource_size_t *end)
>  {
> -	struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller;
> -	unsigned long offset;
> +	struct pci_bus_region region;
>  
> -	if (rp->flags & IORESOURCE_IO)
> -		offset = pbm->io_space.start;
> -	else
> -		offset = pbm->mem_space.start;
> +	pcibios_resource_to_bus(pdev->bus, &region, (struct resource *)rp);
>  
> -	*start = rp->start - offset;
> -	*end = rp->end - offset;
> +	*start = region.start;
> +	*end = region.end;
>  }


* Re: [PATCH v11 02/60] alpha/PCI: Only check iomem_is_exclusive() for IORESOURCE_MEM, not IORESOURCE_IO
  2016-04-08  0:15 ` [PATCH v11 02/60] alpha/PCI: Only check iomem_is_exclusive() for IORESOURCE_MEM, not IORESOURCE_IO Yinghai Lu
@ 2016-04-25 21:01   ` Bjorn Helgaas
  0 siblings, 0 replies; 86+ messages in thread
From: Bjorn Helgaas @ 2016-04-25 21:01 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt,
	Linus Torvalds, Wei Yang, TJ, Yijing Wang, Khalid Aziz,
	linux-pci, linux-kernel, Ivan Kokshaysky

On Thu, Apr 07, 2016 at 05:15:15PM -0700, Yinghai Lu wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> The alpha pci_mmap_resource() is used for both IORESOURCE_MEM and
> IORESOURCE_IO resources, but iomem_is_exclusive() is only applicable for
> IORESOURCE_MEM.
> 
> Call iomem_is_exclusive() only for IORESOURCE_MEM resources, and do it
> earlier to match the generic version of pci_mmap_resource().
> 
> Fixes: 10a0ef39fbd1 ("PCI/alpha: pci sysfs resources")
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> CC: Ivan Kokshaysky <ink@jurassic.park.msu.ru>

I applied these first two patches to pci/resource for v4.7.

> ---
>  arch/alpha/kernel/pci-sysfs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/alpha/kernel/pci-sysfs.c b/arch/alpha/kernel/pci-sysfs.c
> index 99e8d47..92c0d46 100644
> --- a/arch/alpha/kernel/pci-sysfs.c
> +++ b/arch/alpha/kernel/pci-sysfs.c
> @@ -77,10 +77,10 @@ static int pci_mmap_resource(struct kobject *kobj,
>  	if (i >= PCI_ROM_RESOURCE)
>  		return -ENODEV;
>  
> -	if (!__pci_mmap_fits(pdev, i, vma, sparse))
> +	if (res->flags & IORESOURCE_MEM && iomem_is_exclusive(res->start))
>  		return -EINVAL;
>  
> -	if (iomem_is_exclusive(res->start))
> +	if (!__pci_mmap_fits(pdev, i, vma, sparse))
>  		return -EINVAL;
>  
>  	pcibios_resource_to_bus(pdev->bus, &bar, res);
> -- 
> 1.8.4.5
> 


* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-04-22 20:49   ` Bjorn Helgaas
@ 2016-04-28  4:55     ` Yinghai Lu
  2016-04-28 13:56       ` Bjorn Helgaas
  0 siblings, 1 reply; 86+ messages in thread
From: Yinghai Lu @ 2016-04-28  4:55 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt,
	Linus Torvalds, Wei Yang, TJ, Yijing Wang, Khalid Aziz,
	linux-pci, Linux Kernel Mailing List, Michael Ellerman

On Fri, Apr 22, 2016 at 1:49 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> [+cc Ben, Michael]
> I'm kind of confused here.  There are two ways to mmap PCI BARs:
>
>   /proc/bus/pci/00/02.0 (proc_bus_pci_mmap()):
>     all BARs in one file; MEM/IO determined by ioctl()
>     mmap offset is a CPU physical address in the PCI resource
>
>   /sys/devices/pci0000:00/0000:00:02.0/resource0 (pci_mmap_resource()):
>     one file per BAR; MEM/IO determined by BAR type
>     mmap offset is between 0 and BAR size
>
> Both proc_bus_pci_mmap() and pci_mmap_resource() validate the
> requested area with pci_mmap_fits() before calling pci_mmap_page_range().
>
> In the proc_bus_pci_mmap() path, the offset in vma->vm_pgoff must be
> within the pdev->resource[], so the user must be supplying a CPU
> physical address (not an address obtained from pci_resource_to_user()).
> That vma->vm_pgoff is passed unchanged to pci_mmap_page_range().
>
> In the pci_mmap_resource() path, vma->vm_pgoff must be between 0 and
> the BAR size.  Then we add in the pci_resource_to_user() information
> before passing it to pci_mmap_page_range().  The comment in
> pci_mmap_resource() says pci_mmap_page_range() expects a "user
> visible" address, but I don't really believe that based on how
> proc_bus_pci_mmap() works.
>
> Do both proc_bus_pci_mmap() and pci_mmap_resource() work on sparc?
> It looks like they call pci_mmap_page_range() with different
> assumptions, so I don't see how they can both work.

For the sysfs path, pci_mmap_resource() does
        pci_resource_to_user(pdev, i, res, &start, &end);
        vma->vm_pgoff += start >> PAGE_SHIFT;
and then calls pci_mmap_page_range().

The fit check in pci_mmap_fits() is
        pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
                        pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
        if (start >= pci_start && start < pci_start + size &&
                        start + nr <= pci_start + size)

So procfs assumes vm_pgoff is based on the resource start?

But the current pci_mmap_page_range() wants the bus address start,
i.e. the BAR value.

And we have this comment:

        /* pci_mmap_page_range() expects the same kind of entry as coming
         * from /proc/bus/pci/ which is a "user visible" value. If this is
         * different from the resource itself, arch will do necessary fixup.
         */

So we need to fix pci_mmap_fits(). Please check whether the change below
is OK; I will submit it as a separate patch.

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index d319a9c..3768c6a 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -969,15 +969,20 @@ void pci_remove_legacy_files(struct pci_bus *b)
 int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vma,
                  enum pci_mmap_api mmap_api)
 {
-       unsigned long nr, start, size, pci_start;
+       unsigned long nr, start, size;
+       resource_size_t pci_start = 0, pci_end;

        if (pci_resource_len(pdev, resno) == 0)
                return 0;
        nr = vma_pages(vma);
        start = vma->vm_pgoff;
        size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
-       pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
-                       pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
+       if (mmap_api == PCI_MMAP_PROCFS) {
+               struct resource *res = &pdev->resource[resno];
+
+               pci_resource_to_user(pdev, resno, res, &pci_start, &pci_end);
+               pci_start >>= PAGE_SHIFT;
+       }
        if (start >= pci_start && start < pci_start + size &&
                        start + nr <= pci_start + size)
                return 1;


* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-04-28  4:55     ` Yinghai Lu
@ 2016-04-28 13:56       ` Bjorn Helgaas
  2016-04-29  7:19         ` Yinghai Lu
  0 siblings, 1 reply; 86+ messages in thread
From: Bjorn Helgaas @ 2016-04-28 13:56 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt,
	Linus Torvalds, Wei Yang, TJ, Yijing Wang, Khalid Aziz,
	linux-pci, Linux Kernel Mailing List, Michael Ellerman

On Wed, Apr 27, 2016 at 09:55:45PM -0700, Yinghai Lu wrote:
> On Fri, Apr 22, 2016 at 1:49 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > [+cc Ben, Michael]
> > I'm kind of confused here.  There are two ways to mmap PCI BARs:
> >
> >   /proc/bus/pci/00/02.0 (proc_bus_pci_mmap()):
> >     all BARs in one file; MEM/IO determined by ioctl()
> >     mmap offset is a CPU physical address in the PCI resource
> >
> >   /sys/devices/pci0000:00/0000:00:02.0/resource0 (pci_mmap_resource()):
> >     one file per BAR; MEM/IO determined by BAR type
> >     mmap offset is between 0 and BAR size
> >
> > Both proc_bus_pci_mmap() and pci_mmap_resource() validate the
> > requested area with pci_mmap_fits() before calling pci_mmap_page_range().
> >
> > In the proc_bus_pci_mmap() path, the offset in vma->vm_pgoff must be
> > within the pdev->resource[], so the user must be supplying a CPU
> > physical address (not an address obtained from pci_resource_to_user()).
> > That vma->vm_pgoff is passed unchanged to pci_mmap_page_range().
> >
> > In the pci_mmap_resource() path, vma->vm_pgoff must be between 0 and
> > the BAR size.  Then we add in the pci_resource_to_user() information
> > before passing it to pci_mmap_page_range().  The comment in
> > pci_mmap_resource() says pci_mmap_page_range() expects a "user
> > visible" address, but I don't really believe that based on how
> > proc_bus_pci_mmap() works.
> >
> > Do both proc_bus_pci_mmap() and pci_mmap_resource() work on sparc?
> > It looks like they call pci_mmap_page_range() with different
> > assumptions, so I don't see how they can both work.
> 
> for sysfs path: in pci_mmap_resource
>         pci_resource_to_user(pdev, i, res, &start, &end);
>         vma->vm_pgoff += start >> PAGE_SHIFT;
>      then call pci_mmap_page_range()
> 
> the fit checking in pci_mmap_fits(),
>         pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
>                         pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
>         if (start >= pci_start && start < pci_start + size &&
>                         start + nr <= pci_start + size)
> 
> so proc fs assume resource_start for vm_pgoff ?
> 
> but current pci_mmap_page_range want to use bus address
> start aka BAR value.
> 
> and we have
> 
>         /* pci_mmap_page_range() expects the same kind of entry as coming
>          * from /proc/bus/pci/ which is a "user visible" value. If this is
>          * different from the resource itself, arch will do necessary fixup.
>          */
> 
> so we need to fix pci_mmap_fits(), please check if it is ok, will
> submit it as separated one.

1) The sysfs path uses offsets between 0 and BAR size.  This path
should work identically on all arches.  "User" addresses are not
involved, so it doesn't make sense that this path calls
pci_resource_to_user() from pci_mmap_resource().

2) The procfs path uses offsets of resource values (CPU physical
addresses) on most architectures.  If some arches use something else,
e.g., "user" addresses, the grunge of dealing with them should be in
this path, i.e., in proc_bus_pci_mmap().  This implies that
pci_mmap_page_range() should deal with CPU physical addresses, not bus
addresses, and proc_bus_pci_mmap() should perform any necessary
translation.

3) If my theory that proc_bus_pci_mmap() and pci_mmap_resource() are
calling pci_mmap_page_range() with different assumptions is correct,
you should be able to write a test program that fails for one method,
and your patch should fix that failure.
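
A minimal sketch of such a test helper, assuming a Linux system with
the <linux/pci.h> uapi header.  try_bar_mmap() is a hypothetical name;
PCIIOC_MMAP_IS_MEM is the procfs ioctl that selects memory space before
mmap().  It only reports whether the mapping could be set up and does
not touch the device.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/pci.h>		/* PCIIOC_MMAP_IS_MEM */

/*
 * Try to map len bytes of a PCI BAR at file offset off.
 * If via_procfs is nonzero, path is a /proc/bus/pci/BB/DD.F file and
 * memory space is selected with PCIIOC_MMAP_IS_MEM first; otherwise
 * path is a sysfs resourceN file.
 * Returns 0 if mmap() succeeded, -1 on any failure.
 */
static int try_bar_mmap(const char *path, off_t off, size_t len,
			int via_procfs)
{
	void *p;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;

	if (via_procfs && ioctl(fd, PCIIOC_MMAP_IS_MEM) < 0) {
		close(fd);
		return -1;
	}

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, off);
	close(fd);	/* the mapping, if any, outlives the descriptor */
	if (p == MAP_FAILED)
		return -1;

	munmap(p, len);
	return 0;
}
```

For the sysfs method one would call it on something like
/sys/devices/pci0000:00/0000:00:02.0/resource0 with offset 0.  For the
procfs method one would call it on /proc/bus/pci/00/02.0 twice, once
with the CPU physical start of the BAR as the offset and once with the
"user" start, and compare which of the two the kernel accepts.  The
length must be a whole number of pages no bigger than the BAR, and the
program needs enough privilege to open the files read-write.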

> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index d319a9c..3768c6a 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -969,15 +969,20 @@ void pci_remove_legacy_files(struct pci_bus *b)
>  int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vma,
>                   enum pci_mmap_api mmap_api)
>  {
> -       unsigned long nr, start, size, pci_start;
> +       unsigned long nr, start, size;
> +       resource_size_t pci_start = 0, pci_end;
> 
>         if (pci_resource_len(pdev, resno) == 0)
>                 return 0;
>         nr = vma_pages(vma);
>         start = vma->vm_pgoff;
>         size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
> -       pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
> -                       pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
> +       if (mmap_api == PCI_MMAP_PROCFS) {
> +               struct resource *res = &pdev->resource[resno];
> +
> +               pci_resource_to_user(pdev, resno, res, &pci_start, &pci_end);
> +               pci_start >>= PAGE_SHIFT;
> +       }

This is the wrong place to deal with this.  IMO, any pci_resource_to_user()
fiddling should be done directly in proc_bus_pci_mmap(), and
pci_mmap_fits() should deal only with resources (CPU physical
addresses).  Then it wouldn't care where it was called from, so we
could get rid of the pci_mmap_api thing completely.

>         if (start >= pci_start && start < pci_start + size &&
>                         start + nr <= pci_start + size)
>                 return 1;


* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-04-28 13:56       ` Bjorn Helgaas
@ 2016-04-29  7:19         ` Yinghai Lu
  2016-05-03 22:52           ` Yinghai Lu
  0 siblings, 1 reply; 86+ messages in thread
From: Yinghai Lu @ 2016-04-29  7:19 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, David Miller, Benjamin Herrenschmidt,
	Linus Torvalds, Wei Yang, TJ, Yijing Wang, Khalid Aziz,
	linux-pci, Linux Kernel Mailing List, Michael Ellerman

On Thu, Apr 28, 2016 at 6:56 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Wed, Apr 27, 2016 at 09:55:45PM -0700, Yinghai Lu wrote:
>> On Fri, Apr 22, 2016 at 1:49 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> > [+cc Ben, Michael]
>> > I'm kind of confused here.  There are two ways to mmap PCI BARs:
>> >
>> >   /proc/bus/pci/00/02.0 (proc_bus_pci_mmap()):
>> >     all BARs in one file; MEM/IO determined by ioctl()
>> >     mmap offset is a CPU physical address in the PCI resource
>> >
>> >   /sys/devices/pci0000:00/0000:00:02.0/resource0 (pci_mmap_resource()):
>> >     one file per BAR; MEM/IO determined by BAR type
>> >     mmap offset is between 0 and BAR size
>> >
>> > Both proc_bus_pci_mmap() and pci_mmap_resource() validate the
>> > requested area with pci_mmap_fits() before calling pci_mmap_page_range().
>> >
>> > In the proc_bus_pci_mmap() path, the offset in vma->vm_pgoff must be
>> > within the pdev->resource[], so the user must be supplying a CPU
>> > physical address (not an address obtained from pci_resource_to_user()).
>> > That vma->vm_pgoff is passed unchanged to pci_mmap_page_range().
>> >
>> > In the pci_mmap_resource() path, vma->vm_pgoff must be between 0 and
>> > the BAR size.  Then we add in the pci_resource_to_user() information
>> > before passing it to pci_mmap_page_range().  The comment in
>> > pci_mmap_resource() says pci_mmap_page_range() expects a "user
>> > visible" address, but I don't really believe that based on how
>> > proc_bus_pci_mmap() works.
>> >
>> > Do both proc_bus_pci_mmap() and pci_mmap_resource() work on sparc?
>> > It looks like they call pci_mmap_page_range() with different
>> > assumptions, so I don't see how they can both work.
>>
>> for sysfs path: in pci_mmap_resource
>>         pci_resource_to_user(pdev, i, res, &start, &end);
>>         vma->vm_pgoff += start >> PAGE_SHIFT;
>>      then call pci_mmap_page_range()
>>
>> the fit checking in pci_mmap_fits(),
>>         pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
>>                         pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
>>         if (start >= pci_start && start < pci_start + size &&
>>                         start + nr <= pci_start + size)
>>
>> so proc fs assume resource_start for vm_pgoff ?
>>
>> but current pci_mmap_page_range want to use bus address
>> start aka BAR value.
>>
>> and we have
>>
>>         /* pci_mmap_page_range() expects the same kind of entry as coming
>>          * from /proc/bus/pci/ which is a "user visible" value. If this is
>>          * different from the resource itself, arch will do necessary fixup.
>>          */
>>
>> so we need to fix pci_mmap_fits(), please check if it is ok, will
>> submit it as separated one.
>
> 1) The sysfs path uses offsets between 0 and BAR size.  This path
> should work identically on all arches.  "User" addresses are not
> involved, so it doesn't make sense that this path calls
> pci_resource_to_user() from pci_mmap_resource().
>
> 2) The procfs path uses offsets of resource values (CPU physical
> addresses) on most architectures.  If some arches use something else,
> e.g., "user" addresses, the grunge of dealing with them should be in
> this path, i.e., in proc_bus_pci_mmap().  This implies that
> pci_mmap_page_range() should deal with CPU physical addresses, not bus
> addresses, and proc_bus_pci_mmap() should perform any necessary
> translation.
>
> 3) If my theory that proc_bus_pci_mmap() and pci_mmap_resource() are
> calling pci_mmap_page_range() with different assumptions is correct,
> you should be able to write a test program that fails for one method,
> and your patch should fix that failure.
>
...
>
> This is the wrong place to deal with this.  IMO, any pci_resource_to_user()
> fiddling should be done directly in proc_bus_pci_mmap(), and
> pci_mmap_fits() should deal only with resources (CPU physical
> addresses).  Then it wouldn't care where it was called from, so we
> could get rid of the pci_mmap_api thing completely.

OK, I got it.

We should offset vma->vm_pgoff back into the range [0, BAR size).

I will look at it on Monday.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-04-29  7:19         ` Yinghai Lu
@ 2016-05-03 22:52           ` Yinghai Lu
  2016-05-04  0:37             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 86+ messages in thread
From: Yinghai Lu @ 2016-05-03 22:52 UTC (permalink / raw)
  To: Bjorn Helgaas, Bjorn Helgaas, David Miller,
	Benjamin Herrenschmidt, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci,
	Linux Kernel Mailing List, Michael Ellerman

On Fri, Apr 29, 2016 at 12:19 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Thu, Apr 28, 2016 at 6:56 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>>
>> 1) The sysfs path uses offsets between 0 and BAR size.  This path
>> should work identically on all arches.  "User" addresses are not
>> involved, so it doesn't make sense that this path calls
>> pci_resource_to_user() from pci_mmap_resource().
>>
>> 2) The procfs path uses offsets of resource values (CPU physical
>> addresses) on most architectures.  If some arches use something else,
>> e.g., "user" addresses, the grunge of dealing with them should be in
>> this path, i.e., in proc_bus_pci_mmap().  This implies that
>> pci_mmap_page_range() should deal with CPU physical addresses, not bus
>> addresses, and proc_bus_pci_mmap() should perform any necessary
>> translation.

Please check whether the following is what you want.

BenH and DavidM,
Are you ok to let /proc/bus/pci/devices to expose resource value instead of
BAR value?
powerpc already expose MMIO as resource value, but still keep IO as BAR value?

Or can we just dump /proc/bus/pci support from now?

Thanks

Yinghai


Subject: [RFC PATCH] PCI: Expose resource value in /proc/bus/pci/devices

On some architectures, including sparc, powerpc and microblaze, the CPU
address (resource value) is not the same as the PCI bus address (the BAR
value in the PCI BAR registers).

Originally /proc/bus/pci/devices exposed the device BAR value, aka the
user value.  Later, 396a1a5832 ("[POWERPC] Fix mmap of PCI resource with
hack for X") changed it to use the resource start instead.

Also, 8c05cd08a7 ("PCI: fix offset check for sysfs mmapped files") tries
to check the exposed value against the resource start/end in the proc
mmap path:
|        start = vma->vm_pgoff;
|        size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
|        pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
|                        pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
|        if (start >= pci_start && start < pci_start + size &&
|                        start + nr <= pci_start + size)
That breaks sparc, where the exposed value is still the BAR value.

According to Bjorn, we could just pass the resource address instead of
the BAR value.

In this patch:
1. Remove the non-uniform pci_resource_to_user() implementations.
2. In the proc path, proc_bus_pci_mmap() ==> pci_mmap_fits() and
   pci_mmap_page_range() all use the resource start instead.
3. In the sysfs path, pci_mmap_resource() just offsets by the resource start.
4. Every pci_mmap_page_range() call gets a vma->vm_pgoff within the
   resource range instead of a BAR value.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/microblaze/include/asm/pci.h |    5 ---
 arch/microblaze/pci/pci-common.c  |   53 ++------------------------------------
 arch/mips/include/asm/pci.h       |   12 --------
 arch/powerpc/include/asm/pci.h    |    5 ---
 arch/powerpc/kernel/pci-common.c  |   53 ++------------------------------------
 arch/sparc/include/asm/pci_64.h   |    5 ---
 arch/sparc/kernel/pci.c           |   43 ++++++------------------------
 arch/xtensa/kernel/pci.c          |   11 ++-----
 drivers/pci/pci-sysfs.c           |   13 ++-------
 drivers/pci/proc.c                |   16 ++++-------
 include/linux/pci.h               |   15 ----------
 kernel/trace/trace_mmiotrace.c    |   21 +++++----------
 12 files changed, 37 insertions(+), 215 deletions(-)

Index: linux-2.6/arch/microblaze/include/asm/pci.h
===================================================================
--- linux-2.6.orig/arch/microblaze/include/asm/pci.h
+++ linux-2.6/arch/microblaze/include/asm/pci.h
@@ -81,11 +81,6 @@ extern pgprot_t    pci_phys_mem_access_prot
                      unsigned long size,
                      pgprot_t prot);

-#define HAVE_ARCH_PCI_RESOURCE_TO_USER
-extern void pci_resource_to_user(const struct pci_dev *dev, int bar,
-                 const struct resource *rsrc,
-                 resource_size_t *start, resource_size_t *end);
-
 extern void pcibios_setup_bus_devices(struct pci_bus *bus);
 extern void pcibios_setup_bus_self(struct pci_bus *bus);

Index: linux-2.6/arch/microblaze/pci/pci-common.c
===================================================================
--- linux-2.6.orig/arch/microblaze/pci/pci-common.c
+++ linux-2.6/arch/microblaze/pci/pci-common.c
@@ -169,23 +169,16 @@ static struct resource *__pci_mmap_make_
                            enum pci_mmap_state mmap_state)
 {
     struct pci_controller *hose = pci_bus_to_host(dev->bus);
-    unsigned long io_offset = 0;
     int i, res_bit;

     if (!hose)
         return NULL;        /* should never happen */

     /* If memory, add on the PCI bridge address offset */
-    if (mmap_state == pci_mmap_mem) {
-#if 0 /* See comment in pci_resource_to_user() for why this is disabled */
-        *offset += hose->pci_mem_offset;
-#endif
+    if (mmap_state == pci_mmap_mem)
         res_bit = IORESOURCE_MEM;
-    } else {
-        io_offset = (unsigned long)hose->io_base_virt - _IO_BASE;
-        *offset += io_offset;
+    else
         res_bit = IORESOURCE_IO;
-    }

     /*
      * Check that the offset requested corresponds to one of the
@@ -209,7 +202,8 @@ static struct resource *__pci_mmap_make_

         /* found it! construct the final physical address */
         if (mmap_state == pci_mmap_io)
-            *offset += hose->io_base_phys - io_offset;
+            *offset += hose->io_base_phys -
+                 ((unsigned long)hose->io_base_virt - _IO_BASE);
         return rp;
     }

@@ -467,45 +461,6 @@ int pci_mmap_legacy_page_range(struct pc
                    vma->vm_page_prot);
 }

-void pci_resource_to_user(const struct pci_dev *dev, int bar,
-              const struct resource *rsrc,
-              resource_size_t *start, resource_size_t *end)
-{
-    struct pci_controller *hose = pci_bus_to_host(dev->bus);
-    resource_size_t offset = 0;
-
-    if (hose == NULL)
-        return;
-
-    if (rsrc->flags & IORESOURCE_IO)
-        offset = (unsigned long)hose->io_base_virt - _IO_BASE;
-
-    /* We pass a fully fixed up address to userland for MMIO instead of
-     * a BAR value because X is lame and expects to be able to use that
-     * to pass to /dev/mem !
-     *
-     * That means that we'll have potentially 64 bits values where some
-     * userland apps only expect 32 (like X itself since it thinks only
-     * Sparc has 64 bits MMIO) but if we don't do that, we break it on
-     * 32 bits CHRPs :-(
-     *
-     * Hopefully, the sysfs insterface is immune to that gunk. Once X
-     * has been fixed (and the fix spread enough), we can re-enable the
-     * 2 lines below and pass down a BAR value to userland. In that case
-     * we'll also have to re-enable the matching code in
-     * __pci_mmap_make_offset().
-     *
-     * BenH.
-     */
-#if 0
-    else if (rsrc->flags & IORESOURCE_MEM)
-        offset = hose->pci_mem_offset;
-#endif
-
-    *start = rsrc->start - offset;
-    *end = rsrc->end - offset;
-}
-
 /**
  * pci_process_bridge_OF_ranges - Parse PCI bridge resources from device tree
  * @hose: newly allocated pci_controller to be setup
Index: linux-2.6/arch/mips/include/asm/pci.h
===================================================================
--- linux-2.6.orig/arch/mips/include/asm/pci.h
+++ linux-2.6/arch/mips/include/asm/pci.h
@@ -80,18 +80,6 @@ extern void pcibios_set_master(struct pc
 extern int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
     enum pci_mmap_state mmap_state, int write_combine);

-#define HAVE_ARCH_PCI_RESOURCE_TO_USER
-
-static inline void pci_resource_to_user(const struct pci_dev *dev, int bar,
-        const struct resource *rsrc, resource_size_t *start,
-        resource_size_t *end)
-{
-    phys_addr_t size = resource_size(rsrc);
-
-    *start = fixup_bigphys_addr(rsrc->start, size);
-    *end = rsrc->start + size;
-}
-
 /*
  * Dynamic DMA mapping stuff.
  * MIPS has everything mapped statically.
Index: linux-2.6/arch/powerpc/include/asm/pci.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/pci.h
+++ linux-2.6/arch/powerpc/include/asm/pci.h
@@ -135,11 +135,6 @@ extern pgprot_t    pci_phys_mem_access_prot
                      unsigned long size,
                      pgprot_t prot);

-#define HAVE_ARCH_PCI_RESOURCE_TO_USER
-extern void pci_resource_to_user(const struct pci_dev *dev, int bar,
-                 const struct resource *rsrc,
-                 resource_size_t *start, resource_size_t *end);
-
 extern resource_size_t pcibios_io_space_offset(struct pci_controller *hose);
 extern void pcibios_setup_bus_devices(struct pci_bus *bus);
 extern void pcibios_setup_bus_self(struct pci_bus *bus);
Index: linux-2.6/arch/powerpc/kernel/pci-common.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/pci-common.c
+++ linux-2.6/arch/powerpc/kernel/pci-common.c
@@ -308,23 +308,16 @@ static struct resource *__pci_mmap_make_
                            enum pci_mmap_state mmap_state)
 {
     struct pci_controller *hose = pci_bus_to_host(dev->bus);
-    unsigned long io_offset = 0;
     int i, res_bit;

     if (hose == NULL)
         return NULL;        /* should never happen */

     /* If memory, add on the PCI bridge address offset */
-    if (mmap_state == pci_mmap_mem) {
-#if 0 /* See comment in pci_resource_to_user() for why this is disabled */
-        *offset += hose->pci_mem_offset;
-#endif
+    if (mmap_state == pci_mmap_mem)
         res_bit = IORESOURCE_MEM;
-    } else {
-        io_offset = (unsigned long)hose->io_base_virt - _IO_BASE;
-        *offset += io_offset;
+    else
         res_bit = IORESOURCE_IO;
-    }

     /*
      * Check that the offset requested corresponds to one of the
@@ -348,7 +341,8 @@ static struct resource *__pci_mmap_make_

         /* found it! construct the final physical address */
         if (mmap_state == pci_mmap_io)
-            *offset += hose->io_base_phys - io_offset;
+            *offset += hose->io_base_phys -
+                 ((unsigned long)hose->io_base_virt - _IO_BASE);
         return rp;
     }

@@ -606,45 +600,6 @@ int pci_mmap_legacy_page_range(struct pc
                    vma->vm_page_prot);
 }

-void pci_resource_to_user(const struct pci_dev *dev, int bar,
-              const struct resource *rsrc,
-              resource_size_t *start, resource_size_t *end)
-{
-    struct pci_controller *hose = pci_bus_to_host(dev->bus);
-    resource_size_t offset = 0;
-
-    if (hose == NULL)
-        return;
-
-    if (rsrc->flags & IORESOURCE_IO)
-        offset = (unsigned long)hose->io_base_virt - _IO_BASE;
-
-    /* We pass a fully fixed up address to userland for MMIO instead of
-     * a BAR value because X is lame and expects to be able to use that
-     * to pass to /dev/mem !
-     *
-     * That means that we'll have potentially 64 bits values where some
-     * userland apps only expect 32 (like X itself since it thinks only
-     * Sparc has 64 bits MMIO) but if we don't do that, we break it on
-     * 32 bits CHRPs :-(
-     *
-     * Hopefully, the sysfs insterface is immune to that gunk. Once X
-     * has been fixed (and the fix spread enough), we can re-enable the
-     * 2 lines below and pass down a BAR value to userland. In that case
-     * we'll also have to re-enable the matching code in
-     * __pci_mmap_make_offset().
-     *
-     * BenH.
-     */
-#if 0
-    else if (rsrc->flags & IORESOURCE_MEM)
-        offset = hose->pci_mem_offset;
-#endif
-
-    *start = rsrc->start - offset;
-    *end = rsrc->end - offset;
-}
-
 /**
  * pci_process_bridge_OF_ranges - Parse PCI bridge resources from device tree
  * @hose: newly allocated pci_controller to be setup
Index: linux-2.6/arch/sparc/include/asm/pci_64.h
===================================================================
--- linux-2.6.orig/arch/sparc/include/asm/pci_64.h
+++ linux-2.6/arch/sparc/include/asm/pci_64.h
@@ -53,11 +53,6 @@ static inline int pci_get_legacy_ide_irq
 {
     return PCI_IRQ_NONE;
 }
-
-#define HAVE_ARCH_PCI_RESOURCE_TO_USER
-void pci_resource_to_user(const struct pci_dev *dev, int bar,
-              const struct resource *rsrc,
-              resource_size_t *start, resource_size_t *end);
 #endif /* __KERNEL__ */

 #endif /* __SPARC64_PCI_H */
Index: linux-2.6/arch/sparc/kernel/pci.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/pci.c
+++ linux-2.6/arch/sparc/kernel/pci.c
@@ -743,30 +743,21 @@ static int __pci_mmap_make_offset_bus(st
                       enum pci_mmap_state mmap_state)
 {
     struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller;
-    unsigned long space_size, user_offset, user_size;
+    unsigned long start, end;
+    struct resource *res;

-    if (mmap_state == pci_mmap_io) {
-        space_size = resource_size(&pbm->io_space);
-    } else {
-        space_size = resource_size(&pbm->mem_space);
-    }
+    if (mmap_state == pci_mmap_io)
+        res = &pbm->io_space;
+    else
+        res = &pbm->mem_space;

     /* Make sure the request is in range. */
-    user_offset = vma->vm_pgoff << PAGE_SHIFT;
-    user_size = vma->vm_end - vma->vm_start;
+    start = vma->vm_pgoff << PAGE_SHIFT;
+    end = vma->vm_end - vma->vm_start + start - 1;

-    if (user_offset >= space_size ||
-        (user_offset + user_size) > space_size)
+    if (!((res->start <= start) && (res->end >= end)))
         return -EINVAL;

-    if (mmap_state == pci_mmap_io) {
-        vma->vm_pgoff = (pbm->io_space.start +
-                 user_offset) >> PAGE_SHIFT;
-    } else {
-        vma->vm_pgoff = (pbm->mem_space.start +
-                 user_offset) >> PAGE_SHIFT;
-    }
-
     return 0;
 }

@@ -982,22 +973,6 @@ int pci64_dma_supported(struct pci_dev *
     return (device_mask & dma_addr_mask) == dma_addr_mask;
 }

-void pci_resource_to_user(const struct pci_dev *pdev, int bar,
-              const struct resource *rp, resource_size_t *start,
-              resource_size_t *end)
-{
-    struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller;
-    unsigned long offset;
-
-    if (rp->flags & IORESOURCE_IO)
-        offset = pbm->io_space.start;
-    else
-        offset = pbm->mem_space.start;
-
-    *start = rp->start - offset;
-    *end = rp->end - offset;
-}
-
 void pcibios_set_master(struct pci_dev *dev)
 {
     /* No special bus mastering setup handling */
Index: linux-2.6/arch/xtensa/kernel/pci.c
===================================================================
--- linux-2.6.orig/arch/xtensa/kernel/pci.c
+++ linux-2.6/arch/xtensa/kernel/pci.c
@@ -288,20 +288,16 @@ __pci_mmap_make_offset(struct pci_dev *d
 {
     struct pci_controller *pci_ctrl = (struct pci_controller*) dev->sysdata;
     unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
-    unsigned long io_offset = 0;
     int i, res_bit;

     if (pci_ctrl == 0)
         return -EINVAL;        /* should never happen */

     /* If memory, add on the PCI bridge address offset */
-    if (mmap_state == pci_mmap_mem) {
+    if (mmap_state == pci_mmap_mem)
         res_bit = IORESOURCE_MEM;
-    } else {
-        io_offset = (unsigned long)pci_ctrl->io_space.base;
-        offset += io_offset;
+    else
         res_bit = IORESOURCE_IO;
-    }

     /*
      * Check that the offset requested corresponds to one of the
@@ -325,7 +321,8 @@ __pci_mmap_make_offset(struct pci_dev *d

         /* found it! construct the final physical address */
         if (mmap_state == pci_mmap_io)
-            offset += pci_ctrl->io_space.start - io_offset;
+            offset += pci_ctrl->io_space.start -
+                    pci_ctrl->io_space.base;
         vma->vm_pgoff = offset >> PAGE_SHIFT;
         return 0;
     }
Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -143,10 +143,9 @@ static ssize_t resource_show(struct devi

     for (i = 0; i < max; i++) {
         struct resource *res =  &pci_dev->resource[i];
-        pci_resource_to_user(pci_dev, i, res, &start, &end);
         str += sprintf(str, "0x%016llx 0x%016llx 0x%016llx\n",
-                   (unsigned long long)start,
-                   (unsigned long long)end,
+                   (unsigned long long)res->start,
+                   (unsigned long long)res->end,
                    (unsigned long long)res->flags);
     }
     return (str - buf);
@@ -999,7 +998,6 @@ static int pci_mmap_resource(struct kobj
     struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
     struct resource *res = attr->private;
     enum pci_mmap_state mmap_type;
-    resource_size_t start, end;
     int i;

     for (i = 0; i < PCI_ROM_RESOURCE; i++)
@@ -1020,12 +1018,7 @@ static int pci_mmap_resource(struct kobj
         return -EINVAL;
     }

-    /* pci_mmap_page_range() expects the same kind of entry as coming
-     * from /proc/bus/pci/ which is a "user visible" value. If this is
-     * different from the resource itself, arch will do necessary fixup.
-     */
-    pci_resource_to_user(pdev, i, res, &start, &end);
-    vma->vm_pgoff += start >> PAGE_SHIFT;
+    vma->vm_pgoff += res->start >> PAGE_SHIFT;
     mmap_type = res->flags & IORESOURCE_MEM ? pci_mmap_mem : pci_mmap_io;
     return pci_mmap_page_range(pdev, vma, mmap_type, write_combine);
 }
Index: linux-2.6/drivers/pci/proc.c
===================================================================
--- linux-2.6.orig/drivers/pci/proc.c
+++ linux-2.6/drivers/pci/proc.c
@@ -343,20 +343,16 @@ static int show_device(struct seq_file *
             dev->irq);

     /* only print standard and ROM resources to preserve compatibility */
-    for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
-        resource_size_t start, end;
-        pci_resource_to_user(dev, i, &dev->resource[i], &start, &end);
+    for (i = 0; i <= PCI_ROM_RESOURCE; i++)
         seq_printf(m, "\t%16llx",
-            (unsigned long long)(start |
+            (unsigned long long)(dev->resource[i].start |
             (dev->resource[i].flags & PCI_REGION_FLAG_MASK)));
-    }
-    for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
-        resource_size_t start, end;
-        pci_resource_to_user(dev, i, &dev->resource[i], &start, &end);
+
+    for (i = 0; i <= PCI_ROM_RESOURCE; i++)
         seq_printf(m, "\t%16llx",
             dev->resource[i].start < dev->resource[i].end ?
-            (unsigned long long)(end - start) + 1 : 0);
-    }
+              (unsigned long long)resource_size(&dev->resource[i]) : 0);
+
     seq_putc(m, '\t');
     if (drv)
         seq_printf(m, "%s", drv->name);
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -1545,21 +1545,6 @@ static inline const char *pci_name(const
     return dev_name(&pdev->dev);
 }

-
-/* Some archs don't want to expose struct resource to userland as-is
- * in sysfs and /proc
- */
-#ifndef HAVE_ARCH_PCI_RESOURCE_TO_USER
-static inline void pci_resource_to_user(const struct pci_dev *dev, int bar,
-        const struct resource *rsrc, resource_size_t *start,
-        resource_size_t *end)
-{
-    *start = rsrc->start;
-    *end = rsrc->end;
-}
-#endif /* HAVE_ARCH_PCI_RESOURCE_TO_USER */
-
-
 /*
  *  The world is not perfect and supplies us with broken PCI devices.
  *  For at least a part of these bugs we need a work-around, so both
Index: linux-2.6/kernel/trace/trace_mmiotrace.c
===================================================================
--- linux-2.6.orig/kernel/trace/trace_mmiotrace.c
+++ linux-2.6/kernel/trace/trace_mmiotrace.c
@@ -62,29 +62,22 @@ static void mmio_trace_start(struct trac
 static void mmio_print_pcidev(struct trace_seq *s, const struct pci_dev *dev)
 {
     int i;
-    resource_size_t start, end;
     const struct pci_driver *drv = pci_dev_driver(dev);

     trace_seq_printf(s, "PCIDEV %02x%02x %04x%04x %x",
              dev->bus->number, dev->devfn,
              dev->vendor, dev->device, dev->irq);
-    /*
-     * XXX: is pci_resource_to_user() appropriate, since we are
-     * supposed to interpret the __ioremap() phys_addr argument based on
-     * these printed values?
-     */
-    for (i = 0; i < 7; i++) {
-        pci_resource_to_user(dev, i, &dev->resource[i], &start, &end);
+
+    for (i = 0; i < 7; i++)
         trace_seq_printf(s, " %llx",
-            (unsigned long long)(start |
+            (unsigned long long)(dev->resource[i].start |
             (dev->resource[i].flags & PCI_REGION_FLAG_MASK)));
-    }
-    for (i = 0; i < 7; i++) {
-        pci_resource_to_user(dev, i, &dev->resource[i], &start, &end);
+
+    for (i = 0; i < 7; i++)
         trace_seq_printf(s, " %llx",
             dev->resource[i].start < dev->resource[i].end ?
-            (unsigned long long)(end - start) + 1 : 0);
-    }
+              (unsigned long long)resource_size(&dev->resource[i]) : 0);
+
     if (drv)
         trace_seq_printf(s, " %s\n", drv->name);
     else

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-05-03 22:52           ` Yinghai Lu
@ 2016-05-04  0:37             ` Benjamin Herrenschmidt
  2016-05-04  1:25               ` Bjorn Helgaas
  2016-05-04  4:17               ` David Miller
  0 siblings, 2 replies; 86+ messages in thread
From: Benjamin Herrenschmidt @ 2016-05-04  0:37 UTC (permalink / raw)
  To: Yinghai Lu, Bjorn Helgaas, Bjorn Helgaas, David Miller, Linus Torvalds
  Cc: Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci,
	Linux Kernel Mailing List, Michael Ellerman

On Tue, 2016-05-03 at 15:52 -0700, Yinghai Lu wrote:
> BenH and DavidM,
> Are you ok to let /proc/bus/pci/devices to expose resource value
> instead of
> BAR value?
> powerpc already expose MMIO as resource value, but still keep IO as
> BAR value?
> 
> Or can we just dump /proc/bus/pci support from now?

The problem tends to be old Xserver expectations...

That stuff has been a can of worms over the years and we did things in
the kernel to work around X limitations. I'm not that keen on touching
/proc at all in that regard. Leave it there do what it does today, it's
a user visible ABI, don't change it.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-05-04  0:37             ` Benjamin Herrenschmidt
@ 2016-05-04  1:25               ` Bjorn Helgaas
  2016-05-04  5:08                 ` Yinghai Lu
  2016-05-04  4:17               ` David Miller
  1 sibling, 1 reply; 86+ messages in thread
From: Bjorn Helgaas @ 2016-05-04  1:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Yinghai Lu, Bjorn Helgaas, David Miller, Linus Torvalds,
	Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci,
	Linux Kernel Mailing List, Michael Ellerman

On Wed, May 04, 2016 at 10:37:40AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2016-05-03 at 15:52 -0700, Yinghai Lu wrote:
> > BenH and DavidM,
> > Are you ok to let /proc/bus/pci/devices to expose resource value
> > instead of
> > BAR value?
> > powerpc already expose MMIO as resource value, but still keep IO as
> > BAR value?
> > 
> > Or can we just dump /proc/bus/pci support from now?
> 
> The problem tends to be old Xserver expectations...
> 
> That stuff has been a can of worms over the years and we did things in
> the kernel to work around X limitations. I'm not that keen on touching
> /proc at all in that regard. Leave it there do what it does today, it's
> a user visible ABI, don't change it.

I did not propose changing any user-visible ABI.  To recap what I did
propose:

  - The sysfs path uses offsets between 0 and BAR size on all arches.
    It uses pci_resource_to_user() today, but I think it should not.

  - The procfs path uses offsets of resource values (CPU physical
    addresses) on most architectures, but uses something else, e.g.,
    BAR values, on others.  pci_resource_to_user() does this
    translation.  The procfs path does not use pci_resource_to_user()
    today, but I think it should.

  - This implies that pci_mmap_page_range() should deal with resource
    values (CPU physical addresses), and proc_bus_pci_mmap() should do
    any necessary arch-specific translation from BAR values to
    resource values.
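
The split recapped above can be modelled in user space.  This is only a
sketch under stated assumptions, not kernel code: res_sketch,
mmap_fits_sketch(), sysfs_pgoff_to_res(), procfs_pgoff_to_res() and the
user_offset parameter are hypothetical names, and a 4 KiB page size is
assumed.  The fit check sees only resource values (CPU physical
addresses), and each caller converts its own kind of offset first.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct resource: inclusive CPU physical range. */
struct res_sketch {
	uint64_t start;
	uint64_t end;
};

/* Pages covered by the resource, rounded up: ((len - 1) >> PAGE_SHIFT) + 1. */
static uint64_t res_pages(const struct res_sketch *r)
{
	uint64_t len = r->end - r->start + 1;

	return ((len - 1) >> PAGE_SHIFT_SKETCH) + 1;
}

/*
 * Fit check on resource values only.  pgoff is a CPU physical address
 * shifted right by the page shift, regardless of which API the request
 * came from.
 */
static bool mmap_fits_sketch(const struct res_sketch *r, uint64_t pgoff,
			     uint64_t nr_pages)
{
	uint64_t first, size;

	if (r->end < r->start)		/* empty resource never fits */
		return false;

	first = r->start >> PAGE_SHIFT_SKETCH;
	size = res_pages(r);

	return pgoff >= first && pgoff < first + size &&
	       pgoff + nr_pages <= first + size;
}

/* sysfs caller: the offset is in [0, BAR size), so add the resource start. */
static uint64_t sysfs_pgoff_to_res(const struct res_sketch *r, uint64_t pgoff)
{
	return pgoff + (r->start >> PAGE_SHIFT_SKETCH);
}

/*
 * procfs caller: the offset is a "user" value.  user_offset is the
 * hypothetical arch-specific difference between resource value and user
 * value (0 on arches where they are the same).
 */
static uint64_t procfs_pgoff_to_res(uint64_t pgoff, uint64_t user_offset)
{
	return pgoff + (user_offset >> PAGE_SHIFT_SKETCH);
}
```

With this shape the fit check no longer needs to know which interface
called it, which is the point of dropping the pci_mmap_api argument.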

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-05-04  0:37             ` Benjamin Herrenschmidt
  2016-05-04  1:25               ` Bjorn Helgaas
@ 2016-05-04  4:17               ` David Miller
  1 sibling, 0 replies; 86+ messages in thread
From: David Miller @ 2016-05-04  4:17 UTC (permalink / raw)
  To: benh
  Cc: yinghai, helgaas, bhelgaas, torvalds, weiyang, linux, wangyijing,
	khalid.aziz, linux-pci, linux-kernel, mpe

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Wed, 04 May 2016 10:37:40 +1000

> On Tue, 2016-05-03 at 15:52 -0700, Yinghai Lu wrote:
>> BenH and DavidM,
>> Are you ok to let /proc/bus/pci/devices to expose resource value
>> instead of
>> BAR value?
>> powerpc already expose MMIO as resource value, but still keep IO as
>> BAR value?
>> 
>> Or can we just dump /proc/bus/pci support from now?
> 
> The problem tends to be old Xserver expectations...
> 
> That stuff has been a can of worms over the years and we did things in
> the kernel to work around X limitations. I'm not that keen on touching
> /proc at all in that regard. Leave it there do what it does today, it's
> a user visible ABI, don't change it.

I agree with Ben, whatever procfs is exporting in the past is what
the Xserver and other things expect on sparc64 and therefore is a
user facing ABI that can't be changed.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-05-04  1:25               ` Bjorn Helgaas
@ 2016-05-04  5:08                 ` Yinghai Lu
  2016-05-04  5:52                   ` Yinghai Lu
  0 siblings, 1 reply; 86+ messages in thread
From: Yinghai Lu @ 2016-05-04  5:08 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Bjorn Helgaas, David Miller,
	Linus Torvalds, Wei Yang, TJ, Yijing Wang, Khalid Aziz,
	linux-pci, Linux Kernel Mailing List, Michael Ellerman

On Tue, May 3, 2016 at 6:25 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Wed, May 04, 2016 at 10:37:40AM +1000, Benjamin Herrenschmidt wrote:
>>
>> The problem tends to be old Xserver expectations...
>>
>> That stuff has been a can of worms over the years and we did things in
>> the kernel to work around X limitations. I'm not that keen on touching
>> /proc at all in that regard. Leave it there do what it does today, it's
>> a user visible ABI, don't change it.
>
> I did not propose changing any user-visible ABI.  To recap what I did
> propose:

I want to avoid introducing a strange pci_user_to_resource().

>
>   - The sysfs path uses offsets between 0 and BAR size on all arches.
>     It uses pci_resource_to_user() today, but I think it should not.
>
>   - The procfs path uses offsets of resource values (CPU physical
>     addresses) on most architectures, but uses something else, e.g.,
>     BAR values, on others.  pci_resource_to_user() does this
>     translation.  The procfs path does not use pci_resource_to_user()
>     today, but I think it should.

The current powerpc pci_resource_to_user() is strange: it returns the
resource start for I/O memory, but the BAR (?) start for I/O ports.

The sparc pci_resource_to_user() does return the BAR value for I/O memory.

>
>   - This implies that pci_mmap_page_range() should deal with resource
>     values (CPU physical addresses), and proc_bus_pci_mmap() should do
>     any necessary arch-specific translation from BAR values to
>     resource values.

So we would need a separate pci_user_to_resource() that cannot use
pcibios_bus_to_resource() directly, and that would be another mess.

Can you reconsider keeping pci_mmap_page_range() on BAR values and only
touching pci_mmap_fits()?

That is what I suggested before, and it does not conflict with the patchset.

Subject: [PATCH] PCI: Fix pci_mmap_fits() with proc interface

The value passed in is a user address, so compare it with the
user address converted from the device resource.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/pci-sysfs.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -969,15 +969,19 @@ void pci_remove_legacy_files(struct pci_
 int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vma,
           enum pci_mmap_api mmap_api)
 {
-    unsigned long nr, start, size, pci_start;
+    unsigned long nr, start, size;
+    resource_size_t pci_start = 0, pci_end;

     if (pci_resource_len(pdev, resno) == 0)
         return 0;
     nr = vma_pages(vma);
     start = vma->vm_pgoff;
     size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
-    pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
-            pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
+    if (mmap_api == PCI_MMAP_PROCFS) {
+        pci_resource_to_user(pdev, resno, &pdev->resource[resno],
+                     &pci_start, &pci_end);
+        pci_start >>= PAGE_SHIFT;
+    }
     if (start >= pci_start && start < pci_start + size &&
             start + nr <= pci_start + size)
         return 1;
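
For reference, the page arithmetic in the pci_mmap_fits() change above can be
tried in user space. The following is a minimal sketch, not kernel code:
PAGE_SHIFT is assumed to be 12, and mmap_fits(), enum mmap_api, and the
user_start parameter are hypothetical stand-ins for pci_mmap_fits(), enum
pci_mmap_api, and the start value that pci_resource_to_user() would return.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for enum pci_mmap_api. */
enum mmap_api { MMAP_SYSFS, MMAP_PROCFS };

/*
 * Return 1 if the page range [pgoff, pgoff + nr_pages) lies inside the BAR.
 *
 * user_start is the byte address user space sees for the BAR. It is only
 * consulted for procfs; sysfs offsets always start at 0.
 */
static int mmap_fits(uint64_t user_start, uint64_t bar_len,
		     uint64_t pgoff, uint64_t nr_pages, enum mmap_api api)
{
	uint64_t size, first;

	if (bar_len == 0)
		return 0;

	/* BAR length in pages, rounded up. */
	size = ((bar_len - 1) >> PAGE_SHIFT) + 1;

	/* First valid page offset as seen by the caller. */
	first = (api == MMAP_PROCFS) ? user_start >> PAGE_SHIFT : 0;

	return pgoff >= first && pgoff < first + size &&
	       pgoff + nr_pages <= first + size;
}
```

With a 16 KiB BAR exposed to user space at 0x100000, a procfs request at page
offset 0x100 for four pages fits. The same request is rejected if the check is
given the CPU-side resource start instead of the user-visible value, which is
the failure mode described for sparc.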


* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-05-04  5:08                 ` Yinghai Lu
@ 2016-05-04  5:52                   ` Yinghai Lu
  2016-05-04 15:17                     ` Bjorn Helgaas
  0 siblings, 1 reply; 86+ messages in thread
From: Yinghai Lu @ 2016-05-04  5:52 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Bjorn Helgaas, David Miller,
	Linus Torvalds, Wei Yang, TJ, Yijing Wang, Khalid Aziz,
	linux-pci, Linux Kernel Mailing List, Michael Ellerman

On Tue, May 3, 2016 at 10:08 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, May 3, 2016 at 6:25 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> I did not propose changing any user-visible ABI.  To recap what I did
>> propose:
>
> I want to avoid introduce one strange pci_user_to_resource.
>
>>
>>   - The sysfs path uses offsets between 0 and BAR size on all arches.
>>     It uses pci_resource_to_user() today, but I think it should not.
>>
>>   - The procfs path uses offsets of resource values (CPU physical
>>     addresses) on most architectures, but uses something else, e.g.,
>>     BAR values, on others.  pci_resource_to_user() does this
>>     translation.  The procfs path does not use pci_resource_to_user()
>>     today, but I think it should.
>
> current powerpc pci_resource_to_user is strange:
> it will return resource start for io mem.
> but will return BAR (?) start for io port.
>
> sparc pci_resource_to_user does return BAR value for iomem.
>
>>
>>   - This implies that pci_mmap_page_range() should deal with resource
>>     values (CPU physical addresses), and proc_bus_pci_mmap() should do
>>     any necessary arch-specific translation from BAR values to
>>     resource values.
>
> so will need one different version pci_user_to_resource.
> and can not use pcibios_bus_to_resource directly, and will be another mess.

After trying it out, it looks like we can avoid pci_user_to_resource().

Please check it:


Subject: [RFC PATCH] PCI: Let pci_mmap_page_range() take resource addr

On some arches the CPU address (resource value) is not the same as the PCI
bus address (the value in the PCI BAR registers); these include sparc,
powerpc, and microblaze.

8c05cd08a7 ("PCI: fix offset check for sysfs mmapped files") tries to
check the exposed value against the resource start/end in the proc mmap path:
|        start = vma->vm_pgoff;
|        size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
|        pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
|                        pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
|        if (start >= pci_start && start < pci_start + size &&
|                        start + nr <= pci_start + size)
That breaks sparc, where the exposed value is still the BAR value.

According to Bjorn, we could just pass the resource address instead of the
BAR value.

In this patch:
1. In the proc path, proc_bus_pci_mmap() converts the user value back to a
   resource address before calling pci_mmap_page_range().
2. In the sysfs path, pci_mmap_resource() just offsets by the resource start.
3. Every pci_mmap_page_range() then sees vma->vm_pgoff within the resource
   range instead of a BAR value.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/microblaze/pci/pci-common.c |   14 ++++----------
 arch/powerpc/kernel/pci-common.c |   14 ++++----------
 arch/sparc/kernel/pci.c          |   27 +++++++++------------------
 arch/xtensa/kernel/pci.c         |   11 ++++-------
 drivers/pci/pci-sysfs.c          |    8 +-------
 drivers/pci/proc.c               |   13 +++++++++++++
 6 files changed, 35 insertions(+), 52 deletions(-)

Index: linux-2.6/arch/microblaze/pci/pci-common.c
===================================================================
--- linux-2.6.orig/arch/microblaze/pci/pci-common.c
+++ linux-2.6/arch/microblaze/pci/pci-common.c
@@ -169,23 +169,16 @@ static struct resource *__pci_mmap_make_
                            enum pci_mmap_state mmap_state)
 {
     struct pci_controller *hose = pci_bus_to_host(dev->bus);
-    unsigned long io_offset = 0;
     int i, res_bit;

     if (!hose)
         return NULL;        /* should never happen */

     /* If memory, add on the PCI bridge address offset */
-    if (mmap_state == pci_mmap_mem) {
-#if 0 /* See comment in pci_resource_to_user() for why this is disabled */
-        *offset += hose->pci_mem_offset;
-#endif
+    if (mmap_state == pci_mmap_mem)
         res_bit = IORESOURCE_MEM;
-    } else {
-        io_offset = (unsigned long)hose->io_base_virt - _IO_BASE;
-        *offset += io_offset;
+    else
         res_bit = IORESOURCE_IO;
-    }

     /*
      * Check that the offset requested corresponds to one of the
@@ -209,7 +202,8 @@ static struct resource *__pci_mmap_make_

         /* found it! construct the final physical address */
         if (mmap_state == pci_mmap_io)
-            *offset += hose->io_base_phys - io_offset;
+            *offset += hose->io_base_phys -
+                 ((unsigned long)hose->io_base_virt - _IO_BASE);
         return rp;
     }

Index: linux-2.6/arch/powerpc/kernel/pci-common.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/pci-common.c
+++ linux-2.6/arch/powerpc/kernel/pci-common.c
@@ -308,23 +308,16 @@ static struct resource *__pci_mmap_make_
                            enum pci_mmap_state mmap_state)
 {
     struct pci_controller *hose = pci_bus_to_host(dev->bus);
-    unsigned long io_offset = 0;
     int i, res_bit;

     if (hose == NULL)
         return NULL;        /* should never happen */

     /* If memory, add on the PCI bridge address offset */
-    if (mmap_state == pci_mmap_mem) {
-#if 0 /* See comment in pci_resource_to_user() for why this is disabled */
-        *offset += hose->pci_mem_offset;
-#endif
+    if (mmap_state == pci_mmap_mem)
         res_bit = IORESOURCE_MEM;
-    } else {
-        io_offset = (unsigned long)hose->io_base_virt - _IO_BASE;
-        *offset += io_offset;
+    else
         res_bit = IORESOURCE_IO;
-    }

     /*
      * Check that the offset requested corresponds to one of the
@@ -348,7 +341,8 @@ static struct resource *__pci_mmap_make_

         /* found it! construct the final physical address */
         if (mmap_state == pci_mmap_io)
-            *offset += hose->io_base_phys - io_offset;
+            *offset += hose->io_base_phys -
+                 ((unsigned long)hose->io_base_virt - _IO_BASE);
         return rp;
     }

Index: linux-2.6/arch/sparc/kernel/pci.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/pci.c
+++ linux-2.6/arch/sparc/kernel/pci.c
@@ -743,30 +743,21 @@ static int __pci_mmap_make_offset_bus(st
                       enum pci_mmap_state mmap_state)
 {
     struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller;
-    unsigned long space_size, user_offset, user_size;
+    unsigned long start, end;
+    struct resource *res;

-    if (mmap_state == pci_mmap_io) {
-        space_size = resource_size(&pbm->io_space);
-    } else {
-        space_size = resource_size(&pbm->mem_space);
-    }
+    if (mmap_state == pci_mmap_io)
+        res = &pbm->io_space;
+    else
+        res = &pbm->mem_space;

     /* Make sure the request is in range. */
-    user_offset = vma->vm_pgoff << PAGE_SHIFT;
-    user_size = vma->vm_end - vma->vm_start;
+    start = vma->vm_pgoff << PAGE_SHIFT;
+    end = vma->vm_end - vma->vm_start + start - 1;

-    if (user_offset >= space_size ||
-        (user_offset + user_size) > space_size)
+    if (!((res->start <= start) && (res->end >= end)))
         return -EINVAL;

-    if (mmap_state == pci_mmap_io) {
-        vma->vm_pgoff = (pbm->io_space.start +
-                 user_offset) >> PAGE_SHIFT;
-    } else {
-        vma->vm_pgoff = (pbm->mem_space.start +
-                 user_offset) >> PAGE_SHIFT;
-    }
-
     return 0;
 }

Index: linux-2.6/arch/xtensa/kernel/pci.c
===================================================================
--- linux-2.6.orig/arch/xtensa/kernel/pci.c
+++ linux-2.6/arch/xtensa/kernel/pci.c
@@ -288,20 +288,16 @@ __pci_mmap_make_offset(struct pci_dev *d
 {
     struct pci_controller *pci_ctrl = (struct pci_controller*) dev->sysdata;
     unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
-    unsigned long io_offset = 0;
     int i, res_bit;

     if (pci_ctrl == 0)
         return -EINVAL;        /* should never happen */

     /* If memory, add on the PCI bridge address offset */
-    if (mmap_state == pci_mmap_mem) {
+    if (mmap_state == pci_mmap_mem)
         res_bit = IORESOURCE_MEM;
-    } else {
-        io_offset = (unsigned long)pci_ctrl->io_space.base;
-        offset += io_offset;
+    else
         res_bit = IORESOURCE_IO;
-    }

     /*
      * Check that the offset requested corresponds to one of the
@@ -325,7 +321,8 @@ __pci_mmap_make_offset(struct pci_dev *d

         /* found it! construct the final physical address */
         if (mmap_state == pci_mmap_io)
-            offset += pci_ctrl->io_space.start - io_offset;
+            offset += pci_ctrl->io_space.start -
+                    pci_ctrl->io_space.base;
         vma->vm_pgoff = offset >> PAGE_SHIFT;
         return 0;
     }
Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -999,7 +999,6 @@ static int pci_mmap_resource(struct kobj
     struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
     struct resource *res = attr->private;
     enum pci_mmap_state mmap_type;
-    resource_size_t start, end;
     int i;

     for (i = 0; i < PCI_ROM_RESOURCE; i++)
@@ -1020,12 +1019,7 @@ static int pci_mmap_resource(struct kobj
         return -EINVAL;
     }

-    /* pci_mmap_page_range() expects the same kind of entry as coming
-     * from /proc/bus/pci/ which is a "user visible" value. If this is
-     * different from the resource itself, arch will do necessary fixup.
-     */
-    pci_resource_to_user(pdev, i, res, &start, &end);
-    vma->vm_pgoff += start >> PAGE_SHIFT;
+    vma->vm_pgoff += res->start >> PAGE_SHIFT;
     mmap_type = res->flags & IORESOURCE_MEM ? pci_mmap_mem : pci_mmap_io;
     return pci_mmap_page_range(pdev, vma, mmap_type, write_combine);
 }
Index: linux-2.6/drivers/pci/proc.c
===================================================================
--- linux-2.6.orig/drivers/pci/proc.c
+++ linux-2.6/drivers/pci/proc.c
@@ -231,13 +231,26 @@ static int proc_bus_pci_mmap(struct file
 {
     struct pci_dev *dev = PDE_DATA(file_inode(file));
     struct pci_filp_private *fpriv = file->private_data;
+    resource_size_t start, end, offset;
+    struct resource *res;
     int i, ret;

     if (!capable(CAP_SYS_RAWIO))
         return -EPERM;

+    offset = vma->vm_pgoff << PAGE_SHIFT;
+
     /* Make sure the caller is mapping a real resource for this device */
     for (i = 0; i < PCI_ROM_RESOURCE; i++) {
+        res = &dev->resource[i];
+        if (!res->flags)
+            continue;
+
+        pci_resource_to_user(dev, i, res, &start, &end);
+        if (!(offset >= start && offset <= end))
+            continue;
+
+        vma->vm_pgoff = (res->start + (offset - start)) >> PAGE_SHIFT;
         if (pci_mmap_fits(dev, i, vma,  PCI_MMAP_PROCFS))
             break;
     }
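
The lookup that the proc.c hunk above adds can be illustrated outside the
kernel. This is a hedged sketch under stated assumptions: struct bar,
its in_use flag (standing in for a non-zero res->flags), and
user_off_to_res() are hypothetical names, and user_start plays the role of
the start value pci_resource_to_user() would report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one device BAR. */
struct bar {
	uint64_t res_start;	/* CPU physical (resource) start */
	uint64_t res_end;	/* CPU physical end, inclusive */
	uint64_t user_start;	/* start as exposed to user space */
	int in_use;		/* stands in for res->flags != 0 */
};

/*
 * Find the BAR whose user-visible range contains user_off and translate
 * the offset into a resource address. Returns the BAR index, or -1 if no
 * BAR matches (in which case *res_addr is left untouched).
 */
static int user_off_to_res(const struct bar *bars, size_t n,
			   uint64_t user_off, uint64_t *res_addr)
{
	assert(bars != NULL && res_addr != NULL);

	for (size_t i = 0; i < n; i++) {
		uint64_t len, user_end;

		if (!bars[i].in_use)
			continue;

		len = bars[i].res_end - bars[i].res_start + 1;
		user_end = bars[i].user_start + len - 1;
		if (user_off < bars[i].user_start || user_off > user_end)
			continue;

		/* Same distance from the start, but in resource space. */
		*res_addr = bars[i].res_start + (user_off - bars[i].user_start);
		return (int)i;
	}
	return -1;
}
```

The translated address is what would then be shifted by PAGE_SHIFT and stored
in vma->vm_pgoff before the fits check, as the hunk does.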


* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-05-04  5:52                   ` Yinghai Lu
@ 2016-05-04 15:17                     ` Bjorn Helgaas
  2016-05-04 18:46                       ` Yinghai Lu
  0 siblings, 1 reply; 86+ messages in thread
From: Bjorn Helgaas @ 2016-05-04 15:17 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Benjamin Herrenschmidt, Bjorn Helgaas, David Miller,
	Linus Torvalds, Wei Yang, TJ, Yijing Wang, Khalid Aziz,
	linux-pci, Linux Kernel Mailing List, Michael Ellerman

On Tue, May 03, 2016 at 10:52:33PM -0700, Yinghai Lu wrote:
> On Tue, May 3, 2016 at 10:08 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Tue, May 3, 2016 at 6:25 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> >> I did not propose changing any user-visible ABI.  To recap what I did
> >> propose:
> >
> > I want to avoid introduce one strange pci_user_to_resource.
> >
> >>
> >>   - The sysfs path uses offsets between 0 and BAR size on all arches.
> >>     It uses pci_resource_to_user() today, but I think it should not.
> >>
> >>   - The procfs path uses offsets of resource values (CPU physical
> >>     addresses) on most architectures, but uses something else, e.g.,
> >>     BAR values, on others.  pci_resource_to_user() does this
> >>     translation.  The procfs path does not use pci_resource_to_user()
> >>     today, but I think it should.
> >
> > current powerpc pci_resource_to_user is strange:
> > it will return resource start for io mem.
> > but will return BAR (?) start for io port.
> >
> > sparc pci_resource_to_user does return BAR value for iomem.

That means it should be implemented using pcibios_resource_to_bus().

> >>   - This implies that pci_mmap_page_range() should deal with resource
> >>     values (CPU physical addresses), and proc_bus_pci_mmap() should do
> >>     any necessary arch-specific translation from BAR values to
> >>     resource values.
> >
> > so will need one different version pci_user_to_resource.
> > and can not use pcibios_bus_to_resource directly, and will be another mess.

What mess do you mean?  The fact that you could only use
pcibios_bus_to_resource() for MEM, and something else for IO?  Even
if we could only use pcibios_bus_to_resource() for MEM, that sounds
like an improvement, not a mess.

> looks like we can avoid that pci_user_to_resource() via trying out.
> Please check it:
> 
> 
> Subject: [RFC PATCH] PCI: Let pci_mmap_page_range() take resource addr
> 
> Some arch where cpu address (resource value) is not same as pci bus address
> (BAR value in pci BAR registers), include sparc, powerpc, microblaze.

This comment is irrelevant.  *Lots* of arches have CPU addresses
different from PCI bus addresses, including alpha, arm, ia64, mips,
mn10300, parisc, tile, xtensa, and x86.

> In 8c05cd08a7 ("PCI: fix offset check for sysfs mmapped files"), try
> to check exposed value with resource start/end in proc mmap path.
> |        start = vma->vm_pgoff;
> |        size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
> |        pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
> |                        pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
> |        if (start >= pci_start && start < pci_start + size &&
> |                        start + nr <= pci_start + size)
> That would break sparc that exposed value is still BAR value.
> 
> According to Bjorn, we could just pass resource addr instead of BAR.
> 
> In the patch:
> 1. in proc path: proc_bus_pci_mmap, try convert back to resource
>    before calling pci_mmap_page_range
> 2. in sysfs path: pci_mmap_resource will just offset with resource start.
> 3. all pci_mmap_page_range will all have vma->vm_pgoff with in resource range
>    instead of BAR value.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> 
> ---
>  arch/microblaze/pci/pci-common.c |   14 ++++----------
>  arch/powerpc/kernel/pci-common.c |   14 ++++----------
>  arch/sparc/kernel/pci.c          |   27 +++++++++------------------
>  arch/xtensa/kernel/pci.c         |   11 ++++-------
>  drivers/pci/pci-sysfs.c          |    8 +-------
>  drivers/pci/proc.c               |   13 +++++++++++++
>  6 files changed, 35 insertions(+), 52 deletions(-)
> 
> Index: linux-2.6/arch/microblaze/pci/pci-common.c
> ===================================================================
> --- linux-2.6.orig/arch/microblaze/pci/pci-common.c
> +++ linux-2.6/arch/microblaze/pci/pci-common.c
> @@ -169,23 +169,16 @@ static struct resource *__pci_mmap_make_
>                             enum pci_mmap_state mmap_state)
>  {
>      struct pci_controller *hose = pci_bus_to_host(dev->bus);
> -    unsigned long io_offset = 0;
>      int i, res_bit;
> 
>      if (!hose)
>          return NULL;        /* should never happen */
> 
>      /* If memory, add on the PCI bridge address offset */
> -    if (mmap_state == pci_mmap_mem) {
> -#if 0 /* See comment in pci_resource_to_user() for why this is disabled */
> -        *offset += hose->pci_mem_offset;
> -#endif
> +    if (mmap_state == pci_mmap_mem)
>          res_bit = IORESOURCE_MEM;
> -    } else {
> -        io_offset = (unsigned long)hose->io_base_virt - _IO_BASE;
> -        *offset += io_offset;
> +    else
>          res_bit = IORESOURCE_IO;
> -    }
> 
>      /*
>       * Check that the offset requested corresponds to one of the
> @@ -209,7 +202,8 @@ static struct resource *__pci_mmap_make_
> 
>          /* found it! construct the final physical address */
>          if (mmap_state == pci_mmap_io)
> -            *offset += hose->io_base_phys - io_offset;
> +            *offset += hose->io_base_phys -
> +                 ((unsigned long)hose->io_base_virt - _IO_BASE);
>          return rp;
>      }

Most of __pci_mmap_make_offset() is pointless.

We might need something there for I/O regions, but for MEM, the
vma->vm_pgoff coming into pci_mmap_page_range() should be exactly what
we need and we shouldn't touch it.  I think __pci_mmap_make_offset()
actually does leave it alone for MEM, but you have to read the code
carefully to figure that out.

All the validation stuff ("Check that the offset requested corresponds
to one of the resources...") should be removed or moved to
pci_mmap_fits().

> Index: linux-2.6/arch/powerpc/kernel/pci-common.c
> ===================================================================
> --- linux-2.6.orig/arch/powerpc/kernel/pci-common.c
> +++ linux-2.6/arch/powerpc/kernel/pci-common.c
> @@ -308,23 +308,16 @@ static struct resource *__pci_mmap_make_
>                             enum pci_mmap_state mmap_state)
>  {
>      struct pci_controller *hose = pci_bus_to_host(dev->bus);
> -    unsigned long io_offset = 0;
>      int i, res_bit;
> 
>      if (hose == NULL)
>          return NULL;        /* should never happen */
> 
>      /* If memory, add on the PCI bridge address offset */
> -    if (mmap_state == pci_mmap_mem) {
> -#if 0 /* See comment in pci_resource_to_user() for why this is disabled */
> -        *offset += hose->pci_mem_offset;
> -#endif
> +    if (mmap_state == pci_mmap_mem)
>          res_bit = IORESOURCE_MEM;
> -    } else {
> -        io_offset = (unsigned long)hose->io_base_virt - _IO_BASE;
> -        *offset += io_offset;
> +    else
>          res_bit = IORESOURCE_IO;
> -    }
> 
>      /*
>       * Check that the offset requested corresponds to one of the
> @@ -348,7 +341,8 @@ static struct resource *__pci_mmap_make_
> 
>          /* found it! construct the final physical address */
>          if (mmap_state == pci_mmap_io)
> -            *offset += hose->io_base_phys - io_offset;
> +            *offset += hose->io_base_phys -
> +                 ((unsigned long)hose->io_base_virt - _IO_BASE);
>          return rp;
>      }
> 
> Index: linux-2.6/arch/sparc/kernel/pci.c
> ===================================================================
> --- linux-2.6.orig/arch/sparc/kernel/pci.c
> +++ linux-2.6/arch/sparc/kernel/pci.c
> @@ -743,30 +743,21 @@ static int __pci_mmap_make_offset_bus(st
>                        enum pci_mmap_state mmap_state)
>  {
>      struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller;
> -    unsigned long space_size, user_offset, user_size;
> +    unsigned long start, end;
> +    struct resource *res;
> 
> -    if (mmap_state == pci_mmap_io) {
> -        space_size = resource_size(&pbm->io_space);
> -    } else {
> -        space_size = resource_size(&pbm->mem_space);
> -    }
> +    if (mmap_state == pci_mmap_io)
> +        res = &pbm->io_space;
> +    else
> +        res = &pbm->mem_space;
> 
>      /* Make sure the request is in range. */
> -    user_offset = vma->vm_pgoff << PAGE_SHIFT;
> -    user_size = vma->vm_end - vma->vm_start;
> +    start = vma->vm_pgoff << PAGE_SHIFT;
> +    end = vma->vm_end - vma->vm_start + start - 1;
> 
> -    if (user_offset >= space_size ||
> -        (user_offset + user_size) > space_size)
> +    if (!((res->start <= start) && (res->end >= end)))
>          return -EINVAL;
> 
> -    if (mmap_state == pci_mmap_io) {
> -        vma->vm_pgoff = (pbm->io_space.start +
> -                 user_offset) >> PAGE_SHIFT;
> -    } else {
> -        vma->vm_pgoff = (pbm->mem_space.start +
> -                 user_offset) >> PAGE_SHIFT;
> -    }
> -
>      return 0;
>  }
> 
> Index: linux-2.6/arch/xtensa/kernel/pci.c
> ===================================================================
> --- linux-2.6.orig/arch/xtensa/kernel/pci.c
> +++ linux-2.6/arch/xtensa/kernel/pci.c
> @@ -288,20 +288,16 @@ __pci_mmap_make_offset(struct pci_dev *d
>  {
>      struct pci_controller *pci_ctrl = (struct pci_controller*) dev->sysdata;
>      unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
> -    unsigned long io_offset = 0;
>      int i, res_bit;
> 
>      if (pci_ctrl == 0)
>          return -EINVAL;        /* should never happen */
> 
>      /* If memory, add on the PCI bridge address offset */
> -    if (mmap_state == pci_mmap_mem) {
> +    if (mmap_state == pci_mmap_mem)
>          res_bit = IORESOURCE_MEM;
> -    } else {
> -        io_offset = (unsigned long)pci_ctrl->io_space.base;
> -        offset += io_offset;
> +    else
>          res_bit = IORESOURCE_IO;
> -    }
> 
>      /*
>       * Check that the offset requested corresponds to one of the
> @@ -325,7 +321,8 @@ __pci_mmap_make_offset(struct pci_dev *d
> 
>          /* found it! construct the final physical address */
>          if (mmap_state == pci_mmap_io)
> -            offset += pci_ctrl->io_space.start - io_offset;
> +            offset += pci_ctrl->io_space.start -
> +                    pci_ctrl->io_space.base;
>          vma->vm_pgoff = offset >> PAGE_SHIFT;
>          return 0;
>      }
> Index: linux-2.6/drivers/pci/pci-sysfs.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/pci-sysfs.c
> +++ linux-2.6/drivers/pci/pci-sysfs.c
> @@ -999,7 +999,6 @@ static int pci_mmap_resource(struct kobj
>      struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
>      struct resource *res = attr->private;
>      enum pci_mmap_state mmap_type;
> -    resource_size_t start, end;
>      int i;
> 
>      for (i = 0; i < PCI_ROM_RESOURCE; i++)
> @@ -1020,12 +1019,7 @@ static int pci_mmap_resource(struct kobj
>          return -EINVAL;
>      }
> 
> -    /* pci_mmap_page_range() expects the same kind of entry as coming
> -     * from /proc/bus/pci/ which is a "user visible" value. If this is
> -     * different from the resource itself, arch will do necessary fixup.
> -     */
> -    pci_resource_to_user(pdev, i, res, &start, &end);
> -    vma->vm_pgoff += start >> PAGE_SHIFT;
> +    vma->vm_pgoff += res->start >> PAGE_SHIFT;
>      mmap_type = res->flags & IORESOURCE_MEM ? pci_mmap_mem : pci_mmap_io;
>      return pci_mmap_page_range(pdev, vma, mmap_type, write_combine);

This is a definite improvement.  This path doesn't use user addresses,
so getting rid of pci_resource_to_user() here helps a lot.

>  }
> Index: linux-2.6/drivers/pci/proc.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/proc.c
> +++ linux-2.6/drivers/pci/proc.c
> @@ -231,13 +231,26 @@ static int proc_bus_pci_mmap(struct file
>  {
>      struct pci_dev *dev = PDE_DATA(file_inode(file));
>      struct pci_filp_private *fpriv = file->private_data;
> +    resource_size_t start, end, offset;
> +    struct resource *res;
>      int i, ret;
> 
>      if (!capable(CAP_SYS_RAWIO))
>          return -EPERM;
> 
> +    offset = vma->vm_pgoff << PAGE_SHIFT;
> +
>      /* Make sure the caller is mapping a real resource for this device */
>      for (i = 0; i < PCI_ROM_RESOURCE; i++) {
> +        res = &dev->resource[i];
> +        if (!res->flags)
> +            continue;
> +
> +        pci_resource_to_user(dev, i, res, &start, &end);
> +        if (!(offset >= start && offset <= end))
> +            continue;
> +
> +        vma->vm_pgoff = (res->start + (offset - start)) >> PAGE_SHIFT;
>          if (pci_mmap_fits(dev, i, vma,  PCI_MMAP_PROCFS))

This is sort of OK, but I think we can do better.  I don't see any
problem with introducing pci_user_to_resource() as the inverse of
pci_resource_to_user().  I think it will make this code read much
better.

The default pci_user_to_resource() would do nothing, just like the
default pci_resource_to_user().

For sparc, I think pci_user_to_resource() can use
pcibios_bus_to_resource(), and pci_resource_to_user() can be rewritten
to use pcibios_resource_to_bus().  That makes it much more obvious
what's happening.

It looks like microblaze and powerpc should use
pcibios_resource_to_bus() for I/O resources and the default "user
address == resource address" for MMIO (?)

My goal is to make pci_mmap_resource() and proc_bus_pci_mmap() look
very similar, e.g.,

  /* locate resource */
  pci_user_to_resource()                # only in proc_bus_pci_mmap()
  if (!pci_mmap_fits()) {
    WARN(...);
    return -EINVAL;
  }
  pci_mmap_page_range();

Obviously there are several steps in getting here.  Reworking
pci_resource_to_user() to use pcibios_resource_to_bus() when possible
would be a good start.

Bjorn
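
As an illustration of the inverse pair discussed above, here is a small
user-space sketch. It is not the kernel API: struct window, its offset field
(CPU address minus bus address, assumed constant across the window), and both
function names are hypothetical. An offset of 0 models the default case where
the user address equals the resource address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host bridge window. */
struct window {
	uint64_t res_start;	/* CPU physical start */
	uint64_t res_end;	/* CPU physical end, inclusive */
	uint64_t offset;	/* CPU address minus bus address; 0 = identity */
};

/* Resource (CPU) address to the user-visible address. */
static uint64_t resource_to_user(const struct window *w, uint64_t res)
{
	assert(res >= w->res_start && res <= w->res_end);
	return res - w->offset;
}

/*
 * User-visible address back to a resource address.
 * Returns 0 on success, -1 if the address is outside the window.
 */
static int user_to_resource(const struct window *w, uint64_t user,
			    uint64_t *res)
{
	uint64_t user_start = w->res_start - w->offset;
	uint64_t user_end = w->res_end - w->offset;

	if (user < user_start || user > user_end)
		return -1;

	*res = user + w->offset;
	return 0;
}
```

The point of the sketch is the round trip: translating a resource address to
user space and back yields the original address, so a procfs mmap path could
convert once up front and hand resource values to everything after it.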


* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-05-04 15:17                     ` Bjorn Helgaas
@ 2016-05-04 18:46                       ` Yinghai Lu
  2016-05-05  0:25                         ` Yinghai Lu
  0 siblings, 1 reply; 86+ messages in thread
From: Yinghai Lu @ 2016-05-04 18:46 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Bjorn Helgaas, David Miller,
	Linus Torvalds, Wei Yang, TJ, Yijing Wang, Khalid Aziz,
	linux-pci, Linux Kernel Mailing List, Michael Ellerman

On Wed, May 4, 2016 at 8:17 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> What mess do you mean?  The fact that you could only use
> pcibios_bus_to_resource() for MEM, and something else for IO?  Even
> if we could only use pcibios_bus_to_resource() for MEM, that sounds
> like an improvement, not a mess.

I mean that we would need to create another five pci_user_to_resource()
implementations, one for each existing pci_resource_to_user():

arch/microblaze/pci/pci-common.c:void pci_resource_to_user(const struct pci_dev *dev, int bar,
arch/mips/include/asm/pci.h:static inline void pci_resource_to_user(const struct pci_dev *dev, int bar,
arch/powerpc/kernel/pci-common.c:void pci_resource_to_user(const struct pci_dev *dev, int bar,
arch/sparc/kernel/pci.c:void pci_resource_to_user(const struct pci_dev *pdev, int bar,
include/linux/pci.h:static inline void pci_resource_to_user(const struct pci_dev *dev, int bar,

>
> Most of __pci_mmap_make_offset() is pointless.

We can clean them up later.

>
> We might need something there for I/O regions, but for MEM, the
> vma->vm_pgoff coming into pci_mmap_page_range() should be exactly what
> we need and we shouldn't touch it.  I think __pci_mmap_make_offset()
> actually does leave it alone for MEM, but you have to read the code
> carefully to figure that out.
>
> All the validation stuff ("Check that the offset requested corresponds
> to one of the resources...") should be removed or moved to
> pci_mmap_fits().

OK, will give it a try.

>> @@ -231,13 +231,26 @@ static int proc_bus_pci_mmap(struct file
>>  {
>>      struct pci_dev *dev = PDE_DATA(file_inode(file));
>>      struct pci_filp_private *fpriv = file->private_data;
>> +    resource_size_t start, end, offset;
>> +    struct resource *res;
>>      int i, ret;
>>
>>      if (!capable(CAP_SYS_RAWIO))
>>          return -EPERM;
>>
>> +    offset = vma->vm_pgoff << PAGE_SHIFT;
>> +
>>      /* Make sure the caller is mapping a real resource for this device */
>>      for (i = 0; i < PCI_ROM_RESOURCE; i++) {
>> +        res = &dev->resource[i];
>> +        if (!res->flags)
>> +            continue;
>> +
>> +        pci_resource_to_user(dev, i, res, &start, &end);
>> +        if (!(offset >= start && offset <= end))
>> +            continue;
>> +
>> +        vma->vm_pgoff = (res->start + (offset - start)) >> PAGE_SHIFT;
>>          if (pci_mmap_fits(dev, i, vma,  PCI_MMAP_PROCFS))
>
> This is sort of OK, but I think we can do better.  I don't see any
> problem with introducing pci_user_to_resource() as the inverse of
> pci_resource_to_user().  I think it will make this code read much
> better.
>
> The default pci_user_to_resource() would do nothing, just like the
> default pci_resource_to_user().
>
> For sparc, I think pci_user_to_resource() can use
> pcibios_bus_to_resource(), and pci_resource_to_user() can be rewritten
> to use pcibios_resource_to_bus().  That makes it much more obvious
> what's happening.
>
> It looks like microblaze and powerpc should use
> pcibios_resource_to_bus() for I/O resources and the default "user
> address == resource address" for MMIO (?)
>
> My goal is to make pci_mmap_resource() and proc_bus_pci_mmap() look
> very similar, e.g.,
>
>   /* locate resource */
>   pci_user_to_resource()                # only in proc_bus_pci_mmap()
>   if (!pci_mmap_fits()) {
>     WARN(...);
>     return -EINVAL;
>   }
>   pci_mmap_page_range();
>
> Obviously there are several steps in getting here.  Reworking
> pci_resource_to_user() to use pcibios_resource_to_bus() when possible
> would be a good start.

I would like to avoid adding pci_user_to_resource(), and put an extra
pci_resource_to_user() call before pci_mmap_fits() instead.

Thanks

Yinghai

* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-05-04 18:46                       ` Yinghai Lu
@ 2016-05-05  0:25                         ` Yinghai Lu
  2016-05-05 15:53                           ` Yinghai Lu
  0 siblings, 1 reply; 86+ messages in thread
From: Yinghai Lu @ 2016-05-05  0:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Bjorn Helgaas, David Miller,
	Linus Torvalds, Wei Yang, TJ, Yijing Wang, Khalid Aziz,
	linux-pci, Linux Kernel Mailing List, Michael Ellerman

On Wed, May 4, 2016 at 11:46 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Wed, May 4, 2016 at 8:17 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> My goal is to make pci_mmap_resource() and proc_bus_pci_mmap() look
>> very similar, e.g.,
>>
>>   /* locate resource */
>>   pci_user_to_resource()                # only in proc_bus_pci_mmap()
>>   if (!pci_mmap_fits()) {
>>     WARN(...);
>>     return -EINVAL;
>>   }
>>   pci_mmap_page_range();

Please check v2.

Subject: [RFC PATCH v2] PCI: Let pci_mmap_page_range() take resource addr

Commit 8c05cd08a7 ("PCI: fix offset check for sysfs mmapped files") tries
to check the exposed value against the resource start/end in the proc
mmap path:
|        start = vma->vm_pgoff;
|        size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
|        pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
|                        pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
|        if (start >= pci_start && start < pci_start + size &&
|                        start + nr <= pci_start + size)

That breaks sparc, where the exposed value is still the BAR value.
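
To make the failure concrete, here is a small user-space model of the
quoted check. It is only a sketch: sketch_mmap_fits(), SKETCH_PAGE_SHIFT
and the 4 KiB page size are assumptions made for illustration, not kernel
code, and the addresses used below are invented.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Model of the page-range check quoted above (hypothetical helper).
 *
 * pgoff:     page offset the caller passed to mmap() (vma->vm_pgoff)
 * nr:        number of pages requested
 * res_start: resource start, i.e. the CPU address
 * res_len:   resource length in bytes
 * procfs:    true for /proc/bus/pci, false for sysfs
 */
static bool sketch_mmap_fits(uint64_t pgoff, uint64_t nr,
			     uint64_t res_start, uint64_t res_len,
			     bool procfs)
{
	uint64_t size, pci_start;

	if (res_len == 0)
		return false;

	/* Resource size rounded up to whole pages. */
	size = ((res_len - 1) >> SKETCH_PAGE_SHIFT) + 1;
	/* procfs offsets are absolute, sysfs offsets are BAR-relative. */
	pci_start = procfs ? res_start >> SKETCH_PAGE_SHIFT : 0;

	return pgoff >= pci_start && pgoff < pci_start + size &&
	       pgoff + nr <= pci_start + size;
}
```

With an invented 16 KiB resource at CPU address 0x84000100000 whose BAR
holds 0x100000, a procfs caller passing the BAR value (page 0x100) is
rejected, while the same request expressed as a resource address (page
0x84000100) is accepted.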

In the patch:
1. In the proc path, proc_bus_pci_mmap() converts the user value back
   to a resource address before calling pci_mmap_page_range().
2. In the sysfs path, pci_mmap_resource() just offsets by the resource
   start.
3. Every pci_mmap_page_range() now gets vma->vm_pgoff within the
   resource range instead of a BAR value.
4. Remove __pci_mmap_make_offset(), as the checking is done
   in pci_mmap_fits().

-v2: add pci_user_to_resource and remove __pci_mmap_make_offset
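
The user-to-resource lookup can be sketched in user space as follows.
This is a simplified model under stated assumptions, not the code in the
patch: struct sketch_bar and sketch_user_to_resource() are hypothetical
names, and user_start stands in for what pci_resource_to_user() would
report for each BAR. The loop moves on to the next BAR whenever the
requested range does not fit the current one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one BAR. */
struct sketch_bar {
	uint64_t res_start;	/* resource (CPU) start address */
	uint64_t res_len;	/* length in bytes, 0 if the BAR is unused */
	uint64_t user_start;	/* start address as exposed to user space */
};

/*
 * Find the BAR whose user-visible range contains [*offset, *offset + size).
 * On success, rewrite *offset to the matching resource address and return
 * the BAR index. Return -1 and leave *offset untouched otherwise.
 */
static int sketch_user_to_resource(const struct sketch_bar *bars, size_t n,
				   uint64_t *offset, uint64_t size)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t user_end;

		if (bars[i].res_len == 0 || size == 0)
			continue;

		user_end = bars[i].user_start + bars[i].res_len - 1;
		if (*offset < bars[i].user_start || *offset > user_end)
			continue;

		/* Overflow-safe form of: *offset + size - 1 <= user_end */
		if (size - 1 > user_end - *offset)
			continue;

		*offset = bars[i].res_start + (*offset - bars[i].user_start);
		return (int)i;
	}

	return -1;
}
```

After a successful lookup the caller would store the translated offset in
vma->vm_pgoff, so that the later range check and pci_mmap_page_range()
both see a resource address.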

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/microblaze/pci/pci-common.c |   94 ++++++++-----------------------
 arch/powerpc/kernel/pci-common.c |   95 ++++++++-----------------------
 arch/sparc/kernel/pci.c          |  117 ---------------------------------------
 arch/xtensa/kernel/pci.c         |   73 ++----------------------
 drivers/pci/pci-sysfs.c          |   23 ++++---
 drivers/pci/pci.h                |    2
 drivers/pci/proc.c               |   59 ++++++++++++++++---
 7 files changed, 124 insertions(+), 339 deletions(-)

Index: linux-2.6/arch/microblaze/pci/pci-common.c
===================================================================
--- linux-2.6.orig/arch/microblaze/pci/pci-common.c
+++ linux-2.6/arch/microblaze/pci/pci-common.c
@@ -153,63 +153,25 @@ void pcibios_set_master(struct pci_dev *
  *  -- paulus.
  */

-/*
- * Adjust vm_pgoff of VMA such that it is the physical page offset
- * corresponding to the 32-bit pci bus offset for DEV requested by the user.
- *
- * Basically, the user finds the base address for his device which he wishes
- * to mmap.  They read the 32-bit value from the config space base register,
- * add whatever PAGE_SIZE multiple offset they wish, and feed this into the
- * offset parameter of mmap on /proc/bus/pci/XXX for that device.
- *
- * Returns negative error code on failure, zero on success.
- */
-static struct resource *__pci_mmap_make_offset(struct pci_dev *dev,
-                           resource_size_t *offset,
-                           enum pci_mmap_state mmap_state)
+static struct resource *pci_find_resource(struct pci_dev *dev,
+                resource_size_t offset, int flags)
 {
-    struct pci_controller *hose = pci_bus_to_host(dev->bus);
-    unsigned long io_offset = 0;
-    int i, res_bit;
-
-    if (!hose)
-        return NULL;        /* should never happen */
-
-    /* If memory, add on the PCI bridge address offset */
-    if (mmap_state == pci_mmap_mem) {
-#if 0 /* See comment in pci_resource_to_user() for why this is disabled */
-        *offset += hose->pci_mem_offset;
-#endif
-        res_bit = IORESOURCE_MEM;
-    } else {
-        io_offset = (unsigned long)hose->io_base_virt - _IO_BASE;
-        *offset += io_offset;
-        res_bit = IORESOURCE_IO;
-    }
+    int i;

-    /*
-     * Check that the offset requested corresponds to one of the
-     * resources of the device.
-     */
     for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
         struct resource *rp = &dev->resource[i];
-        int flags = rp->flags;

-        /* treat ROM as memory (should be already) */
-        if (i == PCI_ROM_RESOURCE)
-            flags |= IORESOURCE_MEM;
+        if (!(rp->flags & flags))
+            continue;

-        /* Active and same type? */
-        if ((flags & res_bit) == 0)
+        if (pci_resource_len(dev, i) == 0)
             continue;

         /* In the range of this resource? */
-        if (*offset < (rp->start & PAGE_MASK) || *offset > rp->end)
+        if (offset < (rp->start & PAGE_MASK) ||
+            offset > rp->end)
             continue;

-        /* found it! construct the final physical address */
-        if (mmap_state == pci_mmap_io)
-            *offset += hose->io_base_phys - io_offset;
         return rp;
     }

@@ -236,7 +198,7 @@ static pgprot_t __pci_mmap_set_pgprot(st
     if (mmap_state != pci_mmap_mem)
         write_combine = 0;
     else if (write_combine == 0) {
-        if (rp->flags & IORESOURCE_PREFETCH)
+        if (rp && (rp->flags & IORESOURCE_PREFETCH))
             write_combine = 1;
     }

@@ -256,27 +218,13 @@ pgprot_t pci_phys_mem_access_prot(struct
     struct pci_dev *pdev = NULL;
     struct resource *found = NULL;
     resource_size_t offset = ((resource_size_t)pfn) << PAGE_SHIFT;
-    int i;

     if (page_is_ram(pfn))
         return prot;

     prot = pgprot_noncached(prot);
     for_each_pci_dev(pdev) {
-        for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
-            struct resource *rp = &pdev->resource[i];
-            int flags = rp->flags;
-
-            /* Active and same type? */
-            if ((flags & IORESOURCE_MEM) == 0)
-                continue;
-            /* In the range of this resource? */
-            if (offset < (rp->start & PAGE_MASK) ||
-                offset > rp->end)
-                continue;
-            found = rp;
-            break;
-        }
+        found = pci_find_resource(pdev, offset, IORESOURCE_MEM);
         if (found)
             break;
     }
@@ -305,14 +253,24 @@ pgprot_t pci_phys_mem_access_prot(struct
 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
             enum pci_mmap_state mmap_state, int write_combine)
 {
+    struct pci_controller *hose = pci_bus_to_host(dev->bus);
     resource_size_t offset =
         ((resource_size_t)vma->vm_pgoff) << PAGE_SHIFT;
     struct resource *rp;
-    int ret;
+    int ret, flags;
+
+    if (!hose)
+        return -EINVAL;         /* should never happen */

-    rp = __pci_mmap_make_offset(dev, &offset, mmap_state);
-    if (rp == NULL)
-        return -EINVAL;
+    if (mmap_state == pci_mmap_mem)
+        flags = IORESOURCE_MEM;
+    else
+        flags = IORESOURCE_IO;
+
+    rp = pci_find_resource(dev, offset, flags);
+    if (mmap_state == pci_mmap_io)
+        offset += hose->io_base_phys -
+              ((unsigned long)hose->io_base_virt - _IO_BASE);

     vma->vm_pgoff = offset >> PAGE_SHIFT;
     vma->vm_page_prot = __pci_mmap_set_pgprot(dev, rp,
@@ -491,9 +449,7 @@ void pci_resource_to_user(const struct p
      *
      * Hopefully, the sysfs insterface is immune to that gunk. Once X
      * has been fixed (and the fix spread enough), we can re-enable the
-     * 2 lines below and pass down a BAR value to userland. In that case
-     * we'll also have to re-enable the matching code in
-     * __pci_mmap_make_offset().
+     * 2 lines below and pass down a BAR value to userland.
      *
      * BenH.
      */
Index: linux-2.6/arch/powerpc/kernel/pci-common.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/pci-common.c
+++ linux-2.6/arch/powerpc/kernel/pci-common.c
@@ -292,63 +292,25 @@ static int pci_read_irq_line(struct pci_
  *  -- paulus.
  */

-/*
- * Adjust vm_pgoff of VMA such that it is the physical page offset
- * corresponding to the 32-bit pci bus offset for DEV requested by the user.
- *
- * Basically, the user finds the base address for his device which he wishes
- * to mmap.  They read the 32-bit value from the config space base register,
- * add whatever PAGE_SIZE multiple offset they wish, and feed this into the
- * offset parameter of mmap on /proc/bus/pci/XXX for that device.
- *
- * Returns negative error code on failure, zero on success.
- */
-static struct resource *__pci_mmap_make_offset(struct pci_dev *dev,
-                           resource_size_t *offset,
-                           enum pci_mmap_state mmap_state)
+static struct resource *pci_find_resource(struct pci_dev *dev,
+        resource_size_t offset, int flags)
 {
-    struct pci_controller *hose = pci_bus_to_host(dev->bus);
-    unsigned long io_offset = 0;
-    int i, res_bit;
-
-    if (hose == NULL)
-        return NULL;        /* should never happen */
-
-    /* If memory, add on the PCI bridge address offset */
-    if (mmap_state == pci_mmap_mem) {
-#if 0 /* See comment in pci_resource_to_user() for why this is disabled */
-        *offset += hose->pci_mem_offset;
-#endif
-        res_bit = IORESOURCE_MEM;
-    } else {
-        io_offset = (unsigned long)hose->io_base_virt - _IO_BASE;
-        *offset += io_offset;
-        res_bit = IORESOURCE_IO;
-    }
+    int i;

-    /*
-     * Check that the offset requested corresponds to one of the
-     * resources of the device.
-     */
     for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
         struct resource *rp = &dev->resource[i];
-        int flags = rp->flags;

-        /* treat ROM as memory (should be already) */
-        if (i == PCI_ROM_RESOURCE)
-            flags |= IORESOURCE_MEM;
+        if (!(rp->flags & flags))
+            continue;

-        /* Active and same type? */
-        if ((flags & res_bit) == 0)
+        if (pci_resource_len(dev, i) == 0)
             continue;

         /* In the range of this resource? */
-        if (*offset < (rp->start & PAGE_MASK) || *offset > rp->end)
+        if (offset < (rp->start & PAGE_MASK) ||
+            offset > rp->end)
             continue;

-        /* found it! construct the final physical address */
-        if (mmap_state == pci_mmap_io)
-            *offset += hose->io_base_phys - io_offset;
         return rp;
     }

@@ -374,7 +336,7 @@ static pgprot_t __pci_mmap_set_pgprot(st
     if (mmap_state != pci_mmap_mem)
         write_combine = 0;
     else if (write_combine == 0) {
-        if (rp->flags & IORESOURCE_PREFETCH)
+        if (rp && (rp->flags & IORESOURCE_PREFETCH))
             write_combine = 1;
     }

@@ -385,6 +347,7 @@ static pgprot_t __pci_mmap_set_pgprot(st
         return pgprot_noncached(protection);
 }

+
 /*
  * This one is used by /dev/mem and fbdev who have no clue about the
  * PCI device, it tries to find the PCI device first and calls the
@@ -398,27 +361,13 @@ pgprot_t pci_phys_mem_access_prot(struct
     struct pci_dev *pdev = NULL;
     struct resource *found = NULL;
     resource_size_t offset = ((resource_size_t)pfn) << PAGE_SHIFT;
-    int i;

     if (page_is_ram(pfn))
         return prot;

     prot = pgprot_noncached(prot);
     for_each_pci_dev(pdev) {
-        for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
-            struct resource *rp = &pdev->resource[i];
-            int flags = rp->flags;
-
-            /* Active and same type? */
-            if ((flags & IORESOURCE_MEM) == 0)
-                continue;
-            /* In the range of this resource? */
-            if (offset < (rp->start & PAGE_MASK) ||
-                offset > rp->end)
-                continue;
-            found = rp;
-            break;
-        }
+        found = pci_find_resource(pdev, offset, IORESOURCE_MEM);
         if (found)
             break;
     }
@@ -448,14 +397,24 @@ pgprot_t pci_phys_mem_access_prot(struct
 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
             enum pci_mmap_state mmap_state, int write_combine)
 {
+    struct pci_controller *hose = pci_bus_to_host(dev->bus);
     resource_size_t offset =
         ((resource_size_t)vma->vm_pgoff) << PAGE_SHIFT;
     struct resource *rp;
-    int ret;
+    int ret, flags;
+
+    if (hose == NULL)
+        return -EINVAL;         /* should never happen */
+
+    if (mmap_state == pci_mmap_mem)
+        flags = IORESOURCE_MEM;
+    else
+        flags = IORESOURCE_IO;
+    rp = pci_find_resource(dev, offset, flags);

-    rp = __pci_mmap_make_offset(dev, &offset, mmap_state);
-    if (rp == NULL)
-        return -EINVAL;
+    if (mmap_state == pci_mmap_io)
+        offset += hose->io_base_phys -
+              ((unsigned long)hose->io_base_virt - _IO_BASE);

     vma->vm_pgoff = offset >> PAGE_SHIFT;
     vma->vm_page_prot = __pci_mmap_set_pgprot(dev, rp,
@@ -630,9 +589,7 @@ void pci_resource_to_user(const struct p
      *
      * Hopefully, the sysfs insterface is immune to that gunk. Once X
      * has been fixed (and the fix spread enough), we can re-enable the
-     * 2 lines below and pass down a BAR value to userland. In that case
-     * we'll also have to re-enable the matching code in
-     * __pci_mmap_make_offset().
+     * 2 lines below and pass down a BAR value to userland.
      *
      * BenH.
      */
Index: linux-2.6/arch/sparc/kernel/pci.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/pci.c
+++ linux-2.6/arch/sparc/kernel/pci.c
@@ -732,119 +732,6 @@ int pcibios_enable_device(struct pci_dev

 /* Platform support for /proc/bus/pci/X/Y mmap()s. */

-/* If the user uses a host-bridge as the PCI device, he may use
- * this to perform a raw mmap() of the I/O or MEM space behind
- * that controller.
- *
- * This can be useful for execution of x86 PCI bios initialization code
- * on a PCI card, like the xfree86 int10 stuff does.
- */
-static int __pci_mmap_make_offset_bus(struct pci_dev *pdev, struct vm_area_struct *vma,
-                      enum pci_mmap_state mmap_state)
-{
-    struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller;
-    unsigned long space_size, user_offset, user_size;
-
-    if (mmap_state == pci_mmap_io) {
-        space_size = resource_size(&pbm->io_space);
-    } else {
-        space_size = resource_size(&pbm->mem_space);
-    }
-
-    /* Make sure the request is in range. */
-    user_offset = vma->vm_pgoff << PAGE_SHIFT;
-    user_size = vma->vm_end - vma->vm_start;
-
-    if (user_offset >= space_size ||
-        (user_offset + user_size) > space_size)
-        return -EINVAL;
-
-    if (mmap_state == pci_mmap_io) {
-        vma->vm_pgoff = (pbm->io_space.start +
-                 user_offset) >> PAGE_SHIFT;
-    } else {
-        vma->vm_pgoff = (pbm->mem_space.start +
-                 user_offset) >> PAGE_SHIFT;
-    }
-
-    return 0;
-}
-
-/* Adjust vm_pgoff of VMA such that it is the physical page offset
- * corresponding to the 32-bit pci bus offset for DEV requested by the user.
- *
- * Basically, the user finds the base address for his device which he wishes
- * to mmap.  They read the 32-bit value from the config space base register,
- * add whatever PAGE_SIZE multiple offset they wish, and feed this into the
- * offset parameter of mmap on /proc/bus/pci/XXX for that device.
- *
- * Returns negative error code on failure, zero on success.
- */
-static int __pci_mmap_make_offset(struct pci_dev *pdev,
-                  struct vm_area_struct *vma,
-                  enum pci_mmap_state mmap_state)
-{
-    unsigned long user_paddr, user_size;
-    int i, err;
-
-    /* First compute the physical address in vma->vm_pgoff,
-     * making sure the user offset is within range in the
-     * appropriate PCI space.
-     */
-    err = __pci_mmap_make_offset_bus(pdev, vma, mmap_state);
-    if (err)
-        return err;
-
-    /* If this is a mapping on a host bridge, any address
-     * is OK.
-     */
-    if ((pdev->class >> 8) == PCI_CLASS_BRIDGE_HOST)
-        return err;
-
-    /* Otherwise make sure it's in the range for one of the
-     * device's resources.
-     */
-    user_paddr = vma->vm_pgoff << PAGE_SHIFT;
-    user_size = vma->vm_end - vma->vm_start;
-
-    for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
-        struct resource *rp = &pdev->resource[i];
-        resource_size_t aligned_end;
-
-        /* Active? */
-        if (!rp->flags)
-            continue;
-
-        /* Same type? */
-        if (i == PCI_ROM_RESOURCE) {
-            if (mmap_state != pci_mmap_mem)
-                continue;
-        } else {
-            if ((mmap_state == pci_mmap_io &&
-                 (rp->flags & IORESOURCE_IO) == 0) ||
-                (mmap_state == pci_mmap_mem &&
-                 (rp->flags & IORESOURCE_MEM) == 0))
-                continue;
-        }
-
-        /* Align the resource end to the next page address.
-         * PAGE_SIZE intentionally added instead of (PAGE_SIZE - 1),
-         * because actually we need the address of the next byte
-         * after rp->end.
-         */
-        aligned_end = (rp->end + PAGE_SIZE) & PAGE_MASK;
-
-        if ((rp->start <= user_paddr) &&
-            (user_paddr + user_size) <= aligned_end)
-            break;
-    }
-
-    if (i > PCI_ROM_RESOURCE)
-        return -EINVAL;
-
-    return 0;
-}
-
 /* Set vm_page_prot of VMA, as appropriate for this architecture, for a pci
  * device mapping.
  */
@@ -868,10 +755,6 @@ int pci_mmap_page_range(struct pci_dev *
 {
     int ret;

-    ret = __pci_mmap_make_offset(dev, vma, mmap_state);
-    if (ret < 0)
-        return ret;
-
     __pci_mmap_set_pgprot(dev, vma, mmap_state);

     vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
Index: linux-2.6/arch/xtensa/kernel/pci.c
===================================================================
--- linux-2.6.orig/arch/xtensa/kernel/pci.c
+++ linux-2.6/arch/xtensa/kernel/pci.c
@@ -272,68 +272,6 @@ pci_controller_num(struct pci_dev *dev)
  */

 /*
- * Adjust vm_pgoff of VMA such that it is the physical page offset
- * corresponding to the 32-bit pci bus offset for DEV requested by the user.
- *
- * Basically, the user finds the base address for his device which he wishes
- * to mmap.  They read the 32-bit value from the config space base register,
- * add whatever PAGE_SIZE multiple offset they wish, and feed this into the
- * offset parameter of mmap on /proc/bus/pci/XXX for that device.
- *
- * Returns negative error code on failure, zero on success.
- */
-static __inline__ int
-__pci_mmap_make_offset(struct pci_dev *dev, struct vm_area_struct *vma,
-               enum pci_mmap_state mmap_state)
-{
-    struct pci_controller *pci_ctrl = (struct pci_controller*) dev->sysdata;
-    unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
-    unsigned long io_offset = 0;
-    int i, res_bit;
-
-    if (pci_ctrl == 0)
-        return -EINVAL;        /* should never happen */
-
-    /* If memory, add on the PCI bridge address offset */
-    if (mmap_state == pci_mmap_mem) {
-        res_bit = IORESOURCE_MEM;
-    } else {
-        io_offset = (unsigned long)pci_ctrl->io_space.base;
-        offset += io_offset;
-        res_bit = IORESOURCE_IO;
-    }
-
-    /*
-     * Check that the offset requested corresponds to one of the
-     * resources of the device.
-     */
-    for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
-        struct resource *rp = &dev->resource[i];
-        int flags = rp->flags;
-
-        /* treat ROM as memory (should be already) */
-        if (i == PCI_ROM_RESOURCE)
-            flags |= IORESOURCE_MEM;
-
-        /* Active and same type? */
-        if ((flags & res_bit) == 0)
-            continue;
-
-        /* In the range of this resource? */
-        if (offset < (rp->start & PAGE_MASK) || offset > rp->end)
-            continue;
-
-        /* found it! construct the final physical address */
-        if (mmap_state == pci_mmap_io)
-            offset += pci_ctrl->io_space.start - io_offset;
-        vma->vm_pgoff = offset >> PAGE_SHIFT;
-        return 0;
-    }
-
-    return -EINVAL;
-}
-
-/*
  * Set vm_page_prot of VMA, as appropriate for this architecture, for a pci
  * device mapping.
  */
@@ -366,11 +304,16 @@ int pci_mmap_page_range(struct pci_dev *
             enum pci_mmap_state mmap_state,
             int write_combine)
 {
+    struct pci_controller *pci_ctrl = (struct pci_controller *)dev->sysdata;
+    unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
     int ret;

-    ret = __pci_mmap_make_offset(dev, vma, mmap_state);
-    if (ret < 0)
-        return ret;
+    if (pci_ctrl == 0)
+        return -EINVAL;        /* should never happen */
+
+    if (mmap_state == pci_mmap_io)
+        offset += pci_ctrl->io_space.start - pci_ctrl->io_space.base;
+    vma->vm_pgoff = offset >> PAGE_SHIFT;

     __pci_mmap_set_pgprot(dev, vma, mmap_state, write_combine);

Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -967,12 +967,23 @@ void pci_remove_legacy_files(struct pci_
 #ifdef HAVE_PCI_MMAP

 int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vma,
+          enum pci_mmap_state mmap_type,
           enum pci_mmap_api mmap_api)
 {
     unsigned long nr, start, size, pci_start;
+    int res_bit;

     if (pci_resource_len(pdev, resno) == 0)
         return 0;
+
+    if (mmap_type == pci_mmap_mem)
+        res_bit = IORESOURCE_MEM;
+    else
+        res_bit = IORESOURCE_IO;
+
+    if (!(pci_resource_flags(pdev, resno) & res_bit))
+        return 0;
+
     nr = vma_pages(vma);
     start = vma->vm_pgoff;
     size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
@@ -999,7 +1010,6 @@ static int pci_mmap_resource(struct kobj
     struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
     struct resource *res = attr->private;
     enum pci_mmap_state mmap_type;
-    resource_size_t start, end;
     int i;

     for (i = 0; i < PCI_ROM_RESOURCE; i++)
@@ -1011,7 +1021,8 @@ static int pci_mmap_resource(struct kobj
     if (res->flags & IORESOURCE_MEM && iomem_is_exclusive(res->start))
         return -EINVAL;

-    if (!pci_mmap_fits(pdev, i, vma, PCI_MMAP_SYSFS)) {
+    mmap_type = res->flags & IORESOURCE_MEM ? pci_mmap_mem : pci_mmap_io;
+    if (!pci_mmap_fits(pdev, i, vma, mmap_type, PCI_MMAP_SYSFS)) {
         WARN(1, "process \"%s\" tried to map 0x%08lx bytes at page 0x%08lx on %s BAR %d (start 0x%16Lx, size 0x%16Lx)\n",
             current->comm, vma->vm_end-vma->vm_start, vma->vm_pgoff,
             pci_name(pdev), i,
@@ -1020,13 +1031,7 @@ static int pci_mmap_resource(struct kobj
         return -EINVAL;
     }

-    /* pci_mmap_page_range() expects the same kind of entry as coming
-     * from /proc/bus/pci/ which is a "user visible" value. If this is
-     * different from the resource itself, arch will do necessary fixup.
-     */
-    pci_resource_to_user(pdev, i, res, &start, &end);
-    vma->vm_pgoff += start >> PAGE_SHIFT;
-    mmap_type = res->flags & IORESOURCE_MEM ? pci_mmap_mem : pci_mmap_io;
+    vma->vm_pgoff += res->start >> PAGE_SHIFT;
     return pci_mmap_page_range(pdev, vma, mmap_type, write_combine);
 }

Index: linux-2.6/drivers/pci/proc.c
===================================================================
--- linux-2.6.orig/drivers/pci/proc.c
+++ linux-2.6/drivers/pci/proc.c
@@ -227,26 +227,67 @@ static long proc_bus_pci_ioctl(struct fi
 }

 #ifdef HAVE_PCI_MMAP
+
+static int pci_user_to_resource(struct pci_dev *dev, resource_size_t *offset,
+                resource_size_t size, int flags)
+{
+    int i;
+
+    for (i = 0; i < PCI_ROM_RESOURCE; i++) {
+        resource_size_t start, end;
+        struct resource *res = &dev->resource[i];
+
+        if (!(res->flags & flags))
+            continue;
+
+        if (pci_resource_len(dev, i) == 0)
+            continue;
+
+        pci_resource_to_user(dev, i, res, &start, &end);
+        if (start <= *offset && (*offset + size - 1) <= end) {
+            *offset = res->start + (*offset - start);
+            return i;
+        }
+        /* Not in this BAR's user range; try the next one. */
+        continue;
+    }
+
+    return -ENODEV;
+}
+
 static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
 {
     struct pci_dev *dev = PDE_DATA(file_inode(file));
     struct pci_filp_private *fpriv = file->private_data;
-    int i, ret;
+    enum pci_mmap_state mmap_type = fpriv->mmap_state;
+    resource_size_t offset, size;
+    int i, ret, flags;

     if (!capable(CAP_SYS_RAWIO))
         return -EPERM;

-    /* Make sure the caller is mapping a real resource for this device */
-    for (i = 0; i < PCI_ROM_RESOURCE; i++) {
-        if (pci_mmap_fits(dev, i, vma,  PCI_MMAP_PROCFS))
-            break;
-    }
-
-    if (i >= PCI_ROM_RESOURCE)
+    offset = vma->vm_pgoff << PAGE_SHIFT;
+    size = vma->vm_end - vma->vm_start;
+    if (mmap_type == pci_mmap_mem)
+        flags = IORESOURCE_MEM;
+    else
+        flags = IORESOURCE_IO;
+    i = pci_user_to_resource(dev, &offset, size, flags);
+    if (i < 0)
         return -ENODEV;

+    vma->vm_pgoff = offset >> PAGE_SHIFT;
+    if (!pci_mmap_fits(dev, i, vma, mmap_type, PCI_MMAP_PROCFS)) {
+        WARN(1, "process \"%s\" tried to map 0x%08lx bytes at page 0x%08lx on %s BAR %d (start 0x%16Lx, size 0x%16Lx)\n",
+            current->comm, vma->vm_end-vma->vm_start, vma->vm_pgoff,
+            pci_name(dev), i,
+            (u64)pci_resource_start(dev, i),
+            (u64)pci_resource_len(dev, i));
+        return -EINVAL;
+    }
+
     ret = pci_mmap_page_range(dev, vma,
-                  fpriv->mmap_state,
+                  mmap_type,
                   fpriv->write_combine);
     if (ret < 0)
         return ret;
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -30,7 +30,7 @@ enum pci_mmap_api {
     PCI_MMAP_PROCFS    /* mmap on /proc/bus/pci/<BDF> */
 };
 int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vmai,
-          enum pci_mmap_api mmap_api);
+          enum pci_mmap_state mmap_type, enum pci_mmap_api mmap_api);
 #endif
 int pci_probe_reset_function(struct pci_dev *dev);

* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-05-05  0:25                         ` Yinghai Lu
@ 2016-05-05 15:53                           ` Yinghai Lu
  2016-05-05 22:02                             ` Benjamin Herrenschmidt
  2016-05-06 18:26                             ` Bjorn Helgaas
  0 siblings, 2 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-05-05 15:53 UTC (permalink / raw)
  To: Bjorn Helgaas, Benjamin Herrenschmidt, David Miller
  Cc: Bjorn Helgaas, Linus Torvalds, Wei Yang, TJ, Yijing Wang,
	Khalid Aziz, linux-pci, Linux Kernel Mailing List,
	Michael Ellerman

On Wed, May 4, 2016 at 5:25 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Wed, May 4, 2016 at 11:46 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Wed, May 4, 2016 at 8:17 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>>> My goal is to make pci_mmap_resource() and proc_bus_pci_mmap() look
>>> very similar, e.g.,
>>>
>>>   /* locate resource */
>>>   pci_user_to_resource()                # only in proc_bus_pci_mmap()
>>>   if (!pci_mmap_fits()) {
>>>     WARN(...);
>>>     return -EINVAL;
>>>   }
>>>   pci_mmap_page_range();
>

This is v3, which has more changes: it passes *res to make the powerpc
prot setting simpler.

Question for BenH and DavidM:

For powerpc I/O ports, we still need an extra offset to get from the
resource address to the final address.

     resource_size_t offset =
         ((resource_size_t)vma->vm_pgoff) << PAGE_SHIFT;

+    if (mmap_state == pci_mmap_io) {
+        struct pci_controller *hose = pci_bus_to_host(dev->bus);
+
+        /* hose should never be NULL */
+        offset += hose->io_base_phys -
+              ((unsigned long)hose->io_base_virt - _IO_BASE);
+    }

     vma->vm_pgoff = offset >> PAGE_SHIFT;

But sparc does not need that trick.

Why?
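
The arithmetic in the powerpc hunk above can be checked in isolation.
The sketch below is a user-space model only: struct sketch_hose and
sketch_io_res_to_phys() are hypothetical names, io_base stands in for
_IO_BASE, and the reading of each term is inferred from the removed
__pci_mmap_make_offset() code rather than confirmed.

```c
#include <stdint.h>

/* Hypothetical per-host-bridge values, named after the hose fields. */
struct sketch_hose {
	uint64_t io_base_virt;	/* where this bridge's I/O space is mapped */
	uint64_t io_base_phys;	/* CPU physical address of that I/O space */
};

/*
 * Turn an I/O resource address into the physical address to map.
 *
 * io_offset is the bridge's position inside the global I/O window.
 * Subtracting it gives the port number relative to this bridge, and
 * adding io_base_phys gives the physical address.
 */
static uint64_t sketch_io_res_to_phys(const struct sketch_hose *hose,
				      uint64_t io_base, uint64_t res_addr)
{
	uint64_t io_offset = hose->io_base_virt - io_base;

	return res_addr + hose->io_base_phys - io_offset;
}
```

With invented values io_base = 0xd000080000000000, io_base_virt =
io_base + 0x10000 and io_base_phys = 0x3fe00000000, the resource address
0x10400 maps to 0x3fe00000400, i.e. port 0x400 behind that bridge.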

Thanks

Yinghai


---

Subject: [RFC PATCH v3 2/2] PCI: Let pci_mmap_page_range() take resource addr

Commit 8c05cd08a7 ("PCI: fix offset check for sysfs mmapped files") tries
to check the exposed value against the resource start/end in the proc
mmap path:
|        start = vma->vm_pgoff;
|        size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
|        pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
|                        pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
|        if (start >= pci_start && start < pci_start + size &&
|                        start + nr <= pci_start + size)

That breaks sparc, where the exposed value is still the BAR value.

In the patch:
1. In the proc path, proc_bus_pci_mmap() converts the user value back
   to a resource address before calling pci_mmap_page_range().
2. In the sysfs path, pci_mmap_resource() just offsets by the resource
   start.
3. Every pci_mmap_page_range() now gets vma->vm_pgoff within the
   resource range instead of a BAR value.
4. Remove __pci_mmap_make_offset(), as the checking is done
   in pci_mmap_fits().

-v2: add pci_user_to_resource and remove __pci_mmap_make_offset
-v3: pass resource pointer with pci_mmap_page_range()

Signed-off-by: Yinghai Lu <yinghai@kernel.org>


---
 arch/microblaze/pci/pci-common.c |   78 ++------------------------
 arch/powerpc/kernel/pci-common.c |   78 ++------------------------
 arch/sparc/kernel/pci.c          |  117 ---------------------------------------
 arch/xtensa/kernel/pci.c         |   75 +++----------------------
 drivers/pci/pci-sysfs.c          |   23 ++++---
 drivers/pci/proc.c               |   57 ++++++++++++++++---
 6 files changed, 88 insertions(+), 340 deletions(-)

Index: linux-2.6/arch/microblaze/pci/pci-common.c
===================================================================
--- linux-2.6.orig/arch/microblaze/pci/pci-common.c
+++ linux-2.6/arch/microblaze/pci/pci-common.c
@@ -154,69 +154,6 @@ void pcibios_set_master(struct pci_dev *
  */

 /*
- * Adjust vm_pgoff of VMA such that it is the physical page offset
- * corresponding to the 32-bit pci bus offset for DEV requested by the user.
- *
- * Basically, the user finds the base address for his device which he wishes
- * to mmap.  They read the 32-bit value from the config space base register,
- * add whatever PAGE_SIZE multiple offset they wish, and feed this into the
- * offset parameter of mmap on /proc/bus/pci/XXX for that device.
- *
- * Returns negative error code on failure, zero on success.
- */
-static struct resource *__pci_mmap_make_offset(struct pci_dev *dev,
-                           resource_size_t *offset,
-                           enum pci_mmap_state mmap_state)
-{
-    struct pci_controller *hose = pci_bus_to_host(dev->bus);
-    unsigned long io_offset = 0;
-    int i, res_bit;
-
-    if (!hose)
-        return NULL;        /* should never happen */
-
-    /* If memory, add on the PCI bridge address offset */
-    if (mmap_state == pci_mmap_mem) {
-#if 0 /* See comment in pci_resource_to_user() for why this is disabled */
-        *offset += hose->pci_mem_offset;
-#endif
-        res_bit = IORESOURCE_MEM;
-    } else {
-        io_offset = (unsigned long)hose->io_base_virt - _IO_BASE;
-        *offset += io_offset;
-        res_bit = IORESOURCE_IO;
-    }
-
-    /*
-     * Check that the offset requested corresponds to one of the
-     * resources of the device.
-     */
-    for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
-        struct resource *rp = &dev->resource[i];
-        int flags = rp->flags;
-
-        /* treat ROM as memory (should be already) */
-        if (i == PCI_ROM_RESOURCE)
-            flags |= IORESOURCE_MEM;
-
-        /* Active and same type? */
-        if ((flags & res_bit) == 0)
-            continue;
-
-        /* In the range of this resource? */
-        if (*offset < (rp->start & PAGE_MASK) || *offset > rp->end)
-            continue;
-
-        /* found it! construct the final physical address */
-        if (mmap_state == pci_mmap_io)
-            *offset += hose->io_base_phys - io_offset;
-        return rp;
-    }
-
-    return NULL;
-}
-
-/*
  * Set vm_page_prot of VMA, as appropriate for this architecture, for a pci
  * device mapping.
  */
@@ -308,12 +245,15 @@ int pci_mmap_page_range(struct pci_dev *
 {
     resource_size_t offset =
         ((resource_size_t)vma->vm_pgoff) << PAGE_SHIFT;
-    struct resource *rp;
     int ret;

-    rp = __pci_mmap_make_offset(dev, &offset, mmap_state);
-    if (rp == NULL)
-        return -EINVAL;
+    if (mmap_state == pci_mmap_io) {
+        struct pci_controller *hose = pci_bus_to_host(dev->bus);
+
+        /* hose should never be NULL */
+        offset += hose->io_base_phys -
+             ((unsigned long)hose->io_base_virt - _IO_BASE);
+    }

     vma->vm_pgoff = offset >> PAGE_SHIFT;
     vma->vm_page_prot = __pci_mmap_set_pgprot(dev, rp,
@@ -492,9 +432,7 @@ void pci_resource_to_user(const struct p
      *
      * Hopefully, the sysfs insterface is immune to that gunk. Once X
      * has been fixed (and the fix spread enough), we can re-enable the
-     * 2 lines below and pass down a BAR value to userland. In that case
-     * we'll also have to re-enable the matching code in
-     * __pci_mmap_make_offset().
+     * 2 lines below and pass down a BAR value to userland.
      *
      * BenH.
      */
Index: linux-2.6/arch/powerpc/kernel/pci-common.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/pci-common.c
+++ linux-2.6/arch/powerpc/kernel/pci-common.c
@@ -293,69 +293,6 @@ static int pci_read_irq_line(struct pci_
  */

 /*
- * Adjust vm_pgoff of VMA such that it is the physical page offset
- * corresponding to the 32-bit pci bus offset for DEV requested by the user.
- *
- * Basically, the user finds the base address for his device which he wishes
- * to mmap.  They read the 32-bit value from the config space base register,
- * add whatever PAGE_SIZE multiple offset they wish, and feed this into the
- * offset parameter of mmap on /proc/bus/pci/XXX for that device.
- *
- * Returns negative error code on failure, zero on success.
- */
-static struct resource *__pci_mmap_make_offset(struct pci_dev *dev,
-                           resource_size_t *offset,
-                           enum pci_mmap_state mmap_state)
-{
-    struct pci_controller *hose = pci_bus_to_host(dev->bus);
-    unsigned long io_offset = 0;
-    int i, res_bit;
-
-    if (hose == NULL)
-        return NULL;        /* should never happen */
-
-    /* If memory, add on the PCI bridge address offset */
-    if (mmap_state == pci_mmap_mem) {
-#if 0 /* See comment in pci_resource_to_user() for why this is disabled */
-        *offset += hose->pci_mem_offset;
-#endif
-        res_bit = IORESOURCE_MEM;
-    } else {
-        io_offset = (unsigned long)hose->io_base_virt - _IO_BASE;
-        *offset += io_offset;
-        res_bit = IORESOURCE_IO;
-    }
-
-    /*
-     * Check that the offset requested corresponds to one of the
-     * resources of the device.
-     */
-    for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
-        struct resource *rp = &dev->resource[i];
-        int flags = rp->flags;
-
-        /* treat ROM as memory (should be already) */
-        if (i == PCI_ROM_RESOURCE)
-            flags |= IORESOURCE_MEM;
-
-        /* Active and same type? */
-        if ((flags & res_bit) == 0)
-            continue;
-
-        /* In the range of this resource? */
-        if (*offset < (rp->start & PAGE_MASK) || *offset > rp->end)
-            continue;
-
-        /* found it! construct the final physical address */
-        if (mmap_state == pci_mmap_io)
-            *offset += hose->io_base_phys - io_offset;
-        return rp;
-    }
-
-    return NULL;
-}
-
-/*
  * Set vm_page_prot of VMA, as appropriate for this architecture, for a pci
  * device mapping.
  */
@@ -451,12 +388,15 @@ int pci_mmap_page_range(struct pci_dev *
 {
     resource_size_t offset =
         ((resource_size_t)vma->vm_pgoff) << PAGE_SHIFT;
-    struct resource *rp;
     int ret;

-    rp = __pci_mmap_make_offset(dev, &offset, mmap_state);
-    if (rp == NULL)
-        return -EINVAL;
+    if (mmap_state == pci_mmap_io) {
+        struct pci_controller *hose = pci_bus_to_host(dev->bus);
+
+        /* hose should never be NULL */
+        offset += hose->io_base_phys -
+              ((unsigned long)hose->io_base_virt - _IO_BASE);
+    }

     vma->vm_pgoff = offset >> PAGE_SHIFT;
     vma->vm_page_prot = __pci_mmap_set_pgprot(dev, rp,
@@ -631,9 +571,7 @@ void pci_resource_to_user(const struct p
      *
      * Hopefully, the sysfs insterface is immune to that gunk. Once X
      * has been fixed (and the fix spread enough), we can re-enable the
-     * 2 lines below and pass down a BAR value to userland. In that case
-     * we'll also have to re-enable the matching code in
-     * __pci_mmap_make_offset().
+     * 2 lines below and pass down a BAR value to userland.
      *
      * BenH.
      */
Index: linux-2.6/arch/sparc/kernel/pci.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/pci.c
+++ linux-2.6/arch/sparc/kernel/pci.c
@@ -732,119 +732,6 @@ int pcibios_enable_device(struct pci_dev

 /* Platform support for /proc/bus/pci/X/Y mmap()s. */

-/* If the user uses a host-bridge as the PCI device, he may use
- * this to perform a raw mmap() of the I/O or MEM space behind
- * that controller.
- *
- * This can be useful for execution of x86 PCI bios initialization code
- * on a PCI card, like the xfree86 int10 stuff does.
- */
-static int __pci_mmap_make_offset_bus(struct pci_dev *pdev, struct vm_area_struct *vma,
-                      enum pci_mmap_state mmap_state)
-{
-    struct pci_pbm_info *pbm = pdev->dev.archdata.host_controller;
-    unsigned long space_size, user_offset, user_size;
-
-    if (mmap_state == pci_mmap_io) {
-        space_size = resource_size(&pbm->io_space);
-    } else {
-        space_size = resource_size(&pbm->mem_space);
-    }
-
-    /* Make sure the request is in range. */
-    user_offset = vma->vm_pgoff << PAGE_SHIFT;
-    user_size = vma->vm_end - vma->vm_start;
-
-    if (user_offset >= space_size ||
-        (user_offset + user_size) > space_size)
-        return -EINVAL;
-
-    if (mmap_state == pci_mmap_io) {
-        vma->vm_pgoff = (pbm->io_space.start +
-                 user_offset) >> PAGE_SHIFT;
-    } else {
-        vma->vm_pgoff = (pbm->mem_space.start +
-                 user_offset) >> PAGE_SHIFT;
-    }
-
-    return 0;
-}
-
-/* Adjust vm_pgoff of VMA such that it is the physical page offset
- * corresponding to the 32-bit pci bus offset for DEV requested by the user.
- *
- * Basically, the user finds the base address for his device which he wishes
- * to mmap.  They read the 32-bit value from the config space base register,
- * add whatever PAGE_SIZE multiple offset they wish, and feed this into the
- * offset parameter of mmap on /proc/bus/pci/XXX for that device.
- *
- * Returns negative error code on failure, zero on success.
- */
-static int __pci_mmap_make_offset(struct pci_dev *pdev,
-                  struct vm_area_struct *vma,
-                  enum pci_mmap_state mmap_state)
-{
-    unsigned long user_paddr, user_size;
-    int i, err;
-
-    /* First compute the physical address in vma->vm_pgoff,
-     * making sure the user offset is within range in the
-     * appropriate PCI space.
-     */
-    err = __pci_mmap_make_offset_bus(pdev, vma, mmap_state);
-    if (err)
-        return err;
-
-    /* If this is a mapping on a host bridge, any address
-     * is OK.
-     */
-    if ((pdev->class >> 8) == PCI_CLASS_BRIDGE_HOST)
-        return err;
-
-    /* Otherwise make sure it's in the range for one of the
-     * device's resources.
-     */
-    user_paddr = vma->vm_pgoff << PAGE_SHIFT;
-    user_size = vma->vm_end - vma->vm_start;
-
-    for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
-        struct resource *rp = &pdev->resource[i];
-        resource_size_t aligned_end;
-
-        /* Active? */
-        if (!rp->flags)
-            continue;
-
-        /* Same type? */
-        if (i == PCI_ROM_RESOURCE) {
-            if (mmap_state != pci_mmap_mem)
-                continue;
-        } else {
-            if ((mmap_state == pci_mmap_io &&
-                 (rp->flags & IORESOURCE_IO) == 0) ||
-                (mmap_state == pci_mmap_mem &&
-                 (rp->flags & IORESOURCE_MEM) == 0))
-                continue;
-        }
-
-        /* Align the resource end to the next page address.
-         * PAGE_SIZE intentionally added instead of (PAGE_SIZE - 1),
-         * because actually we need the address of the next byte
-         * after rp->end.
-         */
-        aligned_end = (rp->end + PAGE_SIZE) & PAGE_MASK;
-
-        if ((rp->start <= user_paddr) &&
-            (user_paddr + user_size) <= aligned_end)
-            break;
-    }
-
-    if (i > PCI_ROM_RESOURCE)
-        return -EINVAL;
-
-    return 0;
-}
-
 /* Set vm_page_prot of VMA, as appropriate for this architecture, for a pci
  * device mapping.
  */
@@ -869,10 +756,6 @@ int pci_mmap_page_range(struct pci_dev *
 {
     int ret;

-    ret = __pci_mmap_make_offset(dev, vma, mmap_state);
-    if (ret < 0)
-        return ret;
-
     __pci_mmap_set_pgprot(dev, vma, mmap_state);

     vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
Index: linux-2.6/arch/xtensa/kernel/pci.c
===================================================================
--- linux-2.6.orig/arch/xtensa/kernel/pci.c
+++ linux-2.6/arch/xtensa/kernel/pci.c
@@ -272,68 +272,6 @@ pci_controller_num(struct pci_dev *dev)
  */

 /*
- * Adjust vm_pgoff of VMA such that it is the physical page offset
- * corresponding to the 32-bit pci bus offset for DEV requested by the user.
- *
- * Basically, the user finds the base address for his device which he wishes
- * to mmap.  They read the 32-bit value from the config space base register,
- * add whatever PAGE_SIZE multiple offset they wish, and feed this into the
- * offset parameter of mmap on /proc/bus/pci/XXX for that device.
- *
- * Returns negative error code on failure, zero on success.
- */
-static __inline__ int
-__pci_mmap_make_offset(struct pci_dev *dev, struct vm_area_struct *vma,
-               enum pci_mmap_state mmap_state)
-{
-    struct pci_controller *pci_ctrl = (struct pci_controller*) dev->sysdata;
-    unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
-    unsigned long io_offset = 0;
-    int i, res_bit;
-
-    if (pci_ctrl == 0)
-        return -EINVAL;        /* should never happen */
-
-    /* If memory, add on the PCI bridge address offset */
-    if (mmap_state == pci_mmap_mem) {
-        res_bit = IORESOURCE_MEM;
-    } else {
-        io_offset = (unsigned long)pci_ctrl->io_space.base;
-        offset += io_offset;
-        res_bit = IORESOURCE_IO;
-    }
-
-    /*
-     * Check that the offset requested corresponds to one of the
-     * resources of the device.
-     */
-    for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
-        struct resource *rp = &dev->resource[i];
-        int flags = rp->flags;
-
-        /* treat ROM as memory (should be already) */
-        if (i == PCI_ROM_RESOURCE)
-            flags |= IORESOURCE_MEM;
-
-        /* Active and same type? */
-        if ((flags & res_bit) == 0)
-            continue;
-
-        /* In the range of this resource? */
-        if (offset < (rp->start & PAGE_MASK) || offset > rp->end)
-            continue;
-
-        /* found it! construct the final physical address */
-        if (mmap_state == pci_mmap_io)
-            offset += pci_ctrl->io_space.start - io_offset;
-        vma->vm_pgoff = offset >> PAGE_SHIFT;
-        return 0;
-    }
-
-    return -EINVAL;
-}
-
-/*
  * Set vm_page_prot of VMA, as appropriate for this architecture, for a pci
  * device mapping.
  */
@@ -367,11 +305,18 @@ int pci_mmap_page_range(struct pci_dev *
             enum pci_mmap_state mmap_state,
             int write_combine)
 {
+    unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
     int ret;

-    ret = __pci_mmap_make_offset(dev, vma, mmap_state);
-    if (ret < 0)
-        return ret;
+    if (mmap_state == pci_mmap_io) {
+        struct pci_controller *pci_ctrl =
+                     (struct pci_controller *)dev->sysdata;
+
+        /* pci_ctrl should never be NULL */
+        offset += pci_ctrl->io_space.start - pci_ctrl->io_space.base;
+    }
+
+    vma->vm_pgoff = offset >> PAGE_SHIFT;

     __pci_mmap_set_pgprot(dev, vma, mmap_state, write_combine);

Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -967,12 +967,23 @@ void pci_remove_legacy_files(struct pci_
 #ifdef HAVE_PCI_MMAP

 int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vma,
+          enum pci_mmap_state mmap_type,
           enum pci_mmap_api mmap_api)
 {
     unsigned long nr, start, size, pci_start;
+    int flags;

     if (pci_resource_len(pdev, resno) == 0)
         return 0;
+
+    if (mmap_type == pci_mmap_mem)
+        flags = IORESOURCE_MEM;
+    else
+        flags = IORESOURCE_IO;
+
+    if (!(pci_resource_flags(pdev, resno) & flags))
+        return 0;
+
     nr = vma_pages(vma);
     start = vma->vm_pgoff;
     size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
@@ -999,7 +1010,6 @@ static int pci_mmap_resource(struct kobj
     struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
     struct resource *res = attr->private;
     enum pci_mmap_state mmap_type;
-    resource_size_t start, end;
     int i;

     for (i = 0; i < PCI_ROM_RESOURCE; i++)
@@ -1011,7 +1021,8 @@ static int pci_mmap_resource(struct kobj
     if (res->flags & IORESOURCE_MEM && iomem_is_exclusive(res->start))
         return -EINVAL;

-    if (!pci_mmap_fits(pdev, i, vma, PCI_MMAP_SYSFS)) {
+    mmap_type = res->flags & IORESOURCE_MEM ? pci_mmap_mem : pci_mmap_io;
+    if (!pci_mmap_fits(pdev, i, vma, mmap_type, PCI_MMAP_SYSFS)) {
         WARN(1, "process \"%s\" tried to map 0x%08lx bytes at page 0x%08lx on %s BAR %d (start 0x%16Lx, size 0x%16Lx)\n",
             current->comm, vma->vm_end-vma->vm_start, vma->vm_pgoff,
             pci_name(pdev), i,
@@ -1020,13 +1031,7 @@ static int pci_mmap_resource(struct kobj
         return -EINVAL;
     }

-    /* pci_mmap_page_range() expects the same kind of entry as coming
-     * from /proc/bus/pci/ which is a "user visible" value. If this is
-     * different from the resource itself, arch will do necessary fixup.
-     */
-    pci_resource_to_user(pdev, i, res, &start, &end);
-    vma->vm_pgoff += start >> PAGE_SHIFT;
-    mmap_type = res->flags & IORESOURCE_MEM ? pci_mmap_mem : pci_mmap_io;
+    vma->vm_pgoff += res->start >> PAGE_SHIFT;
     return pci_mmap_page_range(pdev, res, vma, mmap_type, write_combine);
 }

Index: linux-2.6/drivers/pci/proc.c
===================================================================
--- linux-2.6.orig/drivers/pci/proc.c
+++ linux-2.6/drivers/pci/proc.c
@@ -227,26 +227,65 @@ static long proc_bus_pci_ioctl(struct fi
 }

 #ifdef HAVE_PCI_MMAP
+
+static int pci_user_to_resource(struct pci_dev *dev, resource_size_t *offset,
+                resource_size_t size, int flags)
+{
+    int i;
+
+    for (i = 0; i < PCI_ROM_RESOURCE; i++) {
+        resource_size_t start, end;
+        struct resource *res = &dev->resource[i];
+
+        if (!(res->flags & flags))
+            continue;
+
+        if (pci_resource_len(dev, i) == 0)
+            continue;
+
+        pci_resource_to_user(dev, i, res, &start, &end);
+        if (start <= *offset && (*offset + size - 1) <= end) {
+            *offset = res->start + (*offset - start);
+            return i;
+        }
+    }
+
+    return -ENODEV;
+}
+
 static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
 {
     struct pci_dev *dev = PDE_DATA(file_inode(file));
     struct pci_filp_private *fpriv = file->private_data;
-    int i, ret;
+    enum pci_mmap_state mmap_type = fpriv->mmap_state;
+    resource_size_t offset, size;
+    int i, ret, flags;

     if (!capable(CAP_SYS_RAWIO))
         return -EPERM;

-    /* Make sure the caller is mapping a real resource for this device */
-    for (i = 0; i < PCI_ROM_RESOURCE; i++) {
-        if (pci_mmap_fits(dev, i, vma,  PCI_MMAP_PROCFS))
-            break;
-    }
-
-    if (i >= PCI_ROM_RESOURCE)
+    offset = vma->vm_pgoff << PAGE_SHIFT;
+    size = vma->vm_end - vma->vm_start;
+    if (mmap_type == pci_mmap_mem)
+        flags = IORESOURCE_MEM;
+    else
+        flags = IORESOURCE_IO;
+    i = pci_user_to_resource(dev, &offset, size, flags);
+    if (i < 0)
         return -ENODEV;

+    vma->vm_pgoff = offset >> PAGE_SHIFT;
+    if (!pci_mmap_fits(dev, i, vma, mmap_type, PCI_MMAP_PROCFS)) {
+        WARN(1, "process \"%s\" tried to map 0x%08lx bytes at page 0x%08lx on %s BAR %d (start 0x%16Lx, size 0x%16Lx)\n",
+            current->comm, vma->vm_end-vma->vm_start, vma->vm_pgoff,
+            pci_name(dev), i,
+            (u64)pci_resource_start(dev, i),
+            (u64)pci_resource_len(dev, i));
+        return -EINVAL;
+    }
+
     ret = pci_mmap_page_range(dev, &dev->resource[i], vma,
-                  fpriv->mmap_state,
+                  mmap_type,
                   fpriv->write_combine);
     if (ret < 0)
         return ret;


* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-05-05 15:53                           ` Yinghai Lu
@ 2016-05-05 22:02                             ` Benjamin Herrenschmidt
  2016-05-06  0:56                               ` Yinghai Lu
  2016-05-06 18:26                             ` Bjorn Helgaas
  1 sibling, 1 reply; 86+ messages in thread
From: Benjamin Herrenschmidt @ 2016-05-05 22:02 UTC (permalink / raw)
  To: Yinghai Lu, Bjorn Helgaas, David Miller
  Cc: Bjorn Helgaas, Linus Torvalds, Wei Yang, TJ, Yijing Wang,
	Khalid Aziz, linux-pci, Linux Kernel Mailing List,
	Michael Ellerman

On Thu, 2016-05-05 at 08:53 -0700, Yinghai Lu wrote:
> For powerpc io port, we still need extra offset from resource address
> to final address.
> 
>      resource_size_t offset =
>          ((resource_size_t)vma->vm_pgoff) << PAGE_SHIFT;
> 
> +    if (mmap_state == pci_mmap_io) {
> +        struct pci_controller *hose = pci_bus_to_host(dev->bus);
> +
> +        /* hose should never be NULL */
> +        offset += hose->io_base_phys -
> +              ((unsigned long)hose->io_base_virt - _IO_BASE);
> +    }
> 
>      vma->vm_pgoff = offset >> PAGE_SHIFT;
> 
> but sparc does not need that trick.

I'm not sure how sparc handles IO space but on powerpc, the IO resource
is not a physical address, it's a virtual address (coming from
ioremap). 

Cheers,
Ben.


* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-05-05 22:02                             ` Benjamin Herrenschmidt
@ 2016-05-06  0:56                               ` Yinghai Lu
  2016-05-06  4:18                                 ` Yinghai Lu
  0 siblings, 1 reply; 86+ messages in thread
From: Yinghai Lu @ 2016-05-06  0:56 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Bjorn Helgaas, David Miller, Bjorn Helgaas, Linus Torvalds,
	Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci,
	Linux Kernel Mailing List, Michael Ellerman

On Thu, May 5, 2016 at 3:02 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Thu, 2016-05-05 at 08:53 -0700, Yinghai Lu wrote:
>> For powerpc io port, we still need extra offset from resource address
>> to final address.
>>
>>      resource_size_t offset =
>>          ((resource_size_t)vma->vm_pgoff) << PAGE_SHIFT;
>>
>> +    if (mmap_state == pci_mmap_io) {
>> +        struct pci_controller *hose = pci_bus_to_host(dev->bus);
>> +
>> +        /* hose should never be NULL */
>> +        offset += hose->io_base_phys -
>> +              ((unsigned long)hose->io_base_virt - _IO_BASE);
>> +    }
>>
>>      vma->vm_pgoff = offset >> PAGE_SHIFT;
>>
>> but sparc does not need that trick.
>
> I'm not sure how sparc handles IO space but on powerpc, the IO resource
> is not a physical address, it's a virtual address (coming from
> ioremap).

That is interesting. Any reason for that?

Why not just use cpu_addr in the resource directly?

Thanks

Yinghai


* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-05-06  0:56                               ` Yinghai Lu
@ 2016-05-06  4:18                                 ` Yinghai Lu
  0 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-05-06  4:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Bjorn Helgaas, David Miller, Bjorn Helgaas, Linus Torvalds,
	Wei Yang, TJ, Yijing Wang, Khalid Aziz, linux-pci,
	Linux Kernel Mailing List, Michael Ellerman

On Thu, May 5, 2016 at 5:56 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Thu, May 5, 2016 at 3:02 PM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
>> On Thu, 2016-05-05 at 08:53 -0700, Yinghai Lu wrote:
>>> For powerpc io port, we still need extra offset from resource address
>>> to final address.
>>>
>>>      resource_size_t offset =
>>>          ((resource_size_t)vma->vm_pgoff) << PAGE_SHIFT;
>>>
>>> +    if (mmap_state == pci_mmap_io) {
>>> +        struct pci_controller *hose = pci_bus_to_host(dev->bus);
>>> +
>>> +        /* hose should never be NULL */
>>> +        offset += hose->io_base_phys -
>>> +              ((unsigned long)hose->io_base_virt - _IO_BASE);
>>> +    }
>>>
>>>      vma->vm_pgoff = offset >> PAGE_SHIFT;
>>>
>>> but sparc does not need that trick.
>>
>> I'm not sure how sparc handles IO space but on powerpc, the IO resource
>> is not a physical address, it's a virtual address (coming from
>> ioremap).
>
> That is interesting. Any reason for that ?
>
> why just cpu_addr in resource directly ?

Never mind, I figured it out. sparc64 can use cpu_addr to access
the I/O ports directly.

powerpc64 needs to ioremap cpu_addr to a virtual address and then use that
for I/O port access, so it does the ioremap early and uses the virtual
address as the resource value. Otherwise every outb on powerpc64 would
need to ioremap, access, and then unmap.
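
As a rough illustration of that arithmetic (a userspace sketch, not kernel
code): assuming a powerpc-style I/O resource holds io_base_virt - _IO_BASE
plus the port offset, the physical address is obtained by swapping that base
for io_base_phys, which is what the "offset +=" line in the patch does.
struct hose_model, MODEL_IO_BASE and io_res_to_phys() are hypothetical names
used only for this example, and the MODEL_IO_BASE value is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the powerpc struct pci_controller. */
struct hose_model {
    uint64_t io_base_phys;  /* CPU physical address of the I/O window */
    uint64_t io_base_virt;  /* address the window was ioremap()ed to */
};

/* Hypothetical stand-in for the kernel's _IO_BASE; the value is arbitrary. */
#define MODEL_IO_BASE 0xd000000000000000ULL

/*
 * Convert an I/O resource address (virtual-based, as described above)
 * into the physical address that vm_pgoff must be derived from.
 */
uint64_t io_res_to_phys(const struct hose_model *hose, uint64_t res_addr)
{
    /* Offset of this hose's I/O window relative to MODEL_IO_BASE. */
    uint64_t io_offset = hose->io_base_virt - MODEL_IO_BASE;

    /* The resource is assumed to lie inside this hose's I/O window. */
    assert(res_addr >= io_offset);

    /* Same expression as the patch: offset += io_base_phys - io_offset */
    return res_addr + hose->io_base_phys - io_offset;
}
```

With this model, a port at offset 0x3f8 inside the window always maps to
io_base_phys + 0x3f8, regardless of where the window was ioremap()ed.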

Thanks

Yinghai


* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-05-05 15:53                           ` Yinghai Lu
  2016-05-05 22:02                             ` Benjamin Herrenschmidt
@ 2016-05-06 18:26                             ` Bjorn Helgaas
  2016-05-10  6:18                               ` Yinghai Lu
  1 sibling, 1 reply; 86+ messages in thread
From: Bjorn Helgaas @ 2016-05-06 18:26 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Benjamin Herrenschmidt, David Miller, Bjorn Helgaas,
	Linus Torvalds, Wei Yang, TJ, Yijing Wang, Khalid Aziz,
	linux-pci, Linux Kernel Mailing List, Michael Ellerman

On Thu, May 05, 2016 at 08:53:14AM -0700, Yinghai Lu wrote:
> On Wed, May 4, 2016 at 5:25 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Wed, May 4, 2016 at 11:46 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> >> On Wed, May 4, 2016 at 8:17 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> >>> My goal is to make pci_mmap_resource() and proc_bus_pci_mmap() look
> >>> very similar, e.g.,
> >>>
> >>>   /* locate resource */
> >>>   pci_user_to_resource()                # only in proc_bus_pci_mmap()
> >>>   if (!pci_mmap_fits()) {
> >>>     WARN(...);
> >>>     return -EINVAL;
> >>>   }
> >>>   pci_mmap_page_range();
> >
> 
> v3, that have more change to pass *res to make powerpc prot setting simple.

This looks corrupted.  On v4.6-rc2:

  $ stg import -M m/yh3 
  Checking for changes in the working directory ... done
  Importing patch "re-patch-v11-04-60-sparc-pci" ... fatal: corrupt patch at line 266
  stg import: Diff does not apply cleanly

> ...
> Subject: [RFC PATCH v3 2/2] PCI: Let pci_mmap_page_range() take resource addr
> 
> In 8c05cd08a7 ("PCI: fix offset check for sysfs mmapped files"), try
> to check exposed value with resource start/end in proc mmap path.
> |        start = vma->vm_pgoff;
> |        size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
> |        pci_start = (mmap_api == PCI_MMAP_PROCFS) ?
> |                        pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0;
> |        if (start >= pci_start && start < pci_start + size &&
> |                        start + nr <= pci_start + size)
> 
> That would break sparc that exposed value is still BAR value.
> 
> In the patch:
> 1. in proc path: proc_bus_pci_mmap, try convert back to resource
>    before calling pci_mmap_page_range
> 2. in sysfs path: pci_mmap_resource will just offset with resource start.
> 3. all pci_mmap_page_range will all have vma->vm_pgoff with in resource
>    range instead of BAR value.
> 4. remove __pci_mmap_make_offset, as the checking is done
>    in pci_mmap_fits().

This looks like it could possibly be split into several patches.  I
think it's too big to apply as-is.

I'm not sure what bug this is fixing or what improvement it's making.


* Re: [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource
  2016-05-06 18:26                             ` Bjorn Helgaas
@ 2016-05-10  6:18                               ` Yinghai Lu
  0 siblings, 0 replies; 86+ messages in thread
From: Yinghai Lu @ 2016-05-10  6:18 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, David Miller, Bjorn Helgaas,
	Linus Torvalds, Wei Yang, TJ, Yijing Wang, Khalid Aziz,
	linux-pci, Linux Kernel Mailing List, Michael Ellerman

On Fri, May 6, 2016 at 11:26 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> v3, that have more change to pass *res to make powerpc prot setting simple.
>
> This looks corrupted.  On v4.6-rc2:
>
>   $ stg import -M m/yh3
>   Checking for changes in the working directory ... done
>   Importing patch "re-patch-v11-04-60-sparc-pci" ... fatal: corrupt patch at line 266
>   stg import: Diff does not apply cleanly

I just resent them as text/plain mail. Please have a look.

>
...
>
> This looks like it could possibly be split into several patches.  I
> think it's too big to apply as-is.

It is not that big, except that it removes lots of lines.

>
> I'm not sure what bug this is fixing or what improvement it's making.

It fixes the sparc64 proc mmap path, which cannot pass the check in
pci_mmap_fits() because that check compares the BAR value with the
resource address without applying the offset.
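
A small userspace model may make the failure concrete. It copies the range
test quoted above from pci_mmap_fits() and adds the translation step that
pci_user_to_resource() performs for a single resource. Everything here is an
illustrative assumption: the *_model names, the 8K page shift, and the
example addresses are not taken from any real machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 13  /* assumed 8K pages, for illustration only */

enum mmap_api_model { MODEL_MMAP_SYSFS, MODEL_MMAP_PROCFS };

/* Same comparison as the pci_mmap_fits() snippet quoted in this thread. */
bool mmap_fits_model(uint64_t res_start, uint64_t res_len,
                     uint64_t vm_pgoff, uint64_t nr_pages,
                     enum mmap_api_model api)
{
    uint64_t size, pci_start;

    if (res_len == 0)
        return false;

    size = ((res_len - 1) >> MODEL_PAGE_SHIFT) + 1;
    pci_start = (api == MODEL_MMAP_PROCFS) ?
                res_start >> MODEL_PAGE_SHIFT : 0;

    return vm_pgoff >= pci_start && vm_pgoff < pci_start + size &&
           vm_pgoff + nr_pages <= pci_start + size;
}

/*
 * Translate a user-visible (BAR/bus) address into a resource address,
 * as the patch does in the proc path before the fits check.
 */
uint64_t user_to_res_model(uint64_t res_start, uint64_t user_start,
                           uint64_t user_addr)
{
    /* The caller is assumed to have matched user_addr to this resource. */
    assert(user_addr >= user_start);
    return res_start + (user_addr - user_start);
}
```

If the resource start is, say, 0x1ff00100000 while the BAR value exposed to
userspace is 0x00100000, a vm_pgoff built from the BAR value falls below
pci_start and the procfs check fails. Translating first with
user_to_res_model() yields a vm_pgoff inside the resource range, and the
same check passes.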

It will also make
   sparc/PCI: Use correct offset for bus address to resource
and the other one much simpler.

Thanks

Yinghai


end of thread, other threads:[~2016-05-10  6:18 UTC | newest]

Thread overview: 86+ messages
2016-04-08  0:15 [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 01/60] PCI: Fix iomem_is_exclusive() checking in pci_mmap_resource() Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 02/60] alpha/PCI: Only check iomem_is_exclusive() for IORESOURCE_MEM, not IORESOURCE_IO Yinghai Lu
2016-04-25 21:01   ` Bjorn Helgaas
2016-04-08  0:15 ` [PATCH v11 03/60] PCI: Add pci_find_bus_resource() Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 04/60] sparc/PCI: Use correct offset for bus address to resource Yinghai Lu
2016-04-22 20:49   ` Bjorn Helgaas
2016-04-28  4:55     ` Yinghai Lu
2016-04-28 13:56       ` Bjorn Helgaas
2016-04-29  7:19         ` Yinghai Lu
2016-05-03 22:52           ` Yinghai Lu
2016-05-04  0:37             ` Benjamin Herrenschmidt
2016-05-04  1:25               ` Bjorn Helgaas
2016-05-04  5:08                 ` Yinghai Lu
2016-05-04  5:52                   ` Yinghai Lu
2016-05-04 15:17                     ` Bjorn Helgaas
2016-05-04 18:46                       ` Yinghai Lu
2016-05-05  0:25                         ` Yinghai Lu
2016-05-05 15:53                           ` Yinghai Lu
2016-05-05 22:02                             ` Benjamin Herrenschmidt
2016-05-06  0:56                               ` Yinghai Lu
2016-05-06  4:18                                 ` Yinghai Lu
2016-05-06 18:26                             ` Bjorn Helgaas
2016-05-10  6:18                               ` Yinghai Lu
2016-05-04  4:17               ` David Miller
2016-04-08  0:15 ` [PATCH v11 05/60] sparc/PCI: Reserve legacy mmio after PCI mmio Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 06/60] sparc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 07/60] sparc/PCI: Keep resource idx order with bridge register number Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 08/60] PCI: Kill wrong quirk about M7101 Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 09/60] powerpc/PCI: Keep resource idx order with bridge register number Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 10/60] powerpc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 11/60] OF/PCI: Add IORESOURCE_MEM_64 for 64-bit resource Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 12/60] PCI: Check pref compatible bit for mem64 resource of PCIe device Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 13/60] PCI: Only treat non-pref mmio64 as pref if all bridges have MEM_64 Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 14/60] PCI: Add has_mem64 for struct host_bridge Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 15/60] PCI: Only treat non-pref mmio64 as pref if host bridge has mmio64 Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 16/60] PCI: Restore pref MMIO allocation logic for host bridge without mmio64 Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 17/60] PCI: Don't release fixed resource for realloc Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 18/60] PCI: Claim fixed resource during remove/rescan path Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 19/60] PCI: Set resource to FIXED for LSI devices Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 20/60] PCI: Separate realloc list checking after allocation Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 21/60] PCI: Treat optional as required in first try for bridge rescan Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 22/60] PCI: Get new realloc size for bridge for last try Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 23/60] PCI: Don't release sibling bridge resources during hotplug Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 24/60] PCI: Cleanup res_to_dev_res() printout Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 25/60] PCI: Reuse res_to_dev_res() in reassign_resources_sorted() Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 26/60] PCI: Use correct align for optional only resources during sorting Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 27/60] PCI: Optimize bus min_align/size calculation during sizing Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 28/60] PCI: Optimize bus align/size calculation for optional " Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 29/60] PCI: Don't add too much optional size for hotplug bridge MMIO Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 30/60] PCI: Reorder resources list for required/optional resources Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 31/60] PCI: Remove duplicated code for resource sorting Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 32/60] PCI: Rename pdev_sort_resources() to pdev_assign_resources_prepare() Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 33/60] PCI: Treat ROM resource as optional during realloc Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 34/60] PCI: Add debug printout during releasing partial assigned resources Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 35/60] PCI: Simplify res reference using in __assign_resources_sorted() Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 36/60] PCI: Add __add_to_list() Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 37/60] PCI: Cache window alignment value during bus sizing Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 38/60] PCI: Check if resource is allocated before trying to assign one Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 39/60] PCI: Separate out save_resources()/restore_resources() Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 40/60] PCI: Move comment to pci_need_to_release() Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 41/60] PCI: Separate required+optional assigning to another function Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 42/60] PCI: Skip required+optional if there is no optional Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 43/60] PCI: Move saved required resource list out of required+optional assigning Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 44/60] PCI: Add alt_size resource allocation support Yinghai Lu
2016-04-08  0:56   ` Linus Torvalds
2016-04-08  5:50     ` Yinghai Lu
2016-04-08  6:24     ` Benjamin Herrenschmidt
2016-04-08  0:15 ` [PATCH v11 45/60] PCI: Add support for more than two alt_size entries under same bridge Yinghai Lu
2016-04-08  0:15 ` [PATCH v11 46/60] PCI: Fix size calculation with old_size on rescan path Yinghai Lu
2016-04-08  0:16 ` [PATCH v11 47/60] PCI: Don't add too much optional size for hotplug bridge io Yinghai Lu
2016-04-08  0:16 ` [PATCH v11 48/60] PCI: Move ISA io port align out of calculate_iosize() Yinghai Lu
2016-04-08  0:16 ` [PATCH v11 49/60] PCI: Don't add too much io port for hotplug bridge with old size Yinghai Lu
2016-04-08  0:16 ` [PATCH v11 50/60] PCI: Unify calculate_size() for io port and MMIO Yinghai Lu
2016-04-08  0:16 ` [PATCH v11 51/60] PCI: Allow bridge optional only io port resource required size to be 0 Yinghai Lu
2016-04-08  0:16 ` [PATCH v11 52/60] PCI: Unify skip_ioresource_align() Yinghai Lu
2016-04-08  0:16 ` [PATCH v11 53/60] PCI: Kill macro checking for bus io port sizing Yinghai Lu
2016-04-08  0:16 ` [PATCH v11 54/60] resources: Make allocate_resource() return best fit resource Yinghai Lu
2016-04-08  0:16 ` [PATCH v11 55/60] PCI, x86: Allocate from high in available window for MMIO Yinghai Lu
2016-04-08  0:16 ` [PATCH v11 56/60] PCI: Add debug print out for min_align and alt_size Yinghai Lu
2016-04-08  0:16 ` [PATCH v11 57/60] PCI, x86: Add pci=assign_pref_bars to reallocate pref BARs Yinghai Lu
2016-04-08  0:16 ` [PATCH v11 58/60] PCI: Introduce resource_disabled() Yinghai Lu
2016-04-08  0:16 ` [PATCH v11 59/60] PCI: Don't set flags to 0 when assign resource fail Yinghai Lu
2016-04-08  0:16 ` [PATCH v11 60/60] PCI: Only try to assign io port only for root bus that support it Yinghai Lu
2016-04-08  0:51 ` [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7 Linus Torvalds
2016-04-09  5:29   ` Yinghai Lu
