All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/11] PCI: resource allocation related
@ 2012-05-23  6:34 Yinghai Lu
  2012-05-23  6:34 ` [PATCH 01/11] PCI: Should add children device res to fail list Yinghai Lu
                   ` (10 more replies)
  0 siblings, 11 replies; 55+ messages in thread
From: Yinghai Lu @ 2012-05-23  6:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andrew Morton, Linus Torvalds, linux-pci, linux-kernel, Yinghai Lu

From: Yinghai Lu <ying6hai@kernel.org>

This patchset will try to make allocation to find suitable assignement.
1. will try to assign 64 bit resource above 4g at first.
2. will find space that is matched with needed size
3. will put resource in right location to leave more big alignment with left blank resource
4. will try option rom BAR as optional resources.

Could be found:
	git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-res-alloc

and it is based on pci for-3.6 branch.

Thanks

Yinghai Lu

Yinghai Lu (11):
  PCI: Should add children device res to fail list
  PCI: Try to allocate mem64 above 4G at first
  intel-gtt: Read 64bit for gmar_bus_addr
  PCI: Make sure assign same align with large size resource at first
  resources: Split out __allocate_resource()
  resource: make find_resource could return just fit resource
  PCI: Don't allocate small resource in big empty space.
  resource: only return range with needed align
  PCI: Add is_pci_iov_resource_idx()
  PCI: Sort unassigned resources with correct alignment
  PCI: Treat ROM resource as optional during assigning.

 drivers/char/agp/intel-gtt.c |   14 ++++--
 drivers/pci/bus.c            |   38 +++++++++++---
 drivers/pci/setup-bus.c      |   78 ++++++++++++++++++-----------
 drivers/pci/setup-res.c      |   28 +++++++----
 include/linux/ioport.h       |    8 +++
 include/linux/pci.h          |   23 ++++++++
 kernel/resource.c            |  114 ++++++++++++++++++++++++++++++++++++++---
 7 files changed, 243 insertions(+), 60 deletions(-)

-- 
1.7.7


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 01/11] PCI: Should add children device res to fail list
  2012-05-23  6:34 [PATCH 00/11] PCI: resource allocation related Yinghai Lu
@ 2012-05-23  6:34 ` Yinghai Lu
  2012-05-23  6:34 ` [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first Yinghai Lu
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2012-05-23  6:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andrew Morton, Linus Torvalds, linux-pci, linux-kernel, Yinghai Lu

We can stop according to try number now. So do need that as stop sign.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 192172c..64d478f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -283,7 +283,7 @@ static void assign_requested_resources_sorted(struct list_head *head,
 		idx = res - &dev_res->dev->resource[0];
 		if (resource_size(res) &&
 		    pci_assign_resource(dev_res->dev, idx)) {
-			if (fail_head && !pci_is_root_bus(dev_res->dev->bus)) {
+			if (fail_head) {
 				/*
 				 * if the failed res is for ROM BAR, and it will
 				 * be enabled later, don't add it to the list
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-23  6:34 [PATCH 00/11] PCI: resource allocation related Yinghai Lu
  2012-05-23  6:34 ` [PATCH 01/11] PCI: Should add children device res to fail list Yinghai Lu
@ 2012-05-23  6:34 ` Yinghai Lu
  2012-05-23 15:57   ` Linus Torvalds
  2012-05-23  6:34 ` [PATCH 03/11] intel-gtt: Read 64bit for gmar_bus_addr Yinghai Lu
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2012-05-23  6:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andrew Morton, Linus Torvalds, linux-pci, linux-kernel, Yinghai Lu

and will fall back to below 4g if it can not find any above 4g.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/bus.c |   16 +++++++++++++---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 4ce5ef2..2429f1f 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -122,13 +122,17 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 	int i, ret = -ENOMEM;
 	struct resource *r;
 	resource_size_t max = -1;
+	resource_size_t bottom = PCIBIOS_MAX_MEM_32 + 1ULL;
 
 	type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
 
 	/* don't allocate too high if the pref mem doesn't support 64bit*/
-	if (!(res->flags & IORESOURCE_MEM_64))
+	if (!(res->flags & IORESOURCE_MEM_64)) {
 		max = PCIBIOS_MAX_MEM_32;
+		bottom = 0;
+	}
 
+again:
 	pci_bus_for_each_resource(bus, r, i) {
 		if (!r)
 			continue;
@@ -145,12 +149,18 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 
 		/* Ok, try it out.. */
 		ret = allocate_resource(r, res, size,
-					r->start ? : min,
+					max(bottom, r->start ? : min),
 					max, align,
 					alignf, alignf_data);
 		if (ret == 0)
-			break;
+			return 0;
 	}
+
+	if (bottom != 0) {
+		bottom = 0;
+		goto again;
+	}
+
 	return ret;
 }
 
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 03/11] intel-gtt: Read 64bit for gmar_bus_addr
  2012-05-23  6:34 [PATCH 00/11] PCI: resource allocation related Yinghai Lu
  2012-05-23  6:34 ` [PATCH 01/11] PCI: Should add children device res to fail list Yinghai Lu
  2012-05-23  6:34 ` [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first Yinghai Lu
@ 2012-05-23  6:34 ` Yinghai Lu
  2012-05-23  7:21   ` Dave Airlie
  2012-05-23  6:34 ` [PATCH 04/11] PCI: Make sure assign same align with large size resource at first Yinghai Lu
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2012-05-23  6:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andrew Morton, Linus Torvalds, linux-pci, linux-kernel,
	Yinghai Lu, David Airlie

That bar could be 64bit pref mem.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: David Airlie <airlied@linux.ie>
---
 drivers/char/agp/intel-gtt.c |   14 ++++++++++----
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 7f025fb..77e150e 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -770,16 +770,22 @@ static void i830_write_entry(dma_addr_t addr, unsigned int entry,
 static bool intel_enable_gtt(void)
 {
 	u32 gma_addr;
+	u32 addr_hi = 0;
 	u8 __iomem *reg;
+	int pos;
 
 	if (INTEL_GTT_GEN <= 2)
-		pci_read_config_dword(intel_private.pcidev, I810_GMADDR,
-				      &gma_addr);
+		pos = I810_GMADDR;
 	else
-		pci_read_config_dword(intel_private.pcidev, I915_GMADDR,
-				      &gma_addr);
+		pos = I915_GMADDR;
+
+	pci_read_config_dword(intel_private.pcidev, pos, &gma_addr);
+
+	if (gma_addr & PCI_BASE_ADDRESS_MEM_TYPE_64)
+		pci_read_config_dword(intel_private.pcidev, pos + 4, &addr_hi);
 
 	intel_private.gma_bus_addr = (gma_addr & PCI_BASE_ADDRESS_MEM_MASK);
+	intel_private.gma_bus_addr |= (u64)addr_hi << 32;
 
 	if (INTEL_GTT_GEN >= 6)
 	    return true;
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 04/11] PCI: Make sure assign same align with large size resource at first
  2012-05-23  6:34 [PATCH 00/11] PCI: resource allocation related Yinghai Lu
                   ` (2 preceding siblings ...)
  2012-05-23  6:34 ` [PATCH 03/11] intel-gtt: Read 64bit for gmar_bus_addr Yinghai Lu
@ 2012-05-23  6:34 ` Yinghai Lu
  2012-05-23  6:34 ` [PATCH 05/11] resources: Split out __allocate_resource() Yinghai Lu
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2012-05-23  6:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andrew Morton, Linus Torvalds, linux-pci, linux-kernel, Yinghai Lu

When sorting them, put the one with large size before small size.


Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 64d478f..0568f29 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -128,7 +128,7 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 		struct resource *r;
 		struct pci_dev_resource *dev_res, *tmp;
-		resource_size_t r_align;
+		resource_size_t r_align, r_size;
 		struct list_head *n;
 
 		r = &dev->resource[i];
@@ -145,6 +145,7 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 				 i, r);
 			continue;
 		}
+		r_size = resource_size(r);
 
 		tmp = kzalloc(sizeof(*tmp), GFP_KERNEL);
 		if (!tmp)
@@ -161,7 +162,9 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 			align = pci_resource_alignment(dev_res->dev,
 							 dev_res->res);
 
-			if (r_align > align) {
+			if (r_align > align ||
+			    (r_align == align &&
+			     r_size > resource_size(dev_res->res))) {
 				n = &dev_res->list;
 				break;
 			}
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 05/11] resources: Split out __allocate_resource()
  2012-05-23  6:34 [PATCH 00/11] PCI: resource allocation related Yinghai Lu
                   ` (3 preceding siblings ...)
  2012-05-23  6:34 ` [PATCH 04/11] PCI: Make sure assign same align with large size resource at first Yinghai Lu
@ 2012-05-23  6:34 ` Yinghai Lu
  2012-05-23  6:34 ` [PATCH 06/11] resource: make find_resource could return just fit resource Yinghai Lu
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2012-05-23  6:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andrew Morton, Linus Torvalds, linux-pci, linux-kernel, Yinghai Lu

It will take bool lock, so we could use it in other functions that
hold the resource lock.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 kernel/resource.c |   26 +++++++++++++++++++++-----
 1 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 51ade23..41d7050 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -521,14 +521,14 @@ out:
  * @alignf: alignment function, optional, called if not NULL
  * @alignf_data: arbitrary data to pass to the @alignf function
  */
-int allocate_resource(struct resource *root, struct resource *new,
+static int __allocate_resource(struct resource *root, struct resource *new,
 		      resource_size_t size, resource_size_t min,
 		      resource_size_t max, resource_size_t align,
 		      resource_size_t (*alignf)(void *,
 						const struct resource *,
 						resource_size_t,
 						resource_size_t),
-		      void *alignf_data)
+		      void *alignf_data, bool lock)
 {
 	int err;
 	struct resource_constraint constraint;
@@ -542,19 +542,35 @@ int allocate_resource(struct resource *root, struct resource *new,
 	constraint.alignf = alignf;
 	constraint.alignf_data = alignf_data;
 
-	if ( new->parent ) {
+	if (new->parent && lock) {
 		/* resource is already allocated, try reallocating with
 		   the new constraints */
 		return reallocate_resource(root, new, size, &constraint);
 	}
 
-	write_lock(&resource_lock);
+	if (lock)
+		write_lock(&resource_lock);
 	err = find_resource(root, new, size, &constraint);
 	if (err >= 0 && __request_resource(root, new))
 		err = -EBUSY;
-	write_unlock(&resource_lock);
+	if (lock)
+		write_unlock(&resource_lock);
 	return err;
 }
+int allocate_resource(struct resource *root, struct resource *new,
+		      resource_size_t size, resource_size_t min,
+		      resource_size_t max, resource_size_t align,
+		      resource_size_t (*alignf)(void *,
+						const struct resource *,
+						resource_size_t,
+						resource_size_t),
+		      void *alignf_data)
+{
+	bool lock = true;
+
+	return __allocate_resource(root, new, size, min, max, align,
+				   alignf, alignf_data, lock);
+}
 
 EXPORT_SYMBOL(allocate_resource);
 
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 06/11] resource: make find_resource could return just fit resource
  2012-05-23  6:34 [PATCH 00/11] PCI: resource allocation related Yinghai Lu
                   ` (4 preceding siblings ...)
  2012-05-23  6:34 ` [PATCH 05/11] resources: Split out __allocate_resource() Yinghai Lu
@ 2012-05-23  6:34 ` Yinghai Lu
  2012-05-23  6:34 ` [PATCH 07/11] PCI: Don't allocate small resource in big empty space Yinghai Lu
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2012-05-23  6:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andrew Morton, Linus Torvalds, linux-pci, linux-kernel, Yinghai Lu

Find all suitable empty slots and pick one just fit, so we could spare the big
slot for needed ones later.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 kernel/resource.c |   60 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 41d7050..45ab24d 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -435,7 +435,7 @@ static int __find_resource(struct resource *root, struct resource *old,
 			alloc.end = alloc.start + size - 1;
 			if (resource_contains(&avail, &alloc)) {
 				new->start = alloc.start;
-				new->end = alloc.end;
+				new->end = !old ? avail.end : alloc.end;
 				return 0;
 			}
 		}
@@ -450,14 +450,66 @@ next:		if (!this || this->end == root->end)
 	return -EBUSY;
 }
 
+struct avail_resource {
+	struct list_head list;
+	struct resource res;
+};
 /*
  * Find empty slot in the resource tree given range and alignment.
  */
 static int find_resource(struct resource *root, struct resource *new,
 			resource_size_t size,
-			struct resource_constraint  *constraint)
+			struct resource_constraint *constraint, bool fit)
 {
-	return  __find_resource(root, NULL, new, size, constraint);
+	int ret = -1;
+	LIST_HEAD(head);
+	struct avail_resource *avail, *tmp;
+	resource_size_t avail_start = 0, avail_size = -1ULL;
+
+	if (!fit) {
+		ret = __find_resource(root, NULL, new, size, constraint);
+		if (!ret)
+			new->end = new->start + size - 1;
+		return ret;
+	}
+
+again:
+	/* find all suitable ones */
+	avail = kzalloc(sizeof(*avail), GFP_KERNEL);
+	if (!avail)
+		goto out;
+
+	avail->res.start = new->start;
+	avail->res.end = new->end;
+	avail->res.flags = new->flags;
+	ret = __find_resource(root, NULL, &avail->res, size, constraint);
+	if (ret || __request_resource(root, &avail->res)) {
+		ret = -EBUSY;
+		kfree(avail);
+		goto out;
+	}
+	/* add to the list */
+	list_add(&avail->list, &head);
+	goto again;
+
+out:
+	/* pick up the smallest one and delete the list */
+	list_for_each_entry_safe(avail, tmp, &head, list) {
+		if (resource_size(&avail->res) < avail_size) {
+			avail_size = resource_size(&avail->res);
+			avail_start = avail->res.start;
+			ret = 0;
+		}
+		list_del(&avail->list);
+		__release_resource(&avail->res);
+		kfree(avail);
+	}
+
+	if (!ret) {
+		new->start = avail_start;
+		new->end = new->start + size - 1;
+	}
+	return ret;
 }
 
 /**
@@ -550,7 +602,7 @@ static int __allocate_resource(struct resource *root, struct resource *new,
 
 	if (lock)
 		write_lock(&resource_lock);
-	err = find_resource(root, new, size, &constraint);
+	err = find_resource(root, new, size, &constraint, false);
 	if (err >= 0 && __request_resource(root, new))
 		err = -EBUSY;
 	if (lock)
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 07/11] PCI: Don't allocate small resource in big empty space.
  2012-05-23  6:34 [PATCH 00/11] PCI: resource allocation related Yinghai Lu
                   ` (5 preceding siblings ...)
  2012-05-23  6:34 ` [PATCH 06/11] resource: make find_resource could return just fit resource Yinghai Lu
@ 2012-05-23  6:34 ` Yinghai Lu
  2012-05-23  6:34 ` [PATCH 08/11] resource: only return range with needed align Yinghai Lu
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2012-05-23  6:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andrew Morton, Linus Torvalds, linux-pci, linux-kernel, Yinghai Lu

Use updated find_resource to return matched resource instead using head
of bigger range.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/bus.c       |   22 ++++++++++++++++++----
 drivers/pci/setup-bus.c |   12 ++++++++----
 drivers/pci/setup-res.c |   28 ++++++++++++++++++----------
 include/linux/ioport.h  |    8 ++++++++
 include/linux/pci.h     |   10 ++++++++++
 kernel/resource.c       |   22 +++++++++++++++++++---
 6 files changed, 81 insertions(+), 21 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 2429f1f..a7ba102 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -110,14 +110,14 @@ void pci_bus_remove_resources(struct pci_bus *bus)
  * for a specific device resource.
  */
 int
-pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
+pci_bus_alloc_resource_fit(struct pci_bus *bus, struct resource *res,
 		resource_size_t size, resource_size_t align,
 		resource_size_t min, unsigned int type_mask,
 		resource_size_t (*alignf)(void *,
 					  const struct resource *,
 					  resource_size_t,
 					  resource_size_t),
-		void *alignf_data)
+		void *alignf_data, bool fit)
 {
 	int i, ret = -ENOMEM;
 	struct resource *r;
@@ -148,10 +148,10 @@ again:
 			continue;
 
 		/* Ok, try it out.. */
-		ret = allocate_resource(r, res, size,
+		ret = allocate_resource_fit(r, res, size,
 					max(bottom, r->start ? : min),
 					max, align,
-					alignf, alignf_data);
+					alignf, alignf_data, fit);
 		if (ret == 0)
 			return 0;
 	}
@@ -164,6 +164,20 @@ again:
 	return ret;
 }
 
+int
+pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
+		resource_size_t size, resource_size_t align,
+		resource_size_t min, unsigned int type_mask,
+		resource_size_t (*alignf)(void *,
+					  const struct resource *,
+					  resource_size_t,
+					  resource_size_t),
+		void *alignf_data)
+{
+	return pci_bus_alloc_resource_fit(bus, res, size, align, min, type_mask,
+					alignf, alignf_data, false);
+}
+
 /**
  * pci_bus_add_device - add a single device
  * @dev: device to add
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 0568f29..4d6823f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -275,7 +275,7 @@ out:
  * requests that could not satisfied to the failed_list.
  */
 static void assign_requested_resources_sorted(struct list_head *head,
-				 struct list_head *fail_head)
+				 struct list_head *fail_head, bool fit)
 {
 	struct resource *res;
 	struct pci_dev_resource *dev_res;
@@ -285,7 +285,7 @@ static void assign_requested_resources_sorted(struct list_head *head,
 		res = dev_res->res;
 		idx = res - &dev_res->dev->resource[0];
 		if (resource_size(res) &&
-		    pci_assign_resource(dev_res->dev, idx)) {
+		    pci_assign_resource_fit(dev_res->dev, idx, fit)) {
 			if (fail_head) {
 				/*
 				 * if the failed res is for ROM BAR, and it will
@@ -320,6 +320,7 @@ static void __assign_resources_sorted(struct list_head *head,
 	LIST_HEAD(local_fail_head);
 	struct pci_dev_resource *save_res;
 	struct pci_dev_resource *dev_res;
+	bool fit = true;
 
 	/* Check if optional add_size is there */
 	if (!realloc_head || list_empty(realloc_head))
@@ -339,7 +340,7 @@ static void __assign_resources_sorted(struct list_head *head,
 							dev_res->res);
 
 	/* Try updated head list with add_size added */
-	assign_requested_resources_sorted(head, &local_fail_head);
+	assign_requested_resources_sorted(head, &local_fail_head, fit);
 
 	/* all assigned with add_size ? */
 	if (list_empty(&local_fail_head)) {
@@ -366,9 +367,12 @@ static void __assign_resources_sorted(struct list_head *head,
 	}
 	free_list(&save_head);
 
+	/* will need to expand later, so not use fit */
+	fit = false;
+
 requested_and_reassign:
 	/* Satisfy the must-have resource requests */
-	assign_requested_resources_sorted(head, fail_head);
+	assign_requested_resources_sorted(head, fail_head, fit);
 
 	/* Try to satisfy any additional optional resource
 		requests */
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index eea85da..7010ad9 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -128,7 +128,8 @@ void pci_disable_bridge_window(struct pci_dev *dev)
 }
 
 static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
-		int resno, resource_size_t size, resource_size_t align)
+		int resno, resource_size_t size, resource_size_t align,
+		bool fit)
 {
 	struct resource *res = dev->resource + resno;
 	resource_size_t min;
@@ -137,9 +138,9 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 	min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM;
 
 	/* First, try exact prefetching match.. */
-	ret = pci_bus_alloc_resource(bus, res, size, align, min,
+	ret = pci_bus_alloc_resource_fit(bus, res, size, align, min,
 				     IORESOURCE_PREFETCH,
-				     pcibios_align_resource, dev);
+				     pcibios_align_resource, dev, fit);
 
 	if (ret < 0 && (res->flags & IORESOURCE_PREFETCH)) {
 		/*
@@ -148,8 +149,8 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 		 * But a prefetching area can handle a non-prefetching
 		 * window (it will just not perform as well).
 		 */
-		ret = pci_bus_alloc_resource(bus, res, size, align, min, 0,
-					     pcibios_align_resource, dev);
+		ret = pci_bus_alloc_resource_fit(bus, res, size, align, min, 0,
+					     pcibios_align_resource, dev, fit);
 	}
 	return ret;
 }
@@ -206,7 +207,8 @@ static int pci_revert_fw_address(struct resource *res, struct pci_dev *dev,
 	return ret;
 }
 
-static int _pci_assign_resource(struct pci_dev *dev, int resno, int size, resource_size_t min_align)
+static int _pci_assign_resource(struct pci_dev *dev, int resno, int size,
+				 resource_size_t min_align, bool fit)
 {
 	struct resource *res = dev->resource + resno;
 	struct pci_bus *bus;
@@ -214,7 +216,8 @@ static int _pci_assign_resource(struct pci_dev *dev, int resno, int size, resour
 	char *type;
 
 	bus = dev->bus;
-	while ((ret = __pci_assign_resource(bus, dev, resno, size, min_align))) {
+	while ((ret = __pci_assign_resource(bus, dev, resno, size,
+						min_align, fit))) {
 		if (!bus->parent || !bus->self->transparent)
 			break;
 		bus = bus->parent;
@@ -253,7 +256,7 @@ int pci_reassign_resource(struct pci_dev *dev, int resno, resource_size_t addsiz
 
 	/* already aligned with min_align */
 	new_size = resource_size(res) + addsize;
-	ret = _pci_assign_resource(dev, resno, new_size, min_align);
+	ret = _pci_assign_resource(dev, resno, new_size, min_align, false);
 	if (!ret) {
 		res->flags &= ~IORESOURCE_STARTALIGN;
 		dev_info(&dev->dev, "BAR %d: reassigned %pR\n", resno, res);
@@ -263,7 +266,7 @@ int pci_reassign_resource(struct pci_dev *dev, int resno, resource_size_t addsiz
 	return ret;
 }
 
-int pci_assign_resource(struct pci_dev *dev, int resno)
+int pci_assign_resource_fit(struct pci_dev *dev, int resno, bool fit)
 {
 	struct resource *res = dev->resource + resno;
 	resource_size_t align, size;
@@ -279,7 +282,7 @@ int pci_assign_resource(struct pci_dev *dev, int resno)
 
 	bus = dev->bus;
 	size = resource_size(res);
-	ret = _pci_assign_resource(dev, resno, size, align);
+	ret = _pci_assign_resource(dev, resno, size, align, fit);
 
 	/*
 	 * If we failed to assign anything, let's try the address
@@ -298,6 +301,11 @@ int pci_assign_resource(struct pci_dev *dev, int resno)
 	return ret;
 }
 
+int pci_assign_resource(struct pci_dev *dev, int resno)
+{
+	return pci_assign_resource_fit(dev, resno, false);
+}
+
 int pci_enable_resources(struct pci_dev *dev, int mask)
 {
 	u16 cmd, old_cmd;
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 589e0e7..255852e 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -148,6 +148,14 @@ extern struct resource *insert_resource_conflict(struct resource *parent, struct
 extern int insert_resource(struct resource *parent, struct resource *new);
 extern void insert_resource_expand_to_fit(struct resource *root, struct resource *new);
 extern void arch_remove_reservations(struct resource *avail);
+int allocate_resource_fit(struct resource *root, struct resource *new,
+			     resource_size_t size, resource_size_t min,
+			     resource_size_t max, resource_size_t align,
+			     resource_size_t (*alignf)(void *,
+						       const struct resource *,
+						       resource_size_t,
+						       resource_size_t),
+			     void *alignf_data, bool fit);
 extern int allocate_resource(struct resource *root, struct resource *new,
 			     resource_size_t size, resource_size_t min,
 			     resource_size_t max, resource_size_t align,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index a0e2d7f..c0704a0 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -827,6 +827,7 @@ int __pci_reset_function_locked(struct pci_dev *dev);
 int pci_reset_function(struct pci_dev *dev);
 void pci_update_resource(struct pci_dev *dev, int resno);
 int __must_check pci_assign_resource(struct pci_dev *dev, int i);
+int __must_check pci_assign_resource_fit(struct pci_dev *dev, int i, bool fit);
 int __must_check pci_reassign_resource(struct pci_dev *dev, int i, resource_size_t add_size, resource_size_t align);
 int pci_select_bars(struct pci_dev *dev, unsigned long flags);
 
@@ -943,6 +944,15 @@ int __must_check pci_bus_alloc_resource(struct pci_bus *bus,
 						  resource_size_t,
 						  resource_size_t),
 			void *alignf_data);
+int __must_check pci_bus_alloc_resource_fit(struct pci_bus *bus,
+			struct resource *res, resource_size_t size,
+			resource_size_t align, resource_size_t min,
+			unsigned int type_mask,
+			resource_size_t (*alignf)(void *,
+						  const struct resource *,
+						  resource_size_t,
+						  resource_size_t),
+			void *alignf_data, bool fit);
 void pci_enable_bridges(struct pci_bus *bus);
 
 /* Proper probing supporting hot-pluggable devices */
diff --git a/kernel/resource.c b/kernel/resource.c
index 45ab24d..b4dae55 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -580,7 +580,7 @@ static int __allocate_resource(struct resource *root, struct resource *new,
 						const struct resource *,
 						resource_size_t,
 						resource_size_t),
-		      void *alignf_data, bool lock)
+		      void *alignf_data, bool lock, bool fit)
 {
 	int err;
 	struct resource_constraint constraint;
@@ -602,13 +602,28 @@ static int __allocate_resource(struct resource *root, struct resource *new,
 
 	if (lock)
 		write_lock(&resource_lock);
-	err = find_resource(root, new, size, &constraint, false);
+	err = find_resource(root, new, size, &constraint, fit);
 	if (err >= 0 && __request_resource(root, new))
 		err = -EBUSY;
 	if (lock)
 		write_unlock(&resource_lock);
 	return err;
 }
+int allocate_resource_fit(struct resource *root, struct resource *new,
+		      resource_size_t size, resource_size_t min,
+		      resource_size_t max, resource_size_t align,
+		      resource_size_t (*alignf)(void *,
+						const struct resource *,
+						resource_size_t,
+						resource_size_t),
+		      void *alignf_data, bool fit)
+{
+	bool lock = true;
+
+	return __allocate_resource(root, new, size, min, max, align,
+				   alignf, alignf_data, lock, fit);
+}
+
 int allocate_resource(struct resource *root, struct resource *new,
 		      resource_size_t size, resource_size_t min,
 		      resource_size_t max, resource_size_t align,
@@ -619,9 +634,10 @@ int allocate_resource(struct resource *root, struct resource *new,
 		      void *alignf_data)
 {
 	bool lock = true;
+	bool fit = false;
 
 	return __allocate_resource(root, new, size, min, max, align,
-				   alignf, alignf_data, lock);
+				   alignf, alignf_data, lock, fit);
 }
 
 EXPORT_SYMBOL(allocate_resource);
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 08/11] resource: only return range with needed align
  2012-05-23  6:34 [PATCH 00/11] PCI: resource allocation related Yinghai Lu
                   ` (6 preceding siblings ...)
  2012-05-23  6:34 ` [PATCH 07/11] PCI: Don't allocate small resource in big empty space Yinghai Lu
@ 2012-05-23  6:34 ` Yinghai Lu
  2012-05-23  6:34 ` [PATCH 09/11] PCI: Add is_pci_iov_resource_idx() Yinghai Lu
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2012-05-23  6:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andrew Morton, Linus Torvalds, linux-pci, linux-kernel, Yinghai Lu

Compare align between put range in head and tail, pick small one
to leave big one for future user.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 kernel/resource.c |   14 +++++++++++++-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index b4dae55..8277090 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -506,8 +506,20 @@ out:
 	}
 
 	if (!ret) {
-		new->start = avail_start;
+		/* compare which one have max order */
+		new->start = round_down(avail_start + avail_size - size,
+					 constraint->align);
+		new->end = avail_start + avail_size - 1;
+		new->start = constraint->alignf(constraint->alignf_data, new,
+					size, constraint->align);
 		new->end = new->start + size - 1;
+
+		if (new->start < avail_start ||
+		    new->end > (avail_start + avail_size - 1) ||
+		    __ffs64(new->start) >= __ffs64(avail_start)) {
+			new->start = avail_start;
+			new->end = new->start + size - 1;
+		}
 	}
 	return ret;
 }
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 09/11] PCI: Add is_pci_iov_resource_idx()
  2012-05-23  6:34 [PATCH 00/11] PCI: resource allocation related Yinghai Lu
                   ` (7 preceding siblings ...)
  2012-05-23  6:34 ` [PATCH 08/11] resource: only return range with needed align Yinghai Lu
@ 2012-05-23  6:34 ` Yinghai Lu
  2012-05-23  6:34 ` [PATCH 10/11] PCI: Sort unassigned resources with correct alignment Yinghai Lu
  2012-05-23  6:34 ` [PATCH 11/11] PCI: Treat ROM resource as optional during assigning Yinghai Lu
  10 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2012-05-23  6:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andrew Morton, Linus Torvalds, linux-pci, linux-kernel, Yinghai Lu

So we can remove one ifdef in setup-bus.c. and will share the code in that
ifdef block.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c |    7 +++----
 include/linux/pci.h     |    8 ++++++++
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 4d6823f..99bc728 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -813,16 +813,15 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 			if (r->parent || (r->flags & mask) != type)
 				continue;
 			r_size = resource_size(r);
-#ifdef CONFIG_PCI_IOV
+
 			/* put SRIOV requested res to the optional list */
-			if (realloc_head && i >= PCI_IOV_RESOURCES &&
-					i <= PCI_IOV_RESOURCE_END) {
+			if (realloc_head && is_pci_iov_resource_idx(i)) {
 				r->end = r->start - 1;
 				add_to_list(realloc_head, dev, r, r_size, 0/* dont' care */);
 				children_add_size += r_size;
 				continue;
 			}
-#endif
+
 			/* For bridges size != alignment */
 			align = pci_resource_alignment(dev, r);
 			order = __ffs(align) - 20;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c0704a0..0d254d9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -114,6 +114,14 @@ enum {
 	DEVICE_COUNT_RESOURCE = PCI_NUM_RESOURCES,
 };
 
+static inline bool is_pci_iov_resource_idx(int i)
+{
+#ifdef CONFIG_PCI_IOV
+	return i >= PCI_IOV_RESOURCES && i <= PCI_IOV_RESOURCE_END;
+#endif
+	return false;
+}
+
 typedef int __bitwise pci_power_t;
 
 #define PCI_D0		((pci_power_t __force) 0)
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 10/11] PCI: Sort unassigned resources with correct alignment
  2012-05-23  6:34 [PATCH 00/11] PCI: resource allocation related Yinghai Lu
                   ` (8 preceding siblings ...)
  2012-05-23  6:34 ` [PATCH 09/11] PCI: Add is_pci_iov_resource_idx() Yinghai Lu
@ 2012-05-23  6:34 ` Yinghai Lu
  2012-05-23  6:34 ` [PATCH 11/11] PCI: Treat ROM resource as optional during assigning Yinghai Lu
  10 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2012-05-23  6:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andrew Morton, Linus Torvalds, linux-pci, linux-kernel, Yinghai Lu

For SIZEALIGN type resource, we need to add back add_size in optional
resource list during __dev_sort_resources(), otherwise those optional
resources  will get skipped.

SRIOV BAR is specical one, it will always re-read size for BAR.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c |   33 ++++++++++++++++++++++++++-------
 1 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 99bc728..ed32864 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -120,8 +120,25 @@ static resource_size_t get_res_add_size(struct list_head *head,
 	return 0;
 }
 
+static resource_size_t __pci_resource_alignment(
+				struct pci_dev *dev,
+				struct resource *r,
+				struct list_head *realloc_head)
+{
+	resource_size_t r_align, add_size = 0;
+
+	if ((r->flags & IORESOURCE_SIZEALIGN) && realloc_head)
+		add_size = get_res_add_size(realloc_head, r);
+	r->end += add_size;
+	r_align = pci_resource_alignment(dev, r);
+	r->end -= add_size;
+
+	return r_align;
+}
 /* Sort resources by alignment */
-static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
+static void pdev_sort_resources(struct pci_dev *dev,
+				 struct list_head *realloc_head,
+				 struct list_head *head)
 {
 	int i;
 
@@ -139,7 +156,7 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 		if (!(r->flags) || r->parent)
 			continue;
 
-		r_align = pci_resource_alignment(dev, r);
+		r_align = __pci_resource_alignment(dev, r, realloc_head);
 		if (!r_align) {
 			dev_warn(&dev->dev, "BAR %d: %pR has bogus alignment\n",
 				 i, r);
@@ -159,8 +176,9 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 		list_for_each_entry(dev_res, head, list) {
 			resource_size_t align;
 
-			align = pci_resource_alignment(dev_res->dev,
-							 dev_res->res);
+			align = __pci_resource_alignment(dev_res->dev,
+							 dev_res->res,
+							 realloc_head);
 
 			if (r_align > align ||
 			    (r_align == align &&
@@ -175,6 +193,7 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 }
 
 static void __dev_sort_resources(struct pci_dev *dev,
+				 struct list_head *realloc_head,
 				 struct list_head *head)
 {
 	u16 class = dev->class >> 8;
@@ -191,7 +210,7 @@ static void __dev_sort_resources(struct pci_dev *dev,
 			return;
 	}
 
-	pdev_sort_resources(dev, head);
+	pdev_sort_resources(dev, realloc_head, head);
 }
 
 static inline void reset_resource(struct resource *res)
@@ -387,7 +406,7 @@ static void pdev_assign_resources_sorted(struct pci_dev *dev,
 {
 	LIST_HEAD(head);
 
-	__dev_sort_resources(dev, &head);
+	__dev_sort_resources(dev, add_head, &head);
 	__assign_resources_sorted(&head, add_head, fail_head);
 
 }
@@ -400,7 +419,7 @@ static void pbus_assign_resources_sorted(const struct pci_bus *bus,
 	LIST_HEAD(head);
 
 	list_for_each_entry(dev, &bus->devices, bus_list)
-		__dev_sort_resources(dev, &head);
+		__dev_sort_resources(dev, realloc_head, &head);
 
 	__assign_resources_sorted(&head, realloc_head, fail_head);
 }
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 11/11] PCI: Treat ROM resource as optional during assigning.
  2012-05-23  6:34 [PATCH 00/11] PCI: resource allocation related Yinghai Lu
                   ` (9 preceding siblings ...)
  2012-05-23  6:34 ` [PATCH 10/11] PCI: Sort unassigned resources with correct alignment Yinghai Lu
@ 2012-05-23  6:34 ` Yinghai Lu
  10 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2012-05-23  6:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andrew Morton, Linus Torvalds, linux-pci, linux-kernel, Yinghai Lu

So will try to allocate them together with requested ones, if can not assign
them, could go with requested one only, and just skip ROM resource.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c |   21 +++++++--------------
 include/linux/pci.h     |    5 +++++
 2 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index ed32864..41c08d6 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -305,18 +305,10 @@ static void assign_requested_resources_sorted(struct list_head *head,
 		idx = res - &dev_res->dev->resource[0];
 		if (resource_size(res) &&
 		    pci_assign_resource_fit(dev_res->dev, idx, fit)) {
-			if (fail_head) {
-				/*
-				 * if the failed res is for ROM BAR, and it will
-				 * be enabled later, don't add it to the list
-				 */
-				if (!((idx == PCI_ROM_RESOURCE) &&
-				      (!(res->flags & IORESOURCE_ROM_ENABLE))))
-					add_to_list(fail_head,
-						    dev_res->dev, res,
-						    0 /* dont care */,
-						    0 /* dont care */);
-			}
+			if (fail_head)
+				add_to_list(fail_head, dev_res->dev, res,
+					    0 /* dont care */,
+					    0 /* dont care */);
 			reset_resource(res);
 		}
 	}
@@ -833,8 +825,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 				continue;
 			r_size = resource_size(r);
 
-			/* put SRIOV requested res to the optional list */
-			if (realloc_head && is_pci_iov_resource_idx(i)) {
+			/* put SRIOV/ROM requested res to the optional list */
+			if (realloc_head && (is_pci_iov_resource_idx(i) ||
+					     is_pci_rom_resource_idx(i))) {
 				r->end = r->start - 1;
 				add_to_list(realloc_head, dev, r, r_size, 0/* dont' care */);
 				children_add_size += r_size;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0d254d9..08c081c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -122,6 +122,11 @@ static inline bool is_pci_iov_resource_idx(int i)
 	return false;
 }
 
+static inline bool is_pci_rom_resource_idx(int i)
+{
+	return i == PCI_ROM_RESOURCE;
+}
+
 typedef int __bitwise pci_power_t;
 
 #define PCI_D0		((pci_power_t __force) 0)
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 03/11] intel-gtt: Read 64bit for gmar_bus_addr
  2012-05-23  6:34 ` [PATCH 03/11] intel-gtt: Read 64bit for gmar_bus_addr Yinghai Lu
@ 2012-05-23  7:21   ` Dave Airlie
  2012-05-23  7:44     ` Daniel Vetter
  0 siblings, 1 reply; 55+ messages in thread
From: Dave Airlie @ 2012-05-23  7:21 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Andrew Morton, Linus Torvalds, linux-pci,
	linux-kernel, David Airlie, Daniel Vetter

On Wed, May 23, 2012 at 7:34 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> That bar could be 64bit pref mem.
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: David Airlie <airlied@linux.ie>

Adding Daniel Vetter.

Dave.
> ---
>  drivers/char/agp/intel-gtt.c |   14 ++++++++++----
>  1 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
> index 7f025fb..77e150e 100644
> --- a/drivers/char/agp/intel-gtt.c
> +++ b/drivers/char/agp/intel-gtt.c
> @@ -770,16 +770,22 @@ static void i830_write_entry(dma_addr_t addr, unsigned int entry,
>  static bool intel_enable_gtt(void)
>  {
>        u32 gma_addr;
> +       u32 addr_hi = 0;
>        u8 __iomem *reg;
> +       int pos;
>
>        if (INTEL_GTT_GEN <= 2)
> -               pci_read_config_dword(intel_private.pcidev, I810_GMADDR,
> -                                     &gma_addr);
> +               pos = I810_GMADDR;
>        else
> -               pci_read_config_dword(intel_private.pcidev, I915_GMADDR,
> -                                     &gma_addr);
> +               pos = I915_GMADDR;
> +
> +       pci_read_config_dword(intel_private.pcidev, pos, &gma_addr);
> +
> +       if (gma_addr & PCI_BASE_ADDRESS_MEM_TYPE_64)
> +               pci_read_config_dword(intel_private.pcidev, pos + 4, &addr_hi);
>
>        intel_private.gma_bus_addr = (gma_addr & PCI_BASE_ADDRESS_MEM_MASK);
> +       intel_private.gma_bus_addr |= (u64)addr_hi << 32;
>
>        if (INTEL_GTT_GEN >= 6)
>            return true;
> --
> 1.7.7
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 03/11] intel-gtt: Read 64bit for gmar_bus_addr
  2012-05-23  7:21   ` Dave Airlie
@ 2012-05-23  7:44     ` Daniel Vetter
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Vetter @ 2012-05-23  7:44 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Yinghai Lu, Bjorn Helgaas, Andrew Morton, Linus Torvalds,
	linux-pci, linux-kernel, David Airlie, Daniel Vetter

On Wed, May 23, 2012 at 08:21:51AM +0100, Dave Airlie wrote:
> On Wed, May 23, 2012 at 7:34 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> > That bar could be 64bit pref mem.
> >
> > Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> > Cc: David Airlie <airlied@linux.ie>
> 
> Adding Daniel Vetter.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-23  6:34 ` [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first Yinghai Lu
@ 2012-05-23 15:57   ` Linus Torvalds
  2012-05-23 17:30     ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2012-05-23 15:57 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Bjorn Helgaas, Andrew Morton, linux-pci, linux-kernel

On Tue, May 22, 2012 at 11:34 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> and will fall back to below 4g if it can not find any above 4g.

Has this been tested on 32-bit machines without PAE? There might be
things that just happen to work because their allocations were always
done bottom-up.

Or do we have something else that protects us from the "oops, we can't
actually *map* those pages"?

                       Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-23 15:57   ` Linus Torvalds
@ 2012-05-23 17:30     ` Yinghai Lu
  2012-05-23 18:40       ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2012-05-23 17:30 UTC (permalink / raw)
  To: Linus Torvalds, Steven Newbury
  Cc: Bjorn Helgaas, Andrew Morton, linux-pci, linux-kernel

On Wed, May 23, 2012 at 8:57 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, May 22, 2012 at 11:34 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> and will fall back to below 4g if it can not find any above 4g.
>
> Has this been tested on 32-bit machines without PAE? There might be
> things that just happen to work because their allocations were always
> done bottom-up.

Good point. that problem should be addressed at first before this patch.

>
> Or do we have something else that protects us from the "oops, we can't
> actually *map* those pages"?

Steven tested on his setup.

I tested some Infiniband cards.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-23 17:30     ` Yinghai Lu
@ 2012-05-23 18:40       ` Yinghai Lu
  2012-05-25  4:36         ` Bjorn Helgaas
  0 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2012-05-23 18:40 UTC (permalink / raw)
  To: Linus Torvalds, Steven Newbury, H. Peter Anvin
  Cc: Bjorn Helgaas, Andrew Morton, linux-pci, linux-kernel

On Wed, May 23, 2012 at 10:30 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Wed, May 23, 2012 at 8:57 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Tue, May 22, 2012 at 11:34 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> and will fall back to below 4g if it can not find any above 4g.
>>
>> Has this been tested on 32-bit machines without PAE? There might be
>> things that just happen to work because their allocations were always
>> done bottom-up.
>
> Good point. that problem should be addressed at first before this patch.

Just checked code for 32bit machines without PAE.

when X86_PAE is not set, phys_addr_t aka resource_size_t will be 32bit.
so in drivers/pci/bus.c::pci_bus_alloc_resource_fit()
will have bottom to 0.
    resource_size_t bottom = PCIBIOS_MAX_MEM_32 + 1ULL;
also in arch/x86/kernel/setup.c::setup_arch()
   iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
will have iomem_resource.end to 0xffffffff

when X86_PAE is set, but CPU does not support PAE.
phys_addr_t aka resource_size_t will be 32bit.
so in drivers/pci/bus.c::pci_bus_alloc_resource_fit()
will have bottom to 4g.
    resource_size_t bottom = PCIBIOS_MAX_MEM_32 + 1ULL;
but
in arch/x86/kernel/setup.c::setup_arch()
   iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
will have iomem_resource.end to 0xffffffff, because x86_phys_bits is 32 when PAE
is not detected in arch/x86/kernel/cpu/common.c::get_cpu_cap.
that mean first try will fail, so it will go to second try with bottom to 0.

so both case are safe with this patch.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-23 18:40       ` Yinghai Lu
@ 2012-05-25  4:36         ` Bjorn Helgaas
  2012-05-25 17:53           ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Bjorn Helgaas @ 2012-05-25  4:36 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Steven Newbury, H. Peter Anvin, Andrew Morton,
	linux-pci, linux-kernel

On Wed, May 23, 2012 at 11:40:46AM -0700, Yinghai Lu wrote:
> On Wed, May 23, 2012 at 10:30 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Wed, May 23, 2012 at 8:57 AM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >> On Tue, May 22, 2012 at 11:34 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> >>> and will fall back to below 4g if it can not find any above 4g.
> >>
> >> Has this been tested on 32-bit machines without PAE? There might be
> >> things that just happen to work because their allocations were always
> >> done bottom-up.
> >
> > Good point. that problem should be addressed at first before this patch.
> 
> Just checked code for 32bit machines without PAE.
> 
> when X86_PAE is not set, phys_addr_t aka resource_size_t will be 32bit.
> so in drivers/pci/bus.c::pci_bus_alloc_resource_fit()
> will have bottom to 0.
>     resource_size_t bottom = PCIBIOS_MAX_MEM_32 + 1ULL;
> also in arch/x86/kernel/setup.c::setup_arch()
>    iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
> will have iomem_resource.end to 0xffffffff
> 
> when X86_PAE is set, but CPU does not support PAE.
> phys_addr_t aka resource_size_t will be 32bit.

I think you meant phys_addr_t and resource_size_t will be *64* bit
when X86_PAE is set.  Obvious to you, but quite confusing to non-x86
experts like me :)

> so in drivers/pci/bus.c::pci_bus_alloc_resource_fit()
> will have bottom to 4g.
>     resource_size_t bottom = PCIBIOS_MAX_MEM_32 + 1ULL;
> but
> in arch/x86/kernel/setup.c::setup_arch()
>    iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
> will have iomem_resource.end to 0xffffffff, because x86_phys_bits is 32 when PAE
> is not detected in arch/x86/kernel/cpu/common.c::get_cpu_cap.
> that mean first try will fail, so it will go to second try with bottom to 0.
> 
> so both case are safe with this patch.

I don't really like the dependency on PCIBIOS_MAX_MEM_32 + 1ULL
overflowing to zero -- that means the reader has to know what the
value of PCIBIOS_MAX_MEM_32 is, and things would break in non-obvious
ways if we changed it.

What do you think of a patch like the following?  It makes it
explicit that we can only allocate space the CPU can address.

commit feded2ae21d6160292726ccd5128080d42395be4
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Thu May 24 22:15:26 2012 -0600

    PCI: try to allocate 64-bit resources above 4GB
    
    If we have a 64-bit resource, try to allocate it above 4GB first.  If that
    fails, either because there's no space or the CPU can't address space above
    4GB (iomem_resource.end is the highest address the CPU supports), we'll
    fall back to allocating space below 4GB.

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 4ce5ef2..2c56693 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -121,14 +121,18 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 {
 	int i, ret = -ENOMEM;
 	struct resource *r;
-	resource_size_t max = -1;
+	resource_size_t start = 0;
+	resource_size_t end = PCIBIOS_MAX_MEM_32;
 
 	type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
 
-	/* don't allocate too high if the pref mem doesn't support 64bit*/
-	if (!(res->flags & IORESOURCE_MEM_64))
-		max = PCIBIOS_MAX_MEM_32;
+	/* If this is a 64-bit resource, prefer space above 4GB */
+	if (res->flags & IORESOURCE_MEM_64) {
+		start = PCIBIOS_MAX_MEM_32 + 1ULL;
+		end = iomem_resource.end;
+	}
 
+again:
 	pci_bus_for_each_resource(bus, r, i) {
 		if (!r)
 			continue;
@@ -145,12 +149,18 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 
 		/* Ok, try it out.. */
 		ret = allocate_resource(r, res, size,
-					r->start ? : min,
-					max, align,
+					max(start, r->start ? : min),
+					end, align,
 					alignf, alignf_data);
 		if (ret == 0)
-			break;
+			return 0;
 	}
+
+	if (start != 0) {
+		start = 0;
+		goto again;
+	}
+
 	return ret;
 }
 

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-25  4:36         ` Bjorn Helgaas
@ 2012-05-25 17:53           ` Yinghai Lu
  2012-05-25 18:39             ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2012-05-25 17:53 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Linus Torvalds, Steven Newbury, H. Peter Anvin, Andrew Morton,
	linux-pci, linux-kernel

On Thu, May 24, 2012 at 9:36 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Wed, May 23, 2012 at 11:40:46AM -0700, Yinghai Lu wrote:
>> On Wed, May 23, 2012 at 10:30 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>> > On Wed, May 23, 2012 at 8:57 AM, Linus Torvalds
>> > <torvalds@linux-foundation.org> wrote:
>> >> On Tue, May 22, 2012 at 11:34 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> >>> and will fall back to below 4g if it can not find any above 4g.
>> >>
>> >> Has this been tested on 32-bit machines without PAE? There might be
>> >> things that just happen to work because their allocations were always
>> >> done bottom-up.
>> >
>> > Good point. that problem should be addressed at first before this patch.
>>
>> Just checked code for 32bit machines without PAE.
>>
>> when X86_PAE is not set, phys_addr_t aka resource_size_t will be 32bit.
>> so in drivers/pci/bus.c::pci_bus_alloc_resource_fit()
>> will have bottom to 0.
>>     resource_size_t bottom = PCIBIOS_MAX_MEM_32 + 1ULL;
>> also in arch/x86/kernel/setup.c::setup_arch()
>>    iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
>> will have iomem_resource.end to 0xffffffff
>>
>> when X86_PAE is set, but CPU does not support PAE.
>> phys_addr_t aka resource_size_t will be 32bit.
>
> I think you meant phys_addr_t and resource_size_t will be *64* bit
> when X86_PAE is set.  Obvious to you, but quite confusing to non-x86
> experts like me :)
>
>> so in drivers/pci/bus.c::pci_bus_alloc_resource_fit()
>> will have bottom to 4g.
>>     resource_size_t bottom = PCIBIOS_MAX_MEM_32 + 1ULL;
>> but
>> in arch/x86/kernel/setup.c::setup_arch()
>>    iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
>> will have iomem_resource.end to 0xffffffff, because x86_phys_bits is 32 when PAE
>> is not detected in arch/x86/kernel/cpu/common.c::get_cpu_cap.
>> that mean first try will fail, so it will go to second try with bottom to 0.
>>
>> so both case are safe with this patch.
>
> I don't really like the dependency on PCIBIOS_MAX_MEM_32 + 1ULL
> overflowing to zero -- that means the reader has to know what the
> value of PCIBIOS_MAX_MEM_32 is, and things would break in non-obvious
> ways if we changed it.
>
> What do you think of a patch like the following?  It makes it
> explicit that we can only allocate space the CPU can address.
>
> commit feded2ae21d6160292726ccd5128080d42395be4
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Thu May 24 22:15:26 2012 -0600
>
>    PCI: try to allocate 64-bit resources above 4GB
>
>    If we have a 64-bit resource, try to allocate it above 4GB first.  If that
>    fails, either because there's no space or the CPU can't address space above
>    4GB (iomem_resource.end is the highest address the CPU supports), we'll
>    fall back to allocating space below 4GB.
>
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 4ce5ef2..2c56693 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -121,14 +121,18 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
>  {
>        int i, ret = -ENOMEM;
>        struct resource *r;
> -       resource_size_t max = -1;
> +       resource_size_t start = 0;
> +       resource_size_t end = PCIBIOS_MAX_MEM_32;
>
>        type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
>
> -       /* don't allocate too high if the pref mem doesn't support 64bit*/
> -       if (!(res->flags & IORESOURCE_MEM_64))
> -               max = PCIBIOS_MAX_MEM_32;
> +       /* If this is a 64-bit resource, prefer space above 4GB */
> +       if (res->flags & IORESOURCE_MEM_64) {
> +               start = PCIBIOS_MAX_MEM_32 + 1ULL;
> +               end = iomem_resource.end;

but here we still have PCIBIOS_MAX_MEM_32 + 1ULL ...will still have
overflow to 0..

also because all mmio will in iomem_resource, so we don't need to
specify it, and still keep using -1 as max.
aka avoid referring global iomem_resource here.

So this version is the same as old version, and just reverse checking
           res->flags & IORESOURCE_MEM_64

> +       }
>
> +again:
>        pci_bus_for_each_resource(bus, r, i) {
>                if (!r)
>                        continue;
> @@ -145,12 +149,18 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
>
>                /* Ok, try it out.. */
>                ret = allocate_resource(r, res, size,
> -                                       r->start ? : min,
> -                                       max, align,
> +                                       max(start, r->start ? : min),
> +                                       end, align,
>                                        alignf, alignf_data);
>                if (ret == 0)
> -                       break;
> +                       return 0;
>        }
> +
> +       if (start != 0) {
> +               start = 0;
> +               goto again;
> +       }
> +
>        return ret;
>  }
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-25 17:53           ` Yinghai Lu
@ 2012-05-25 18:39             ` Yinghai Lu
  2012-05-25 19:37               ` Bjorn Helgaas
  0 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2012-05-25 18:39 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Linus Torvalds, Steven Newbury, H. Peter Anvin, Andrew Morton,
	linux-pci, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 841 bytes --]

On Fri, May 25, 2012 at 10:53 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>> I don't really like the dependency on PCIBIOS_MAX_MEM_32 + 1ULL
>> overflowing to zero -- that means the reader has to know what the
>> value of PCIBIOS_MAX_MEM_32 is, and things would break in non-obvious
>> ways if we changed it.
>>

please check if attached one is more clear.

make max and bottom is only related to _MEM and not default one.

-       if (!(res->flags & IORESOURCE_MEM_64))
-               max = PCIBIOS_MAX_MEM_32;
+       if (res->flags & IORESOURCE_MEM) {
+               if (!(res->flags & IORESOURCE_MEM_64))
+                       max = PCIBIOS_MAX_MEM_32;
+               else if (PCIBIOS_MAX_MEM_32 != -1)
+                       bottom = (resource_size_t)(1ULL<<32);
+       }

will still not affect to other arches.


Thanks

Yinghai

[-- Attachment #2: allocate_high_at_first.patch --]
[-- Type: application/octet-stream, Size: 1948 bytes --]

Subject: [PATCH] PCI: Try to allocate mem64 above 4G at first

and will fall back to below 4g if it can not find any above 4g.

only x86 have PCIBIOS_MAX_MEM_32 set to 0xffffffff.
it will only affect x86_64 and 32bit with resource_size_t 64bit support.
only that case bottom is set to 4g for first try.

x86 32bit without X86_PAE support will have bottom set to 0, becuase
resource_size_t is 32bit.

Also for 32bit with resource_size_t 64bit kernel on machine with pae support
we are safe because iomem_resource is limited to 32bit according to
x86_phys_bits.

-v2: update bottom assigning to make it clear for non-pae support machine.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/bus.c |   20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

Index: linux-2.6/drivers/pci/bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/bus.c
+++ linux-2.6/drivers/pci/bus.c
@@ -122,13 +122,19 @@ pci_bus_alloc_resource(struct pci_bus *b
 	int i, ret = -ENOMEM;
 	struct resource *r;
 	resource_size_t max = -1;
+	resource_size_t bottom = 0;
 
 	type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
 
 	/* don't allocate too high if the pref mem doesn't support 64bit*/
-	if (!(res->flags & IORESOURCE_MEM_64))
-		max = PCIBIOS_MAX_MEM_32;
+	if (res->flags & IORESOURCE_MEM) {
+		if (!(res->flags & IORESOURCE_MEM_64))
+			max = PCIBIOS_MAX_MEM_32;
+		else if (PCIBIOS_MAX_MEM_32 != -1)
+			bottom = (resource_size_t)(1ULL<<32);
+	}
 
+again:
 	pci_bus_for_each_resource(bus, r, i) {
 		if (!r)
 			continue;
@@ -145,12 +151,18 @@ pci_bus_alloc_resource(struct pci_bus *b
 
 		/* Ok, try it out.. */
 		ret = allocate_resource(r, res, size,
-					r->start ? : min,
+					max(bottom, r->start ? : min),
 					max, align,
 					alignf, alignf_data);
 		if (ret == 0)
-			break;
+			return 0;
+	}
+
+	if (bottom != 0) {
+		bottom = 0;
+		goto again;
 	}
+
 	return ret;
 }
 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-25 18:39             ` Yinghai Lu
@ 2012-05-25 19:37               ` Bjorn Helgaas
  2012-05-25 20:18                 ` H. Peter Anvin
  2012-05-25 20:19                 ` Yinghai Lu
  0 siblings, 2 replies; 55+ messages in thread
From: Bjorn Helgaas @ 2012-05-25 19:37 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Steven Newbury, H. Peter Anvin, Andrew Morton,
	linux-pci, linux-kernel

On Fri, May 25, 2012 at 11:39:26AM -0700, Yinghai Lu wrote:
> On Fri, May 25, 2012 at 10:53 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> >> I don't really like the dependency on PCIBIOS_MAX_MEM_32 + 1ULL
> >> overflowing to zero -- that means the reader has to know what the
> >> value of PCIBIOS_MAX_MEM_32 is, and things would break in non-obvious
> >> ways if we changed it.
> >>
> 
> please check if attached one is more clear.
> 
> make max and bottom is only related to _MEM and not default one.
> 
> -       if (!(res->flags & IORESOURCE_MEM_64))
> -               max = PCIBIOS_MAX_MEM_32;
> +       if (res->flags & IORESOURCE_MEM) {
> +               if (!(res->flags & IORESOURCE_MEM_64))
> +                       max = PCIBIOS_MAX_MEM_32;
> +               else if (PCIBIOS_MAX_MEM_32 != -1)
> +                       bottom = (resource_size_t)(1ULL<<32);
> +       }
> 
> will still not affect to other arches.

That's goofy.  You're proposing to make only x86_64 and x86-PAE try to put
64-bit BARs above 4GB.  Why should this be specific to x86?  I acknowledge
that there's risk in doing this, but if it's a good idea for x86_64, it
should also be a good idea for other 64-bit architectures.

And testing for "is this x86_32 without PAE?" with
"PCIBIOS_MAX_MEM_32 == -1" is just plain obtuse and hides an
important bit of arch-specific behavior.

Tangential question about allocate_resource():  Is its "max" argument
really necessary?  We'll obviously only allocate from inside the root
resource, so "max" is just a way to artificially avoid the end of
that resource.  Is there really a case where that's required?

"min" makes sense because in a case like this, it's valid to allocate from
anywhere in the root resource, but we want to try to allocate from the >4GB
part first, then fall back to allocating from the whole resource.  I'm not
sure there's a corresponding case for "max."

Getting back to this patch, I don't think we should need to adjust "max" at
all.  For example, this:

commit cb1c8e46244cfd84a1a2fe91be860a74c1cf4e25
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Thu May 24 22:15:26 2012 -0600

    PCI: try to allocate 64-bit mem resources above 4GB
    
    If we have a 64-bit mem resource, try to allocate it above 4GB first.  If
    that fails, we'll fall back to allocating space below 4GB.

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 4ce5ef2..075e5b1 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -121,14 +121,16 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 {
 	int i, ret = -ENOMEM;
 	struct resource *r;
-	resource_size_t max = -1;
+	resource_size_t start = 0;
+	resource_size_t end = MAX_RESOURCE;
 
 	type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
 
-	/* don't allocate too high if the pref mem doesn't support 64bit*/
-	if (!(res->flags & IORESOURCE_MEM_64))
-		max = PCIBIOS_MAX_MEM_32;
+	/* If this is a 64-bit mem resource, try above 4GB first */
+	if (res->flags & IORESOURCE_MEM_64)
+		start = (resource_size_t) (1ULL << 32);
 
+again:
 	pci_bus_for_each_resource(bus, r, i) {
 		if (!r)
 			continue;
@@ -145,12 +147,18 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 
 		/* Ok, try it out.. */
 		ret = allocate_resource(r, res, size,
-					r->start ? : min,
-					max, align,
+					max(start, r->start ? : min),
+					end, align,
 					alignf, alignf_data);
 		if (ret == 0)
-			break;
+			return 0;
+	}
+
+	if (start != 0) {
+		start = 0;
+		goto again;
 	}
+
 	return ret;
 }
 

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-25 19:37               ` Bjorn Helgaas
@ 2012-05-25 20:18                 ` H. Peter Anvin
  2012-05-25 20:19                 ` Yinghai Lu
  1 sibling, 0 replies; 55+ messages in thread
From: H. Peter Anvin @ 2012-05-25 20:18 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Yinghai Lu, Linus Torvalds, Steven Newbury, Andrew Morton,
	linux-pci, linux-kernel

On 05/25/2012 12:37 PM, Bjorn Helgaas wrote:
> 
> That's goofy.  You're proposing to make only x86_64 and x86-PAE try to put
> 64-bit BARs above 4GB.  Why should this be specific to x86?  I acknowledge
> that there's risk in doing this, but if it's a good idea for x86_64, it
> should also be a good idea for other 64-bit architectures.
> 

And so it is (assuming, of course, that we can address that memory.)

	-hpa


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-25 19:37               ` Bjorn Helgaas
  2012-05-25 20:18                 ` H. Peter Anvin
@ 2012-05-25 20:19                 ` Yinghai Lu
  2012-05-25 21:55                   ` Bjorn Helgaas
  1 sibling, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2012-05-25 20:19 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Linus Torvalds, Steven Newbury, H. Peter Anvin, Andrew Morton,
	linux-pci, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3819 bytes --]

On Fri, May 25, 2012 at 12:37 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, May 25, 2012 at 11:39:26AM -0700, Yinghai Lu wrote:
>> On Fri, May 25, 2012 at 10:53 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>> >> I don't really like the dependency on PCIBIOS_MAX_MEM_32 + 1ULL
>> >> overflowing to zero -- that means the reader has to know what the
>> >> value of PCIBIOS_MAX_MEM_32 is, and things would break in non-obvious
>> >> ways if we changed it.
>> >>
>>
>> please check if attached one is more clear.
>>
>> make max and bottom is only related to _MEM and not default one.
>>
>> -       if (!(res->flags & IORESOURCE_MEM_64))
>> -               max = PCIBIOS_MAX_MEM_32;
>> +       if (res->flags & IORESOURCE_MEM) {
>> +               if (!(res->flags & IORESOURCE_MEM_64))
>> +                       max = PCIBIOS_MAX_MEM_32;
>> +               else if (PCIBIOS_MAX_MEM_32 != -1)
>> +                       bottom = (resource_size_t)(1ULL<<32);
>> +       }
>>
>> will still not affect to other arches.
>
> That's goofy.  You're proposing to make only x86_64 and x86-PAE try to put
> 64-bit BARs above 4GB.  Why should this be specific to x86?  I acknowledge
> that there's risk in doing this, but if it's a good idea for x86_64, it
> should also be a good idea for other 64-bit architectures.
>
> And testing for "is this x86_32 without PAE?" with
> "PCIBIOS_MAX_MEM_32 == -1" is just plain obtuse and hides an
> important bit of arch-specific behavior.
>
> Tangential question about allocate_resource():  Is its "max" argument
> really necessary?  We'll obviously only allocate from inside the root
> resource, so "max" is just a way to artificially avoid the end of
> that resource.  Is there really a case where that's required?
>
> "min" makes sense because in a case like this, it's valid to allocate from
> anywhere in the root resource, but we want to try to allocate from the >4GB
> part first, then fall back to allocating from the whole resource.  I'm not
> sure there's a corresponding case for "max."
>
> Getting back to this patch, I don't think we should need to adjust "max" at
> all.  For example, this:
>
> commit cb1c8e46244cfd84a1a2fe91be860a74c1cf4e25
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Thu May 24 22:15:26 2012 -0600
>
>    PCI: try to allocate 64-bit mem resources above 4GB
>
>    If we have a 64-bit mem resource, try to allocate it above 4GB first.  If
>    that fails, we'll fall back to allocating space below 4GB.
>
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 4ce5ef2..075e5b1 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -121,14 +121,16 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
>  {
>        int i, ret = -ENOMEM;
>        struct resource *r;
> -       resource_size_t max = -1;
> +       resource_size_t start = 0;
> +       resource_size_t end = MAX_RESOURCE;

yeah, MAX_RESOURCE is better than -1.

>
>        type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
>
> -       /* don't allocate too high if the pref mem doesn't support 64bit*/
> -       if (!(res->flags & IORESOURCE_MEM_64))
> -               max = PCIBIOS_MAX_MEM_32;

can not remove this one.
otherwise will could allocate above 4g range to non MEM64 resource.


> +       /* If this is a 64-bit mem resource, try above 4GB first */
> +       if (res->flags & IORESOURCE_MEM_64)
> +               start = (resource_size_t) (1ULL << 32);

could affect other arches. let's see if other arches is ok.

please check merged version.

also we have

include/linux/range.h:#define MAX_RESOURCE ((resource_size_t)~0)
arch/x86/kernel/e820.c:#define MAX_RESOURCE_SIZE ((resource_size_t)-1)

we should merge them later?

Thanks

Yinghai

[-- Attachment #2: allocate_high_at_first_v3.patch --]
[-- Type: application/octet-stream, Size: 2249 bytes --]

Subject: [PATCH] PCI: Try to allocate mem64 above 4G at first

and will fall back to below 4g if it can not find any above 4g.

only x86 have PCIBIOS_MAX_MEM_32 set to 0xffffffff.
it will only affect x86_64 and 32bit with resource_size_t 64bit support.
only that case bottom is set to 4g for first try.

x86 32bit without X86_PAE support will have bottom set to 0, becuase
resource_size_t is 32bit.

Also for 32bit with resource_size_t 64bit kernel on machine with pae support
we are safe because iomem_resource is limited to 32bit according to
x86_phys_bits.

-v2: update bottom assigning to make it clear for non-pae support machine.
-v3: Bjorn's change:
	use MAX_REOURCE instead of -1
	use start/end instead of bottom/max
	for all arch instead of just x86_64

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/bus.c |   29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

Index: linux-2.6/drivers/pci/bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/bus.c
+++ linux-2.6/drivers/pci/bus.c
@@ -121,14 +121,23 @@ pci_bus_alloc_resource(struct pci_bus *b
 {
 	int i, ret = -ENOMEM;
 	struct resource *r;
-	resource_size_t max = -1;
+	resource_size_t end = MAX_RESOURCE;
+	resource_size_t start = 0;
 
 	type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
 
-	/* don't allocate too high if the pref mem doesn't support 64bit*/
-	if (!(res->flags & IORESOURCE_MEM_64))
-		max = PCIBIOS_MAX_MEM_32;
+	/*
+	 * don't allocate too high if the pref mem doesn't support 64bit
+	 * also if this is a 64-bit mem resource, try above 4GB first
+	 */
+	if (res->flags & IORESOURCE_MEM) {
+		if (!(res->flags & IORESOURCE_MEM_64))
+			end = PCIBIOS_MAX_MEM_32;
+		else
+			start = (resource_size_t)(1ULL<<32);
+	}
 
+again:
 	pci_bus_for_each_resource(bus, r, i) {
 		if (!r)
 			continue;
@@ -145,12 +154,18 @@ pci_bus_alloc_resource(struct pci_bus *b
 
 		/* Ok, try it out.. */
 		ret = allocate_resource(r, res, size,
-					r->start ? : min,
-					max, align,
+					max(start, r->start ? : min),
+					end, align,
 					alignf, alignf_data);
 		if (ret == 0)
-			break;
+			return 0;
+	}
+
+	if (start != 0) {
+		start = 0;
+		goto again;
 	}
+
 	return ret;
 }
 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-25 20:19                 ` Yinghai Lu
@ 2012-05-25 21:55                   ` Bjorn Helgaas
  2012-05-25 21:58                     ` H. Peter Anvin
  2012-05-25 23:10                     ` Yinghai Lu
  0 siblings, 2 replies; 55+ messages in thread
From: Bjorn Helgaas @ 2012-05-25 21:55 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Steven Newbury, H. Peter Anvin, Andrew Morton,
	linux-pci, linux-kernel

On Fri, May 25, 2012 at 2:19 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, May 25, 2012 at 12:37 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Fri, May 25, 2012 at 11:39:26AM -0700, Yinghai Lu wrote:
>>> On Fri, May 25, 2012 at 10:53 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> >> I don't really like the dependency on PCIBIOS_MAX_MEM_32 + 1ULL
>>> >> overflowing to zero -- that means the reader has to know what the
>>> >> value of PCIBIOS_MAX_MEM_32 is, and things would break in non-obvious
>>> >> ways if we changed it.
>>> >>
>>>
>>> please check if attached one is more clear.
>>>
>>> make max and bottom is only related to _MEM and not default one.
>>>
>>> -       if (!(res->flags & IORESOURCE_MEM_64))
>>> -               max = PCIBIOS_MAX_MEM_32;
>>> +       if (res->flags & IORESOURCE_MEM) {
>>> +               if (!(res->flags & IORESOURCE_MEM_64))
>>> +                       max = PCIBIOS_MAX_MEM_32;
>>> +               else if (PCIBIOS_MAX_MEM_32 != -1)
>>> +                       bottom = (resource_size_t)(1ULL<<32);
>>> +       }
>>>
>>> will still not affect to other arches.
>>
>> That's goofy.  You're proposing to make only x86_64 and x86-PAE try to put
>> 64-bit BARs above 4GB.  Why should this be specific to x86?  I acknowledge
>> that there's risk in doing this, but if it's a good idea for x86_64, it
>> should also be a good idea for other 64-bit architectures.
>>
>> And testing for "is this x86_32 without PAE?" with
>> "PCIBIOS_MAX_MEM_32 == -1" is just plain obtuse and hides an
>> important bit of arch-specific behavior.
>>
>> Tangential question about allocate_resource():  Is its "max" argument
>> really necessary?  We'll obviously only allocate from inside the root
>> resource, so "max" is just a way to artificially avoid the end of
>> that resource.  Is there really a case where that's required?
>>
>> "min" makes sense because in a case like this, it's valid to allocate from
>> anywhere in the root resource, but we want to try to allocate from the >4GB
>> part first, then fall back to allocating from the whole resource.  I'm not
>> sure there's a corresponding case for "max."
>>
>> Getting back to this patch, I don't think we should need to adjust "max" at
>> all.  For example, this:
>>
>> commit cb1c8e46244cfd84a1a2fe91be860a74c1cf4e25
>> Author: Bjorn Helgaas <bhelgaas@google.com>
>> Date:   Thu May 24 22:15:26 2012 -0600
>>
>>    PCI: try to allocate 64-bit mem resources above 4GB
>>
>>    If we have a 64-bit mem resource, try to allocate it above 4GB first.  If
>>    that fails, we'll fall back to allocating space below 4GB.
>>
>> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
>> index 4ce5ef2..075e5b1 100644
>> --- a/drivers/pci/bus.c
>> +++ b/drivers/pci/bus.c
>> @@ -121,14 +121,16 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
>>  {
>>        int i, ret = -ENOMEM;
>>        struct resource *r;
>> -       resource_size_t max = -1;
>> +       resource_size_t start = 0;
>> +       resource_size_t end = MAX_RESOURCE;
>
> yeah, MAX_RESOURCE is better than -1.
>
>>
>>        type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
>>
>> -       /* don't allocate too high if the pref mem doesn't support 64bit*/
>> -       if (!(res->flags & IORESOURCE_MEM_64))
>> -               max = PCIBIOS_MAX_MEM_32;
>
> can not remove this one.
> otherwise will could allocate above 4g range to non MEM64 resource.

Yeah, I convince myself of the dumbest things sometimes.  It occurred
to me while driving home that we need this, but you beat me to it :)

I think we actually have a separate bug here.  On 64-bit non-x86
architectures, PCIBIOS_MAX_MEM_32 is a 64-bit -1, so the following
attempt to avoid putting a 32-bit BAR above 4G only works on x86,
where PCIBIOS_MAX_MEM_32 is 0xffffffff.

        /* don't allocate too high if the pref mem doesn't support 64bit*/
        if (!(res->flags & IORESOURCE_MEM_64))
                max = PCIBIOS_MAX_MEM_32;

I think we should fix this with a separate patch that removes
PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
0xffffffff (or some other "max 32-bit value" symbol).  I don't think
there's anything arch-specific about this.

So I'd like to see two patches here:
  1) Avoid allocating 64-bit regions for 32-bit BARs
  2) Try to allocate regions above 4GB for 64-bit BARs

> also we have
>
> include/linux/range.h:#define MAX_RESOURCE ((resource_size_t)~0)
> arch/x86/kernel/e820.c:#define MAX_RESOURCE_SIZE ((resource_size_t)-1)
>
> we should merge them later?

I would support that.

Bjorn

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-25 21:55                   ` Bjorn Helgaas
@ 2012-05-25 21:58                     ` H. Peter Anvin
  2012-05-25 22:14                       ` Bjorn Helgaas
  2012-05-25 23:10                     ` Yinghai Lu
  1 sibling, 1 reply; 55+ messages in thread
From: H. Peter Anvin @ 2012-05-25 21:58 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Yinghai Lu, Linus Torvalds, Steven Newbury, Andrew Morton,
	linux-pci, linux-kernel

On 05/25/2012 02:55 PM, Bjorn Helgaas wrote:
> 
> I think we actually have a separate bug here.  On 64-bit non-x86
> architectures, PCIBIOS_MAX_MEM_32 is a 64-bit -1, so the following
> attempt to avoid putting a 32-bit BAR above 4G only works on x86,
> where PCIBIOS_MAX_MEM_32 is 0xffffffff.
> 
>         /* don't allocate too high if the pref mem doesn't support 64bit*/
>         if (!(res->flags & IORESOURCE_MEM_64))
>                 max = PCIBIOS_MAX_MEM_32;
> 
> I think we should fix this with a separate patch that removes
> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
> 0xffffffff (or some other "max 32-bit value" symbol).  I don't think
> there's anything arch-specific about this.
> 
> So I'd like to see two patches here:
>   1) Avoid allocating 64-bit regions for 32-bit BARs
>   2) Try to allocate regions above 4GB for 64-bit BARs
> 

Do we also need to track the maximum address available to the CPU?

	-hpa


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-25 21:58                     ` H. Peter Anvin
@ 2012-05-25 22:14                       ` Bjorn Helgaas
  0 siblings, 0 replies; 55+ messages in thread
From: Bjorn Helgaas @ 2012-05-25 22:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Linus Torvalds, Steven Newbury, Andrew Morton,
	linux-pci, linux-kernel

On Fri, May 25, 2012 at 3:58 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 05/25/2012 02:55 PM, Bjorn Helgaas wrote:
>>
>> I think we actually have a separate bug here.  On 64-bit non-x86
>> architectures, PCIBIOS_MAX_MEM_32 is a 64-bit -1, so the following
>> attempt to avoid putting a 32-bit BAR above 4G only works on x86,
>> where PCIBIOS_MAX_MEM_32 is 0xffffffff.
>>
>>         /* don't allocate too high if the pref mem doesn't support 64bit*/
>>         if (!(res->flags & IORESOURCE_MEM_64))
>>                 max = PCIBIOS_MAX_MEM_32;
>>
>> I think we should fix this with a separate patch that removes
>> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
>> 0xffffffff (or some other "max 32-bit value" symbol).  I don't think
>> there's anything arch-specific about this.
>>
>> So I'd like to see two patches here:
>>   1) Avoid allocating 64-bit regions for 32-bit BARs
>>   2) Try to allocate regions above 4GB for 64-bit BARs
>
> Do we also need to track the maximum address available to the CPU?

We are allocating from the resources available on this bus, which
means the host bridge apertures or a subset.  If the arch supplies
those, that should be enough.  If it doesn't, we default to
iomem_resource.  On x86, we trim that down to only the supported
address space:

        iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;

But I'm sure some other architectures don't do that yet.  I guess
that's one of the risks -- if an arch is doesn't tell us the apertures
and doesn't trim iomem_resource, we could allocate a non-addressable
region.

Bjorn

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-25 21:55                   ` Bjorn Helgaas
  2012-05-25 21:58                     ` H. Peter Anvin
@ 2012-05-25 23:10                     ` Yinghai Lu
  2012-05-26  0:12                       ` Bjorn Helgaas
  1 sibling, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2012-05-25 23:10 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Linus Torvalds, Steven Newbury, H. Peter Anvin, Andrew Morton,
	linux-pci, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 549 bytes --]

On Fri, May 25, 2012 at 2:55 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> I think we should fix this with a separate patch that removes
> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
> 0xffffffff (or some other "max 32-bit value" symbol).  I don't think
> there's anything arch-specific about this.
>
> So I'd like to see two patches here:
>  1) Avoid allocating 64-bit regions for 32-bit BARs
>  2) Try to allocate regions above 4GB for 64-bit BARs

Sure. please check updated two patches.

Thanks

Yinghai

[-- Attachment #2: 32_bit_bar_allocation.patch --]
[-- Type: application/octet-stream, Size: 2556 bytes --]

Subject: [PATCH] PCI: cap resource allocation for 32bit bar for all archititures

Should not limit to x86 platform, that is pci requirement.

Add PCI_MAX_RESOURCE_32 and use that instead.

Also kill PCIBIOS_MAX_MEM_32.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/include/asm/pci.h |    1 -
 drivers/pci/bus.c          |    4 ++--
 drivers/pci/pci.h          |    3 +++
 include/linux/pci.h        |    4 ----
 4 files changed, 5 insertions(+), 7 deletions(-)

Index: linux-2.6/drivers/pci/bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/bus.c
+++ linux-2.6/drivers/pci/bus.c
@@ -121,13 +121,13 @@ pci_bus_alloc_resource(struct pci_bus *b
 {
 	int i, ret = -ENOMEM;
 	struct resource *r;
-	resource_size_t max = -1;
+	resource_size_t max = PCI_MAX_RESOURCE;
 
 	type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
 
 	/* don't allocate too high if the pref mem doesn't support 64bit*/
 	if (!(res->flags & IORESOURCE_MEM_64))
-		max = PCIBIOS_MAX_MEM_32;
+		max = PCI_MAX_RESOURCE_32;
 
 	pci_bus_for_each_resource(bus, r, i) {
 		if (!r)
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -208,6 +208,9 @@ enum pci_bar_type {
 	pci_bar_mem64,		/* A 64-bit memory BAR */
 };
 
+#define PCI_MAX_RESOURCE	((resource_size_t)~0)
+#define PCI_MAX_RESOURCE_32	((resource_size_t)0xffffffff)
+
 bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl,
 				int crs_timeout);
 extern int pci_setup_device(struct pci_dev *dev);
Index: linux-2.6/arch/x86/include/asm/pci.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/pci.h
+++ linux-2.6/arch/x86/include/asm/pci.h
@@ -149,7 +149,6 @@ void default_restore_msi_irqs(struct pci
 
 /* generic pci stuff */
 #include <asm-generic/pci.h>
-#define PCIBIOS_MAX_MEM_32 0xffffffff
 
 #ifdef CONFIG_NUMA
 /* Returns the node based on pci bus */
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -1340,10 +1340,6 @@ static inline int pci_domain_nr(struct p
 
 #include <asm/pci.h>
 
-#ifndef PCIBIOS_MAX_MEM_32
-#define PCIBIOS_MAX_MEM_32 (-1)
-#endif
-
 /* these helpers provide future and backwards compatibility
  * for accessing popular PCI BAR info */
 #define pci_resource_start(dev, bar)	((dev)->resource[(bar)].start)

[-- Attachment #3: allocate_high_at_first_v4.patch --]
[-- Type: application/octet-stream, Size: 2140 bytes --]

Subject: [PATCH] PCI: Try to allocate mem64 above 4G at first

and will fall back to below 4g if it can not find any above 4g.

x86 32bit without X86_PAE support will have bottom set to 0, becuase
resource_size_t is 32bit.

Also for 32bit with resource_size_t 64bit kernel on machine with pae support
we are safe because iomem_resource is limited to 32bit according to
x86_phys_bits.

-v2: update bottom assigning to make it clear for non-pae support machine.
-v3: Bjorn's change:
	use MAX_REOURCE instead of -1
	use start/end instead of bottom/max
	for all arch instead of just x86_64
-v4: updated after PCI_MAX_RESOURCE_32 change.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/bus.c |   29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

Index: linux-2.6/drivers/pci/bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/bus.c
+++ linux-2.6/drivers/pci/bus.c
@@ -121,14 +121,23 @@ pci_bus_alloc_resource(struct pci_bus *b
 {
 	int i, ret = -ENOMEM;
 	struct resource *r;
-	resource_size_t max = PCI_MAX_RESOURCE;
+	resource_size_t end = PCI_MAX_RESOURCE;
+	resource_size_t start = 0;
 
 	type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
 
-	/* don't allocate too high if the pref mem doesn't support 64bit*/
-	if (!(res->flags & IORESOURCE_MEM_64))
-		max = PCI_MAX_RESOURCE_32;
+	/*
+	 * don't allocate too high if the pref mem doesn't support 64bit
+	 * also if this is a 64-bit mem resource, try above 4GB first
+	 */
+	if (res->flags & IORESOURCE_MEM) {
+		if (!(res->flags & IORESOURCE_MEM_64))
+			end = PCI_MAX_RESOURCE_32;
+		else
+			start = (resource_size_t)(1ULL<<32);
+	}
 
+again:
 	pci_bus_for_each_resource(bus, r, i) {
 		if (!r)
 			continue;
@@ -145,12 +154,18 @@ pci_bus_alloc_resource(struct pci_bus *b
 
 		/* Ok, try it out.. */
 		ret = allocate_resource(r, res, size,
-					r->start ? : min,
-					max, align,
+					max(start, r->start ? : min),
+					end, align,
 					alignf, alignf_data);
 		if (ret == 0)
-			break;
+			return 0;
+	}
+
+	if (start != 0) {
+		start = 0;
+		goto again;
 	}
+
 	return ret;
 }
 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-25 23:10                     ` Yinghai Lu
@ 2012-05-26  0:12                       ` Bjorn Helgaas
  2012-05-26 15:01                         ` Bjorn Helgaas
  2012-05-29 17:55                         ` Yinghai Lu
  0 siblings, 2 replies; 55+ messages in thread
From: Bjorn Helgaas @ 2012-05-26  0:12 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Steven Newbury, H. Peter Anvin, Andrew Morton,
	linux-pci, linux-kernel

On Fri, May 25, 2012 at 5:10 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, May 25, 2012 at 2:55 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> I think we should fix this with a separate patch that removes
>> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
>> 0xffffffff (or some other "max 32-bit value" symbol).  I don't think
>> there's anything arch-specific about this.
>>
>> So I'd like to see two patches here:
>>  1) Avoid allocating 64-bit regions for 32-bit BARs
>>  2) Try to allocate regions above 4GB for 64-bit BARs
>
> Sure. please check updated two patches.

I think the first one looks good.

I'm curious about the second.  Why did you add the IORESOURCE_MEM
test?  That's doesn't affect the "start =" piece because
IORESOURCE_MEM is always set if IORESOURCE_MEM_64 is set.

But it does affect the "end =" part.  Previously we limited all I/O
and 32-bit mem BARs to the low 4GB.  This patch makes it so we limit
32-bit mem BARs to the low 4GB, but we don't limit I/O BARs.  But I/O
BARs can only have 32 bits of address, so it seems like we should
limit them the same way as 32-bit mem BARs.  So I expected something
like this:

    if (res->flags & IORESOURCE_MEM_64) {
        start = (resource_size_t) (1ULL << 32);
        end = PCI_MAX_RESOURCE;
    } else {
        start = 0;
        end = PCI_MAX_RESOURCE_32;
    }

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-26  0:12                       ` Bjorn Helgaas
@ 2012-05-26 15:01                         ` Bjorn Helgaas
  2012-05-29 17:56                           ` Yinghai Lu
  2012-05-29 17:55                         ` Yinghai Lu
  1 sibling, 1 reply; 55+ messages in thread
From: Bjorn Helgaas @ 2012-05-26 15:01 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Steven Newbury, H. Peter Anvin, Andrew Morton,
	linux-pci, linux-kernel

On Fri, May 25, 2012 at 6:12 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, May 25, 2012 at 5:10 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Fri, May 25, 2012 at 2:55 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> I think we should fix this with a separate patch that removes
>>> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
>>> 0xffffffff (or some other "max 32-bit value" symbol).  I don't think
>>> there's anything arch-specific about this.
>>>
>>> So I'd like to see two patches here:
>>>  1) Avoid allocating 64-bit regions for 32-bit BARs
>>>  2) Try to allocate regions above 4GB for 64-bit BARs
>>
>> Sure. please check updated two patches.
>
> I think the first one looks good.
>
> I'm curious about the second.  Why did you add the IORESOURCE_MEM
> test?  That's doesn't affect the "start =" piece because
> IORESOURCE_MEM is always set if IORESOURCE_MEM_64 is set.
>
> But it does affect the "end =" part.  Previously we limited all I/O
> and 32-bit mem BARs to the low 4GB.  This patch makes it so we limit
> 32-bit mem BARs to the low 4GB, but we don't limit I/O BARs.  But I/O
> BARs can only have 32 bits of address, so it seems like we should
> limit them the same way as 32-bit mem BARs.  So I expected something
> like this:
>
>    if (res->flags & IORESOURCE_MEM_64) {
>        start = (resource_size_t) (1ULL << 32);
>        end = PCI_MAX_RESOURCE;
>    } else {
>        start = 0;
>        end = PCI_MAX_RESOURCE_32;
>    }

Another bug here: we're trying to restrict the *bus* addresses we
allocate, but we're applying the limits to *CPU* addresses.
Therefore, this only works as intended when CPU addresses are the same
as bus addresses, i.e., when the host bridge applies no address
translation.  That happens to be the case for x86, but is not the case
in general.

I think we need a third patch to fix this problem.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-26  0:12                       ` Bjorn Helgaas
  2012-05-26 15:01                         ` Bjorn Helgaas
@ 2012-05-29 17:55                         ` Yinghai Lu
  2012-05-29 17:57                           ` H. Peter Anvin
  1 sibling, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2012-05-29 17:55 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Linus Torvalds, Steven Newbury, H. Peter Anvin, Andrew Morton,
	linux-pci, linux-kernel

On Fri, May 25, 2012 at 5:12 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, May 25, 2012 at 5:10 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Fri, May 25, 2012 at 2:55 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> I think we should fix this with a separate patch that removes
>>> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
>>> 0xffffffff (or some other "max 32-bit value" symbol).  I don't think
>>> there's anything arch-specific about this.
>>>
>>> So I'd like to see two patches here:
>>>  1) Avoid allocating 64-bit regions for 32-bit BARs
>>>  2) Try to allocate regions above 4GB for 64-bit BARs
>>
>> Sure. please check updated two patches.
>
> I think the first one looks good.
>
> I'm curious about the second.  Why did you add the IORESOURCE_MEM
> test?  That's doesn't affect the "start =" piece because
> IORESOURCE_MEM is always set if IORESOURCE_MEM_64 is set.
>
> But it does affect the "end =" part.  Previously we limited all I/O
> and 32-bit mem BARs to the low 4GB.  This patch makes it so we limit
> 32-bit mem BARs to the low 4GB, but we don't limit I/O BARs.  But I/O
> BARs can only have 32 bits of address, so it seems like we should
> limit them the same way as 32-bit mem BARs.  So I expected something
> like this:
>
>    if (res->flags & IORESOURCE_MEM_64) {
>        start = (resource_size_t) (1ULL << 32);
>        end = PCI_MAX_RESOURCE;
>    } else {
>        start = 0;
>        end = PCI_MAX_RESOURCE_32;
>    }

x86 are using 16bits.

some others use 32 bits.
#define IO_SPACE_LIMIT 0xffffffff

ia64 and sparc are using 64bits.
#define IO_SPACE_LIMIT               0xffffffffffffffffUL

but pci only support 16bits and 32bits.

maybe later we can add
PCI_MAX_RESOURCE_16

to handle 16bits and 32bit io ports.

Yinghai

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-26 15:01                         ` Bjorn Helgaas
@ 2012-05-29 17:56                           ` Yinghai Lu
  0 siblings, 0 replies; 55+ messages in thread
From: Yinghai Lu @ 2012-05-29 17:56 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Linus Torvalds, Steven Newbury, H. Peter Anvin, Andrew Morton,
	linux-pci, linux-kernel

On Sat, May 26, 2012 at 8:01 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Another bug here: we're trying to restrict the *bus* addresses we
> allocate, but we're applying the limits to *CPU* addresses.
> Therefore, this only works as intended when CPU addresses are the same
> as bus addresses, i.e., when the host bridge applies no address
> translation.  That happens to be the case for x86, but is not the case
> in general.
>
> I think we need a third patch to fix this problem.

yes.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-29 17:55                         ` Yinghai Lu
@ 2012-05-29 17:57                           ` H. Peter Anvin
  2012-05-29 18:17                             ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: H. Peter Anvin @ 2012-05-29 17:57 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Linus Torvalds, Steven Newbury, Andrew Morton,
	linux-pci, linux-kernel

On 05/29/2012 10:55 AM, Yinghai Lu wrote:
> 
> x86 are using 16bits.
> 
> some others use 32 bits.
> #define IO_SPACE_LIMIT 0xffffffff
> 
> ia64 and sparc are using 64bits.
> #define IO_SPACE_LIMIT               0xffffffffffffffffUL
> 
> but pci only support 16bits and 32bits.
> 
> maybe later we can add
> PCI_MAX_RESOURCE_16
> 
> to handle 16bits and 32bit io ports.
> 

Shouldn't this be dealt by root port apertures?

	-hpa

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-29 17:57                           ` H. Peter Anvin
@ 2012-05-29 18:17                             ` Yinghai Lu
  2012-05-29 19:03                               ` H. Peter Anvin
  2012-05-29 19:23                               ` Bjorn Helgaas
  0 siblings, 2 replies; 55+ messages in thread
From: Yinghai Lu @ 2012-05-29 18:17 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Bjorn Helgaas, Linus Torvalds, Steven Newbury, Andrew Morton,
	linux-pci, linux-kernel

On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>>
>> x86 are using 16bits.
>>
>> some others use 32 bits.
>> #define IO_SPACE_LIMIT 0xffffffff
>>
>> ia64 and sparc are using 64bits.
>> #define IO_SPACE_LIMIT               0xffffffffffffffffUL
>>
>> but pci only support 16bits and 32bits.
>>
>> maybe later we can add
>> PCI_MAX_RESOURCE_16
>>
>> to handle 16bits and 32bit io ports.
>>
>
> Shouldn't this be dealt by root port apertures?
>

pci bridge could support 16bits and 32bits io port.
but we did not record if 32bits is supported.

so during allocating, could have allocated above 64k address to non
32bit bridge.

but  x86 is ok, because ioport.end always set to 0xffff.
other arches with IO_SPACE_LIMIT with 0xffffffff or
0xffffffffffffffffUL may have problem.

Yinghai

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-29 18:17                             ` Yinghai Lu
@ 2012-05-29 19:03                               ` H. Peter Anvin
  2012-05-29 20:46                                 ` Yinghai Lu
  2012-05-29 19:23                               ` Bjorn Helgaas
  1 sibling, 1 reply; 55+ messages in thread
From: H. Peter Anvin @ 2012-05-29 19:03 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Linus Torvalds, Steven Newbury, Andrew Morton,
	linux-pci, linux-kernel

On 05/29/2012 11:17 AM, Yinghai Lu wrote:
> 
> pci bridge could support 16bits and 32bits io port.
> but we did not record if 32bits is supported.
> 

Okay, so this is the actual problem, no?

> so during allocating, could have allocated above 64k address to non
> 32bit bridge.
> 
> but  x86 is ok, because ioport.end always set to 0xffff.
> other arches with IO_SPACE_LIMIT with 0xffffffff or
> 0xffffffffffffffffUL may have problem.

The latter is nonsense, the PCI-side address space is only 32 bits wide.

	-hpa



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-29 18:17                             ` Yinghai Lu
  2012-05-29 19:03                               ` H. Peter Anvin
@ 2012-05-29 19:23                               ` Bjorn Helgaas
  2012-05-29 20:40                                 ` Yinghai Lu
  1 sibling, 1 reply; 55+ messages in thread
From: Bjorn Helgaas @ 2012-05-29 19:23 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Linus Torvalds, Steven Newbury, Andrew Morton,
	linux-pci, linux-kernel

On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>>>
>>> x86 are using 16bits.
>>>
>>> some others use 32 bits.
>>> #define IO_SPACE_LIMIT 0xffffffff
>>>
>>> ia64 and sparc are using 64bits.
>>> #define IO_SPACE_LIMIT               0xffffffffffffffffUL
>>>
>>> but pci only support 16bits and 32bits.
>>>
>>> maybe later we can add
>>> PCI_MAX_RESOURCE_16
>>>
>>> to handle 16bits and 32bit io ports.
>>>
>>
>> Shouldn't this be dealt by root port apertures?
>>
>
> pci bridge could support 16bits and 32bits io port.
> but we did not record if 32bits is supported.
>
> so during allocating, could have allocated above 64k address to non
> 32bit bridge.
>
> but  x86 is ok, because ioport.end always set to 0xffff.
> other arches with IO_SPACE_LIMIT with 0xffffffff or
> 0xffffffffffffffffUL may have problem.

I think current IO_SPACE_LIMIT usage is a little confused.  The
"ioport_resource.end = IO_SPACE_LIMIT" in kernel/resource.c refers to
a CPU-side address, not a bus address.  Other uses, e.g., in
__pci_read_base(), apply it to bus addresses from BARs, which is
wrong.  Host bridges apply I/O port offsets just like they apply
memory offsets.  The ia64 IO_SPACE_LIMIT of 0xffffffffffffffffUL means
there's no restriction on CPU-side I/O port addresses, but any given
host bridge will translate its I/O port aperture to bus addresses that
fit in 32 bits.

None of this is really relevant to the question I asked, namely, "why
Yinghai's patch doesn't limit I/O BAR values to 32 bits?"  That
constraint is clearly a requirement because I/O BARs are only 32 bits
wide, but I don't think it needs to be enforced in the code here.  The
host bridge or upstream P2P bridge apertures should already take care
of that automatically.  I don't think the 16- or 32-bitness of P2P
bridge apertures is relevant here, because the I/O resources available
on the secondary bus already reflect that.

After all that discussion, I think my objection here boils down to
"you shouldn't change the I/O BAR constraints in a patch that claims
to allocate 64-bit *memory* BARs above 4GB."

I think the code below is still the clearest way to set the constraints:

   if (res->flags & IORESOURCE_MEM_64) {
       start = (resource_size_t) (1ULL << 32);
       end = PCI_MAX_RESOURCE;
   } else {
       start = 0;
       end = PCI_MAX_RESOURCE_32;
   }

It's not strictly necessary to limit I/O BARs to PCI_MAX_RESOURCE_32
because host bridge apertures should already enforce that, but I think
the code above just makes it clearer.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-29 19:23                               ` Bjorn Helgaas
@ 2012-05-29 20:40                                 ` Yinghai Lu
  2012-05-29 23:24                                   ` Bjorn Helgaas
  2012-05-29 23:27                                   ` Bjorn Helgaas
  0 siblings, 2 replies; 55+ messages in thread
From: Yinghai Lu @ 2012-05-29 20:40 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: H. Peter Anvin, Linus Torvalds, Steven Newbury, Andrew Morton,
	linux-pci, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3155 bytes --]

On Tue, May 29, 2012 at 12:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>>>>
>>>> x86 are using 16bits.
>>>>
>>>> some others use 32 bits.
>>>> #define IO_SPACE_LIMIT 0xffffffff
>>>>
>>>> ia64 and sparc are using 64bits.
>>>> #define IO_SPACE_LIMIT               0xffffffffffffffffUL
>>>>
>>>> but pci only support 16bits and 32bits.
>>>>
>>>> maybe later we can add
>>>> PCI_MAX_RESOURCE_16
>>>>
>>>> to handle 16bits and 32bit io ports.
>>>>
>>>
>>> Shouldn't this be dealt by root port apertures?
>>>
>>
>> pci bridge could support 16bits and 32bits io port.
>> but we did not record if 32bits is supported.
>>
>> so during allocating, could have allocated above 64k address to non
>> 32bit bridge.
>>
>> but  x86 is ok, because ioport.end always set to 0xffff.
>> other arches with IO_SPACE_LIMIT with 0xffffffff or
>> 0xffffffffffffffffUL may have problem.
>
> I think current IO_SPACE_LIMIT usage is a little confused.  The
> "ioport_resource.end = IO_SPACE_LIMIT" in kernel/resource.c refers to
> a CPU-side address, not a bus address.  Other uses, e.g., in
> __pci_read_base(), apply it to bus addresses from BARs, which is
> wrong.  Host bridges apply I/O port offsets just like they apply
> memory offsets.  The ia64 IO_SPACE_LIMIT of 0xffffffffffffffffUL means
> there's no restriction on CPU-side I/O port addresses, but any given
> host bridge will translate its I/O port aperture to bus addresses that
> fit in 32 bits.
>
> None of this is really relevant to the question I asked, namely, "why
> Yinghai's patch doesn't limit I/O BAR values to 32 bits?"  That
> constraint is clearly a requirement because I/O BARs are only 32 bits
> wide, but I don't think it needs to be enforced in the code here.  The
> host bridge or upstream P2P bridge apertures should already take care
> of that automatically.  I don't think the 16- or 32-bitness of P2P
> bridge apertures is relevant here, because the I/O resources available
> on the secondary bus already reflect that.
>
> After all that discussion, I think my objection here boils down to
> "you shouldn't change the I/O BAR constraints in a patch that claims
> to allocate 64-bit *memory* BARs above 4GB."
>
> I think the code below is still the clearest way to set the constraints:
>
>   if (res->flags & IORESOURCE_MEM_64) {
>       start = (resource_size_t) (1ULL << 32);
>       end = PCI_MAX_RESOURCE;
>   } else {
>       start = 0;
>       end = PCI_MAX_RESOURCE_32;
>   }
>
> It's not strictly necessary to limit I/O BARs to PCI_MAX_RESOURCE_32
> because host bridge apertures should already enforce that, but I think
> the code above just makes it clearer.


ok, please check the version, that put back PCI_MAX_RESOURCE_32 for io ports.

also RFC to limit for 16 bit ioport handling.  only help other arches
that does support 32bit ioports but have bridges only support 16bit io
ports.

Thanks

Yinghai

[-- Attachment #2: allocate_high_at_first_v5.patch --]
[-- Type: application/octet-stream, Size: 2183 bytes --]

Subject: [PATCH] PCI: Try to allocate mem64 above 4G at first

and will fall back to below 4g if it can not find any above 4g.

x86 32bit without X86_PAE support will have bottom set to 0, becuase
resource_size_t is 32bit.

Also for 32bit with resource_size_t 64bit kernel on machine with pae support
we are safe because iomem_resource is limited to 32bit according to
x86_phys_bits.

-v2: update bottom assigning to make it clear for non-pae support machine.
-v3: Bjorn's change:
	use MAX_REOURCE instead of -1
	use start/end instead of bottom/max
	for all arch instead of just x86_64
-v4: updated after PCI_MAX_RESOURCE_32 change.
-v5: restore io handling to use PCI_MAX_RESOURCE_32 as limit.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/bus.c |   30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

Index: linux-2.6/drivers/pci/bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/bus.c
+++ linux-2.6/drivers/pci/bus.c
@@ -121,14 +121,24 @@ pci_bus_alloc_resource(struct pci_bus *b
 {
 	int i, ret = -ENOMEM;
 	struct resource *r;
-	resource_size_t max = PCI_MAX_RESOURCE;
+	resource_size_t end;
+	resource_size_t start;
 
 	type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
 
-	/* don't allocate too high if the pref mem doesn't support 64bit*/
-	if (!(res->flags & IORESOURCE_MEM_64))
-		max = PCI_MAX_RESOURCE_32;
+	/*
+	 * don't allocate too high if the pref mem doesn't support 64bit
+	 * also if this is a 64-bit mem resource, try above 4GB first
+	 */
+	if (res->flags & IORESOURCE_MEM_64) {
+		start = (resource_size_t)(1ULL<<32);
+		end = PCI_MAX_RESOURCE;
+	} else {
+		start = 0;
+		end = PCI_MAX_RESOURCE_32;
+	}
 
+again:
 	pci_bus_for_each_resource(bus, r, i) {
 		if (!r)
 			continue;
@@ -145,12 +155,18 @@ pci_bus_alloc_resource(struct pci_bus *b
 
 		/* Ok, try it out.. */
 		ret = allocate_resource(r, res, size,
-					r->start ? : min,
-					max, align,
+					max(start, r->start ? : min),
+					end, align,
 					alignf, alignf_data);
 		if (ret == 0)
-			break;
+			return 0;
+	}
+
+	if (start != 0) {
+		start = 0;
+		goto again;
 	}
+
 	return ret;
 }
 

[-- Attachment #3: fix_32bit_ioport.patch --]
[-- Type: application/octet-stream, Size: 5224 bytes --]

Subject: [RFC PATCH] PCI: io port 16/32 bits allocation

pci bridge could only support 16bit io port.

some arches are supporting 32bit io ports.

So need to make sure 16bit ioport will be allocated under 64k.
For 32bit ioport, will try to allocate that [64k, 4g), and if it fail
will fall back to [0, 64k).

It is some kind of clone of 64bit mem io handling.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/bus.c       |   20 +++++++++++++++-----
 drivers/pci/pci.h       |    1 +
 drivers/pci/probe.c     |    4 +++-
 drivers/pci/setup-bus.c |   11 +++++++++--
 include/linux/ioport.h  |    1 +
 5 files changed, 29 insertions(+), 8 deletions(-)

Index: linux-2.6/drivers/pci/bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/bus.c
+++ linux-2.6/drivers/pci/bus.c
@@ -130,12 +130,22 @@ pci_bus_alloc_resource(struct pci_bus *b
 	 * don't allocate too high if the pref mem doesn't support 64bit
 	 * also if this is a 64-bit mem resource, try above 4GB first
 	 */
-	if (res->flags & IORESOURCE_MEM_64) {
-		start = (resource_size_t)(1ULL<<32);
-		end = PCI_MAX_RESOURCE;
+	if (res->flags & IORESOURCE_MEM) {
+		if (res->flags & IORESOURCE_MEM_64) {
+			start = (resource_size_t)(1ULL<<32);
+			end = PCI_MAX_RESOURCE;
+		} else {
+			start = 0;
+			end = PCI_MAX_RESOURCE_32;
+		}
 	} else {
-		start = 0;
-		end = PCI_MAX_RESOURCE_32;
+		if (res->flags & IORESOURCE_IO_32) {
+			start = PCI_MAX_RESOURCE_16 + 1;
+			end = PCI_MAX_RESOURCE_32;
+		} else {
+			start = 0;
+			end = PCI_MAX_RESOURCE_16;
+		}
 	}
 
 again:
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -210,6 +210,7 @@ enum pci_bar_type {
 
 #define PCI_MAX_RESOURCE	((resource_size_t)~0)
 #define PCI_MAX_RESOURCE_32	((resource_size_t)0xffffffff)
+#define PCI_MAX_RESOURCE_16	((resource_size_t)0xffff)
 
 bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl,
 				int crs_timeout);
Index: linux-2.6/include/linux/ioport.h
===================================================================
--- linux-2.6.orig/include/linux/ioport.h
+++ linux-2.6/include/linux/ioport.h
@@ -47,6 +47,7 @@ struct resource {
 #define IORESOURCE_MEM_64	0x00100000
 #define IORESOURCE_WINDOW	0x00200000	/* forwarded by bridge */
 #define IORESOURCE_MUXED	0x00400000	/* Resource is software muxed */
+#define IORESOURCE_IO_32	0x00800000
 
 #define IORESOURCE_EXCLUSIVE	0x08000000	/* Userland may not map this resource */
 #define IORESOURCE_DISABLED	0x10000000
Index: linux-2.6/drivers/pci/probe.c
===================================================================
--- linux-2.6.orig/drivers/pci/probe.c
+++ linux-2.6/drivers/pci/probe.c
@@ -130,7 +130,7 @@ static inline unsigned long decode_bar(s
 
 	if ((bar & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) {
 		flags = bar & ~PCI_BASE_ADDRESS_IO_MASK;
-		flags |= IORESOURCE_IO;
+		flags |= IORESOURCE_IO | IORESOURCE_IO_32;
 		return flags;
 	}
 
@@ -326,6 +326,8 @@ static void __devinit pci_read_bridge_io
 
 	if (base && base <= limit) {
 		res->flags = (io_base_lo & PCI_IO_RANGE_TYPE_MASK) | IORESOURCE_IO;
+		if ((io_base_lo & PCI_IO_RANGE_TYPE_MASK) == PCI_IO_RANGE_TYPE_32)
+			res->flags |= IORESOURCE_IO_32;
 		res2.flags = res->flags;
 		region.start = base;
 		region.end = limit + 0xfff;
Index: linux-2.6/drivers/pci/setup-bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/setup-bus.c
+++ linux-2.6/drivers/pci/setup-bus.c
@@ -595,8 +595,11 @@ static void pci_bridge_check_ranges(stru
 		pci_read_config_word(bridge, PCI_IO_BASE, &io);
  		pci_write_config_word(bridge, PCI_IO_BASE, 0x0);
  	}
- 	if (io)
+ 	if (io) {
 		b_res[0].flags |= IORESOURCE_IO;
+		if ((io & PCI_IO_RANGE_TYPE_MASK) == PCI_IO_RANGE_TYPE_32)
+			b_res[0].flags |= IORESOURCE_IO_32;
+	}
 	/*  DECchip 21050 pass 2 errata: the bridge may miss an address
 	    disconnect boundary by one PCI data phase.
 	    Workaround: do not use prefetching on this device. */
@@ -710,10 +713,13 @@ static void pbus_size_io(struct pci_bus
 	struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO);
 	unsigned long size = 0, size0 = 0, size1 = 0;
 	resource_size_t children_add_size = 0;
+	unsigned int io32_mask = 0;
 
 	if (!b_res)
  		return;
 
+	io32_mask = b_res->flags & IORESOURCE_IO_32;
+	b_res->flags &= ~IORESOURCE_IO_32;
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		int i;
 
@@ -731,6 +737,7 @@ static void pbus_size_io(struct pci_bus
 			else
 				size1 += r_size;
 
+			io32_mask &= r->flags & IORESOURCE_IO_32;
 			if (realloc_head)
 				children_add_size += get_res_add_size(realloc_head, r);
 		}
@@ -753,7 +760,7 @@ static void pbus_size_io(struct pci_bus
 	/* Alignment of the IO window is always 4K */
 	b_res->start = 4096;
 	b_res->end = b_res->start + size0 - 1;
-	b_res->flags |= IORESOURCE_STARTALIGN;
+	b_res->flags |= IORESOURCE_STARTALIGN | io32_mask;
 	if (size1 > size0 && realloc_head) {
 		add_to_list(realloc_head, bus->self, b_res, size1-size0, 4096);
 		dev_printk(KERN_DEBUG, &bus->self->dev, "bridge window "

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-29 19:03                               ` H. Peter Anvin
@ 2012-05-29 20:46                                 ` Yinghai Lu
  2012-05-29 20:50                                   ` H. Peter Anvin
  2012-05-29 20:53                                   ` David Miller
  0 siblings, 2 replies; 55+ messages in thread
From: Yinghai Lu @ 2012-05-29 20:46 UTC (permalink / raw)
  To: H. Peter Anvin, David Miller, Tony Luck
  Cc: Bjorn Helgaas, Linus Torvalds, Steven Newbury, Andrew Morton,
	linux-pci, linux-kernel

On Tue, May 29, 2012 at 12:03 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 05/29/2012 11:17 AM, Yinghai Lu wrote:
>>
>> pci bridge could support 16bits and 32bits io port.
>> but we did not record if 32bits is supported.
>>
>
> Okay, so this is the actual problem, no?

their fw could not need kernel help to allocate io ports, or they are
only use device that support 32bit ioport.

>
>> so during allocating, could have allocated above 64k address to non
>> 32bit bridge.
>>
>> but  x86 is ok, because ioport.end always set to 0xffff.
>> other arches with IO_SPACE_LIMIT with 0xffffffff or
>> 0xffffffffffffffffUL may have problem.
>
> The latter is nonsense, the PCI-side address space is only 32 bits wide.
>
maybe they have unified io include ioport and mem io?

Yinghai

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-29 20:46                                 ` Yinghai Lu
@ 2012-05-29 20:50                                   ` H. Peter Anvin
  2012-06-01 23:30                                     ` Yinghai Lu
  2012-05-29 20:53                                   ` David Miller
  1 sibling, 1 reply; 55+ messages in thread
From: H. Peter Anvin @ 2012-05-29 20:50 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: David Miller, Tony Luck, Bjorn Helgaas, Linus Torvalds,
	Steven Newbury, Andrew Morton, linux-pci, linux-kernel

On 05/29/2012 01:46 PM, Yinghai Lu wrote:
> On Tue, May 29, 2012 at 12:03 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 05/29/2012 11:17 AM, Yinghai Lu wrote:
>>>
>>> pci bridge could support 16bits and 32bits io port.
>>> but we did not record if 32bits is supported.
>>>
>>
>> Okay, so this is the actual problem, no?
> 
> their fw could not need kernel help to allocate io ports, or they are
> only use device that support 32bit ioport.
> 

That barely parses, never mind makes sense.

>>
>>> so during allocating, could have allocated above 64k address to non
>>> 32bit bridge.
>>>
>>> but  x86 is ok, because ioport.end always set to 0xffff.
>>> other arches with IO_SPACE_LIMIT with 0xffffffff or
>>> 0xffffffffffffffffUL may have problem.
>>
>> The latter is nonsense, the PCI-side address space is only 32 bits wide.
>>
> maybe they have unified io include ioport and mem io?
> 

The bus-side address space should not be more than 32 bits no matter
what.  As Bjorn indicates, you seem to be mixing up bus and cpu
addresses all over the place.

	-hpa


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-29 20:46                                 ` Yinghai Lu
  2012-05-29 20:50                                   ` H. Peter Anvin
@ 2012-05-29 20:53                                   ` David Miller
  1 sibling, 0 replies; 55+ messages in thread
From: David Miller @ 2012-05-29 20:53 UTC (permalink / raw)
  To: yinghai
  Cc: hpa, tony.luck, bhelgaas, torvalds, steve, akpm, linux-pci, linux-kernel

From: Yinghai Lu <yinghai@kernel.org>
Date: Tue, 29 May 2012 13:46:03 -0700

> On Tue, May 29, 2012 at 12:03 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 05/29/2012 11:17 AM, Yinghai Lu wrote:
>>> so during allocating, could have allocated above 64k address to non
>>> 32bit bridge.
>>>
>>> but  x86 is ok, because ioport.end always set to 0xffff.
>>> other arches with IO_SPACE_LIMIT with 0xffffffff or
>>> 0xffffffffffffffffUL may have problem.
>>
>> The latter is nonsense, the PCI-side address space is only 32 bits wide.
>>
> maybe they have unified io include ioport and mem io?

The resource values stored are the actual physical I/O addresses on
some platforms.  Sparc, for example, falls into this category.

Therefore ~0UL is the only feasible limit for them.

We have to distinguish explicitly when we are operating upon actual
PCI bus view addresses vs. the values that end up in the resources.
They are entirely different, and have completely different address
ranges.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-29 20:40                                 ` Yinghai Lu
@ 2012-05-29 23:24                                   ` Bjorn Helgaas
  2012-05-29 23:27                                   ` Bjorn Helgaas
  1 sibling, 0 replies; 55+ messages in thread
From: Bjorn Helgaas @ 2012-05-29 23:24 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Linus Torvalds, Steven Newbury, Andrew Morton,
	linux-pci, linux-kernel

On Tue, May 29, 2012 at 2:40 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, May 29, 2012 at 12:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>>>>>
>>>>> x86 are using 16bits.
>>>>>
>>>>> some others use 32 bits.
>>>>> #define IO_SPACE_LIMIT 0xffffffff
>>>>>
>>>>> ia64 and sparc are using 64bits.
>>>>> #define IO_SPACE_LIMIT               0xffffffffffffffffUL
>>>>>
>>>>> but pci only support 16bits and 32bits.
>>>>>
>>>>> maybe later we can add
>>>>> PCI_MAX_RESOURCE_16
>>>>>
>>>>> to handle 16bits and 32bit io ports.
>>>>>
>>>>
>>>> Shouldn't this be dealt by root port apertures?
>>>>
>>>
>>> pci bridge could support 16bits and 32bits io port.
>>> but we did not record if 32bits is supported.
>>>
>>> so during allocating, could have allocated above 64k address to non
>>> 32bit bridge.
>>>
>>> but  x86 is ok, because ioport.end always set to 0xffff.
>>> other arches with IO_SPACE_LIMIT with 0xffffffff or
>>> 0xffffffffffffffffUL may have problem.
>>
>> I think current IO_SPACE_LIMIT usage is a little confused.  The
>> "ioport_resource.end = IO_SPACE_LIMIT" in kernel/resource.c refers to
>> a CPU-side address, not a bus address.  Other uses, e.g., in
>> __pci_read_base(), apply it to bus addresses from BARs, which is
>> wrong.  Host bridges apply I/O port offsets just like they apply
>> memory offsets.  The ia64 IO_SPACE_LIMIT of 0xffffffffffffffffUL means
>> there's no restriction on CPU-side I/O port addresses, but any given
>> host bridge will translate its I/O port aperture to bus addresses that
>> fit in 32 bits.
>>
>> None of this is really relevant to the question I asked, namely, "why
>> Yinghai's patch doesn't limit I/O BAR values to 32 bits?"  That
>> constraint is clearly a requirement because I/O BARs are only 32 bits
>> wide, but I don't think it needs to be enforced in the code here.  The
>> host bridge or upstream P2P bridge apertures should already take care
>> of that automatically.  I don't think the 16- or 32-bitness of P2P
>> bridge apertures is relevant here, because the I/O resources available
>> on the secondary bus already reflect that.
>>
>> After all that discussion, I think my objection here boils down to
>> "you shouldn't change the I/O BAR constraints in a patch that claims
>> to allocate 64-bit *memory* BARs above 4GB."
>>
>> I think the code below is still the clearest way to set the constraints:
>>
>>   if (res->flags & IORESOURCE_MEM_64) {
>>       start = (resource_size_t) (1ULL << 32);
>>       end = PCI_MAX_RESOURCE;
>>   } else {
>>       start = 0;
>>       end = PCI_MAX_RESOURCE_32;
>>   }
>>
>> It's not strictly necessary to limit I/O BARs to PCI_MAX_RESOURCE_32
>> because host bridge apertures should already enforce that, but I think
>> the code above just makes it clearer.
>
>
> ok, please check the version, that put back PCI_MAX_RESOURCE_32 for io ports.

I like the fact that this patch no longer changes anything for I/O
resources.  I assume this is part of fixing some bug (Steven's?)  I'd
like to have a pointer in the changelog to a bugzilla or discussion
about the bug.

The effect of this patch is similar to what I did earlier with
b126b4703afa4 and e7f8567db9a7 (allocate space from top down), though
this one is more limited and it won't change quite as much.  We ran
into problems (BIOS defects, I think) and had to revert my patches, so
it's quite possible that we'll run into similar problems here.

I'm a little nervous because this is a fundamental area that explores
new areas of the address space minefield.  I think we're generally
safer if we  follow a path similar to where Windows has been.  I think
Windows also prefers space above 4GB for 64-bit BARs, but I suspect
that's just a natural consequence of allocating from the top down.  So
we'll place things just above 4GB, and Windows will place things as
high as possible.

I don't know the best solution here.  This patch ("bottom-up above
4GB") is one possibility.  Another is to allocate only 64-bit BARs
top-down.  Or maybe allocate everything top-down on machines newer
than some date.  They all seem ugly.  What makes me uneasy is that
your patch strikes out on a new path that is different from what we've
done before *and* different from what Windows does.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-29 20:40                                 ` Yinghai Lu
  2012-05-29 23:24                                   ` Bjorn Helgaas
@ 2012-05-29 23:27                                   ` Bjorn Helgaas
  2012-05-29 23:33                                     ` Yinghai Lu
  2012-05-30  7:40                                     ` Steven Newbury
  1 sibling, 2 replies; 55+ messages in thread
From: Bjorn Helgaas @ 2012-05-29 23:27 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Linus Torvalds, Steven Newbury, Andrew Morton,
	linux-pci, linux-kernel

On Tue, May 29, 2012 at 2:40 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, May 29, 2012 at 12:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>>>>>
>>>>> x86 are using 16bits.
>>>>>
>>>>> some others use 32 bits.
>>>>> #define IO_SPACE_LIMIT 0xffffffff
>>>>>
>>>>> ia64 and sparc are using 64bits.
>>>>> #define IO_SPACE_LIMIT               0xffffffffffffffffUL
>>>>>
>>>>> but pci only support 16bits and 32bits.
>>>>>
>>>>> maybe later we can add
>>>>> PCI_MAX_RESOURCE_16
>>>>>
>>>>> to handle 16bits and 32bit io ports.
>>>>>
>>>>
>>>> Shouldn't this be dealt by root port apertures?
>>>>
>>>
>>> pci bridge could support 16bits and 32bits io port.
>>> but we did not record if 32bits is supported.
>>>
>>> so during allocating, could have allocated above 64k address to non
>>> 32bit bridge.
>>>
>>> but  x86 is ok, because ioport.end always set to 0xffff.
>>> other arches with IO_SPACE_LIMIT with 0xffffffff or
>>> 0xffffffffffffffffUL may have problem.
>>
>> I think current IO_SPACE_LIMIT usage is a little confused.  The
>> "ioport_resource.end = IO_SPACE_LIMIT" in kernel/resource.c refers to
>> a CPU-side address, not a bus address.  Other uses, e.g., in
>> __pci_read_base(), apply it to bus addresses from BARs, which is
>> wrong.  Host bridges apply I/O port offsets just like they apply
>> memory offsets.  The ia64 IO_SPACE_LIMIT of 0xffffffffffffffffUL means
>> there's no restriction on CPU-side I/O port addresses, but any given
>> host bridge will translate its I/O port aperture to bus addresses that
>> fit in 32 bits.
>>
>> None of this is really relevant to the question I asked, namely, "why
>> Yinghai's patch doesn't limit I/O BAR values to 32 bits?"  That
>> constraint is clearly a requirement because I/O BARs are only 32 bits
>> wide, but I don't think it needs to be enforced in the code here.  The
>> host bridge or upstream P2P bridge apertures should already take care
>> of that automatically.  I don't think the 16- or 32-bitness of P2P
>> bridge apertures is relevant here, because the I/O resources available
>> on the secondary bus already reflect that.
>>
>> After all that discussion, I think my objection here boils down to
>> "you shouldn't change the I/O BAR constraints in a patch that claims
>> to allocate 64-bit *memory* BARs above 4GB."
>>
>> I think the code below is still the clearest way to set the constraints:
>>
>>   if (res->flags & IORESOURCE_MEM_64) {
>>       start = (resource_size_t) (1ULL << 32);
>>       end = PCI_MAX_RESOURCE;
>>   } else {
>>       start = 0;
>>       end = PCI_MAX_RESOURCE_32;
>>   }
>>
>> It's not strictly necessary to limit I/O BARs to PCI_MAX_RESOURCE_32
>> because host bridge apertures should already enforce that, but I think
>> the code above just makes it clearer.
>
>
> ok, please check the version, that put back PCI_MAX_RESOURCE_32 for io ports.
>
> also RFC to limit for 16 bit ioport handling.  only help other arches
> that does support 32bit ioports but have bridges only support 16bit io
> ports.

I don't understand this one at all.  It looks like you mashed together
at least two changes: (1) prefer I/O space above 64K if available, and
(2) mark secondary bus resources with IORESOURCE_IO_32 when the P2P
bridge I/O window address decode type is PCI_IO_RANGE_TYPE_32 and use
that to limit allocations.

I don't see the justification for (1).  What problem does this solve?

I don't see the need for (2).  Even without the start/end constraints
in pci_bus_alloc_resource(), we only allocate from the resources
available on the bus.  Apparently you think this list might be
incorrect, i.e., it might include I/O space above 64K when it
shouldn't.  I don't think that's the case, but if it is, we should fix
the list of bus resources, not add a hack to avoid parts of it.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-29 23:27                                   ` Bjorn Helgaas
@ 2012-05-29 23:33                                     ` Yinghai Lu
  2012-05-29 23:47                                       ` Bjorn Helgaas
  2012-05-30  7:40                                     ` Steven Newbury
  1 sibling, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2012-05-29 23:33 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: H. Peter Anvin, Linus Torvalds, Steven Newbury, Andrew Morton,
	linux-pci, linux-kernel

On Tue, May 29, 2012 at 4:27 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> I don't understand this one at all.  It looks like you mashed together
> at least two changes: (1) prefer I/O space above 64K if available, and
> (2) mark secondary bus resources with IORESOURCE_IO_32 when the P2P
> bridge I/O window address decode type is PCI_IO_RANGE_TYPE_32 and use
> that to limit allocations.

let drop the one about IORESOURCE_IO_32.

that should only fix one reallocation bug in theory out of x86.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-29 23:33                                     ` Yinghai Lu
@ 2012-05-29 23:47                                       ` Bjorn Helgaas
  0 siblings, 0 replies; 55+ messages in thread
From: Bjorn Helgaas @ 2012-05-29 23:47 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Linus Torvalds, Steven Newbury, Andrew Morton,
	linux-pci, linux-kernel

On Tue, May 29, 2012 at 5:33 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, May 29, 2012 at 4:27 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> I don't understand this one at all.  It looks like you mashed together
>> at least two changes: (1) prefer I/O space above 64K if available, and
>> (2) mark secondary bus resources with IORESOURCE_IO_32 when the P2P
>> bridge I/O window address decode type is PCI_IO_RANGE_TYPE_32 and use
>> that to limit allocations.
>
> let drop the one about IORESOURCE_IO_32.
>
> that should only fix one reallocation bug in theory out of x86.

If there's a bug, we should try to fix it.  I just don't understand
what the bug *is*.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-29 23:27                                   ` Bjorn Helgaas
  2012-05-29 23:33                                     ` Yinghai Lu
@ 2012-05-30  7:40                                     ` Steven Newbury
  2012-05-30 16:27                                       ` Bjorn Helgaas
  1 sibling, 1 reply; 55+ messages in thread
From: Steven Newbury @ 2012-05-30  7:40 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Yinghai Lu, H. Peter Anvin, Linus Torvalds, Andrew Morton,
	linux-pci, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 30/05/12 00:27, Bjorn Helgaas wrote:
> On Tue, May 29, 2012 at 2:40 PM, Yinghai Lu <yinghai@kernel.org>
> wrote:
>> On Tue, May 29, 2012 at 12:23 PM, Bjorn Helgaas
>> <bhelgaas@google.com> wrote:
>>> On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu
>>> <yinghai@kernel.org> wrote:
>>>> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin
>>>> <hpa@zytor.com> wrote:
>>>>> On 05/29/2012 10:55 AM, Yinghai Lu wrote:

*SNIP*

>> 
>> 
>> ok, please check the version, that put back PCI_MAX_RESOURCE_32
>> for io ports.
>> 
>> also RFC to limit for 16 bit ioport handling.  only help other
>> arches that does support 32bit ioports but have bridges only
>> support 16bit io ports.
> 
> I don't understand this one at all.  It looks like you mashed
> together at least two changes: (1) prefer I/O space above 64K if
> available, and (2) mark secondary bus resources with
> IORESOURCE_IO_32 when the P2P bridge I/O window address decode type
> is PCI_IO_RANGE_TYPE_32 and use that to limit allocations.
> 
> I don't see the justification for (1).  What problem does this
> solve?
> 
I can potentially see this helping with hotplug, where a new device
has only 16 bit io ports, but if there's no space remaining under 64k
allocation would fail.  Preferring above 64k means preserving that
limited resource.  This is exactly equivalent to my original
motivation for preferring 64 bit PCI mem in order to preserve 32 bit
address space for hotplug devices without 64 bit BARs.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk/FzvEACgkQGcb56gMuC61lYwCfchbsyzN5KLCWTuyQiMcJR0DH
l4gAoJAY1D0HN6m5JnQLu705hvm85p5a
=whd3
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-30  7:40                                     ` Steven Newbury
@ 2012-05-30 16:27                                       ` Bjorn Helgaas
  2012-05-30 16:30                                         ` H. Peter Anvin
  2012-05-30 16:33                                         ` Linus Torvalds
  0 siblings, 2 replies; 55+ messages in thread
From: Bjorn Helgaas @ 2012-05-30 16:27 UTC (permalink / raw)
  To: Steven Newbury
  Cc: Yinghai Lu, H. Peter Anvin, Linus Torvalds, Andrew Morton,
	linux-pci, linux-kernel

On Wed, May 30, 2012 at 1:40 AM, Steven Newbury <steve@snewbury.org.uk> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 30/05/12 00:27, Bjorn Helgaas wrote:
>> On Tue, May 29, 2012 at 2:40 PM, Yinghai Lu <yinghai@kernel.org>
>> wrote:
>>> On Tue, May 29, 2012 at 12:23 PM, Bjorn Helgaas
>>> <bhelgaas@google.com> wrote:
>>>> On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu
>>>> <yinghai@kernel.org> wrote:
>>>>> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin
>>>>> <hpa@zytor.com> wrote:
>>>>>> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>
> *SNIP*
>
>>>
>>>
>>> ok, please check the version, that put back PCI_MAX_RESOURCE_32
>>> for io ports.
>>>
>>> also RFC to limit for 16 bit ioport handling.  only help other
>>> arches that does support 32bit ioports but have bridges only
>>> support 16bit io ports.
>>
>> I don't understand this one at all.  It looks like you mashed
>> together at least two changes: (1) prefer I/O space above 64K if
>> available, and (2) mark secondary bus resources with
>> IORESOURCE_IO_32 when the P2P bridge I/O window address decode type
>> is PCI_IO_RANGE_TYPE_32 and use that to limit allocations.
>>
>> I don't see the justification for (1).  What problem does this
>> solve?
>>
> I can potentially see this helping with hotplug, where a new device
> has only 16 bit io ports, but if there's no space remaining under 64k
> allocation would fail.  Preferring above 64k means preserving that
> limited resource.  This is exactly equivalent to my original
> motivation for preferring 64 bit PCI mem in order to preserve 32 bit
> address space for hotplug devices without 64 bit BARs.

You're right, the spec does allow the upper 16 bits of I/O BARs to be
hardwired to zero (PCI spec rev 3.0, p. 226), so this part does make
some sense.  I don't think it applies to x86, since I don't think
there's a way to generate an I/O access to anything above 64K, but it
could help other arches.

I'm inclined to be conservative and wait until we find a problem where
a patch like this would help.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-30 16:27                                       ` Bjorn Helgaas
@ 2012-05-30 16:30                                         ` H. Peter Anvin
  2012-05-30 16:33                                         ` Linus Torvalds
  1 sibling, 0 replies; 55+ messages in thread
From: H. Peter Anvin @ 2012-05-30 16:30 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Steven Newbury, Yinghai Lu, Linus Torvalds, Andrew Morton,
	linux-pci, linux-kernel

On 05/30/2012 09:27 AM, Bjorn Helgaas wrote:
> 
> You're right, the spec does allow the upper 16 bits of I/O BARs to be
> hardwired to zero (PCI spec rev 3.0, p. 226), so this part does make
> some sense.  I don't think it applies to x86, since I don't think
> there's a way to generate an I/O access to anything above 64K, but it
> could help other arches.
> 
> I'm inclined to be conservative and wait until we find a problem where
> a patch like this would help.
> 

The really conservative thing is to just use 16-bit addresses for I/O on
all platforms.  I think a lot of non-x86 platforms does that.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-30 16:27                                       ` Bjorn Helgaas
  2012-05-30 16:30                                         ` H. Peter Anvin
@ 2012-05-30 16:33                                         ` Linus Torvalds
  1 sibling, 0 replies; 55+ messages in thread
From: Linus Torvalds @ 2012-05-30 16:33 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Steven Newbury, Yinghai Lu, H. Peter Anvin, Andrew Morton,
	linux-pci, linux-kernel

On Wed, May 30, 2012 at 9:27 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>
> You're right, the spec does allow the upper 16 bits of I/O BARs to be
> hardwired to zero (PCI spec rev 3.0, p. 226), so this part does make
> some sense.  I don't think it applies to x86, since I don't think
> there's a way to generate an I/O access to anything above 64K, but it
> could help other arches.
>
> I'm inclined to be conservative and wait until we find a problem where
> a patch like this would help.

I agree. I would be *very* suspicious of "it might help with other
architectures".

Sure, other architectures might be using non-16-bit IO addresses for
their motherboard resources (just because they can). However, any
hotplug device is most likely (by far) to have been tested and
designed for x86, which means that even if it might look like it has
more than 16 bits of IO space, it will never have been tested that
way.

So rather than make it likely to help other architectures, I'd say
that it would make it likely to break things.

                 Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-05-29 20:50                                   ` H. Peter Anvin
@ 2012-06-01 23:30                                     ` Yinghai Lu
  2012-06-04  1:05                                       ` Bjorn Helgaas
  0 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2012-06-01 23:30 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: David Miller, Tony Luck, Bjorn Helgaas, Linus Torvalds,
	Steven Newbury, Andrew Morton, linux-pci, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 366 bytes --]

On Tue, May 29, 2012 at 1:50 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> The bus-side address space should not be more than 32 bits no matter
> what.  As Bjorn indicates, you seem to be mixing up bus and cpu
> addresses all over the place.

please check update patches that is using converted pci bus address
for boundary checking.

Thanks

Yinghai Lu

[-- Attachment #2: pcibus_addr_converting_bus.patch --]
[-- Type: application/octet-stream, Size: 4469 bytes --]

Subject: [PATCH] PCI: pcibus address to resource converting take bus directly

For allocating resource under bus path, we do have dev pass along, and we
could just use bus instead.

Also add helper: pcibios_bus_addr_to_res to convert address only.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/host-bridge.c |   48 +++++++++++++++++++++++++++++++++-------------
 include/linux/pci.h       |    5 ++++
 2 files changed, 40 insertions(+), 13 deletions(-)

Index: linux-2.6/drivers/pci/host-bridge.c
===================================================================
--- linux-2.6.orig/drivers/pci/host-bridge.c
+++ linux-2.6/drivers/pci/host-bridge.c
@@ -9,22 +9,19 @@
 
 #include "pci.h"
 
-static struct pci_bus *find_pci_root_bus(struct pci_dev *dev)
+static struct pci_bus *find_pci_root_bus(struct pci_bus *bus)
 {
-	struct pci_bus *bus;
-
-	bus = dev->bus;
 	while (bus->parent)
 		bus = bus->parent;
 
 	return bus;
 }
 
-static struct pci_host_bridge *find_pci_host_bridge(struct pci_dev *dev)
+static struct pci_host_bridge *find_pci_host_bridge(struct pci_bus *bus)
 {
-	struct pci_bus *bus = find_pci_root_bus(dev);
+	struct pci_bus *root_bus = find_pci_root_bus(bus);
 
-	return to_pci_host_bridge(bus->bridge);
+	return to_pci_host_bridge(root_bus->bridge);
 }
 
 void pci_set_host_bridge_release(struct pci_host_bridge *bridge,
@@ -40,10 +37,11 @@ static bool resource_contains(struct res
 	return res1->start <= res2->start && res1->end >= res2->end;
 }
 
-void pcibios_resource_to_bus(struct pci_dev *dev, struct pci_bus_region *region,
-			     struct resource *res)
+void __pcibios_resource_to_bus(struct pci_bus *bus,
+				      struct pci_bus_region *region,
+				      struct resource *res)
 {
-	struct pci_host_bridge *bridge = find_pci_host_bridge(dev);
+	struct pci_host_bridge *bridge = find_pci_host_bridge(bus);
 	struct pci_host_bridge_window *window;
 	resource_size_t offset = 0;
 
@@ -60,6 +58,11 @@ void pcibios_resource_to_bus(struct pci_
 	region->start = res->start - offset;
 	region->end = res->end - offset;
 }
+void pcibios_resource_to_bus(struct pci_dev *dev, struct pci_bus_region *region,
+			     struct resource *res)
+{
+	__pcibios_resource_to_bus(dev->bus, region, res);
+}
 EXPORT_SYMBOL(pcibios_resource_to_bus);
 
 static bool region_contains(struct pci_bus_region *region1,
@@ -68,10 +71,10 @@ static bool region_contains(struct pci_b
 	return region1->start <= region2->start && region1->end >= region2->end;
 }
 
-void pcibios_bus_to_resource(struct pci_dev *dev, struct resource *res,
-			     struct pci_bus_region *region)
+static void __pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
+				      struct pci_bus_region *region)
 {
-	struct pci_host_bridge *bridge = find_pci_host_bridge(dev);
+	struct pci_host_bridge *bridge = find_pci_host_bridge(bus);
 	struct pci_host_bridge_window *window;
 	resource_size_t offset = 0;
 
@@ -93,4 +96,23 @@ void pcibios_bus_to_resource(struct pci_
 	res->start = region->start + offset;
 	res->end = region->end + offset;
 }
+void pcibios_bus_to_resource(struct pci_dev *dev, struct resource *res,
+			     struct pci_bus_region *region)
+{
+	__pcibios_bus_to_resource(dev->bus, res, region);
+}
 EXPORT_SYMBOL(pcibios_bus_to_resource);
+
+resource_size_t pcibios_bus_addr_to_res(struct pci_bus *bus, int flags,
+					resource_size_t addr)
+{
+	struct pci_bus_region region;
+	struct resource r;
+
+	r.flags = flags;
+	region.start = addr;
+	region.end = addr;
+	__pcibios_bus_to_resource(bus, &r, &region);
+
+	return r.end;
+}
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -653,10 +653,15 @@ void pcibios_update_irq(struct pci_dev *
 
 /* Generic PCI functions used internally */
 
+void __pcibios_resource_to_bus(struct pci_bus *bus,
+			       struct pci_bus_region *region,
+			       struct resource *res);
 void pcibios_resource_to_bus(struct pci_dev *dev, struct pci_bus_region *region,
 			     struct resource *res);
 void pcibios_bus_to_resource(struct pci_dev *dev, struct resource *res,
 			     struct pci_bus_region *region);
+resource_size_t pcibios_bus_addr_to_res(struct pci_bus *bus, int flags,
+					resource_size_t addr);
 void pcibios_scan_specific_bus(int busn);
 extern struct pci_bus *pci_find_bus(int domain, int busnr);
 void pci_bus_add_devices(const struct pci_bus *bus);

[-- Attachment #3: allocate_high_at_first_v6.patch --]
[-- Type: application/octet-stream, Size: 4274 bytes --]

Subject: [PATCH] PCI: Try to allocate mem64 above 4G at first

and will fall back to below 4g if it can not find any above 4g.

x86 32bit without X86_PAE support will have bottom set to 0, becuase
resource_size_t is 32bit.

Also for 32bit with resource_size_t 64bit kernel on machine with pae support
we are safe because iomem_resource is limited to 32bit according to
x86_phys_bits.

-v2: update bottom assigning to make it clear for non-pae support machine.
-v3: Bjorn's change:
        use MAX_REOURCE instead of -1
        use start/end instead of bottom/max
        for all arch instead of just x86_64
-v4: updated after PCI_MAX_RESOURCE_32 change.
-v5: restore io handling to use PCI_MAX_RESOURCE_32 as limit.
-v6: checking pcibios_resource_to_bus return for every bus res, to decide it
	if we need to try high at first.
     It support all arches instead  of just x86_64.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/include/asm/pci.h |    1 -
 drivers/pci/bus.c          |   42 ++++++++++++++++++++++++++++++++++--------
 drivers/pci/pci.h          |    2 ++
 include/linux/pci.h        |    4 ----
 4 files changed, 36 insertions(+), 13 deletions(-)

Index: linux-2.6/arch/x86/include/asm/pci.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/pci.h
+++ linux-2.6/arch/x86/include/asm/pci.h
@@ -149,7 +149,6 @@ void default_restore_msi_irqs(struct pci
 
 /* generic pci stuff */
 #include <asm-generic/pci.h>
-#define PCIBIOS_MAX_MEM_32 0xffffffff
 
 #ifdef CONFIG_NUMA
 /* Returns the node based on pci bus */
Index: linux-2.6/drivers/pci/bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/bus.c
+++ linux-2.6/drivers/pci/bus.c
@@ -121,15 +121,13 @@ pci_bus_alloc_resource(struct pci_bus *b
 {
 	int i, ret = -ENOMEM;
 	struct resource *r;
-	resource_size_t max = -1;
 
 	type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
 
-	/* don't allocate too high if the pref mem doesn't support 64bit*/
-	if (!(res->flags & IORESOURCE_MEM_64))
-		max = PCIBIOS_MAX_MEM_32;
-
 	pci_bus_for_each_resource(bus, r, i) {
+		resource_size_t start, end, middle;
+		struct pci_bus_region region;
+
 		if (!r)
 			continue;
 
@@ -143,14 +141,42 @@ pci_bus_alloc_resource(struct pci_bus *b
 		    !(res->flags & IORESOURCE_PREFETCH))
 			continue;
 
+		/*
+		 * don't allocate too high if the pref mem doesn't
+		 * support 64bit, also if this is a 64-bit mem
+		 * resource, try above 4GB first
+		 */
+		__pcibios_resource_to_bus(bus, &region, r);
+		start = 0;
+		end = MAX_RESOURCE;
+		if (region.start <= PCI_MAX_ADDR_32 &&
+		    region.end > PCI_MAX_ADDR_32) {
+			middle = pcibios_bus_addr_to_res(bus, res->flags,
+						      PCI_MAX_ADDR_32);
+			if (res->flags & IORESOURCE_MEM_64)
+				start = middle + 1;
+			else
+				end = middle;
+		} else if (region.start > PCI_MAX_ADDR_32 &&
+			   !(res->flags & IORESOURCE_MEM_64))
+				continue;
+
+again:
 		/* Ok, try it out.. */
 		ret = allocate_resource(r, res, size,
-					r->start ? : min,
-					max, align,
+					max(start, r->start ? : min),
+					end, align,
 					alignf, alignf_data);
 		if (ret == 0)
-			break;
+			return 0;
+
+		if (start != 0) {
+			start = 0;
+			goto again;
+		}
 	}
+
+
 	return ret;
 }
 
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -208,6 +208,8 @@ enum pci_bar_type {
 	pci_bar_mem64,		/* A 64-bit memory BAR */
 };
 
+#define PCI_MAX_ADDR_32	((resource_size_t)0xffffffff)
+
 bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl,
 				int crs_timeout);
 extern int pci_setup_device(struct pci_dev *dev);
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -1345,10 +1345,6 @@ static inline int pci_domain_nr(struct p
 
 #include <asm/pci.h>
 
-#ifndef PCIBIOS_MAX_MEM_32
-#define PCIBIOS_MAX_MEM_32 (-1)
-#endif
-
 /* these helpers provide future and backwards compatibility
  * for accessing popular PCI BAR info */
 #define pci_resource_start(dev, bar)	((dev)->resource[(bar)].start)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-06-01 23:30                                     ` Yinghai Lu
@ 2012-06-04  1:05                                       ` Bjorn Helgaas
  2012-06-05  2:37                                         ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Bjorn Helgaas @ 2012-06-04  1:05 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, David Miller, Tony Luck, Linus Torvalds,
	Steven Newbury, Andrew Morton, linux-pci, linux-kernel

On Fri, Jun 1, 2012 at 4:30 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, May 29, 2012 at 1:50 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> The bus-side address space should not be more than 32 bits no matter
>> what.  As Bjorn indicates, you seem to be mixing up bus and cpu
>> addresses all over the place.
>
> please check update patches that is using converted pci bus address
> for boundary checking.

What problem does this fix?  There's significant risk that this
allocation change  will make us trip over something, so it must fix
something to make it worth considering.

Steve's problem doesn't count because that's a "pci=nocrs" case that
will always require special handling.  A general solution is not
possible without a BIOS change (to describe >4GB apertures) or a
native host bridge driver (to discover >4GB apertures from the
hardware).  These patches only make Steve's machine work by accident
-- they make us put the video device above 4GB, and we're just lucky
that the host bridge claims that region.

One possibility is some sort of boot-time option to force a PCI device
to a specified address.  That would be useful for debugging as well as
for Steve's machine.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-06-04  1:05                                       ` Bjorn Helgaas
@ 2012-06-05  2:37                                         ` Yinghai Lu
  2012-06-05  4:50                                           ` Bjorn Helgaas
  0 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2012-06-05  2:37 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: H. Peter Anvin, David Miller, Tony Luck, Linus Torvalds,
	Steven Newbury, Andrew Morton, linux-pci, linux-kernel

On Sun, Jun 3, 2012 at 6:05 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, Jun 1, 2012 at 4:30 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Tue, May 29, 2012 at 1:50 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>
>>> The bus-side address space should not be more than 32 bits no matter
>>> what.  As Bjorn indicates, you seem to be mixing up bus and cpu
>>> addresses all over the place.
>>
>> please check update patches that is using converted pci bus address
>> for boundary checking.
>
> What problem does this fix?  There's significant risk that this
> allocation change  will make us trip over something, so it must fix
> something to make it worth considering.

If we do not enable that, we would not find the problem.

On one my test setup that _CRS does state 64bit resource range,
but when I clear some device resource manually and let kernel allocate
high, just then find out those devices does not work with drivers.
It turns out _CRS have more big range than what the chipset setting states.
with fixing in BIOS, allocate high is working now on that platform.

>
> Steve's problem doesn't count because that's a "pci=nocrs" case that
> will always require special handling.

but pci=nocrs is still supported, even some systems does not work with
pci=use_crs

> A general solution is not
> possible without a BIOS change (to describe >4GB apertures) or a
> native host bridge driver (to discover >4GB apertures from the
> hardware).  These patches only make Steve's machine work by accident
> -- they make us put the video device above 4GB, and we're just lucky
> that the host bridge claims that region.

Some bios looks enabling the non-stated range default to legacy chain.
Some bios does not do that. only stated range count.
So with pci=nocrs we still have some chance to get allocate high working.

>
> One possibility is some sort of boot-time option to force a PCI device
> to a specified address.  That would be useful for debugging as well as
> for Steve's machine.

yeah, how about

pci=alloc_high

and default to disabled ?

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-06-05  2:37                                         ` Yinghai Lu
@ 2012-06-05  4:50                                           ` Bjorn Helgaas
  2012-06-05  5:04                                             ` Yinghai Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Bjorn Helgaas @ 2012-06-05  4:50 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, David Miller, Tony Luck, Linus Torvalds,
	Steven Newbury, Andrew Morton, linux-pci, linux-kernel

On Mon, Jun 4, 2012 at 7:37 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Sun, Jun 3, 2012 at 6:05 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Fri, Jun 1, 2012 at 4:30 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> On Tue, May 29, 2012 at 1:50 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>>
>>>> The bus-side address space should not be more than 32 bits no matter
>>>> what.  As Bjorn indicates, you seem to be mixing up bus and cpu
>>>> addresses all over the place.
>>>
>>> please check update patches that is using converted pci bus address
>>> for boundary checking.
>>
>> What problem does this fix?  There's significant risk that this
>> allocation change  will make us trip over something, so it must fix
>> something to make it worth considering.
>
> If we do not enable that, we would not find the problem.

Sorry, that didn't make any sense to me.  I'm hoping you will point us
to a bug report that is fixed by this patch.

> On one my test setup that _CRS does state 64bit resource range,
> but when I clear some device resource manually and let kernel allocate
> high, just then find out those devices does not work with drivers.
> It turns out _CRS have more big range than what the chipset setting states.
> with fixing in BIOS, allocate high is working now on that platform.

I didn't understand this either, sorry.  Are you saying that this
patch helps us work around a BIOS defect?

>> Steve's problem doesn't count because that's a "pci=nocrs" case that
>> will always require special handling.
>
> but pci=nocrs is still supported, even some systems does not work with
> pci=use_crs
>
>> A general solution is not
>> possible without a BIOS change (to describe >4GB apertures) or a
>> native host bridge driver (to discover >4GB apertures from the
>> hardware).  These patches only make Steve's machine work by accident
>> -- they make us put the video device above 4GB, and we're just lucky
>> that the host bridge claims that region.
>
> Some bios looks enabling the non-stated range default to legacy chain.
> Some bios does not do that. only stated range count.
> So with pci=nocrs we still have some chance to get allocate high working.

The patch as proposed changes behavior for all systems, whether we're
using _CRS or not (in fact, it even changes the behavior for non-x86
systems).  The only case we know of where it fixes something is
Steve's system, where he already has to use "pci=nocrs" in order for
it to help.  My point is that it would be safer to leave things as
they are for everybody, and merely ask Steve to use "pci=nocrs
pci=alloc_high" or something similar.

>> One possibility is some sort of boot-time option to force a PCI device
>> to a specified address.  That would be useful for debugging as well as
>> for Steve's machine.
>
> yeah, how about
>
> pci=alloc_high
>
> and default to disabled ?

I was actually thinking of something more specific, e.g., a way to
place one device at an exact address.  I've implemented that a couple
times already for testing various things.  But maybe a more general
option like "pci=alloc_high" would make sense, too.

Linux has a long history of allocating bottom-up.  Windows has a long
history of allocating top-down.  You're proposing a third alternative,
allocating bottom-up starting at 4GB for 64-bit BARs.  If we change
this area, I would prefer something that follows Windows because I
think it will be closer to what's been tested by Windows.  Do you
think your alternative is better?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-06-05  4:50                                           ` Bjorn Helgaas
@ 2012-06-05  5:04                                             ` Yinghai Lu
  2012-06-06  9:44                                               ` Steven Newbury
  0 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2012-06-05  5:04 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: H. Peter Anvin, David Miller, Tony Luck, Linus Torvalds,
	Steven Newbury, Andrew Morton, linux-pci, linux-kernel

On Mon, Jun 4, 2012 at 9:50 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Mon, Jun 4, 2012 at 7:37 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Sun, Jun 3, 2012 at 6:05 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> On Fri, Jun 1, 2012 at 4:30 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>>> On Tue, May 29, 2012 at 1:50 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>>>
>>>>> The bus-side address space should not be more than 32 bits no matter
>>>>> what.  As Bjorn indicates, you seem to be mixing up bus and cpu
>>>>> addresses all over the place.
>>>>
>>>> please check update patches that is using converted pci bus address
>>>> for boundary checking.
>>>
>>> What problem does this fix?  There's significant risk that this
>>> allocation change  will make us trip over something, so it must fix
>>> something to make it worth considering.
>>
>> If we do not enable that, we would not find the problem.
>
> Sorry, that didn't make any sense to me.  I'm hoping you will point us
> to a bug report that is fixed by this patch.

current it only help Steve's test case.

>
>> On one my test setup that _CRS does state 64bit resource range,
>> but when I clear some device resource manually and let kernel allocate
>> high, just then find out those devices does not work with drivers.
>> It turns out _CRS have more big range than what the chipset setting states.
>> with fixing in BIOS, allocate high is working now on that platform.
>
> I didn't understand this either, sorry.  Are you saying that this
> patch helps us work around a BIOS defect?

Help us find out one BIOS defect.

>
>> yeah, how about
>>
>> pci=alloc_high
>>
>> and default to disabled ?
>
> I was actually thinking of something more specific, e.g., a way to
> place one device at an exact address.  I've implemented that a couple
> times already for testing various things.  But maybe a more general
> option like "pci=alloc_high" would make sense, too.

yeah.

>
....
> Linux has a long history of allocating bottom-up.  Windows has a long
> history of allocating top-down.  You're proposing a third alternative,
> allocating bottom-up starting at 4GB for 64-bit BARs.  If we change
> this area, I would prefer something that follows Windows because I
> think it will be closer to what's been tested by Windows.  Do you
> think your alternative is better?

hope we can figure out how windows is making it work.

Steve, Can you check if Windows is working with your test case ?

If it works, we may try do the same thing from Linux, so you will not need to
append "pci=nocrs pci=alloc_high"...

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-06-05  5:04                                             ` Yinghai Lu
@ 2012-06-06  9:44                                               ` Steven Newbury
  2012-06-06 16:18                                                 ` Bjorn Helgaas
  0 siblings, 1 reply; 55+ messages in thread
From: Steven Newbury @ 2012-06-06  9:44 UTC (permalink / raw)
  To: Yinghai Lu, Bjorn Helgaas
  Cc: H. Peter Anvin, David Miller, Tony Luck, Linus Torvalds,
	Andrew Morton, linux-pci, linux-kernel

On Tue,   5 Jun 2012, 06:04:57 BST, Yinghai Lu <yinghai@kernel.org> wrote:
> > Linux has a long history of allocating bottom-up.  Windows has a long
> > history of allocating top-down.  You're proposing a third alternative,
> > allocating bottom-up starting at 4GB for 64-bit BARs.  If we change
> > this area, I would prefer something that follows Windows because I
> > think it will be closer to what's been tested by Windows.  Do you
> > think your alternative is better?
> 
> hope we can figure out how windows is making it work.
> 
> Steve, Can you check if Windows is working with your test case ?
> 
> If it works, we may try do the same thing from Linux, so you will not
> need to append "pci=nocrs pci=alloc_high"...
> 
Unfortunately I don't have a 64 bit version of Windows to test with.  Vista(32 bit) fails to even boot when docked, hot-plugging fails to allocate resources, but at least doesn't crash.

>From what I've read about the (64 bit) Windows allocation stragegy it's closer to Yinghai's method than the Linux default, preferring 64 bit resources (>4G) when possible.  I'll try to find the specification document again.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
  2012-06-06  9:44                                               ` Steven Newbury
@ 2012-06-06 16:18                                                 ` Bjorn Helgaas
       [not found]                                                   ` <CAGLnvc_ejMWiiubVMo7DLz5ZVn1iMbf67FB4H7crRCCTRRqt2A@mail.gmail.com>
  0 siblings, 1 reply; 55+ messages in thread
From: Bjorn Helgaas @ 2012-06-06 16:18 UTC (permalink / raw)
  To: Steven Newbury
  Cc: Yinghai Lu, H. Peter Anvin, David Miller, Tony Luck,
	Linus Torvalds, Andrew Morton, linux-pci, linux-kernel

On Wed, Jun 6, 2012 at 2:44 AM, Steven Newbury <steve@snewbury.org.uk> wrote:
> On Tue,   5 Jun 2012, 06:04:57 BST, Yinghai Lu <yinghai@kernel.org> wrote:
>> > Linux has a long history of allocating bottom-up.  Windows has a long
>> > history of allocating top-down.  You're proposing a third alternative,
>> > allocating bottom-up starting at 4GB for 64-bit BARs.  If we change
>> > this area, I would prefer something that follows Windows because I
>> > think it will be closer to what's been tested by Windows.  Do you
>> > think your alternative is better?
>>
>> hope we can figure out how windows is making it work.
>>
>> Steve, Can you check if Windows is working with your test case ?
>>
>> If it works, we may try do the same thing from Linux, so you will not
>> need to append "pci=nocrs pci=alloc_high"...
>>
> Unfortunately I don't have a 64 bit version of Windows to test with.  Vista(32 bit) fails to even boot when docked, hot-plugging fails to allocate resources, but at least doesn't crash.
>
> From what I've read about the (64 bit) Windows allocation stragegy it's closer to Yinghai's method than the Linux default, preferring 64 bit resources (>4G) when possible.  I'll try to find the specification document again.

Here's the host bridge info from the BIOS (from
https://bugzilla.kernel.org/show_bug.cgi?id=10461 attachment
https://bugzilla.kernel.org/attachment.cgi?id=72869):

ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7]
pci_root PNP0A03:00: host bridge window [io  0x0d00-0xffff]
pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
pci_root PNP0A03:00: host bridge window [mem 0x000d0000-0x000dffff]
pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xf7ffffff]
pci_root PNP0A03:00: host bridge window [mem 0xfc000000-0xfebfffff]
pci_root PNP0A03:00: host bridge window [mem 0xfec10000-0xfecfffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed1c000-0xfed1ffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed90000-0xfed9ffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed40000-0xfed44fff]
pci_root PNP0A03:00: host bridge window [mem 0xfeda7000-0xfedfffff]
pci_root PNP0A03:00: host bridge window [mem 0xfee10000-0xff9fffff]
pci_root PNP0A03:00: host bridge window [mem 0xffc00000-0xffdfffff]

There's no aperture above 4GB.  So I don't think any version of
Windows will ever assign a BAR above 4GB.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
       [not found]                                                   ` <CAGLnvc_ejMWiiubVMo7DLz5ZVn1iMbf67FB4H7crRCCTRRqt2A@mail.gmail.com>
@ 2012-07-04  3:00                                                     ` joeyli
  0 siblings, 0 replies; 55+ messages in thread
From: joeyli @ 2012-07-04  3:00 UTC (permalink / raw)
  To: Steven Newbury, Yinghai Lu, David Miller, H. Peter Anvin,
	Andrew Morton, linux-pci, Linus Torvalds, linux-kernel,
	Tony Luck

於 三,2012-07-04 於 10:56 +0800,lee joey 提到:
> 
> 
> ---------- Forwarded message ----------
> From: Bjorn Helgaas <bhelgaas@google.com>
> Date: 2012/6/7
> Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at
> first
> To: Steven Newbury <steve@snewbury.org.uk>
> Cc: Yinghai Lu <yinghai@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
> David Miller <davem@davemloft.net>, Tony Luck <tony.luck@intel.com>,
> Linus Torvalds <torvalds@linux-foundation.org>, Andrew Morton
> <akpm@linux-foundation.org>, linux-pci@vger.kernel.org,
> linux-kernel@vger.kernel.org
> 
> 
> On Wed, Jun 6, 2012 at 2:44 AM, Steven Newbury <steve@snewbury.org.uk>
> wrote:
> > On Tue,   5 Jun 2012, 06:04:57 BST, Yinghai Lu <yinghai@kernel.org>
> wrote:
> >> > Linux has a long history of allocating bottom-up.  Windows has a
> long
> >> > history of allocating top-down.  You're proposing a third
> alternative,
> >> > allocating bottom-up starting at 4GB for 64-bit BARs.  If we
> change
> >> > this area, I would prefer something that follows Windows because
> I
> >> > think it will be closer to what's been tested by Windows.  Do you
> >> > think your alternative is better?
> >>
> >> hope we can figure out how windows is making it work.
> >>
> >> Steve, Can you check if Windows is working with your test case ?
> >>
> >> If it works, we may try do the same thing from Linux, so you will
> not
> >> need to append "pci=nocrs pci=alloc_high"...
> >>
> > Unfortunately I don't have a 64 bit version of Windows to test
> with.  Vista(32 bit) fails to even boot when docked, hot-plugging
> fails to allocate resources, but at least doesn't crash.
> >
> > From what I've read about the (64 bit) Windows allocation stragegy
> it's closer to Yinghai's method than the Linux default, preferring 64
> bit resources (>4G) when possible.  I'll try to find the specification
> document again.
> 
> 
> Here's the host bridge info from the BIOS (from
> https://bugzilla.kernel.org/show_bug.cgi?id=10461 attachment
> https://bugzilla.kernel.org/attachment.cgi?id=72869):
> 
> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
> pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7]
> pci_root PNP0A03:00: host bridge window [io  0x0d00-0xffff]
> pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
> pci_root PNP0A03:00: host bridge window [mem 0x000d0000-0x000dffff]
> pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xf7ffffff]
> pci_root PNP0A03:00: host bridge window [mem 0xfc000000-0xfebfffff]
> pci_root PNP0A03:00: host bridge window [mem 0xfec10000-0xfecfffff]
> pci_root PNP0A03:00: host bridge window [mem 0xfed1c000-0xfed1ffff]
> pci_root PNP0A03:00: host bridge window [mem 0xfed90000-0xfed9ffff]
> pci_root PNP0A03:00: host bridge window [mem 0xfed40000-0xfed44fff]
> pci_root PNP0A03:00: host bridge window [mem 0xfeda7000-0xfedfffff]
> pci_root PNP0A03:00: host bridge window [mem 0xfee10000-0xff9fffff]
> pci_root PNP0A03:00: host bridge window [mem 0xffc00000-0xffdfffff]
> 
> There's no aperture above 4GB.  So I don't think any version of
> Windows will ever assign a BAR above 4GB.
> --
> To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 

Hope have any help...

Here have a document from MSDN talk about the pci allocate strategy on
Windows server 2003, XP and vista:
	http://msdn.microsoft.com/en-us/library/windows/hardware/gg462986.aspx


Per page 4, looks Microsoft have different strategy on different Windows
version

On XP and server 2003: First, they ignored BIOS's boot configuration and
allocate below 4G. If fail, then try to allocate above 4GB.

On Vista: it always respects the boot configuration of devices above 4
GB.

But, this document didn't cover the behavior on Windows 7, not sure it's
the same with Vista.


Thanks

Joey Lee




^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2012-07-04  3:01 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-23  6:34 [PATCH 00/11] PCI: resource allocation related Yinghai Lu
2012-05-23  6:34 ` [PATCH 01/11] PCI: Should add children device res to fail list Yinghai Lu
2012-05-23  6:34 ` [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first Yinghai Lu
2012-05-23 15:57   ` Linus Torvalds
2012-05-23 17:30     ` Yinghai Lu
2012-05-23 18:40       ` Yinghai Lu
2012-05-25  4:36         ` Bjorn Helgaas
2012-05-25 17:53           ` Yinghai Lu
2012-05-25 18:39             ` Yinghai Lu
2012-05-25 19:37               ` Bjorn Helgaas
2012-05-25 20:18                 ` H. Peter Anvin
2012-05-25 20:19                 ` Yinghai Lu
2012-05-25 21:55                   ` Bjorn Helgaas
2012-05-25 21:58                     ` H. Peter Anvin
2012-05-25 22:14                       ` Bjorn Helgaas
2012-05-25 23:10                     ` Yinghai Lu
2012-05-26  0:12                       ` Bjorn Helgaas
2012-05-26 15:01                         ` Bjorn Helgaas
2012-05-29 17:56                           ` Yinghai Lu
2012-05-29 17:55                         ` Yinghai Lu
2012-05-29 17:57                           ` H. Peter Anvin
2012-05-29 18:17                             ` Yinghai Lu
2012-05-29 19:03                               ` H. Peter Anvin
2012-05-29 20:46                                 ` Yinghai Lu
2012-05-29 20:50                                   ` H. Peter Anvin
2012-06-01 23:30                                     ` Yinghai Lu
2012-06-04  1:05                                       ` Bjorn Helgaas
2012-06-05  2:37                                         ` Yinghai Lu
2012-06-05  4:50                                           ` Bjorn Helgaas
2012-06-05  5:04                                             ` Yinghai Lu
2012-06-06  9:44                                               ` Steven Newbury
2012-06-06 16:18                                                 ` Bjorn Helgaas
     [not found]                                                   ` <CAGLnvc_ejMWiiubVMo7DLz5ZVn1iMbf67FB4H7crRCCTRRqt2A@mail.gmail.com>
2012-07-04  3:00                                                     ` joeyli
2012-05-29 20:53                                   ` David Miller
2012-05-29 19:23                               ` Bjorn Helgaas
2012-05-29 20:40                                 ` Yinghai Lu
2012-05-29 23:24                                   ` Bjorn Helgaas
2012-05-29 23:27                                   ` Bjorn Helgaas
2012-05-29 23:33                                     ` Yinghai Lu
2012-05-29 23:47                                       ` Bjorn Helgaas
2012-05-30  7:40                                     ` Steven Newbury
2012-05-30 16:27                                       ` Bjorn Helgaas
2012-05-30 16:30                                         ` H. Peter Anvin
2012-05-30 16:33                                         ` Linus Torvalds
2012-05-23  6:34 ` [PATCH 03/11] intel-gtt: Read 64bit for gmar_bus_addr Yinghai Lu
2012-05-23  7:21   ` Dave Airlie
2012-05-23  7:44     ` Daniel Vetter
2012-05-23  6:34 ` [PATCH 04/11] PCI: Make sure assign same align with large size resource at first Yinghai Lu
2012-05-23  6:34 ` [PATCH 05/11] resources: Split out __allocate_resource() Yinghai Lu
2012-05-23  6:34 ` [PATCH 06/11] resource: make find_resource could return just fit resource Yinghai Lu
2012-05-23  6:34 ` [PATCH 07/11] PCI: Don't allocate small resource in big empty space Yinghai Lu
2012-05-23  6:34 ` [PATCH 08/11] resource: only return range with needed align Yinghai Lu
2012-05-23  6:34 ` [PATCH 09/11] PCI: Add is_pci_iov_resource_idx() Yinghai Lu
2012-05-23  6:34 ` [PATCH 10/11] PCI: Sort unassigned resources with correct alignment Yinghai Lu
2012-05-23  6:34 ` [PATCH 11/11] PCI: Treat ROM resource as optional during assigning Yinghai Lu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.