linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
@ 2010-05-12 18:14 Mike Travis
  2010-05-13 18:56 ` Bjorn Helgaas
                   ` (2 more replies)
  0 siblings, 3 replies; 49+ messages in thread
From: Mike Travis @ 2010-05-12 18:14 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin
  Cc: x86, Jesse Barnes, Bjorn Helgaas, Jacob Pan, Tejun Heo,
	Mike Habeck, LKML

Subject: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
From: Mike Habeck <habeck@sgi.com>

The Linux kernel assigns BARs that a BIOS did not assign, most likely
to handle broken BIOSes that didn't enumerate the devices correctly.
On UV the BIOS purposely doesn't assign I/O BARs for certain devices/
drivers we know don't use them (examples, LSI SAS, Qlogic FC, ...).
We purposely don't assign these I/O BARs because I/O Space is a very
limited resource.  There is only 64k of I/O Space, and in a PCIe
topology that space gets divided up into 4k chucks (this is due to
the fact that a pci-to-pci bridge's I/O decoder is aligned at 4k)...
Thus a system can have at most 16 cards with I/O BARs: (64k / 4k = 16)

SGI needs to scale to >16 devices with I/O BARs.  So by not assigning
I/O BARs on devices we know don't use them, we can do that (iff the
kernel doesn't go and assign these BARs that the BIOS purposely didn't
assign).

This patch will not assign a resource to a device BAR if that BAR was
not assigned by the BIOS, and the kernel cmdline option 'pci=nobar'
was specified.   This patch is closely modeled after the 'pci=norom'
option that currently exists in the tree.

Signed-off-by: Mike Habeck <habeck@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
---
 Documentation/kernel-parameters.txt |    2 ++
 arch/x86/include/asm/pci_x86.h      |    1 +
 arch/x86/pci/common.c               |   20 ++++++++++++++++++++
 3 files changed, 23 insertions(+)

--- linux.orig/Documentation/kernel-parameters.txt
+++ linux/Documentation/kernel-parameters.txt
@@ -1935,6 +1935,8 @@ and is between 256 and 4096 characters.
 		norom		[X86] Do not assign address space to
 				expansion ROMs that do not already have
 				BIOS assigned address ranges.
+		nobar		[X86] Do not assign address space to the
+				BARs that weren't assigned by the BIOS.
 		irqmask=0xMMMM	[X86] Set a bit mask of IRQs allowed to be
 				assigned automatically to PCI devices. You can
 				make the kernel exclude IRQs of your ISA cards
--- linux.orig/arch/x86/include/asm/pci_x86.h
+++ linux/arch/x86/include/asm/pci_x86.h
@@ -30,6 +30,7 @@
 #define PCI_HAS_IO_ECS		0x40000
 #define PCI_NOASSIGN_ROMS	0x80000
 #define PCI_ROOT_NO_CRS		0x100000
+#define PCI_NOASSIGN_BARS	0x200000
 
 extern unsigned int pci_probe;
 extern unsigned long pirq_table_addr;
--- linux.orig/arch/x86/pci/common.c
+++ linux/arch/x86/pci/common.c
@@ -125,6 +125,23 @@ void __init dmi_check_skip_isa_align(voi
 static void __devinit pcibios_fixup_device_resources(struct pci_dev *dev)
 {
 	struct resource *rom_r = &dev->resource[PCI_ROM_RESOURCE];
+	struct resource *bar_r;
+	int bar;
+
+	if (pci_probe & PCI_NOASSIGN_BARS) {
+		/*
+		* If the BIOS did not assign the BAR, zero out the
+		* resource so the kernel doesn't attmept to assign
+		* it later on in pci_assign_unassigned_resources
+		*/
+		for (bar = 0; bar <= PCI_STD_RESOURCE_END; bar++) {
+			bar_r = &dev->resource[bar];
+			if (bar_r->start == 0 && bar_r->end != 0) {
+				bar_r->flags = 0;
+				bar_r->end = 0;
+			}
+		}
+	}
 
 	if (pci_probe & PCI_NOASSIGN_ROMS) {
 		if (rom_r->parent)
@@ -509,6 +526,9 @@ char * __devinit  pcibios_setup(char *st
 	} else if (!strcmp(str, "norom")) {
 		pci_probe |= PCI_NOASSIGN_ROMS;
 		return NULL;
+	} else if (!strcmp(str, "nobar")) {
+		pci_probe |= PCI_NOASSIGN_BARS;
+		return NULL;
 	} else if (!strcmp(str, "assign-busses")) {
 		pci_probe |= PCI_ASSIGN_ALL_BUSSES;
 		return NULL;

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-12 18:14 [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned Mike Travis
@ 2010-05-13 18:56 ` Bjorn Helgaas
  2010-05-13 19:08   ` H. Peter Anvin
                     ` (2 more replies)
  2010-05-13 18:58 ` Bjorn Helgaas
  2010-05-28 16:53 ` Mike Travis
  2 siblings, 3 replies; 49+ messages in thread
From: Bjorn Helgaas @ 2010-05-13 18:56 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, x86, Jesse Barnes,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML, Yinghai

On Wednesday, May 12, 2010 12:14:32 pm Mike Travis wrote:
> Subject: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
> From: Mike Habeck <habeck@sgi.com>
> 
> The Linux kernel assigns BARs that a BIOS did not assign, most likely
> to handle broken BIOSes that didn't enumerate the devices correctly.
> On UV the BIOS purposely doesn't assign I/O BARs for certain devices/
> drivers we know don't use them (examples, LSI SAS, Qlogic FC, ...).
> We purposely don't assign these I/O BARs because I/O Space is a very
> limited resource.  There is only 64k of I/O Space, and in a PCIe
> topology that space gets divided up into 4k chucks (this is due to
> the fact that a pci-to-pci bridge's I/O decoder is aligned at 4k)...
> Thus a system can have at most 16 cards with I/O BARs: (64k / 4k = 16)
> 
> SGI needs to scale to >16 devices with I/O BARs.  So by not assigning
> I/O BARs on devices we know don't use them, we can do that (iff the
> kernel doesn't go and assign these BARs that the BIOS purposely didn't
> assign).

I don't quite understand this part.  If you boot with "pci=nobar",
the BIOS doesn't assign BARs, Linux doesn't either, the drivers
don't need them -- everything works, and that makes sense so far.

Now, if you boot normally (without "pci=nobar"), what changes?
The BIOS situation is the same, but Linux tries to assign the
unassigned BARs.  It may assign a few before running out of space,
but the drivers still don't need those BARs.  What breaks?

> This patch will not assign a resource to a device BAR if that BAR was
> not assigned by the BIOS, and the kernel cmdline option 'pci=nobar'
> was specified.   This patch is closely modeled after the 'pci=norom'
> option that currently exists in the tree.

Can't we figure out whether we need this ourselves?  Using a command-
line option just guarantees that we'll forever be writing customer
advisories about this issue.

This issue is not specific to x86, so I don't really like having
the implementation be x86-specific.

Do we know anything about how other OSes handle this case of I/O
space exhaustion?

I'm a little bit nervous about Linux's current strategy of assigning
resources to things before we even know whether we're going to use
them.  We don't support dynamic PCI resource reassignment, so maybe
we don't have any choice in this case, but generally I prefer the
lazy approach.

Bjorn

> Signed-off-by: Mike Habeck <habeck@sgi.com>
> Signed-off-by: Mike Travis <travis@sgi.com>
> ---
>  Documentation/kernel-parameters.txt |    2 ++
>  arch/x86/include/asm/pci_x86.h      |    1 +
>  arch/x86/pci/common.c               |   20 ++++++++++++++++++++
>  3 files changed, 23 insertions(+)
> 
> --- linux.orig/Documentation/kernel-parameters.txt
> +++ linux/Documentation/kernel-parameters.txt
> @@ -1935,6 +1935,8 @@ and is between 256 and 4096 characters.
>  		norom		[X86] Do not assign address space to
>  				expansion ROMs that do not already have
>  				BIOS assigned address ranges.
> +		nobar		[X86] Do not assign address space to the
> +				BARs that weren't assigned by the BIOS.
>  		irqmask=0xMMMM	[X86] Set a bit mask of IRQs allowed to be
>  				assigned automatically to PCI devices. You can
>  				make the kernel exclude IRQs of your ISA cards
> --- linux.orig/arch/x86/include/asm/pci_x86.h
> +++ linux/arch/x86/include/asm/pci_x86.h
> @@ -30,6 +30,7 @@
>  #define PCI_HAS_IO_ECS		0x40000
>  #define PCI_NOASSIGN_ROMS	0x80000
>  #define PCI_ROOT_NO_CRS		0x100000
> +#define PCI_NOASSIGN_BARS	0x200000
>  
>  extern unsigned int pci_probe;
>  extern unsigned long pirq_table_addr;
> --- linux.orig/arch/x86/pci/common.c
> +++ linux/arch/x86/pci/common.c
> @@ -125,6 +125,23 @@ void __init dmi_check_skip_isa_align(voi
>  static void __devinit pcibios_fixup_device_resources(struct pci_dev *dev)
>  {
>  	struct resource *rom_r = &dev->resource[PCI_ROM_RESOURCE];
> +	struct resource *bar_r;
> +	int bar;
> +
> +	if (pci_probe & PCI_NOASSIGN_BARS) {
> +		/*
> +		* If the BIOS did not assign the BAR, zero out the
> +		* resource so the kernel doesn't attmept to assign
> +		* it later on in pci_assign_unassigned_resources
> +		*/
> +		for (bar = 0; bar <= PCI_STD_RESOURCE_END; bar++) {
> +			bar_r = &dev->resource[bar];
> +			if (bar_r->start == 0 && bar_r->end != 0) {
> +				bar_r->flags = 0;
> +				bar_r->end = 0;
> +			}
> +		}
> +	}
>  
>  	if (pci_probe & PCI_NOASSIGN_ROMS) {
>  		if (rom_r->parent)
> @@ -509,6 +526,9 @@ char * __devinit  pcibios_setup(char *st
>  	} else if (!strcmp(str, "norom")) {
>  		pci_probe |= PCI_NOASSIGN_ROMS;
>  		return NULL;
> +	} else if (!strcmp(str, "nobar")) {
> +		pci_probe |= PCI_NOASSIGN_BARS;
> +		return NULL;
>  	} else if (!strcmp(str, "assign-busses")) {
>  		pci_probe |= PCI_ASSIGN_ALL_BUSSES;
>  		return NULL;
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-12 18:14 [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned Mike Travis
  2010-05-13 18:56 ` Bjorn Helgaas
@ 2010-05-13 18:58 ` Bjorn Helgaas
  2010-05-28 16:53 ` Mike Travis
  2 siblings, 0 replies; 49+ messages in thread
From: Bjorn Helgaas @ 2010-05-13 18:58 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, x86, Jesse Barnes,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML

Oops, I added Yinghai to the CC: list, but I forgot to add
linux-pci@vger.kernel.org.  Please add that on any future replies.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-13 18:56 ` Bjorn Helgaas
@ 2010-05-13 19:08   ` H. Peter Anvin
  2010-05-13 19:12   ` Mike Travis
  2010-05-13 19:38   ` Mike Habeck
  2 siblings, 0 replies; 49+ messages in thread
From: H. Peter Anvin @ 2010-05-13 19:08 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mike Travis, Ingo Molnar, Thomas Gleixner, x86, Jesse Barnes,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML, Yinghai, linux-pci

On 05/13/2010 11:56 AM, Bjorn Helgaas wrote:
> 
> I'm a little bit nervous about Linux's current strategy of assigning
> resources to things before we even know whether we're going to use
> them.  We don't support dynamic PCI resource reassignment, so maybe
> we don't have any choice in this case, but generally I prefer the
> lazy approach.
> 

Lazy has its own pitfalls.  In particular, the issue here is about
allocation of bridge devices, which may cause all kinds of issues with
bridges further up.  Consider:


	       A
	      / \
             B   C
             |   |
             D   E

... where A, B, and C are bridges, and D and E are devices.  Now the
driver initializes D, and requests address space.  As a result, the OS
configures bridge B to have a 4K window, and accordingly the same for
bridge A.

Now you want to initialize device E.  This means the window for bridge A
has to be widened, because bridge C is going to need its own 4K window.
 If the linearly consecutive address space is not available, it now
means reconfiguring bridge B and device D.

4K granularity really hurts.  It made sense when bridges were relatively
rare, but in PCI-express world that is no longer the case...

	-hpa

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-13 18:56 ` Bjorn Helgaas
  2010-05-13 19:08   ` H. Peter Anvin
@ 2010-05-13 19:12   ` Mike Travis
  2010-05-13 19:13     ` Mike Travis
  2010-05-13 19:54     ` Bjorn Helgaas
  2010-05-13 19:38   ` Mike Habeck
  2 siblings, 2 replies; 49+ messages in thread
From: Mike Travis @ 2010-05-13 19:12 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, x86, Jesse Barnes,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML, Yinghai



Bjorn Helgaas wrote:
> On Wednesday, May 12, 2010 12:14:32 pm Mike Travis wrote:
>> Subject: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
>> From: Mike Habeck <habeck@sgi.com>
>>
>> The Linux kernel assigns BARs that a BIOS did not assign, most likely
>> to handle broken BIOSes that didn't enumerate the devices correctly.
>> On UV the BIOS purposely doesn't assign I/O BARs for certain devices/
>> drivers we know don't use them (examples, LSI SAS, Qlogic FC, ...).
>> We purposely don't assign these I/O BARs because I/O Space is a very
>> limited resource.  There is only 64k of I/O Space, and in a PCIe
>> topology that space gets divided up into 4k chucks (this is due to
>> the fact that a pci-to-pci bridge's I/O decoder is aligned at 4k)...
>> Thus a system can have at most 16 cards with I/O BARs: (64k / 4k = 16)
>>
>> SGI needs to scale to >16 devices with I/O BARs.  So by not assigning
>> I/O BARs on devices we know don't use them, we can do that (iff the
>> kernel doesn't go and assign these BARs that the BIOS purposely didn't
>> assign).
> 
> I don't quite understand this part.  If you boot with "pci=nobar",
> the BIOS doesn't assign BARs, Linux doesn't either, the drivers
> don't need them -- everything works, and that makes sense so far.
> 
> Now, if you boot normally (without "pci=nobar"), what changes?
> The BIOS situation is the same, but Linux tries to assign the
> unassigned BARs.  It may assign a few before running out of space,
> but the drivers still don't need those BARs.  What breaks?

The problem arises because we run out of address spaces to assign.

Say you have 24 cards, and the 1st 16 do not use I/O BARs.  If
you assign the available 16 address spaces to cards that may not
need them, then the final 8 cards will not be available.

This avoids this problem by not wasting I/O address spaces when
they are not going to be used.

> 
>> This patch will not assign a resource to a device BAR if that BAR was
>> not assigned by the BIOS, and the kernel cmdline option 'pci=nobar'
>> was specified.   This patch is closely modeled after the 'pci=norom'
>> option that currently exists in the tree.
> 
> Can't we figure out whether we need this ourselves?  Using a command-
> line option just guarantees that we'll forever be writing customer
> advisories about this issue.

I think since this is so specific (like the potential of having
more than 16 cards would be something the customer would know),
I think it's better to error on the safe side.  If a BIOS does
not recognize an add in card (for whatever reason), and does
not assign the I/O BAR, then it would be up to the kernel to
do that.  Wouldn't you get more customer complaints about non-working
I/O, than someone with > 16 PCI cards not being able to use them
all?

> 
> This issue is not specific to x86, so I don't really like having
> the implementation be x86-specific.

We were going for as light a touch as possible, as there is not
time to verify other arches.  I'd be glad to submit a follow on
patch dealing with the generic case and depend on others for
testing, if that's of interest.

Note we also modeled the option to be identical in operation to
the pci=norom option, which is a similar x86 specific function.

> 
> Do we know anything about how other OSes handle this case of I/O
> space exhaustion?

16+ PCI devices is a fairly large amount.  Are there any other PC's
that handle this much I/O?
> 
> I'm a little bit nervous about Linux's current strategy of assigning
> resources to things before we even know whether we're going to use
> them.  We don't support dynamic PCI resource reassignment, so maybe
> we don't have any choice in this case, but generally I prefer the
> lazy approach.

That's a great idea if it can work.  Unfortunately, we are all tied
to the way BIOS sets up the system, and for UV systems I don't think
dynamic provisioning would work.  There's too much infrastructure
that all has to cooperate by the time the system is fully functional.

> 
> Bjorn

Thanks for the feedback.

Mike

> 
>> Signed-off-by: Mike Habeck <habeck@sgi.com>
>> Signed-off-by: Mike Travis <travis@sgi.com>
>> ---
>>  Documentation/kernel-parameters.txt |    2 ++
>>  arch/x86/include/asm/pci_x86.h      |    1 +
>>  arch/x86/pci/common.c               |   20 ++++++++++++++++++++
>>  3 files changed, 23 insertions(+)
>>
>> --- linux.orig/Documentation/kernel-parameters.txt
>> +++ linux/Documentation/kernel-parameters.txt
>> @@ -1935,6 +1935,8 @@ and is between 256 and 4096 characters.
>>  		norom		[X86] Do not assign address space to
>>  				expansion ROMs that do not already have
>>  				BIOS assigned address ranges.
>> +		nobar		[X86] Do not assign address space to the
>> +				BARs that weren't assigned by the BIOS.
>>  		irqmask=0xMMMM	[X86] Set a bit mask of IRQs allowed to be
>>  				assigned automatically to PCI devices. You can
>>  				make the kernel exclude IRQs of your ISA cards
>> --- linux.orig/arch/x86/include/asm/pci_x86.h
>> +++ linux/arch/x86/include/asm/pci_x86.h
>> @@ -30,6 +30,7 @@
>>  #define PCI_HAS_IO_ECS		0x40000
>>  #define PCI_NOASSIGN_ROMS	0x80000
>>  #define PCI_ROOT_NO_CRS		0x100000
>> +#define PCI_NOASSIGN_BARS	0x200000
>>  
>>  extern unsigned int pci_probe;
>>  extern unsigned long pirq_table_addr;
>> --- linux.orig/arch/x86/pci/common.c
>> +++ linux/arch/x86/pci/common.c
>> @@ -125,6 +125,23 @@ void __init dmi_check_skip_isa_align(voi
>>  static void __devinit pcibios_fixup_device_resources(struct pci_dev *dev)
>>  {
>>  	struct resource *rom_r = &dev->resource[PCI_ROM_RESOURCE];
>> +	struct resource *bar_r;
>> +	int bar;
>> +
>> +	if (pci_probe & PCI_NOASSIGN_BARS) {
>> +		/*
>> +		* If the BIOS did not assign the BAR, zero out the
>> +		* resource so the kernel doesn't attmept to assign
>> +		* it later on in pci_assign_unassigned_resources
>> +		*/
>> +		for (bar = 0; bar <= PCI_STD_RESOURCE_END; bar++) {
>> +			bar_r = &dev->resource[bar];
>> +			if (bar_r->start == 0 && bar_r->end != 0) {
>> +				bar_r->flags = 0;
>> +				bar_r->end = 0;
>> +			}
>> +		}
>> +	}
>>  
>>  	if (pci_probe & PCI_NOASSIGN_ROMS) {
>>  		if (rom_r->parent)
>> @@ -509,6 +526,9 @@ char * __devinit  pcibios_setup(char *st
>>  	} else if (!strcmp(str, "norom")) {
>>  		pci_probe |= PCI_NOASSIGN_ROMS;
>>  		return NULL;
>> +	} else if (!strcmp(str, "nobar")) {
>> +		pci_probe |= PCI_NOASSIGN_BARS;
>> +		return NULL;
>>  	} else if (!strcmp(str, "assign-busses")) {
>>  		pci_probe |= PCI_ASSIGN_ALL_BUSSES;
>>  		return NULL;
>>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-13 19:12   ` Mike Travis
@ 2010-05-13 19:13     ` Mike Travis
  2010-05-13 19:54     ` Bjorn Helgaas
  1 sibling, 0 replies; 49+ messages in thread
From: Mike Travis @ 2010-05-13 19:13 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, x86, Jesse Barnes,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML, Yinghai, linux-pci

> Oops, I added Yinghai to the CC: list, but I forgot to add
> linux-pci@vger.kernel.org.  Please add that on any future replies.

[added.]

Mike Travis wrote:
> 
> 
> Bjorn Helgaas wrote:
>> On Wednesday, May 12, 2010 12:14:32 pm Mike Travis wrote:
>>> Subject: [Patch 1/1] x86 pci: Add option to not assign BAR's if not 
>>> already assigned
>>> From: Mike Habeck <habeck@sgi.com>
>>>
>>> The Linux kernel assigns BARs that a BIOS did not assign, most likely
>>> to handle broken BIOSes that didn't enumerate the devices correctly.
>>> On UV the BIOS purposely doesn't assign I/O BARs for certain devices/
>>> drivers we know don't use them (examples, LSI SAS, Qlogic FC, ...).
>>> We purposely don't assign these I/O BARs because I/O Space is a very
>>> limited resource.  There is only 64k of I/O Space, and in a PCIe
>>> topology that space gets divided up into 4k chucks (this is due to
>>> the fact that a pci-to-pci bridge's I/O decoder is aligned at 4k)...
>>> Thus a system can have at most 16 cards with I/O BARs: (64k / 4k = 16)
>>>
>>> SGI needs to scale to >16 devices with I/O BARs.  So by not assigning
>>> I/O BARs on devices we know don't use them, we can do that (iff the
>>> kernel doesn't go and assign these BARs that the BIOS purposely didn't
>>> assign).
>>
>> I don't quite understand this part.  If you boot with "pci=nobar",
>> the BIOS doesn't assign BARs, Linux doesn't either, the drivers
>> don't need them -- everything works, and that makes sense so far.
>>
>> Now, if you boot normally (without "pci=nobar"), what changes?
>> The BIOS situation is the same, but Linux tries to assign the
>> unassigned BARs.  It may assign a few before running out of space,
>> but the drivers still don't need those BARs.  What breaks?
> 
> The problem arises because we run out of address spaces to assign.
> 
> Say you have 24 cards, and the 1st 16 do not use I/O BARs.  If
> you assign the available 16 address spaces to cards that may not
> need them, then the final 8 cards will not be available.
> 
> This avoids this problem by not wasting I/O address spaces when
> they are not going to be used.
> 
>>
>>> This patch will not assign a resource to a device BAR if that BAR was
>>> not assigned by the BIOS, and the kernel cmdline option 'pci=nobar'
>>> was specified.   This patch is closely modeled after the 'pci=norom'
>>> option that currently exists in the tree.
>>
>> Can't we figure out whether we need this ourselves?  Using a command-
>> line option just guarantees that we'll forever be writing customer
>> advisories about this issue.
> 
> I think since this is so specific (like the potential of having
> more than 16 cards would be something the customer would know),
> I think it's better to error on the safe side.  If a BIOS does
> not recognize an add in card (for whatever reason), and does
> not assign the I/O BAR, then it would be up to the kernel to
> do that.  Wouldn't you get more customer complaints about non-working
> I/O, than someone with > 16 PCI cards not being able to use them
> all?
> 
>>
>> This issue is not specific to x86, so I don't really like having
>> the implementation be x86-specific.
> 
> We were going for as light a touch as possible, as there is not
> time to verify other arches.  I'd be glad to submit a follow on
> patch dealing with the generic case and depend on others for
> testing, if that's of interest.
> 
> Note we also modeled the option to be identical in operation to
> the pci=norom option, which is a similar x86 specific function.
> 
>>
>> Do we know anything about how other OSes handle this case of I/O
>> space exhaustion?
> 
> 16+ PCI devices is a fairly large amount.  Are there any other PC's
> that handle this much I/O?
>>
>> I'm a little bit nervous about Linux's current strategy of assigning
>> resources to things before we even know whether we're going to use
>> them.  We don't support dynamic PCI resource reassignment, so maybe
>> we don't have any choice in this case, but generally I prefer the
>> lazy approach.
> 
> That's a great idea if it can work.  Unfortunately, we are all tied
> to the way BIOS sets up the system, and for UV systems I don't think
> dynamic provisioning would work.  There's too much infrastructure
> that all has to cooperate by the time the system is fully functional.
> 
>>
>> Bjorn
> 
> Thanks for the feedback.
> 
> Mike
> 
>>
>>> Signed-off-by: Mike Habeck <habeck@sgi.com>
>>> Signed-off-by: Mike Travis <travis@sgi.com>
>>> ---
>>>  Documentation/kernel-parameters.txt |    2 ++
>>>  arch/x86/include/asm/pci_x86.h      |    1 +
>>>  arch/x86/pci/common.c               |   20 ++++++++++++++++++++
>>>  3 files changed, 23 insertions(+)
>>>
>>> --- linux.orig/Documentation/kernel-parameters.txt
>>> +++ linux/Documentation/kernel-parameters.txt
>>> @@ -1935,6 +1935,8 @@ and is between 256 and 4096 characters.
>>>          norom        [X86] Do not assign address space to
>>>                  expansion ROMs that do not already have
>>>                  BIOS assigned address ranges.
>>> +        nobar        [X86] Do not assign address space to the
>>> +                BARs that weren't assigned by the BIOS.
>>>          irqmask=0xMMMM    [X86] Set a bit mask of IRQs allowed to be
>>>                  assigned automatically to PCI devices. You can
>>>                  make the kernel exclude IRQs of your ISA cards
>>> --- linux.orig/arch/x86/include/asm/pci_x86.h
>>> +++ linux/arch/x86/include/asm/pci_x86.h
>>> @@ -30,6 +30,7 @@
>>>  #define PCI_HAS_IO_ECS        0x40000
>>>  #define PCI_NOASSIGN_ROMS    0x80000
>>>  #define PCI_ROOT_NO_CRS        0x100000
>>> +#define PCI_NOASSIGN_BARS    0x200000
>>>  
>>>  extern unsigned int pci_probe;
>>>  extern unsigned long pirq_table_addr;
>>> --- linux.orig/arch/x86/pci/common.c
>>> +++ linux/arch/x86/pci/common.c
>>> @@ -125,6 +125,23 @@ void __init dmi_check_skip_isa_align(voi
>>>  static void __devinit pcibios_fixup_device_resources(struct pci_dev 
>>> *dev)
>>>  {
>>>      struct resource *rom_r = &dev->resource[PCI_ROM_RESOURCE];
>>> +    struct resource *bar_r;
>>> +    int bar;
>>> +
>>> +    if (pci_probe & PCI_NOASSIGN_BARS) {
>>> +        /*
>>> +        * If the BIOS did not assign the BAR, zero out the
>>> +        * resource so the kernel doesn't attmept to assign
>>> +        * it later on in pci_assign_unassigned_resources
>>> +        */
>>> +        for (bar = 0; bar <= PCI_STD_RESOURCE_END; bar++) {
>>> +            bar_r = &dev->resource[bar];
>>> +            if (bar_r->start == 0 && bar_r->end != 0) {
>>> +                bar_r->flags = 0;
>>> +                bar_r->end = 0;
>>> +            }
>>> +        }
>>> +    }
>>>  
>>>      if (pci_probe & PCI_NOASSIGN_ROMS) {
>>>          if (rom_r->parent)
>>> @@ -509,6 +526,9 @@ char * __devinit  pcibios_setup(char *st
>>>      } else if (!strcmp(str, "norom")) {
>>>          pci_probe |= PCI_NOASSIGN_ROMS;
>>>          return NULL;
>>> +    } else if (!strcmp(str, "nobar")) {
>>> +        pci_probe |= PCI_NOASSIGN_BARS;
>>> +        return NULL;
>>>      } else if (!strcmp(str, "assign-busses")) {
>>>          pci_probe |= PCI_ASSIGN_ALL_BUSSES;
>>>          return NULL;
>>>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-13 18:56 ` Bjorn Helgaas
  2010-05-13 19:08   ` H. Peter Anvin
  2010-05-13 19:12   ` Mike Travis
@ 2010-05-13 19:38   ` Mike Habeck
  2010-05-13 20:02     ` Bjorn Helgaas
  2010-05-13 20:36     ` Yinghai Lu
  2 siblings, 2 replies; 49+ messages in thread
From: Mike Habeck @ 2010-05-13 19:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mike Travis, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, x86,
	Jesse Barnes, Jacob Pan, Tejun Heo, LKML, Yinghai, linux-pci

Bjorn Helgaas wrote:
> On Wednesday, May 12, 2010 12:14:32 pm Mike Travis wrote:
>> Subject: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
>> From: Mike Habeck <habeck@sgi.com>
>>
>> The Linux kernel assigns BARs that a BIOS did not assign, most likely
>> to handle broken BIOSes that didn't enumerate the devices correctly.
>> On UV the BIOS purposely doesn't assign I/O BARs for certain devices/
>> drivers we know don't use them (examples, LSI SAS, Qlogic FC, ...).
>> We purposely don't assign these I/O BARs because I/O Space is a very
>> limited resource.  There is only 64k of I/O Space, and in a PCIe
>> topology that space gets divided up into 4k chucks (this is due to
>> the fact that a pci-to-pci bridge's I/O decoder is aligned at 4k)...
>> Thus a system can have at most 16 cards with I/O BARs: (64k / 4k = 16)
>>
>> SGI needs to scale to >16 devices with I/O BARs.  So by not assigning
>> I/O BARs on devices we know don't use them, we can do that (iff the
>> kernel doesn't go and assign these BARs that the BIOS purposely didn't
>> assign).
> 
> I don't quite understand this part.  If you boot with "pci=nobar",
> the BIOS doesn't assign BARs, Linux doesn't either, the drivers
> don't need them -- everything works, and that makes sense so far.
> 
> Now, if you boot normally (without "pci=nobar"), what changes?
> The BIOS situation is the same, but Linux tries to assign the
> unassigned BARs.  It may assign a few before running out of space,
> but the drivers still don't need those BARs.  What breaks?


Nothing really breaks, it's more of a problem that the kernel uses
up the rest of the I/O Space, and starts spitting out warning
messages as it tries to assign the rest of the I/O BARs that the
BIOS didn't assign, something like:

   pci 0010:03:00.0: BAR 5: can't allocate I/O resource [0x0-0x7f]
   pci 0012:05:00.0: BAR 5: can't allocate I/O resource [0x0-0x7f]
   ...

And in using up all the I/O space, I think that could prevent a
hotplug attach of a pci device requiring I/O space (although I
believe most BIOSes pad the bridge decoders to support that).
I'm not to familiar with how pci hotplug works on x86 so I may
be wrong in what I just stated.

> 
>> This patch will not assign a resource to a device BAR if that BAR was
>> not assigned by the BIOS, and the kernel cmdline option 'pci=nobar'
>> was specified.   This patch is closely modeled after the 'pci=norom'
>> option that currently exists in the tree.
> 
> Can't we figure out whether we need this ourselves?  Using a command-
> line option just guarantees that we'll forever be writing customer
> advisories about this issue.
> 
> This issue is not specific to x86, so I don't really like having
> the implementation be x86-specific.

I agree this isn't a x86 specific issue but given the 'norom'
cmdline option is basically doing the same thing (but for pci
Expansion ROM BARs) this code was modeled after it.

> 
> Do we know anything about how other OSes handle this case of I/O
> space exhaustion?
> 
> I'm a little bit nervous about Linux's current strategy of assigning
> resources to things before we even know whether we're going to use
> them.  We don't support dynamic PCI resource reassignment, so maybe
> we don't have any choice in this case, but generally I prefer the
> lazy approach.
> 
> Bjorn
> 
>> Signed-off-by: Mike Habeck <habeck@sgi.com>
>> Signed-off-by: Mike Travis <travis@sgi.com>
>> ---
>>  Documentation/kernel-parameters.txt |    2 ++
>>  arch/x86/include/asm/pci_x86.h      |    1 +
>>  arch/x86/pci/common.c               |   20 ++++++++++++++++++++
>>  3 files changed, 23 insertions(+)
>>
>> --- linux.orig/Documentation/kernel-parameters.txt
>> +++ linux/Documentation/kernel-parameters.txt
>> @@ -1935,6 +1935,8 @@ and is between 256 and 4096 characters.
>>  		norom		[X86] Do not assign address space to
>>  				expansion ROMs that do not already have
>>  				BIOS assigned address ranges.
>> +		nobar		[X86] Do not assign address space to the
>> +				BARs that weren't assigned by the BIOS.
>>  		irqmask=0xMMMM	[X86] Set a bit mask of IRQs allowed to be
>>  				assigned automatically to PCI devices. You can
>>  				make the kernel exclude IRQs of your ISA cards
>> --- linux.orig/arch/x86/include/asm/pci_x86.h
>> +++ linux/arch/x86/include/asm/pci_x86.h
>> @@ -30,6 +30,7 @@
>>  #define PCI_HAS_IO_ECS		0x40000
>>  #define PCI_NOASSIGN_ROMS	0x80000
>>  #define PCI_ROOT_NO_CRS		0x100000
>> +#define PCI_NOASSIGN_BARS	0x200000
>>  
>>  extern unsigned int pci_probe;
>>  extern unsigned long pirq_table_addr;
>> --- linux.orig/arch/x86/pci/common.c
>> +++ linux/arch/x86/pci/common.c
>> @@ -125,6 +125,23 @@ void __init dmi_check_skip_isa_align(voi
>>  static void __devinit pcibios_fixup_device_resources(struct pci_dev *dev)
>>  {
>>  	struct resource *rom_r = &dev->resource[PCI_ROM_RESOURCE];
>> +	struct resource *bar_r;
>> +	int bar;
>> +
>> +	if (pci_probe & PCI_NOASSIGN_BARS) {
>> +		/*
>> +		* If the BIOS did not assign the BAR, zero out the
>> +		* resource so the kernel doesn't attmept to assign
>> +		* it later on in pci_assign_unassigned_resources
>> +		*/
>> +		for (bar = 0; bar <= PCI_STD_RESOURCE_END; bar++) {
>> +			bar_r = &dev->resource[bar];
>> +			if (bar_r->start == 0 && bar_r->end != 0) {
>> +				bar_r->flags = 0;
>> +				bar_r->end = 0;
>> +			}
>> +		}
>> +	}
>>  
>>  	if (pci_probe & PCI_NOASSIGN_ROMS) {
>>  		if (rom_r->parent)
>> @@ -509,6 +526,9 @@ char * __devinit  pcibios_setup(char *st
>>  	} else if (!strcmp(str, "norom")) {
>>  		pci_probe |= PCI_NOASSIGN_ROMS;
>>  		return NULL;
>> +	} else if (!strcmp(str, "nobar")) {
>> +		pci_probe |= PCI_NOASSIGN_BARS;
>> +		return NULL;
>>  	} else if (!strcmp(str, "assign-busses")) {
>>  		pci_probe |= PCI_ASSIGN_ALL_BUSSES;
>>  		return NULL;
>>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-13 19:12   ` Mike Travis
  2010-05-13 19:13     ` Mike Travis
@ 2010-05-13 19:54     ` Bjorn Helgaas
  2010-05-13 20:27       ` Mike Habeck
  1 sibling, 1 reply; 49+ messages in thread
From: Bjorn Helgaas @ 2010-05-13 19:54 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, x86, Jesse Barnes,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML, Yinghai, linux-pci,
	Myron Stowe

On Thursday, May 13, 2010 01:12:21 pm Mike Travis wrote:
> 
> Bjorn Helgaas wrote:
> > On Wednesday, May 12, 2010 12:14:32 pm Mike Travis wrote:
> >> Subject: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
> >> From: Mike Habeck <habeck@sgi.com>
> >>
> >> The Linux kernel assigns BARs that a BIOS did not assign, most likely
> >> to handle broken BIOSes that didn't enumerate the devices correctly.
> >> On UV the BIOS purposely doesn't assign I/O BARs for certain devices/
> >> drivers we know don't use them (examples, LSI SAS, Qlogic FC, ...).
> >> We purposely don't assign these I/O BARs because I/O Space is a very
> >> limited resource.  There is only 64k of I/O Space, and in a PCIe
> >> topology that space gets divided up into 4k chucks (this is due to
> >> the fact that a pci-to-pci bridge's I/O decoder is aligned at 4k)...
> >> Thus a system can have at most 16 cards with I/O BARs: (64k / 4k = 16)
> >>
> >> SGI needs to scale to >16 devices with I/O BARs.  So by not assigning
> >> I/O BARs on devices we know don't use them, we can do that (iff the
> >> kernel doesn't go and assign these BARs that the BIOS purposely didn't
> >> assign).
> > 
> > I don't quite understand this part.  If you boot with "pci=nobar",
> > the BIOS doesn't assign BARs, Linux doesn't either, the drivers
> > don't need them -- everything works, and that makes sense so far.
> > 
> > Now, if you boot normally (without "pci=nobar"), what changes?
> > The BIOS situation is the same, but Linux tries to assign the
> > unassigned BARs.  It may assign a few before running out of space,
> > but the drivers still don't need those BARs.  What breaks?
> 
> The problem arises because we run out of address spaces to assign.
> 
> Say you have 24 cards, and the 1st 16 do not use I/O BARs.  If
> you assign the available 16 address spaces to cards that may not
> need them, then the final 8 cards will not be available.

It sounds like your BIOS treats some devices specially, so I assumed
it would leave the first sixteen devices unassigned, but would assign
the last eight, including the bridge windows leading to them.  In that
case, I would expect Linux to preserve the resources of the last
eight devices, since they're already assigned, and assign anything
left over to the first sixteen.

Are you saying that Linux clobbers the resources of the last eight
devices in the process of assigning the first sixteen?  If so, I'd
say that's a Linux bug.

Or are the last eight hot-added cards that the BIOS never had a
chance to assign?  That's definitely a problem.

> > Can't we figure out whether we need this ourselves?  Using a command-
> > line option just guarantees that we'll forever be writing customer
> > advisories about this issue.
> 
> I think since this is so specific (like the potential of having
> more than 16 cards would be something the customer would know),
> I think it's better to error on the safe side.  If a BIOS does
> not recognize an add in card (for whatever reason), and does
> not assign the I/O BAR, then it would be up to the kernel to
> do that.  Wouldn't you get more customer complaints about non-working
> I/O, than someone with > 16 PCI cards not being able to use them
> all?

It feels specific now, but in five years, I bet it won't be so
unusual.  I really don't want to force customers to figure out
when they need this.

> > This issue is not specific to x86, so I don't really like having
> > the implementation be x86-specific.
> 
> We were going for as light a touch as possible, as there is not
> time to verify other arches.  I'd be glad to submit a follow on
> patch dealing with the generic case and depend on others for
> testing, if that's of interest.

It's of interest to me.  I spend a lot of time pulling generic
out of architecture-specific places.  If there's stuff that we
know is generic from the beginning, we shouldn't make work for
ourselves by making it x86-specific.

> > I'm a little bit nervous about Linux's current strategy of assigning
> > resources to things before we even know whether we're going to use
> > them.  We don't support dynamic PCI resource reassignment, so maybe
> > we don't have any choice in this case, but generally I prefer the
> > lazy approach.
> 
> That's a great idea if it can work.  Unfortunately, we are all tied
> to the way BIOS sets up the system, and for UV systems I don't think
> dynamic provisioning would work.  There's too much infrastructure
> that all has to cooperate by the time the system is fully functional.

Like I said, we maybe don't have a choice in this case, but I'd like
to have a clearer understanding of the problem and how other OSes
deal with it before we start applying band-aids that will hurt when
we pull them off later.

Bjorn

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-13 19:38   ` Mike Habeck
@ 2010-05-13 20:02     ` Bjorn Helgaas
  2010-05-13 20:09       ` H. Peter Anvin
  2010-05-14 22:25       ` Jesse Barnes
  2010-05-13 20:36     ` Yinghai Lu
  1 sibling, 2 replies; 49+ messages in thread
From: Bjorn Helgaas @ 2010-05-13 20:02 UTC (permalink / raw)
  To: Mike Habeck
  Cc: Mike Travis, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, x86,
	Jesse Barnes, Jacob Pan, Tejun Heo, LKML, Yinghai, linux-pci,
	Myron Stowe

On Thursday, May 13, 2010 01:38:24 pm Mike Habeck wrote:
> Bjorn Helgaas wrote:
> > On Wednesday, May 12, 2010 12:14:32 pm Mike Travis wrote:
> >> Subject: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
> >> From: Mike Habeck <habeck@sgi.com>
> >>
> >> The Linux kernel assigns BARs that a BIOS did not assign, most likely
> >> to handle broken BIOSes that didn't enumerate the devices correctly.
> >> On UV the BIOS purposely doesn't assign I/O BARs for certain devices/
> >> drivers we know don't use them (examples, LSI SAS, Qlogic FC, ...).
> >> We purposely don't assign these I/O BARs because I/O Space is a very
> >> limited resource.  There is only 64k of I/O Space, and in a PCIe
> >> topology that space gets divided up into 4k chucks (this is due to
> >> the fact that a pci-to-pci bridge's I/O decoder is aligned at 4k)...
> >> Thus a system can have at most 16 cards with I/O BARs: (64k / 4k = 16)
> >>
> >> SGI needs to scale to >16 devices with I/O BARs.  So by not assigning
> >> I/O BARs on devices we know don't use them, we can do that (iff the
> >> kernel doesn't go and assign these BARs that the BIOS purposely didn't
> >> assign).
> > 
> > I don't quite understand this part.  If you boot with "pci=nobar",
> > the BIOS doesn't assign BARs, Linux doesn't either, the drivers
> > don't need them -- everything works, and that makes sense so far.
> > 
> > Now, if you boot normally (without "pci=nobar"), what changes?
> > The BIOS situation is the same, but Linux tries to assign the
> > unassigned BARs.  It may assign a few before running out of space,
> > but the drivers still don't need those BARs.  What breaks?
> 
> Nothing really breaks, it's more of a problem that the kernel uses
> up the rest of the I/O Space, and starts spitting out warning
> messages as it tries to assign the rest of the I/O BARs that the
> BIOS didn't assign, something like:
> 
>    pci 0010:03:00.0: BAR 5: can't allocate I/O resource [0x0-0x7f]
>    pci 0012:05:00.0: BAR 5: can't allocate I/O resource [0x0-0x7f]
>    ...

OK, that's what I would expect.  Personally, I think I'd *like*
to have those messages.  If 0010:03:00.0 is a device whose driver
depends on I/O space, the message will be a good clue as to why
the driver isn't working.

> And in using up all the I/O space, I think that could prevent a
> hotplug attach of a pci device requiring I/O space (although I
> believe most BIOSes pad the bridge decoders to support that).
> I'm not to familiar with how pci hotplug works on x86 so I may
> be wrong in what I just stated.

Yep, that's definitely a problem, and I don't have a good solution.

HP (and probably SGI) had a nice hardware solution for ia64 --
address translation across the host bridge, so each bridge could
have its own 64K I/O space.  But I don't see that coming in the
x86 PC arena.

> > This issue is not specific to x86, so I don't really like having
> > the implementation be x86-specific.
> 
> I agree this isn't a x86 specific issue but given the 'norom'
> cmdline option is basically doing the same thing (but for pci
> Expansion ROM BARs) this code was modeled after it.

IMHO, we should fix both.

Bjorn

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-13 20:02     ` Bjorn Helgaas
@ 2010-05-13 20:09       ` H. Peter Anvin
  2010-05-14 22:25       ` Jesse Barnes
  1 sibling, 0 replies; 49+ messages in thread
From: H. Peter Anvin @ 2010-05-13 20:09 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mike Habeck, Mike Travis, Ingo Molnar, Thomas Gleixner, x86,
	Jesse Barnes, Jacob Pan, Tejun Heo, LKML, Yinghai, linux-pci,
	Myron Stowe

On 05/13/2010 01:02 PM, Bjorn Helgaas wrote:
> 
> Yep, that's definitely a problem, and I don't have a good solution.
> 
> HP (and probably SGI) had a nice hardware solution for ia64 --
> address translation across the host bridge, so each bridge could
> have its own 64K I/O space.  But I don't see that coming in the
> x86 PC arena.
> 

I would agree, that is unlikely (not that I have any information on this
subject, mind you.)  Much more likely would be a bridge extension
capability for finer-grained I/O space routing.

	-hpa

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-13 19:54     ` Bjorn Helgaas
@ 2010-05-13 20:27       ` Mike Habeck
  0 siblings, 0 replies; 49+ messages in thread
From: Mike Habeck @ 2010-05-13 20:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mike Travis, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, x86,
	Jesse Barnes, Jacob Pan, Tejun Heo, LKML, Yinghai, linux-pci,
	Myron Stowe, habeck

Bjorn Helgaas wrote:
> On Thursday, May 13, 2010 01:12:21 pm Mike Travis wrote:
>> Bjorn Helgaas wrote:
>>> On Wednesday, May 12, 2010 12:14:32 pm Mike Travis wrote:
>>>> Subject: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
>>>> From: Mike Habeck <habeck@sgi.com>
>>>>
>>>> The Linux kernel assigns BARs that a BIOS did not assign, most likely
>>>> to handle broken BIOSes that didn't enumerate the devices correctly.
>>>> On UV the BIOS purposely doesn't assign I/O BARs for certain devices/
>>>> drivers we know don't use them (examples, LSI SAS, Qlogic FC, ...).
>>>> We purposely don't assign these I/O BARs because I/O Space is a very
>>>> limited resource.  There is only 64k of I/O Space, and in a PCIe
>>>> topology that space gets divided up into 4k chucks (this is due to
>>>> the fact that a pci-to-pci bridge's I/O decoder is aligned at 4k)...
>>>> Thus a system can have at most 16 cards with I/O BARs: (64k / 4k = 16)
>>>>
>>>> SGI needs to scale to >16 devices with I/O BARs.  So by not assigning
>>>> I/O BARs on devices we know don't use them, we can do that (iff the
>>>> kernel doesn't go and assign these BARs that the BIOS purposely didn't
>>>> assign).
>>> I don't quite understand this part.  If you boot with "pci=nobar",
>>> the BIOS doesn't assign BARs, Linux doesn't either, the drivers
>>> don't need them -- everything works, and that makes sense so far.
>>>
>>> Now, if you boot normally (without "pci=nobar"), what changes?
>>> The BIOS situation is the same, but Linux tries to assign the
>>> unassigned BARs.  It may assign a few before running out of space,
>>> but the drivers still don't need those BARs.  What breaks?
>> The problem arises because we run out of address spaces to assign.
>>
>> Say you have 24 cards, and the 1st 16 do not use I/O BARs.  If
>> you assign the available 16 address spaces to cards that may not
>> need them, then the final 8 cards will not be available.
> 
> It sounds like your BIOS treats some devices specially, so I assumed
> it would leave the first sixteen devices unassigned, but would assign
> the last eight, including the bridge windows leading to them.  In that
> case, I would expect Linux to preserve the resources of the last
> eight devices, since they're already assigned, and assign anything
> left over to the first sixteen.

Correct, the BIOS special cases devices that have an I/O BAR but
aren't used because the same functionality that the I/O BAR provides
is available via the MEM BAR.  So if the device driver only uses the
MEM BAR then the BIOS will only assign resources to it, not to the
unused I/O BAR

And yes, linux would preserve the resources of the last eight devices
in your example above

> 
> Are you saying that Linux clobbers the resources of the last eight
> devices in the process of assigning the first sixteen?  If so, I'd
> say that's a Linux bug.

No, Linux does not clobbers the resources

> 
> Or are the last eight hot-added cards that the BIOS never had a
> chance to assign?  That's definitely a problem.

Yes, that is one of the problems we envision seeing.

> 
>>> Can't we figure out whether we need this ourselves?  Using a command-
>>> line option just guarantees that we'll forever be writing customer
>>> advisories about this issue.
>> I think since this is so specific (like the potential of having
>> more than 16 cards would be something the customer would know),
>> I think it's better to error on the safe side.  If a BIOS does
>> not recognize an add in card (for whatever reason), and does
>> not assign the I/O BAR, then it would be up to the kernel to
>> do that.  Wouldn't you get more customer complaints about non-working
>> I/O, than someone with > 16 PCI cards not being able to use them
>> all?
> 
> It feels specific now, but in five years, I bet it won't be so
> unusual.  I really don't want to force customers to figure out
> when they need this.
> 
>>> This issue is not specific to x86, so I don't really like having
>>> the implementation be x86-specific.
>> We were going for as light a touch as possible, as there is not
>> time to verify other arches.  I'd be glad to submit a follow on
>> patch dealing with the generic case and depend on others for
>> testing, if that's of interest.
> 
> It's of interest to me.  I spend a lot of time pulling generic
> out of architecture-specific places.  If there's stuff that we
> know is generic from the beginning, we shouldn't make work for
> ourselves by making it x86-specific.
> 
>>> I'm a little bit nervous about Linux's current strategy of assigning
>>> resources to things before we even know whether we're going to use
>>> them.  We don't support dynamic PCI resource reassignment, so maybe
>>> we don't have any choice in this case, but generally I prefer the
>>> lazy approach.
>> That's a great idea if it can work.  Unfortunately, we are all tied
>> to the way BIOS sets up the system, and for UV systems I don't think
>> dynamic provisioning would work.  There's too much infrastructure
>> that all has to cooperate by the time the system is fully functional.
> 
> Like I said, we maybe don't have a choice in this case, but I'd like
> to have a clearer understanding of the problem and how other OSes
> deal with it before we start applying band-aids that will hurt when
> we pull them off later.
> 
> Bjorn


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-13 20:36     ` Yinghai Lu
@ 2010-05-13 20:34       ` H. Peter Anvin
  0 siblings, 0 replies; 49+ messages in thread
From: H. Peter Anvin @ 2010-05-13 20:34 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Mike Habeck, Bjorn Helgaas, Mike Travis, Ingo Molnar,
	Thomas Gleixner, x86, Jesse Barnes, Jacob Pan, Tejun Heo, LKML,
	linux-pci

On 05/13/2010 01:36 PM, Yinghai Lu wrote:
> 
> mmio range should be much better, it could use mmio64
> 

We're not talking about MMIO.

	-hpa

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-13 19:38   ` Mike Habeck
  2010-05-13 20:02     ` Bjorn Helgaas
@ 2010-05-13 20:36     ` Yinghai Lu
  2010-05-13 20:34       ` H. Peter Anvin
  1 sibling, 1 reply; 49+ messages in thread
From: Yinghai Lu @ 2010-05-13 20:36 UTC (permalink / raw)
  To: Mike Habeck
  Cc: Bjorn Helgaas, Mike Travis, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, x86, Jesse Barnes, Jacob Pan, Tejun Heo, LKML,
	linux-pci

On 05/13/2010 12:38 PM, Mike Habeck wrote:
>
>
> Nothing really breaks, it's more of a problem that the kernel uses
> up the rest of the I/O Space, and starts spitting out warning
> messages as it tries to assign the rest of the I/O BARs that the
> BIOS didn't assign, something like:
>
>   pci 0010:03:00.0: BAR 5: can't allocate I/O resource [0x0-0x7f]
>   pci 0012:05:00.0: BAR 5: can't allocate I/O resource [0x0-0x7f]
>   ...
>
> And in using up all the I/O space, I think that could prevent a
> hotplug attach of a pci device requiring I/O space (although I
> believe most BIOSes pad the bridge decoders to support that).
> I'm not to familiar with how pci hotplug works on x86 so I may
> be wrong in what I just stated.

assume you have several IOHs on your system, and for hotplug support,
BIOS is supposed to balance the IO ioport range allocation
between IOHs.

mmio range should be much better, it could use mmio64

if your BIOS can not provide right _CRS for peer root buses,  some
devices may get wrong allocation from kernel.
or BIOS set small BAR in the bridge, that could cause problem too.

YH



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-13 20:02     ` Bjorn Helgaas
  2010-05-13 20:09       ` H. Peter Anvin
@ 2010-05-14 22:25       ` Jesse Barnes
  2010-05-14 22:34         ` Mike Travis
  1 sibling, 1 reply; 49+ messages in thread
From: Jesse Barnes @ 2010-05-14 22:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mike Habeck, Mike Travis, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe

On Thu, 13 May 2010 14:02:30 -0600
Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
> > > This issue is not specific to x86, so I don't really like having
> > > the implementation be x86-specific.
> > 
> > I agree this isn't a x86 specific issue but given the 'norom'
> > cmdline option is basically doing the same thing (but for pci
> > Expansion ROM BARs) this code was modeled after it.
> 
> IMHO, we should fix both.

Yeah, that would be good.  Mike, have you looked at this at all?

Also, to clarify, this isn't affecting users today, right?  Or do you
need all this I/O space for multiple IOHs and the drivers that bind to
them in current UV systems?

Fundamentally, until we have real dynamic PCI resource management (i.e.
driver hooks for handling relocation, lazy allocation of resources at
driver bind time, etc.) we're going to continue to need hacks like
this.  However, we could make them slightly more automated by making
"nobar" and "norom" the default on systems that typically need them,
maybe with a DMI table.

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-14 22:25       ` Jesse Barnes
@ 2010-05-14 22:34         ` Mike Travis
  2010-05-14 22:35           ` H. Peter Anvin
  2010-05-14 22:47           ` Jesse Barnes
  0 siblings, 2 replies; 49+ messages in thread
From: Mike Travis @ 2010-05-14 22:34 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Bjorn Helgaas, Mike Habeck, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe



Jesse Barnes wrote:
> On Thu, 13 May 2010 14:02:30 -0600
> Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
>>>> This issue is not specific to x86, so I don't really like having
>>>> the implementation be x86-specific.
>>> I agree this isn't a x86 specific issue but given the 'norom'
>>> cmdline option is basically doing the same thing (but for pci
>>> Expansion ROM BARs) this code was modeled after it.

>> IMHO, we should fix both.
> 
> Yeah, that would be good.  Mike, have you looked at this at all?
> 
> Also, to clarify, this isn't affecting users today, right?  Or do you
> need all this I/O space for multiple IOHs and the drivers that bind to
> them in current UV systems?

We have customers that want to install more than 16 PCI-e cards right
now.  Our window of opportunity closes very soon (days), so either this
patch makes it in as is (or something close), or we wait for another
release cycle.  UV shipments start this month.

[I wouldn't mind working on an improvement for later.]

> 
> Fundamentally, until we have real dynamic PCI resource management (i.e.
> driver hooks for handling relocation, lazy allocation of resources at
> driver bind time, etc.) we're going to continue to need hacks like
> this.  However, we could make them slightly more automated by making
> "nobar" and "norom" the default on systems that typically need them,
> maybe with a DMI table.

It seems that BIOS changes are much more difficult.  The real solution
to this problem is for Card Vendors to not request I/O Bars if they
won't be using them.  But that's the hardest option of all to accomplish.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-14 22:34         ` Mike Travis
@ 2010-05-14 22:35           ` H. Peter Anvin
  2010-05-14 22:40             ` Mike Travis
  2010-05-14 22:47           ` Jesse Barnes
  1 sibling, 1 reply; 49+ messages in thread
From: H. Peter Anvin @ 2010-05-14 22:35 UTC (permalink / raw)
  To: Mike Travis
  Cc: Jesse Barnes, Bjorn Helgaas, Mike Habeck, Ingo Molnar,
	Thomas Gleixner, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe

On 05/14/2010 03:34 PM, Mike Travis wrote:
> 
> It seems that BIOS changes are much more difficult.  The real solution
> to this problem is for Card Vendors to not request I/O Bars if they
> won't be using them.  But that's the hardest option of all to accomplish.
> 

That is a non-option.  Any device which may have to be a boot or console
device in a legacy system pretty much needs them.

	-hpa

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-14 22:35           ` H. Peter Anvin
@ 2010-05-14 22:40             ` Mike Travis
  2010-05-15  2:25               ` Mike Travis
  0 siblings, 1 reply; 49+ messages in thread
From: Mike Travis @ 2010-05-14 22:40 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jesse Barnes, Bjorn Helgaas, Mike Habeck, Ingo Molnar,
	Thomas Gleixner, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe



H. Peter Anvin wrote:
> On 05/14/2010 03:34 PM, Mike Travis wrote:
>> It seems that BIOS changes are much more difficult.  The real solution
>> to this problem is for Card Vendors to not request I/O Bars if they
>> won't be using them.  But that's the hardest option of all to accomplish.
>>
> 
> That is a non-option.  Any device which may have to be a boot or console
> device in a legacy system pretty much needs them.
> 
> 	-hpa

Yes, that's true.  We're somewhat fortunate in that our Legacy I/O
devices are confined to those on the first blade, and all of these
other devices are on other blades (PCI segments 1+).

But you're right, unless the card vendor supplied some kind of strapping
option, or something similar, it won't know it's in a UV system or not.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-14 22:34         ` Mike Travis
  2010-05-14 22:35           ` H. Peter Anvin
@ 2010-05-14 22:47           ` Jesse Barnes
  2010-05-14 22:59             ` Mike Travis
  2010-05-14 23:20             ` H. Peter Anvin
  1 sibling, 2 replies; 49+ messages in thread
From: Jesse Barnes @ 2010-05-14 22:47 UTC (permalink / raw)
  To: Mike Travis
  Cc: Bjorn Helgaas, Mike Habeck, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe

On Fri, 14 May 2010 15:34:01 -0700
Mike Travis <travis@sgi.com> wrote:

> 
> 
> Jesse Barnes wrote:
> > On Thu, 13 May 2010 14:02:30 -0600
> > Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
> >>>> This issue is not specific to x86, so I don't really like having
> >>>> the implementation be x86-specific.
> >>> I agree this isn't a x86 specific issue but given the 'norom'
> >>> cmdline option is basically doing the same thing (but for pci
> >>> Expansion ROM BARs) this code was modeled after it.
> 
> >> IMHO, we should fix both.
> > 
> > Yeah, that would be good.  Mike, have you looked at this at all?
> > 
> > Also, to clarify, this isn't affecting users today, right?  Or do you
> > need all this I/O space for multiple IOHs and the drivers that bind to
> > them in current UV systems?
> 
> We have customers that want to install more than 16 PCI-e cards right
> now.  Our window of opportunity closes very soon (days), so either this
> patch makes it in as is (or something close), or we wait for another
> release cycle.  UV shipments start this month.
> 
> [I wouldn't mind working on an improvement for later.]

Wow and they're using cards that want to use I/O space?  Funky.  It's
too late to get this into 2.6.34, but that can't be what you were
expecting... I don't see a problem with getting something like this in
for 2.6.35.

> > Fundamentally, until we have real dynamic PCI resource management (i.e.
> > driver hooks for handling relocation, lazy allocation of resources at
> > driver bind time, etc.) we're going to continue to need hacks like
> > this.  However, we could make them slightly more automated by making
> > "nobar" and "norom" the default on systems that typically need them,
> > maybe with a DMI table.
> 
> It seems that BIOS changes are much more difficult.  The real solution
> to this problem is for Card Vendors to not request I/O Bars if they
> won't be using them.  But that's the hardest option of all to accomplish.

Right.

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-14 22:47           ` Jesse Barnes
@ 2010-05-14 22:59             ` Mike Travis
  2010-05-14 23:06               ` Jesse Barnes
  2010-05-14 23:20             ` H. Peter Anvin
  1 sibling, 1 reply; 49+ messages in thread
From: Mike Travis @ 2010-05-14 22:59 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Bjorn Helgaas, Mike Habeck, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe



Jesse Barnes wrote:
> On Fri, 14 May 2010 15:34:01 -0700
> Mike Travis <travis@sgi.com> wrote:
> 
>>
>> Jesse Barnes wrote:
>>> On Thu, 13 May 2010 14:02:30 -0600
>>> Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
>>>>>> This issue is not specific to x86, so I don't really like having
>>>>>> the implementation be x86-specific.
>>>>> I agree this isn't a x86 specific issue but given the 'norom'
>>>>> cmdline option is basically doing the same thing (but for pci
>>>>> Expansion ROM BARs) this code was modeled after it.
>>>> IMHO, we should fix both.
>>> Yeah, that would be good.  Mike, have you looked at this at all?
>>>
>>> Also, to clarify, this isn't affecting users today, right?  Or do you
>>> need all this I/O space for multiple IOHs and the drivers that bind to
>>> them in current UV systems?
>> We have customers that want to install more than 16 PCI-e cards right
>> now.  Our window of opportunity closes very soon (days), so either this
>> patch makes it in as is (or something close), or we wait for another
>> release cycle.  UV shipments start this month.
>>
>> [I wouldn't mind working on an improvement for later.]
> 
> Wow and they're using cards that want to use I/O space?  Funky.  It's
> too late to get this into 2.6.34, but that can't be what you were
> expecting... I don't see a problem with getting something like this in
> for 2.6.35.

2.6.35 would be fine.  It's the acceptance that's the key.

And yes, we're using standard cards like everyone else... ;-)

[The message is "UV" is just a really, really big PC. ;-)]

I would appreciate however, some more detail on what's the goal of the
updates to "fix both".  Thanks!

> 
>>> Fundamentally, until we have real dynamic PCI resource management (i.e.
>>> driver hooks for handling relocation, lazy allocation of resources at
>>> driver bind time, etc.) we're going to continue to need hacks like
>>> this.  However, we could make them slightly more automated by making
>>> "nobar" and "norom" the default on systems that typically need them,
>>> maybe with a DMI table.
>> It seems that BIOS changes are much more difficult.  The real solution
>> to this problem is for Card Vendors to not request I/O Bars if they
>> won't be using them.  But that's the hardest option of all to accomplish.
> 
> Right.
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-14 22:59             ` Mike Travis
@ 2010-05-14 23:06               ` Jesse Barnes
  2010-05-14 23:23                 ` Mike Travis
  0 siblings, 1 reply; 49+ messages in thread
From: Jesse Barnes @ 2010-05-14 23:06 UTC (permalink / raw)
  To: Mike Travis
  Cc: Bjorn Helgaas, Mike Habeck, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe

On Fri, 14 May 2010 15:59:11 -0700
Mike Travis <travis@sgi.com> wrote:

> 
> 
> Jesse Barnes wrote:
> > On Fri, 14 May 2010 15:34:01 -0700
> > Mike Travis <travis@sgi.com> wrote:
> > 
> >>
> >> Jesse Barnes wrote:
> >>> On Thu, 13 May 2010 14:02:30 -0600
> >>> Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
> >>>>>> This issue is not specific to x86, so I don't really like having
> >>>>>> the implementation be x86-specific.
> >>>>> I agree this isn't a x86 specific issue but given the 'norom'
> >>>>> cmdline option is basically doing the same thing (but for pci
> >>>>> Expansion ROM BARs) this code was modeled after it.
> >>>> IMHO, we should fix both.
> >>> Yeah, that would be good.  Mike, have you looked at this at all?
> >>>
> >>> Also, to clarify, this isn't affecting users today, right?  Or do you
> >>> need all this I/O space for multiple IOHs and the drivers that bind to
> >>> them in current UV systems?
> >> We have customers that want to install more than 16 PCI-e cards right
> >> now.  Our window of opportunity closes very soon (days), so either this
> >> patch makes it in as is (or something close), or we wait for another
> >> release cycle.  UV shipments start this month.
> >>
> >> [I wouldn't mind working on an improvement for later.]
> > 
> > Wow and they're using cards that want to use I/O space?  Funky.  It's
> > too late to get this into 2.6.34, but that can't be what you were
> > expecting... I don't see a problem with getting something like this in
> > for 2.6.35.
> 
> 2.6.35 would be fine.  It's the acceptance that's the key.
> 
> And yes, we're using standard cards like everyone else... ;-)
> 
> [The message is "UV" is just a really, really big PC. ;-)]
> 
> I would appreciate however, some more detail on what's the goal of the
> updates to "fix both".  Thanks!

As Bjorn noted, both the norom and nobar options are listed as x86
specific in the documentation and use x86 specific flags and code to
prevent their respective allocations.

It would be good if we could move the flags and code to the common PCI
layer in drivers/pci so that other arches could take advantage of it,
and we could have less resource management fragmentation.

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-14 22:47           ` Jesse Barnes
  2010-05-14 22:59             ` Mike Travis
@ 2010-05-14 23:20             ` H. Peter Anvin
  2010-05-14 23:28               ` Jesse Barnes
  1 sibling, 1 reply; 49+ messages in thread
From: H. Peter Anvin @ 2010-05-14 23:20 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Mike Travis, Bjorn Helgaas, Mike Habeck, Ingo Molnar,
	Thomas Gleixner, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe

On 05/14/2010 03:47 PM, Jesse Barnes wrote:
> 
> Wow and they're using cards that want to use I/O space?  Funky.  It's
> too late to get this into 2.6.34, but that can't be what you were
> expecting... I don't see a problem with getting something like this in
> for 2.6.35.
> 

Most cards on the market provide I/O BARs as a convenience to legacy
BIOSes; they don't need the I/O BAR functionality from inside a
full-featured OS.  There are a few, key, exceptions, mainly in the form
of legacy-interface devices like UARTs and VGA (note that VGA has its
own routing bits and is therefore unaffected by this problem.)

	-hpa

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-14 23:06               ` Jesse Barnes
@ 2010-05-14 23:23                 ` Mike Travis
  2010-05-14 23:33                   ` Jesse Barnes
  0 siblings, 1 reply; 49+ messages in thread
From: Mike Travis @ 2010-05-14 23:23 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Bjorn Helgaas, Mike Habeck, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe



Jesse Barnes wrote:
> On Fri, 14 May 2010 15:59:11 -0700
> Mike Travis <travis@sgi.com> wrote:
> 
>>
>> Jesse Barnes wrote:
>>> On Fri, 14 May 2010 15:34:01 -0700
>>> Mike Travis <travis@sgi.com> wrote:
>>>
>>>> Jesse Barnes wrote:
>>>>> On Thu, 13 May 2010 14:02:30 -0600
>>>>> Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
>>>>>>>> This issue is not specific to x86, so I don't really like having
>>>>>>>> the implementation be x86-specific.
>>>>>>> I agree this isn't a x86 specific issue but given the 'norom'
>>>>>>> cmdline option is basically doing the same thing (but for pci
>>>>>>> Expansion ROM BARs) this code was modeled after it.
>>>>>> IMHO, we should fix both.
>>>>> Yeah, that would be good.  Mike, have you looked at this at all?
>>>>>
>>>>> Also, to clarify, this isn't affecting users today, right?  Or do you
>>>>> need all this I/O space for multiple IOHs and the drivers that bind to
>>>>> them in current UV systems?
>>>> We have customers that want to install more than 16 PCI-e cards right
>>>> now.  Our window of opportunity closes very soon (days), so either this
>>>> patch makes it in as is (or something close), or we wait for another
>>>> release cycle.  UV shipments start this month.
>>>>
>>>> [I wouldn't mind working on an improvement for later.]
>>> Wow and they're using cards that want to use I/O space?  Funky.  It's
>>> too late to get this into 2.6.34, but that can't be what you were
>>> expecting... I don't see a problem with getting something like this in
>>> for 2.6.35.
>> 2.6.35 would be fine.  It's the acceptance that's the key.
>>
>> And yes, we're using standard cards like everyone else... ;-)
>>
>> [The message is "UV" is just a really, really big PC. ;-)]
>>
>> I would appreciate however, some more detail on what's the goal of the
>> updates to "fix both".  Thanks!
> 
> As Bjorn noted, both the norom and nobar options are listed as x86
> specific in the documentation and use x86 specific flags and code to
> prevent their respective allocations.
> 
> It would be good if we could move the flags and code to the common PCI
> layer in drivers/pci so that other arches could take advantage of it,
> and we could have less resource management fragmentation.
> 

Thanks!  I was hoping you weren't going to say we need to remove the
need for these options.

I think the dynamic provisioning thing would be a great feature for the
future, but it seems to me that it would entail quite a bit of coordinated
effort between BIOS and the kernel?  Not to mention the final death of
all POST related initialization (at least in the case where the device
is not being used in Legacy mode)?  [Heck, can we kill POST too?!?] I
haven't looked too closely at non-x86 arch's but unless they run an
x86 emulator, then the POST initialization is pretty much superfluous,
isn't it?  The drivers will initialize when the device comes online?
I mean we do have more the 64k of memory now... ;-)]

Thanks,
Mike





^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-14 23:20             ` H. Peter Anvin
@ 2010-05-14 23:28               ` Jesse Barnes
  2010-05-14 23:32                 ` H. Peter Anvin
  0 siblings, 1 reply; 49+ messages in thread
From: Jesse Barnes @ 2010-05-14 23:28 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Mike Travis, Bjorn Helgaas, Mike Habeck, Ingo Molnar,
	Thomas Gleixner, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe

On Fri, 14 May 2010 16:20:45 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On 05/14/2010 03:47 PM, Jesse Barnes wrote:
> > 
> > Wow and they're using cards that want to use I/O space?  Funky.  It's
> > too late to get this into 2.6.34, but that can't be what you were
> > expecting... I don't see a problem with getting something like this in
> > for 2.6.35.
> > 
> 
> Most cards on the market provide I/O BARs as a convenience to legacy
> BIOSes; they don't need the I/O BAR functionality from inside a
> full-featured OS.  There are a few, key, exceptions, mainly in the form
> of legacy-interface devices like UARTs and VGA (note that VGA has its
> own routing bits and is therefore unaffected by this problem.)

Yeah, it's the "legacy" part that I'm questioning.  I'm just lamenting
that it's dying off so slowly...

And yes, VGA is an unfortunate standard.

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-14 23:28               ` Jesse Barnes
@ 2010-05-14 23:32                 ` H. Peter Anvin
  2010-05-14 23:34                   ` Jesse Barnes
  0 siblings, 1 reply; 49+ messages in thread
From: H. Peter Anvin @ 2010-05-14 23:32 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Mike Travis, Bjorn Helgaas, Mike Habeck, Ingo Molnar,
	Thomas Gleixner, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe

On 05/14/2010 04:28 PM, Jesse Barnes wrote:
> On Fri, 14 May 2010 16:20:45 -0700
> "H. Peter Anvin" <hpa@zytor.com> wrote:
> 
>> On 05/14/2010 03:47 PM, Jesse Barnes wrote:
>>>
>>> Wow and they're using cards that want to use I/O space?  Funky.  It's
>>> too late to get this into 2.6.34, but that can't be what you were
>>> expecting... I don't see a problem with getting something like this in
>>> for 2.6.35.
>>>
>>
>> Most cards on the market provide I/O BARs as a convenience to legacy
>> BIOSes; they don't need the I/O BAR functionality from inside a
>> full-featured OS.  There are a few, key, exceptions, mainly in the form
>> of legacy-interface devices like UARTs and VGA (note that VGA has its
>> own routing bits and is therefore unaffected by this problem.)
> 
> Yeah, it's the "legacy" part that I'm questioning.  I'm just lamenting
> that it's dying off so slowly...
> 
> And yes, VGA is an unfortunate standard.
> 

I'm not lamenting that fact, because my experience is that what ends up
replacing it is often far worse.  Consider UARTs -- no MMU dependencies
at all, can be accessed with four lines of assembly, and compare it to
EHCI debug port, the driver for which is over 900 lines in the Linux
kernel -- and that assumes that you're already in flat mode.

	-hpa

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-14 23:23                 ` Mike Travis
@ 2010-05-14 23:33                   ` Jesse Barnes
  2010-05-14 23:40                     ` H. Peter Anvin
  0 siblings, 1 reply; 49+ messages in thread
From: Jesse Barnes @ 2010-05-14 23:33 UTC (permalink / raw)
  To: Mike Travis
  Cc: Bjorn Helgaas, Mike Habeck, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe

On Fri, 14 May 2010 16:23:52 -0700
Mike Travis <travis@sgi.com> wrote:
> Thanks!  I was hoping you weren't going to say we need to remove the
> need for these options.

No, I won't force that on you just yet. :)

> I think the dynamic provisioning thing would be a great feature for the
> future, but it seems to me that it would entail quite a bit of coordinated
> effort between BIOS and the kernel?  Not to mention the final death of
> all POST related initialization (at least in the case where the device
> is not being used in Legacy mode)?  [Heck, can we kill POST too?!?] I
> haven't looked too closely at non-x86 arch's but unless they run an
> x86 emulator, then the POST initialization is pretty much superfluous,
> isn't it?  The drivers will initialize when the device comes online?
> I mean we do have more the 64k of memory now... ;-)]

Yeah, we can mostly ignore POST, though some BIOSes need it for their
boot time services.  Even complex graphics devices tend not to need it
these days as long as you have a fully capable driver.

As for BIOS coordination for dynamic reallocation, yeah there'd be some
of that.  I think the basic principles would be:
  1) use BIOS allocations wherever possible
  2) get an accurate list of available resources from the BIOS for
     potential remapping later
  3) allocate resources for BARs and devices as late as possible (e.g.
     at driver bind time) to avoid allocating more than we need

But that's a good chunk of work, and as we've seen, PCs in particular
are really sensitive to having resources moved around too much, so step
(2) is probably the hardest part.

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-14 23:32                 ` H. Peter Anvin
@ 2010-05-14 23:34                   ` Jesse Barnes
  2010-05-14 23:39                     ` H. Peter Anvin
  0 siblings, 1 reply; 49+ messages in thread
From: Jesse Barnes @ 2010-05-14 23:34 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Mike Travis, Bjorn Helgaas, Mike Habeck, Ingo Molnar,
	Thomas Gleixner, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe

On Fri, 14 May 2010 16:32:07 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On 05/14/2010 04:28 PM, Jesse Barnes wrote:
> > On Fri, 14 May 2010 16:20:45 -0700
> > "H. Peter Anvin" <hpa@zytor.com> wrote:
> > 
> >> On 05/14/2010 03:47 PM, Jesse Barnes wrote:
> >>>
> >>> Wow and they're using cards that want to use I/O space?  Funky.  It's
> >>> too late to get this into 2.6.34, but that can't be what you were
> >>> expecting... I don't see a problem with getting something like this in
> >>> for 2.6.35.
> >>>
> >>
> >> Most cards on the market provide I/O BARs as a convenience to legacy
> >> BIOSes; they don't need the I/O BAR functionality from inside a
> >> full-featured OS.  There are a few, key, exceptions, mainly in the form
> >> of legacy-interface devices like UARTs and VGA (note that VGA has its
> >> own routing bits and is therefore unaffected by this problem.)
> > 
> > Yeah, it's the "legacy" part that I'm questioning.  I'm just lamenting
> > that it's dying off so slowly...
> > 
> > And yes, VGA is an unfortunate standard.
> > 
> 
> I'm not lamenting that fact, because my experience is that what ends up
> replacing it is often far worse.  Consider UARTs -- no MMU dependencies
> at all, can be accessed with four lines of assembly, and compare it to
> EHCI debug port, the driver for which is over 900 lines in the Linux
> kernel -- and that assumes that you're already in flat mode.

Heh, you're so old and crufty!

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-14 23:34                   ` Jesse Barnes
@ 2010-05-14 23:39                     ` H. Peter Anvin
  2010-05-15  0:00                       ` Jesse Barnes
  0 siblings, 1 reply; 49+ messages in thread
From: H. Peter Anvin @ 2010-05-14 23:39 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Mike Travis, Bjorn Helgaas, Mike Habeck, Ingo Molnar,
	Thomas Gleixner, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe

On 05/14/2010 04:34 PM, Jesse Barnes wrote:
>> I'm not lamenting that fact, because my experience is that what ends up
>> replacing it is often far worse.  Consider UARTs -- no MMU dependencies
>> at all, can be accessed with four lines of assembly, and compare it to
>> EHCI debug port, the driver for which is over 900 lines in the Linux
>> kernel -- and that assumes that you're already in flat mode.
> 
> Heh, you're so old and crufty!

I know.  Debugging is so 20th century.

	-hpa


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-14 23:33                   ` Jesse Barnes
@ 2010-05-14 23:40                     ` H. Peter Anvin
  2010-05-15  0:02                       ` Jesse Barnes
  0 siblings, 1 reply; 49+ messages in thread
From: H. Peter Anvin @ 2010-05-14 23:40 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Mike Travis, Bjorn Helgaas, Mike Habeck, Ingo Molnar,
	Thomas Gleixner, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe

On 05/14/2010 04:33 PM, Jesse Barnes wrote:
> 
> As for BIOS coordination for dynamic reallocation, yeah there'd be some
> of that.  I think the basic principles would be:
>   1) use BIOS allocations wherever possible
>   2) get an accurate list of available resources from the BIOS for
>      potential remapping later
>   3) allocate resources for BARs and devices as late as possible (e.g.
>      at driver bind time) to avoid allocating more than we need
> 
> But that's a good chunk of work, and as we've seen, PCs in particular
> are really sensitive to having resources moved around too much, so step
> (2) is probably the hardest part.
> 

The real problem that I see, as outlined before, has nothing to do with
the BIOS, but rather the interdependencies between resources.

	-hpa


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-14 23:39                     ` H. Peter Anvin
@ 2010-05-15  0:00                       ` Jesse Barnes
  2010-05-15  0:14                         ` H. Peter Anvin
  0 siblings, 1 reply; 49+ messages in thread
From: Jesse Barnes @ 2010-05-15  0:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Mike Travis, Bjorn Helgaas, Mike Habeck, Ingo Molnar,
	Thomas Gleixner, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe

On Fri, 14 May 2010 16:39:59 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On 05/14/2010 04:34 PM, Jesse Barnes wrote:
> >> I'm not lamenting that fact, because my experience is that what ends up
> >> replacing it is often far worse.  Consider UARTs -- no MMU dependencies
> >> at all, can be accessed with four lines of assembly, and compare it to
> >> EHCI debug port, the driver for which is over 900 lines in the Linux
> >> kernel -- and that assumes that you're already in flat mode.
> > 
> > Heh, you're so old and crufty!
> 
> I know.  Debugging is so 20th century.

Yeah, get with the times.  Debugging on today's machines means you have
to look up register block offsets in ACPI, run some AML to setup your
debug port, and then load a microkernel onto the debug device to get
your vnc enabled debug console!

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-14 23:40                     ` H. Peter Anvin
@ 2010-05-15  0:02                       ` Jesse Barnes
  0 siblings, 0 replies; 49+ messages in thread
From: Jesse Barnes @ 2010-05-15  0:02 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Mike Travis, Bjorn Helgaas, Mike Habeck, Ingo Molnar,
	Thomas Gleixner, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe

On Fri, 14 May 2010 16:40:55 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On 05/14/2010 04:33 PM, Jesse Barnes wrote:
> > 
> > As for BIOS coordination for dynamic reallocation, yeah there'd be some
> > of that.  I think the basic principles would be:
> >   1) use BIOS allocations wherever possible
> >   2) get an accurate list of available resources from the BIOS for
> >      potential remapping later
> >   3) allocate resources for BARs and devices as late as possible (e.g.
> >      at driver bind time) to avoid allocating more than we need
> > 
> > But that's a good chunk of work, and as we've seen, PCs in particular
> > are really sensitive to having resources moved around too much, so step
> > (2) is probably the hardest part.
> > 
> 
> The real problem that I see, as outlined before, has nothing to do with
> the BIOS, but rather the interdependencies between resources.

I was using the term "BIOS" loosely to refer to safe ranges to allocate
resources.  Your topology example is a good one, there are definite
dependencies between devices, especially on large systems.  But in both
simple and complex cases, we still need to have resource ranges
available, or we'll have no where to put things at all.

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-15  0:00                       ` Jesse Barnes
@ 2010-05-15  0:14                         ` H. Peter Anvin
  0 siblings, 0 replies; 49+ messages in thread
From: H. Peter Anvin @ 2010-05-15  0:14 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Mike Travis, Bjorn Helgaas, Mike Habeck, Ingo Molnar,
	Thomas Gleixner, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe

On 05/14/2010 05:00 PM, Jesse Barnes wrote:
> On Fri, 14 May 2010 16:39:59 -0700
> "H. Peter Anvin" <hpa@zytor.com> wrote:
> 
>> On 05/14/2010 04:34 PM, Jesse Barnes wrote:
>>>> I'm not lamenting that fact, because my experience is that what ends up
>>>> replacing it is often far worse.  Consider UARTs -- no MMU dependencies
>>>> at all, can be accessed with four lines of assembly, and compare it to
>>>> EHCI debug port, the driver for which is over 900 lines in the Linux
>>>> kernel -- and that assumes that you're already in flat mode.
>>>
>>> Heh, you're so old and crufty!
>>
>> I know.  Debugging is so 20th century.
> 
> Yeah, get with the times.  Debugging on today's machines means you have
> to look up register block offsets in ACPI, run some AML to setup your
> debug port, and then load a microkernel onto the debug device to get
> your vnc enabled debug console!
> 

An excellent plan!  Cannot fail!

	-hpa


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-14 22:40             ` Mike Travis
@ 2010-05-15  2:25               ` Mike Travis
  0 siblings, 0 replies; 49+ messages in thread
From: Mike Travis @ 2010-05-15  2:25 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jesse Barnes, Bjorn Helgaas, Mike Habeck, Ingo Molnar,
	Thomas Gleixner, x86, Jacob Pan, Tejun Heo, LKML, Yinghai,
	linux-pci, Myron Stowe

... We're somewhat fortunate in that our Legacy I/O
> devices are confined to those on the first blade, and all of these
> other devices are on other blades (PCI segments 1+).


By "somewhat" I meant we'd be way more fortunate without the
legacy I/O devices at all!  But hpa is right, 4 lines of
assembler is way better than (900?) lines of C.

My feeling is that if a CPU needs legacy i/o it should be
built into the chip.  HAH!  ;-)

The BIOS should just list the machine resources, in a
"non-BIOS" way, (oh wait, that's ACPI! :*)

Cheers!

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-12 18:14 [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned Mike Travis
  2010-05-13 18:56 ` Bjorn Helgaas
  2010-05-13 18:58 ` Bjorn Helgaas
@ 2010-05-28 16:53 ` Mike Travis
  2010-05-28 17:00   ` H. Peter Anvin
  2 siblings, 1 reply; 49+ messages in thread
From: Mike Travis @ 2010-05-28 16:53 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin
  Cc: x86, Jesse Barnes, Bjorn Helgaas, Jacob Pan, Tejun Heo,
	Mike Habeck, LKML

Any further consideration for this patch, or has it been rejected?

Thanks, Mike

Mike Travis wrote:
> Subject: [Patch 1/1] x86 pci: Add option to not assign BAR's if not 
> already assigned
> From: Mike Habeck <habeck@sgi.com>
> 
> The Linux kernel assigns BARs that a BIOS did not assign, most likely
> to handle broken BIOSes that didn't enumerate the devices correctly.
> On UV the BIOS purposely doesn't assign I/O BARs for certain devices/
> drivers we know don't use them (examples, LSI SAS, Qlogic FC, ...).
> We purposely don't assign these I/O BARs because I/O Space is a very
> limited resource.  There is only 64k of I/O Space, and in a PCIe
> topology that space gets divided up into 4k chucks (this is due to
> the fact that a pci-to-pci bridge's I/O decoder is aligned at 4k)...
> Thus a system can have at most 16 cards with I/O BARs: (64k / 4k = 16)
> 
> SGI needs to scale to >16 devices with I/O BARs.  So by not assigning
> I/O BARs on devices we know don't use them, we can do that (iff the
> kernel doesn't go and assign these BARs that the BIOS purposely didn't
> assign).
> 
> This patch will not assign a resource to a device BAR if that BAR was
> not assigned by the BIOS, and the kernel cmdline option 'pci=nobar'
> was specified.   This patch is closely modeled after the 'pci=norom'
> option that currently exists in the tree.
> 
> Signed-off-by: Mike Habeck <habeck@sgi.com>
> Signed-off-by: Mike Travis <travis@sgi.com>
> ---
> Documentation/kernel-parameters.txt |    2 ++
> arch/x86/include/asm/pci_x86.h      |    1 +
> arch/x86/pci/common.c               |   20 ++++++++++++++++++++
> 3 files changed, 23 insertions(+)
> 
> --- linux.orig/Documentation/kernel-parameters.txt
> +++ linux/Documentation/kernel-parameters.txt
> @@ -1935,6 +1935,8 @@ and is between 256 and 4096 characters.
>         norom        [X86] Do not assign address space to
>                 expansion ROMs that do not already have
>                 BIOS assigned address ranges.
> +        nobar        [X86] Do not assign address space to the
> +                BARs that weren't assigned by the BIOS.
>         irqmask=0xMMMM    [X86] Set a bit mask of IRQs allowed to be
>                 assigned automatically to PCI devices. You can
>                 make the kernel exclude IRQs of your ISA cards
> --- linux.orig/arch/x86/include/asm/pci_x86.h
> +++ linux/arch/x86/include/asm/pci_x86.h
> @@ -30,6 +30,7 @@
> #define PCI_HAS_IO_ECS        0x40000
> #define PCI_NOASSIGN_ROMS    0x80000
> #define PCI_ROOT_NO_CRS        0x100000
> +#define PCI_NOASSIGN_BARS    0x200000
> 
> extern unsigned int pci_probe;
> extern unsigned long pirq_table_addr;
> --- linux.orig/arch/x86/pci/common.c
> +++ linux/arch/x86/pci/common.c
> @@ -125,6 +125,23 @@ void __init dmi_check_skip_isa_align(voi
> static void __devinit pcibios_fixup_device_resources(struct pci_dev *dev)
> {
>     struct resource *rom_r = &dev->resource[PCI_ROM_RESOURCE];
> +    struct resource *bar_r;
> +    int bar;
> +
> +    if (pci_probe & PCI_NOASSIGN_BARS) {
> +        /*
> +        * If the BIOS did not assign the BAR, zero out the
> +        * resource so the kernel doesn't attmept to assign
> +        * it later on in pci_assign_unassigned_resources
> +        */
> +        for (bar = 0; bar <= PCI_STD_RESOURCE_END; bar++) {
> +            bar_r = &dev->resource[bar];
> +            if (bar_r->start == 0 && bar_r->end != 0) {
> +                bar_r->flags = 0;
> +                bar_r->end = 0;
> +            }
> +        }
> +    }
> 
>     if (pci_probe & PCI_NOASSIGN_ROMS) {
>         if (rom_r->parent)
> @@ -509,6 +526,9 @@ char * __devinit  pcibios_setup(char *st
>     } else if (!strcmp(str, "norom")) {
>         pci_probe |= PCI_NOASSIGN_ROMS;
>         return NULL;
> +    } else if (!strcmp(str, "nobar")) {
> +        pci_probe |= PCI_NOASSIGN_BARS;
> +        return NULL;
>     } else if (!strcmp(str, "assign-busses")) {
>         pci_probe |= PCI_ASSIGN_ALL_BUSSES;
>         return NULL;

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-28 16:53 ` Mike Travis
@ 2010-05-28 17:00   ` H. Peter Anvin
  2010-05-28 17:10     ` Mike Travis
  0 siblings, 1 reply; 49+ messages in thread
From: H. Peter Anvin @ 2010-05-28 17:00 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Thomas Gleixner, x86, Jesse Barnes, Bjorn Helgaas,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML

On 05/28/2010 09:53 AM, Mike Travis wrote:
> Any further consideration for this patch, or has it been rejected?

Well, it's really up to Jesse, but as far as I can see, this patch is a
net loss of functionality and doesn't actually add anything.  Without
this patch, some resources that were not assigned by BIOS will be
unreachable.  With this patch, *all* resources that were not assigned by
BIOS will be unreachable...

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-28 17:00   ` H. Peter Anvin
@ 2010-05-28 17:10     ` Mike Travis
  2010-05-28 19:28       ` Jesse Barnes
  2010-05-28 20:04       ` H. Peter Anvin
  0 siblings, 2 replies; 49+ messages in thread
From: Mike Travis @ 2010-05-28 17:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Thomas Gleixner, x86, Jesse Barnes, Bjorn Helgaas,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML



H. Peter Anvin wrote:
> On 05/28/2010 09:53 AM, Mike Travis wrote:
>> Any further consideration for this patch, or has it been rejected?
> 
> Well, it's really up to Jesse, but as far as I can see, this patch is a
> net loss of functionality and doesn't actually add anything.  Without
> this patch, some resources that were not assigned by BIOS will be
> unreachable.  With this patch, *all* resources that were not assigned by
> BIOS will be unreachable...
> 
> 	-hpa
> 

Apparently you're missing the point of the patch?  The patch is needed
because BIOS is purposely not assigning I/O BAR's to devices that won't
use them, freeing up the resource for devices that do need them.  Where
is the "all" resources that are not reachable?

Thanks,
Mike

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-28 17:10     ` Mike Travis
@ 2010-05-28 19:28       ` Jesse Barnes
  2010-05-28 20:04       ` H. Peter Anvin
  1 sibling, 0 replies; 49+ messages in thread
From: Jesse Barnes @ 2010-05-28 19:28 UTC (permalink / raw)
  To: Mike Travis
  Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, x86, Bjorn Helgaas,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML

On Fri, 28 May 2010 10:10:16 -0700
Mike Travis <travis@sgi.com> wrote:

> 
> 
> H. Peter Anvin wrote:
> > On 05/28/2010 09:53 AM, Mike Travis wrote:
> >> Any further consideration for this patch, or has it been rejected?
> > 
> > Well, it's really up to Jesse, but as far as I can see, this patch is a
> > net loss of functionality and doesn't actually add anything.  Without
> > this patch, some resources that were not assigned by BIOS will be
> > unreachable.  With this patch, *all* resources that were not assigned by
> > BIOS will be unreachable...
> > 
> > 	-hpa
> > 
> 
> Apparently you're missing the point of the patch?  The patch is needed
> because BIOS is purposely not assigning I/O BAR's to devices that won't
> use them, freeing up the resource for devices that do need them.  Where
> is the "all" resources that are not reachable?

Applied to my linux-next branch, thanks.

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-28 17:10     ` Mike Travis
  2010-05-28 19:28       ` Jesse Barnes
@ 2010-05-28 20:04       ` H. Peter Anvin
  2010-05-31 11:12         ` Mike Travis
  1 sibling, 1 reply; 49+ messages in thread
From: H. Peter Anvin @ 2010-05-28 20:04 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Thomas Gleixner, x86, Jesse Barnes, Bjorn Helgaas,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML

On 05/28/2010 10:10 AM, Mike Travis wrote:
> 
> 
> H. Peter Anvin wrote:
>> On 05/28/2010 09:53 AM, Mike Travis wrote:
>>> Any further consideration for this patch, or has it been rejected?
>>
>> Well, it's really up to Jesse, but as far as I can see, this patch is a
>> net loss of functionality and doesn't actually add anything.  Without
>> this patch, some resources that were not assigned by BIOS will be
>> unreachable.  With this patch, *all* resources that were not assigned by
>> BIOS will be unreachable...
>>
>> 	-hpa
>>
> 
> Apparently you're missing the point of the patch?  The patch is needed
> because BIOS is purposely not assigning I/O BAR's to devices that won't
> use them, freeing up the resource for devices that do need them.  Where
> is the "all" resources that are not reachable?
> 

No, the patch isn't needed for those.

Without your patch:

- Devices assigned by BIOS remain assigned;
- Devices not assigned by BIOS get assigned until address space
  exhausted.

With your patch:

- Devices assigned by BIOS remain assigned;
- Devices not assigned by BIOS never get assigned at all.

What am I missing here?

	-hpa


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-28 20:04       ` H. Peter Anvin
@ 2010-05-31 11:12         ` Mike Travis
  2010-05-31 16:36           ` H. Peter Anvin
  2010-06-01 22:49           ` Bjorn Helgaas
  0 siblings, 2 replies; 49+ messages in thread
From: Mike Travis @ 2010-05-31 11:12 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Thomas Gleixner, x86, Jesse Barnes, Bjorn Helgaas,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML



H. Peter Anvin wrote:
> On 05/28/2010 10:10 AM, Mike Travis wrote:
>>
>> H. Peter Anvin wrote:
>>> On 05/28/2010 09:53 AM, Mike Travis wrote:
>>>> Any further consideration for this patch, or has it been rejected?
>>> Well, it's really up to Jesse, but as far as I can see, this patch is a
>>> net loss of functionality and doesn't actually add anything.  Without
>>> this patch, some resources that were not assigned by BIOS will be
>>> unreachable.  With this patch, *all* resources that were not assigned by
>>> BIOS will be unreachable...
>>>
>>> 	-hpa
>>>
>> Apparently you're missing the point of the patch?  The patch is needed
>> because BIOS is purposely not assigning I/O BAR's to devices that won't
>> use them, freeing up the resource for devices that do need them.  Where
>> is the "all" resources that are not reachable?
>>
> 
> No, the patch isn't needed for those.
> 
> Without your patch:
> 
> - Devices assigned by BIOS remain assigned;
> - Devices not assigned by BIOS get assigned until address space
>   exhausted.
> 
> With your patch:
> 
> - Devices assigned by BIOS remain assigned;
> - Devices not assigned by BIOS never get assigned at all.
> 
> What am I missing here?
> 
> 	-hpa
> 

BIOS still assigns the MMIO BAR's so the devices are alive.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-31 11:12         ` Mike Travis
@ 2010-05-31 16:36           ` H. Peter Anvin
  2010-06-01 22:49           ` Bjorn Helgaas
  1 sibling, 0 replies; 49+ messages in thread
From: H. Peter Anvin @ 2010-05-31 16:36 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Thomas Gleixner, x86, Jesse Barnes, Bjorn Helgaas,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML

On 05/31/2010 04:12 AM, Mike Travis wrote:
> 
> BIOS still assigns the MMIO BAR's so the devices are alive.

OK, so it sounds like the *real* problem is our behavior on address
space exhaustion: it is rather common that a driver will only use one
BAR/address space, and sometimes can even choose which address space to
use.  The right thing would be to defer error until a particular BAR is
requested by the device driver, but I'm not sure if our interfaces
permits that (and I'm on my way to the airport right now so I can't
check...)

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-05-31 11:12         ` Mike Travis
  2010-05-31 16:36           ` H. Peter Anvin
@ 2010-06-01 22:49           ` Bjorn Helgaas
  2010-06-02  7:31             ` H. Peter Anvin
  2010-06-02 15:53             ` Mike Habeck
  1 sibling, 2 replies; 49+ messages in thread
From: Bjorn Helgaas @ 2010-06-01 22:49 UTC (permalink / raw)
  To: Mike Travis
  Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, x86, Jesse Barnes,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML, linux-pci

[Re-added linux-pci, which got lost again somewhere.]

On Monday, May 31, 2010 05:12:00 am Mike Travis wrote:
> H. Peter Anvin wrote:
> > On 05/28/2010 10:10 AM, Mike Travis wrote:
> >> H. Peter Anvin wrote:
> >>> On 05/28/2010 09:53 AM, Mike Travis wrote:
> >>>> Any further consideration for this patch, or has it been rejected?

I'm disappointed that you didn't rework this to make it generic,
not x86-specific.  That would be pretty easy and would remove
the need for somebody else to come and clean it up later.

> >>> Well, it's really up to Jesse, but as far as I can see, this patch is a
> >>> net loss of functionality and doesn't actually add anything.  Without
> >>> this patch, some resources that were not assigned by BIOS will be
> >>> unreachable.  With this patch, *all* resources that were not assigned by
> >>> BIOS will be unreachable...
> >>>
> >>> 	-hpa
> >>>
> >> Apparently you're missing the point of the patch?  The patch is needed
> >> because BIOS is purposely not assigning I/O BAR's to devices that won't
> >> use them, freeing up the resource for devices that do need them.  Where
> >> is the "all" resources that are not reachable?
> > 
> > No, the patch isn't needed for those.
> > 
> > Without your patch:
> > 
> > - Devices assigned by BIOS remain assigned;
> > - Devices not assigned by BIOS get assigned until address space
> >   exhausted.
> > 
> > With your patch:
> > 
> > - Devices assigned by BIOS remain assigned;
> > - Devices not assigned by BIOS never get assigned at all.
> > 
> > What am I missing here?
> 
> BIOS still assigns the MMIO BAR's so the devices are alive.

I'm sorry; I don't follow this.  BIOS assigns MMIO BARs regardless
of whether we have your patch.

I'm still having trouble reconciling the stated purpose, i.e., the
changelog, with the behavior.  The changelog implies that the patch
is required to make >16 devices with I/O BARs work at all, but per
Mike Habeck, the patch just gets rid of some warnings and maybe helps
with hot-add of devices using I/O space.

Is there a deeper problem that happens if we exhaust I/O space?
Are we releasing device resources in pci_assign_unassigned_bridge_resources()
and then we fail to reassign even MMIO resources after we exhaust
I/O space?

Maybe a complete dmesg log showing the failure would be helpful.  if
so, you could open a kernel.org bugzilla and reference it in your
changelog so we can take this issue into account in future PCI work.

Bjorn

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-06-01 22:49           ` Bjorn Helgaas
@ 2010-06-02  7:31             ` H. Peter Anvin
  2010-06-02 15:45               ` Bjorn Helgaas
  2010-06-02 15:53             ` Mike Habeck
  1 sibling, 1 reply; 49+ messages in thread
From: H. Peter Anvin @ 2010-06-02  7:31 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mike Travis, Ingo Molnar, Thomas Gleixner, x86, Jesse Barnes,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML, linux-pci

On 06/01/2010 03:49 PM, Bjorn Helgaas wrote:
>>
>> BIOS still assigns the MMIO BAR's so the devices are alive.
>
> I'm sorry; I don't follow this.  BIOS assigns MMIO BARs regardless
> of whether we have your patch.
>

I'm assuming that that Mike is implying is that the allocation code runs 
out of I/O space and as a result shuts down the entire device.

	-hpa

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-06-02  7:31             ` H. Peter Anvin
@ 2010-06-02 15:45               ` Bjorn Helgaas
  2010-06-02 15:47                 ` H. Peter Anvin
  0 siblings, 1 reply; 49+ messages in thread
From: Bjorn Helgaas @ 2010-06-02 15:45 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Mike Travis, Ingo Molnar, Thomas Gleixner, x86, Jesse Barnes,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML, linux-pci

On Wednesday, June 02, 2010 01:31:18 am H. Peter Anvin wrote:
> On 06/01/2010 03:49 PM, Bjorn Helgaas wrote:
> >>
> >> BIOS still assigns the MMIO BAR's so the devices are alive.
> >
> > I'm sorry; I don't follow this.  BIOS assigns MMIO BARs regardless
> > of whether we have your patch.
> 
> I'm assuming that that Mike is implying is that the allocation code runs 
> out of I/O space and as a result shuts down the entire device.

Yeah, that's why I asked about a deeper problem.  There's not really a
"shut down this device" flag, so the only way I can think of that we
might make a device completely unusable is if we release all the device
resources and then fail to reassign them.

A concrete example, e.g., a dmesg log, would go a long ways toward
clarifying this.

Bjorn

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-06-02 15:45               ` Bjorn Helgaas
@ 2010-06-02 15:47                 ` H. Peter Anvin
  2010-06-02 15:53                   ` Jesse Barnes
  0 siblings, 1 reply; 49+ messages in thread
From: H. Peter Anvin @ 2010-06-02 15:47 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mike Travis, Ingo Molnar, Thomas Gleixner, x86, Jesse Barnes,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML, linux-pci

On 06/02/2010 05:45 PM, Bjorn Helgaas wrote:
> On Wednesday, June 02, 2010 01:31:18 am H. Peter Anvin wrote:
>> On 06/01/2010 03:49 PM, Bjorn Helgaas wrote:
>>>>
>>>> BIOS still assigns the MMIO BAR's so the devices are alive.
>>>
>>> I'm sorry; I don't follow this.  BIOS assigns MMIO BARs regardless
>>> of whether we have your patch.
>>
>> I'm assuming that that Mike is implying is that the allocation code runs
>> out of I/O space and as a result shuts down the entire device.
>
> Yeah, that's why I asked about a deeper problem.  There's not really a
> "shut down this device" flag, so the only way I can think of that we
> might make a device completely unusable is if we release all the device
> resources and then fail to reassign them.
>
> A concrete example, e.g., a dmesg log, would go a long ways toward
> clarifying this.
>

That's what I thought, which I guess means my original question to Mike 
still stands...

	-hpa


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-06-01 22:49           ` Bjorn Helgaas
  2010-06-02  7:31             ` H. Peter Anvin
@ 2010-06-02 15:53             ` Mike Habeck
  2010-06-02 16:40               ` Bjorn Helgaas
  1 sibling, 1 reply; 49+ messages in thread
From: Mike Habeck @ 2010-06-02 15:53 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mike Travis, H. Peter Anvin, Ingo Molnar, Thomas Gleixner, x86,
	Jesse Barnes, Jacob Pan, Tejun Heo, LKML, linux-pci

Bjorn Helgaas wrote:
> [Re-added linux-pci, which got lost again somewhere.]
> 
> On Monday, May 31, 2010 05:12:00 am Mike Travis wrote:
>> H. Peter Anvin wrote:
>>> On 05/28/2010 10:10 AM, Mike Travis wrote:
>>>> H. Peter Anvin wrote:
>>>>> On 05/28/2010 09:53 AM, Mike Travis wrote:
>>>>>> Any further consideration for this patch, or has it been rejected?
> 
> I'm disappointed that you didn't rework this to make it generic,
> not x86-specific.  That would be pretty easy and would remove
> the need for somebody else to come and clean it up later.
> 
>>>>> Well, it's really up to Jesse, but as far as I can see, this patch is a
>>>>> net loss of functionality and doesn't actually add anything.  Without
>>>>> this patch, some resources that were not assigned by BIOS will be
>>>>> unreachable.  With this patch, *all* resources that were not assigned by
>>>>> BIOS will be unreachable...
>>>>>
>>>>> 	-hpa
>>>>>
>>>> Apparently you're missing the point of the patch?  The patch is needed
>>>> because BIOS is purposely not assigning I/O BAR's to devices that won't
>>>> use them, freeing up the resource for devices that do need them.  Where
>>>> is the "all" resources that are not reachable?
>>> No, the patch isn't needed for those.
>>>
>>> Without your patch:
>>>
>>> - Devices assigned by BIOS remain assigned;
>>> - Devices not assigned by BIOS get assigned until address space
>>>   exhausted.
>>>
>>> With your patch:
>>>
>>> - Devices assigned by BIOS remain assigned;
>>> - Devices not assigned by BIOS never get assigned at all.
>>>
>>> What am I missing here?
>> BIOS still assigns the MMIO BAR's so the devices are alive.
> 
> I'm sorry; I don't follow this.  BIOS assigns MMIO BARs regardless
> of whether we have your patch.
> 
> I'm still having trouble reconciling the stated purpose, i.e., the
> changelog, with the behavior.  The changelog implies that the patch
> is required to make >16 devices with I/O BARs work at all, but per
> Mike Habeck, the patch just gets rid of some warnings and maybe helps
> with hot-add of devices using I/O space.

greaterthan 16 devices will still work if the BIOS doesn't assign
the I/O BARs and the kernel does (including the devices that don't
get assigned due to the kernel running out of I/O Port resources).
And when the kernel runs out of I/O Port space it will warn for
those devices it couldn't assign: (example):

   pci 0002:03:00.0: BAR 5: can't allocate I/O resource [0x0-0x7f]

But if the kernel assigns all the I/O Port space to these devices
we know don't need it (thus the reason the BIOS didn't assign it)
then I believe hot-add of devices using I/O space will fail.
What the patch is attempting to do is allow the BIOS a way to
not assign resources it knows are not needed and thus make sure
the kernel doesn't override that.

There is also the issue that quirk_system_pci_resources() thinks
those unassigned I/O resources are using I/O port space 0x0-0xFF.
Since the BIOS never assigned the BAR the kernel reads it as
having a base of 0x0 and a limit of whatever the BAR size is when
it writes all 1's to obtain the size.  So that results in
quirk_system_pci_resources() disabling pnp devices: (example):

   pnp 00:11: io resource (0x92-0x92) overlaps 0002:03:00.0 BAR 0 (0x0-0xff), disabling
   pnp 00:11: io resource (0x10-0x1f) overlaps 0002:03:00.0 BAR 0 (0x0-0xff), disabling
   pnp 00:11: io resource (0x72-0x73) overlaps 0002:03:00.0 BAR 0 (0x0-0xff), disabling
   pnp 00:11: io resource (0x80-0x80) overlaps 0002:03:00.0 BAR 0 (0x0-0xff), disabling
   pnp 00:11: io resource (0x84-0x86) overlaps 0002:03:00.0 BAR 0 (0x0-0xff), disabling
   pnp 00:11: io resource (0x88-0x88) overlaps 0002:03:00.0 BAR 0 (0x0-0xff), disabling
   pnp 00:11: io resource (0x8c-0x8e) overlaps 0002:03:00.0 BAR 0 (0x0-0xff), disabling
   pnp 00:11: io resource (0x90-0x9f) overlaps 0002:03:00.0 BAR 0 (0x0-0xff), disabling

One could argue this is a quirk_system_pci_resources() issue and
should be handled there rather than "zeroing out the resource if
the bios didn't assign it" as the patch does, but what the patch
is attempting to do (as stated above) is to allow the BIOS a way
to not assign resources it knows are not needed and thus make sure
the kernel doesn't override that... and in doing that the quirk
issue goes away too.

-mike



> 
> Is there a deeper problem that happens if we exhaust I/O space?
> Are we releasing device resources in pci_assign_unassigned_bridge_resources()
> and then we fail to reassign even MMIO resources after we exhaust
> I/O space?
> 
> Maybe a complete dmesg log showing the failure would be helpful.  if
> so, you could open a kernel.org bugzilla and reference it in your
> changelog so we can take this issue into account in future PCI work.
> 
> Bjorn


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-06-02 15:47                 ` H. Peter Anvin
@ 2010-06-02 15:53                   ` Jesse Barnes
  2010-06-09  0:53                     ` H. Peter Anvin
  0 siblings, 1 reply; 49+ messages in thread
From: Jesse Barnes @ 2010-06-02 15:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Bjorn Helgaas, Mike Travis, Ingo Molnar, Thomas Gleixner, x86,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML, linux-pci

On Wed, 02 Jun 2010 17:47:23 +0200
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On 06/02/2010 05:45 PM, Bjorn Helgaas wrote:
> > On Wednesday, June 02, 2010 01:31:18 am H. Peter Anvin wrote:
> >> On 06/01/2010 03:49 PM, Bjorn Helgaas wrote:
> >>>>
> >>>> BIOS still assigns the MMIO BAR's so the devices are alive.
> >>>
> >>> I'm sorry; I don't follow this.  BIOS assigns MMIO BARs regardless
> >>> of whether we have your patch.
> >>
> >> I'm assuming that that Mike is implying is that the allocation code runs
> >> out of I/O space and as a result shuts down the entire device.
> >
> > Yeah, that's why I asked about a deeper problem.  There's not really a
> > "shut down this device" flag, so the only way I can think of that we
> > might make a device completely unusable is if we release all the device
> > resources and then fail to reassign them.
> >
> > A concrete example, e.g., a dmesg log, would go a long ways toward
> > clarifying this.
> >
> 
> That's what I thought, which I guess means my original question to Mike 
> still stands...

I thought the whole reason for this was hotplug; we don't want to
exhaust I/O space unnecessarily by allocating resources for BARs the
BIOS didn't assign so we can keep them around for later hotplug
activity.

If there's some other issue, it's not too late to drop this patch.

Mike or Mike, can you clarify?

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-06-02 15:53             ` Mike Habeck
@ 2010-06-02 16:40               ` Bjorn Helgaas
  0 siblings, 0 replies; 49+ messages in thread
From: Bjorn Helgaas @ 2010-06-02 16:40 UTC (permalink / raw)
  To: Mike Habeck
  Cc: Mike Travis, H. Peter Anvin, Ingo Molnar, Thomas Gleixner, x86,
	Jesse Barnes, Jacob Pan, Tejun Heo, LKML, linux-pci

On Wednesday, June 02, 2010 09:53:23 am Mike Habeck wrote:
> Bjorn Helgaas wrote:
> > I'm still having trouble reconciling the stated purpose, i.e., the
> > changelog, with the behavior.  The changelog implies that the patch
> > is required to make >16 devices with I/O BARs work at all, but per
> > Mike Habeck, the patch just gets rid of some warnings and maybe helps
> > with hot-add of devices using I/O space.
> 
> greaterthan 16 devices will still work if the BIOS doesn't assign
> the I/O BARs and the kernel does (including the devices that don't
> get assigned due to the kernel running out of I/O Port resources).
> And when the kernel runs out of I/O Port space it will warn for
> those devices it couldn't assign: (example):
> 
>    pci 0002:03:00.0: BAR 5: can't allocate I/O resource [0x0-0x7f]

[For anybody else grepping for this message, it is now "can't assign",
not "can't allocate".]

> But if the kernel assigns all the I/O Port space to these devices
> we know don't need it (thus the reason the BIOS didn't assign it)
> then I believe hot-add of devices using I/O space will fail.

Yes, I understand and agree with this motivation, though I hope
"pci=nobar" is recognized as only a short-term workaround.  I think
it's completely unacceptable in terms of the user experience.  Using
DMI quirks to turn it on automatically removes some user pain, but
it's not a real solution and still leaves a maintenance problem.

How about if we write a changelog that mentions hot-add of devices
that need I/O space?  :-)

> There is also the issue that quirk_system_pci_resources() thinks
> those unassigned I/O resources are using I/O port space 0x0-0xFF.
> Since the BIOS never assigned the BAR the kernel reads it as
> having a base of 0x0 and a limit of whatever the BAR size is when
> it writes all 1's to obtain the size.  So that results in
> quirk_system_pci_resources() disabling pnp devices: (example):
> 
>    pnp 00:11: io resource (0x92-0x92) overlaps 0002:03:00.0 BAR 0 (0x0-0xff), disabling
[...]
> 
> One could argue this is a quirk_system_pci_resources() issue and
> should be handled there rather than "zeroing out the resource if
> the bios didn't assign it" as the patch does, but what the patch
> is attempting to do (as stated above) is to allow the BIOS a way
> to not assign resources it knows are not needed and thus make sure
> the kernel doesn't override that... and in doing that the quirk
> issue goes away too.

This is definitely a problem, but I think it should be handled
separately.  The fact that this patch makes the "overlap" messages
go away is a nice coincidence, but it doesn't fix the underlying
issue.

I think setting the resource start/end to zero to "disable" it as in
this patch (and elsewhere) is just a hack -- we no longer know the size
of the BAR, the struct resource no longer corresponds to the BAR contents,
and I don't think we do anything with the PCI_COMMAND_IO/PCI_COMMAND_MEM
bits to actually disable the BAR, so the device probably still responds
at whatever we left in the BAR.

In quirk_system_pci_resources(), we set IORESOURCE_DISABLED rather than
setting the PNP resource start/end to zero, but that's a bit of a hack,
too.  All that does is prevent the PNP motherboard driver (pnp/system.c)
from requesting that region.  We don't actually do anything to stop the
device from responding to that address.

Bjorn

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-06-02 15:53                   ` Jesse Barnes
@ 2010-06-09  0:53                     ` H. Peter Anvin
  2010-06-09  1:26                       ` Jesse Barnes
  0 siblings, 1 reply; 49+ messages in thread
From: H. Peter Anvin @ 2010-06-09  0:53 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Bjorn Helgaas, Mike Travis, Ingo Molnar, Thomas Gleixner, x86,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML, linux-pci

On 06/02/2010 08:53 AM, Jesse Barnes wrote:
>>
>> That's what I thought, which I guess means my original question to Mike 
>> still stands...
> 
> I thought the whole reason for this was hotplug; we don't want to
> exhaust I/O space unnecessarily by allocating resources for BARs the
> BIOS didn't assign so we can keep them around for later hotplug
> activity.
> 
> If there's some other issue, it's not too late to drop this patch.
> 

Okay, now... this means that if a device that the BIOS doesn't know
about, but which needs I/O addresses, then it will work if hotplugged,
but not if it is plugged in on system boot?

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-06-09  0:53                     ` H. Peter Anvin
@ 2010-06-09  1:26                       ` Jesse Barnes
  2010-06-09 14:23                         ` Mike Habeck
  0 siblings, 1 reply; 49+ messages in thread
From: Jesse Barnes @ 2010-06-09  1:26 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Bjorn Helgaas, Mike Travis, Ingo Molnar, Thomas Gleixner, x86,
	Jacob Pan, Tejun Heo, Mike Habeck, LKML, linux-pci

On Tue, 08 Jun 2010 17:53:19 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On 06/02/2010 08:53 AM, Jesse Barnes wrote:
> >>
> >> That's what I thought, which I guess means my original question to Mike 
> >> still stands...
> > 
> > I thought the whole reason for this was hotplug; we don't want to
> > exhaust I/O space unnecessarily by allocating resources for BARs the
> > BIOS didn't assign so we can keep them around for later hotplug
> > activity.
> > 
> > If there's some other issue, it's not too late to drop this patch.
> > 
> 
> Okay, now... this means that if a device that the BIOS doesn't know
> about, but which needs I/O addresses, then it will work if hotplugged,
> but not if it is plugged in on system boot?

Depends on the BIOS interactions on this platform; if the kernel ends
up doing all the allocations itself, we'll allocate space for every BAR
unconditionally, meaning that any hotplugged device should work.

But really the SGI guys should comment here.

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned
  2010-06-09  1:26                       ` Jesse Barnes
@ 2010-06-09 14:23                         ` Mike Habeck
  0 siblings, 0 replies; 49+ messages in thread
From: Mike Habeck @ 2010-06-09 14:23 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: H. Peter Anvin, Bjorn Helgaas, Mike Travis, Ingo Molnar,
	Thomas Gleixner, x86, Jacob Pan, Tejun Heo, LKML, linux-pci

Jesse Barnes wrote:
> On Tue, 08 Jun 2010 17:53:19 -0700
> "H. Peter Anvin" <hpa@zytor.com> wrote:
> 
>> On 06/02/2010 08:53 AM, Jesse Barnes wrote:
>>>> That's what I thought, which I guess means my original question to Mike 
>>>> still stands...
>>> I thought the whole reason for this was hotplug; we don't want to
>>> exhaust I/O space unnecessarily by allocating resources for BARs the
>>> BIOS didn't assign so we can keep them around for later hotplug
>>> activity.
>>>
>>> If there's some other issue, it's not too late to drop this patch.
>>>
>> Okay, now... this means that if a device that the BIOS doesn't know
>> about, but which needs I/O addresses, then it will work if hotplugged,
>> but not if it is plugged in on system boot?
> 
> Depends on the BIOS interactions on this platform; if the kernel ends
> up doing all the allocations itself, we'll allocate space for every BAR
> unconditionally, meaning that any hotplugged device should work.

Correct, our BIOS allocates I/O space for all devices except for a
few that it knows don't use it.  On a hotplug attach the kernel
will be unconditionally allocating the I/O space for all devices..
The pci=nobar option strictly prevents the kernel from allocating
BAR resources to device BARs that the BIOS didn't assign (similar
to how the pci=norom option works for the device's ROM BAR)

> 
> But really the SGI guys should comment here.
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2010-06-09 14:23 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-12 18:14 [Patch 1/1] x86 pci: Add option to not assign BAR's if not already assigned Mike Travis
2010-05-13 18:56 ` Bjorn Helgaas
2010-05-13 19:08   ` H. Peter Anvin
2010-05-13 19:12   ` Mike Travis
2010-05-13 19:13     ` Mike Travis
2010-05-13 19:54     ` Bjorn Helgaas
2010-05-13 20:27       ` Mike Habeck
2010-05-13 19:38   ` Mike Habeck
2010-05-13 20:02     ` Bjorn Helgaas
2010-05-13 20:09       ` H. Peter Anvin
2010-05-14 22:25       ` Jesse Barnes
2010-05-14 22:34         ` Mike Travis
2010-05-14 22:35           ` H. Peter Anvin
2010-05-14 22:40             ` Mike Travis
2010-05-15  2:25               ` Mike Travis
2010-05-14 22:47           ` Jesse Barnes
2010-05-14 22:59             ` Mike Travis
2010-05-14 23:06               ` Jesse Barnes
2010-05-14 23:23                 ` Mike Travis
2010-05-14 23:33                   ` Jesse Barnes
2010-05-14 23:40                     ` H. Peter Anvin
2010-05-15  0:02                       ` Jesse Barnes
2010-05-14 23:20             ` H. Peter Anvin
2010-05-14 23:28               ` Jesse Barnes
2010-05-14 23:32                 ` H. Peter Anvin
2010-05-14 23:34                   ` Jesse Barnes
2010-05-14 23:39                     ` H. Peter Anvin
2010-05-15  0:00                       ` Jesse Barnes
2010-05-15  0:14                         ` H. Peter Anvin
2010-05-13 20:36     ` Yinghai Lu
2010-05-13 20:34       ` H. Peter Anvin
2010-05-13 18:58 ` Bjorn Helgaas
2010-05-28 16:53 ` Mike Travis
2010-05-28 17:00   ` H. Peter Anvin
2010-05-28 17:10     ` Mike Travis
2010-05-28 19:28       ` Jesse Barnes
2010-05-28 20:04       ` H. Peter Anvin
2010-05-31 11:12         ` Mike Travis
2010-05-31 16:36           ` H. Peter Anvin
2010-06-01 22:49           ` Bjorn Helgaas
2010-06-02  7:31             ` H. Peter Anvin
2010-06-02 15:45               ` Bjorn Helgaas
2010-06-02 15:47                 ` H. Peter Anvin
2010-06-02 15:53                   ` Jesse Barnes
2010-06-09  0:53                     ` H. Peter Anvin
2010-06-09  1:26                       ` Jesse Barnes
2010-06-09 14:23                         ` Mike Habeck
2010-06-02 15:53             ` Mike Habeck
2010-06-02 16:40               ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).