* PCI Out of Resources
From: moussa ba @ 2015-04-14  1:00 UTC (permalink / raw)
  To: linux-pci

I am working on a system that has 20 PCIe cards connected via a series
of PCIe switches. While booting, the BIOS complains after 11 cards that
it ran out of resources. Despite that, once the OS boots, up to 16 of
the devices are fully usable.

I am running CentOS 7 with a 3.10 kernel.

Interestingly enough, we are able to see all devices under lspci, but
the 4 missing devices have no memory allocated to any of the BAR
spaces.

My understanding is that we can allocate that memory via setpci.

A working device looks as follows under lspci:

9a:00.0 Mass storage controller: *******
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 139
        Region 2: Memory at fbb02000 (32-bit, non-prefetchable) [size=4K]
        Region 3: Memory at fbb03000 (32-bit, non-prefetchable) [size=4K]
        Region 4: I/O ports at c000 [size=16]
        Region 5: Memory at fbb00000 (32-bit, non-prefetchable) [size=8K]
        Capabilities: [40] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited




A missing device looks as follows:

05:00.0 Mass storage controller: *******
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 26
        Region 4: I/O ports at 0000
        Capabilities: [40] Express (v2) Endpoint, MSI 00




I was going to simply use:
#setpci -s 0000:05:00.0 BASE_MEMORY_ADDRESS_2=0x..........

Unfortunately, I end up with an "<ignored>" message under lspci.  How do
I determine the appropriate address to use to properly assign the
resources?  (I have been looking at /proc/iomem.)


Thank you


* Re: PCI Out of Resources
From: Bjorn Helgaas @ 2015-04-14 13:05 UTC (permalink / raw)
  To: moussa ba; +Cc: linux-pci


1) Please try the same thing with a current kernel, e.g., v3.19 or
v4.0.  There have been many resource management changes since v3.10,
and I don't want to debug problems that have already been fixed.  If
you find that v4.0 works better, then it becomes a simpler problem of
figuring out whether to use a newer kernel or backport a fix to your
old kernel.

2) Please post a complete dmesg log, complete "lspci -vv" output,
contents of /proc/iomem, and a transcript of what you're doing with
setpci (remove vendor/device IDs if they are confidential, and you can
attach these to a kernel.org bugzilla if that's more convenient).
It's not quite clear what you mean by "seeing devices" -- there are
several ways a device can be present but not usable, and we need to
figure out which are relevant here.

3) If you set a BAR with "setpci", it affects the device, but not
Linux.  The kernel doesn't know you've changed the BAR, so as far as
it's concerned, there's still no address space assigned to the device,
so it will still decline to enable it.

4) From the source, it looks like lspci will print "<ignored>" if the
BAR contains zero.  In some systems that would be a valid assignment,
so this might be something we should change in lspci.  But we need to
know more about your system to figure out whether this is relevant to
you.

Bjorn
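
To see what the kernel itself has recorded for a device (as opposed to
what setpci wrote into the BAR registers, per point 3), one option is
the per-device sysfs "resource" file.  The sketch below is illustrative
only: the function names are hypothetical, and it assumes each line of
that file holds three hexadecimal fields (start, end, flags).  Treating
a start address of zero as "nothing assigned" is a heuristic; as point 4
notes, zero can be a valid assignment on some systems.

```python
from pathlib import Path


def parse_resource_table(text):
    """Parse sysfs 'resource' text into a list of (start, end, flags) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not "start end flags"
        rows.append(tuple(int(f, 16) for f in fields))
    return rows


def assigned_bars(rows, bar_count=6):
    """Return {BAR index: (start, size)} for BARs the kernel has placed.

    Heuristic (assumption): a row with zero flags or a zero start address
    is treated as not assigned.
    """
    assigned = {}
    for index, (start, end, flags) in enumerate(rows[:bar_count]):
        if flags != 0 and start != 0:
            assigned[index] = (start, end - start + 1)
    return assigned


def read_device(bdf):
    """Read the resource table for a device such as '0000:05:00.0'."""
    path = Path("/sys/bus/pci/devices") / bdf / "resource"
    return parse_resource_table(path.read_text())
```

If a BAR is missing from assigned_bars(read_device("0000:05:00.0"))
after a setpci write, that would be consistent with the kernel still
regarding the BAR as having no address space.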


* Re: PCI Out of Resources
From: Bjorn Helgaas @ 2015-04-14 13:58 UTC (permalink / raw)
  To: moussa ba; +Cc: linux-pci

On Tue, Apr 14, 2015 at 08:05:42AM -0500, Bjorn Helgaas wrote:
> 2) Please post a complete dmesg log, complete "lspci -vv" output,
> contents of /proc/iomem, and a transcript of what you're doing with
> setpci (remove vendor/device IDs if they are confidential, ...

BTW, here are some sed scripts I've used to sanitize dmesg output
in the past.  It's tedious to do this by hand.

  dmesg |
    sed -r 's/(pci ....:..:..\..: \[)....:....(\].*)/\1VVVV:DDDD\2/' |
    sed -r 's/(DMI:).*/\1 (removed)/' |
    sed -r 's/(ACPI: [A-Z]{4} [0-9a-fA-F]{16} .*\().*(\).*)/\1...\2/' |
    sed -r 's/(scsi.*: Direct-Access).*/\1 .../'
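
For anyone who prefers to script this, the same four substitutions can
be sketched in Python.  This is a rough translation, not a tested
replacement for the sed pipeline: the names RULES, sanitize_line and
sanitize are made up for illustration, and it assumes Python's regex
engine matches typical dmesg lines the same way sed -r does.

```python
import re

# One (pattern, replacement) pair per sed stage, applied in the same order.
RULES = [
    # Hide PCI vendor:device IDs such as "[8086:1234]".
    (re.compile(r"(pci ....:..:..\..: \[)....:....(\].*)"), r"\g<1>VVVV:DDDD\g<2>"),
    # Drop the DMI system/board identification string.
    (re.compile(r"(DMI:).*"), r"\g<1> (removed)"),
    # Blank out the parenthesised OEM details on ACPI table lines.
    (re.compile(r"(ACPI: [A-Z]{4} [0-9a-fA-F]{16} .*\().*(\).*)"), r"\g<1>...\g<2>"),
    # Drop disk vendor/model text after "Direct-Access".
    (re.compile(r"(scsi.*: Direct-Access).*"), r"\g<1> ..."),
]


def sanitize_line(line):
    """Apply each rule once to a single log line, as sed does without /g."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line, count=1)
    return line


def sanitize(text):
    """Sanitize a whole dmesg dump, line by line."""
    return "\n".join(sanitize_line(line) for line in text.splitlines())
```

One way to use it would be to pass the captured output of dmesg to
sanitize() and attach the returned text to the bug report.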


* Re: PCI Out of Resources
From: moussa ba @ 2015-04-14 21:43 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci

Bjorn,

See the bug filed here:

https://bugzilla.kernel.org/show_bug.cgi?id=96651

It was apparently misfiled to go to drivers_pci.  See the text of the
bug filing below.  This was tested on 3.10, 3.19 and 4.0

20 PCI drives are installed on a 2-node system via 5 PCI switches. Only
16 drives are actually recognized by Linux.  All 20 drives are
visible via lspci. I am unable to find an appropriate address in
/proc/iomem to assign these regions to.


#setpci -s 0000:90:00.0 BASE_ADDRESS_5=0x.......

I am not specifying the address because I had no idea where to put it.
I would find empty spots in /proc/iomem, but these would fall within the
address space of one of the PCI bridges...
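
For context, an endpoint BAR has to sit inside the memory window of the
bridge above it, so gaps inside a bridge's range in /proc/iomem are the
expected place to look.  A minimal sketch of listing such gaps follows.
It assumes the usual /proc/iomem layout (two spaces of indentation per
nesting level, children sorted by address); the function names and the
sample bridge window in the usage note are hypothetical.

```python
import re

# "  fbb00000-fbb01fff : 0000:9a:00.0" -> indent, start, end, name
LINE_RE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def parse_iomem(text):
    """Return a list of (depth, start, end, name) from /proc/iomem-style text."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        indent, start, end, name = match.groups()
        entries.append((len(indent) // 2, int(start, 16), int(end, 16), name))
    return entries


def free_gaps(entries, parent_index):
    """Return (start, end) ranges inside entries[parent_index] that no
    direct child occupies."""
    depth, start, end, _ = entries[parent_index]
    gaps = []
    cursor = start
    for child_depth, child_start, child_end, _ in entries[parent_index + 1:]:
        if child_depth <= depth:
            break  # left the parent's subtree
        if child_depth != depth + 1:
            continue  # grandchildren lie inside a child already counted
        if child_start > cursor:
            gaps.append((cursor, child_start - 1))
        cursor = max(cursor, child_end + 1)
    if cursor <= end:
        gaps.append((cursor, end))
    return gaps
```

Given a made-up 1 MB window "fbb00000-fbbfffff : PCI Bus 0000:9a"
holding the three memory BARs of the working device, free_gaps() reports
the single range fbb04000-fbbfffff.  Finding a gap only shows where
space exists; it does not make the kernel aware of a BAR written with
setpci.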

We are missing 4 drives, with the following errors seen in dmesg for
each drive; one is given here as an example:

[    0.983954] pci 0000:8e:00.0: can't claim BAR 6 [mem 0xffffc000-0xffffffff pref]: no compatible bridge window
[    0.984106] pci 0000:02:02.0: BAR 14: no space for [mem size 0x00100000]
[    0.984108] pci 0000:02:02.0: BAR 14: failed to assign [mem size 0x00100000]
[    0.984110] pci 0000:02:04.0: BAR 14: no space for [mem size 0x00100000]
[    0.984111] pci 0000:02:04.0: BAR 14: failed to assign [mem size 0x00100000]
[    0.984113] pci 0000:02:02.0: BAR 13: no space for [io  size 0x1000]
[    0.984114] pci 0000:02:02.0: BAR 13: failed to assign [io  size 0x1000]
[    0.984116] pci 0000:02:04.0: BAR 13: no space for [io  size 0x1000]
[    0.984117] pci 0000:02:04.0: BAR 13: failed to assign [io  size 0x1000]
[    0.984119] pci 0000:03:00.0: BAR 5: no space for [mem size 0x00002000]
[    0.984120] pci 0000:03:00.0: BAR 5: failed to assign [mem size 0x00002000]
[    0.984122] pci 0000:03:00.0: BAR 2: no space for [mem size 0x00001000]
[    0.984123] pci 0000:03:00.0: BAR 2: failed to assign [mem size 0x00001000]
[    0.984124] pci 0000:03:00.0: BAR 3: no space for [mem size 0x00001000]
[    0.984125] pci 0000:03:00.0: BAR 3: failed to assign [mem size 0x00001000]
[    0.984128] pci 0000:03:00.0: BAR 4: no space for [io  size 0x0010]
[    0.984129] pci 0000:03:00.0: BAR 4: failed to assign [io  size 0x0010]
[    0.984132] pci 0000:02:02.0: PCI bridge to [bus 03]
[    0.984142] pci 0000:02:03.0: PCI bridge to [bus 04]
[    0.984154] pci 0000:05:00.0: BAR 5: no space for [mem size 0x00002000]
[    0.984155] pci 0000:05:00.0: BAR 5: failed to assign [mem size 0x00002000]
[    0.984156] pci 0000:05:00.0: BAR 2: no space for [mem size 0x00001000]
[    0.984157] pci 0000:05:00.0: BAR 2: failed to assign [mem size 0x00001000]
[    0.984159] pci 0000:05:00.0: BAR 3: no space for [mem size 0x00001000]
[    0.984160] pci 0000:05:00.0: BAR 3: failed to assign [mem size 0x00001000]
[    0.984161] pci 0000:05:00.0: BAR 4: no space for [io  size 0x0010]
[    0.984163] pci 0000:05:00.0: BAR 4: failed to assign [io  size 0x0010]
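
To get an overview of how much space is missing, the "failed to assign"
lines can be totalled per device.  The sketch below is one way to do
that; the names are hypothetical, and the regular expression is written
against the message format shown in this log, which may differ in other
kernel versions.

```python
import re
from collections import defaultdict

# Matches e.g. "pci 0000:05:00.0: BAR 5: failed to assign [mem size 0x00002000]"
FAIL_RE = re.compile(
    r"pci ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"BAR (\d+): failed to assign \[(mem|io)\s+size (0x[0-9a-f]+)"
)


def summarize_failures(dmesg_text):
    """Map each device to the total mem and io bytes it failed to get."""
    totals = defaultdict(lambda: {"mem": 0, "io": 0})
    for match in FAIL_RE.finditer(dmesg_text):
        device, _bar, kind, size = match.groups()
        totals[device][kind] += int(size, 16)
    return dict(totals)
```

For the 0000:05:00.0 lines above this gives 0x4000 bytes of memory and
0x10 bytes of I/O space.  The BAR 13 and BAR 14 entries on the 02:xx.0
devices appear to be bridge I/O and memory windows rather than endpoint
BARs, so their sizes are better read as window requests.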


dmesg output
http://pastebin.com/YaWTRYsp

lspci output
http://pastebin.com/NHYyS61h

/proc/iomem output
http://pastebin.com/zpxk9pT3


* Re: PCI Out of Resources
From: Bjorn Helgaas @ 2015-04-14 22:23 UTC (permalink / raw)
  To: moussa ba; +Cc: linux-pci, Yinghai Lu

[+cc Yinghai]


Linux assigns space for endpoint BARs, but it doesn't automatically
reassign bridge windows to make space for downstream devices.  Does
booting with "pci=realloc" make any difference?

Bjorn
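
When testing this, it helps to confirm that the option actually reached
the running kernel.  The following is a small sketch, assuming the
command line is read from /proc/cmdline and that "pci=" takes a
comma-separated option list; the function names are hypothetical.

```python
def pci_options(cmdline):
    """Collect the comma-separated options from every pci= parameter."""
    options = []
    for token in cmdline.split():
        if token.startswith("pci="):
            options.extend(o for o in token[len("pci="):].split(",") if o)
    return options


def realloc_requested(cmdline):
    """True if pci=realloc (or pci=realloc=on) is on the command line."""
    return any(o in ("realloc", "realloc=on") for o in pci_options(cmdline))


def running_kernel_has_realloc():
    """Check the command line of the currently running kernel."""
    with open("/proc/cmdline") as handle:
        return realloc_requested(handle.read())
```

How the parameter gets added in the first place depends on the
bootloader configuration, which is outside what this sketch covers.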


* Re: PCI Out of Resources
From: moussa ba @ 2015-04-15  0:31 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Yinghai Lu

Yes!!! pci=realloc actually resolves it.  Can you explain what the issue was?

Moussa

