* PCIe 32-bit MMIO exhaustion
From: Daniel J Blueman @ 2015-01-28 8:42 UTC
To: Bjorn Helgaas, Ingo Molnar, Jiang Liu, H Peter Anvin, Thomas Gleixner
Cc: Linux Kernel, Steffen Persvold, x86

On systems with a large number of PCI devices, we're running out of
32-bit MMIO space, e.g. one quad-port NetXtreme-2 adapter takes 128MB of
space [1].

An erratum to the PCIe 2.1 spec provides guidance on limitations with
64-bit non-prefetchable BARs (since bridges have only 32-bit
non-prefetchable ranges), stating that vendors can enable the
prefetchable bit in BARs under certain circumstances to allow 64-bit
allocation [2].

The problem with that is that vendors can't know a priori which hosts
their products will be in, so they can't just advertise prefetchable
64-bit BARs. What can be done is that system firmware can use the 64-bit
prefetchable BAR in bridges, and assign a 64-bit non-prefetchable device
BAR into that area, where it is safe to do so (following the guidance).

At present, Linux denies such allocations [3] and disables the BARs. It
seems a practical solution to allow them if the firmware believes it is
safe.

Is this plausible?

Thanks,
  Daniel

--- [1]

0000:01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
	Subsystem: Dell Device 1f26
	Flags: bus master, fast devsel, latency 0, IRQ 24
	Memory at e6000000 (64-bit, non-prefetchable) [size=32M]
	Capabilities: [48] Power Management version 3
	Capabilities: [50] Vital Product Data
	Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+
	Capabilities: [a0] MSI-X: Enable+ Count=9 Masked-
	Capabilities: [ac] Express Endpoint, MSI 00
	Capabilities: [100] Device Serial Number d4-ae-52-ff-fe-ea-5c-e8
	Capabilities: [110] Advanced Error Reporting
	Capabilities: [150] Power Budgeting <?>
	Capabilities: [160] Virtual Channel
	Kernel driver in use: bnx2

0000:01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
	Subsystem: Dell Device 1f26
	Flags: bus master, fast devsel, latency 0, IRQ 25
	Memory at e8000000 (64-bit, non-prefetchable) [size=32M]
	Capabilities: [48] Power Management version 3
	Capabilities: [50] Vital Product Data
	Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+
	Capabilities: [a0] MSI-X: Enable- Count=9 Masked-
	Capabilities: [ac] Express Endpoint, MSI 00
	Capabilities: [100] Device Serial Number d4-ae-52-ff-fe-ea-5c-ea
	Capabilities: [110] Advanced Error Reporting
	Capabilities: [150] Power Budgeting <?>
	Capabilities: [160] Virtual Channel
	Kernel driver in use: bnx2

0000:02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
	Subsystem: Dell Device 1f26
	Flags: bus master, fast devsel, latency 0, IRQ 28
	Memory at ea000000 (64-bit, non-prefetchable) [size=32M]
	Capabilities: [48] Power Management version 3
	Capabilities: [50] Vital Product Data
	Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+
	Capabilities: [a0] MSI-X: Enable- Count=9 Masked-
	Capabilities: [ac] Express Endpoint, MSI 00
	Capabilities: [100] Device Serial Number d4-ae-52-ff-fe-ea-5c-ec
	Capabilities: [110] Advanced Error Reporting
	Capabilities: [150] Power Budgeting <?>
	Capabilities: [160] Virtual Channel
	Kernel driver in use: bnx2

0000:02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
	Subsystem: Dell Device 1f26
	Flags: bus master, fast devsel, latency 0, IRQ 29
	Memory at ec000000 (64-bit, non-prefetchable) [size=32M]
	Capabilities: [48] Power Management version 3
	Capabilities: [50] Vital Product Data
	Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+
	Capabilities: [a0] MSI-X: Enable- Count=9 Masked-
	Capabilities: [ac] Express Endpoint, MSI 00
	Capabilities: [100] Device Serial Number d4-ae-52-ff-fe-ea-5c-ee
	Capabilities: [110] Advanced Error Reporting
	Capabilities: [150] Power Budgeting <?>
	Capabilities: [160] Virtual Channel
	Kernel driver in use: bnx2

-- [2] p13

https://www.pcisig.com/specifications/pciexpress/base2/PCIe_Base_r2.1_Errata_08Jun10.pdf

-- [3]

pci 0002:01:00.0: BAR 0: [mem size 0x00002000 64bit] conflicts with PCI Bus 0002:00 [mem 0x10020000000-0x10027ffffff pref]
--
Daniel J Blueman
Principal Software Engineer, Numascale
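
The BAR properties discussed in the message above (I/O vs. memory, 32-bit vs. 64-bit decode, prefetchable or not) are all encoded in the low bits of the BAR's first dword. The following is a minimal sketch of that decoding, not kernel code: the bit values follow the PCI spec, while the names `bar_info`, `window_kind`, `decode_bar` and `natural_window` are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* BAR low-dword bits, as defined by the PCI spec. */
#define BAR_SPACE_IO      0x1u  /* bit 0: 1 = I/O, 0 = memory */
#define BAR_MEM_TYPE_MASK 0x6u  /* bits 2:1: memory decode type */
#define BAR_MEM_TYPE_64   0x4u  /* 10b = 64-bit decode */
#define BAR_MEM_PREFETCH  0x8u  /* bit 3: prefetchable */

/* Hypothetical decoded view of a BAR; not a kernel structure. */
struct bar_info {
	bool is_io;
	bool is_64bit;
	bool prefetchable;
};

/* Kinds of bridge window a BAR can be placed in. */
enum window_kind {
	WINDOW_IO,     /* I/O window */
	WINDOW_MEM32,  /* non-prefetchable memory window, 32-bit only */
	WINDOW_PREF,   /* prefetchable memory window, may be 64-bit */
};

/* Decode the type bits from the low dword of a BAR. */
struct bar_info decode_bar(uint32_t lo)
{
	struct bar_info b = { false, false, false };

	if (lo & BAR_SPACE_IO) {
		b.is_io = true;
		return b;
	}
	b.is_64bit = (lo & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64;
	b.prefetchable = (lo & BAR_MEM_PREFETCH) != 0;
	return b;
}

/* Window the BAR would normally be assigned from, judged by type alone. */
enum window_kind natural_window(struct bar_info b)
{
	if (b.is_io)
		return WINDOW_IO;
	return b.prefetchable ? WINDOW_PREF : WINDOW_MEM32;
}
```

For the first BCM5709 function in [1] (base e6000000, 64-bit, non-prefetchable) the low dword would be 0xe6000004, and `natural_window()` returns `WINDOW_MEM32` even though `is_64bit` is set. That mismatch is what the proposed firmware placement into the prefetchable window works around.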

* Re: PCIe 32-bit MMIO exhaustion
From: Bjorn Helgaas @ 2015-01-29 15:23 UTC
To: Daniel J Blueman
Cc: Ingo Molnar, Jiang Liu, H Peter Anvin, Thomas Gleixner, Linux Kernel, Steffen Persvold, x86, Yinghai Lu

[+cc Yinghai]

Hi Daniel,

On Wed, Jan 28, 2015 at 2:42 AM, Daniel J Blueman <daniel@numascale.com> wrote:
> On systems with a large number of PCI devices, we're running out of
> 32-bit MMIO space, e.g. one quad-port NetXtreme-2 adapter takes 128MB of
> space [1].
>
> An erratum to the PCIe 2.1 spec provides guidance on limitations with
> 64-bit non-prefetchable BARs (since bridges have only 32-bit
> non-prefetchable ranges), stating that vendors can enable the
> prefetchable bit in BARs under certain circumstances to allow 64-bit
> allocation [2].
>
> The problem with that is that vendors can't know a priori which hosts
> their products will be in, so they can't just advertise prefetchable
> 64-bit BARs. What can be done is that system firmware can use the 64-bit
> prefetchable BAR in bridges, and assign a 64-bit non-prefetchable device
> BAR into that area, where it is safe to do so (following the guidance).
>
> At present, Linux denies such allocations [3] and disables the BARs. It
> seems a practical solution to allow them if the firmware believes it is
> safe.

This particular message ([3]):

> pci 0002:01:00.0: BAR 0: [mem size 0x00002000 64bit] conflicts with PCI Bus 0002:00 [mem 0x10020000000-0x10027ffffff pref]

is misleading at best and likely a symptom of a bug. We printed the
*size* of BAR 0, not an address, which means we haven't assigned space
for the BAR. That means it should not conflict with anything.

We already do revert to firmware assignments in some situations when
Linux can't figure out how to assign things itself. But apparently
not in *this* situation.

Without seeing the whole picture, it's hard for me to figure out
what's going on here. Could you open a bug report at
http://bugzilla.kernel.org (category drivers/PCI) and attach a
complete dmesg and "lspci -vv" output? Then we can look at what
firmware did and what Linux thought was wrong with it.

Bjorn
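
The point about the log printing a *size* rather than an address can be made concrete. As described in this thread, a resource that has no address assigned (IORESOURCE_UNSET) is printed as "[... size N]", while an assigned one is printed as a start-end range. The sketch below mimics that behaviour in user space. It is an illustration under stated assumptions, not the kernel's formatter: `struct res`, `format_res` and the `RES_*` flag values are hypothetical, and suffixes such as "64bit" or "pref" are left out.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative flag values for this sketch only. */
#define RES_IO    0x00000100u
#define RES_MEM   0x00000200u
#define RES_UNSET 0x20000000u  /* no address has been assigned */

/* Hypothetical stand-in for a resource; end is inclusive. */
struct res {
	uint64_t start;
	uint64_t end;
	unsigned int flags;
};

/*
 * Format a resource in the style of the log lines in this thread:
 * "[mem size 0x...]" when unassigned, "[io 0x....-0x....]" otherwise.
 * Returns what snprintf() returns.
 */
int format_res(char *buf, size_t len, const struct res *r)
{
	const char *type = (r->flags & RES_IO) ? "io" : "mem";
	int width = (r->flags & RES_IO) ? 4 : 8;  /* minimum hex digits */

	if (r->flags & RES_UNSET)
		return snprintf(buf, len, "[%s size 0x%0*" PRIx64 "]",
				type, width, r->end - r->start + 1);

	return snprintf(buf, len, "[%s 0x%0*" PRIx64 "-0x%0*" PRIx64 "]",
			type, width, r->start, width, r->end);
}
```

An unassigned 8KB memory BAR therefore comes out as "[mem size 0x00002000]", so a "conflicts with" message that shows only a size is reporting on a resource that has no address to conflict with.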

* Re: PCIe 32-bit MMIO exhaustion
From: Daniel J Blueman @ 2015-02-24 4:37 UTC
To: Bjorn Helgaas, Jiang Liu
Cc: Ingo Molnar, H Peter Anvin, Thomas Gleixner, Linux Kernel, Steffen Persvold, x86, Yinghai Lu

Hi Bjorn, Jiang,

On 29/01/2015 23:23, Bjorn Helgaas wrote:
[...]
> Without seeing the whole picture, it's hard for me to figure out
> what's going on here. Could you open a bug report at
> http://bugzilla.kernel.org (category drivers/PCI) and attach a
> complete dmesg and "lspci -vv" output? Then we can look at what
> firmware did and what Linux thought was wrong with it.

Done a while back:
https://bugzilla.kernel.org/show_bug.cgi?id=92671

An interesting question popped up: I find the kernel doesn't accept
IO BARs and bridge windows above address 0xffff, though the PCI spec
and modern hardware allow 32-bit decode.

Thus, for practical reasons, our NumaConnect firmware doesn't set up
IO BARs/windows beyond the first PCI domain (which is the only one
with legacy support, and no drivers seem to require their IO BARs
anyway), and we get conflicts and warnings [1]:

pnp 00:00: disabling [io 0x0061] because it overlaps 0001:05:00.0 BAR 0 [io 0x0000-0x00ff]
pci 0001:03:00.0: BAR 13: no space for [io size 0x1000]
pci 0001:03:00.0: BAR 13: failed to assign [io size 0x1000]

Is there a cleaner way of dealing with this, in our firmware and/or
the kernel? E.g., I guess if IO BARs aren't assigned (value 0) on PCI
domains without IO bridge windows in the ACPI AML, there is no need to
conflict/attempt assignment?

Many thanks!
  Daniel

[1] https://bugzilla.kernel.org/attachment.cgi?id=165831
--
Daniel J Blueman
Principal Software Engineer, Numascale

* Re: PCIe 32-bit MMIO exhaustion
From: Bjorn Helgaas @ 2015-03-03 22:38 UTC
To: Daniel J Blueman
Cc: Jiang Liu, Ingo Molnar, H Peter Anvin, Thomas Gleixner, Linux Kernel, Steffen Persvold, x86, Yinghai Lu, linux-pci, linux-acpi

[+cc linux-pci, linux-acpi]

On Tue, Feb 24, 2015 at 12:37:39PM +0800, Daniel J Blueman wrote:
[...]
> An interesting question popped up: I find the kernel doesn't accept
> IO BARs and bridge windows above address 0xffff, though the PCI spec
> and modern hardware allow 32-bit decode.
>
> Thus, for practical reasons, our NumaConnect firmware doesn't set up
> IO BARs/windows beyond the first PCI domain (which is the only one
> with legacy support, and no drivers seem to require their IO BARs
> anyway), ...

If we don't handle IO ports above 0xffff, I think that's broken. I'm
pretty sure we do handle that on ia64 (it's done by assigning 64K of IO
space to each host bridge, and I think it's typically translated by the
bridge so each root bus sees a 0-64K space on PCI). We should be able to
do something similar on x86, but it may not be implemented there yet.

> and we get conflicts and warnings [1]:
>
> pnp 00:00: disabling [io 0x0061] because it overlaps 0001:05:00.0 BAR 0 [io 0x0000-0x00ff]
> pci 0001:03:00.0: BAR 13: no space for [io size 0x1000]
> pci 0001:03:00.0: BAR 13: failed to assign [io size 0x1000]
>
> Is there a cleaner way of dealing with this, in our firmware and/or
> the kernel? E.g., I guess if IO BARs aren't assigned (value 0) on PCI
> domains without IO bridge windows in the ACPI AML, there is no need to
> conflict/attempt assignment?

Yes, we should be able to deal with this better.

The complaint about disabling the pnp 00:00 resource is bogus because the
PCI 0001:05:00.0 BAR is not assigned and should never be enabled, so this
is not a real conflict. My intent is that the PCI resource corresponding
to this BAR should have the IORESOURCE_UNSET bit set. That will prevent
pci_enable_resources() from setting the PCI_COMMAND_IO bit, which is what
would enable the BAR.

Can you try the patch below? I don't think it will work right off the bat
because I think the fact that we print "[io 0x0000-0x00ff]" instead of
"[io size 0x0100]" means we don't have IORESOURCE_UNSET set in the PCI
resource. But maybe you can figure out where it *should* be getting
set?

Bjorn


commit fd4888cf942a2ae9cdefc46d1fba86b2c7ec2dbf
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Tue Mar 3 16:13:56 2015 -0600

    PNP: Don't check for overlaps with unassigned PCI BARs

    After 0509ad5e1a7d ("PNP: disable PNP motherboard resources that overlap
    PCI BARs"), we disable and warn about PNP resources that overlap PCI BARs.
    But we assume that all PCI BARs are valid, which is incorrect, because a
    BAR may not have any space assigned to it. In that case, we will not
    enable the BAR, so no other resource can conflict with it.

    Ignore PCI BARs that are unassigned, as indicated by IORESOURCE_UNSET.

    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c
index ebf0d6710b5a..943c1cb9566c 100644
--- a/drivers/pnp/quirks.c
+++ b/drivers/pnp/quirks.c
@@ -246,13 +246,16 @@ static void quirk_system_pci_resources(struct pnp_dev *dev)
 	 */
 	for_each_pci_dev(pdev) {
 		for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
-			unsigned long type;
+			unsigned long flags, type;

-			type = pci_resource_flags(pdev, i) &
-					(IORESOURCE_IO | IORESOURCE_MEM);
+			flags = pci_resource_flags(pdev, i);
+			type = flags & (IORESOURCE_IO | IORESOURCE_MEM);
 			if (!type || pci_resource_len(pdev, i) == 0)
 				continue;

+			if (flags & IORESOURCE_UNSET)
+				continue;
+
 			pci_start = pci_resource_start(pdev, i);
 			pci_end = pci_resource_end(pdev, i);
 			for (j = 0;
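
The effect of the extra IORESOURCE_UNSET test in the patch above can be tried outside the kernel. What follows is a simplified, self-contained sketch of the idea, assuming a plain "any shared address is a conflict" rule. It does not reproduce the real quirk_system_pci_resources() logic, and `struct res`, `ranges_overlap`, `find_bar_conflict` and the `RES_*` values are hypothetical names chosen for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values for this sketch only. */
#define RES_IO    0x00000100u
#define RES_MEM   0x00000200u
#define RES_UNSET 0x20000000u  /* no address has been assigned */

/* Hypothetical stand-in for a resource; end is inclusive. */
struct res {
	uint64_t start;
	uint64_t end;
	unsigned int flags;
};

/* True if the two inclusive ranges share at least one address. */
bool ranges_overlap(const struct res *a, const struct res *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/*
 * Return the first BAR that a PNP resource collides with, or NULL.
 * BARs of a different type and unassigned BARs are skipped, since an
 * unassigned BAR is never enabled and so cannot really conflict.
 */
const struct res *find_bar_conflict(const struct res *pnp,
				    const struct res *bars, size_t nbars)
{
	unsigned int pnp_type = pnp->flags & (RES_IO | RES_MEM);

	for (size_t i = 0; i < nbars; i++) {
		const struct res *bar = &bars[i];
		unsigned int type = bar->flags & (RES_IO | RES_MEM);

		if (!type || type != pnp_type)
			continue;
		if (bar->flags & RES_UNSET)
			continue;
		if (ranges_overlap(pnp, bar))
			return bar;
	}
	return NULL;
}
```

With the values from the log, a PNP resource at [io 0x0061] is reported as conflicting with a BAR at [io 0x0000-0x00ff] only while that BAR lacks the unset flag. Once the flag is set, `find_bar_conflict()` returns NULL, which matches Bjorn's expectation that the patch helps only if the flag is actually set somewhere.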

* Re: PCIe 32-bit MMIO exhaustion
From: Daniel J Blueman @ 2015-03-04 7:12 UTC
To: Bjorn Helgaas
Cc: Jiang Liu, Ingo Molnar, H Peter Anvin, Thomas Gleixner, Linux Kernel, Steffen Persvold, x86, Yinghai Lu, linux-pci, linux-acpi

On 04/03/2015 06:38, Bjorn Helgaas wrote:
[...]
> Can you try the patch below? I don't think it will work right off the bat
> because I think the fact that we print "[io 0x0000-0x00ff]" instead of
> "[io size 0x0100]" means we don't have IORESOURCE_UNSET set in the PCI
> resource. But maybe you can figure out where it *should* be getting
> set?
[...]

Your patch solves the conflicts nicely [1] with:

From f835b16b0758a1dde6042a0e4c8aa5a2e8be5f21 Mon Sep 17 00:00:00 2001
From: Daniel J Blueman <daniel@numascale.com>
Date: Wed, 4 Mar 2015 14:53:00 +0800
Subject: [PATCH] Mark PCI BARs with address 0 as unset

Allow the kernel to activate the unset flag for PCI BAR resources if
the firmware assigns address 0 (invalid as legacy IO is in this range).

This allows preventing conflicts with legacy IO/ACPI PNP resources in
this range.

Signed-off-by: Daniel J Blueman <daniel@numascale.com>
---
 drivers/pci/probe.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 8d2f400..ef43652 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -281,6 +281,13 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
 	pcibios_resource_to_bus(dev->bus, &inverted_region, res);

 	/*
+	 * If firmware doesn't assign a valid PCI address (as legacy IO is below
+	 * PCI IO), mark resource unset to prevent later resource conflicts
+	 */
+	if (region.start == 0)
+		res->flags |= IORESOURCE_UNSET;
+
+	/*
 	 * If "A" is a BAR value (a bus address), "bus_to_resource(A)" is
 	 * the corresponding resource address (the physical address used by
 	 * the CPU. Converting that resource address back to a bus address

[1] https://resource.numascale.com/dmesg-4.0.0-rc2.txt
--
Daniel J Blueman
Principal Software Engineer, Numascale

* Re: PCIe 32-bit MMIO exhaustion
From: Bjorn Helgaas @ 2015-03-04 17:01 UTC
To: Daniel J Blueman
Cc: Jiang Liu, Ingo Molnar, H Peter Anvin, Thomas Gleixner, Linux Kernel, Steffen Persvold, x86, Yinghai Lu, linux-pci, linux-acpi

On Wed, Mar 04, 2015 at 03:12:04PM +0800, Daniel J Blueman wrote:
> Your patch solves the conflicts nicely [1] with:
>
> From f835b16b0758a1dde6042a0e4c8aa5a2e8be5f21 Mon Sep 17 00:00:00 2001
> From: Daniel J Blueman <daniel@numascale.com>
> Date: Wed, 4 Mar 2015 14:53:00 +0800
> Subject: [PATCH] Mark PCI BARs with address 0 as unset
>
> Allow the kernel to activate the unset flag for PCI BAR resources if
> the firmware assigns address 0 (invalid as legacy IO is in this range).
>
> This allows preventing conflicts with legacy IO/ACPI PNP resources in
> this range.
>
> Signed-off-by: Daniel J Blueman <daniel@numascale.com>
> ---
>  drivers/pci/probe.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 8d2f400..ef43652 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -281,6 +281,13 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
>  	pcibios_resource_to_bus(dev->bus, &inverted_region, res);
>
>  	/*
> +	 * If firmware doesn't assign a valid PCI address (as legacy IO is below
> +	 * PCI IO), mark resource unset to prevent later resource conflicts
> +	 */
> +	if (region.start == 0)
> +		res->flags |= IORESOURCE_UNSET;

It's true that an uninitialized BAR should contain zero. But an
initialized BAR may also contain zero, since zero is a valid PCI memory or
I/O address, so I don't really want to preclude that here. On large
systems with host bridges that support address translation, it would be
reasonable to have something like this:

  pci_bus 0001:00: root bus resource [mem 0x100000000-0x1ffffffff] (bus address [0x00000000-0xffffffff])

In that case, an initialized BAR may contain zero and that should not be an
error.

On your system, I don't think you advertise an I/O aperture to bus 0001:00.
I'd like to make the PCI core smart enough to notice that and just ignore
any I/O BARs on that bus.

There's an argument for doing this immediately, here inside
__pci_read_base(): we could look for an upstream window that contains the
BAR we're reading. I'd like to be able to do that someday, but I'm not
sure we have enough of the upstream topology set up to do that.

Can you try the patch below, which tries to do it a little later?

> +	/*
>  	 * If "A" is a BAR value (a bus address), "bus_to_resource(A)" is
>  	 * the corresponding resource address (the physical address used by
>  	 * the CPU. Converting that resource address back to a bus address
>
> [1] https://resource.numascale.com/dmesg-4.0.0-rc2.txt

This URL doesn't work for me.

Bjorn


commit 66c15b678466cb217f2615d4078d12a2ee4c99ac
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Wed Mar 4 10:47:35 2015 -0600

    PCI: Mark invalid BARs as unassigned

    If a BAR is not inside any upstream bridge window, or if it conflicts with
    another resource, mark it as IORESOURCE_UNSET so we don't try to use it.
    We may be able to assign a different address for it.

    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index b7c3a5ea1fca..232f9254c11a 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -120,6 +120,7 @@ int pci_claim_resource(struct pci_dev *dev, int resource)
 	if (!root) {
 		dev_info(&dev->dev, "can't claim BAR %d %pR: no compatible bridge window\n",
 			 resource, res);
+		res->flags |= IORESOURCE_UNSET;
 		return -EINVAL;
 	}

@@ -127,6 +128,7 @@ int pci_claim_resource(struct pci_dev *dev, int resource)
 	if (conflict) {
 		dev_info(&dev->dev, "can't claim BAR %d %pR: address conflict with %s %pR\n",
 			 resource, res, conflict->name, conflict);
+		res->flags |= IORESOURCE_UNSET;
 		return -EBUSY;
 	}
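
Two ideas from the message above can be sketched together: with host bridge address translation a BAR value of zero can be a perfectly valid bus address, and a BAR that lies inside no upstream window of its type is the one that should be marked unassigned. The code below is a user-space illustration under stated assumptions, not the kernel implementation: `struct window`, `bus_to_cpu`, `claim_bar` and the `RES_*` values are hypothetical, and only the "no compatible bridge window" case from the patch is modelled (address conflicts between resources are not).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values for this sketch only. */
#define RES_IO    0x00000100u
#define RES_MEM   0x00000200u
#define RES_UNSET 0x20000000u  /* no usable address is assigned */

/* Hypothetical stand-in for a resource in CPU address space; end is inclusive. */
struct res {
	uint64_t start;
	uint64_t end;
	unsigned int flags;
};

/* Host bridge aperture: CPU range plus offset (CPU address - bus address). */
struct window {
	struct res cpu;
	uint64_t offset;
};

/* Translate a BAR's bus-address range into CPU addresses through a window. */
struct res bus_to_cpu(const struct window *w, uint64_t bus_start,
		      uint64_t size, unsigned int flags)
{
	struct res r;

	r.start = bus_start + w->offset;
	r.end = r.start + size - 1;
	r.flags = flags;
	return r;
}

/*
 * Try to claim a BAR: succeed if some window of the same type contains it.
 * Otherwise mark it unassigned so nothing later treats it as a live range.
 */
bool claim_bar(struct res *bar, const struct window *wins, size_t nwins)
{
	unsigned int type = bar->flags & (RES_IO | RES_MEM);

	for (size_t i = 0; i < nwins; i++) {
		const struct res *w = &wins[i].cpu;

		if ((w->flags & type) &&
		    bar->start >= w->start && bar->end <= w->end)
			return true;
	}
	bar->flags |= RES_UNSET;
	return false;
}
```

Using the root bus resource from the example ([mem 0x100000000-0x1ffffffff], bus address 0x00000000-0xffffffff), a memory BAR holding bus address 0 translates to CPU address 0x100000000 and is claimed normally. An I/O BAR at 0x0000-0x00ff on a bus that advertises no I/O aperture fails the claim and ends up flagged as unset, which is the outcome the thread is after without treating address zero as special.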
* Re: PCIe 32-bit MMIO exhaustion
  2015-03-04 17:01 ` Bjorn Helgaas
@ 2015-03-19 15:04 ` Bjorn Helgaas
  0 siblings, 0 replies; 7+ messages in thread
From: Bjorn Helgaas @ 2015-03-19 15:04 UTC
  To: Daniel J Blueman
  Cc: Jiang Liu, Ingo Molnar, H Peter Anvin, Thomas Gleixner,
      Linux Kernel, Steffen Persvold, x86, Yinghai Lu, linux-pci,
      linux-acpi

On Wed, Mar 04, 2015 at 11:01:59AM -0600, Bjorn Helgaas wrote:
> On Wed, Mar 04, 2015 at 03:12:04PM +0800, Daniel J Blueman wrote:
> > Your patch solves the conflicts nicely [1] with:
> >
> > From f835b16b0758a1dde6042a0e4c8aa5a2e8be5f21 Mon Sep 17 00:00:00 2001
> > From: Daniel J Blueman <daniel@numascale.com>
> > Date: Wed, 4 Mar 2015 14:53:00 +0800
> > Subject: [PATCH] Mark PCI BARs with address 0 as unset
> >
> > Allow the kernel to activate the unset flag for PCI BAR resources if
> > the firmware assigns address 0 (invalid as legacy IO is in this range).
> >
> > This allows preventing conflicts with legacy IO/ACPI PNP resources in
> > this range.
> >
> > Signed-off-by: Daniel J Blueman <daniel@numascale.com>
> > ---
> >  drivers/pci/probe.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index 8d2f400..ef43652 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -281,6 +281,13 @@ int __pci_read_base(struct pci_dev *dev, enum
> > pci_bar_type type,
> >  	pcibios_resource_to_bus(dev->bus, &inverted_region, res);
> >
> >  	/*
> > +	 * If firmware doesn't assign a valid PCI address (as legacy IO is below
> > +	 * PCI IO), mark resource unset to prevent later resource conflicts
> > +	 */
> > +	if (region.start == 0)
> > +		res->flags |= IORESOURCE_UNSET;
>
> It's true that an uninitialized BAR should contain zero.  But an
> initialized BAR may also contain zero, since zero is a valid PCI memory or
> I/O address, so I don't really want to preclude that here.  On large
> systems with host bridges that support address translation, it would be
> reasonable to have something like this:
>
>   pci_bus 0001:00: root bus resource [mem 0x100000000-0x1ffffffff] (bus address [0x00000000-0xffffffff])
>
> In that case, an initialized BAR may contain zero and that should not be an
> error.
>
> On your system, I don't think you advertise an I/O aperture to bus 0001:00.
> I'd like to make the PCI core smart enough to notice that and just ignore
> any I/O BARs on that bus.
>
> There's an argument for doing this immediately, here inside
> __pci_read_base(): we could look for an upstream window that contains the
> BAR we're reading.  I'd like to be able to do that someday, but I'm not
> sure we have enough of the upstream topology set up to do that.
>
> Can you try the patch below, which tries to do it a little later?
>
> > +	/*
> >  	 * If "A" is a BAR value (a bus address), "bus_to_resource(A)" is
> >  	 * the corresponding resource address (the physical address used by
> >  	 * the CPU.  Converting that resource address back to a bus address
> >
> > [1] https://resource.numascale.com/dmesg-4.0.0-rc2.txt
>
> This URL doesn't work for me.

Ping?

> commit 66c15b678466cb217f2615d4078d12a2ee4c99ac
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Wed Mar 4 10:47:35 2015 -0600
>
>     PCI: Mark invalid BARs as unassigned
>
>     If a BAR is not inside any upstream bridge window, or if it conflicts with
>     another resource, mark it as IORESOURCE_UNSET so we don't try to use it.
>     We may be able to assign a different address for it.
>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>
> diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
> index b7c3a5ea1fca..232f9254c11a 100644
> --- a/drivers/pci/setup-res.c
> +++ b/drivers/pci/setup-res.c
> @@ -120,6 +120,7 @@ int pci_claim_resource(struct pci_dev *dev, int resource)
>  	if (!root) {
>  		dev_info(&dev->dev, "can't claim BAR %d %pR: no compatible bridge window\n",
>  			 resource, res);
> +		res->flags |= IORESOURCE_UNSET;
>  		return -EINVAL;
>  	}
>
> @@ -127,6 +128,7 @@ int pci_claim_resource(struct pci_dev *dev, int resource)
>  	if (conflict) {
>  		dev_info(&dev->dev, "can't claim BAR %d %pR: address conflict with %s %pR\n",
>  			 resource, res, conflict->name, conflict);
> +		res->flags |= IORESOURCE_UNSET;
>  		return -EBUSY;
>  	}
>
end of thread, other threads:[~2015-03-19 15:04 UTC | newest]

Thread overview: 7+ messages
2015-01-28  8:42 PCIe 32-bit MMIO exhaustion Daniel J Blueman
2015-01-29 15:23 ` Bjorn Helgaas
2015-02-24  4:37   ` Daniel J Blueman
2015-03-03 22:38     ` Bjorn Helgaas
2015-03-04  7:12       ` Daniel J Blueman
2015-03-04 17:01         ` Bjorn Helgaas
2015-03-19 15:04           ` Bjorn Helgaas