linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5] PCI: Fix disabling of bridge BARs when assigning bus resources
@ 2020-01-08 21:32 Logan Gunthorpe
  2020-01-13 20:07 ` Bjorn Helgaas
  0 siblings, 1 reply; 4+ messages in thread
From: Logan Gunthorpe @ 2020-01-08 21:32 UTC (permalink / raw)
  To: linux-kernel, linux-pci, Bjorn Helgaas
  Cc: Kit Chow, Benjamin Herrenschmidt, Nicholas Johnson,
	Mika Westerberg, Logan Gunthorpe

One odd quirk of PLX switches is that their upstream bridge port has
256K of space allocated behind its BAR0 (most other bridge
implementations do not report any BAR space). The lspci for such  device
looks like:

  04:00.0 PCI bridge: PLX Technology, Inc. PEX 8724 24-Lane, 6-Port PCI
            Express Gen 3 (8 GT/s) Switch, 19 x 19mm FCBGA (rev ca)
	    (prog-if 00 [Normal decode])
      Physical Slot: 1
      Flags: bus master, fast devsel, latency 0, IRQ 30, NUMA node 0
      Memory at 90a00000 (32-bit, non-prefetchable) [size=256K]
      Bus: primary=04, secondary=05, subordinate=0a, sec-latency=0
      I/O behind bridge: 00002000-00003fff
      Memory behind bridge: 90000000-909fffff
      Prefetchable memory behind bridge: 0000380000800000-0000380000bfffff
      Kernel driver in use: pcieport

It's not clear what the purpose of the memory at 0x90a00000 is, and
currently the kernel never actually uses it for anything. In most cases,
it's safely ignored and does not cause a problem.

However, when the kernel assigns the resource addresses (with the
pci=realloc command line parameter, for example) it can inadvertently
disable the struct resource corresponding to the BAR. When this happens,
lspci will report this memory as ignored:

   Region 0: Memory at <ignored> (32-bit, non-prefetchable) [size=256K]

This is because the kernel reports a zero start address and zero flags
in the corresponding sysfs resource file and in /proc/bus/pci/devices.
Investigation with 'lspci -x', however shows the BIOS-assigned address
will still be programmed in the device's BAR registers.

It's clearly a bug that the kernel's view of the registers differs from
what's actually programmed in the BAR, but in most cases, this still
won't result in a visible issue because nothing uses the memory,
so nothing is affected. However, a big problem shows up when an IOMMU
is in use: the IOMMU will not reserve this space in the IOVA because the
kernel no longer thinks the range is valid. (See
dmar_init_reserved_ranges() for the Intel implementation of this.)

Without the proper reserved range, we have a situation where a DMA
mapping may occasionally allocate an IOVA which the PCI bus will actually
route to a BAR in the PLX switch. This will result in some random DMA
writes not actually writing to the RAM they are supposed to, or random
DMA reads returning all FFs from the PLX BAR when it's supposed to have
read from RAM.

The problem is caused in pci_assign_unassigned_root_bus_resources().
When any resource from a bridge device fails to get assigned, the code
sets the resource's flags to zero. This makes sense for bridge resources,
as they will be re-enabled later, but for regular BARs, it disables them
permanently.

The code in question seems to intend to check if "dev->subordinate" is
zero to determine whether a device is a bridge, however this is not
likely valid as there might be a bridge without a subordinate bus due to
running out of bus numbers or other cases.

To fix these issues we instead check that the idx is in the
PCI_BRIDGE_RESOURCES range which are only used for bridge windows and
thus is sufficient for the "dev->subordinate" check and will also
prevent the bug above from clobbering PLX devices' regular BARs.

The bug was caused in pci_assign_unassigned_root_bus_resources() but the
same pattern is in pci_assign_unassigned_bridge_resources() so we
changed the code for consistency in both places.

Reported-by: Kit Chow <kchow@gigaio.com>
Fixes: da7822e5ad71 ("PCI: update bridge resources to get more big ranges when allocating space (again)")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/setup-bus.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)


v5 fixes a bunch of nits in the commit message and makes the same change to
the same pattern in both pci_assign_unassigned_root_bus_resources() and
pci_assign_unassigned_bridge_resources().

The patch is based on v5.5-rc5 and a git branch is available here:

https://github.com/sbates130272/linux-p2pmem pci_realloc_v5

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index f279826204eb..416cb625395e 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1803,11 +1803,15 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
 	/* Restore size and flags */
 	list_for_each_entry(fail_res, &fail_head, list) {
 		struct resource *res = fail_res->res;
+		int idx;

 		res->start = fail_res->start;
 		res->end = fail_res->end;
 		res->flags = fail_res->flags;
-		if (fail_res->dev->subordinate)
+
+		idx = res - &fail_res->dev->resource[0];
+		if (idx >= PCI_BRIDGE_RESOURCES &&
+		    idx <= PCI_BRIDGE_RESOURCE_END)
 			res->flags = 0;
 	}
 	free_list(&fail_head);
@@ -2055,11 +2059,15 @@ void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
 	/* Restore size and flags */
 	list_for_each_entry(fail_res, &fail_head, list) {
 		struct resource *res = fail_res->res;
+		int idx;

 		res->start = fail_res->start;
 		res->end = fail_res->end;
 		res->flags = fail_res->flags;
-		if (fail_res->dev->subordinate)
+
+		idx = res - &fail_res->dev->resource[0];
+		if (idx >= PCI_BRIDGE_RESOURCES &&
+		    idx <= PCI_BRIDGE_RESOURCE_END)
 			res->flags = 0;
 	}
 	free_list(&fail_head);
--
2.20.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v5] PCI: Fix disabling of bridge BARs when assigning bus resources
  2020-01-08 21:32 [PATCH v5] PCI: Fix disabling of bridge BARs when assigning bus resources Logan Gunthorpe
@ 2020-01-13 20:07 ` Bjorn Helgaas
  2020-01-14 18:07   ` Logan Gunthorpe
  0 siblings, 1 reply; 4+ messages in thread
From: Bjorn Helgaas @ 2020-01-13 20:07 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-pci, Kit Chow, Benjamin Herrenschmidt,
	Nicholas Johnson, Mika Westerberg

On Wed, Jan 08, 2020 at 02:32:08PM -0700, Logan Gunthorpe wrote:
> One odd quirk of PLX switches is that their upstream bridge port has
> 256K of space allocated behind its BAR0 (most other bridge
> implementations do not report any BAR space). The lspci for such  device
> looks like:
> 
>   04:00.0 PCI bridge: PLX Technology, Inc. PEX 8724 24-Lane, 6-Port PCI
>             Express Gen 3 (8 GT/s) Switch, 19 x 19mm FCBGA (rev ca)
> 	    (prog-if 00 [Normal decode])
>       Physical Slot: 1
>       Flags: bus master, fast devsel, latency 0, IRQ 30, NUMA node 0
>       Memory at 90a00000 (32-bit, non-prefetchable) [size=256K]
>       Bus: primary=04, secondary=05, subordinate=0a, sec-latency=0
>       I/O behind bridge: 00002000-00003fff
>       Memory behind bridge: 90000000-909fffff
>       Prefetchable memory behind bridge: 0000380000800000-0000380000bfffff
>       Kernel driver in use: pcieport
> 
> It's not clear what the purpose of the memory at 0x90a00000 is, and
> currently the kernel never actually uses it for anything. In most cases,
> it's safely ignored and does not cause a problem.
> 
> However, when the kernel assigns the resource addresses (with the
> pci=realloc command line parameter, for example) it can inadvertently
> disable the struct resource corresponding to the BAR. When this happens,
> lspci will report this memory as ignored:
> 
>    Region 0: Memory at <ignored> (32-bit, non-prefetchable) [size=256K]
> 
> This is because the kernel reports a zero start address and zero flags
> in the corresponding sysfs resource file and in /proc/bus/pci/devices.
> Investigation with 'lspci -x', however shows the BIOS-assigned address
> will still be programmed in the device's BAR registers.
> 
> It's clearly a bug that the kernel's view of the registers differs from
> what's actually programmed in the BAR, but in most cases, this still
> won't result in a visible issue because nothing uses the memory,
> so nothing is affected. However, a big problem shows up when an IOMMU
> is in use: the IOMMU will not reserve this space in the IOVA because the
> kernel no longer thinks the range is valid. (See
> dmar_init_reserved_ranges() for the Intel implementation of this.)
> 
> Without the proper reserved range, we have a situation where a DMA
> mapping may occasionally allocate an IOVA which the PCI bus will actually
> route to a BAR in the PLX switch. This will result in some random DMA
> writes not actually writing to the RAM they are supposed to, or random
> DMA reads returning all FFs from the PLX BAR when it's supposed to have
> read from RAM.
> 
> The problem is caused in pci_assign_unassigned_root_bus_resources().
> When any resource from a bridge device fails to get assigned, the code
> sets the resource's flags to zero. This makes sense for bridge resources,
> as they will be re-enabled later, but for regular BARs, it disables them
> permanently.
> 
> The code in question seems to intend to check if "dev->subordinate" is
> zero to determine whether a device is a bridge, however this is not
> likely valid as there might be a bridge without a subordinate bus due to
> running out of bus numbers or other cases.
> 
> To fix these issues we instead check that the idx is in the
> PCI_BRIDGE_RESOURCES range which are only used for bridge windows and
> thus is sufficient for the "dev->subordinate" check and will also
> prevent the bug above from clobbering PLX devices' regular BARs.
> 
> The bug was caused in pci_assign_unassigned_root_bus_resources() but the
> same pattern is in pci_assign_unassigned_bridge_resources() so we
> changed the code for consistency in both places.
> 
> Reported-by: Kit Chow <kchow@gigaio.com>
> Fixes: da7822e5ad71 ("PCI: update bridge resources to get more big ranges when allocating space (again)")
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>

Applied to pci/resource for v5.6, thanks!

I added a check of pci_is_bridge() as another hint that this is really
a bridge-specific issue.  Let me know if that breaks anything.

commit 9db8dc6d0785 ("PCI: Don't disable bridge BARs when assigning bus resources")
Author: Logan Gunthorpe <logang@deltatee.com>
Date:   Wed Jan 8 14:32:08 2020 -0700

    PCI: Don't disable bridge BARs when assigning bus resources
    
    Some PCI bridges implement BARs in addition to bridge windows.  For
    example, here's a PLX switch:
    
      04:00.0 PCI bridge: PLX Technology, Inc. PEX 8724 24-Lane, 6-Port PCI
                Express Gen 3 (8 GT/s) Switch, 19 x 19mm FCBGA (rev ca)
    	    (prog-if 00 [Normal decode])
          Flags: bus master, fast devsel, latency 0, IRQ 30, NUMA node 0
          Memory at 90a00000 (32-bit, non-prefetchable) [size=256K]
          Bus: primary=04, secondary=05, subordinate=0a, sec-latency=0
          I/O behind bridge: 00002000-00003fff
          Memory behind bridge: 90000000-909fffff
          Prefetchable memory behind bridge: 0000380000800000-0000380000bfffff
    
    Previously, when the kernel assigned resource addresses (with the
    pci=realloc command line parameter, for example) it could clear the struct
    resource corresponding to the BAR.  When this happened, lspci would report
    this BAR as "ignored":
    
       Region 0: Memory at <ignored> (32-bit, non-prefetchable) [size=256K]
    
    This is because the kernel reports a zero start address and zero flags
    in the corresponding sysfs resource file and in /proc/bus/pci/devices.
    Investigation with 'lspci -x', however, shows the BIOS-assigned address
    will still be programmed in the device's BAR registers.
    
    It's clearly a bug that the kernel lost track of the BAR value, but in most
    cases, this still won't result in a visible issue because nothing uses the
    memory, so nothing is affected.  However, when an IOMMU is in use, it will
    not reserve this space in the IOVA because the kernel no longer thinks the
    range is valid.  (See dmar_init_reserved_ranges() for the Intel
    implementation of this.)
    
    Without the proper reserved range, a DMA mapping may allocate an IOVA that
    matches a bridge BAR, which results in DMA accesses going to the BAR
    instead of the intended RAM.
    
    The problem was in pci_assign_unassigned_root_bus_resources().  When any
    resource from a bridge device fails to get assigned, the code set the
    resource's flags to zero.  This makes sense for bridge windows, as they
    will be re-enabled later, but for regular BARs, it makes the kernel
    permanently lose track of the fact that they decode address space.
    
    Change pci_assign_unassigned_root_bus_resources() and
    pci_assign_unassigned_bridge_resources() so they only clear "res->flags"
    for bridge *windows*, not bridge BARs.
    
    Fixes: da7822e5ad71 ("PCI: update bridge resources to get more big ranges when allocating space (again)")
    Link: https://lore.kernel.org/r/20200108213208.4612-1-logang@deltatee.com
    [bhelgaas: commit log, check for pci_is_bridge()]
    Reported-by: Kit Chow <kchow@gigaio.com>
    Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index f279826204eb..591161ce0f51 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1803,12 +1803,18 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
 	/* Restore size and flags */
 	list_for_each_entry(fail_res, &fail_head, list) {
 		struct resource *res = fail_res->res;
+		int idx;
 
 		res->start = fail_res->start;
 		res->end = fail_res->end;
 		res->flags = fail_res->flags;
-		if (fail_res->dev->subordinate)
-			res->flags = 0;
+
+		if (pci_is_bridge(fail_res->dev)) {
+			idx = res - &fail_res->dev->resource[0];
+			if (idx >= PCI_BRIDGE_RESOURCES &&
+			    idx <= PCI_BRIDGE_RESOURCE_END)
+				res->flags = 0;
+		}
 	}
 	free_list(&fail_head);
 
@@ -2055,12 +2061,18 @@ void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
 	/* Restore size and flags */
 	list_for_each_entry(fail_res, &fail_head, list) {
 		struct resource *res = fail_res->res;
+		int idx;
 
 		res->start = fail_res->start;
 		res->end = fail_res->end;
 		res->flags = fail_res->flags;
-		if (fail_res->dev->subordinate)
-			res->flags = 0;
+
+		if (pci_is_bridge(fail_res->dev)) {
+			idx = res - &fail_res->dev->resource[0];
+			if (idx >= PCI_BRIDGE_RESOURCES &&
+			    idx <= PCI_BRIDGE_RESOURCE_END)
+				res->flags = 0;
+		}
 	}
 	free_list(&fail_head);
 

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v5] PCI: Fix disabling of bridge BARs when assigning bus resources
  2020-01-13 20:07 ` Bjorn Helgaas
@ 2020-01-14 18:07   ` Logan Gunthorpe
  2020-01-14 19:06     ` Bjorn Helgaas
  0 siblings, 1 reply; 4+ messages in thread
From: Logan Gunthorpe @ 2020-01-14 18:07 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel, linux-pci, Kit Chow, Benjamin Herrenschmidt,
	Nicholas Johnson, Mika Westerberg



On 2020-01-13 1:07 p.m., Bjorn Helgaas wrote:
> Applied to pci/resource for v5.6, thanks!
> 
> I added a check of pci_is_bridge() as another hint that this is really
> a bridge-specific issue.  Let me know if that breaks anything.
> 
> commit 9db8dc6d0785 ("PCI: Don't disable bridge BARs when assigning bus resources")
> Author: Logan Gunthorpe <logang@deltatee.com>
> Date:   Wed Jan 8 14:32:08 2020 -0700

Thanks! I was going to test the pci/resource branch, but I haven't seen
the patch in your repo yet... Current head is

86f98025a075 ("PCI: Allow extend_bridge_window() to shrink resource if
necessary")

Logan

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v5] PCI: Fix disabling of bridge BARs when assigning bus resources
  2020-01-14 18:07   ` Logan Gunthorpe
@ 2020-01-14 19:06     ` Bjorn Helgaas
  0 siblings, 0 replies; 4+ messages in thread
From: Bjorn Helgaas @ 2020-01-14 19:06 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-pci, Kit Chow, Benjamin Herrenschmidt,
	Nicholas Johnson, Mika Westerberg

On Tue, Jan 14, 2020 at 11:07:12AM -0700, Logan Gunthorpe wrote:
> On 2020-01-13 1:07 p.m., Bjorn Helgaas wrote:
> > Applied to pci/resource for v5.6, thanks!
> > 
> > I added a check of pci_is_bridge() as another hint that this is really
> > a bridge-specific issue.  Let me know if that breaks anything.
> > 
> > commit 9db8dc6d0785 ("PCI: Don't disable bridge BARs when assigning bus resources")
> > Author: Logan Gunthorpe <logang@deltatee.com>
> > Date:   Wed Jan 8 14:32:08 2020 -0700
> 
> Thanks! I was going to test the pci/resource branch, but I haven't seen
> the patch in your repo yet... Current head is
> 
> 86f98025a075 ("PCI: Allow extend_bridge_window() to shrink resource if
> necessary")

Pushed.  I'm anticipating some tweaks to Nicholas' patches on that
branch, so I put yours at the base.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-01-14 19:06 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-08 21:32 [PATCH v5] PCI: Fix disabling of bridge BARs when assigning bus resources Logan Gunthorpe
2020-01-13 20:07 ` Bjorn Helgaas
2020-01-14 18:07   ` Logan Gunthorpe
2020-01-14 19:06     ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).