linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2] Fix a pair of setup bus bugs
@ 2019-05-31 17:12 Logan Gunthorpe
  2019-05-31 17:12 ` [PATCH v3 1/2] PCI: Prevent 64-bit resources from being counted in 32-bit bridge region Logan Gunthorpe
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Logan Gunthorpe @ 2019-05-31 17:12 UTC (permalink / raw)
  To: linux-kernel, linux-pci, Bjorn Helgaas
  Cc: Kit Chow, Yinghai Lu, Logan Gunthorpe

Hey,

This is another resend to get some more attention. Nothing has changed
since v2.

For the first patch, there's a lot more information in the original
thread here[1] including instructions on how to reproduce it in QEMU.

The second patch fixes an unrelated bug, with similar symptoms, in
the same code. It was a lot easier to debug and the reasoning should
hopefully be easier to follow, but I don't think it was reviewed much
during the first posting due to the nightmare in the first patch.

Thanks,

Logan

[1] https://lore.kernel.org/lkml/de3e34d8-2ac3-e89b-30f1-a18826ce5d7d@deltatee.com/T/#m96ba95de4678146ed46b602e8bfd6ac08a588fa2

--

Changes in v3:

* Rebased onto v5.2-rc2 (no changes)

Changes in v2:

* Rebased onto v5.1-rc6 (no changes)
* Reworked the commit message in the first commit to try and explain
  it better.

--

Logan Gunthorpe (2):
  PCI: Prevent 64-bit resources from being counted in 32-bit bridge
    region
  PCI: Fix disabling of bridge BARs when assigning bus resources

 drivers/pci/setup-bus.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

--
2.20.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v3 1/2] PCI: Prevent 64-bit resources from being counted in 32-bit bridge region
  2019-05-31 17:12 [PATCH v3 0/2] Fix a pair of setup bus bugs Logan Gunthorpe
@ 2019-05-31 17:12 ` Logan Gunthorpe
  2019-05-31 17:12 ` [PATCH v3 2/2] PCI: Fix disabling of bridge BARs when assigning bus resources Logan Gunthorpe
  2019-06-17 13:49 ` [PATCH v3 0/2] Fix a pair of setup bus bugs Bjorn Helgaas
  2 siblings, 0 replies; 7+ messages in thread
From: Logan Gunthorpe @ 2019-05-31 17:12 UTC (permalink / raw)
  To: linux-kernel, linux-pci, Bjorn Helgaas
  Cc: Kit Chow, Yinghai Lu, Logan Gunthorpe

In some situations (described below), hierarchies of 32-bit resources can
fail to be assigned when the kernel has to attempt to assign a large
64-bit resource. When this happens, lspci will report
some PCI BAR resources as 'ignored' and some PCI Bridge windows
being left unset. Sample lspci lines may look like:

  Memory behind bridge: fff00000-000fffff

or

  Region 0: Memory at <ignored> (32-bit, non-prefetchable) [size=256K]

lspci reports a BAR as 'ignored' when the kernel does not populate
the struct resource and the corresponding entry in either
/proc/bus/pci/devices or /sys/bus/pci/devices/.../resource are all zero.
Any device driver that depends on one of these BARs are likely to fail
initializing and the device will not be usable. Typically when this
happens, the underlying Base Address Registers in the configuration
space are still set to whatever the firmware set them to, it's only
the kernel's view of this that is wrong.

The possible situations where this can happen will be a bit varied and
depend highly on the exact hierarchy, what the firmware has assigned
and what the kernel must do to properly re-assign resources. In the
setup that first hit this bug, it failed only with the 'pci=realloc'
command line parameter. The bug has also been hackily reproduced with
QEMU[1] without the realloc parameter.

The following things are required to hit this bug:

1) A large 64-bit prefetchable BAR that can't be assigned in any
   pass of pci_assign_unassigned_bridge_resources(). The resource must
   be large enough that it will not be able to fit with-in the 32-bit
   region. This resource may or may not be assignable into the 64-bit
   prefetchable region after additional passes.

2) A victim 32-bit non-prefetchable BAR that is a neighbor of the
   large BAR (so typically it will have to be behind a switch). When
   the bug is hit, this BAR's struct resource will not be assign and
   lspci will report it as ignored.

3) There must exist a 64-bit prefetchable window for the original large
   BAR to fit in. Which generally implies there is no 32-bit
   prefetchable window.

4) The kernel has to have a reason to re-assign the heirarchy that
   contains both BARs.

The cause of this bug is in __pci_bus_size_bridges() which tries to
calculate the total resource space required for each of the bridge windows
(typically IO, 64-bit, and 32-bit / non-prefetchable). The code, as
written, tries to allocate all the 64-bit prefetchable resources
followed by all the remaining resources. It uses three calls to
pbus_size_mem() for this:

  1) If bridge has a 64-bit prefetchable window, find the size of all
     64-bit prefetchable resources below the bridge

  2) If bridge has no 64-bit prefetchable window, find the size
     of all prefetchable resources below the bridge

  3) Find the size of everything else (non-prefetchable resources plus
     any prefetchable ones that couldn't be accommodated above)

By the requirement (3) above, the system has a 64-bit prefetchable
window, so the large 64-bit BAR *should* be assigned to the 64-bit
prefetchable region. However, if the 64-bit bus resource has already
been assigned, then this call to pbus_size_mem() will fail. (See
the find_free_bus_resource() helper). When the first call fails, it falls
to the second call but, by requirement (3) above, there is no 32-bit
prefetchable window so this call also fails. Thus, it falls to the last
call which tries to fit all the resources into the 32-bit
catch-all window. However, because of requirement (1), the large
BAR will overfill this region and cause the victim 32-bit BAR to not
be assignable.

Looking at the first call to pbus_size_mem(): there are only two reasons
for it to fail: if there is no 64-bit/prefetchable bridge window, or if that
window is already assigned. We know the former case can't be true because,
in __pci_bus_size_bridges(), its existence is checked before making the call.
So if the pbus_size_mem() call in question fails, the window must already
be assigned, and in this case, the code should not try to assign
64-bit resources into the 32-bit catch-all window.

Thus, the fix for the bug is to ensure mask, type2 and type3 are set in
cases where a 64-bit resource exists even if pbus_size_mem() fails. Once
we do this, the large BAR resource will never be attempted to be
assigned to the 32-bit catch-all window and the victim BAR will still
be correctly assigned.

[1] https://lore.kernel.org/lkml/de3e34d8-2ac3-e89b-30f1-a18826ce5d7d@deltatee.com/T/#u

Reported-by: Kit Chow <kchow@gigaio.com>
Fixes: 5b28541552ef ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index ec44a0f3a7ac..0eb40924169b 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1228,21 +1228,20 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 		prefmask = IORESOURCE_MEM | IORESOURCE_PREFETCH;
 		if (b_res[2].flags & IORESOURCE_MEM_64) {
 			prefmask |= IORESOURCE_MEM_64;
-			ret = pbus_size_mem(bus, prefmask, prefmask,
+			pbus_size_mem(bus, prefmask, prefmask,
 				  prefmask, prefmask,
 				  realloc_head ? 0 : additional_mem_size,
 				  additional_mem_size, realloc_head);
 
 			/*
-			 * If successful, all non-prefetchable resources
-			 * and any 32-bit prefetchable resources will go in
-			 * the non-prefetchable window.
+			 * Given the existence of a 64-bit resource for this
+			 * bus, all non-prefetchable resources and any 32-bit
+			 * prefetchable resources will go in the
+			 * non-prefetchable window.
 			 */
-			if (ret == 0) {
-				mask = prefmask;
-				type2 = prefmask & ~IORESOURCE_MEM_64;
-				type3 = prefmask & ~IORESOURCE_PREFETCH;
-			}
+			mask = prefmask;
+			type2 = prefmask & ~IORESOURCE_MEM_64;
+			type3 = prefmask & ~IORESOURCE_PREFETCH;
 		}
 
 		/*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v3 2/2] PCI: Fix disabling of bridge BARs when assigning bus resources
  2019-05-31 17:12 [PATCH v3 0/2] Fix a pair of setup bus bugs Logan Gunthorpe
  2019-05-31 17:12 ` [PATCH v3 1/2] PCI: Prevent 64-bit resources from being counted in 32-bit bridge region Logan Gunthorpe
@ 2019-05-31 17:12 ` Logan Gunthorpe
  2019-06-17 13:53   ` Bjorn Helgaas
  2019-06-17 13:49 ` [PATCH v3 0/2] Fix a pair of setup bus bugs Bjorn Helgaas
  2 siblings, 1 reply; 7+ messages in thread
From: Logan Gunthorpe @ 2019-05-31 17:12 UTC (permalink / raw)
  To: linux-kernel, linux-pci, Bjorn Helgaas
  Cc: Kit Chow, Yinghai Lu, Logan Gunthorpe

One odd quirk of PLX switches is that their upstream bridge port has
256K of space allocated behind its BAR0 (most other bridge
implementations do not report any BAR space). The lspci for such  device
looks like:

  04:00.0 PCI bridge: PLX Technology, Inc. PEX 8724 24-Lane, 6-Port PCI
            Express Gen 3 (8 GT/s) Switch, 19 x 19mm FCBGA (rev ca)
	    (prog-if 00 [Normal decode])
      Physical Slot: 1
      Flags: bus master, fast devsel, latency 0, IRQ 30, NUMA node 0
      Memory at 90a00000 (32-bit, non-prefetchable) [size=256K]
      Bus: primary=04, secondary=05, subordinate=0a, sec-latency=0
      I/O behind bridge: 00002000-00003fff
      Memory behind bridge: 90000000-909fffff
      Prefetchable memory behind bridge: 0000380000800000-0000380000bfffff
      Kernel driver in use: pcieport

It's not clear what the purpose of the memory at 0x90a00000 is, and
currently the kernel never actually uses it for anything. In most cases,
it's safely ignored and does not cause a problem.

However, when the kernel assigns the resource addresses (with the
pci=realloc command line parameter, for example) it can inadvertently
disable the struct resource corresponding to the bar. When this happens,
lspci will report this memory as ignored:

   Region 0: Memory at <ignored> (32-bit, non-prefetchable) [size=256K]

This is because the kernel reports a zero start address and zero flags
in the corresponding sysfs resource file and in /proc/bus/pci/devices.
Investigation with 'lspci -x', however shows the bios-assigned address
will still be programmed in the device's BAR registers.

In many cases, this still isn't a problem. Nothing uses the memory,
so nothing is affected. However, a big problem shows up when an IOMMU
is in use: the IOMMU will not reserve this space in the IOVA because the
kernel no longer thinks the range is valid. (See
dmar_init_reserved_ranges() for the Intel implementation of this.)

Without the proper reserved range, we have a situation where a DMA
mapping may occasionally allocate an IOVA which the PCI bus will actually
route to a BAR in the PLX switch. This will result in some random DMA
writes not actually writing to the RAM they are supposed to, or random
DMA reads returning all FFs from the PLX BAR when it's supposed to have
read from RAM.

The problem is caused in pci_assign_unassigned_root_bus_resources().
When any resource from a bridge device fails to get assigned, the code
sets the resource's flags to zero. This makes sense for bridge resources,
as they will be re-enabled later, but for regular BARs, it disables them
permanently. To fix the problem, we only set the flags to zero for
bridge resources and treat any other resources like non-bridge devices.

Reported-by: Kit Chow <kchow@gigaio.com>
Fixes: da7822e5ad71 ("PCI: update bridge resources to get more big ranges when allocating space (again)")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
---
 drivers/pci/setup-bus.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 0eb40924169b..7adbd4bedd16 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1784,11 +1784,16 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
 	/* restore size and flags */
 	list_for_each_entry(fail_res, &fail_head, list) {
 		struct resource *res = fail_res->res;
+		int idx;
 
 		res->start = fail_res->start;
 		res->end = fail_res->end;
 		res->flags = fail_res->flags;
-		if (fail_res->dev->subordinate)
+
+		idx = res - &fail_res->dev->resource[0];
+		if (fail_res->dev->subordinate &&
+		    idx >= PCI_BRIDGE_RESOURCES &&
+		    idx <= PCI_BRIDGE_RESOURCE_END)
 			res->flags = 0;
 	}
 	free_list(&fail_head);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 0/2] Fix a pair of setup bus bugs
  2019-05-31 17:12 [PATCH v3 0/2] Fix a pair of setup bus bugs Logan Gunthorpe
  2019-05-31 17:12 ` [PATCH v3 1/2] PCI: Prevent 64-bit resources from being counted in 32-bit bridge region Logan Gunthorpe
  2019-05-31 17:12 ` [PATCH v3 2/2] PCI: Fix disabling of bridge BARs when assigning bus resources Logan Gunthorpe
@ 2019-06-17 13:49 ` Bjorn Helgaas
  2 siblings, 0 replies; 7+ messages in thread
From: Bjorn Helgaas @ 2019-06-17 13:49 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-pci, Kit Chow, Yinghai Lu,
	Benjamin Herrenschmidt, Nicholas Johnson

[+cc Ben, Nicholas]

On Fri, May 31, 2019 at 11:12:14AM -0600, Logan Gunthorpe wrote:
> Hey,
> 
> This is another resend to get some more attention. Nothing has changed
> since v2.
> 
> For the first patch, there's a lot more information in the original
> thread here[1] including instructions on how to reproduce it in QEMU.
> 
> The second patch fixes an unrelated bug, with similar symptoms, in
> the same code. It was a lot easier to debug and the reasoning should
> hopefully be easier to follow, but I don't think it was reviewed much
> during the first posting due to the nightmare in the first patch.
> 
> Thanks,
> 
> Logan
> 
> [1] https://lore.kernel.org/lkml/de3e34d8-2ac3-e89b-30f1-a18826ce5d7d@deltatee.com/T/#m96ba95de4678146ed46b602e8bfd6ac08a588fa2
> 
> --
> 
> Changes in v3:
> 
> * Rebased onto v5.2-rc2 (no changes)
> 
> Changes in v2:
> 
> * Rebased onto v5.1-rc6 (no changes)
> * Reworked the commit message in the first commit to try and explain
>   it better.
> 
> --
> 
> Logan Gunthorpe (2):
>   PCI: Prevent 64-bit resources from being counted in 32-bit bridge
>     region
>   PCI: Fix disabling of bridge BARs when assigning bus resources
> 
>  drivers/pci/setup-bus.c | 24 ++++++++++++++----------
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> --
> 2.20.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 2/2] PCI: Fix disabling of bridge BARs when assigning bus resources
  2019-05-31 17:12 ` [PATCH v3 2/2] PCI: Fix disabling of bridge BARs when assigning bus resources Logan Gunthorpe
@ 2019-06-17 13:53   ` Bjorn Helgaas
  2019-06-17 17:32     ` Logan Gunthorpe
  2019-06-18  5:27     ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 7+ messages in thread
From: Bjorn Helgaas @ 2019-06-17 13:53 UTC (permalink / raw)
  To: Logan Gunthorpe; +Cc: linux-kernel, linux-pci, Kit Chow, Yinghai Lu

On Fri, May 31, 2019 at 11:12:16AM -0600, Logan Gunthorpe wrote:
> One odd quirk of PLX switches is that their upstream bridge port has
> 256K of space allocated behind its BAR0 (most other bridge
> implementations do not report any BAR space).

Somewhat unusual, but completely legal, of course.

If a bridge has memory BARs, AFAIK it is impossible to enable a memory
window without also enabling the BARs, so if we want to use the bridge
at all, we *must* allocate space for its BARs, just like for any other
device.

> The lspci for such  device
> looks like:
> 
>   04:00.0 PCI bridge: PLX Technology, Inc. PEX 8724 24-Lane, 6-Port PCI
>             Express Gen 3 (8 GT/s) Switch, 19 x 19mm FCBGA (rev ca)
> 	    (prog-if 00 [Normal decode])
>       Physical Slot: 1
>       Flags: bus master, fast devsel, latency 0, IRQ 30, NUMA node 0
>       Memory at 90a00000 (32-bit, non-prefetchable) [size=256K]
>       Bus: primary=04, secondary=05, subordinate=0a, sec-latency=0
>       I/O behind bridge: 00002000-00003fff
>       Memory behind bridge: 90000000-909fffff
>       Prefetchable memory behind bridge: 0000380000800000-0000380000bfffff
>       Kernel driver in use: pcieport
> 
> It's not clear what the purpose of the memory at 0x90a00000 is, and
> currently the kernel never actually uses it for anything. In most cases,
> it's safely ignored and does not cause a problem.
> 
> However, when the kernel assigns the resource addresses (with the
> pci=realloc command line parameter, for example) it can inadvertently
> disable the struct resource corresponding to the bar. When this happens,
> lspci will report this memory as ignored:
> 
>    Region 0: Memory at <ignored> (32-bit, non-prefetchable) [size=256K]
> 
> This is because the kernel reports a zero start address and zero flags
> in the corresponding sysfs resource file and in /proc/bus/pci/devices.
> Investigation with 'lspci -x', however shows the bios-assigned address
> will still be programmed in the device's BAR registers.

Ugh, yep.  It took me a while to trace through this, but "lspci -v" by
default shows the kernel view of addresses, i.e., the pdev->resource[]
values, which it gets from the sysfs "resource" file (resource_show())
or "/proc/bus/pci/devices" (show_device()).

But "lspci -x" shows the config space values (I think "lspci -bv"
should also show these) from the sysfs "config" file
(pci_read_config()).

It's definitely a kernel bug that we lost track of what's in config
space.

> In many cases, this still isn't a problem. Nothing uses the memory,
> so nothing is affected. However, a big problem shows up when an IOMMU
> is in use: the IOMMU will not reserve this space in the IOVA because the
> kernel no longer thinks the range is valid. (See
> dmar_init_reserved_ranges() for the Intel implementation of this.)
> 
> Without the proper reserved range, we have a situation where a DMA
> mapping may occasionally allocate an IOVA which the PCI bus will actually
> route to a BAR in the PLX switch. This will result in some random DMA
> writes not actually writing to the RAM they are supposed to, or random
> DMA reads returning all FFs from the PLX BAR when it's supposed to have
> read from RAM.
> 
> The problem is caused in pci_assign_unassigned_root_bus_resources().
> When any resource from a bridge device fails to get assigned, the code
> sets the resource's flags to zero. This makes sense for bridge resources,
> as they will be re-enabled later, but for regular BARs, it disables them
> permanently. To fix the problem, we only set the flags to zero for
> bridge resources and treat any other resources like non-bridge devices.
> 
> Reported-by: Kit Chow <kchow@gigaio.com>
> Fixes: da7822e5ad71 ("PCI: update bridge resources to get more big ranges when allocating space (again)")
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Yinghai Lu <yinghai@kernel.org>
> ---
>  drivers/pci/setup-bus.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 0eb40924169b..7adbd4bedd16 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1784,11 +1784,16 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
>  	/* restore size and flags */
>  	list_for_each_entry(fail_res, &fail_head, list) {
>  		struct resource *res = fail_res->res;
> +		int idx;
>  
>  		res->start = fail_res->start;
>  		res->end = fail_res->end;
>  		res->flags = fail_res->flags;
> -		if (fail_res->dev->subordinate)
> +
> +		idx = res - &fail_res->dev->resource[0];
> +		if (fail_res->dev->subordinate &&
> +		    idx >= PCI_BRIDGE_RESOURCES &&
> +		    idx <= PCI_BRIDGE_RESOURCE_END)
>  			res->flags = 0;

In my ideal world we wouldn't zap the flags of any resource.  I think
we should derive the flags from the device's config space *once*
during enumeration and remember them for the life of the device.

This patch preserves res->flags for bridge BARs just like for any
other device, so I think this is definitely a step in the right
direction.

I'm not sure the "dev->subordinate" test is really correct, though.
I think the original intent of this code was to clear res->flags for
bridge windows under the assumptions that (a) we can identify bridges
by "dev->subordinate" being non-zero, and (b) bridges only have
windows and didn't have BARs.

This patch fixes assumption (b), but I think (a) is false, and we
should fix it as well.  One can imagine a bridge device without a
subordinate bus (maybe we ran out of bus numbers), so I don't think we
should test dev->subordinate.

We could test something like pci_is_bridge(), although testing for idx
being in the PCI_BRIDGE_RESOURCES range should be sufficient because I
don't think we use those resource for anything other than windows.

>  	}
>  	free_list(&fail_head);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 2/2] PCI: Fix disabling of bridge BARs when assigning bus resources
  2019-06-17 13:53   ` Bjorn Helgaas
@ 2019-06-17 17:32     ` Logan Gunthorpe
  2019-06-18  5:27     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 7+ messages in thread
From: Logan Gunthorpe @ 2019-06-17 17:32 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-kernel, linux-pci, Kit Chow, Yinghai Lu



On 2019-06-17 7:53 a.m., Bjorn Helgaas wrote:
>> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
>> index 0eb40924169b..7adbd4bedd16 100644
>> --- a/drivers/pci/setup-bus.c
>> +++ b/drivers/pci/setup-bus.c
>> @@ -1784,11 +1784,16 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
>>  	/* restore size and flags */
>>  	list_for_each_entry(fail_res, &fail_head, list) {
>>  		struct resource *res = fail_res->res;
>> +		int idx;
>>  
>>  		res->start = fail_res->start;
>>  		res->end = fail_res->end;
>>  		res->flags = fail_res->flags;
>> -		if (fail_res->dev->subordinate)
>> +
>> +		idx = res - &fail_res->dev->resource[0];
>> +		if (fail_res->dev->subordinate &&
>> +		    idx >= PCI_BRIDGE_RESOURCES &&
>> +		    idx <= PCI_BRIDGE_RESOURCE_END)
>>  			res->flags = 0;
> 
> In my ideal world we wouldn't zap the flags of any resource.  I think
> we should derive the flags from the device's config space *once*
> during enumeration and remember them for the life of the device.

Yes, I agree. The fact that this code seems to be constantly modifying
everything makes it difficult to follow. When it clears the flags like
this it's not clear if/where/how it will ever put them back.

> This patch preserves res->flags for bridge BARs just like for any
> other device, so I think this is definitely a step in the right
> direction.
> 
> I'm not sure the "dev->subordinate" test is really correct, though.
> I think the original intent of this code was to clear res->flags for
> bridge windows under the assumptions that (a) we can identify bridges
> by "dev->subordinate" being non-zero, and (b) bridges only have
> windows and didn't have BARs.

Yes, I was also unsure of the reasoning behind the dev->subordinate test
as well. But given that I didn't fully understand it, and it wasn't
itself causing any problems, I elected to just change around it only for
the bug I was trying to fix.

> This patch fixes assumption (b), but I think (a) is false, and we
> should fix it as well.  One can imagine a bridge device without a
> subordinate bus (maybe we ran out of bus numbers), so I don't think we
> should test dev->subordinate.
>
> We could test something like pci_is_bridge(), although testing for idx
> being in the PCI_BRIDGE_RESOURCES range should be sufficient because I
> don't think we use those resource for anything other than windows.

Ok, yes, there are a couple possibilities here and I'm unsure of the
best thing to do. I agree that, right now, testing the idx for the range
is probably sufficient. So logically we could probably just remove the
dev->subordinate test. Assuming nobody decides to reuse the bridge
indices for something else (which is probably a safe assumption).
Though, testing for pci_is_bridge() would definitely be an improvement
in terms of readability and the issues you point out.

One way or another I can add a third patch to do this next time I submit
this series.

Thanks,

Logan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 2/2] PCI: Fix disabling of bridge BARs when assigning bus resources
  2019-06-17 13:53   ` Bjorn Helgaas
  2019-06-17 17:32     ` Logan Gunthorpe
@ 2019-06-18  5:27     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-18  5:27 UTC (permalink / raw)
  To: Bjorn Helgaas, Logan Gunthorpe
  Cc: linux-kernel, linux-pci, Kit Chow, Yinghai Lu

On Mon, 2019-06-17 at 08:53 -0500, Bjorn Helgaas wrote:
> On Fri, May 31, 2019 at 11:12:16AM -0600, Logan Gunthorpe wrote:
> > One odd quirk of PLX switches is that their upstream bridge port has
> > 256K of space allocated behind its BAR0 (most other bridge
> > implementations do not report any BAR space).
> 
> Somewhat unusual, but completely legal, of course.

Ah yes, I've seen these. They have an MMIO path to their internal
registers in addition to cfg. Can be annoying.

> If a bridge has memory BARs, AFAIK it is impossible to enable a memory
> window without also enabling the BARs, so if we want to use the bridge
> at all, we *must* allocate space for its BARs, just like for any other
> device.

Right.

 .../... (agreeing violently)

> In my ideal world we wouldn't zap the flags of any resource.  I think
> we should derive the flags from the device's config space *once*
> during enumeration and remember them for the life of the device.

Amen brother. It will take a little while to get there. One thing we
should do is have a clearer way to mark a resource that failed to
assign/allocate (though technically parent=NULL is it really, as long
as all archs these days claim properly, it used not to be the case).

We do wipe *bridge* windows (nor BARs) all over the place, that is less
of an issue I suppose though I would be more comfortable if we also
wrote to the bridge to close those windows as we do so...

The problem of course is how much old weird quirky will break due to
subtle assumptions as we "fix" these things :-)

> This patch preserves res->flags for bridge BARs just like for any
> other device, so I think this is definitely a step in the right
> direction.
> 
> I'm not sure the "dev->subordinate" test is really correct, though.

Right, shouldn't it be pci_is_bridge() ?

> I think the original intent of this code was to clear res->flags for
> bridge windows under the assumptions that (a) we can identify bridges
> by "dev->subordinate" being non-zero, and (b) bridges only have
> windows and didn't have BARs.
> 
> This patch fixes assumption (b), but I think (a) is false, and we
> should fix it as well.  One can imagine a bridge device without a
> subordinate bus (maybe we ran out of bus numbers), so I don't think we
> should test dev->subordinate.

Yup.

> We could test something like pci_is_bridge(), although testing for idx
> being in the PCI_BRIDGE_RESOURCES range should be sufficient because I
> don't think we use those resource for anything other than windows.

Yeah quite possibly.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-06-18  7:02 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-31 17:12 [PATCH v3 0/2] Fix a pair of setup bus bugs Logan Gunthorpe
2019-05-31 17:12 ` [PATCH v3 1/2] PCI: Prevent 64-bit resources from being counted in 32-bit bridge region Logan Gunthorpe
2019-05-31 17:12 ` [PATCH v3 2/2] PCI: Fix disabling of bridge BARs when assigning bus resources Logan Gunthorpe
2019-06-17 13:53   ` Bjorn Helgaas
2019-06-17 17:32     ` Logan Gunthorpe
2019-06-18  5:27     ` Benjamin Herrenschmidt
2019-06-17 13:49 ` [PATCH v3 0/2] Fix a pair of setup bus bugs Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).