All of lore.kernel.org
 help / color / mirror / Atom feed
* Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
@ 2014-12-03 22:15 Bjorn Helgaas
  2014-12-04  0:51 ` Bjorn Helgaas
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Bjorn Helgaas @ 2014-12-03 22:15 UTC (permalink / raw)
  To: linux-pci, zermond, David Airlie, Yinghai Lu, saifi.khan,
	Alex Deucher, kordikmarek

There's a bugzilla for this, so this email is to make the bugzilla
more discoverable and to have a more public discussion about it.

  https://bugzilla.kernel.org/show_bug.cgi?id=85491

I think this regression is caused by 5b28541552ef ("PCI: Restrict
64-bit prefetchable bridge windows to 64-bit resources").  These dmesg
fragments are from v3.16, which doesn't work:

    pci_bus 0000:00: root bus resource [mem 0xc0000000-0xffffffff]

    pci 0000:00:01.0: PCI bridge to [bus 01]
    pci 0000:00:01.0:   bridge window [io  0x8000-0xafff]
    pci 0000:00:01.0:   bridge window [mem 0xfdf00000-0xfdffffff]
    pci 0000:00:01.0:   bridge window [mem 0xbdf00000-0xddefffff 64bit
pref]  <-- outside host bridge window

    pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xcfffffff pref]
    pci 0000:01:00.0: reg 0x14: [io  0xa000-0xa0ff]
    pci 0000:01:00.0: reg 0x18: [mem 0xfdff0000-0xfdffffff]
    pci 0000:01:00.0: reg 0x30: [mem 0xfdfc0000-0xfdfdffff pref]

    pci 0000:00:01.0: can't claim BAR 15 [mem 0xbdf00000-0xddefffff
64bit pref]: no compatible bridge window
    pci 0000:01:00.0: can't claim BAR 0 [mem 0xc0000000-0xcfffffff
pref]: no compatible bridge window

    pci 0000:00:01.0: bridge window [mem 0x00100000-0x000fffff 64bit
pref] to [bus 01] add_size 200000
    pci 0000:00:01.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref]
get_res_add_size add_size 200000  <-- size == 0
    pci 0000:00:01.0: BAR 15: assigned [mem 0xc0000000-0xc01fffff 64bit pref]
    pci 0000:00:01.0: PCI bridge to [bus 01]
    pci 0000:00:01.0:   bridge window [io  0x8000-0xafff]
    pci 0000:00:01.0:   bridge window [mem 0xfdf00000-0xfdffffff]
    pci 0000:00:01.0:   bridge window [mem 0xc0000000-0xc01fffff 64bit pref]

The initial 00:01.0 64bit pref window is interesting because it
doesn't fit in the host bridge window.  Probably a BIOS bug, but we
should be able to deal with it.  In v3.15 (which doesn't contain
5b28541552ef), we assign [mem 0xc0000000-0xcfffffff pref] to that
window and the radeon works.  (There are some error messages related
to this assignment, so the dmesg *looks* like it failed, but I think
it must have worked.)

In v3.16 (which does contain 5b28541552ef), when pbus_size_mem() sizes
the 64-bit prefetchable window, it only looks for downstream 64-bit
resources.  Since the radeon at 01:00.0 has none, we size the window
to 0 + 0x200000 (the 0x200000 part is pci_hotplug_mem_size, to
accommodate future hot-added devices).

But that's not big enough to hold the radeon BAR 0 [mem
0xc0000000-0xcfffffff pref], so BAR 0 remains unassigned, so
pci_enable_device() should fail, and radeon doesn't work.

I don't know what the best fix is, but I think it's probably too
aggressive to *never* use a 64-bit prefetchable window for downstream
32-bit prefetchable resources.  This configuration from BIOS should
just work without us changing anything (although we probably should
trim the window to start at 0xc0000000, which would still work).

In this case, I think we certainly want the radeon BAR 0 in the
prefetchable window.  In general, it seems wrong that a device with
32-bit BARs will work fine below a bridge with a 32-bit prefetchable
window, but it won't work (or will work slower) below a bridge with a
64-bit prefetchable window.  A more capable bridge should make things
work *better*, not worse.

Bjorn

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-03 22:15 Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init Bjorn Helgaas
@ 2014-12-04  0:51 ` Bjorn Helgaas
  2014-12-04  1:44   ` Yinghai Lu
  2014-12-04  1:38 ` Yinghai Lu
  2014-12-04  2:01 ` Yinghai Lu
  2 siblings, 1 reply; 37+ messages in thread
From: Bjorn Helgaas @ 2014-12-04  0:51 UTC (permalink / raw)
  To: linux-pci, zermond, David Airlie, Yinghai Lu, saifi.khan,
	Alex Deucher, kordikmarek
  Cc: Guo Chao, Benjamin Herrenschmidt

[+cc Guo, Ben]

On Wed, Dec 3, 2014 at 3:15 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> There's a bugzilla for this, so this email is to make the bugzilla
> more discoverable and to have a more public discussion about it.
>
>   https://bugzilla.kernel.org/show_bug.cgi?id=85491
>
> I think this regression is caused by 5b28541552ef ("PCI: Restrict
> 64-bit prefetchable bridge windows to 64-bit resources").
> ...

> In v3.16 (which does contain 5b28541552ef), when pbus_size_mem() sizes
> the 64-bit prefetchable window, it only looks for downstream 64-bit
> resources.  Since the radeon at 01:00.0 has none, we size the window
> to 0 + 0x200000 (the 0x200000 part is pci_hotplug_mem_size, to
> accommodate future hot-added devices).
> ...

> I don't know what the best fix is, but I think it's probably too
> aggressive to *never* use a 64-bit prefetchable window for downstream
> 32-bit prefetchable resources.  This configuration from BIOS should
> just work without us changing anything (although we probably should
> trim the window to start at 0xc0000000, which would still work).
>
> In this case, I think we certainly want the radeon BAR 0 in the
> prefetchable window.  In general, it seems wrong that a device with
> 32-bit BARs will work fine below a bridge with a 32-bit prefetchable
> window, but it won't work (or will work slower) below a bridge with a
> 64-bit prefetchable window.  A more capable bridge should make things
> work *better*, not worse.

5b28541552ef was added to fix https://bugzilla.kernel.org/show_bug.cgi?id=74151

In that case, there was a bridge with a 64-bit prefetchable window,
and downstream devices had 64-bit prefetchable BARs as well as
prefetchable 32-bit ROMs:

    pci 0006:00:00.0: PCI bridge to [bus 01]
    pci 0006:00:00.0:   bridge window [mem 0x3d24800000000-0x3d248007fffff]
    pci 0006:00:00.0:   bridge window [mem
0x3d24008000000-0x3d2400fffffff 64bit pref]

    pci 0006:01:00.0: reg 0x10: [mem 0x3d2400e000000-0x3d2400e007fff 64bit pref]
    pci 0006:01:00.0: reg 0x20: [mem 0x3d2400e030000-0x3d2400e030fff 64bit pref]
    pci 0006:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]

The problem was that we used to put all the prefetchable BARs in the
prefetchable window, which meant we had to put the window below 4GB
because the ROM BAR is only 32 bits.  But there wasn't enough space
below 4GB for all the other 64-bit BARs.

I think another way to fix the bug 74151 problem would have been to
put the ROMs in the non-prefetchable window and the 64-bit BARs in the
64-bit prefetchable window.

Bjorn

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-03 22:15 Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init Bjorn Helgaas
  2014-12-04  0:51 ` Bjorn Helgaas
@ 2014-12-04  1:38 ` Yinghai Lu
  2014-12-04  1:41   ` Yinghai Lu
  2014-12-04  2:01 ` Yinghai Lu
  2 siblings, 1 reply; 37+ messages in thread
From: Yinghai Lu @ 2014-12-04  1:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, zermond, David Airlie, saifi.khan, Alex Deucher, kordikmarek

On Wed, Dec 3, 2014 at 2:15 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> There's a bugzilla for this, so this email is to make the bugzilla
> more discoverable and to have a more public discussion about it.
>
>   https://bugzilla.kernel.org/show_bug.cgi?id=85491
>
> I think this regression is caused by 5b28541552ef ("PCI: Restrict
> 64-bit prefetchable bridge windows to 64-bit resources").  These dmesg
> fragments are from v3.16, which doesn't work:
>
>     pci_bus 0000:00: root bus resource [mem 0xc0000000-0xffffffff]
>
>     pci 0000:00:01.0: PCI bridge to [bus 01]
>     pci 0000:00:01.0:   bridge window [io  0x8000-0xafff]
>     pci 0000:00:01.0:   bridge window [mem 0xfdf00000-0xfdffffff]
>     pci 0000:00:01.0:   bridge window [mem 0xbdf00000-0xddefffff 64bit
> pref]  <-- outside host bridge window
>
>     pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xcfffffff pref]
>     pci 0000:01:00.0: reg 0x14: [io  0xa000-0xa0ff]
>     pci 0000:01:00.0: reg 0x18: [mem 0xfdff0000-0xfdffffff]
>     pci 0000:01:00.0: reg 0x30: [mem 0xfdfc0000-0xfdfdffff pref]
>
>     pci 0000:00:01.0: can't claim BAR 15 [mem 0xbdf00000-0xddefffff
> 64bit pref]: no compatible bridge window
>     pci 0000:01:00.0: can't claim BAR 0 [mem 0xc0000000-0xcfffffff
> pref]: no compatible bridge window

Looks like we wrong type checking from parent resource searching.

Yinghai

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-04  1:38 ` Yinghai Lu
@ 2014-12-04  1:41   ` Yinghai Lu
  2014-12-04 16:15     ` Bjorn Helgaas
  0 siblings, 1 reply; 37+ messages in thread
From: Yinghai Lu @ 2014-12-04  1:41 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, zermond, David Airlie, saifi.khan, Alex Deucher, kordikmarek

On Wed, Dec 3, 2014 at 5:38 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Wed, Dec 3, 2014 at 2:15 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> There's a bugzilla for this, so this email is to make the bugzilla
>> more discoverable and to have a more public discussion about it.
>>
>>   https://bugzilla.kernel.org/show_bug.cgi?id=85491
>>
>> I think this regression is caused by 5b28541552ef ("PCI: Restrict
>> 64-bit prefetchable bridge windows to 64-bit resources").  These dmesg
>> fragments are from v3.16, which doesn't work:
>>
>>     pci_bus 0000:00: root bus resource [mem 0xc0000000-0xffffffff]
>>
>>     pci 0000:00:01.0: PCI bridge to [bus 01]
>>     pci 0000:00:01.0:   bridge window [io  0x8000-0xafff]
>>     pci 0000:00:01.0:   bridge window [mem 0xfdf00000-0xfdffffff]
>>     pci 0000:00:01.0:   bridge window [mem 0xbdf00000-0xddefffff 64bit
>> pref]  <-- outside host bridge window
>>
>>     pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xcfffffff pref]
>>     pci 0000:01:00.0: reg 0x14: [io  0xa000-0xa0ff]
>>     pci 0000:01:00.0: reg 0x18: [mem 0xfdff0000-0xfdffffff]
>>     pci 0000:01:00.0: reg 0x30: [mem 0xfdfc0000-0xfdfdffff pref]
>>
>>     pci 0000:00:01.0: can't claim BAR 15 [mem 0xbdf00000-0xddefffff
>> 64bit pref]: no compatible bridge window
>>     pci 0000:01:00.0: can't claim BAR 0 [mem 0xc0000000-0xcfffffff
>> pref]: no compatible bridge window
>
> Looks like we wrong type checking from parent resource searching.

oh, no, the BIOS is crazy...

[    0.158601] pci 0000:00:01.0: address space collision: [mem
0xbdf00000-0xddefffff 64bit pref] conflicts with System RAM [mem
0x00100000-0xbff9ffff]

mmio is overlapping with ram range....

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-04  0:51 ` Bjorn Helgaas
@ 2014-12-04  1:44   ` Yinghai Lu
  2014-12-04 17:34     ` Bjorn Helgaas
  0 siblings, 1 reply; 37+ messages in thread
From: Yinghai Lu @ 2014-12-04  1:44 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, zermond, David Airlie, saifi.khan, Alex Deucher,
	kordikmarek, Guo Chao, Benjamin Herrenschmidt

On Wed, Dec 3, 2014 at 4:51 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> I think another way to fix the bug 74151 problem would have been to
> put the ROMs in the non-prefetchable window and the 64-bit BARs in the
> 64-bit prefetchable window.

I thought we did that way already.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-03 22:15 Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init Bjorn Helgaas
  2014-12-04  0:51 ` Bjorn Helgaas
  2014-12-04  1:38 ` Yinghai Lu
@ 2014-12-04  2:01 ` Yinghai Lu
  2014-12-04 16:37   ` Bjorn Helgaas
  2 siblings, 1 reply; 37+ messages in thread
From: Yinghai Lu @ 2014-12-04  2:01 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Alexey Voronkov, David Airlie, Saifi Khan,
	Alex Deucher, Marek Kordík

On Wed, Dec 3, 2014 at 2:15 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> In v3.16 (which does contain 5b28541552ef), when pbus_size_mem() sizes
> the 64-bit prefetchable window, it only looks for downstream 64-bit
> resources.  Since the radeon at 01:00.0 has none, we size the window
> to 0 + 0x200000 (the 0x200000 part is pci_hotplug_mem_size, to
> accommodate future hot-added devices).
>
> But that's not big enough to hold the radeon BAR 0 [mem
> 0xc0000000-0xcfffffff pref], so BAR 0 remains unassigned, so
> pci_enable_device() should fail, and radeon doesn't work.
>
> I don't know what the best fix is, but I think it's probably too
> aggressive to *never* use a 64-bit prefetchable window for downstream
> 32-bit prefetchable resources.  This configuration from BIOS should
> just work without us changing anything (although we probably should
> trim the window to start at 0xc0000000, which would still work).

pci=realloc should workaround the problem, as it would take control of
the bridge mmio BAR, and allocate more range for mmio pref under it.

Yinghai

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-04  1:41   ` Yinghai Lu
@ 2014-12-04 16:15     ` Bjorn Helgaas
  2014-12-04 22:15       ` Yinghai Lu
  0 siblings, 1 reply; 37+ messages in thread
From: Bjorn Helgaas @ 2014-12-04 16:15 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-pci, zermond, David Airlie, saifi.khan, Alex Deucher, kordikmarek

On Wed, Dec 03, 2014 at 05:41:32PM -0800, Yinghai Lu wrote:
> On Wed, Dec 3, 2014 at 5:38 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Wed, Dec 3, 2014 at 2:15 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> >> There's a bugzilla for this, so this email is to make the bugzilla
> >> more discoverable and to have a more public discussion about it.
> >>
> >>   https://bugzilla.kernel.org/show_bug.cgi?id=85491
> >>
> >> I think this regression is caused by 5b28541552ef ("PCI: Restrict
> >> 64-bit prefetchable bridge windows to 64-bit resources").  These dmesg
> >> fragments are from v3.16, which doesn't work:
> >>
> >>     pci_bus 0000:00: root bus resource [mem 0xc0000000-0xffffffff]
> >>
> >>     pci 0000:00:01.0: PCI bridge to [bus 01]
> >>     pci 0000:00:01.0:   bridge window [io  0x8000-0xafff]
> >>     pci 0000:00:01.0:   bridge window [mem 0xfdf00000-0xfdffffff]
> >>     pci 0000:00:01.0:   bridge window [mem 0xbdf00000-0xddefffff 64bit
> >> pref]  <-- outside host bridge window
> >>
> >>     pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xcfffffff pref]
> >>     pci 0000:01:00.0: reg 0x14: [io  0xa000-0xa0ff]
> >>     pci 0000:01:00.0: reg 0x18: [mem 0xfdff0000-0xfdffffff]
> >>     pci 0000:01:00.0: reg 0x30: [mem 0xfdfc0000-0xfdfdffff pref]
> >>
> >>     pci 0000:00:01.0: can't claim BAR 15 [mem 0xbdf00000-0xddefffff
> >> 64bit pref]: no compatible bridge window
> >>     pci 0000:01:00.0: can't claim BAR 0 [mem 0xc0000000-0xcfffffff
> >> pref]: no compatible bridge window
> >
> > Looks like we wrong type checking from parent resource searching.

I don't understand what you're trying to say, but I don't think the "no
compatible bridge window" problem is related to resource types.  I think
it's because the 00:01.0 window at [mem 0xbdf00000-0xddefffff] starts
before the [mem 0xc0000000-0xffffffff] host bridge window.

> oh, no, the BIOS is crazy...
> 
> [    0.158601] pci 0000:00:01.0: address space collision: [mem
> 0xbdf00000-0xddefffff 64bit pref] conflicts with System RAM [mem
> 0x00100000-0xbff9ffff]
> 
> mmio is overlapping with ram range....

I agree, it certainly looks like the BIOS programmed the 00:01.0 window
incorrectly.  But I think we should just clip it so it starts at 0xc0000000
instead of 0xbdf00000.

There's no reason we should depend on correct PCI configuration from the
BIOS.

Bjorn

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-04  2:01 ` Yinghai Lu
@ 2014-12-04 16:37   ` Bjorn Helgaas
  2014-12-04 22:24     ` Yinghai Lu
  0 siblings, 1 reply; 37+ messages in thread
From: Bjorn Helgaas @ 2014-12-04 16:37 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-pci, Alexey Voronkov, David Airlie, Saifi Khan,
	Alex Deucher, Marek Kordík

On Wed, Dec 03, 2014 at 06:01:33PM -0800, Yinghai Lu wrote:
> On Wed, Dec 3, 2014 at 2:15 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > In v3.16 (which does contain 5b28541552ef), when pbus_size_mem() sizes
> > the 64-bit prefetchable window, it only looks for downstream 64-bit
> > resources.  Since the radeon at 01:00.0 has none, we size the window
> > to 0 + 0x200000 (the 0x200000 part is pci_hotplug_mem_size, to
> > accommodate future hot-added devices).
> >
> > But that's not big enough to hold the radeon BAR 0 [mem
> > 0xc0000000-0xcfffffff pref], so BAR 0 remains unassigned, so
> > pci_enable_device() should fail, and radeon doesn't work.
> >
> > I don't know what the best fix is, but I think it's probably too
> > aggressive to *never* use a 64-bit prefetchable window for downstream
> > 32-bit prefetchable resources.  This configuration from BIOS should
> > just work without us changing anything (although we probably should
> > trim the window to start at 0xc0000000, which would still work).
> 
> pci=realloc should workaround the problem, as it would take control of
> the bridge mmio BAR, and allocate more range for mmio pref under it.

I don't think pci=realloc is the answer because

  1) It's a poor user experience.  We should be able to handle this
     automatically, without asking the user to do anything.

  2) It puts the radeon framebuffer in the non-prefetchable window, and we
     should be able to keep it in the prefetchable window.

Bjorn

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-04  1:44   ` Yinghai Lu
@ 2014-12-04 17:34     ` Bjorn Helgaas
  2014-12-04 22:06       ` Yinghai Lu
  0 siblings, 1 reply; 37+ messages in thread
From: Bjorn Helgaas @ 2014-12-04 17:34 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-pci, zermond, David Airlie, saifi.khan, Alex Deucher,
	kordikmarek, Guo Chao, Benjamin Herrenschmidt

On Wed, Dec 03, 2014 at 05:44:26PM -0800, Yinghai Lu wrote:
> On Wed, Dec 3, 2014 at 4:51 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > I think another way to fix the bug 74151 problem would have been to
> > put the ROMs in the non-prefetchable window and the 64-bit BARs in the
> > 64-bit prefetchable window.
> 
> I thought we did that way already.

Can you point me to the code that does this?  I don't see anything in
pbus_size_mem() that treats ROMs differently.

I'm considering reverting 5b28541552ef (assuming that would fix this radeon
regression) because I think the radeon problem is worse than the 74151
problem and I think we can solve 74151 in a different way.

Guo, Ben, this would affect you!  It's too late to revert 5b28541552ef for
v3.18, but we really need to work on this for v3.19.

FOr 85491 (the radeon problem), we might be able to work around it by
fixing the bridge window, but I'm not sure we could even get that done
before v3.18.  I'll work on that today.

Bjorn

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-04 17:34     ` Bjorn Helgaas
@ 2014-12-04 22:06       ` Yinghai Lu
  2014-12-05 10:59         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 37+ messages in thread
From: Yinghai Lu @ 2014-12-04 22:06 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Alexey Voronkov, David Airlie, Saifi Khan,
	Alex Deucher, Marek Kordík, Guo Chao,
	Benjamin Herrenschmidt

On Thu, Dec 4, 2014 at 9:34 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Wed, Dec 03, 2014 at 05:44:26PM -0800, Yinghai Lu wrote:
>> On Wed, Dec 3, 2014 at 4:51 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> > I think another way to fix the bug 74151 problem would have been to
>> > put the ROMs in the non-prefetchable window and the 64-bit BARs in the
>> > 64-bit prefetchable window.
>>
>> I thought we did that way already.
>
> Can you point me to the code that does this?  I don't see anything in
> pbus_size_mem() that treats ROMs differently.

Current code:
1. when bridge does not support 64bit mmio pref, ROM bar will be
allocated to bridge mmio pref
2. when bridge does support 64bit mmio pref, ROM BAR will be allocated
to bridge mmio.

so you want to change case 1 to bridge to mmio pref?

>
> I'm considering reverting 5b28541552ef (assuming that would fix this radeon
> regression) because I think the radeon problem is worse than the 74151
> problem and I think we can solve 74151 in a different way.
>
> Guo, Ben, this would affect you!  It's too late to revert 5b28541552ef for
> v3.18, but we really need to work on this for v3.19.
>
> FOr 85491 (the radeon problem), we might be able to work around it by
> fixing the bridge window, but I'm not sure we could even get that done
> before v3.18.  I'll work on that today.

shrink bridge mmio pref range instead of reject it?

Yinghai

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-04 16:15     ` Bjorn Helgaas
@ 2014-12-04 22:15       ` Yinghai Lu
  2014-12-04 22:25         ` Bjorn Helgaas
  0 siblings, 1 reply; 37+ messages in thread
From: Yinghai Lu @ 2014-12-04 22:15 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Alexey Voronkov, David Airlie, Saifi Khan,
	Alex Deucher, Marek Kordík

On Thu, Dec 4, 2014 at 8:15 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Wed, Dec 03, 2014 at 05:41:32PM -0800, Yinghai Lu wrote:
>> On Wed, Dec 3, 2014 at 5:38 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> > On Wed, Dec 3, 2014 at 2:15 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> >> There's a bugzilla for this, so this email is to make the bugzilla
>> >> more discoverable and to have a more public discussion about it.
>> >>
>> >>   https://bugzilla.kernel.org/show_bug.cgi?id=85491
>> >>
>> >> I think this regression is caused by 5b28541552ef ("PCI: Restrict
>> >> 64-bit prefetchable bridge windows to 64-bit resources").  These dmesg
>> >> fragments are from v3.16, which doesn't work:
>> >>
>> >>     pci_bus 0000:00: root bus resource [mem 0xc0000000-0xffffffff]
>> >>
>> >>     pci 0000:00:01.0: PCI bridge to [bus 01]
>> >>     pci 0000:00:01.0:   bridge window [io  0x8000-0xafff]
>> >>     pci 0000:00:01.0:   bridge window [mem 0xfdf00000-0xfdffffff]
>> >>     pci 0000:00:01.0:   bridge window [mem 0xbdf00000-0xddefffff 64bit
>> >> pref]  <-- outside host bridge window
>> >>
>> >>     pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xcfffffff pref]
>> >>     pci 0000:01:00.0: reg 0x14: [io  0xa000-0xa0ff]
>> >>     pci 0000:01:00.0: reg 0x18: [mem 0xfdff0000-0xfdffffff]
>> >>     pci 0000:01:00.0: reg 0x30: [mem 0xfdfc0000-0xfdfdffff pref]
>> >>
>> >>     pci 0000:00:01.0: can't claim BAR 15 [mem 0xbdf00000-0xddefffff
>> >> 64bit pref]: no compatible bridge window
>> >>     pci 0000:01:00.0: can't claim BAR 0 [mem 0xc0000000-0xcfffffff
>> >> pref]: no compatible bridge window
>> >
>> > Looks like we wrong type checking from parent resource searching.
>
> I don't understand what you're trying to say, but I don't think the "no
> compatible bridge window" problem is related to resource types.  I think
> it's because the 00:01.0 window at [mem 0xbdf00000-0xddefffff] starts
> before the [mem 0xc0000000-0xffffffff] host bridge window.

on this system _CRS is not used.

conflicts with System RAM [mem 0x00100000-0xbff9ffff]

Are you going to enforce using _CRS on this system with DMI ?

Yinghai

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-04 16:37   ` Bjorn Helgaas
@ 2014-12-04 22:24     ` Yinghai Lu
  2014-12-05  0:24       ` Yinghai Lu
  0 siblings, 1 reply; 37+ messages in thread
From: Yinghai Lu @ 2014-12-04 22:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Alexey Voronkov, David Airlie, Saifi Khan,
	Alex Deucher, Marek Kordík

On Thu, Dec 4, 2014 at 8:37 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> pci=realloc should workaround the problem, as it would take control of
>> the bridge mmio BAR, and allocate more range for mmio pref under it.
>
> I don't think pci=realloc is the answer because
>
>   1) It's a poor user experience.  We should be able to handle this
>      automatically, without asking the user to do anything.

we can detect that when bridge support 64bit mmio pref, and reject
child device pref that only
support mmio pref non-b64bit.

>
>   2) It puts the radeon framebuffer in the non-prefetchable window, and we
>      should be able to keep it in the prefetchable window.

clear bridge 64bit flags if all children does not support that?

even that slot supports hotplug.

Yinghai

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-04 22:15       ` Yinghai Lu
@ 2014-12-04 22:25         ` Bjorn Helgaas
  0 siblings, 0 replies; 37+ messages in thread
From: Bjorn Helgaas @ 2014-12-04 22:25 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-pci, Alexey Voronkov, David Airlie, Saifi Khan,
	Alex Deucher, Marek Kordík

On Thu, Dec 04, 2014 at 02:15:14PM -0800, Yinghai Lu wrote:
> On Thu, Dec 4, 2014 at 8:15 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > On Wed, Dec 03, 2014 at 05:41:32PM -0800, Yinghai Lu wrote:
> >> On Wed, Dec 3, 2014 at 5:38 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> >> > On Wed, Dec 3, 2014 at 2:15 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> >> >> There's a bugzilla for this, so this email is to make the bugzilla
> >> >> more discoverable and to have a more public discussion about it.
> >> >>
> >> >>   https://bugzilla.kernel.org/show_bug.cgi?id=85491
> >> >>
> >> >> I think this regression is caused by 5b28541552ef ("PCI: Restrict
> >> >> 64-bit prefetchable bridge windows to 64-bit resources").  These dmesg
> >> >> fragments are from v3.16, which doesn't work:
> >> >>
> >> >>     pci_bus 0000:00: root bus resource [mem 0xc0000000-0xffffffff]
> >> >>
> >> >>     pci 0000:00:01.0: PCI bridge to [bus 01]
> >> >>     pci 0000:00:01.0:   bridge window [io  0x8000-0xafff]
> >> >>     pci 0000:00:01.0:   bridge window [mem 0xfdf00000-0xfdffffff]
> >> >>     pci 0000:00:01.0:   bridge window [mem 0xbdf00000-0xddefffff 64bit
> >> >> pref]  <-- outside host bridge window
> >> >>
> >> >>     pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xcfffffff pref]
> >> >>     pci 0000:01:00.0: reg 0x14: [io  0xa000-0xa0ff]
> >> >>     pci 0000:01:00.0: reg 0x18: [mem 0xfdff0000-0xfdffffff]
> >> >>     pci 0000:01:00.0: reg 0x30: [mem 0xfdfc0000-0xfdfdffff pref]
> >> >>
> >> >>     pci 0000:00:01.0: can't claim BAR 15 [mem 0xbdf00000-0xddefffff
> >> >> 64bit pref]: no compatible bridge window
> >> >>     pci 0000:01:00.0: can't claim BAR 0 [mem 0xc0000000-0xcfffffff
> >> >> pref]: no compatible bridge window
> >> >
> >> > Looks like we wrong type checking from parent resource searching.
> >
> > I don't understand what you're trying to say, but I don't think the "no
> > compatible bridge window" problem is related to resource types.  I think
> > it's because the 00:01.0 window at [mem 0xbdf00000-0xddefffff] starts
> > before the [mem 0xc0000000-0xffffffff] host bridge window.
> 
> on this system _CRS is not used.

Oh, sorry, I didn't notice that Zermond's system doesn't use _CRS.  I had
been looking at the logs from Marek, and we *do* use _CRS there.  On
Zermond's machine we do still have the e820 code that figures out this:

  e820: [mem 0xc0000000-0xfedfffff] available for PCI devices

I think we could at least clamp the default host bridge window
([mem 0x00000000-0xfffffffff]) to that.

Bjorn

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-04 22:24     ` Yinghai Lu
@ 2014-12-05  0:24       ` Yinghai Lu
  0 siblings, 0 replies; 37+ messages in thread
From: Yinghai Lu @ 2014-12-05  0:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Alexey Voronkov, David Airlie, Saifi Khan,
	Alex Deucher, Marek Kordík

[-- Attachment #1: Type: text/plain, Size: 3279 bytes --]

On Thu, Dec 4, 2014 at 2:24 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Thu, Dec 4, 2014 at 8:37 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> pci=realloc should workaround the problem, as it would take control of
>>> the bridge mmio BAR, and allocate more range for mmio pref under it.
>>
>> I don't think pci=realloc is the answer because
>>
>>   1) It's a poor user experience.  We should be able to handle this
>>      automatically, without asking the user to do anything.
>
> we can detect that when bridge support 64bit mmio pref, and reject
> child device pref that only
> support mmio pref non-b64bit.
>
>>
>>   2) It puts the radeon framebuffer in the non-prefetchable window, and we
>>      should be able to keep it in the prefetchable window.
>
> clear bridge 64bit flags if all children does not support that?
>
> even that slot supports hotplug.

Please check if attached patch is good for v3.18.

---
 drivers/pci/setup-bus.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

Index: linux-2.6/drivers/pci/setup-bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/setup-bus.c
+++ linux-2.6/drivers/pci/setup-bus.c
@@ -1118,6 +1118,52 @@ handle_done:
     ;
 }

+static void pbus_check_mem64(struct pci_bus *bus, unsigned long mask,
+             unsigned long type, struct list_head *realloc_head)
+{
+    struct pci_dev *dev;
+    resource_size_t align;
+    resource_size_t aligns[18];    /* Alignments from 1Mb to 128Gb */
+    int order;
+    unsigned int mem64_mask = 0;
+    struct resource *b_res = find_free_bus_resource(bus, mask, type);
+
+    if (!b_res || !(b_res->flags & IORESOURCE_MEM_64))
+        return;
+
+    memset(aligns, 0, sizeof(aligns));
+
+    mem64_mask = IORESOURCE_MEM_64;
+    b_res->flags &= ~IORESOURCE_MEM_64;
+
+    list_for_each_entry(dev, &bus->devices, bus_list) {
+        int i;
+
+        for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+            struct resource *r = &dev->resource[i];
+
+            if (r->parent || (r->flags & mask) != type)
+                continue;
+#ifdef CONFIG_PCI_IOV
+            if (realloc_head && i >= PCI_IOV_RESOURCES &&
+                    i <= PCI_IOV_RESOURCE_END)
+                continue;
+#endif
+            align = pci_resource_alignment(dev, r);
+            order = __ffs(align) - 20;
+            if (order < 0)
+                order = 0;
+            if (order >= ARRAY_SIZE(aligns))
+                continue;
+
+            if (i != PCI_ROM_RESOURCE)
+                mem64_mask &= r->flags & IORESOURCE_MEM_64;
+        }
+    }
+
+    b_res->flags |= mem64_mask;
+}
+
 void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head
*realloc_head)
 {
     struct pci_dev *dev;
@@ -1171,6 +1217,10 @@ void __pci_bus_size_bridges(struct pci_b
         b_res = &bus->self->resource[PCI_BRIDGE_RESOURCES];
         mask = IORESOURCE_MEM;
         prefmask = IORESOURCE_MEM | IORESOURCE_PREFETCH;
+
+        if (b_res[2].flags & IORESOURCE_MEM_64)
+            pbus_check_mem64(bus, prefmask, prefmask, realloc_head);
+
         if (b_res[2].flags & IORESOURCE_MEM_64) {
             prefmask |= IORESOURCE_MEM_64;
             ret = pbus_size_mem(bus, prefmask, prefmask,

[-- Attachment #2: clear_bridge_mmio64_flag.patch --]
[-- Type: text/x-patch, Size: 2073 bytes --]

---
 drivers/pci/setup-bus.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

Index: linux-2.6/drivers/pci/setup-bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/setup-bus.c
+++ linux-2.6/drivers/pci/setup-bus.c
@@ -1118,6 +1118,52 @@ handle_done:
 	;
 }
 
+static void pbus_check_mem64(struct pci_bus *bus, unsigned long mask,
+			 unsigned long type, struct list_head *realloc_head)
+{
+	struct pci_dev *dev;
+	resource_size_t align;
+	resource_size_t aligns[18];	/* Alignments from 1Mb to 128Gb */
+	int order;
+	unsigned int mem64_mask = 0;
+	struct resource *b_res = find_free_bus_resource(bus, mask, type);
+
+	if (!b_res || !(b_res->flags & IORESOURCE_MEM_64))
+		return;
+
+	memset(aligns, 0, sizeof(aligns));
+
+	mem64_mask = IORESOURCE_MEM_64;
+	b_res->flags &= ~IORESOURCE_MEM_64;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		int i;
+
+		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+			struct resource *r = &dev->resource[i];
+
+			if (r->parent || (r->flags & mask) != type)
+				continue;
+#ifdef CONFIG_PCI_IOV
+			if (realloc_head && i >= PCI_IOV_RESOURCES &&
+					i <= PCI_IOV_RESOURCE_END)
+				continue;
+#endif
+			align = pci_resource_alignment(dev, r);
+			order = __ffs(align) - 20;
+			if (order < 0)
+				order = 0;
+			if (order >= ARRAY_SIZE(aligns))
+				continue;
+
+			if (i != PCI_ROM_RESOURCE)
+				mem64_mask &= r->flags & IORESOURCE_MEM_64;
+		}
+	}
+
+	b_res->flags |= mem64_mask;
+}
+
 void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 {
 	struct pci_dev *dev;
@@ -1171,6 +1217,10 @@ void __pci_bus_size_bridges(struct pci_b
 		b_res = &bus->self->resource[PCI_BRIDGE_RESOURCES];
 		mask = IORESOURCE_MEM;
 		prefmask = IORESOURCE_MEM | IORESOURCE_PREFETCH;
+
+		if (b_res[2].flags & IORESOURCE_MEM_64)
+			pbus_check_mem64(bus, prefmask, prefmask, realloc_head);
+
 		if (b_res[2].flags & IORESOURCE_MEM_64) {
 			prefmask |= IORESOURCE_MEM_64;
 			ret = pbus_size_mem(bus, prefmask, prefmask,

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-04 22:06       ` Yinghai Lu
@ 2014-12-05 10:59         ` Benjamin Herrenschmidt
  2014-12-08  0:47           ` Gavin Shan
  0 siblings, 1 reply; 37+ messages in thread
From: Benjamin Herrenschmidt @ 2014-12-05 10:59 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, linux-pci, Alexey Voronkov, David Airlie,
	Saifi Khan, Alex Deucher, Marek Kordík, Gavin Shan, weiyang

On Thu, 2014-12-04 at 14:06 -0800, Yinghai Lu wrote:
> > I'm considering reverting 5b28541552ef (assuming that would fix this radeon
> > regression) because I think the radeon problem is worse than the 74151
> > problem and I think we can solve 74151 in a different way.
> >
> > Guo, Ben, this would affect you!  It's too late to revert 5b28541552ef for
> > v3.18, but we really need to work on this for v3.19.
> >
> > FOr 85491 (the radeon problem), we might be able to work around it by
> > fixing the bridge window, but I'm not sure we could even get that done
> > before v3.18.  I'll work on that today.
> 
> shrink bridge mmio pref range instead of reject it?

Guo unfortunately doesn't work for us anymore. Gavin, Richard, can
you handle this please ?

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-05 10:59         ` Benjamin Herrenschmidt
@ 2014-12-08  0:47           ` Gavin Shan
  2014-12-08 21:04             ` Bjorn Helgaas
  0 siblings, 1 reply; 37+ messages in thread
From: Gavin Shan @ 2014-12-08  0:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Yinghai Lu, Bjorn Helgaas, linux-pci, Alexey Voronkov,
	David Airlie, Saifi Khan, Alex Deucher, Marek Kordík,
	Gavin Shan, weiyang

On Fri, Dec 05, 2014 at 09:59:37PM +1100, Benjamin Herrenschmidt wrote:
>On Thu, 2014-12-04 at 14:06 -0800, Yinghai Lu wrote:
>> > I'm considering reverting 5b28541552ef (assuming that would fix this radeon
>> > regression) because I think the radeon problem is worse than the 74151
>> > problem and I think we can solve 74151 in a different way.
>> >

Commit 5b28541552ef affects PowerPC PCI greatly. Without the commit, those
adapters requesting large BAR would fail and can't work properly. Another
related thing is SRIOV support. Currently, VF BARs are covered by 64-bits
prefetch windows and we expect VF BARs allocated from 64-bits prefetchable
windows. If we're going to revert commit 5b28541552ef, too much things are
affected. I think it's more reasonable to find a solution for bug#85491 if
that's not too hard.

Looking at bug#85491 and the logs from Marek, BIOS didn't configure windows
and BARs for PCI bridge and adapters properly, which forces the kernel to
reassign those BARs/windows failing to be claimed previously. If I'm correct,
we need a way to reassign the bridge's non-prefetchable window and with it,
find_free_bus_resource() needn't check on "!r->parent".

Another way to fix it would be to returning error from pbus_size_mem() for
64-bits-prefetchable case so that prefetchable BARs will be assigned from
prefetchable windows (64-bits flag will be ignored). I guss Yinghai will
have more comments on this.

>> > Guo, Ben, this would affect you!  It's too late to revert 5b28541552ef for
>> > v3.18, but we really need to work on this for v3.19.
>> >
>> > FOr 85491 (the radeon problem), we might be able to work around it by
>> > fixing the bridge window, but I'm not sure we could even get that done
>> > before v3.18.  I'll work on that today.
>> 
>> shrink bridge mmio pref range instead of reject it?
>
>Guo unfortunately doesn't work for us anymore. Gavin, Richard, can
>you handle this please ?
>

Thanks,
Gavin

>Cheers,
>Ben.
>
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-08  0:47           ` Gavin Shan
@ 2014-12-08 21:04             ` Bjorn Helgaas
  2014-12-08 21:38               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 37+ messages in thread
From: Bjorn Helgaas @ 2014-12-08 21:04 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Benjamin Herrenschmidt, Yinghai Lu, linux-pci, Alexey Voronkov,
	David Airlie, Saifi Khan, Alex Deucher, Marek Kordík,
	Richard Yang

On Sun, Dec 7, 2014 at 5:47 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> On Fri, Dec 05, 2014 at 09:59:37PM +1100, Benjamin Herrenschmidt wrote:
>>On Thu, 2014-12-04 at 14:06 -0800, Yinghai Lu wrote:
>>> > I'm considering reverting 5b28541552ef (assuming that would fix this radeon
>>> > regression) because I think the radeon problem is worse than the 74151
>>> > problem and I think we can solve 74151 in a different way.
>>> >
>
> Commit 5b28541552ef affects PowerPC PCI greatly. Without the commit, those
> adapters requesting large BAR would fail and can't work properly. Another
> related thing is SRIOV support. Currently, VF BARs are covered by 64-bits
> prefetch windows and we expect VF BARs allocated from 64-bits prefetchable
> windows. If we're going to revert commit 5b28541552ef, too much things are
> affected. I think it's more reasonable to find a solution for bug#85491 if
> that's not too hard.

Well, these systems with Radeons used to work, but were broken by
5b28541552ef.  Of course, this has nothing to do with Radeon per se;
the same problem could occur with other devices.

Some PowerPC systems didn't work, but were made to work by
5b28541552ef.  This change isn't PowerPC-specific either; it may have
made other systems start working as well.

>From a regression point of view, it is more important to keep the
previously-working Radeon systems working than it is to make
previously-broken systems work.  That's one argument for reverting
5b28541552ef.

It would be ideal to have a fix that makes both the Radeon systems and
the PowerPC systems work.  Yinghai has posted patches that seem to do
that, but they don't have changelogs and I don't understand them yet,
so it's not clear (yet) that they are the answer.

However, I think 5b28541552ef is taking the wrong approach.  Consider
a device that, like this Radeon, has a 32-bit prefetchable BAR.  If
the upstream bridge has a 32-bit prefetchable window, things work as
expected -- we put the prefetchable BAR in the prefetchable window.
But if the bridge has a 64-bit prefetchable window, we put that same
BAR in a 32-bit non-prefetchable window.

So we upgraded the system with a better bridge, i.e., it supports a
full 64-bit window instead of just a 32-bit window, and the system
performs WORSE.  That's what I think is fundamentally broken.

> Looking at bug#85491 and the logs from Marek, BIOS didn't configure windows
> and BARs for PCI bridge and adapters properly, which forces the kernel to
> reassign those BARs/windows failing to be claimed previously. If I'm correct,
> we need a way to reassign the bridge's non-prefetchable window and with it,
> find_free_bus_resource() needn't check on "!r->parent".

It's true that the BIOS didn't configure things correctly.  But we
can't use that as an excuse; the kernel should be able to configure
things correctly itself because there are plenty of machines where the
BIOS never configures things.

Bjorn

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-08 21:04             ` Bjorn Helgaas
@ 2014-12-08 21:38               ` Benjamin Herrenschmidt
  2014-12-08 21:46                 ` Bjorn Helgaas
  0 siblings, 1 reply; 37+ messages in thread
From: Benjamin Herrenschmidt @ 2014-12-08 21:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Gavin Shan, Yinghai Lu, linux-pci, Alexey Voronkov, David Airlie,
	Saifi Khan, Alex Deucher, Marek Kordík, Richard Yang

On Mon, 2014-12-08 at 14:04 -0700, Bjorn Helgaas wrote:
> 
> However, I think 5b28541552ef is taking the wrong approach.  Consider
> a device that, like this Radeon, has a 32-bit prefetchable BAR.  If
> the upstream bridge has a 32-bit prefetchable window, things work as
> expected -- we put the prefetchable BAR in the prefetchable window.
> But if the bridge has a 64-bit prefetchable window, we put that same
> BAR in a 32-bit non-prefetchable window.

Well, that's expected, unless the 64-bit prefetchable window happens to
be fully enclosed in 32-bit space ... So maybe the approach is to check
that this is the case and "downgrade" such 64-bit prefetchable BARs to
32-bit ...

Cheers,
Ben.


> So we upgraded the system with a better bridge, i.e., it supports a
> full 64-bit window instead of just a 32-bit window, and the system
> performs WORSE.  That's what I think is fundamentally broken.
> 
> > Looking at bug#85491 and the logs from Marek, BIOS didn't configure
> windows
> > and BARs for PCI bridge and adapters properly, which forces the
> kernel to
> > reassign those BARs/windows failing to be claimed previously. If I'm
> correct,
> > we need a way to reassign the bridge's non-prefetchable window and
> with it,
> > find_free_bus_resource() needn't check on "!r->parent".
> 
> It's true that the BIOS didn't configure things correctly.  But we
> can't use that as an excuse; the kernel should be able to configure
> things correctly itself because there are plenty of machines where the
> BIOS never configures things.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-08 21:38               ` Benjamin Herrenschmidt
@ 2014-12-08 21:46                 ` Bjorn Helgaas
  2014-12-08 23:38                   ` Gavin Shan
  0 siblings, 1 reply; 37+ messages in thread
From: Bjorn Helgaas @ 2014-12-08 21:46 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Gavin Shan, Yinghai Lu, linux-pci, Alexey Voronkov, David Airlie,
	Alex Deucher, Marek Kordík, Richard Yang

[-cc Saifi (bouncing)]

On Mon, Dec 8, 2014 at 2:38 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Mon, 2014-12-08 at 14:04 -0700, Bjorn Helgaas wrote:
>>
>> However, I think 5b28541552ef is taking the wrong approach.  Consider
>> a device that, like this Radeon, has a 32-bit prefetchable BAR.  If
>> the upstream bridge has a 32-bit prefetchable window, things work as
>> expected -- we put the prefetchable BAR in the prefetchable window.
>> But if the bridge has a 64-bit prefetchable window, we put that same
>> BAR in a 32-bit non-prefetchable window.
>
> Well, that's expected, unless the 64-bit prefetchable window happens to
> be fully enclosed in 32-bit space ... So maybe the approach is to check
> that this is the case and "downgrade" such 64-bit prefetchable BARs to
> 32-bit ...

Yeah, I didn't word that very clearly.  I meant that after
5b28541552ef, that 64-bit window is wasted unless there's a 64-bit BAR
below it; we can't even place the window below 4GB and use it for
32-bit prefetchable BARs.

>> So we upgraded the system with a better bridge, i.e., it supports a
>> full 64-bit window instead of just a 32-bit window, and the system
>> performs WORSE.  That's what I think is fundamentally broken.
>>
>> > Looking at bug#85491 and the logs from Marek, BIOS didn't configure
>> windows
>> > and BARs for PCI bridge and adapters properly, which forces the
>> kernel to
>> > reassign those BARs/windows failing to be claimed previously. If I'm
>> correct,
>> > we need a way to reassign the bridge's non-prefetchable window and
>> with it,
>> > find_free_bus_resource() needn't check on "!r->parent".
>>
>> It's true that the BIOS didn't configure things correctly.  But we
>> can't use that as an excuse; the kernel should be able to configure
>> things correctly itself because there are plenty of machines where the
>> BIOS never configures things.
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-08 21:46                 ` Bjorn Helgaas
@ 2014-12-08 23:38                   ` Gavin Shan
  2014-12-09  0:21                     ` Yinghai Lu
  0 siblings, 1 reply; 37+ messages in thread
From: Gavin Shan @ 2014-12-08 23:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Gavin Shan, Yinghai Lu, linux-pci,
	Alexey Voronkov, David Airlie, Alex Deucher, Marek Kordík,
	Richard Yang

[-- Attachment #1: Type: text/plain, Size: 1430 bytes --]

On Mon, Dec 08, 2014 at 02:46:00PM -0700, Bjorn Helgaas wrote:
>[-cc Saifi (bouncing)]
>
>On Mon, Dec 8, 2014 at 2:38 PM, Benjamin Herrenschmidt
><benh@kernel.crashing.org> wrote:
>> On Mon, 2014-12-08 at 14:04 -0700, Bjorn Helgaas wrote:
>>>
>>> However, I think 5b28541552ef is taking the wrong approach.  Consider
>>> a device that, like this Radeon, has a 32-bit prefetchable BAR.  If
>>> the upstream bridge has a 32-bit prefetchable window, things work as
>>> expected -- we put the prefetchable BAR in the prefetchable window.
>>> But if the bridge has a 64-bit prefetchable window, we put that same
>>> BAR in a 32-bit non-prefetchable window.
>>
>> Well, that's expected, unless the 64-bit prefetchable window happens to
>> be fully enclosed in 32-bit space ... So maybe the approach is to check
>> that this is the case and "downgrade" such 64-bit prefetchable BARs to
>> 32-bit ...
>
>Yeah, I didn't word that very clearly.  I meant that after
>5b28541552ef, that 64-bit window is wasted unless there's a 64-bit BAR
>below it; we can't even place the window below 4GB and use it for
>32-bit prefetchable BARs.
>

Yes, I agree. It's why I suggested to return error from pbus_size_mem()
to indicate the case: 64-bits prefetchable window isn't used and it's
still available to be used by child 32-bits prefetchable BARs. Please
take a look on the attached draft patch, which helps to explain the idea
only.

Thanks,
Gavin

[-- Attachment #2: 0001-PCI-Use-64-bits-pref-window-accomodate-pref-BARs.patch --]
[-- Type: text/x-diff, Size: 1749 bytes --]

>From bef332d8e2bfc464da202daa2c9d1415db1f1224 Mon Sep 17 00:00:00 2001
From: Gavin Shan <gwshan@linux.vnet.ibm.com>
Date: Tue, 9 Dec 2014 10:21:24 +1100
Subject: [PATCH] PCI: Use 64-bits pref window accomodate pref BARs

The PCI resource allocation and reassignment has been changed a
lot by Commit 5b28541552ef ("PCI: Restrict 64-bit prefetchable
bridge windows to 64-bit resources"): If parent bridge has 64-bits
prefetchable window, all child 64-bits prefetchable BARs are
expected to be assigned from the window. The left child BARs are
going to be allocated from parent non-prefetchable window.

If there're no child 64-bits prefetchable BARs, pbus_size_mem()
should return error so that the 64-bits prefetchable window still
can be used to accomodate child 32-bits prefetchable BARs.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
v1: Draft patch to explain idea only
---
 drivers/pci/setup-bus.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 0482235..7a7778f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -989,6 +989,16 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 		}
 	}
 
+	/*
+	 * If we don't have unassigned BARs with indicated type, the
+	 * corresponding window won't be used. However, we might use
+	 * this window for BARs with other types, e.g. 64-bits pref
+	 * window for 32-bits pref BARs. Here to return error to
+	 * indicate the case.
+	 */
+	if (!size)
+		return -ENODEV;
+
 	min_align = calculate_mem_align(aligns, max_order);
 	min_align = max(min_align, window_alignment(bus, b_res->flags));
 	size0 = calculate_memsize(size, min_size, 0, resource_size(b_res), min_align);
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-08 23:38                   ` Gavin Shan
@ 2014-12-09  0:21                     ` Yinghai Lu
  2014-12-19 22:17                       ` Bjorn Helgaas
  2014-12-20  0:23                       ` Bjorn Helgaas
  0 siblings, 2 replies; 37+ messages in thread
From: Yinghai Lu @ 2014-12-09  0:21 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Bjorn Helgaas, Benjamin Herrenschmidt, linux-pci,
	Alexey Voronkov, David Airlie, Alex Deucher, Marek Kordík,
	Richard Yang

On Mon, Dec 8, 2014 at 3:38 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> On Mon, Dec 08, 2014 at 02:46:00PM -0700, Bjorn Helgaas wrote:
>>Yeah, I didn't word that very clearly.  I meant that after
>>5b28541552ef, that 64-bit window is wasted unless there's a 64-bit BAR
>>below it; we can't even place the window below 4GB and use it for
>>32-bit prefetchable BARs.
>>
>
> Yes, I agree. It's why I suggested to return error from pbus_size_mem()
> to indicate the case: 64-bits prefetchable window isn't used and it's
> still available to be used by child 32-bits prefetchable BARs. Please
> take a look on the attached draft patch, which helps to explain the idea
> only.

That would not help. The 00:01.0 on Zermond's system support hotplug. so mem
pref will be used for 64bit pref.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-09  0:21                     ` Yinghai Lu
@ 2014-12-19 22:17                       ` Bjorn Helgaas
  2014-12-20  0:23                       ` Bjorn Helgaas
  1 sibling, 0 replies; 37+ messages in thread
From: Bjorn Helgaas @ 2014-12-19 22:17 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Gavin Shan, Benjamin Herrenschmidt, linux-pci, Alexey Voronkov,
	David Airlie, Alex Deucher, Marek Kordík, Richard Yang,
	linux-kernel

[+cc linux-kernel for visibility]

For context, this is about a regression caused by 5b28541552ef ("PCI:
Restrict 64-bit prefetchable bridge windows to 64-bit resources"),
which appeared in v3.16.  The bugzilla is at
https://bugzilla.kernel.org/show_bug.cgi?id=85491

The problem is that BIOS programmed an invalid Root Port window
leading to the Radeon device.  The window contains the Radeon device,
so the configuration actually *works* fine, but the window is invalid
because it either overlaps system RAM or starts below the upstream
host bridge window, so Linux discards it:

Zermond's system:
  acpi PNP0A08:00: host bridge window [mem 0xc0000000-0xffffffff] (ignored)
  pci_bus 0000:00: root bus resource [mem 0x00000000-0xfffffffff]
  pci 0000:00:01.0:   bridge window [mem 0xbdf00000-0xddefffff 64bit
pref]  # invalid Root Port window
  pci 0000:00:01.0: can't claim BAR 15 [mem 0xbdf00000-0xddefffff
64bit pref]: address conflict with System RAM [mem
0x00100000-0xbff9ffff]
  pci 0000:01:00.0: can't claim BAR 0 [mem 0xc0000000-0xcfffffff
pref]: no compatible bridge window  # Radeon

Marek's system:
  pci_bus 0000:00: root bus resource [mem 0xc0000000-0xffffffff]   (from _CRS)
  pci 0000:00:01.0:   bridge window [mem 0xbdf00000-0xddefffff 64bit
pref]  # invalid Root Port window
  pci 0000:00:01.0: can't claim BAR 15 [mem 0xbdf00000-0xddefffff
64bit pref]: no compatible bridge window
  pci 0000:01:00.0: can't claim BAR 0 [mem 0xc0000000-0xcfffffff
pref]: no compatible bridge window  # Radeon

The Root Port window had the same problem before 5b28541552ef, of
course, since BIOS set it up.  But before 5b28541552ef, Linux assigned
a valid window big enough for the Radeon:

  pci 0000:00:01.0:   bridge window [mem 0xc0000000-0xcfffffff pref]

After 5b28541552ef, we won't put a 64-bit window below 4GB, so we
assign space above 4GB:

  pci 0000:00:01.0:   bridge window [mem 0x140000000-0x1401fffff 64bit pref]

which is not usable by Radeon, since it only has a 32-bit BAR.

On Mon, Dec 8, 2014 at 5:21 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Mon, Dec 8, 2014 at 3:38 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>> On Mon, Dec 08, 2014 at 02:46:00PM -0700, Bjorn Helgaas wrote:
>>>Yeah, I didn't word that very clearly.  I meant that after
>>>5b28541552ef, that 64-bit window is wasted unless there's a 64-bit BAR
>>>below it; we can't even place the window below 4GB and use it for
>>>32-bit prefetchable BARs.
>>>
>>
>> Yes, I agree. It's why I suggested to return error from pbus_size_mem()
>> to indicate the case: 64-bits prefetchable window isn't used and it's
>> still available to be used by child 32-bits prefetchable BARs. Please
>> take a look on the attached draft patch, which helps to explain the idea
>> only.
>
> That would not help. The 00:01.0 on Zermond's system support hotplug. so mem
> pref will be used for 64bit pref.

I don't think it is an improvement to restrict use of the 64bit pref
window to a hypothetical future 64-bit capable device.  The
configuration done by BIOS was optimal for the hardware already in the
machine -- the prefetchable BAR is in a prefetchable window.  I don't
think we should change that configuration just because there's a
possibility of a different device in the future.

Bjorn

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-09  0:21                     ` Yinghai Lu
  2014-12-19 22:17                       ` Bjorn Helgaas
@ 2014-12-20  0:23                       ` Bjorn Helgaas
  2014-12-20  0:34                         ` Yinghai Lu
  1 sibling, 1 reply; 37+ messages in thread
From: Bjorn Helgaas @ 2014-12-20  0:23 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Gavin Shan, Benjamin Herrenschmidt, linux-pci, Alexey Voronkov,
	David Airlie, Alex Deucher, Marek Kordík, Richard Yang

On Mon, Dec 08, 2014 at 04:21:34PM -0800, Yinghai Lu wrote:
> On Mon, Dec 8, 2014 at 3:38 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> > On Mon, Dec 08, 2014 at 02:46:00PM -0700, Bjorn Helgaas wrote:
> >>Yeah, I didn't word that very clearly.  I meant that after
> >>5b28541552ef, that 64-bit window is wasted unless there's a 64-bit BAR
> >>below it; we can't even place the window below 4GB and use it for
> >>32-bit prefetchable BARs.
> >
> > Yes, I agree. It's why I suggested to return error from pbus_size_mem()
> > to indicate the case: 64-bits prefetchable window isn't used and it's
> > still available to be used by child 32-bits prefetchable BARs. Please
> > take a look on the attached draft patch, which helps to explain the idea
> > only.
> 
> That would not help. The 00:01.0 on Zermond's system support hotplug. so mem
> pref will be used for 64bit pref.

Maybe it doesn't work as-is, but I think the idea is worth pursuing.  We
can tell whether there are existing children, and we can tell whether there
are any 64-bit prefetchable BARs.

If there is already a device below a hotplug bridge, and it has no 64-bit
BARs, I think we should use the prefetchable window for that device.

It seems silly to reserve the prefetchable window for a possible future
hot-added device.  That means we penalize the device we already know about
because it can't have any prefetchable space.  And we've consumed some
64-bit space that is unused.  We only benefit if we hot-remove the existing
device and hot-add a new device with 64-bit BARs that happen to fit inside
the 2MB (DEFAULT_HOTPLUG_MEM_SIZE) we allocated for it.  That seems pretty
unlikely.

I don't see any patches that resolve the regression (bug 85491) yet.  If we
don't figure something out, I'm going to have to revert 5b28541552ef.

Bjorn

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-20  0:23                       ` Bjorn Helgaas
@ 2014-12-20  0:34                         ` Yinghai Lu
  2014-12-20  0:56                           ` Bjorn Helgaas
  2014-12-22  8:17                           ` Wei Yang
  0 siblings, 2 replies; 37+ messages in thread
From: Yinghai Lu @ 2014-12-20  0:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Gavin Shan, Benjamin Herrenschmidt, linux-pci, Alexey Voronkov,
	David Airlie, Alex Deucher, Marek Kordík, Richard Yang

On Fri, Dec 19, 2014 at 4:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Mon, Dec 08, 2014 at 04:21:34PM -0800, Yinghai Lu wrote:
>
> I don't see any patches that resolve the regression (bug 85491) yet.  If we
> don't figure something out, I'm going to have to revert 5b28541552ef.

As you don't like idea to clear the MEM64 bit in resource ...

After this merging window, I would like to post 5 patches in

https://git.kernel.org/cgit/linux/kernel/git/yinghai/linux-yinghai.git/log/?h=for-pci-allocate-fit-3.18
that will make those two systems works with "pci=realloc".

We could have another patch that enable pci=realloc when VGA bar get
rejected at the first point.

The point is there is no performance difference between two solutions according
to Marek's test.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-20  0:34                         ` Yinghai Lu
@ 2014-12-20  0:56                           ` Bjorn Helgaas
  2014-12-20  1:33                             ` Yinghai Lu
  2014-12-22  8:17                           ` Wei Yang
  1 sibling, 1 reply; 37+ messages in thread
From: Bjorn Helgaas @ 2014-12-20  0:56 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Gavin Shan, Benjamin Herrenschmidt, linux-pci, Alexey Voronkov,
	David Airlie, Alex Deucher, Marek Kordík, Richard Yang

On Fri, Dec 19, 2014 at 5:34 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Dec 19, 2014 at 4:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Mon, Dec 08, 2014 at 04:21:34PM -0800, Yinghai Lu wrote:
>>
>> I don't see any patches that resolve the regression (bug 85491) yet.  If we
>> don't figure something out, I'm going to have to revert 5b28541552ef.
>
> As you don't like idea to clear the MEM64 bit in resource ...
>
> After this merging window, I would like to post 5 patches in
>
> https://git.kernel.org/cgit/linux/kernel/git/yinghai/linux-yinghai.git/log/?h=for-pci-allocate-fit-3.18
> that will make those two systems works with "pci=realloc".

It's fine with me if you post things during the merge window.  But a
fix that depends on adding "pci=realloc" to the kernel parameters is
only a workaround, not sufficient to fix a regression.

> We could have another patch that enable pci=realloc when VGA bar get
> rejected at the first point.

That sounds pretty hacky, but I haven't seen your patch.

> The point is there is no performance difference between two solutions according
> to Marek's test.

I don't believe that.  There might not be a difference in Marek's
case, but this problem could happen with any device, and a
prefetchable window clearly performs better when we do significant
amounts of MMIO.  I'm not interested in fixes targeted at individual
cases.  We have to come up with designs that are good in general.

Marek's case is a little easier because his system is using _CRS.  We
should be able to notice that the Root Port window overlaps the host
bridge window, but isn't contained by it.  In that case, if we merely
trim the Root Port window so it fits, everything will just work.

Zermond's case is similar, but his system isn't using _CRS.  If we
turned on _CRS across the board instead of just for 2008 and newer,
his system would be fixed, too.  That's too risky for v3.19, but worth
considering.

I think we need to think about a different way of resolving the
PowerPC issue (https://bugzilla.kernel.org/show_bug.cgi?id=74151).
5b28541552ef was one way, but certainly not the only possible
approach.

SeaBIOS uses a different PCI configuration strategy.  It doesn't
handle some of the things we need, e.g., SR-IOV, but it's worth
reading for some new ideas:
http://code.coreboot.org/p/seabios/source/tree/master/src/fw/pciinit.c

Bjorn

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-20  0:56                           ` Bjorn Helgaas
@ 2014-12-20  1:33                             ` Yinghai Lu
  2014-12-23  1:53                               ` Yinghai Lu
  0 siblings, 1 reply; 37+ messages in thread
From: Yinghai Lu @ 2014-12-20  1:33 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Gavin Shan, Benjamin Herrenschmidt, linux-pci, Alexey Voronkov,
	David Airlie, Alex Deucher, Marek Kordík, Richard Yang

On Fri, Dec 19, 2014 at 4:56 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>
> Marek's case is a little easier because his system is using _CRS.  We
> should be able to notice that the Root Port window overlaps the host
> bridge window, but isn't contained by it.  In that case, if we merely
> trim the Root Port window so it fits, everything will just work.

Ok, will check that Monday.

Yinghai

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-20  0:34                         ` Yinghai Lu
  2014-12-20  0:56                           ` Bjorn Helgaas
@ 2014-12-22  8:17                           ` Wei Yang
  2014-12-22 19:50                             ` Bjorn Helgaas
  1 sibling, 1 reply; 37+ messages in thread
From: Wei Yang @ 2014-12-22  8:17 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Gavin Shan, Benjamin Herrenschmidt, linux-pci,
	Alexey Voronkov, David Airlie, Alex Deucher, Marek Kordík,
	Richard Yang

On Fri, Dec 19, 2014 at 04:34:57PM -0800, Yinghai Lu wrote:
>On Fri, Dec 19, 2014 at 4:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Mon, Dec 08, 2014 at 04:21:34PM -0800, Yinghai Lu wrote:
>>
>> I don't see any patches that resolve the regression (bug 85491) yet.  If we
>> don't figure something out, I'm going to have to revert 5b28541552ef.
>
>As you don't like idea to clear the MEM64 bit in resource ...
>
>After this merging window, I would like to post 5 patches in
>
>https://git.kernel.org/cgit/linux/kernel/git/yinghai/linux-yinghai.git/log/?h=for-pci-allocate-fit-3.18
>that will make those two systems works with "pci=realloc".
>
>We could have another patch that enable pci=realloc when VGA bar get
>rejected at the first point.
>
>The point is there is no performance difference between two solutions according
>to Marek's test.

One quick question. 

The patch in http://www.spinics.net/lists/linux-pci/msg37377.html doesn't use
"pci=realloc" as it mentioned. And seem worked on Marek's machine.

We wouldn't take it into consideration?

>
>Thanks
>
>Yinghai

-- 
Richard Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-22  8:17                           ` Wei Yang
@ 2014-12-22 19:50                             ` Bjorn Helgaas
  0 siblings, 0 replies; 37+ messages in thread
From: Bjorn Helgaas @ 2014-12-22 19:50 UTC (permalink / raw)
  To: Wei Yang
  Cc: Yinghai Lu, Gavin Shan, Benjamin Herrenschmidt, linux-pci,
	Alexey Voronkov, David Airlie, Alex Deucher, Marek Kordík

On Mon, Dec 22, 2014 at 1:17 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
> On Fri, Dec 19, 2014 at 04:34:57PM -0800, Yinghai Lu wrote:
>>On Fri, Dec 19, 2014 at 4:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> On Mon, Dec 08, 2014 at 04:21:34PM -0800, Yinghai Lu wrote:
>>>
>>> I don't see any patches that resolve the regression (bug 85491) yet.  If we
>>> don't figure something out, I'm going to have to revert 5b28541552ef.
>>
>>As you don't like idea to clear the MEM64 bit in resource ...
>>
>>After this merging window, I would like to post 5 patches in
>>
>>https://git.kernel.org/cgit/linux/kernel/git/yinghai/linux-yinghai.git/log/?h=for-pci-allocate-fit-3.18
>>that will make those two systems works with "pci=realloc".
>>
>>We could have another patch that enable pci=realloc when VGA bar get
>>rejected at the first point.
>>
>>The point is there is no performance difference between two solutions according
>>to Marek's test.
>
> One quick question.
>
> The patch in http://www.spinics.net/lists/linux-pci/msg37377.html doesn't use
> "pci=realloc" as it mentioned. And seem worked on Marek's machine.
>
> We wouldn't take it into consideration?

Nope.  https://lkml.org/lkml/2014/12/19/392

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-20  1:33                             ` Yinghai Lu
@ 2014-12-23  1:53                               ` Yinghai Lu
  2014-12-23 21:41                                 ` Bjorn Helgaas
  0 siblings, 1 reply; 37+ messages in thread
From: Yinghai Lu @ 2014-12-23  1:53 UTC (permalink / raw)
  To: Bjorn Helgaas, Marek Kordík
  Cc: Gavin Shan, Benjamin Herrenschmidt, linux-pci, Alexey Voronkov,
	David Airlie, Alex Deucher, Richard Yang

[-- Attachment #1: Type: text/plain, Size: 591 bytes --]

On Fri, Dec 19, 2014 at 5:33 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Dec 19, 2014 at 4:56 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>
>> Marek's case is a little easier because his system is using _CRS.  We
>> should be able to notice that the Root Port window overlaps the host
>> bridge window, but isn't contained by it.  In that case, if we merely
>> trim the Root Port window so it fits, everything will just work.
>
> Ok, will check that Monday.

Please check attached patch that clip bridge resource.

Marek, can you give it a try on top of v3.18?

Thanks

Yinghai

[-- Attachment #2: pci_bridge_clip.patch --]
[-- Type: text/x-patch, Size: 4382 bytes --]

Subject: [RFC PATCH] PCI, x86: clip firmware assigned pci bridges under hostbridge

Some bios put range that is not fully coverred by root bus resources.
Try to clip them and update them in pci bridge bars.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491
Reported-by: Marek Kordik <kordikmarek@gmail.com>
Fixes: 5b28541552ef ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/pci/i386.c       |   43 +++++++++++++++++++++++++++++++------------
 drivers/pci/host-bridge.c |   31 +++++++++++++++++++++++++++++++
 include/linux/pci.h       |    2 ++
 3 files changed, 64 insertions(+), 12 deletions(-)

Index: linux-2.6/arch/x86/pci/i386.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/i386.c
+++ linux-2.6/arch/x86/pci/i386.c
@@ -205,10 +205,11 @@ EXPORT_SYMBOL(pcibios_align_resource);
  *	    as well.
  */
 
-static void pcibios_allocate_bridge_resources(struct pci_dev *dev)
+static bool pcibios_allocate_bridge_resources(struct pci_dev *dev)
 {
 	int idx;
 	struct resource *r;
+	bool changed = false;
 
 	for (idx = PCI_BRIDGE_RESOURCES; idx < PCI_NUM_RESOURCES; idx++) {
 		r = &dev->resource[idx];
@@ -216,17 +217,31 @@ static void pcibios_allocate_bridge_reso
 			continue;
 		if (r->parent)	/* Already allocated */
 			continue;
-		if (!r->start || pci_claim_resource(dev, idx) < 0) {
-			/*
-			 * Something is wrong with the region.
-			 * Invalidate the resource to prevent
-			 * child resource allocations in this
-			 * range.
-			 */
-			r->start = r->end = 0;
-			r->flags = 0;
+		if (!r->start)
+			goto clear;
+		if (pci_claim_resource(dev, idx) < 0) {
+			/* try again after clip for pci bridge*/
+			if ((dev->class >> 8) == PCI_CLASS_BRIDGE_PCI &&
+			    pcibios_resource_clip_in_host_bridge(dev->bus, r)) {
+				changed = true;
+				if (pci_claim_resource(dev, idx) >= 0)
+					continue;
+			}
+			goto clear;
 		}
+		continue;
+clear:
+		/*
+		 * Something is wrong with the region.
+		 * Invalidate the resource to prevent
+		 * child resource allocations in this
+		 * range.
+		 */
+		r->start = r->end = 0;
+		r->flags = 0;
 	}
+
+	return changed;
 }
 
 static void pcibios_allocate_bus_resources(struct pci_bus *bus)
@@ -234,8 +249,12 @@ static void pcibios_allocate_bus_resourc
 	struct pci_bus *child;
 
 	/* Depth-First Search on bus tree */
-	if (bus->self)
-		pcibios_allocate_bridge_resources(bus->self);
+	if (bus->self) {
+		bool changed = pcibios_allocate_bridge_resources(bus->self);
+
+		if (changed)
+			pci_setup_bridge(bus);
+	}
 	list_for_each_entry(child, &bus->children, node)
 		pcibios_allocate_bus_resources(child);
 }
Index: linux-2.6/drivers/pci/host-bridge.c
===================================================================
--- linux-2.6.orig/drivers/pci/host-bridge.c
+++ linux-2.6/drivers/pci/host-bridge.c
@@ -31,6 +31,37 @@ void pci_set_host_bridge_release(struct
 	bridge->release_data = release_data;
 }
 
+bool pcibios_resource_clip_in_host_bridge(struct pci_bus *bus,
+					  struct resource *res)
+{
+	struct pci_host_bridge *bridge = find_pci_host_bridge(bus);
+	struct pci_host_bridge_window *window;
+	resource_size_t start, end;
+
+	list_for_each_entry(window, &bridge->windows, list) {
+		if (resource_type(res) != resource_type(window->res))
+			continue;
+
+		start = max(window->res->start, res->start);
+		end = min(window->res->end, res->end);
+
+		/* no overlap ? */
+		if (start > end)
+			continue;
+
+		if (res->start == start && res->end == end)
+			return false;
+
+		/* changed */
+		res->start = start;
+		res->end = end;
+
+		return true;
+	}
+
+	return false;
+}
+
 void pcibios_resource_to_bus(struct pci_bus *bus, struct pci_bus_region *region,
 			     struct resource *res)
 {
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -762,6 +762,8 @@ void pci_fixup_cardbus(struct pci_bus *)
 
 /* Generic PCI functions used internally */
 
+bool pcibios_resource_clip_in_host_bridge(struct pci_bus *bus,
+					  struct resource *res);
 void pcibios_resource_to_bus(struct pci_bus *bus, struct pci_bus_region *region,
 			     struct resource *res);
 void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-23  1:53                               ` Yinghai Lu
@ 2014-12-23 21:41                                 ` Bjorn Helgaas
  2014-12-24  2:29                                   ` Yinghai Lu
  0 siblings, 1 reply; 37+ messages in thread
From: Bjorn Helgaas @ 2014-12-23 21:41 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Marek Kordík, Gavin Shan, Benjamin Herrenschmidt, linux-pci,
	Alexey Voronkov, David Airlie, Alex Deucher, Richard Yang

On Mon, Dec 22, 2014 at 05:53:55PM -0800, Yinghai Lu wrote:
> On Fri, Dec 19, 2014 at 5:33 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Fri, Dec 19, 2014 at 4:56 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> >>
> >> Marek's case is a little easier because his system is using _CRS.  We
> >> should be able to notice that the Root Port window overlaps the host
> >> bridge window, but isn't contained by it.  In that case, if we merely
> >> trim the Root Port window so it fits, everything will just work.
> >
> > Ok, will check that Monday.
> 
> Please check attached patch that clip bridge resource.
> 
> Marek, can you give it a try on top of v3.18?
> 
> Thanks
> 
> Yinghai

> Subject: [RFC PATCH] PCI, x86: clip firmware assigned pci bridges under hostbridge
> 
> Some bios put range that is not fully coverred by root bus resources.
> Try to clip them and update them in pci bridge bars.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491
> Reported-by: Marek Kordik <kordikmarek@gmail.com>
> Fixes: 5b28541552ef ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> 
> ---
>  arch/x86/pci/i386.c       |   43 +++++++++++++++++++++++++++++++------------
>  drivers/pci/host-bridge.c |   31 +++++++++++++++++++++++++++++++
>  include/linux/pci.h       |    2 ++
>  3 files changed, 64 insertions(+), 12 deletions(-)
> 
> Index: linux-2.6/arch/x86/pci/i386.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/pci/i386.c
> +++ linux-2.6/arch/x86/pci/i386.c
> @@ -205,10 +205,11 @@ EXPORT_SYMBOL(pcibios_align_resource);
>   *	    as well.
>   */
>  
> -static void pcibios_allocate_bridge_resources(struct pci_dev *dev)
> +static bool pcibios_allocate_bridge_resources(struct pci_dev *dev)
>  {
>  	int idx;
>  	struct resource *r;
> +	bool changed = false;
>  
>  	for (idx = PCI_BRIDGE_RESOURCES; idx < PCI_NUM_RESOURCES; idx++) {
>  		r = &dev->resource[idx];
> @@ -216,17 +217,31 @@ static void pcibios_allocate_bridge_reso
>  			continue;
>  		if (r->parent)	/* Already allocated */
>  			continue;
> -		if (!r->start || pci_claim_resource(dev, idx) < 0) {
> -			/*
> -			 * Something is wrong with the region.
> -			 * Invalidate the resource to prevent
> -			 * child resource allocations in this
> -			 * range.
> -			 */
> -			r->start = r->end = 0;
> -			r->flags = 0;
> +		if (!r->start)
> +			goto clear;
> +		if (pci_claim_resource(dev, idx) < 0) {
> +			/* try again after clip for pci bridge*/
> +			if ((dev->class >> 8) == PCI_CLASS_BRIDGE_PCI &&
> +			    pcibios_resource_clip_in_host_bridge(dev->bus, r)) {

Can you do this so it clips to any upstream bridge, whether it's a host
bridge or a PCI-to-PCI bridge?  It doesn't seem like this has to be
specific to host bridge windows -- any device below a PCI-to-PCI bridge has
to use address space forwarded by that upstream bridge.

I wish this were in generic code.  There's nothing architecture-specific
about the problem, but this patch only fixes things on x86.  Can you make a
similar change in the corresponding code for other arches, as well?

> +				changed = true;
> +				if (pci_claim_resource(dev, idx) >= 0)
> +					continue;
> +			}
> +			goto clear;
>  		}
> +		continue;
> +clear:
> +		/*
> +		 * Something is wrong with the region.
> +		 * Invalidate the resource to prevent
> +		 * child resource allocations in this
> +		 * range.
> +		 */
> +		r->start = r->end = 0;
> +		r->flags = 0;
>  	}
> +
> +	return changed;
>  }
>  
>  static void pcibios_allocate_bus_resources(struct pci_bus *bus)
> @@ -234,8 +249,12 @@ static void pcibios_allocate_bus_resourc
>  	struct pci_bus *child;
>  
>  	/* Depth-First Search on bus tree */
> -	if (bus->self)
> -		pcibios_allocate_bridge_resources(bus->self);
> +	if (bus->self) {
> +		bool changed = pcibios_allocate_bridge_resources(bus->self);
> +
> +		if (changed)
> +			pci_setup_bridge(bus);
> +	}
>  	list_for_each_entry(child, &bus->children, node)
>  		pcibios_allocate_bus_resources(child);
>  }
> Index: linux-2.6/drivers/pci/host-bridge.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/host-bridge.c
> +++ linux-2.6/drivers/pci/host-bridge.c
> @@ -31,6 +31,37 @@ void pci_set_host_bridge_release(struct
>  	bridge->release_data = release_data;
>  }
>  
> +bool pcibios_resource_clip_in_host_bridge(struct pci_bus *bus,
> +					  struct resource *res)
> +{
> +	struct pci_host_bridge *bridge = find_pci_host_bridge(bus);
> +	struct pci_host_bridge_window *window;
> +	resource_size_t start, end;
> +
> +	list_for_each_entry(window, &bridge->windows, list) {
> +		if (resource_type(res) != resource_type(window->res))
> +			continue;
> +
> +		start = max(window->res->start, res->start);
> +		end = min(window->res->end, res->end);
> +
> +		/* no overlap ? */
> +		if (start > end)
> +			continue;
> +
> +		if (res->start == start && res->end == end)
> +			return false;
> +
> +		/* changed */
> +		res->start = start;
> +		res->end = end;

I think it'd be useful to emit a diagnostic here so there's a clue in dmesg
about what's going on.

> +		return true;
> +	}
> +
> +	return false;
> +}
> +
>  void pcibios_resource_to_bus(struct pci_bus *bus, struct pci_bus_region *region,
>  			     struct resource *res)
>  {
> Index: linux-2.6/include/linux/pci.h
> ===================================================================
> --- linux-2.6.orig/include/linux/pci.h
> +++ linux-2.6/include/linux/pci.h
> @@ -762,6 +762,8 @@ void pci_fixup_cardbus(struct pci_bus *)
>  
>  /* Generic PCI functions used internally */
>  
> +bool pcibios_resource_clip_in_host_bridge(struct pci_bus *bus,
> +					  struct resource *res);
>  void pcibios_resource_to_bus(struct pci_bus *bus, struct pci_bus_region *region,
>  			     struct resource *res);
>  void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-23 21:41                                 ` Bjorn Helgaas
@ 2014-12-24  2:29                                   ` Yinghai Lu
  2014-12-24 17:26                                     ` Bjorn Helgaas
  2014-12-25 15:46                                     ` Marek Kordík
  0 siblings, 2 replies; 37+ messages in thread
From: Yinghai Lu @ 2014-12-24  2:29 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Marek Kordík, Gavin Shan, Benjamin Herrenschmidt, linux-pci,
	Alexey Voronkov, David Airlie, Alex Deucher, Richard Yang

[-- Attachment #1: Type: text/plain, Size: 655 bytes --]

On Tue, Dec 23, 2014 at 1:41 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:

>
> Can you do this so it clips to any upstream bridge, whether it's a host
> bridge or a PCI-to-PCI bridge?  It doesn't seem like this has to be
> specific to host bridge windows -- any device below a PCI-to-PCI bridge has
> to use address space forwarded by that upstream bridge.

ok, please check attached v2.

>
> I wish this were in generic code.  There's nothing architecture-specific
> about the problem, but this patch only fixes things on x86.  Can you make a
> similar change in the corresponding code for other arches, as well?

ok, later after x86.

Thanks

Yinghai

[-- Attachment #2: pci_bridge_clip_v2.patch --]
[-- Type: text/x-patch, Size: 5536 bytes --]

Subject: [RFC PATCH -v2] PCI, x86: clip firmware assigned pci bridges under hostbridge

Some bios put range that is not fully coverred by root bus resources.
Try to clip them and update them in pci bridge bars.

-v2: check with upstream bridge instead of host bridge only.
     also add debug print out.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491
Reported-by: Marek Kordik <kordikmarek@gmail.com>
Fixes: 5b28541552ef ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/pci/i386.c |   81 +++++++++++++++++++++++++++++++++++++---------------
 drivers/pci/bus.c   |   35 ++++++++++++++++++++++
 include/linux/pci.h |    1 
 3 files changed, 95 insertions(+), 22 deletions(-)

Index: linux-2.6/arch/x86/pci/i386.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/i386.c
+++ linux-2.6/arch/x86/pci/i386.c
@@ -205,28 +205,48 @@ EXPORT_SYMBOL(pcibios_align_resource);
  *	    as well.
  */
 
-static void pcibios_allocate_bridge_resources(struct pci_dev *dev)
+static bool pcibios_allocate_bridge_resources(struct pci_dev *dev)
 {
 	int idx;
 	struct resource *r;
+	bool changed = false;
 
 	for (idx = PCI_BRIDGE_RESOURCES; idx < PCI_NUM_RESOURCES; idx++) {
+		int ret;
+
 		r = &dev->resource[idx];
 		if (!r->flags)
 			continue;
 		if (r->parent)	/* Already allocated */
 			continue;
-		if (!r->start || pci_claim_resource(dev, idx) < 0) {
-			/*
-			 * Something is wrong with the region.
-			 * Invalidate the resource to prevent
-			 * child resource allocations in this
-			 * range.
-			 */
-			r->start = r->end = 0;
-			r->flags = 0;
+
+		if (!r->start)
+			goto clear;
+
+		ret = pci_claim_resource(dev, idx);
+		if (ret >= 0)
+			continue;
+
+		/* try again after clip for pci bridge*/
+		if ((dev->class >> 8) == PCI_CLASS_BRIDGE_PCI &&
+		    pci_bus_clip_resource(dev, r)) {
+			changed = true;
+			if (pci_claim_resource(dev, idx) >= 0)
+				continue;
 		}
+
+clear:
+		/*
+		 * Something is wrong with the region.
+		 * Invalidate the resource to prevent
+		 * child resource allocations in this
+		 * range.
+		 */
+		r->start = r->end = 0;
+		r->flags = 0;
 	}
+
+	return changed;
 }
 
 static void pcibios_allocate_bus_resources(struct pci_bus *bus)
@@ -234,8 +254,12 @@ static void pcibios_allocate_bus_resourc
 	struct pci_bus *child;
 
 	/* Depth-First Search on bus tree */
-	if (bus->self)
-		pcibios_allocate_bridge_resources(bus->self);
+	if (bus->self) {
+		bool changed = pcibios_allocate_bridge_resources(bus->self);
+
+		if (changed)
+			pci_setup_bridge(bus);
+	}
 	list_for_each_entry(child, &bus->children, node)
 		pcibios_allocate_bus_resources(child);
 }
@@ -271,21 +295,34 @@ static void pcibios_allocate_dev_resourc
 			else
 				disabled = !(command & PCI_COMMAND_MEMORY);
 			if (pass == disabled) {
+				int ret;
+
 				dev_dbg(&dev->dev,
 					"BAR %d: reserving %pr (d=%d, p=%d)\n",
 					idx, r, disabled, pass);
-				if (pci_claim_resource(dev, idx) < 0) {
-					if (r->flags & IORESOURCE_PCI_FIXED) {
-						dev_info(&dev->dev, "BAR %d %pR is immovable\n",
+
+				ret = pci_claim_resource(dev, idx);
+
+				if (ret >= 0)
+					continue;
+
+				if (r->flags & IORESOURCE_PCI_FIXED) {
+					dev_info(&dev->dev, "BAR %d %pR is immovable\n",
 							 idx, r);
-					} else {
-						/* We'll assign a new address later */
-						pcibios_save_fw_addr(dev,
-								idx, r->start);
-						r->end -= r->start;
-						r->start = 0;
-					}
+					continue;
+				}
+
+				/* try again with clip */
+				if (pci_bus_clip_resource(dev, r)) {
+					pci_update_resource(dev, idx);
+					if (pci_claim_resource(dev, idx) >= 0)
+						continue;
 				}
+
+				/* We'll assign a new address later */
+				pcibios_save_fw_addr(dev, idx, r->start);
+				r->end -= r->start;
+				r->start = 0;
 			}
 		}
 	if (!pass) {
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -1094,6 +1094,7 @@ void pci_free_resource_list(struct list_
 void pci_bus_add_resource(struct pci_bus *bus, struct resource *res, unsigned int flags);
 struct resource *pci_bus_resource_n(const struct pci_bus *bus, int n);
 void pci_bus_remove_resources(struct pci_bus *bus);
+bool pci_bus_clip_resource(struct pci_dev *dev, struct resource *res);
 
 #define pci_bus_for_each_resource(bus, res, i)				\
 	for (i = 0;							\
Index: linux-2.6/drivers/pci/bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/bus.c
+++ linux-2.6/drivers/pci/bus.c
@@ -228,6 +228,41 @@ int pci_bus_alloc_resource(struct pci_bu
 }
 EXPORT_SYMBOL(pci_bus_alloc_resource);
 
+bool pci_bus_clip_resource(struct pci_dev *dev, struct resource *res)
+{
+	struct pci_bus *bus = dev->bus;
+	resource_size_t start, end;
+	struct resource orig_res = *res;
+	struct resource *r;
+	int i;
+
+	pci_bus_for_each_resource(bus, r, i) {
+
+		if (resource_type(res) != resource_type(r))
+			continue;
+
+		start = max(r->start, res->start);
+		end = min(r->end, res->end);
+
+		/* no overlap ? */
+		if (start > end)
+			continue;
+
+		if (res->start == start && res->end == end)
+			return false;
+
+		/* changed */
+		res->start = start;
+		res->end = end;
+		dev_printk(KERN_DEBUG, &dev->dev, "%pR ==> %pR\n",
+				 &orig_res, res);
+
+		return true;
+	}
+
+	return false;
+}
+
 void __weak pcibios_resource_survey_bus(struct pci_bus *bus) { }
 
 /**

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-24  2:29                                   ` Yinghai Lu
@ 2014-12-24 17:26                                     ` Bjorn Helgaas
  2014-12-24 20:29                                       ` Yinghai Lu
  2014-12-25 15:46                                     ` Marek Kordík
  1 sibling, 1 reply; 37+ messages in thread
From: Bjorn Helgaas @ 2014-12-24 17:26 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Marek Kordík, Gavin Shan, Benjamin Herrenschmidt, linux-pci,
	Alexey Voronkov, David Airlie, Alex Deucher, Richard Yang

On Tue, Dec 23, 2014 at 06:29:20PM -0800, Yinghai Lu wrote:
> On Tue, Dec 23, 2014 at 1:41 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> 
> >
> > Can you do this so it clips to any upstream bridge, whether it's a host
> > bridge or a PCI-to-PCI bridge?  It doesn't seem like this has to be
> > specific to host bridge windows -- any device below a PCI-to-PCI bridge has
> > to use address space forwarded by that upstream bridge.
> 
> ok, please check attached v2.
> 
> >
> > I wish this were in generic code.  There's nothing architecture-specific
> > about the problem, but this patch only fixes things on x86.  Can you make a
> > similar change in the corresponding code for other arches, as well?
> 
> ok, later after x86.

OK, how about if we test it on x86, and if it works, you fix the other
arches, and then I'll apply it for all arches at the same time?

We are trying to fix a regression, but I don't think there's any reason to
make the arches diverge for a relatively trivial change like this.

> Thanks
> 
> Yinghai

> Subject: [RFC PATCH -v2] PCI, x86: clip firmware assigned pci bridges under hostbridge
> 
> Some bios put range that is not fully coverred by root bus resources.
> Try to clip them and update them in pci bridge bars.
> 
> -v2: check with upstream bridge instead of host bridge only.
>      also add debug print out.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491
> Reported-by: Marek Kordik <kordikmarek@gmail.com>
> Fixes: 5b28541552ef ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> 
> ---
>  arch/x86/pci/i386.c |   81 +++++++++++++++++++++++++++++++++++++---------------
>  drivers/pci/bus.c   |   35 ++++++++++++++++++++++
>  include/linux/pci.h |    1 
>  3 files changed, 95 insertions(+), 22 deletions(-)
> 
> Index: linux-2.6/arch/x86/pci/i386.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/pci/i386.c
> +++ linux-2.6/arch/x86/pci/i386.c
> @@ -205,28 +205,48 @@ EXPORT_SYMBOL(pcibios_align_resource);
>   *	    as well.
>   */
>  
> -static void pcibios_allocate_bridge_resources(struct pci_dev *dev)
> +static bool pcibios_allocate_bridge_resources(struct pci_dev *dev)
>  {
>  	int idx;
>  	struct resource *r;
> +	bool changed = false;
>  
>  	for (idx = PCI_BRIDGE_RESOURCES; idx < PCI_NUM_RESOURCES; idx++) {
> +		int ret;
> +
>  		r = &dev->resource[idx];
>  		if (!r->flags)
>  			continue;
>  		if (r->parent)	/* Already allocated */
>  			continue;
> -		if (!r->start || pci_claim_resource(dev, idx) < 0) {
> -			/*
> -			 * Something is wrong with the region.
> -			 * Invalidate the resource to prevent
> -			 * child resource allocations in this
> -			 * range.
> -			 */
> -			r->start = r->end = 0;
> -			r->flags = 0;
> +
> +		if (!r->start)
> +			goto clear;
> +
> +		ret = pci_claim_resource(dev, idx);
> +		if (ret >= 0)
> +			continue;
> +
> +		/* try again after clip for pci bridge*/
> +		if ((dev->class >> 8) == PCI_CLASS_BRIDGE_PCI &&
> +		    pci_bus_clip_resource(dev, r)) {
> +			changed = true;
> +			if (pci_claim_resource(dev, idx) >= 0)
> +				continue;
>  		}
> +
> +clear:
> +		/*
> +		 * Something is wrong with the region.
> +		 * Invalidate the resource to prevent
> +		 * child resource allocations in this
> +		 * range.
> +		 */
> +		r->start = r->end = 0;
> +		r->flags = 0;
>  	}
> +
> +	return changed;
>  }
>  
>  static void pcibios_allocate_bus_resources(struct pci_bus *bus)
> @@ -234,8 +254,12 @@ static void pcibios_allocate_bus_resourc
>  	struct pci_bus *child;
>  
>  	/* Depth-First Search on bus tree */
> -	if (bus->self)
> -		pcibios_allocate_bridge_resources(bus->self);
> +	if (bus->self) {
> +		bool changed = pcibios_allocate_bridge_resources(bus->self);
> +
> +		if (changed)
> +			pci_setup_bridge(bus);
> +	}
>  	list_for_each_entry(child, &bus->children, node)
>  		pcibios_allocate_bus_resources(child);
>  }
> @@ -271,21 +295,34 @@ static void pcibios_allocate_dev_resourc
>  			else
>  				disabled = !(command & PCI_COMMAND_MEMORY);
>  			if (pass == disabled) {
> +				int ret;
> +
>  				dev_dbg(&dev->dev,
>  					"BAR %d: reserving %pr (d=%d, p=%d)\n",
>  					idx, r, disabled, pass);
> -				if (pci_claim_resource(dev, idx) < 0) {
> -					if (r->flags & IORESOURCE_PCI_FIXED) {
> -						dev_info(&dev->dev, "BAR %d %pR is immovable\n",
> +
> +				ret = pci_claim_resource(dev, idx);
> +
> +				if (ret >= 0)
> +					continue;
> +
> +				if (r->flags & IORESOURCE_PCI_FIXED) {
> +					dev_info(&dev->dev, "BAR %d %pR is immovable\n",
>  							 idx, r);
> -					} else {
> -						/* We'll assign a new address later */
> -						pcibios_save_fw_addr(dev,
> -								idx, r->start);
> -						r->end -= r->start;
> -						r->start = 0;
> -					}
> +					continue;
> +				}
> +
> +				/* try again with clip */
> +				if (pci_bus_clip_resource(dev, r)) {
> +					pci_update_resource(dev, idx);
> +					if (pci_claim_resource(dev, idx) >= 0)
> +						continue;
>  				}
> +
> +				/* We'll assign a new address later */
> +				pcibios_save_fw_addr(dev, idx, r->start);
> +				r->end -= r->start;
> +				r->start = 0;
>  			}
>  		}
>  	if (!pass) {
> Index: linux-2.6/include/linux/pci.h
> ===================================================================
> --- linux-2.6.orig/include/linux/pci.h
> +++ linux-2.6/include/linux/pci.h
> @@ -1094,6 +1094,7 @@ void pci_free_resource_list(struct list_
>  void pci_bus_add_resource(struct pci_bus *bus, struct resource *res, unsigned int flags);
>  struct resource *pci_bus_resource_n(const struct pci_bus *bus, int n);
>  void pci_bus_remove_resources(struct pci_bus *bus);
> +bool pci_bus_clip_resource(struct pci_dev *dev, struct resource *res);
>  
>  #define pci_bus_for_each_resource(bus, res, i)				\
>  	for (i = 0;							\
> Index: linux-2.6/drivers/pci/bus.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/bus.c
> +++ linux-2.6/drivers/pci/bus.c
> @@ -228,6 +228,41 @@ int pci_bus_alloc_resource(struct pci_bu
>  }
>  EXPORT_SYMBOL(pci_bus_alloc_resource);
>  
> +bool pci_bus_clip_resource(struct pci_dev *dev, struct resource *res)
> +{
> +	struct pci_bus *bus = dev->bus;
> +	resource_size_t start, end;
> +	struct resource orig_res = *res;
> +	struct resource *r;
> +	int i;
> +
> +	pci_bus_for_each_resource(bus, r, i) {
> +
> +		if (resource_type(res) != resource_type(r))
> +			continue;
> +
> +		start = max(r->start, res->start);
> +		end = min(r->end, res->end);
> +
> +		/* no overlap ? */
> +		if (start > end)
> +			continue;
> +
> +		if (res->start == start && res->end == end)
> +			return false;
> +
> +		/* changed */
> +		res->start = start;
> +		res->end = end;
> +		dev_printk(KERN_DEBUG, &dev->dev, "%pR ==> %pR\n",
> +				 &orig_res, res);
> +
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
>  void __weak pcibios_resource_survey_bus(struct pci_bus *bus) { }
>  
>  /**


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-24 17:26                                     ` Bjorn Helgaas
@ 2014-12-24 20:29                                       ` Yinghai Lu
  0 siblings, 0 replies; 37+ messages in thread
From: Yinghai Lu @ 2014-12-24 20:29 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Marek Kordík, Gavin Shan, Benjamin Herrenschmidt, linux-pci,
	Alexey Voronkov, David Airlie, Alex Deucher, Richard Yang

On Wed, Dec 24, 2014 at 9:26 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Tue, Dec 23, 2014 at 06:29:20PM -0800, Yinghai Lu wrote:
>
> OK, how about if we test it on x86, and if it works, you fix the other
> arches, and then I'll apply it for all arches at the same time?

Good to me.

>
> We are trying to fix a regression, but I don't think there's any reason to
> make the arches diverge for a relatively trivial change like this.

Sure.

Yinghai

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-24  2:29                                   ` Yinghai Lu
  2014-12-24 17:26                                     ` Bjorn Helgaas
@ 2014-12-25 15:46                                     ` Marek Kordík
  2014-12-25 21:12                                       ` Yinghai Lu
  1 sibling, 1 reply; 37+ messages in thread
From: Marek Kordík @ 2014-12-25 15:46 UTC (permalink / raw)
  To: Yinghai Lu, Bjorn Helgaas
  Cc: Gavin Shan, Benjamin Herrenschmidt, linux-pci, Alexey Voronkov,
	David Airlie, Alex Deucher, Richard Yang

On 12/24/2014 03:29 AM, Yinghai Lu wrote:
> On Tue, Dec 23, 2014 at 1:41 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>
>>
>> Can you do this so it clips to any upstream bridge, whether it's a host
>> bridge or a PCI-to-PCI bridge?  It doesn't seem like this has to be
>> specific to host bridge windows -- any device below a PCI-to-PCI bridge has
>> to use address space forwarded by that upstream bridge.
>
> ok, please check attached v2.
>
>>
>> I wish this were in generic code.  There's nothing architecture-specific
>> about the problem, but this patch only fixes things on x86.  Can you make a
>> similar change in the corresponding code for other arches, as well?
>
> ok, later after x86.
>
> Thanks
>
> Yinghai
>

I checked (only) v2 of your patch on versions 3.18.0 and 3.18.1. The 
result of both versions is that booting stops with kernel panic. I don't 
how to get the full log (there is nothing in journalctl and dmesg after 
booting some working kernel - maybe if you can give me some hint what to 
google) so I will attach photo of screen to the bugzilla issue.

Marek

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-25 15:46                                     ` Marek Kordík
@ 2014-12-25 21:12                                       ` Yinghai Lu
  2015-01-09 18:45                                         ` Bjorn Helgaas
  0 siblings, 1 reply; 37+ messages in thread
From: Yinghai Lu @ 2014-12-25 21:12 UTC (permalink / raw)
  To: Marek Kordík
  Cc: Bjorn Helgaas, Gavin Shan, Benjamin Herrenschmidt, linux-pci,
	Alexey Voronkov, David Airlie, Alex Deucher, Richard Yang

[-- Attachment #1: Type: text/plain, Size: 609 bytes --]

On Thu, Dec 25, 2014 at 7:46 AM, Marek Kordík <kordikmarek@gmail.com> wrote:
> On 12/24/2014 03:29 AM, Yinghai Lu wrote:

> I checked (only) v2 of your patch on versions 3.18.0 and 3.18.1. The result
> of both versions is that booting stops with kernel panic. I don't how to get
> the full log (there is nothing in journalctl and dmesg after booting some
> working kernel - maybe if you can give me some hint what to google) so I
> will attach photo of screen to the bugzilla issue.

Please check update one.

but i still can not understand why do we need

if (!r)
   continue;

checking...

[-- Attachment #2: pci_bridge_clip_v3.patch --]
[-- Type: text/x-patch, Size: 5601 bytes --]

Subject: [PATCH v3] PCI, x86: clip firmware assigned pci bridges under hostbridge

Some bios put range that is not fully coverred by root bus resources.
Try to clip them and update them in pci bridge bars.

-v2: check with upstream bridge instead of host bridge only.
     also add debug print out.
-v3: fix checking under pci_bus_for_each...

Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491
Reported-by: Marek Kordik <kordikmarek@gmail.com>
Fixes: 5b28541552ef ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/pci/i386.c |   81 +++++++++++++++++++++++++++++++++++++---------------
 drivers/pci/bus.c   |   37 +++++++++++++++++++++++
 include/linux/pci.h |    1 
 3 files changed, 97 insertions(+), 22 deletions(-)

Index: linux-2.6/arch/x86/pci/i386.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/i386.c
+++ linux-2.6/arch/x86/pci/i386.c
@@ -205,28 +205,48 @@ EXPORT_SYMBOL(pcibios_align_resource);
  *	    as well.
  */
 
-static void pcibios_allocate_bridge_resources(struct pci_dev *dev)
+static bool pcibios_allocate_bridge_resources(struct pci_dev *dev)
 {
 	int idx;
 	struct resource *r;
+	bool changed = false;
 
 	for (idx = PCI_BRIDGE_RESOURCES; idx < PCI_NUM_RESOURCES; idx++) {
+		int ret;
+
 		r = &dev->resource[idx];
 		if (!r->flags)
 			continue;
 		if (r->parent)	/* Already allocated */
 			continue;
-		if (!r->start || pci_claim_resource(dev, idx) < 0) {
-			/*
-			 * Something is wrong with the region.
-			 * Invalidate the resource to prevent
-			 * child resource allocations in this
-			 * range.
-			 */
-			r->start = r->end = 0;
-			r->flags = 0;
+
+		if (!r->start)
+			goto clear;
+
+		ret = pci_claim_resource(dev, idx);
+		if (ret >= 0)
+			continue;
+
+		/* try again after clip for pci bridge*/
+		if ((dev->class >> 8) == PCI_CLASS_BRIDGE_PCI &&
+		    pci_bus_clip_resource(dev, r)) {
+			changed = true;
+			if (pci_claim_resource(dev, idx) >= 0)
+				continue;
 		}
+
+clear:
+		/*
+		 * Something is wrong with the region.
+		 * Invalidate the resource to prevent
+		 * child resource allocations in this
+		 * range.
+		 */
+		r->start = r->end = 0;
+		r->flags = 0;
 	}
+
+	return changed;
 }
 
 static void pcibios_allocate_bus_resources(struct pci_bus *bus)
@@ -234,8 +254,12 @@ static void pcibios_allocate_bus_resourc
 	struct pci_bus *child;
 
 	/* Depth-First Search on bus tree */
-	if (bus->self)
-		pcibios_allocate_bridge_resources(bus->self);
+	if (bus->self) {
+		bool changed = pcibios_allocate_bridge_resources(bus->self);
+
+		if (changed)
+			pci_setup_bridge(bus);
+	}
 	list_for_each_entry(child, &bus->children, node)
 		pcibios_allocate_bus_resources(child);
 }
@@ -271,21 +295,34 @@ static void pcibios_allocate_dev_resourc
 			else
 				disabled = !(command & PCI_COMMAND_MEMORY);
 			if (pass == disabled) {
+				int ret;
+
 				dev_dbg(&dev->dev,
 					"BAR %d: reserving %pr (d=%d, p=%d)\n",
 					idx, r, disabled, pass);
-				if (pci_claim_resource(dev, idx) < 0) {
-					if (r->flags & IORESOURCE_PCI_FIXED) {
-						dev_info(&dev->dev, "BAR %d %pR is immovable\n",
+
+				ret = pci_claim_resource(dev, idx);
+
+				if (ret >= 0)
+					continue;
+
+				if (r->flags & IORESOURCE_PCI_FIXED) {
+					dev_info(&dev->dev, "BAR %d %pR is immovable\n",
 							 idx, r);
-					} else {
-						/* We'll assign a new address later */
-						pcibios_save_fw_addr(dev,
-								idx, r->start);
-						r->end -= r->start;
-						r->start = 0;
-					}
+					continue;
+				}
+
+				/* try again with clip */
+				if (pci_bus_clip_resource(dev, r)) {
+					pci_update_resource(dev, idx);
+					if (pci_claim_resource(dev, idx) >= 0)
+						continue;
 				}
+
+				/* We'll assign a new address later */
+				pcibios_save_fw_addr(dev, idx, r->start);
+				r->end -= r->start;
+				r->start = 0;
 			}
 		}
 	if (!pass) {
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -1094,6 +1094,7 @@ void pci_free_resource_list(struct list_
 void pci_bus_add_resource(struct pci_bus *bus, struct resource *res, unsigned int flags);
 struct resource *pci_bus_resource_n(const struct pci_bus *bus, int n);
 void pci_bus_remove_resources(struct pci_bus *bus);
+bool pci_bus_clip_resource(struct pci_dev *dev, struct resource *res);
 
 #define pci_bus_for_each_resource(bus, res, i)				\
 	for (i = 0;							\
Index: linux-2.6/drivers/pci/bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/bus.c
+++ linux-2.6/drivers/pci/bus.c
@@ -228,6 +228,43 @@ int pci_bus_alloc_resource(struct pci_bu
 }
 EXPORT_SYMBOL(pci_bus_alloc_resource);
 
+bool pci_bus_clip_resource(struct pci_dev *dev, struct resource *res)
+{
+	struct pci_bus *bus = dev->bus;
+	resource_size_t start, end;
+	struct resource orig_res = *res;
+	struct resource *r;
+	int i;
+
+	pci_bus_for_each_resource(bus, r, i) {
+		if (!r)
+			continue;
+
+		if (resource_type(res) != resource_type(r))
+			continue;
+
+		start = max(r->start, res->start);
+		end = min(r->end, res->end);
+
+		/* no overlap ? */
+		if (start > end)
+			continue;
+
+		if (res->start == start && res->end == end)
+			return false;
+
+		/* changed */
+		res->start = start;
+		res->end = end;
+		dev_printk(KERN_DEBUG, &dev->dev, "%pR ==> %pR\n",
+				 &orig_res, res);
+
+		return true;
+	}
+
+	return false;
+}
+
 void __weak pcibios_resource_survey_bus(struct pci_bus *bus) { }
 
 /**

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2014-12-25 21:12                                       ` Yinghai Lu
@ 2015-01-09 18:45                                         ` Bjorn Helgaas
  2015-01-09 20:38                                           ` Yinghai Lu
  0 siblings, 1 reply; 37+ messages in thread
From: Bjorn Helgaas @ 2015-01-09 18:45 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Marek Kordík, Gavin Shan, Benjamin Herrenschmidt, linux-pci,
	Alexey Voronkov, David Airlie, Alex Deucher, Richard Yang

On Thu, Dec 25, 2014 at 01:12:47PM -0800, Yinghai Lu wrote:
> On Thu, Dec 25, 2014 at 7:46 AM, Marek Kordík <kordikmarek@gmail.com> wrote:
> > On 12/24/2014 03:29 AM, Yinghai Lu wrote:
> 
> > I checked (only) v2 of your patch on versions 3.18.0 and 3.18.1. The result
> > of both versions is that booting stops with kernel panic. I don't how to get
> > the full log (there is nothing in journalctl and dmesg after booting some
> > working kernel - maybe if you can give me some hint what to google) so I
> > will attach photo of screen to the bugzilla issue.
> 
> Please check update one.
> 
> but i still can not understand why do we need
> 
> if (!r)
>    continue;
> 
> checking...

> Subject: [PATCH v3] PCI, x86: clip firmware assigned pci bridges under hostbridge
> 
> Some bios put range that is not fully coverred by root bus resources.
> Try to clip them and update them in pci bridge bars.
> 
> -v2: check with upstream bridge instead of host bridge only.
>      also add debug print out.
> -v3: fix checking under pci_bus_for_each...
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491
> Reported-by: Marek Kordik <kordikmarek@gmail.com>
> Fixes: 5b28541552ef ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>

This is becoming urgent.

Do we have a patch that fixes the problem on x86?  I see that v2 failed for
Marek, and I don't see any test results for v3.

Also, my intent is for Marek or others to test this and verify that it
works on x86, and then you extend the patch to cover all architectures, and
then I'll merge them all.  I do not intend to merge the x86 change by
itself before the changes to other arches, because if I do that, the other
arches will probably never get fixed.

My only other option is to revert 5b28541552ef.  That would break stuff on
powerpc, so I don't want to do it, but we need to resolve this soon.

Bjorn

> ---
>  arch/x86/pci/i386.c |   81 +++++++++++++++++++++++++++++++++++++---------------
>  drivers/pci/bus.c   |   37 +++++++++++++++++++++++
>  include/linux/pci.h |    1 
>  3 files changed, 97 insertions(+), 22 deletions(-)
> 
> Index: linux-2.6/arch/x86/pci/i386.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/pci/i386.c
> +++ linux-2.6/arch/x86/pci/i386.c
> @@ -205,28 +205,48 @@ EXPORT_SYMBOL(pcibios_align_resource);
>   *	    as well.
>   */
>  
> -static void pcibios_allocate_bridge_resources(struct pci_dev *dev)
> +static bool pcibios_allocate_bridge_resources(struct pci_dev *dev)
>  {
>  	int idx;
>  	struct resource *r;
> +	bool changed = false;
>  
>  	for (idx = PCI_BRIDGE_RESOURCES; idx < PCI_NUM_RESOURCES; idx++) {
> +		int ret;
> +
>  		r = &dev->resource[idx];
>  		if (!r->flags)
>  			continue;
>  		if (r->parent)	/* Already allocated */
>  			continue;
> -		if (!r->start || pci_claim_resource(dev, idx) < 0) {
> -			/*
> -			 * Something is wrong with the region.
> -			 * Invalidate the resource to prevent
> -			 * child resource allocations in this
> -			 * range.
> -			 */
> -			r->start = r->end = 0;
> -			r->flags = 0;
> +
> +		if (!r->start)
> +			goto clear;
> +
> +		ret = pci_claim_resource(dev, idx);
> +		if (ret >= 0)
> +			continue;
> +
> +		/* try again after clip for pci bridge*/
> +		if ((dev->class >> 8) == PCI_CLASS_BRIDGE_PCI &&
> +		    pci_bus_clip_resource(dev, r)) {
> +			changed = true;
> +			if (pci_claim_resource(dev, idx) >= 0)
> +				continue;
>  		}
> +
> +clear:
> +		/*
> +		 * Something is wrong with the region.
> +		 * Invalidate the resource to prevent
> +		 * child resource allocations in this
> +		 * range.
> +		 */
> +		r->start = r->end = 0;
> +		r->flags = 0;
>  	}
> +
> +	return changed;
>  }
>  
>  static void pcibios_allocate_bus_resources(struct pci_bus *bus)
> @@ -234,8 +254,12 @@ static void pcibios_allocate_bus_resourc
>  	struct pci_bus *child;
>  
>  	/* Depth-First Search on bus tree */
> -	if (bus->self)
> -		pcibios_allocate_bridge_resources(bus->self);
> +	if (bus->self) {
> +		bool changed = pcibios_allocate_bridge_resources(bus->self);
> +
> +		if (changed)
> +			pci_setup_bridge(bus);
> +	}
>  	list_for_each_entry(child, &bus->children, node)
>  		pcibios_allocate_bus_resources(child);
>  }
> @@ -271,21 +295,34 @@ static void pcibios_allocate_dev_resourc
>  			else
>  				disabled = !(command & PCI_COMMAND_MEMORY);
>  			if (pass == disabled) {
> +				int ret;
> +
>  				dev_dbg(&dev->dev,
>  					"BAR %d: reserving %pr (d=%d, p=%d)\n",
>  					idx, r, disabled, pass);
> -				if (pci_claim_resource(dev, idx) < 0) {
> -					if (r->flags & IORESOURCE_PCI_FIXED) {
> -						dev_info(&dev->dev, "BAR %d %pR is immovable\n",
> +
> +				ret = pci_claim_resource(dev, idx);
> +
> +				if (ret >= 0)
> +					continue;
> +
> +				if (r->flags & IORESOURCE_PCI_FIXED) {
> +					dev_info(&dev->dev, "BAR %d %pR is immovable\n",
>  							 idx, r);
> -					} else {
> -						/* We'll assign a new address later */
> -						pcibios_save_fw_addr(dev,
> -								idx, r->start);
> -						r->end -= r->start;
> -						r->start = 0;
> -					}
> +					continue;
> +				}
> +
> +				/* try again with clip */
> +				if (pci_bus_clip_resource(dev, r)) {
> +					pci_update_resource(dev, idx);
> +					if (pci_claim_resource(dev, idx) >= 0)
> +						continue;
>  				}
> +
> +				/* We'll assign a new address later */
> +				pcibios_save_fw_addr(dev, idx, r->start);
> +				r->end -= r->start;
> +				r->start = 0;
>  			}
>  		}
>  	if (!pass) {
> Index: linux-2.6/include/linux/pci.h
> ===================================================================
> --- linux-2.6.orig/include/linux/pci.h
> +++ linux-2.6/include/linux/pci.h
> @@ -1094,6 +1094,7 @@ void pci_free_resource_list(struct list_
>  void pci_bus_add_resource(struct pci_bus *bus, struct resource *res, unsigned int flags);
>  struct resource *pci_bus_resource_n(const struct pci_bus *bus, int n);
>  void pci_bus_remove_resources(struct pci_bus *bus);
> +bool pci_bus_clip_resource(struct pci_dev *dev, struct resource *res);
>  
>  #define pci_bus_for_each_resource(bus, res, i)				\
>  	for (i = 0;							\
> Index: linux-2.6/drivers/pci/bus.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/bus.c
> +++ linux-2.6/drivers/pci/bus.c
> @@ -228,6 +228,43 @@ int pci_bus_alloc_resource(struct pci_bu
>  }
>  EXPORT_SYMBOL(pci_bus_alloc_resource);
>  
> +bool pci_bus_clip_resource(struct pci_dev *dev, struct resource *res)
> +{
> +	struct pci_bus *bus = dev->bus;
> +	resource_size_t start, end;
> +	struct resource orig_res = *res;
> +	struct resource *r;
> +	int i;
> +
> +	pci_bus_for_each_resource(bus, r, i) {
> +		if (!r)
> +			continue;
> +
> +		if (resource_type(res) != resource_type(r))
> +			continue;
> +
> +		start = max(r->start, res->start);
> +		end = min(r->end, res->end);
> +
> +		/* no overlap ? */
> +		if (start > end)
> +			continue;
> +
> +		if (res->start == start && res->end == end)
> +			return false;
> +
> +		/* changed */
> +		res->start = start;
> +		res->end = end;
> +		dev_printk(KERN_DEBUG, &dev->dev, "%pR ==> %pR\n",
> +				 &orig_res, res);
> +
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
>  void __weak pcibios_resource_survey_bus(struct pci_bus *bus) { }
>  
>  /**


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init
  2015-01-09 18:45                                         ` Bjorn Helgaas
@ 2015-01-09 20:38                                           ` Yinghai Lu
  0 siblings, 0 replies; 37+ messages in thread
From: Yinghai Lu @ 2015-01-09 20:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Marek Kordík, Gavin Shan, Benjamin Herrenschmidt, linux-pci,
	Alexey Voronkov, David Airlie, Alex Deucher, Richard Yang

On Fri, Jan 9, 2015 at 10:45 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Thu, Dec 25, 2014 at 01:12:47PM -0800, Yinghai Lu wrote:
> Do we have a patch that fixes the problem on x86?  I see that v2 failed for
> Marek, and I don't see any test results for v3.


Yes. v3 works for Marek.

>
> Also, my intent is for Marek or others to test this and verify that it
> works on x86, and then you extend the patch to cover all architectures, and
> then I'll merge them all.  I do not intend to merge the x86 change by
> itself before the changes to other arches, because if I do that, the other
> arches will probably never get fixed.

Ok, will go over other arch that ...

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2015-01-09 20:38 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-03 22:15 Regression: bug 85491: radeon 0000:01:00.0: Fatal error during GPU init Bjorn Helgaas
2014-12-04  0:51 ` Bjorn Helgaas
2014-12-04  1:44   ` Yinghai Lu
2014-12-04 17:34     ` Bjorn Helgaas
2014-12-04 22:06       ` Yinghai Lu
2014-12-05 10:59         ` Benjamin Herrenschmidt
2014-12-08  0:47           ` Gavin Shan
2014-12-08 21:04             ` Bjorn Helgaas
2014-12-08 21:38               ` Benjamin Herrenschmidt
2014-12-08 21:46                 ` Bjorn Helgaas
2014-12-08 23:38                   ` Gavin Shan
2014-12-09  0:21                     ` Yinghai Lu
2014-12-19 22:17                       ` Bjorn Helgaas
2014-12-20  0:23                       ` Bjorn Helgaas
2014-12-20  0:34                         ` Yinghai Lu
2014-12-20  0:56                           ` Bjorn Helgaas
2014-12-20  1:33                             ` Yinghai Lu
2014-12-23  1:53                               ` Yinghai Lu
2014-12-23 21:41                                 ` Bjorn Helgaas
2014-12-24  2:29                                   ` Yinghai Lu
2014-12-24 17:26                                     ` Bjorn Helgaas
2014-12-24 20:29                                       ` Yinghai Lu
2014-12-25 15:46                                     ` Marek Kordík
2014-12-25 21:12                                       ` Yinghai Lu
2015-01-09 18:45                                         ` Bjorn Helgaas
2015-01-09 20:38                                           ` Yinghai Lu
2014-12-22  8:17                           ` Wei Yang
2014-12-22 19:50                             ` Bjorn Helgaas
2014-12-04  1:38 ` Yinghai Lu
2014-12-04  1:41   ` Yinghai Lu
2014-12-04 16:15     ` Bjorn Helgaas
2014-12-04 22:15       ` Yinghai Lu
2014-12-04 22:25         ` Bjorn Helgaas
2014-12-04  2:01 ` Yinghai Lu
2014-12-04 16:37   ` Bjorn Helgaas
2014-12-04 22:24     ` Yinghai Lu
2014-12-05  0:24       ` Yinghai Lu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.