From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 108521] RX 580 as eGPU amdgpu: gpu post error!
Date: Fri, 26 Oct 2018 04:42:39 +0000
Message-ID: <bug-108521-502-9ykRz0H1do@http.bugs.freedesktop.org/>
In-Reply-To: <bug-108521-502@http.bugs.freedesktop.org/>


https://bugs.freedesktop.org/show_bug.cgi?id=108521

--- Comment #21 from Robert Strube <rstrube@gmail.com> ---
Hi guys,

Apologies for the deluge of posts here, I've been trying really hard to
investigate this issue!

So I took a closer look at the PCI resource issues that you mentioned.  I've
also been looking at Thunderbolt driver issues in general, and I've noticed
that this type of log message is quite common.  Here's what I'm wondering:

These four devices correspond to the TB-to-PCI bridges in the system:

0000:04:00.0
0000:05:01.0
0000:05:02.0
0000:05:04.0

04:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step)
[Alpine Ridge 4C 2016] (rev 02) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Bus: primary=04, secondary=05, subordinate=6e, sec-latency=0
        Memory behind bridge: bc000000-ea0fffff
        Prefetchable memory behind bridge: 0000002fb0000000-0000002ff9ffffff
        Capabilities: [80] Power Management version 3
        Capabilities: [88] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [ac] Subsystem: Intel Corporation JHL6540 Thunderbolt 3
Bridge (C step) [Alpine Ridge 4C 2016]
        Capabilities: [c0] Express Upstream Port, MSI 00
        Capabilities: [100] Device Serial Number b7-de-04-b0-a6-c9-a0-00
        Capabilities: [200] Advanced Error Reporting
        Capabilities: [300] Virtual Channel
        Capabilities: [400] Power Budgeting <?>
        Capabilities: [500] Vendor Specific Information: ID=1234 Rev=1 Len=0d8
<?>
        Capabilities: [600] Latency Tolerance Reporting
        Capabilities: [700] #19
        Kernel driver in use: pcieport

05:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step)
[Alpine Ridge 4C 2016] (rev 02) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Bus: primary=05, secondary=06, subordinate=06, sec-latency=0
        Memory behind bridge: ea000000-ea0fffff
        Capabilities: [80] Power Management version 3
        Capabilities: [88] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [ac] Subsystem: Intel Corporation JHL6540 Thunderbolt 3
Bridge (C step) [Alpine Ridge 4C 2016]
        Capabilities: [c0] Express Downstream Port (Slot+), MSI 00
        Capabilities: [100] Device Serial Number b7-de-04-b0-a6-c9-a0-00
        Capabilities: [200] Advanced Error Reporting
        Capabilities: [300] Virtual Channel
        Capabilities: [400] Power Budgeting <?>
        Capabilities: [500] Vendor Specific Information: ID=1234 Rev=1 Len=0d8
<?>
        Capabilities: [700] #19
        Kernel driver in use: pcieport

05:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step)
[Alpine Ridge 4C 2016] (rev 02) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Bus: primary=05, secondary=07, subordinate=39, sec-latency=0
        Memory behind bridge: bc000000-d3efffff
        Prefetchable memory behind bridge: 0000002fb0000000-0000002fcfffffff
        Capabilities: [80] Power Management version 3
        Capabilities: [88] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [ac] Subsystem: Intel Corporation JHL6540 Thunderbolt 3
Bridge (C step) [Alpine Ridge 4C 2016]
        Capabilities: [c0] Express Downstream Port (Slot+), MSI 00
        Capabilities: [100] Device Serial Number b7-de-04-b0-a6-c9-a0-00
        Capabilities: [200] Advanced Error Reporting
        Capabilities: [300] Virtual Channel
        Capabilities: [400] Power Budgeting <?>
        Capabilities: [500] Vendor Specific Information: ID=1234 Rev=1 Len=0d8
<?>
        Capabilities: [700] #19
        Kernel driver in use: pcieport

05:02.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step)
[Alpine Ridge 4C 2016] (rev 02) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 18
        Bus: primary=05, secondary=3a, subordinate=3a, sec-latency=0
        Memory behind bridge: d3f00000-d3ffffff
        Capabilities: [80] Power Management version 3
        Capabilities: [88] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [ac] Subsystem: Intel Corporation JHL6540 Thunderbolt 3
Bridge (C step) [Alpine Ridge 4C 2016]
        Capabilities: [c0] Express Downstream Port (Slot+), MSI 00
        Capabilities: [100] Device Serial Number b7-de-04-b0-a6-c9-a0-00
        Capabilities: [200] Advanced Error Reporting
        Capabilities: [300] Virtual Channel
        Capabilities: [400] Power Budgeting <?>
        Capabilities: [500] Vendor Specific Information: ID=1234 Rev=1 Len=0d8
<?>
        Capabilities: [700] #19
        Kernel driver in use: pcieport

05:04.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step)
[Alpine Ridge 4C 2016] (rev 02) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Bus: primary=05, secondary=3b, subordinate=6e, sec-latency=0
        Memory behind bridge: d4000000-e9ffffff
        Prefetchable memory behind bridge: 0000002fd0000000-0000002ff9ffffff
        Capabilities: [80] Power Management version 3
        Capabilities: [88] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [ac] Subsystem: Intel Corporation JHL6540 Thunderbolt 3
Bridge (C step) [Alpine Ridge 4C 2016]
        Capabilities: [c0] Express Downstream Port (Slot+), MSI 00
        Capabilities: [100] Device Serial Number b7-de-04-b0-a6-c9-a0-00
        Capabilities: [200] Advanced Error Reporting
        Capabilities: [300] Virtual Channel
        Capabilities: [400] Power Budgeting <?>
        Capabilities: [500] Vendor Specific Information: ID=1234 Rev=1 Len=0d8
<?>
        Capabilities: [700] #19
        Kernel driver in use: pcieport

First you see the PCI core sizing the bridge windows for the devices:

[  104.290143] pci 0000:05:01.0: bridge window [io  0x1000-0x0fff] to [bus
07-39] add_size 1000
[  104.290152] pci 0000:05:02.0: bridge window [io  0x1000-0x0fff] to [bus 3a]
add_size 1000
[  104.290155] pci 0000:05:02.0: bridge window [mem 0x00100000-0x000fffff 64bit
pref] to [bus 3a] add_size 200000 add_align 100000
[  104.290169] pci 0000:05:04.0: bridge window [io  0x1000-0x0fff] to [bus
3b-6e] add_size 1000
[  104.290180] pci 0000:04:00.0: bridge window [io  0x1000-0x0fff] to [bus
05-6e] add_size 3000
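
A note on reading these lines: the ranges are inclusive, so a window printed
as [io  0x1000-0x0fff] has its end one below its start and is empty.  As far
as I understand it, the add_size value is extra space the kernel would like to
reserve for devices that may show up behind the bridge later.  A minimal
Python sketch of the arithmetic (the window_size helper is my own name for
illustration, not anything from the kernel):

```python
import re

# Matches ranges such as "[io  0x1000-0x0fff]" or "[mem 0xea000000-0xea0fffff]".
WINDOW_RE = re.compile(r"\[(io|mem)\s+0x([0-9a-f]+)-0x([0-9a-f]+)")

def window_size(text):
    """Return (kind, size_in_bytes) for the first resource range in text."""
    m = WINDOW_RE.search(text)
    if m is None:
        raise ValueError("no resource range found")
    kind = m.group(1)
    start = int(m.group(2), 16)
    end = int(m.group(3), 16)
    # Ranges are inclusive, so an end one below the start means zero bytes.
    return kind, max(end - start + 1, 0)

print(window_size("bridge window [io  0x1000-0x0fff] to [bus 07-39] add_size 1000"))
print(window_size("bridge window [mem 0xea000000-0xea0fffff]"))
```

The first call reports an empty I/O window; the second reports a 1 MiB memory
window.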

Then you see a bunch of BAR errors, saying there's no space and that they can't
be assigned:

[  104.290184] pci 0000:04:00.0: BAR 13: no space for [io  size 0x3000]
[  104.290185] pci 0000:04:00.0: BAR 13: failed to assign [io  size 0x3000]
[  104.290187] pci 0000:04:00.0: BAR 13: no space for [io  size 0x3000]
[  104.290188] pci 0000:04:00.0: BAR 13: failed to assign [io  size 0x3000]
[  104.290193] pci 0000:05:02.0: BAR 15: no space for [mem size 0x00200000
64bit pref]
[  104.290194] pci 0000:05:02.0: BAR 15: failed to assign [mem size 0x00200000
64bit pref]
[  104.290196] pci 0000:05:01.0: BAR 13: no space for [io  size 0x1000]
[  104.290197] pci 0000:05:01.0: BAR 13: failed to assign [io  size 0x1000]
[  104.290198] pci 0000:05:02.0: BAR 13: no space for [io  size 0x1000]
[  104.290199] pci 0000:05:02.0: BAR 13: failed to assign [io  size 0x1000]
[  104.290201] pci 0000:05:04.0: BAR 13: no space for [io  size 0x1000]
[  104.290202] pci 0000:05:04.0: BAR 13: failed to assign [io  size 0x1000]
[  104.290203] pci 0000:05:04.0: BAR 13: no space for [io  size 0x1000]
[  104.290205] pci 0000:05:04.0: BAR 13: failed to assign [io  size 0x1000]
[  104.290207] pci 0000:05:02.0: BAR 15: no space for [mem size 0x00200000
64bit pref]
[  104.290208] pci 0000:05:02.0: BAR 15: failed to assign [mem size 0x00200000
64bit pref]
[  104.290209] pci 0000:05:02.0: BAR 13: no space for [io  size 0x1000]
[  104.290210] pci 0000:05:02.0: BAR 13: failed to assign [io  size 0x1000]
[  104.290212] pci 0000:05:01.0: BAR 13: no space for [io  size 0x1000]
[  104.290213] pci 0000:05:01.0: BAR 13: failed to assign [io  size 0x1000]
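
To see what is actually failing, the messages can be grouped by device and
resource type.  In the log above every failure is either I/O port space
(BAR 13) or the 2 MiB prefetchable window requested for 05:02.0 (BAR 15); none
of them is for the regular memory windows.  A rough Python sketch of that
grouping, assuming the log text is pasted in as a string (tally_failures is a
hypothetical helper name):

```python
import re
from collections import Counter

# Matches e.g. "pci 0000:05:02.0: BAR 15: failed to assign [mem size 0x00200000".
FAIL_RE = re.compile(
    r"pci (\S+): BAR (\d+): failed to assign \[(io|mem)\s+size 0x([0-9a-f]+)"
)

def tally_failures(log_text):
    """Count 'failed to assign' messages per (device, BAR, kind, size)."""
    counts = Counter()
    for m in FAIL_RE.finditer(log_text):
        key = (m.group(1), int(m.group(2)), m.group(3), int(m.group(4), 16))
        counts[key] += 1
    return counts

# A few lines copied from the log above, including one wrapped by the mailer.
sample = """\
[  104.290185] pci 0000:04:00.0: BAR 13: failed to assign [io  size 0x3000]
[  104.290194] pci 0000:05:02.0: BAR 15: failed to assign [mem size 0x00200000
64bit pref]
[  104.290197] pci 0000:05:01.0: BAR 13: failed to assign [io  size 0x1000]
"""

for key, count in sorted(tally_failures(sample).items()):
    print(key, count)
```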

But then you see that the PCI bridges seem to initialize for all the devices:

[  104.290215] pci 0000:05:00.0: PCI bridge to [bus 06]
[  104.290221] pci 0000:05:00.0:   bridge window [mem 0xea000000-0xea0fffff]
[  104.290231] pci 0000:05:01.0: PCI bridge to [bus 07-39]
[  104.290237] pci 0000:05:01.0:   bridge window [mem 0xbc000000-0xd3efffff]
[  104.290241] pci 0000:05:01.0:   bridge window [mem 0x2fb0000000-0x2fcfffffff
64bit pref]
[  104.290248] pci 0000:05:02.0: PCI bridge to [bus 3a]
[  104.290254] pci 0000:05:02.0:   bridge window [mem 0xd3f00000-0xd3ffffff]
[  104.290264] pci 0000:05:04.0: PCI bridge to [bus 3b-6e]
[  104.290270] pci 0000:05:04.0:   bridge window [mem 0xd4000000-0xe9ffffff]
[  104.290274] pci 0000:05:04.0:   bridge window [mem 0x2fd0000000-0x2ff9ffffff
64bit pref]
[  104.290281] pci 0000:04:00.0: PCI bridge to [bus 05-6e]
[  104.290286] pci 0000:04:00.0:   bridge window [mem 0xbc000000-0xea0fffff]
[  104.290291] pci 0000:04:00.0:   bridge window [mem 0x2fb0000000-0x2ff9ffffff
64bit pref]
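
As a sanity check on these final windows, the ranges of the downstream bridges
should fit inside the window of the upstream bridge 04:00.0.  Here is a small
Python sketch of that check, with the values copied from the lines above (the
tiles function is only an illustration, not how the kernel validates
anything):

```python
def tiles(parent, children):
    """True if the inclusive child ranges cover parent exactly, no gaps or overlap."""
    expected = parent[0]
    for start, end in sorted(children):
        if start != expected:
            return False
        expected = end + 1
    return expected == parent[1] + 1

# Non-prefetchable memory windows.
mem_parent = (0xbc000000, 0xea0fffff)      # 04:00.0
mem_children = [
    (0xea000000, 0xea0fffff),              # 05:00.0
    (0xbc000000, 0xd3efffff),              # 05:01.0
    (0xd3f00000, 0xd3ffffff),              # 05:02.0
    (0xd4000000, 0xe9ffffff),              # 05:04.0
]

# 64-bit prefetchable memory windows.
pref_parent = (0x2fb0000000, 0x2ff9ffffff)  # 04:00.0
pref_children = [
    (0x2fb0000000, 0x2fcfffffff),           # 05:01.0
    (0x2fd0000000, 0x2ff9ffffff),           # 05:04.0
]

print(tiles(mem_parent, mem_children))
print(tiles(pref_parent, pref_children))
```

Both checks pass for the values in this log.  That only shows the windows are
consistent with each other; it does not show they are large enough for the
GPU's BARs.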

Perhaps the BAR errors are just a red herring and at the end of the process all
of the Thunderbolt PCI bridges *are* initialized correctly?

As I said, I've probably spent way too much time looking at this.  The main
thing I keep coming back to is that my other GPU *does* work correctly as an
eGPU.  It's also a PCIe x16 card (I know it's operating over PCIe x4 due to TB3
bandwidth limitations), so theoretically if there were any PCI resource
problems with the Thunderbolt bridge then that GPU should also fail, correct?

I noticed a couple other things in my research:

I found a bug that points to TLP (specifically power management) as causing the
same problem, with the ATOM BIOS being stuck in a loop:
https://bugs.freedesktop.org/show_bug.cgi?id=103783
Perhaps the issue is caused by some sort of aggressive PM?  I might try adding
some kernel boot parameters such as amdgpu.dpm=0, amdgpu.apm=0, etc.

I was also thinking that perhaps I should try the AMDGPU-PRO drivers just to
see if they would work by chance.  Somebody else reported that these drivers
worked, while the amdgpu drivers failed.  It's worth a shot.

Thanks for any feedback and/or advice!
Rob

