From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 108521] RX 580 as eGPU amdgpu: gpu post error!
Date: Tue, 23 Oct 2018 05:14:00 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=108521

Bug ID: 108521
Summary: RX 580 as eGPU amdgpu: gpu post error!
Product: DRI
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: rstrube@gmail.com

Hello everyone,

I've been attempting to get my RX 580 working correctly as an eGPU using the
Akitio Node eGPU enclosure (over Thunderbolt 3).

I've confirmed that both the Akitio Node and my laptop's Thunderbolt 3
controller are running the most up-to-date firmware.  I've also been able to
successfully authorize the Thunderbolt eGPU enclosure and see the RX 580 in
lspci; see below:

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 05)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 05)
00:02.0 VGA compatible controller: Intel Corporation Device 591b (rev 04)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 05)
00:13.0 Non-VGA unclassified device: Intel Corporation 100 Series/C230 Series Chipset Family Integrated Sensor Hub (rev 31)
00:14.0 USB controller: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller (rev 31)
00:14.2 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem (rev 31)
00:15.0 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #0 (rev 31)
00:15.1 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #1 (rev 31)
00:16.0 Communication controller: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 (rev 31)
00:17.0 SATA controller: Intel Corporation HM170/QM170 Chipset SATA Controller [AHCI Mode] (rev 31)
00:1c.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 (rev f1)
00:1c.4 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 (rev f1)
00:1f.0 ISA bridge: Intel Corporation QM175 Chipset LPC/eSPI Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller (rev 31)
00:1f.3 Audio device: Intel Corporation CM238 HD Audio Controller (rev 31)
00:1f.4 SMBus: Intel Corporation 100 Series/C230 Series Chipset Family SMBus (rev 31)
01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Polaris 22 [Radeon RX Vega M GL] (rev c0)
02:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
04:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
05:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
05:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
05:02.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
05:04.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
06:00.0 System peripheral: Intel Corporation JHL6540 Thunderbolt 3 NHI (C step) [Alpine Ridge 4C 2016] (rev 02)
07:00.0 PCI bridge: Intel Corporation DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015]
08:01.0 PCI bridge: Intel Corporation DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015]
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] (rev e7)
09:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580]
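
For anyone comparing their own listing against the one above, here is a minimal Python sketch that picks the AMD devices out of plain `lspci` output. The names `parse_lspci` and `amd_devices` are hypothetical helpers, not part of any tool, and the regular expression assumes the short "bus:dev.fn" address form shown above (no PCI domain prefix).

```python
import re

# One line of plain `lspci` output: "bus:dev.fn Class: Description".
# Assumption: addresses have no "0000:" domain prefix, as in the listing above.
LSPCI_LINE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ([^:]+): (.*)$")


def parse_lspci(text):
    """Return a list of (address, device_class, description) tuples."""
    devices = []
    for line in text.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match:
            devices.append(match.groups())
    return devices


def amd_devices(devices):
    """Keep only entries whose description carries the [AMD/ATI] vendor tag."""
    return [dev for dev in devices if "[AMD/ATI]" in dev[2]]


# A few lines copied from the listing above.
SAMPLE = """\
00:02.0 VGA compatible controller: Intel Corporation Device 591b (rev 04)
01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Polaris 22 [Radeon RX Vega M GL] (rev c0)
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] (rev e7)
09:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580]
"""

if __name__ == "__main__":
    for address, device_class, description in amd_devices(parse_lspci(SAMPLE)):
        print(address, device_class, description, sep=" | ")
```

On this system the filter would leave the built-in Vega M at 01:00.0 and the two functions of the RX 580 at 09:00.0 and 09:00.1.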

Looking at just the RX 580 in more detail using lspci -v we have:

09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] (rev e7) (prog-if 00 [VGA controller])
        Subsystem: XFX Pine Group Inc. Ellesmere [Radeon RX 470/480/570/570X/580/580X]
        Flags: fast devsel, IRQ 18
        Memory at 2fb0000000 (64-bit, prefetchable) [size=256M]
        Memory at 2fc0000000 (64-bit, prefetchable) [size=2M]
        I/O ports at 2000 [size=256]
        Memory at bc000000 (32-bit, non-prefetchable) [size=256K]
        Expansion ROM at bc040000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [200] #15
        Capabilities: [270] #19
        Capabilities: [2b0] Address Translation Service (ATS)
        Capabilities: [2c0] Page Request Interface (PRI)
        Capabilities: [2d0] Process Address Space ID (PASID)
        Capabilities: [320] Latency Tolerance Reporting
        Capabilities: [328] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [370] L1 PM Substates
        Kernel modules: amdgpu
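
The memory regions above can be cross-checked against the driver's own log with a short Python sketch. `memory_bars` is a hypothetical helper, and the pattern only covers the "Memory at ..." line format shown here. The 32-bit non-prefetchable region at bc000000 with size 256K is the same region dmesg reports as "register mmio base: 0xBC000000" and "register mmio size: 262144" (256 * 1024).

```python
import re

# Matches lines such as:
#   Memory at bc000000 (32-bit, non-prefetchable) [size=256K]
BAR_LINE = re.compile(
    r"Memory at ([0-9a-f]+) \((32|64)-bit, (non-)?prefetchable\) "
    r"\[size=(\d+)([KMG])\]"
)
UNIT = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def memory_bars(text):
    """Return one dict per memory region found in `lspci -v` output."""
    bars = []
    for match in BAR_LINE.finditer(text):
        base, width, non, number, unit = match.groups()
        bars.append({
            "base": int(base, 16),
            "bits": int(width),
            "prefetchable": non is None,
            "size": int(number) * UNIT[unit],
        })
    return bars


# The three memory regions from the lspci -v output above.
SAMPLE = """\
        Memory at 2fb0000000 (64-bit, prefetchable) [size=256M]
        Memory at 2fc0000000 (64-bit, prefetchable) [size=2M]
        I/O ports at 2000 [size=256]
        Memory at bc000000 (32-bit, non-prefetchable) [size=256K]
"""

if __name__ == "__main__":
    for bar in memory_bars(SAMPLE):
        print(hex(bar["base"]), bar["bits"], bar["prefetchable"], bar["size"])
```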

When looking at dmesg I see the following (I've removed irrelevant lines):

[    8.534250] amdgpu 0000:09:00.0: enabling device (0006 -> 0007)
[    8.534756] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1682:0xC580 0xE7).
[    8.537567] [drm] register mmio base: 0xBC000000
[    8.537568] [drm] register mmio size: 262144
[    8.537598] [drm] add ip block number 0 <vi_common>
[    8.537599] [drm] add ip block number 1 <gmc_v8_0>
[    8.537599] [drm] add ip block number 2 <tonga_ih>
[    8.537599] [drm] add ip block number 3 <powerplay>
[    8.537600] [drm] add ip block number 4 <dm>
[    8.537600] [drm] add ip block number 5 <gfx_v8_0>
[    8.537601] [drm] add ip block number 6 <sdma_v3_0>
[    8.537602] [drm] add ip block number 7 <uvd_v6_0>
[    8.537602] [drm] add ip block number 8 <vce_v3_0>
[    8.537608] kfd kfd: skipped device 1002:67df, PCI rejects atomics
[    8.537630] [drm] UVD is enabled in VM mode
[    8.537630] [drm] UVD ENC is enabled in VM mode
[    8.537636] [drm] VCE enabled in VM mode
[    8.614467] ATOM BIOS: 401815-171128-QS1
[    8.614512] [drm] GPU posting now...
[   13.621276] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 5secs aborting
[   13.621310] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing E650 (len 187, WS 0, PS 4) @ 0xE6FA
[   13.621341] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C53A (len 193, WS 4, PS 4) @ 0xC569
[   13.621359] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C410 (len 114, WS 0, PS 8) @ 0xC47C
[   13.621361] amdgpu 0000:09:00.0: gpu post error!
[   13.621363] amdgpu 0000:09:00.0: Fatal error during GPU init
[   13.621370] [drm] amdgpu: finishing device.
[   13.621792] amdgpu: probe of 0000:09:00.0 failed with error -22
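
To make logs from different machines easier to compare, here is a minimal Python sketch that pulls the "atombios stuck executing" entries out of a dmesg capture and measures the time between "GPU posting now..." and "gpu post error!". `stuck_tables` and `timestamp_of` are hypothetical helpers. Reading the value after "@" as the position inside the table where execution stopped is an assumption based only on the numbers in this log (each one falls within the table's start plus its length), not on the driver source.

```python
import re

STUCK = re.compile(
    r"atombios stuck executing ([0-9A-F]{4}) "
    r"\(len (\d+), WS (\d+), PS (\d+)\) @ 0x([0-9A-F]+)"
)
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def stuck_tables(log):
    """Return one dict per 'atombios stuck executing' line."""
    tables = []
    for match in STUCK.finditer(log):
        table, length, ws, ps, position = match.groups()
        start = int(table, 16)
        tables.append({
            "table": start,
            "length": int(length),
            "ws": int(ws),
            "ps": int(ps),
            # Assumption: '@' is an absolute position, so this is the
            # offset from the start of the table.
            "offset": int(position, 16) - start,
        })
    return tables


def timestamp_of(log, needle):
    """Return the kernel timestamp of the first line containing needle."""
    for line in log.splitlines():
        if needle in line:
            match = TIMESTAMP.match(line)
            if match:
                return float(match.group(1))
    return None


# Lines copied from the dmesg output above.
SAMPLE = """\
[    8.614512] [drm] GPU posting now...
[   13.621276] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 5secs aborting
[   13.621310] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing E650 (len 187, WS 0, PS 4) @ 0xE6FA
[   13.621341] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C53A (len 193, WS 4, PS 4) @ 0xC569
[   13.621359] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C410 (len 114, WS 0, PS 8) @ 0xC47C
[   13.621361] amdgpu 0000:09:00.0: gpu post error!
"""

if __name__ == "__main__":
    for entry in stuck_tables(SAMPLE):
        print(f"table {entry['table']:04X} offset {entry['offset']} of {entry['length']}")
    elapsed = timestamp_of(SAMPLE, "gpu post error!") - timestamp_of(SAMPLE, "GPU posting now")
    print(f"post attempt lasted {elapsed:.3f} s")
```

With the log above, the elapsed time comes out just over five seconds, which agrees with the "stuck in loop for more than 5secs" message.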

Here are my system details:

System: Dell XPS 15 2 in 1 (Kaby Lake G)
Kernel: 4.19
Mesa: 18.2.2
Xorg: 1.20.1
Built in GPUs: Intel iGPU, Vega M
eGPU: RX 580

I'm not sure if I'm having problems because my laptop *also* contains a Vega M,
which also uses the amdgpu driver.  Perhaps there's a problem if there are
multiple GPUs using amdgpu?  One thing to point out is that the Vega M has
worked flawlessly since Kernel 4.18.x.
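
One way to see which GPUs the kernel actually bound to which driver is to read sysfs. The Python sketch below assumes the usual Linux layout in which /sys/class/drm/cardN/device/driver is a symlink to the bound driver; `drm_cards_by_driver` is a hypothetical helper, and its `sysfs_root` parameter exists only so the function can be pointed at a test directory. Since the probe of 0000:09:00.0 failed here, the RX 580 would presumably not appear as a card at all, while the Vega M should show up under amdgpu.

```python
import glob
import os


def drm_cards_by_driver(sysfs_root="/sys"):
    """Map each DRM card name (e.g. 'card0') to its bound driver name."""
    result = {}
    pattern = os.path.join(sysfs_root, "class", "drm", "card[0-9]*")
    for card in sorted(glob.glob(pattern)):
        name = os.path.basename(card)
        if "-" in name:
            # Skip connector entries such as card0-eDP-1.
            continue
        driver_link = os.path.join(card, "device", "driver")
        if os.path.islink(driver_link):
            result[name] = os.path.basename(os.readlink(driver_link))
        else:
            # No driver symlink means nothing is bound to the device.
            result[name] = None
    return result


if __name__ == "__main__":
    for card, driver in drm_cards_by_driver().items():
        print(card, driver)
```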

I did run across several other users posting about this same problem when
attempting to run AMD GPUs as eGPUs.  Here's a post where a user is reporting
the same issue:

https://egpu.io/forums/thunderbolt-linux-setup/egpus-under-linux-an-advanced-guide/#post-33304

And here's another post:

https://forum.manjaro.org/t/rx-580-in-a-thunderbolt-egpu-dock/58210

I'm comfortable applying and testing kernel patches, so please feel free to ask
me to test any fixes.  I'm currently running 4.19, but could also patch a
4.18.x kernel.

Thanks!

