From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 108521] RX 580 as eGPU amdgpu: gpu post error!
Date: Tue, 23 Oct 2018 05:14:00 +0000
Message-ID:
Bug ID: 108521
Summary: RX 580 as eGPU amdgpu: gpu post error!
Product: DRI
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: rstrube@gmail.com
Hello everyone,
I've been attempting to get my RX 580 working correctly as an eGPU using the
Akitio Node eGPU enclosure (over Thunderbolt 3).
I've confirmed that both the Akitio Node and my laptop's Thunderbolt 3
controller are running the most up-to-date firmware. I've also been able to
successfully authorize the Thunderbolt eGPU enclosure, and I can see the RX 580
in lspci (see below):
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 05)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 05)
00:02.0 VGA compatible controller: Intel Corporation Device 591b (rev 04)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 05)
00:13.0 Non-VGA unclassified device: Intel Corporation 100 Series/C230 Series Chipset Family Integrated Sensor Hub (rev 31)
00:14.0 USB controller: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller (rev 31)
00:14.2 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem (rev 31)
00:15.0 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #0 (rev 31)
00:15.1 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #1 (rev 31)
00:16.0 Communication controller: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 (rev 31)
00:17.0 SATA controller: Intel Corporation HM170/QM170 Chipset SATA Controller [AHCI Mode] (rev 31)
00:1c.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 (rev f1)
00:1c.4 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 (rev f1)
00:1f.0 ISA bridge: Intel Corporation QM175 Chipset LPC/eSPI Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller (rev 31)
00:1f.3 Audio device: Intel Corporation CM238 HD Audio Controller (rev 31)
00:1f.4 SMBus: Intel Corporation 100 Series/C230 Series Chipset Family SMBus (rev 31)
01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Polaris 22 [Radeon RX Vega M GL] (rev c0)
02:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
04:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
05:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
05:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
05:02.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
05:04.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
06:00.0 System peripheral: Intel Corporation JHL6540 Thunderbolt 3 NHI (C step) [Alpine Ridge 4C 2016] (rev 02)
07:00.0 PCI bridge: Intel Corporation DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015]
08:01.0 PCI bridge: Intel Corporation DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015]
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] (rev e7)
09:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580]
Looking at just the RX 580 in more detail using lspci -v we have:
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] (rev e7) (prog-if 00 [VGA controller])
    Subsystem: XFX Pine Group Inc. Ellesmere [Radeon RX 470/480/570/570X/580/580X]
    Flags: fast devsel, IRQ 18
    Memory at 2fb0000000 (64-bit, prefetchable) [size=256M]
    Memory at 2fc0000000 (64-bit, prefetchable) [size=2M]
    I/O ports at 2000 [size=256]
    Memory at bc000000 (32-bit, non-prefetchable) [size=256K]
    Expansion ROM at bc040000 [disabled] [size=128K]
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
    Capabilities: [58] Express Legacy Endpoint, MSI 00
    Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [150] Advanced Error Reporting
    Capabilities: [200] #15
    Capabilities: [270] #19
    Capabilities: [2b0] Address Translation Service (ATS)
    Capabilities: [2c0] Page Request Interface (PRI)
    Capabilities: [2d0] Process Address Space ID (PASID)
    Capabilities: [320] Latency Tolerance Reporting
    Capabilities: [328] Alternative Routing-ID Interpretation (ARI)
    Capabilities: [370] L1 PM Substates
    Kernel modules: amdgpu
Looking at dmesg, I see the following (I've removed non-relevant lines):
[ 8.534250] amdgpu 0000:09:00.0: enabling device (0006 -> 0007)
[ 8.534756] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1682:0xC580 0xE7).
[ 8.537567] [drm] register mmio base: 0xBC000000
[ 8.537568] [drm] register mmio size: 262144
[ 8.537598] [drm] add ip block number 0 <vi_common>
[ 8.537599] [drm] add ip block number 1 <gmc_v8_0>
[ 8.537599] [drm] add ip block number 2 <tonga_ih>
[ 8.537599] [drm] add ip block number 3 <powerplay>
[ 8.537600] [drm] add ip block number 4 <dm>
[ 8.537600] [drm] add ip block number 5 <gfx_v8_0>
[ 8.537601] [drm] add ip block number 6 <sdma_v3_0>
[ 8.537602] [drm] add ip block number 7 <uvd_v6_0>
[ 8.537602] [drm] add ip block number 8 <vce_v3_0>
[ 8.537608] kfd kfd: skipped device 1002:67df, PCI rejects atomics
[ 8.537630] [drm] UVD is enabled in VM mode
[ 8.537630] [drm] UVD ENC is enabled in VM mode
[ 8.537636] [drm] VCE enabled in VM mode
[ 8.614467] ATOM BIOS: 401815-171128-QS1
[ 8.614512] [drm] GPU posting now...
[ 13.621276] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 13.621310] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing E650 (len 187, WS 0, PS 4) @ 0xE6FA
[ 13.621341] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C53A (len 193, WS 4, PS 4) @ 0xC569
[ 13.621359] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C410 (len 114, WS 0, PS 8) @ 0xC47C
[ 13.621361] amdgpu 0000:09:00.0: gpu post error!
[ 13.621363] amdgpu 0000:09:00.0: Fatal error during GPU init
[ 13.621370] [drm] amdgpu: finishing device.
[ 13.621792] amdgpu: probe of 0000:09:00.0 failed with error -22
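
For anyone trying to reproduce this, trimming the kernel log down to the GPU-related lines can be approximated with a short Python sketch. The function names and the keyword list here are assumptions chosen for illustration; they are not part of the driver or of any existing tool, and the keywords may need adjusting for other setups.

```python
import re

# Assumed keyword list for illustration; it simply covers the prefixes
# that appear in the log excerpt above.
GPU_KEYWORDS = ("amdgpu", "[drm", "kfd", "ATOM BIOS")

# Matches the kernel's *ERROR* tag as well as the plain words
# "error" and "failed".
ERROR_PATTERN = re.compile(r"\*ERROR\*|\berror\b|\bfailed\b", re.IGNORECASE)


def filter_gpu_lines(log_text, keywords=GPU_KEYWORDS):
    """Return the log lines that contain any of the given keywords."""
    return [
        line
        for line in log_text.splitlines()
        if any(keyword in line for keyword in keywords)
    ]


def find_errors(lines):
    """Return only the lines that look like error reports."""
    return [line for line in lines if ERROR_PATTERN.search(line)]
```

The log text could come from a saved file or from something like `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`; passing it through `filter_gpu_lines` and then `find_errors` leaves only the failure messages.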
Here are my system details:
System: Dell XPS 15 2 in 1 (Kaby Lake G)
Kernel: 4.19
Mesa: 18.2.2
Xorg: 1.20.1
Built-in GPUs: Intel iGPU, Vega M
eGPU: RX 580
I'm not sure if I'm having problems because my laptop *also* contains a Vega M,
which also uses the amdgpu driver. Perhaps there's a problem if there are
multiple GPUs using amdgpu? One thing to point out is that the Vega M has
worked flawlessly since Kernel 4.18.x.
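
To check how many AMD display devices a system exposes, the lspci listing can be scanned with a small Python sketch. This assumes plain `lspci` output with one device per line; the function name `amd_gpus` and the set of display class names are assumptions for illustration. It only identifies devices by their description and says nothing about which driver is actually bound to them.

```python
import re

# One lspci line: "<bus>:<dev>.<fn> <class>: <description>"
LSPCI_LINE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) (?P<cls>[^:]+): (?P<name>.+)$"
)

# Class names assumed to denote GPUs (illustrative, not exhaustive).
DISPLAY_CLASSES = {
    "VGA compatible controller",
    "Display controller",
    "3D controller",
}


def amd_gpus(lspci_text):
    """Return (slot, description) pairs for AMD display-class devices."""
    found = []
    for line in lspci_text.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if not match:
            continue
        if match["cls"] in DISPLAY_CLASSES and "[AMD/ATI]" in match["name"]:
            found.append((match["slot"], match["name"]))
    return found
```

Given the listing from this report, such a scan would be expected to pick out the Vega M at 01:00.0 and the RX 580 at 09:00.0, while skipping the RX 580's audio function at 09:00.1.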
I did run across several other users posting about this same problem when
attempting to run AMD GPUs as eGPUs. Here's a post where a user is reporting
the same issue:
https://egpu.io/forums/thunderbolt-linux-setup/egpus-under-linux-an-advanced-guide/#post-33304
And here's another post:
https://forum.manjaro.org/t/rx-580-in-a-thunderbolt-egpu-dock/58210
I'm comfortable applying and testing kernel patches, so please feel free to
ask me to test any fixes. I'm currently running 4.19, but could also patch a
4.18.x kernel.
Thanks!