https://bugs.freedesktop.org/show_bug.cgi?id=108781

            Bug ID: 108781
           Summary: 4.19 Regression - Hawaii (R9 390) boot failure - Invalid PCC GPIO / invalid powerlevel state / Fatal error during GPU init
           Product: DRI
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: critical
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: jamespharvey20@gmail.com

Created attachment 142499
  --> https://bugs.freedesktop.org/attachment.cgi?id=142499&action=edit
dmesg (journalctl) of failure on 4.19.2.arch1-1

arch 4.18.16.arch1-1 works, using kernel parameters: radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.dpm=1 amdgpu.dc=1

Upgraded to 4.19.2.arch1-1, and started getting this failure. Going back to 4.19.arch1-1 still gives this failure.

Full dmesg (journalctl) attached for 4.19.2.arch1-1 (failing), 4.19.arch1-1 (failing), and 4.18.16.arch1-1 (working). But pertinent part of failure is below for search.

This failure occurs booting to a tty, so no X logs are involved.

(You might see on 4.18.16.arch1-1, there is a [drm:generic_reg_wait [amdgpu]] error and backtrace which has been happening forever, but it works and doesn't cause a noticeable problem.)

-----

# lspci -v
...
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii PRO [Radeon R9 290/390] (rev 80) (prog-if 00 [VGA controller])
    Subsystem: ASUSTeK Computer Inc. Hawaii PRO [Radeon R9 290/390]
    Flags: bus master, fast devsel, latency 0, IRQ 75, NUMA node 0
    Memory at c0000000 (64-bit, prefetchable) [size=256M]
    Memory at d0000000 (64-bit, prefetchable) [size=8M]
    I/O ports at 8000 [size=256]
    Memory at dfe00000 (32-bit, non-prefetchable) [size=256K]
    Expansion ROM at 000c0000 [disabled] [size=128K]
    Capabilities: [48] Vendor Specific Information: Len=08
    Capabilities: [50] Power Management version 3
    Capabilities: [58] Express Legacy Endpoint, MSI 00
    Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010
    Capabilities: [150] Advanced Error Reporting
    Capabilities: [200] Resizable BAR
    Capabilities: [270] Secondary PCI Express
    Capabilities: [2b0] Address Translation Service (ATS)
    Capabilities: [2c0] Page Request Interface (PRI)
    Capabilities: [2d0] Process Address Space ID (PASID)
    Kernel driver in use: amdgpu
    Kernel modules: radeon, amdgpu

-----

[drm] Invalid PCC GPIO: 13!
ui class: none
internal class: boot
caps:
uvd vclk: 0 dclk: 0
power level 0 sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16
status: c r b
ui class: performance
internal class: none
caps:
uvd vclk: 0 dclk: 0
power level 0 sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16
power level 1 sclk: 105000 mclk: 150000 pcie gen: 3 pcie lanes: 16
status:
[drm] amdgpu: dpm initialized
[drm] Found UVD firmware Version: 1.64 Family ID: 9
[drm] Found VCE firmware Version: 50.10 Binary ID: 2
[drm] PCIE gen 3 link speeds already enabled
[drm:dm_pp_get_static_clocks [amdgpu]] *ERROR* DM_PPLIB: invalid powerlevel state: 0!
[drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[drm] Display Core initialized with v3.1.59!
[drm] DM_MST: Differing MST start on aconnector: 00000000d3bd29d7 [id: 55]
[drm] DM_MST: Differing MST start on aconnector: 000000004b0d56b6 [id: 57]
[drm] DM_MST: Differing MST start on aconnector: 0000000058d5a853 [id: 59]
[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[drm] Driver supports precise vblank timestamp query.
[drm] UVD initialized successfully.
[drm:amdgpu_vce_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 12 test failed
[drm:amdgpu_device_init.cold.14 [amdgpu]] *ERROR* hw_init of IP block failed -110
amdgpu 0000:03:00.0: amdgpu_device_ip_init failed
amdgpu 0000:03:00.0: Fatal error during GPU init
[drm] amdgpu: finishing device.
------------[ cut here ]------------
Memory manager not clean during takedown.
WARNING: CPU: 0 PID: 670 at drivers/gpu/drm/drm_mm.c:950 drm_mm_takedown+0x1f/0x30 [drm]
Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel i>
 x_tables sr_mod cdrom btrfs xor sd_mod dm_thin_pool dm_persistent_data raid6_pq dm_bio_prison dm_bufio libcrc32c crc32c_gener>
CPU: 0 PID: 670 Comm: kworker/0:4 Not tainted 4.19.0-arch1-1-ARCH #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./EP2C602, BIOS P1.90 04/12/2018
Workqueue: events work_for_cpu_fn
RIP: 0010:drm_mm_takedown+0x1f/0x30 [drm]
Code: 0d d0 cb 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 47 38 48 83 c7 38 48 39 c7 75 01 c3 48 c7 c7 08 b1 1b c1 e8 5b 10 >
RSP: 0018:ffff91764827bd08 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8e5a1b613200 RCX: 0000000000000000
RDX: 0000000000000007 RSI: ffffffff8de9d696 RDI: 00000000ffffffff
RBP: ffff8e5a0ca729a0 R08: 0000000000000001 R09: 00000000000005aa
R10: 0000000000000004 R11: 0000000000000000 R12: ffff8e5a1b6132e8
R13: 0000000000000000 R14: 0000000000000170 R15: ffff8e5a0c69e650
FS:  0000000000000000(0000) GS:ffff8e5a1f800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4f26530480 CR3: 00000001f0a0a006 CR4: 00000000000606f0
Call Trace:
 amdgpu_vram_mgr_fini+0x27/0x50 [amdgpu]
 ttm_bo_clean_mm+0xa9/0xb0 [ttm]
 amdgpu_ttm_fini+0x71/0x100 [amdgpu]
 amdgpu_bo_fini+0xe/0x30 [amdgpu]
 gmc_v7_0_sw_fini+0x32/0x60 [amdgpu]
 amdgpu_device_fini+0x2cc/0x4aa [amdgpu]
 amdgpu_driver_unload_kms+0x42/0x90 [amdgpu]
 amdgpu_driver_load_kms+0x168/0x2c0 [amdgpu]
 drm_dev_register+0x109/0x140 [drm]
 amdgpu_pci_probe+0x13c/0x1c0 [amdgpu]
 ? _raw_spin_unlock_irqrestore+0x20/0x40
 local_pci_probe+0x41/0x90
 work_for_cpu_fn+0x16/0x20
 process_one_work+0x1eb/0x410
 worker_thread+0x218/0x3d0
 ? process_one_work+0x410/0x410
 kthread+0x112/0x130
 ? kthread_park+0x80/0x80
 ret_from_fork+0x35/0x40
---[ end trace 3cf1bcf02bf4fe1a ]---

--
You are receiving this mail because:
You are the assignee for the bug.