* Warning appeared after c8b5a95 ("drm/amdgpu: Fix desktop freezed after gpu-reset")
@ 2023-06-19  8:32 Christian Kastner
  2023-06-19 14:05 ` Alex Deucher
  0 siblings, 1 reply; 3+ messages in thread
From: Christian Kastner @ 2023-06-19  8:32 UTC (permalink / raw)
  To: amd-gfx

[-- Attachment #1: Type: text/plain, Size: 1035 bytes --]

Hi,

On a Debian 12 ("bookworm") system, I observed a new warning when I
upgraded from kernel 6.1.25 to 6.1.27. This is on a system with an RX
6800 XT GPU and 3500X processor.

I've traced it down to commit c8b5a95 ("drm/amdgpu: Fix desktop freezed
after gpu-reset"). Rebuilding the 6.1.27 kernel without this change
makes the warning disappear.

I can reliably trigger this (and another) warning with

  $ sudo cat /sys/kernel/debug/dri/0/amdgpu_test_ib
  run ib test:
  ib ring tests passed.

About 5 or 6 seconds after this, two warnings are printed. I see what
appear to be the same two warnings on system shutdown (they looked
similar enough that I didn't verify they were identical).

I've attached
  (1) the dmesg output after modprobe'ing amdgpu
  (2) the dmesg output after triggering amdgpu_test_ib
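
The two attached logs are long, so a short script may help when comparing
the warnings in them. The sketch below is not part of the original report.
It assumes the dmesg layout shown in the attachments, and the name
`summarize_warnings` and both regular expressions are illustrative choices.
For each WARNING line it collects the timestamp, source location, and
warning function, plus the first reliable frame of the call trace that
follows.

```python
import re

# Header line the kernel prints for a WARN, for example:
# "[  272.409891] WARNING: CPU: 1 PID: 259 at <file>:<line> <func>+0x45/0x70 [amdgpu]"
WARNING_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+WARNING: CPU: \d+ PID: \d+ at "
    r"(?P<site>\S+:\d+)\s+(?P<func>[^\s+]+)\+"
)

# A reliable call trace frame, for example:
# "[  272.410565]  smu_smc_hw_cleanup+0x50/0x300 [amdgpu]"
# Frames prefixed with "?" do not match and are therefore skipped.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_warnings(dmesg_text):
    """Return (timestamp, source_site, warning_func, first_caller) per WARNING.

    A WARNING that is not followed by a call trace frame is dropped.
    """
    results = []
    current = None    # match object of the WARNING header being processed
    in_trace = False  # True once "Call Trace:" was seen for that header
    for line in dmesg_text.splitlines():
        header = WARNING_RE.match(line)
        if header:
            current = header
            in_trace = False
            continue
        if current is None:
            continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            frame = FRAME_RE.match(line)
            if frame:
                results.append((
                    float(current["ts"]),
                    current["site"],
                    current["func"],
                    frame["func"],
                ))
                current = None
                in_trace = False
    return results
```

Fed the amdgpu_test_ib log, this should list two entries at
amdgpu_irq.c:656 in amdgpu_irq_put, one called from smu_smc_hw_cleanup
and one from gmc_v10_0_hw_fini, which makes it easier to see that the
two warnings come from different IP blocks during the same runtime
suspend.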

The system in question is only used for ROCm development. I haven't
observed any side effects there other than the warning. There's no
monitor attached, so I can't comment on the desktop freeze that the
commit addresses.

Best,
Christian

[-- Attachment #2: cat_amdgpu_test_ib --]
[-- Type: text/plain, Size: 11659 bytes --]

[  266.669251] [drm] PCIE GART of 512M enabled (table at 0x00000083FEB00000).
[  266.669268] [drm] PSP is resuming...
[  266.739148] [drm] reserve 0xa00000 from 0x83fd000000 for PSP TMR
[  266.876401] amdgpu 0000:09:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[  266.876404] amdgpu 0000:09:00.0: amdgpu: SMU is resuming...
[  266.876407] amdgpu 0000:09:00.0: amdgpu: smu driver if version = 0x00000040, smu fw if version = 0x00000041, smu fw program = 0, version = 0x003a5600 (58.86.0)
[  266.876410] amdgpu 0000:09:00.0: amdgpu: SMU driver if version not matched
[  266.876428] amdgpu 0000:09:00.0: amdgpu: dpm has been enabled
[  266.879972] amdgpu 0000:09:00.0: amdgpu: SMU is resumed successfully!
[  266.881457] [drm] DMUB hardware initialized: version=0x02020017
[  266.904086] [drm] kiq ring mec 2 pipe 1 q 0
[  266.910932] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[  266.911082] [drm] JPEG decode initialized successfully.
[  266.911104] amdgpu 0000:09:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[  266.911106] amdgpu 0000:09:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[  266.911107] amdgpu 0000:09:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[  266.911108] amdgpu 0000:09:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[  266.911109] amdgpu 0000:09:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[  266.911110] amdgpu 0000:09:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[  266.911110] amdgpu 0000:09:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[  266.911111] amdgpu 0000:09:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[  266.911112] amdgpu 0000:09:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[  266.911113] amdgpu 0000:09:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[  266.911114] amdgpu 0000:09:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[  266.911115] amdgpu 0000:09:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[  266.911116] amdgpu 0000:09:00.0: amdgpu: ring sdma2 uses VM inv eng 14 on hub 0
[  266.911117] amdgpu 0000:09:00.0: amdgpu: ring sdma3 uses VM inv eng 15 on hub 0
[  266.911117] amdgpu 0000:09:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[  266.911118] amdgpu 0000:09:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[  266.911119] amdgpu 0000:09:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[  266.911120] amdgpu 0000:09:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 5 on hub 1
[  266.911121] amdgpu 0000:09:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 6 on hub 1
[  266.911122] amdgpu 0000:09:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 7 on hub 1
[  266.911123] amdgpu 0000:09:00.0: amdgpu: ring jpeg_dec uses VM inv eng 8 on hub 1
[  266.916173] amdgpu 0000:09:00.0: [drm] Cannot find any crtc or sizes
[  266.916177] amdgpu 0000:09:00.0: [drm] Cannot find any crtc or sizes
[  272.409887] ------------[ cut here ]------------
[  272.409891] WARNING: CPU: 1 PID: 259 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:656 amdgpu_irq_put+0x45/0x70 [amdgpu]
[  272.410166] Modules linked in: amdgpu gpu_sched drm_buddy drm_display_helper cec rc_core drm_ttm_helper ttm drm_kms_helper i2c_algo_bit ipt_REJECT xt_multiport nft_compat ctr ccm wireguard libchacha20poly1305 chacha_x86_64 poly1305_x86_64 curve25519_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink overlay binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd iwlmvm kvm mac80211 snd_hda_codec_realtek irqbypass snd_hda_codec_generic ghash_clmulni_intel snd_hda_codec_hdmi sha512_ssse3 sha512_generic libarc4 snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec iwlwifi snd_hda_core eeepc_wmi snd_hwdep asus_wmi aesni_intel snd_pcm platform_profile snd_timer crypto_simd battery cryptd sparse_keymap ccp snd
[  272.410230]  ledtrig_audio cfg80211 evdev asus_ec_sensors rapl video wmi_bmof mxm_wmi rng_core pcspkr sp5100_tco soundcore k10temp watchdog rfkill acpi_cpufreq button drm loop fuse efi_pstore configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic hid_generic usbhid hid dm_mod ahci libahci xhci_pci nvme libata xhci_hcd nvme_core crc32_pclmul t10_pi igc crc32c_intel usbcore scsi_mod ptp i2c_piix4 crc64_rocksoft crc64 pps_core crc_t10dif crct10dif_generic crct10dif_pclmul usb_common scsi_common crct10dif_common wmi gpio_amdpt gpio_generic
[  272.410277] CPU: 1 PID: 259 Comm: kworker/1:2 Tainted: G        W          6.1.0-9-amd64 #1  Debian 6.1.27-1
[  272.410281] Hardware name: ASUS System Product Name/ROG STRIX B550-E GAMING, BIOS 3002 02/23/2023
[  272.410283] Workqueue: pm pm_runtime_work
[  272.410288] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu]
[  272.410542] Code: 48 8b 4e 10 48 83 39 00 74 2c 89 d1 48 8d 04 88 8b 08 85 c9 74 14 f0 ff 08 b8 00 00 00 00 74 05 e9 00 38 d0 f0 e9 8b fd ff ff <0f> 0b b8 ea ff ff ff e9 ef 37 d0 f0 b8 ea ff ff ff e9 e5 37 d0 f0
[  272.410544] RSP: 0018:ffffa83ac0cafca8 EFLAGS: 00010246
[  272.410546] RAX: ffff8f9163a3f968 RBX: ffff8f9147e9a800 RCX: 0000000000000000
[  272.410548] RDX: 0000000000000000 RSI: ffff8f9147e9a808 RDI: ffff8f9145120000
[  272.410550] RBP: ffff8f9145120000 R08: ffff8f9145129858 R09: ffff8fa06f3560d0
[  272.410551] R10: ffffa83ac542f000 R11: ffffa83ac542f000 R12: 0000000000001050
[  272.410552] R13: ffff8f91451373f0 R14: 0000000000000000 R15: ffff8f914141d248
[  272.410554] FS:  0000000000000000(0000) GS:ffff8fa02ea40000(0000) knlGS:0000000000000000
[  272.410556] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  272.410558] CR2: 00007f8ce0bd23d8 CR3: 0000000e86410000 CR4: 0000000000350ee0
[  272.410560] Call Trace:
[  272.410563]  <TASK>
[  272.410565]  smu_smc_hw_cleanup+0x50/0x300 [amdgpu]
[  272.410855]  smu_suspend+0x5b/0xe0 [amdgpu]
[  272.411134]  amdgpu_device_ip_suspend_phase2+0x107/0x1a0 [amdgpu]
[  272.411373]  amdgpu_device_suspend+0xc9/0x150 [amdgpu]
[  272.411611]  amdgpu_pmops_runtime_suspend+0xba/0x190 [amdgpu]
[  272.411847]  pci_pm_runtime_suspend+0x66/0x1b0
[  272.411852]  ? update_load_avg+0x7e/0x780
[  272.411857]  ? pci_dev_put+0x20/0x20
[  272.411860]  __rpm_callback+0x44/0x170
[  272.411862]  ? pci_dev_put+0x20/0x20
[  272.411866]  rpm_callback+0x5d/0x70
[  272.411868]  ? pci_dev_put+0x20/0x20
[  272.411871]  rpm_suspend+0x11a/0x720
[  272.411874]  ? _raw_spin_unlock+0x15/0x30
[  272.411878]  ? finish_task_switch.isra.0+0x9b/0x300
[  272.411880]  ? __switch_to+0x106/0x410
[  272.411885]  pm_runtime_work+0x94/0xa0
[  272.411888]  process_one_work+0x1c7/0x380
[  272.411893]  worker_thread+0x4d/0x380
[  272.411896]  ? _raw_spin_lock_irqsave+0x23/0x50
[  272.411900]  ? rescuer_thread+0x3a0/0x3a0
[  272.411903]  kthread+0xe9/0x110
[  272.411906]  ? kthread_complete_and_exit+0x20/0x20
[  272.411910]  ret_from_fork+0x22/0x30
[  272.411916]  </TASK>
[  272.411917] ---[ end trace 0000000000000000 ]---
[  272.411921] amdgpu 0000:09:00.0: amdgpu: Fail to disable thermal alert!
[  272.411927] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -22
[  272.435354] amdgpu 0000:09:00.0: amdgpu: free PSP TMR buffer
[  272.474456] ------------[ cut here ]------------
[  272.474458] WARNING: CPU: 1 PID: 259 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:656 amdgpu_irq_put+0x45/0x70 [amdgpu]
[  272.474713] Modules linked in: amdgpu gpu_sched drm_buddy drm_display_helper cec rc_core drm_ttm_helper ttm drm_kms_helper i2c_algo_bit ipt_REJECT xt_multiport nft_compat ctr ccm wireguard libchacha20poly1305 chacha_x86_64 poly1305_x86_64 curve25519_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink overlay binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd iwlmvm kvm mac80211 snd_hda_codec_realtek irqbypass snd_hda_codec_generic ghash_clmulni_intel snd_hda_codec_hdmi sha512_ssse3 sha512_generic libarc4 snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec iwlwifi snd_hda_core eeepc_wmi snd_hwdep asus_wmi aesni_intel snd_pcm platform_profile snd_timer crypto_simd battery cryptd sparse_keymap ccp snd
[  272.474764]  ledtrig_audio cfg80211 evdev asus_ec_sensors rapl video wmi_bmof mxm_wmi rng_core pcspkr sp5100_tco soundcore k10temp watchdog rfkill acpi_cpufreq button drm loop fuse efi_pstore configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic hid_generic usbhid hid dm_mod ahci libahci xhci_pci nvme libata xhci_hcd nvme_core crc32_pclmul t10_pi igc crc32c_intel usbcore scsi_mod ptp i2c_piix4 crc64_rocksoft crc64 pps_core crc_t10dif crct10dif_generic crct10dif_pclmul usb_common scsi_common crct10dif_common wmi gpio_amdpt gpio_generic
[  272.474802] CPU: 1 PID: 259 Comm: kworker/1:2 Tainted: G        W          6.1.0-9-amd64 #1  Debian 6.1.27-1
[  272.474805] Hardware name: ASUS System Product Name/ROG STRIX B550-E GAMING, BIOS 3002 02/23/2023
[  272.474806] Workqueue: pm pm_runtime_work
[  272.474809] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu]
[  272.475058] Code: 48 8b 4e 10 48 83 39 00 74 2c 89 d1 48 8d 04 88 8b 08 85 c9 74 14 f0 ff 08 b8 00 00 00 00 74 05 e9 00 38 d0 f0 e9 8b fd ff ff <0f> 0b b8 ea ff ff ff e9 ef 37 d0 f0 b8 ea ff ff ff e9 e5 37 d0 f0
[  272.475060] RSP: 0018:ffffa83ac0cafce0 EFLAGS: 00010246
[  272.475062] RAX: ffff8f91f01320a8 RBX: ffff8f9145120000 RCX: 0000000000000000
[  272.475064] RDX: 0000000000000000 RSI: ffff8f91451224d8 RDI: ffff8f9145120000
[  272.475066] RBP: ffff8f9145120000 R08: 0000000000000000 R09: 0000000000000000
[  272.475067] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000001050
[  272.475068] R13: ffff8f91451373f0 R14: 0000000000000000 R15: ffff8f914141d248
[  272.475070] FS:  0000000000000000(0000) GS:ffff8fa02ea40000(0000) knlGS:0000000000000000
[  272.475072] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  272.475073] CR2: 00007f8ce0bd23d8 CR3: 0000000240050000 CR4: 0000000000350ee0
[  272.475075] Call Trace:
[  272.475077]  <TASK>
[  272.475078]  gmc_v10_0_hw_fini+0x46/0x80 [amdgpu]
[  272.475326]  gmc_v10_0_suspend+0xa/0x20 [amdgpu]
[  272.475572]  amdgpu_device_ip_suspend_phase2+0x107/0x1a0 [amdgpu]
[  272.475809]  amdgpu_device_suspend+0xc9/0x150 [amdgpu]
[  272.476045]  amdgpu_pmops_runtime_suspend+0xba/0x190 [amdgpu]
[  272.476281]  pci_pm_runtime_suspend+0x66/0x1b0
[  272.476284]  ? update_load_avg+0x7e/0x780
[  272.476288]  ? pci_dev_put+0x20/0x20
[  272.476291]  __rpm_callback+0x44/0x170
[  272.476294]  ? pci_dev_put+0x20/0x20
[  272.476297]  rpm_callback+0x5d/0x70
[  272.476299]  ? pci_dev_put+0x20/0x20
[  272.476302]  rpm_suspend+0x11a/0x720
[  272.476305]  ? _raw_spin_unlock+0x15/0x30
[  272.476308]  ? finish_task_switch.isra.0+0x9b/0x300
[  272.476311]  ? __switch_to+0x106/0x410
[  272.476315]  pm_runtime_work+0x94/0xa0
[  272.476317]  process_one_work+0x1c7/0x380
[  272.476322]  worker_thread+0x4d/0x380
[  272.476325]  ? _raw_spin_lock_irqsave+0x23/0x50
[  272.476329]  ? rescuer_thread+0x3a0/0x3a0
[  272.476332]  kthread+0xe9/0x110
[  272.476335]  ? kthread_complete_and_exit+0x20/0x20
[  272.476338]  ret_from_fork+0x22/0x30
[  272.476343]  </TASK>
[  272.476344] ---[ end trace 0000000000000000 ]---

[-- Attachment #3: modprobe_amdgpu --]
[-- Type: text/plain, Size: 12180 bytes --]

[  118.974007] [drm] amdgpu kernel modesetting enabled.
[  118.977812] amdgpu: Ignoring ACPI CRAT on non-APU system
[  118.977814] amdgpu: Virtual CRAT table created for CPU
[  118.977822] amdgpu: Topology: Add CPU node
[  118.977937] amdgpu 0000:09:00.0: enabling device (0006 -> 0007)
[  118.977963] [drm] initializing kernel modesetting (SIENNA_CICHLID 0x1002:0x73BF 0x1043:0x04F2 0xC1).
[  118.977971] [drm] register mmio base: 0xFCC00000
[  118.977972] [drm] register mmio size: 1048576
[  118.979785] [drm] add ip block number 0 <nv_common>
[  118.979786] [drm] add ip block number 1 <gmc_v10_0>
[  118.979786] [drm] add ip block number 2 <navi10_ih>
[  118.979787] [drm] add ip block number 3 <psp>
[  118.979788] [drm] add ip block number 4 <smu>
[  118.979789] [drm] add ip block number 5 <dm>
[  118.979790] [drm] add ip block number 6 <gfx_v10_0>
[  118.979790] [drm] add ip block number 7 <sdma_v5_2>
[  118.979791] [drm] add ip block number 8 <vcn_v3_0>
[  118.979792] [drm] add ip block number 9 <jpeg_v3_0>
[  118.979804] amdgpu 0000:09:00.0: amdgpu: Fetched VBIOS from VFCT
[  118.979806] amdgpu: ATOM BIOS: 115-D412BS0-101
[  118.979811] [drm] VCN(0) decode is enabled in VM mode
[  118.979812] [drm] VCN(1) decode is enabled in VM mode
[  118.979812] [drm] VCN(0) encode is enabled in VM mode
[  118.979813] [drm] VCN(1) encode is enabled in VM mode
[  118.979814] [drm] JPEG decode is enabled in VM mode
[  118.979843] amdgpu 0000:09:00.0: vgaarb: deactivate vga console
[  118.979845] amdgpu 0000:09:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[  118.979871] amdgpu 0000:09:00.0: amdgpu: MEM ECC is not presented.
[  118.979871] amdgpu 0000:09:00.0: amdgpu: SRAM ECC is not presented.
[  118.979879] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[  118.979884] amdgpu 0000:09:00.0: amdgpu: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[  118.979885] amdgpu 0000:09:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[  118.979887] amdgpu 0000:09:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[  118.979893] [drm] Detected VRAM RAM=16368M, BAR=16384M
[  118.979894] [drm] RAM width 256bits GDDR6
[  118.979939] [drm] amdgpu: 16368M of VRAM memory ready
[  118.979940] [drm] amdgpu: 32105M of GTT memory ready.
[  118.979949] [drm] GART: num cpu pages 131072, num gpu pages 131072
[  118.980072] [drm] PCIE GART of 512M enabled (table at 0x00000083FEB00000).
[  118.987203] amdgpu 0000:09:00.0: firmware: direct-loading firmware amdgpu/sienna_cichlid_sos.bin
[  118.987659] amdgpu 0000:09:00.0: firmware: direct-loading firmware amdgpu/sienna_cichlid_ta.bin
[  118.987664] amdgpu 0000:09:00.0: amdgpu: PSP runtime database doesn't exist
[  118.987667] amdgpu 0000:09:00.0: amdgpu: PSP runtime database doesn't exist
[  120.557037] amdgpu 0000:09:00.0: firmware: direct-loading firmware amdgpu/sienna_cichlid_smc.bin
[  120.557049] amdgpu 0000:09:00.0: amdgpu: STB initialized to 2048 entries
[  120.557322] amdgpu 0000:09:00.0: firmware: direct-loading firmware amdgpu/sienna_cichlid_dmcub.bin
[  120.557325] [drm] Loading DMUB firmware via PSP: version=0x02020017
[  120.557676] amdgpu 0000:09:00.0: firmware: direct-loading firmware amdgpu/sienna_cichlid_pfp.bin
[  120.557955] amdgpu 0000:09:00.0: firmware: direct-loading firmware amdgpu/sienna_cichlid_me.bin
[  120.558358] amdgpu 0000:09:00.0: firmware: direct-loading firmware amdgpu/sienna_cichlid_ce.bin
[  120.558655] amdgpu 0000:09:00.0: firmware: direct-loading firmware amdgpu/sienna_cichlid_rlc.bin
[  120.558936] amdgpu 0000:09:00.0: firmware: direct-loading firmware amdgpu/sienna_cichlid_mec.bin
[  120.559228] amdgpu 0000:09:00.0: firmware: direct-loading firmware amdgpu/sienna_cichlid_mec2.bin
[  120.559799] amdgpu 0000:09:00.0: firmware: direct-loading firmware amdgpu/sienna_cichlid_sdma.bin
[  120.559803] [drm] use_doorbell being set to: [true]
[  120.559814] [drm] use_doorbell being set to: [true]
[  120.559824] [drm] use_doorbell being set to: [true]
[  120.559834] [drm] use_doorbell being set to: [true]
[  120.560288] amdgpu 0000:09:00.0: firmware: direct-loading firmware amdgpu/sienna_cichlid_vcn.bin
[  120.560290] [drm] Found VCN firmware Version ENC: 1.26 DEC: 2 VEP: 0 Revision: 0
[  120.560299] amdgpu 0000:09:00.0: amdgpu: Will use PSP to load VCN firmware
[  120.626402] [drm] reserve 0xa00000 from 0x83fd000000 for PSP TMR
[  120.770106] amdgpu 0000:09:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[  120.770128] amdgpu 0000:09:00.0: amdgpu: smu driver if version = 0x00000040, smu fw if version = 0x00000041, smu fw program = 0, version = 0x003a5600 (58.86.0)
[  120.770131] amdgpu 0000:09:00.0: amdgpu: SMU driver if version not matched
[  120.770158] amdgpu 0000:09:00.0: amdgpu: use vbios provided pptable
[  120.843840] amdgpu 0000:09:00.0: amdgpu: SMU is initialized successfully!
[  120.844056] [drm] Display Core initialized with v3.2.207!
[  120.845198] [drm] DMUB hardware initialized: version=0x02020017
[  120.847342] snd_hda_intel 0000:09:00.1: bound 0000:09:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[  120.849209] [drm] kiq ring mec 2 pipe 1 q 0
[  120.856238] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[  120.856783] [drm] JPEG decode initialized successfully.
[  120.858253] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[  120.858425] amdgpu: sdma_bitmap: ffff
[  120.858485] amdgpu: SRAT table not found
[  120.858486] amdgpu: Virtual CRAT table created for GPU
[  120.858687] amdgpu: Topology: Add dGPU node [0x73bf:0x1002]
[  120.858689] kfd kfd: amdgpu: added device 1002:73bf
[  120.858712] amdgpu 0000:09:00.0: amdgpu: SE 4, SH per SE 2, CU per SH 10, active_cu_number 72
[  120.858772] amdgpu 0000:09:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[  120.858773] amdgpu 0000:09:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[  120.858774] amdgpu 0000:09:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[  120.858775] amdgpu 0000:09:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[  120.858776] amdgpu 0000:09:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[  120.858777] amdgpu 0000:09:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[  120.858777] amdgpu 0000:09:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[  120.858778] amdgpu 0000:09:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[  120.858779] amdgpu 0000:09:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[  120.858780] amdgpu 0000:09:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[  120.858781] amdgpu 0000:09:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[  120.858782] amdgpu 0000:09:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[  120.858783] amdgpu 0000:09:00.0: amdgpu: ring sdma2 uses VM inv eng 14 on hub 0
[  120.858784] amdgpu 0000:09:00.0: amdgpu: ring sdma3 uses VM inv eng 15 on hub 0
[  120.858784] amdgpu 0000:09:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[  120.858785] amdgpu 0000:09:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[  120.858786] amdgpu 0000:09:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[  120.858787] amdgpu 0000:09:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 5 on hub 1
[  120.858788] amdgpu 0000:09:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 6 on hub 1
[  120.858789] amdgpu 0000:09:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 7 on hub 1
[  120.858790] amdgpu 0000:09:00.0: amdgpu: ring jpeg_dec uses VM inv eng 8 on hub 1
[  120.860623] amdgpu 0000:09:00.0: amdgpu: Using BACO for runtime pm
[  120.861653] [drm] Initialized amdgpu 3.49.0 20150101 for 0000:09:00.0 on minor 0
[  120.867960] amdgpu 0000:09:00.0: [drm] Cannot find any crtc or sizes
[  120.867972] [drm] DSC precompute is not needed.
[  126.172885] amdgpu 0000:09:00.0: amdgpu: free PSP TMR buffer
[  126.212061] ------------[ cut here ]------------
[  126.212063] WARNING: CPU: 0 PID: 53 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:656 amdgpu_irq_put+0x45/0x70 [amdgpu]
[  126.212344] Modules linked in: amdgpu gpu_sched drm_buddy drm_display_helper cec rc_core drm_ttm_helper ttm drm_kms_helper i2c_algo_bit ipt_REJECT xt_multiport nft_compat ctr ccm wireguard libchacha20poly1305 chacha_x86_64 poly1305_x86_64 curve25519_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink overlay binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd iwlmvm kvm mac80211 snd_hda_codec_realtek irqbypass snd_hda_codec_generic ghash_clmulni_intel snd_hda_codec_hdmi sha512_ssse3 sha512_generic libarc4 snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec iwlwifi snd_hda_core eeepc_wmi snd_hwdep asus_wmi aesni_intel snd_pcm platform_profile snd_timer crypto_simd battery cryptd sparse_keymap ccp snd
[  126.212408]  ledtrig_audio cfg80211 evdev asus_ec_sensors rapl video wmi_bmof mxm_wmi rng_core pcspkr sp5100_tco soundcore k10temp watchdog rfkill acpi_cpufreq button drm loop fuse efi_pstore configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic hid_generic usbhid hid dm_mod ahci libahci xhci_pci nvme libata xhci_hcd nvme_core crc32_pclmul t10_pi igc crc32c_intel usbcore scsi_mod ptp i2c_piix4 crc64_rocksoft crc64 pps_core crc_t10dif crct10dif_generic crct10dif_pclmul usb_common scsi_common crct10dif_common wmi gpio_amdpt gpio_generic
[  126.212458] CPU: 0 PID: 53 Comm: kworker/0:2 Not tainted 6.1.0-9-amd64 #1  Debian 6.1.27-1
[  126.212462] Hardware name: ASUS System Product Name/ROG STRIX B550-E GAMING, BIOS 3002 02/23/2023
[  126.212464] Workqueue: pm pm_runtime_work
[  126.212470] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu]
[  126.212724] Code: 48 8b 4e 10 48 83 39 00 74 2c 89 d1 48 8d 04 88 8b 08 85 c9 74 14 f0 ff 08 b8 00 00 00 00 74 05 e9 00 38 d0 f0 e9 8b fd ff ff <0f> 0b b8 ea ff ff ff e9 ef 37 d0 f0 b8 ea ff ff ff e9 e5 37 d0 f0
[  126.212726] RSP: 0018:ffffa83ac0397ce0 EFLAGS: 00010246
[  126.212729] RAX: ffff8f91f01320a8 RBX: ffff8f9145120000 RCX: 0000000000000000
[  126.212731] RDX: 0000000000000000 RSI: ffff8f91451224d8 RDI: ffff8f9145120000
[  126.212732] RBP: ffff8f9145120000 R08: 0000000000000000 R09: 0000000000000000
[  126.212734] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000001050
[  126.212735] R13: ffff8f91451373f0 R14: 0000000000000000 R15: ffff8f914141d248
[  126.212737] FS:  0000000000000000(0000) GS:ffff8fa02ea00000(0000) knlGS:0000000000000000
[  126.212739] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  126.212740] CR2: 00005580bc35f9c0 CR3: 0000000e86410000 CR4: 0000000000350ef0
[  126.212743] Call Trace:
[  126.212746]  <TASK>
[  126.212747]  gmc_v10_0_hw_fini+0x46/0x80 [amdgpu]
[  126.212999]  gmc_v10_0_suspend+0xa/0x20 [amdgpu]
[  126.213247]  amdgpu_device_ip_suspend_phase2+0x107/0x1a0 [amdgpu]
[  126.213484]  amdgpu_device_suspend+0xc9/0x150 [amdgpu]
[  126.213721]  amdgpu_pmops_runtime_suspend+0xba/0x190 [amdgpu]
[  126.213957]  pci_pm_runtime_suspend+0x66/0x1b0
[  126.213962]  ? update_load_avg+0x7e/0x780
[  126.213967]  ? pci_dev_put+0x20/0x20
[  126.213970]  __rpm_callback+0x44/0x170
[  126.213973]  ? pci_dev_put+0x20/0x20
[  126.213976]  rpm_callback+0x5d/0x70
[  126.213978]  ? pci_dev_put+0x20/0x20
[  126.213981]  rpm_suspend+0x11a/0x720
[  126.213984]  ? _raw_spin_unlock+0x15/0x30
[  126.213988]  ? finish_task_switch.isra.0+0x9b/0x300
[  126.213991]  ? __switch_to+0x106/0x410
[  126.213995]  pm_runtime_work+0x94/0xa0
[  126.213998]  process_one_work+0x1c7/0x380
[  126.214003]  worker_thread+0x4d/0x380
[  126.214007]  ? _raw_spin_lock_irqsave+0x23/0x50
[  126.214010]  ? rescuer_thread+0x3a0/0x3a0
[  126.214014]  kthread+0xe9/0x110
[  126.214017]  ? kthread_complete_and_exit+0x20/0x20
[  126.214020]  ret_from_fork+0x22/0x30
[  126.214026]  </TASK>
[  126.214027] ---[ end trace 0000000000000000 ]---


* Re: Warning appeared after c8b5a95 ("drm/amdgpu: Fix desktop freezed after gpu-reset")
  2023-06-19  8:32 Warning appeared after c8b5a95 ("drm/amdgpu: Fix desktop freezed after gpu-reset") Christian Kastner
@ 2023-06-19 14:05 ` Alex Deucher
  2023-06-19 14:48   ` Christian Kastner
  0 siblings, 1 reply; 3+ messages in thread
From: Alex Deucher @ 2023-06-19 14:05 UTC (permalink / raw)
  To: Christian Kastner; +Cc: amd-gfx

On Mon, Jun 19, 2023 at 9:05 AM Christian Kastner <ckk@debian.org> wrote:
>
> Hi,
>
> On a Debian 12 ("bookworm") system, I observed a new warning when I
> upgraded from kernel 6.1.25 to 6.1.27. This is on a system with an RX
> 6800 XT GPU and 3500X processor.
>
> I've traced it down to commit c8b5a95 ("drm/amdgpu: Fix desktop freezed
> after gpu-reset"). Rebuilding the 6.1.27 kernel without this change
> makes the warning disappear.
>
> I can reliably trigger this (and another) warning with
>
>   $ sudo cat /sys/kernel/debug/dri/0/amdgpu_test_ib
>   run ib test:
>   ib ring tests passed.
>
> About 5 or 6 seconds after this, two warnings are printed. I see what
> appear to be the same two warnings on system shutdown (they looked
> similar enough that I didn't verify they were identical).
>
> I've attached
>   (1) the dmesg output after modprobe'ing amdgpu
>   (2) the dmesg output after triggering amdgpu_test_ib
>
> The system in question is only used for ROCm development. I haven't
> observed any side effects there other than the warning. There's no
> monitor attached, so I can't comment on the desktop freeze that the
> commit addresses.

The warnings are harmless, but they have been fixed[1] and the fixes
are making their way back to stable kernels.

[1] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=08c677cb0b436a96a836792bb35a8ec5de4999c2

Alex


* Re: Warning appeared after c8b5a95 ("drm/amdgpu: Fix desktop freezed after gpu-reset")
  2023-06-19 14:05 ` Alex Deucher
@ 2023-06-19 14:48   ` Christian Kastner
  0 siblings, 0 replies; 3+ messages in thread
From: Christian Kastner @ 2023-06-19 14:48 UTC (permalink / raw)
  To: Alex Deucher; +Cc: amd-gfx

On 2023-06-19 16:05, Alex Deucher wrote:
> On Mon, Jun 19, 2023 at 9:05 AM Christian Kastner <ckk@debian.org> wrote:
>> On a Debian 12 ("bookworm") system, I observed a new warning when I
>> upgraded from kernel 6.1.25 to 6.1.27. This is on a system with an RX
>> 6800 XT GPU and 3500X processor.
> 
> The warnings are harmless, but they have been fixed[1] and the fixes
> are making their way back to stable kernels.
> 
> [1] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=08c677cb0b436a96a836792bb35a8ec5de4999c2

That was quick. Thank you for pointing out the resolution.

Best,
Christian
