https://bugs.freedesktop.org/show_bug.cgi?id=110099

    Bug ID: 110099
   Summary: Unprivileged user mode program can cause GPU reset
   Product: Spam
   Version: unspecified
  Hardware: x86-64 (AMD64)
        OS: Linux (All)
    Status: NEW
  Severity: major
  Priority: medium
 Component: Two
  Assignee: daniel@fooishbar.org
  Reporter: baigshakira123@gmail.com
        CC: dri-devel@lists.freedesktop.org, sudolskym@gmail.com
Depends on: 109978

Created attachment 143663
  --> https://bugs.freedesktop.org/attachment.cgi?id=143663&action=edit
clone1

+++ This bug was initially created as a clone of Bug #109978 +++

https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/issues/72

Sample program which causes this (needs ROCm):

> #include <hc.hpp>
> int main()
> {
>     parallel_for_each(hc::extent<1>(1), [=]() [[hc]]
>     {
>         asm("s_trap 2");
>     });
>     return 0;
> }

> hcc -hc main.cpp
> ./a.out

The process never exits, and pressing Ctrl-C triggers a GPU reset, which breaks
every other process using ROCm on that GPU. It seems the trap handler expects a
queue handle in s[0:1], which is set when using __builtin_trap(); without it,
the trap handler raises further exceptions.

System logs:

[ 247.428727] qcm fence wait loop timeout expired
[ 247.428730] The cp might be in an unrecoverable state due to an unsuccessful queues preemption
[ 247.428736] amdgpu 0000:0b:00.0: GPU reset begin!
[ 247.619440] amdgpu 0000:0b:00.0: GPU reset
[ 248.152762] [drm] psp mode1 reset succeed
[ 248.279461] amdgpu 0000:0b:00.0: GPU reset succeeded, trying to resume
[ 248.279584] [drm] PCIE GART of 512M enabled (table at 0x000000F400900000).
[ 248.279639] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
[ 248.279769] [drm] PSP is resuming...
[ 248.428305] [drm] reserve 0x400000 from 0xf400d00000 for PSP TMR SIZE
[ 248.472774] WARNING: CPU: 23 PID: 21634 at /build/linux-uQJ2um/linux-4.15.0/kernel/kthread.c:498 kthread_park+0x67/0x80
[ 248.472775] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs msr nls_utf8 cifs ccm fscache cmac bnep binfmt_misc nls_iso8859_1 edac_mce_amd arc4 snd_hda_codec_realtek snd_hda_codec_generic kvm_amd snd_hda_codec_hdmi kvm snd_seq_midi irqbypass snd_hda_intel snd_seq_midi_event snd_hda_codec btusb snd_hda_core btrtl wmi_bmof snd_rawmidi iwlmvm snd_hwdep btbcm btintel snd_pcm snd_seq bluetooth mac80211 snd_seq_device ecdh_generic snd_timer iwlwifi ccp snd cfg80211 soundcore k10temp shpchp mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nct6775 hwmon_vid parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1
[ 248.472823] multipath linear raid0 amdgpu(OE) amdchash(OE) amdttm(OE) amd_sched(OE) mxm_wmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 amdkcl(OE) crypto_simd glue_helper amd_iommu_v2 cryptd drm_kms_helper syscopyarea sysfillrect sysimgblt igb fb_sys_fops drm dca nvme i2c_algo_bit i2c_piix4 nvme_core ptp ahci atlantic libahci pps_core gpio_amdpt wmi gpio_generic
[ 248.472846] CPU: 23 PID: 21634 Comm: a.out Tainted: G OE 4.15.0-45-generic #48-Ubuntu
[ 248.472847] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Professional Gaming, BIOS P3.30 08/14/2018
[ 248.472849] RIP: 0010:kthread_park+0x67/0x80
[ 248.472850] RSP: 0018:ffffb44fc7e27ad0 EFLAGS: 00010202
[ 248.472852] RAX: 0000000000000004 RBX: ffff9ec63f49e480 RCX: 0000000000000000
[ 248.472853] RDX: ffff9ec63c717198 RSI: ffff9ec63ea0c0c0 RDI: ffff9ec63dd38000
[ 248.472854] RBP: ffffb44fc7e27ae0 R08: 0000000000000051 R09: 0000000000000000
[ 248.472855] R10: 0000000000000000 R11: 0000000000000056 R12: ffff9ec63ea0c0c0
[ 248.472855] R13: ffff9ec64f4f4200 R14: ffff9ec63c710000 R15: 0000000000000000
[ 248.472857] FS: 00007fd52a286c00(0000) GS:ffff9ec65cdc0000(0000) knlGS:0000000000000000
[ 248.472858] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 248.472859] CR2: 00007f0c07687a98 CR3: 000000081b5b6000 CR4: 00000000003406e0
[ 248.472860] Call Trace:
[ 248.472865] amddrm_sched_entity_fini+0x44/0x1b0 [amd_sched]
[ 248.472868] amddrm_sched_entity_destroy+0x1f/0x30 [amd_sched]
[ 248.472907] amdgpu_vm_fini+0xbb/0x4f0 [amdgpu]
[ 248.472942] amdgpu_driver_postclose_kms+0x15b/0x2b0 [amdgpu]
[ 248.472952] drm_release+0x26b/0x390 [drm]
[ 248.472955] __fput+0xea/0x220
[ 248.472957] ____fput+0xe/0x10
[ 248.472959] task_work_run+0x9d/0xc0
[ 248.472961] do_exit+0x2ec/0xb40
[ 248.472963] do_group_exit+0x43/0xb0
[ 248.472965] get_signal+0x27b/0x590
[ 248.472968] do_signal+0x37/0x730
[ 248.472971] ? __switch_to_asm+0x34/0x70
[ 248.472973] ? __switch_to_asm+0x40/0x70
[ 248.472976] ? do_vfs_ioctl+0xa8/0x630
[ 248.472978] ? __schedule+0x299/0x8a0
[ 248.472980] exit_to_usermode_loop+0x73/0xd0
[ 248.472982] do_syscall_64+0x115/0x130
[ 248.472984] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 248.472986] RIP: 0033:0x7fd528bdd5d7
[ 248.472987] RSP: 002b:00007ffe830d4778 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 248.472988] RAX: fffffffffffffffc RBX: 0000000000000001 RCX: 00007fd528bdd5d7
[ 248.472989] RDX: 00007ffe830d47d0 RSI: 00000000c0184b0c RDI: 0000000000000003
[ 248.472990] RBP: 00007ffe830d47d0 R08: 00007ffe830d4890 R09: 0000000000000001
[ 248.472990] R10: 0000000000c92010 R11: 0000000000000246 R12: 00000000c0184b0c
[ 248.472991] R13: 0000000000000003 R14: 0000000000000000 R15: 00000000fffffffe
[ 248.472992] Code: 0e e8 6e c0 00 00 48 8d 7b 18 e8 35 d2 8e 00 44 89 e0 5b 41 5c 5d c3 0f 0b 41 bc da ff ff ff 44 89 e0 5b 41 5c 5d c3 0f 0b eb af <0f> 0b 41 bc f0 ff ff ff eb da 0f 1f 44 00 00 66 2e 0f 1f 84 00
[ 248.473020] ---[ end trace 19649ddd4a6314f7 ]---
[ 248.648453] [drm] UVD and UVD ENC initialized successfully.
[ 248.748509] [drm] VCE initialized successfully.
[ 248.749616] [drm] recover vram bo from shadow start
[ 248.749666] [drm] recover vram bo from shadow done
[ 248.749680] amdgpu 0000:0b:00.0: GPU reset(1) succeeded!

Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=109978
[Bug 109978] Unprivileged user mode program can cause GPU reset

-- 
You are receiving this mail because:
You are on the CC list for the bug.