dri-devel.lists.freedesktop.org archive mirror
* [Bug 109978] Unprivileged user mode program can cause GPU reset
@ 2019-03-12 13:56 bugzilla-daemon
  2019-03-14  8:19 ` bugzilla-daemon
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: bugzilla-daemon @ 2019-03-12 13:56 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=109978

            Bug ID: 109978
           Summary: Unprivileged user mode program can cause GPU reset
           Product: DRI
           Version: XOrg git
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: major
          Priority: medium
         Component: DRM/amdkfd
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: sudolskym@gmail.com

https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/issues/72

Sample program which causes this (needs ROCm):

> #include <hc.hpp>
> int main()
> {
> 	parallel_for_each(hc::extent<1>(1), [=]() [[hc]]
> 	{
> 		asm("s_trap 2");
> 	});
> 	return 0;
> }

> hcc -hc main.cpp
> ./a.out

The process never exits, and pressing Ctrl-C triggers a GPU reset that breaks
every other process using ROCm on that GPU. It seems the trap handler expects
the queue handle in s[0:1], which is set when __builtin_trap() is used; without
it, the trap handler raises further exceptions.

System logs:

[  247.428727] qcm fence wait loop timeout expired
[  247.428730] The cp might be in an unrecoverable state due to an unsuccessful
queues preemption
[  247.428736] amdgpu 0000:0b:00.0: GPU reset begin!
[  247.619440] amdgpu 0000:0b:00.0: GPU reset
[  248.152762] [drm] psp mode1 reset succeed 
[  248.279461] amdgpu 0000:0b:00.0: GPU reset succeeded, trying to resume
[  248.279584] [drm] PCIE GART of 512M enabled (table at 0x000000F400900000).
[  248.279639] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
[  248.279769] [drm] PSP is resuming...
[  248.428305] [drm] reserve 0x400000 from 0xf400d00000 for PSP TMR SIZE
[  248.472774] WARNING: CPU: 23 PID: 21634 at
/build/linux-uQJ2um/linux-4.15.0/kernel/kthread.c:498 kthread_park+0x67/0x80
[  248.472775] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs
msr nls_utf8 cifs ccm fscache cmac bnep binfmt_misc nls_iso8859_1 edac_mce_amd
arc4 snd_hda_codec_realtek snd_hda_codec_generic kvm_amd snd_hda_codec_hdmi kvm
snd_seq_midi irqbypass snd_hda_intel snd_seq_midi_event snd_hda_codec btusb
snd_hda_core btrtl wmi_bmof snd_rawmidi iwlmvm snd_hwdep btbcm btintel snd_pcm
snd_seq bluetooth mac80211 snd_seq_device ecdh_generic snd_timer iwlwifi ccp
snd cfg80211 soundcore k10temp shpchp mac_hid sch_fq_codel ib_iser rdma_cm
iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
nct6775 hwmon_vid parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor raid6_pq libcrc32c raid1
[  248.472823]  multipath linear raid0 amdgpu(OE) amdchash(OE) amdttm(OE)
amd_sched(OE) mxm_wmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc
aesni_intel aes_x86_64 amdkcl(OE) crypto_simd glue_helper amd_iommu_v2 cryptd
drm_kms_helper syscopyarea sysfillrect sysimgblt igb fb_sys_fops drm dca nvme
i2c_algo_bit i2c_piix4 nvme_core ptp ahci atlantic libahci pps_core gpio_amdpt
wmi gpio_generic
[  248.472846] CPU: 23 PID: 21634 Comm: a.out Tainted: G           OE   
4.15.0-45-generic #48-Ubuntu
[  248.472847] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./X399 Professional Gaming, BIOS P3.30 08/14/2018
[  248.472849] RIP: 0010:kthread_park+0x67/0x80
[  248.472850] RSP: 0018:ffffb44fc7e27ad0 EFLAGS: 00010202
[  248.472852] RAX: 0000000000000004 RBX: ffff9ec63f49e480 RCX:
0000000000000000
[  248.472853] RDX: ffff9ec63c717198 RSI: ffff9ec63ea0c0c0 RDI:
ffff9ec63dd38000
[  248.472854] RBP: ffffb44fc7e27ae0 R08: 0000000000000051 R09:
0000000000000000
[  248.472855] R10: 0000000000000000 R11: 0000000000000056 R12:
ffff9ec63ea0c0c0
[  248.472855] R13: ffff9ec64f4f4200 R14: ffff9ec63c710000 R15:
0000000000000000
[  248.472857] FS:  00007fd52a286c00(0000) GS:ffff9ec65cdc0000(0000)
knlGS:0000000000000000
[  248.472858] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  248.472859] CR2: 00007f0c07687a98 CR3: 000000081b5b6000 CR4:
00000000003406e0
[  248.472860] Call Trace:
[  248.472865]  amddrm_sched_entity_fini+0x44/0x1b0 [amd_sched]
[  248.472868]  amddrm_sched_entity_destroy+0x1f/0x30 [amd_sched]
[  248.472907]  amdgpu_vm_fini+0xbb/0x4f0 [amdgpu]
[  248.472942]  amdgpu_driver_postclose_kms+0x15b/0x2b0 [amdgpu]
[  248.472952]  drm_release+0x26b/0x390 [drm]
[  248.472955]  __fput+0xea/0x220
[  248.472957]  ____fput+0xe/0x10
[  248.472959]  task_work_run+0x9d/0xc0
[  248.472961]  do_exit+0x2ec/0xb40
[  248.472963]  do_group_exit+0x43/0xb0
[  248.472965]  get_signal+0x27b/0x590
[  248.472968]  do_signal+0x37/0x730
[  248.472971]  ? __switch_to_asm+0x34/0x70
[  248.472973]  ? __switch_to_asm+0x40/0x70
[  248.472976]  ? do_vfs_ioctl+0xa8/0x630
[  248.472978]  ? __schedule+0x299/0x8a0
[  248.472980]  exit_to_usermode_loop+0x73/0xd0
[  248.472982]  do_syscall_64+0x115/0x130
[  248.472984]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  248.472986] RIP: 0033:0x7fd528bdd5d7
[  248.472987] RSP: 002b:00007ffe830d4778 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  248.472988] RAX: fffffffffffffffc RBX: 0000000000000001 RCX:
00007fd528bdd5d7
[  248.472989] RDX: 00007ffe830d47d0 RSI: 00000000c0184b0c RDI:
0000000000000003
[  248.472990] RBP: 00007ffe830d47d0 R08: 00007ffe830d4890 R09:
0000000000000001
[  248.472990] R10: 0000000000c92010 R11: 0000000000000246 R12:
00000000c0184b0c
[  248.472991] R13: 0000000000000003 R14: 0000000000000000 R15:
00000000fffffffe
[  248.472992] Code: 0e e8 6e c0 00 00 48 8d 7b 18 e8 35 d2 8e 00 44 89 e0 5b
41 5c 5d c3 0f 0b 41 bc da ff ff ff 44 89 e0 5b 41 5c 5d c3 0f 0b eb af <0f> 0b
41 bc f0 ff ff ff eb da 0f 1f 44 00 00 66 2e 0f 1f 84 00 
[  248.473020] ---[ end trace 19649ddd4a6314f7 ]---
[  248.648453] [drm] UVD and UVD ENC initialized successfully.
[  248.748509] [drm] VCE initialized successfully.
[  248.749616] [drm] recover vram bo from shadow start
[  248.749666] [drm] recover vram bo from shadow done
[  248.749680] amdgpu 0000:0b:00.0: GPU reset(1) succeeded!
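
To check whether a given dmesg capture contains this reset sequence, the
markers in the log above can be matched mechanically. The following is only a
minimal C++ sketch and not part of the original report: the names
parse_timestamp, find_gpu_reset and ResetSpan are hypothetical, and it assumes
the exact message strings "GPU reset begin!" and "GPU reset(N) succeeded!" seen
above, which may differ between driver versions.

```cpp
#include <istream>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Parse the leading "[  247.428736]" dmesg timestamp (seconds since boot).
// Returns nullopt for wrapped continuation lines that carry no timestamp.
std::optional<double> parse_timestamp(const std::string& line) {
    if (line.empty() || line[0] != '[') return std::nullopt;
    const auto close = line.find(']');
    if (close == std::string::npos) return std::nullopt;
    try {
        // std::stod skips the padding spaces after the opening bracket.
        return std::stod(line.substr(1, close - 1));
    } catch (const std::exception&) {
        return std::nullopt;  // e.g. a "[drm]" tag with no numeric prefix
    }
}

// Hypothetical helper type: timestamps of the first complete reset.
struct ResetSpan {
    double begin;  // "GPU reset begin!"
    double end;    // "GPU reset(N) succeeded!"
};

// Scan a kernel log for the first begin/succeeded pair (assumed strings).
std::optional<ResetSpan> find_gpu_reset(std::istream& log) {
    std::optional<double> begin;
    std::string line;
    while (std::getline(log, line)) {
        const auto ts = parse_timestamp(line);
        if (!ts) continue;
        if (line.find("GPU reset begin!") != std::string::npos) {
            begin = ts;
        } else if (begin &&
                   line.find("GPU reset(") != std::string::npos &&
                   line.find("succeeded!") != std::string::npos) {
            return ResetSpan{*begin, *ts};
        }
    }
    return std::nullopt;  // no complete reset sequence found
}
```

Fed the log above through a std::istringstream (or std::cin), this should
report a span from 247.428736 to 248.749680, i.e. roughly 1.32 s between the
start of the reset and its completion.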

-- 
You are receiving this mail because:
You are the assignee for the bug.


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* [Bug 109978] Unprivileged user mode program can cause GPU reset
  2019-03-12 13:56 [Bug 109978] Unprivileged user mode program can cause GPU reset bugzilla-daemon
@ 2019-03-14  8:19 ` bugzilla-daemon
  2019-03-14  8:37 ` bugzilla-daemon
  2019-11-19  7:53 ` bugzilla-daemon
  2 siblings, 0 replies; 4+ messages in thread
From: bugzilla-daemon @ 2019-03-14  8:19 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=109978

baigshakira123@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |110099


Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=110099
[Bug 110099] Unprivileged user mode program can cause GPU reset

* [Bug 109978] Unprivileged user mode program can cause GPU reset
  2019-03-12 13:56 [Bug 109978] Unprivileged user mode program can cause GPU reset bugzilla-daemon
  2019-03-14  8:19 ` bugzilla-daemon
@ 2019-03-14  8:37 ` bugzilla-daemon
  2019-11-19  7:53 ` bugzilla-daemon
  2 siblings, 0 replies; 4+ messages in thread
From: bugzilla-daemon @ 2019-03-14  8:37 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=109978

Andre Klapper <a9016009@gmx.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|110099                      |


Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=110099
[Bug 110099] Unprivileged user mode program can cause GPU reset

* [Bug 109978] Unprivileged user mode program can cause GPU reset
  2019-03-12 13:56 [Bug 109978] Unprivileged user mode program can cause GPU reset bugzilla-daemon
  2019-03-14  8:19 ` bugzilla-daemon
  2019-03-14  8:37 ` bugzilla-daemon
@ 2019-11-19  7:53 ` bugzilla-daemon
  2 siblings, 0 replies; 4+ messages in thread
From: bugzilla-daemon @ 2019-11-19  7:53 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=109978

Martin Peres <martin.peres@free.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |MOVED

--- Comment #1 from Martin Peres <martin.peres@free.fr> ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been
closed from further activity.

You can subscribe and participate further through the new bug through this link
to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/5.

