From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 93652] Random crashes/freezing with amdgpu Fury X mesa 11.1
Date: Sun, 10 Jan 2016 14:42:54 +0000
Message-ID:
Bug ID: 93652
Summary: Random crashes/freezing with amdgpu Fury X mesa 11.1
Product: Mesa
Version: 11.0
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/Gallium/radeonsi
Assignee: dri-devel@lists.freedesktop.org
Reporter: wittyman37@yahoo.com
QA Contact: dri-devel@lists.freedesktop.org
Created attachment 120931 [details]
dmesg output after crash on Dota 2
So I am using a Sapphire R9 Fury X with Antergos and the open source amdgpu
driver. I am currently running the 4.4-rc8 kernel and am getting random
freezes or crashes about once an hour.
Software versions:
4.4.0-rc8-g02006f7a
OpenGL version string: 3.0 Mesa 11.1.0
GPU hardware:
OpenGL renderer string: Gallium 0.4 on AMD FIJI (DRM 3.1.0, LLVM 3.7.0)
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc.
[AMD/ATI] Fiji XT [Radeon R9 FURY X] 1002:7300
CPU hardware:
x86_64
AMD FX-8370 Eight-Core Processor
Attached is the output of dmesg after a crash.
You should probably also provide your Xorg log and llvm version. You can start Steam and/or Dota from a terminal window and see what it prints when it crashes.
Created attachment 120942 [details]
steam log during system freeze
I'm not sure if this is the same issue, but I get freezes with:
- Radeon R9 380 (Tonga)
- Mesa 11.1.2
- Linux 4.5-rc4
- amdgpu driver with powerplay enabled

The game usually runs for up to 30 minutes and then freezes unexpectedly. To rule out hardware defects, I also tested the proprietary fglrx driver, which seems stable. Steam doesn't print anything unusual on the console. However, after 2 minutes the kernel reports an unresponsive Xorg process with a backtrace:

[ 7680.137938] INFO: task Xorg:8367 blocked for more than 120 seconds.
[ 7680.137945] Tainted: G O 4.5.0-rc4-desktop #1
[ 7680.137948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7680.137952] Xorg D ffff88043ed542c0 0 8367 8218 0x00000084
[ 7680.137959] ffff88040d723a38 ffff8800bb922240 0000000000000000 ffffffff813d7879
[ 7680.137964] ffff88040d724000 ffff88042c804630 ffff88042c800000 0000000000000001
[ 7680.137969] 0000000000000000 ffff8803f651d4c0 ffff88042c806178 ffffffff818ab045
[ 7680.137974] Call Trace:
[ 7680.137985] [<ffffffff813d7879>] ? __kfifo_in+0x1d/0x25
[ 7680.137992] [<ffffffff818ab045>] ? schedule+0x7c/0x90
[ 7680.138050] [<ffffffffa006e9a4>] ? amd_sched_entity_push_job+0x52/0x6b [amdgpu]
[ 7680.138056] [<ffffffff810bd286>] ? wait_woken+0x66/0x66
[ 7680.138102] [<ffffffffa006edd2>] ? amdgpu_sched_ib_submit_kernel_helper+0xfd/0x170 [amdgpu]
[ 7680.138143] [<ffffffffa001ad02>] ? amdgpu_gem_prime_export+0x3f/0x3f [amdgpu]
[ 7680.138184] [<ffffffffa001b386>] ? amdgpu_vm_bo_update_mapping+0x33f/0x415 [amdgpu]
[ 7680.138226] [<ffffffffa001bcfe>] ? amdgpu_vm_bo_update+0xe2/0x172 [amdgpu]
[ 7680.138265] [<ffffffffa0010def>] ? amdgpu_gem_va_update_vm+0x159/0x1aa [amdgpu]
[ 7680.138306] [<ffffffffa001c10c>] ? amdgpu_vm_bo_map+0x191/0x329 [amdgpu]
[ 7680.138344] [<ffffffffa0011cd2>] ? amdgpu_gem_va_ioctl+0x2b2/0x338 [amdgpu]
[ 7680.138382] [<ffffffffa0011cd2>] ? amdgpu_gem_va_ioctl+0x2b2/0x338 [amdgpu]
[ 7680.138389] [<ffffffff8149318d>] ? drm_ioctl+0x223/0x353
[ 7680.138392] [<ffffffff8149318d>] ? drm_ioctl+0x223/0x353
[ 7680.138431] [<ffffffffa0011a20>] ? amdgpu_gem_metadata_ioctl+0x1ca/0x1ca [amdgpu]
[ 7680.138436] [<ffffffff81138c65>] ? unmap_region+0xc3/0xd2
[ 7680.138469] [<ffffffffa0000046>] ? amdgpu_drm_ioctl+0x46/0x72 [amdgpu]
[ 7680.138474] [<ffffffff8116ecf5>] ? vfs_ioctl+0x16/0x23
[ 7680.138478] [<ffffffff8116f1df>] ? do_vfs_ioctl+0x46a/0x513
[ 7680.138483] [<ffffffff810ff59d>] ? __audit_syscall_entry+0xbe/0xe2
[ 7680.138488] [<ffffffff8116f2d6>] ? SyS_ioctl+0x4e/0x71
[ 7680.138493] [<ffffffff818adc57>] ? entry_SYSCALL_64_fastpath+0x12/0x66
[ 7680.138528] INFO: task kworker/u12:12:2299 blocked for more than 120 seconds.
[ 7680.138531] Tainted: G O 4.5.0-rc4-desktop #1
[ 7680.138534] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7680.138536] kworker/u12:12 D ffff88043ec542c0 0 2299 2 0x00000080
[ 7680.138578] Workqueue: amdgpu-pageflip-queue amdgpu_flip_work_func [amdgpu]
[ 7680.138581] ffff8803251dfca0 ffff88042ddc82c0 000000000000000a 0000000000000490
[ 7680.138586] ffff8803251e0000 ffff8803251dfd80 ffff88042ddc82c0 0000000000000246
[ 7680.138591] ffff8804253fe600 ffff8803251dfd90 ffff8803251dfd60 ffffffff818ab045
[ 7680.138595] Call Trace:
[ 7680.138601] [<ffffffff818ab045>] ? schedule+0x7c/0x90
[ 7680.138606] [<ffffffff818acf7f>] ? schedule_timeout+0x44/0x1df
[ 7680.138612] [<ffffffff810b818a>] ? load_balance+0x15c/0x7fe
[ 7680.138617] [<ffffffff810b22ed>] ? sched_clock_cpu+0xc/0xb0
[ 7680.138623] [<ffffffff81583514>] ? fence_default_wait+0x109/0x1ac
[ 7680.138628] [<ffffffff81583514>] ? fence_default_wait+0x109/0x1ac
[ 7680.138633] [<ffffffff8158303b>] ? fence_free+0xe/0xe
[ 7680.138670] [<ffffffffa000ea2b>] ? amdgpu_flip_wait_fence+0x32/0xa5 [amdgpu]
[ 7680.138708] [<ffffffffa000fae3>] ? amdgpu_flip_work_func+0x5d/0x156 [amdgpu]
[ 7680.138714] [<ffffffff810a4351>] ? process_one_work+0x194/0x29f
[ 7680.138718] [<ffffffff810a49a6>] ? worker_thread+0x276/0x360
[ 7680.138723] [<ffffffff810a4730>] ? rescuer_thread+0x2ad/0x2ad
[ 7680.138727] [<ffffffff810a85bf>] ? kthread+0xc1/0xc9
[ 7680.138731] [<ffffffff810a84fe>] ? kthread_create_on_node+0x17c/0x17c
[ 7680.138735] [<ffffffff818adf9f>] ? ret_from_fork+0x3f/0x70
[ 7680.138739] [<ffffffff810a84fe>] ? kthread_create_on_node+0x17c/0x17c
Linux 4.5-rc7: The call trace seems to have changed since 4.5-rc4. The desktop hangs irrevocably after a few minutes of running the game "Left 4 Dead 2". The issue does not occur with an old Nvidia GPU and the nouveau drivers.

[ 1080.214273] INFO: task Xorg:5799 blocked for more than 120 seconds.
[ 1080.214277] Not tainted 4.5.0-rc7-desktop #1
[ 1080.214279] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.214282] Xorg D ffff88043ec13fc0 0 5799 5326 0x00000084
[ 1080.214286] ffff880420b1fa30 0000000000000000 ffff8800bcaa23c0 0000000000000000
[ 1080.214289] 0000000000000000 ffff880420b20000 ffff880420b1fa48 ffff88042c9d8000
[ 1080.214292] 0000000000000001 0000000000000000 ffff88042ab70040 ffff88042c9de178
[ 1080.214295] Call Trace:
[ 1080.214302] [<ffffffff818a38b8>] ? schedule+0x7f/0x93
[ 1080.214306] [<ffffffff818a38b8>] ? schedule+0x7f/0x93
[ 1080.214331] [<ffffffffa004d52e>] ? amd_sched_entity_push_job+0x52/0x6b [amdgpu]
[ 1080.214335] [<ffffffff810bd080>] ? wait_woken+0x66/0x66
[ 1080.214355] [<ffffffffa004d950>] ? amdgpu_sched_ib_submit_kernel_helper+0xfd/0x170 [amdgpu]
[ 1080.214373] [<ffffffffa001aa00>] ? amdgpu_gem_prime_export+0x3f/0x3f [amdgpu]
[ 1080.214391] [<ffffffffa001b064>] ? amdgpu_vm_bo_update_mapping+0x324/0x414 [amdgpu]
[ 1080.214410] [<ffffffffa001b9fa>] ? amdgpu_vm_bo_update+0xe0/0x172 [amdgpu]
[ 1080.214427] [<ffffffffa0010b53>] ? amdgpu_gem_va_update_vm+0x159/0x1a8 [amdgpu]
[ 1080.214445] [<ffffffffa001be17>] ? amdgpu_vm_bo_map+0x198/0x335 [amdgpu]
[ 1080.214462] [<ffffffffa0011a34>] ? amdgpu_gem_va_ioctl+0x2b7/0x343 [amdgpu]
[ 1080.214478] [<ffffffffa0011a34>] ? amdgpu_gem_va_ioctl+0x2b7/0x343 [amdgpu]
[ 1080.214482] [<ffffffff8148f8c3>] ? drm_ioctl+0x225/0x353
[ 1080.214484] [<ffffffff8148f8c3>] ? drm_ioctl+0x225/0x353
[ 1080.214501] [<ffffffffa001177d>] ? amdgpu_gem_metadata_ioctl+0x1c7/0x1c7 [amdgpu]
[ 1080.214505] [<ffffffff81131fa2>] ? __do_fault+0x61/0xaa
[ 1080.214519] [<ffffffffa0000046>] ? amdgpu_drm_ioctl+0x46/0x72 [amdgpu]
[ 1080.214522] [<ffffffff8116e1db>] ? vfs_ioctl+0x16/0x23
[ 1080.214524] [<ffffffff8116e6f9>] ? do_vfs_ioctl+0x49e/0x50e
[ 1080.214527] [<ffffffff810fec2c>] ? __audit_syscall_entry+0xbb/0xdf
[ 1080.214530] [<ffffffff8116e7b6>] ? SyS_ioctl+0x4d/0x6f
[ 1080.214533] [<ffffffff818a64d7>] ? entry_SYSCALL_64_fastpath+0x12/0x66
Thanks for the reports. Those backtraces are typical consequences of a GPU hang and are unfortunately not helpful for isolating the root cause. One thing you could try is starting Steam with R600_DEBUG=nodcc from a terminal window. Also, if it doesn't take too long for the hang to occur (Wolfgang mentions a few minutes), you could try recording an apitrace, and see whether you also get a lockup when you replay the trace. Such a trace would be very helpful.
I can also confirm this bug, in the form of a full system freeze, on an AMD 380X running the latest Mesa revision on top of the latest drm-next as of 27th March 2016.

In short, setting R600_DEBUG=nodcc seems to alleviate the crashes, based on a couple of hours of testing.

Some details: I can confirm the crashes in at least Dota 2, Portal, and Counter-Strike: Global Offensive (all native applications), and under WINE, Starcraft II and Heroes of the Storm both crash within minutes. On the other hand, the crash never occurred in other programs, like Furmark, Battleblock Theater, and Awesomenauts. I also tried recording multiple apitraces while the issue occurred, but none of the traces would actually reproduce the hang. What strikes me as most odd is that the hang generally never occurred in games under WINE when using the Gallium Nine patches, while the hangs would occur within minutes otherwise.
| What | Removed | Added |
|---|---|---|
| Version | 11.0 | git |
I also have a Sapphire 380X with Linux 4.5 and powerplay enabled. With the latest git mesa, glxinfo completely freezes the system. I ran a git bisect and it points to this commit: https://cgit.freedesktop.org/mesa/mesa/commit/?id=ec74deeb2466689a0eca52f290d5f9e44af6a97b ("radeonsi: set amdgpu metadata before exporting a texture"). I don't know whether this has something to do with this bug, but after reverting it I get no more freezes.
| What | Removed | Added |
|---|---|---|
| CC | | s@gonx.dk |
One of these commits fixes the issue for me but I don't know which one it is exactly.

drm/amdgpu: make sure vertical front porch is at least 1
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes-4.6&id=0126d4b9a516256f2432ca0dc78ab293a8255378

drm/radeon: make sure vertical front porch is at least 1
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes-4.6&id=3104b8128d4d646a574ed9d5b17c7d10752cd70b

drm/amdgpu: set metadata pointer to NULL after freeing.
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes-4.6&id=0092d3edcb23fcdb8cbe4159ba94a534290ff982

I think it is "drm/amdgpu: set metadata pointer to NULL after freeing." This patch is not merged into drm-next 4.6 yet.
Created attachment 123608
[details]
Xorg blocked backtrace
Is there anything we can do to help debug this? The crashes still exist at
drm-next commit bafb86f5bc3173479002555dea7f31d943b12332 (May 9 13:49:56 +1000)
- basically 4.6.0-rc7.
The backtrace remains similar to what was posted earlier.
That commit doesn't particularly fix the crash for me. But I've found out that forcing the performance level (echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level) severely reduces the amount of crashes I have. With it set to auto, in games that have varying GPU loads my card will crash in minutes. Whereas with it set to high I can be lucky and not get a crash for hours, but it will still crash occasionally. This still occurs on drm-next, mainline, and the ~agd5f/linux branch drm-next-4.8-wip.
| What | Removed | Added |
|---|---|---|
| CC | | boltronics@gmail.com |
Confirming this same behaviour on an Asus Radeon R9 285 OC 2GB card:

01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Tonga PRO [Radeon R9 285/380] [1002:6939]

As others have said, Gallium Nine seems to run reliably. Vanilla Wine causes regular crashes, where I always have to SSH into the host from a laptop to reboot it. No problem with fglrx on Ubuntu 14.04.4.

echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level does help a lot while active, but somehow this keeps getting reset back to auto, which causes crashes again. I even have a cron job to run the above command every minute, but it's not enough.

[ 6960.948175] INFO: task Xorg:5192 blocked for more than 120 seconds.
[ 6960.948177] Tainted: G OE 4.7.0-rc2+ #2
[ 6960.948177] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6960.948178] Xorg D ffff88086ec56d80 0 5192 5169 0x00000004
[ 6960.948180] ffff880820e5c0c0 000000000011f5f8 ffffffff81327b2d ffff88082620c000
[ 6960.948181] ffff880847c45780 ffff8808453274e8 0000000000000001 000000000011f5f8
[ 6960.948182] ffff880842e8d9c8 ffffffff815cd641 ffff880407949400 ffffffffc0b10c0f
[ 6960.948183] Call Trace:
[ 6960.948186] [<ffffffff81327b2d>] ? __kfifo_in+0x2d/0x40
[ 6960.948187] [<ffffffff815cd641>] ? schedule+0x31/0x80
[ 6960.948200] [<ffffffffc0b10c0f>] ? amd_sched_entity_push_job+0x6f/0x110 [amdgpu]
[ 6960.948202] [<ffffffff810b87b0>] ? wake_atomic_t_function+0x60/0x60
[ 6960.948211] [<ffffffffc0b115af>] ? amdgpu_job_submit+0x9f/0xf0 [amdgpu]
[ 6960.948218] [<ffffffffc0ad6cbf>] ? amdgpu_vm_bo_update_mapping+0x2bf/0x430 [amdgpu]
[ 6960.948225] [<ffffffffc0ad6f8a>] ? amdgpu_vm_bo_split_mapping+0x15a/0x1a0 [amdgpu]
[ 6960.948231] [<ffffffffc0ad81cf>] ? amdgpu_vm_clear_freed+0x4f/0x90 [amdgpu]
[ 6960.948237] [<ffffffffc0ac81a8>] ? amdgpu_gem_va_update_vm+0x188/0x1c0 [amdgpu]
[ 6960.948239] [<ffffffffc09cdc9a>] ? ttm_bo_add_to_lru+0x8a/0xf0 [ttm]
[ 6960.948245] [<ffffffffc0ac929c>] ? amdgpu_gem_va_ioctl+0x22c/0x2e0 [amdgpu]
[ 6960.948251] [<ffffffffc07e5701>] ? drm_gem_object_handle_unreference_unlocked+0x11/0xa0 [drm]
[ 6960.948254] [<ffffffffc07e6601>] ? drm_ioctl+0x131/0x4c0 [drm]
[ 6960.948260] [<ffffffffc0ac9070>] ? amdgpu_gem_metadata_ioctl+0x1c0/0x1c0 [amdgpu]
[ 6960.948262] [<ffffffff811f1ee9>] ? do_readv_writev+0x149/0x240
[ 6960.948263] [<ffffffff8131b974>] ? timerqueue_add+0x54/0xa0
[ 6960.948267] [<ffffffffc0ab2046>] ? amdgpu_drm_ioctl+0x46/0x80 [amdgpu]
[ 6960.948269] [<ffffffff8120537d>] ? do_vfs_ioctl+0x9d/0x5c0
[ 6960.948270] [<ffffffff814b65ed>] ? __sys_recvmsg+0x7d/0x90
[ 6960.948271] [<ffffffff81205914>] ? SyS_ioctl+0x74/0x80
[ 6960.948272] [<ffffffff815d1536>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8

Versions:
drm: 625d1810ad1f61dd4f4b2b2ee7e5cc67e1fdc2f1 on master
xf86-video-amdgpu: d96dabc71b1b32dc4b422a9633cdd4e0e95da052 on master
mesa: d93bacc1fa4bf1d6d358da3615b00305e8518f33 on master
linux: 0812a945fbb814e7946fbe6ddcc81d054c8b6c91 on polaris-test (from git://people.freedesktop.org/~agd5f/linux)
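For applying the performance-level workaround automatically at boot, one option besides cron would be a oneshot systemd unit like the sketch below. The unit name and the card0 index are assumptions; as noted above, this only sets the level once, so it does not help if something later resets it back to auto at runtime.

```ini
# /etc/systemd/system/amdgpu-dpm-high.service  (hypothetical unit name)
[Unit]
Description=Force amdgpu DPM performance level to high (workaround for fdo#93652)

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level'

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable amdgpu-dpm-high.service`.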
I seem to be experiencing this issue as well.

Arch Linux, Linux 4.7, Mesa 12.0.1, xorg-server 1.18.4. GPU is an AMD R9 285 with 2GB of video RAM.

I encounter this issue once every day or two. The desktop is a frozen framebuffer. SSH is still responsive, though I can't reboot or shut down cleanly (I have to hard power off; maybe I could kill Xorg or something, I haven't tried that yet). I noticed that the crash logs are similar to what I am observing, but that doesn't appear to collect data for this issue.

1) When it does crash, is there anything that can be done at that point to collect data that would be useful?
2) Shouldn't the amdgpu driver respond appropriately to inappropriate use and cause a clean crash of the offending application? Preferably just whichever program has the bug, and not the entire Xorg session?
Hi. I believe I have the same issue? I run openSUSE Tumbleweed, and have Mesa 12.0.3 installed. Since some recent update, the OS apparently freezes at random, and the machine has to be powered off and restarted! Some games seem to cause this issue, and playing them will crash the machine at random intervals. Very disappointing that such a thing can still happen with Mesa today...
I have separately reported my problem in another issue, as I use different hardware and mine might be a separate problem. This was reported at the beginning of the year, and at that time I did not experience those system freezes. https://bugs.freedesktop.org/show_bug.cgi?id=98520
Does this happen with latest mesa and llvm? After all, llvm 3.7 is really old.