dri-devel Archive on lore.kernel.org
* [Bug 206475] New: amdgpu under load drop signal to monitor until hard reset
@ 2020-02-09 20:36 bugzilla-daemon
  2020-02-10 13:20 ` [Bug 206475] " bugzilla-daemon
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: bugzilla-daemon @ 2020-02-09 20:36 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206475

            Bug ID: 206475
           Summary: amdgpu under load drop signal to monitor until hard
                    reset
           Product: Drivers
           Version: 2.5
    Kernel Version: 5.5.2
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@kernel-bugs.osdl.org
          Reporter: rodomar705@protonmail.com
        Regression: No

Created attachment 287265
  --> https://bugzilla.kernel.org/attachment.cgi?id=287265&action=edit
dmesg for the amdgpu hardware freeze

While gaming, the monitor randomly goes blank. The only errors in the system
logs are:

kernel: amdgpu: [powerplay] last message was failed ret is 65535
kernel: amdgpu: [powerplay] failed to send message 200 ret is 65535 
kernel: amdgpu: [powerplay] last message was failed ret is 65535
kernel: amdgpu: [powerplay] failed to send message 282 ret is 65535 
kernel: amdgpu: [powerplay] last message was failed ret is 65535
kernel: amdgpu: [powerplay] failed to send message 201 ret is 65535

with the occasional
kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR*
[CRTC:47:crtc-0] flip_done timed out
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled
seq=5275264, emitted seq=5275266
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process
Hand of Fate 2. pid 682062 thread Hand of Fa:cs0 pid 682064
kernel: amdgpu 0000:06:00.0: GPU reset begin!

over and over again. If I reset the system, there is no video output until the
system is fully powered off.
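
As a rough aid for triaging logs like the messages quoted above, the following Python sketch counts the failure signatures from this report and computes the gap between signaled and emitted fence sequence numbers. The function name `summarize_amdgpu_log` and the regular expressions are illustrative assumptions based only on the message wording shown here; other kernel versions may phrase these messages differently.

```python
import re
from collections import Counter

# Patterns are assumptions derived from the messages quoted in this report.
SMU_FAIL = re.compile(r"\[powerplay\]\s+failed to send message (\d+) ret is (\d+)")
# \s+ tolerates the line wrapping introduced by the mail client.
RING_TIMEOUT = re.compile(
    r"ring (\w+) timeout, signaled\s+seq=(\d+), emitted\s+seq=(\d+)"
)
RESET_BEGIN = re.compile(r"GPU reset begin!")


def summarize_amdgpu_log(text):
    """Return counts of failure signatures found in a dmesg/journal dump."""
    return {
        # SMU message IDs that failed, with how often each one failed.
        "failed_smu_messages": Counter(
            int(m.group(1)) for m in SMU_FAIL.finditer(text)
        ),
        # Distinct return codes reported alongside those failures.
        "return_codes": sorted({int(m.group(2)) for m in SMU_FAIL.finditer(text)}),
        # (ring name, emitted seq minus signaled seq) per timeout.
        "ring_timeouts": [
            (m.group(1), int(m.group(3)) - int(m.group(2)))
            for m in RING_TIMEOUT.finditer(text)
        ],
        # Number of times a GPU reset was started.
        "reset_attempts": len(RESET_BEGIN.findall(text)),
    }
```

Applied to the excerpt above, the gfx ring timeout has a gap of 2 between the emitted and signaled sequence numbers. The return code 65535 is 0xFFFF, that is, all bits set in a 16-bit value; what that means for the hardware state is not established in this thread.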

B450 chipset + Ryzen 5 2600 + Radeon RX580 GPU

Full log is attached to this post.

Can anyone at AMD give me some pointers as to what the problem is?

Thanks,

Marco.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* [Bug 206475] amdgpu under load drop signal to monitor until hard reset
  2020-02-09 20:36 [Bug 206475] New: amdgpu under load drop signal to monitor until hard reset bugzilla-daemon
@ 2020-02-10 13:20 ` bugzilla-daemon
  2020-02-10 13:21 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2020-02-10 13:20 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206475

--- Comment #1 from Marco (rodomar705@protonmail.com) ---
Just tested under the stock 5.5.2 kernel (apart from the ZFS module) and the
same problem shows up. Log attached.


* [Bug 206475] amdgpu under load drop signal to monitor until hard reset
  2020-02-09 20:36 [Bug 206475] New: amdgpu under load drop signal to monitor until hard reset bugzilla-daemon
  2020-02-10 13:20 ` [Bug 206475] " bugzilla-daemon
@ 2020-02-10 13:21 ` bugzilla-daemon
  2020-02-10 16:39 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2020-02-10 13:21 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206475

--- Comment #2 from Marco (rodomar705@protonmail.com) ---
Created attachment 287275
  --> https://bugzilla.kernel.org/attachment.cgi?id=287275&action=edit
amdgpu crash with stock kernel


* [Bug 206475] amdgpu under load drop signal to monitor until hard reset
  2020-02-09 20:36 [Bug 206475] New: amdgpu under load drop signal to monitor until hard reset bugzilla-daemon
  2020-02-10 13:20 ` [Bug 206475] " bugzilla-daemon
  2020-02-10 13:21 ` bugzilla-daemon
@ 2020-02-10 16:39 ` bugzilla-daemon
  2020-02-10 16:40 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2020-02-10 16:39 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206475

--- Comment #3 from Marco (rodomar705@protonmail.com) ---
Same thing with linux-amd-drm-next, dmesg attached. Any pointers to the cause?


* [Bug 206475] amdgpu under load drop signal to monitor until hard reset
  2020-02-09 20:36 [Bug 206475] New: amdgpu under load drop signal to monitor until hard reset bugzilla-daemon
                   ` (2 preceding siblings ...)
  2020-02-10 16:39 ` bugzilla-daemon
@ 2020-02-10 16:40 ` bugzilla-daemon
  2020-02-10 19:33 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2020-02-10 16:40 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206475

--- Comment #4 from Marco (rodomar705@protonmail.com) ---
Created attachment 287277
  --> https://bugzilla.kernel.org/attachment.cgi?id=287277&action=edit
dmesg for amd-drm-next


* [Bug 206475] amdgpu under load drop signal to monitor until hard reset
  2020-02-09 20:36 [Bug 206475] New: amdgpu under load drop signal to monitor until hard reset bugzilla-daemon
                   ` (3 preceding siblings ...)
  2020-02-10 16:40 ` bugzilla-daemon
@ 2020-02-10 19:33 ` bugzilla-daemon
  2020-02-17 13:23 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2020-02-10 19:33 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206475

Marco (rodomar705@protonmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |OBSOLETE

--- Comment #5 from Marco (rodomar705@protonmail.com) ---
It seems that the problem was insufficient cooling, since the same happened on
a Windows VM.


* [Bug 206475] amdgpu under load drop signal to monitor until hard reset
  2020-02-09 20:36 [Bug 206475] New: amdgpu under load drop signal to monitor until hard reset bugzilla-daemon
                   ` (4 preceding siblings ...)
  2020-02-10 19:33 ` bugzilla-daemon
@ 2020-02-17 13:23 ` bugzilla-daemon
  2020-02-21 21:13 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2020-02-17 13:23 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206475

Marco (rodomar705@protonmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|OBSOLETE                    |---

--- Comment #6 from Marco (rodomar705@protonmail.com) ---
(In reply to Marco from comment #5)
> It seems that the problem was insufficient cooling, since the same happened
> on a Windows VM.

I was wrong: I tested FurMark with two different driver sets on bare-metal
Windows 10, with no crashes for an hour (FurMark in the VM lasted 30 seconds).

This is a firmware/software problem. Please fix it.


* [Bug 206475] amdgpu under load drop signal to monitor until hard reset
  2020-02-09 20:36 [Bug 206475] New: amdgpu under load drop signal to monitor until hard reset bugzilla-daemon
                   ` (5 preceding siblings ...)
  2020-02-17 13:23 ` bugzilla-daemon
@ 2020-02-21 21:13 ` bugzilla-daemon
  2020-02-24 13:50 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2020-02-21 21:13 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206475

Marco (rodomar705@protonmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |OBSOLETE

--- Comment #7 from Marco (rodomar705@protonmail.com) ---
Found the root of the issue: somehow ZFS was able to trigger a hard lock in
amdgpu, always in the same way. After removing it and switching to XFS, the
problem is gone.


* [Bug 206475] amdgpu under load drop signal to monitor until hard reset
  2020-02-09 20:36 [Bug 206475] New: amdgpu under load drop signal to monitor until hard reset bugzilla-daemon
                   ` (6 preceding siblings ...)
  2020-02-21 21:13 ` bugzilla-daemon
@ 2020-02-24 13:50 ` bugzilla-daemon
  2020-02-24 13:52 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2020-02-24 13:50 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206475

Marco (rodomar705@protonmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|OBSOLETE                    |---

--- Comment #8 from Marco (rodomar705@protonmail.com) ---
And it's back. It happens much less often, but it is still there. However, this
time I got a kernel warning with a backtrace:

feb 24 14:31:13 *** kernel: ------------[ cut here ]------------
feb 24 14:31:13 *** kernel: WARNING: CPU: 3 PID: 24149 at
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_link_encoder.c:1099
dce110_link_encoder_disable_output+0x12a/0x140 [amdgpu]
feb 24 14:31:13 *** kernel: Modules linked in: rfcomm fuse xt_CHECKSUM
xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle
ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables
iptable_filter tun bridge stp llc cmac algif_hash algif_skcipher af_alg sr_mod
cdrom bnep hwmon_vid xfs nls_iso8859_1 nls_cp437 vfat fat btrfs edac_mce_amd
kvm_amd kvm blake2b_generic xor btusb btrtl btbcm btintel bluetooth
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel igb joydev ecdh_generic
aesni_intel eeepc_wmi asus_wmi crypto_simd battery cryptd sparse_keymap
mousedev input_leds ecc glue_helper raid6_pq ccp rfkill wmi_bmof pcspkr k10temp
dca libcrc32c i2c_piix4 rng_core evdev pinctrl_amd mac_hid gpio_amdpt
acpi_cpufreq vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) virtio_mmio virtio_input
virtio_pci virtio_balloon usbip_host snd_hda_codec_realtek usbip_core
snd_hda_codec_generic uinput i2c_dev ledtrig_audio sg
feb 24 14:31:13 *** kernel:  snd_hda_codec_hdmi vhba(OE) crypto_user
snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm
snd_timer snd soundcore ip_tables x_tables ext4 crc32c_generic crc16 mbcache
jbd2 sd_mod hid_generic usbhid hid ahci libahci libata crc32c_intel xhci_pci
xhci_hcd scsi_mod nouveau mxm_wmi wmi amdgpu gpu_sched i2c_algo_bit ttm
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgart
vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
feb 24 14:31:13 *** kernel: CPU: 3 PID: 24149 Comm: kworker/3:2 Tainted: G     
     OE     5.5.5-arch1-1 #1
feb 24 14:31:13 *** kernel: Hardware name: System manufacturer System Product
Name/ROG STRIX B450-F GAMING, BIOS 3003 12/09/2019
feb 24 14:31:13 *** kernel: Workqueue: events drm_sched_job_timedout
[gpu_sched]
feb 24 14:31:13 *** kernel: RIP:
0010:dce110_link_encoder_disable_output+0x12a/0x140 [amdgpu]
feb 24 14:31:13 *** kernel: Code: 44 24 38 65 48 33 04 25 28 00 00 00 75 20 48
83 c4 40 5b 5d 41 5c c3 48 c7 c6 40 05 76 c0 48 c7 c7 f0 b1 7d c0 e8 76 c3 d1
ff <0f> 0b eb d0 e8 7d 12 e7 df 66 66 2e 0f 1f 84 00 00 00 00 00 66 90
feb 24 14:31:13 *** kernel: RSP: 0018:ffffb06641417630 EFLAGS: 00010246
feb 24 14:31:13 *** kernel: RAX: 0000000000000000 RBX: ffff9790645be420 RCX:
0000000000000000
feb 24 14:31:13 *** kernel: RDX: 0000000000000000 RSI: 0000000000000082 RDI:
00000000ffffffff
feb 24 14:31:13 *** kernel: RBP: 0000000000000002 R08: 00000000000005ba R09:
0000000000000093
feb 24 14:31:13 *** kernel: R10: ffffb06641417480 R11: ffffb06641417485 R12:
ffffb06641417634
feb 24 14:31:13 *** kernel: R13: ffff979064fe6800 R14: ffff978f4f9201b8 R15:
ffff97906ba1ee00
feb 24 14:31:13 *** kernel: FS:  0000000000000000(0000)
GS:ffff97906e8c0000(0000) knlGS:0000000000000000
feb 24 14:31:13 *** kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
feb 24 14:31:13 *** kernel: CR2: 00007effa1509000 CR3: 0000000350d7a000 CR4:
00000000003406e0
feb 24 14:31:13 *** kernel: Call Trace:
feb 24 14:31:13 *** kernel:  core_link_disable_stream+0x10e/0x3d0 [amdgpu]
feb 24 14:31:13 *** kernel:  ? smu7_send_msg_to_smc.cold+0x20/0x25 [amdgpu]
feb 24 14:31:13 *** kernel:  dce110_reset_hw_ctx_wrap+0xc3/0x260 [amdgpu]
feb 24 14:31:13 *** kernel:  dce110_apply_ctx_to_hw+0x51/0x5d0 [amdgpu]
feb 24 14:31:13 *** kernel:  ? pp_dpm_dispatch_tasks+0x45/0x60 [amdgpu]
feb 24 14:31:13 *** kernel:  ? amdgpu_pm_compute_clocks+0xcd/0x600 [amdgpu]
feb 24 14:31:13 *** kernel:  ? dm_pp_apply_display_requirements+0x1a8/0x1c0
[amdgpu]
feb 24 14:31:13 *** kernel:  dc_commit_state+0x2b9/0x5e0 [amdgpu]
feb 24 14:31:13 *** kernel:  amdgpu_dm_atomic_commit_tail+0x398/0x20f0 [amdgpu]
feb 24 14:31:13 *** kernel:  ? number+0x337/0x380
feb 24 14:31:13 *** kernel:  ? vsnprintf+0x3aa/0x4f0
feb 24 14:31:13 *** kernel:  ? sprintf+0x5e/0x80
feb 24 14:31:13 *** kernel:  ? irq_work_queue+0x35/0x50
feb 24 14:31:13 *** kernel:  ? wake_up_klogd+0x4f/0x70
feb 24 14:31:13 *** kernel:  commit_tail+0x94/0x130 [drm_kms_helper]
feb 24 14:31:13 *** kernel:  drm_atomic_helper_commit+0x113/0x140
[drm_kms_helper]
feb 24 14:31:13 *** kernel:  drm_atomic_helper_disable_all+0x175/0x190
[drm_kms_helper]
feb 24 14:31:13 *** kernel:  drm_atomic_helper_suspend+0x73/0x120
[drm_kms_helper]
feb 24 14:31:13 *** kernel:  dm_suspend+0x1c/0x60 [amdgpu]
feb 24 14:31:13 *** kernel:  amdgpu_device_ip_suspend_phase1+0x81/0xe0 [amdgpu]
feb 24 14:31:13 *** kernel:  amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
feb 24 14:31:13 *** kernel:  amdgpu_device_pre_asic_reset+0x191/0x1a4 [amdgpu]
feb 24 14:31:13 *** kernel:  amdgpu_device_gpu_recover+0x2ee/0xa13 [amdgpu]
feb 24 14:31:13 *** kernel:  amdgpu_job_timedout+0x103/0x130 [amdgpu]
feb 24 14:31:13 *** kernel:  drm_sched_job_timedout+0x3e/0x90 [gpu_sched]
feb 24 14:31:13 *** kernel:  process_one_work+0x1e1/0x3d0
feb 24 14:31:13 *** kernel:  worker_thread+0x4a/0x3d0
feb 24 14:31:13 *** kernel:  kthread+0xfb/0x130
feb 24 14:31:13 *** kernel:  ? process_one_work+0x3d0/0x3d0
feb 24 14:31:13 *** kernel:  ? kthread_park+0x90/0x90
feb 24 14:31:13 *** kernel:  ret_from_fork+0x22/0x40
feb 24 14:31:13 *** kernel: ---[ end trace 3e7589981fe74b17 ]---

Complete log attached below.
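
To make a backtrace like the one above easier to read, the call chain can be reduced to the frames the kernel did not mark with "?" (frames the unwinder considers unreliable). The Python sketch below does this for journal-style lines as quoted here; the function name `reliable_frames` and the regular expression are assumptions for illustration and only cover the line format shown in this message.

```python
import re

# Matches call-trace entries such as
#   "kernel:  ? smu7_send_msg_to_smc.cold+0x20/0x25 [amdgpu]"
# The module name may sit on the next line because of mail wrapping,
# which \s+ tolerates.
FRAME = re.compile(
    r"kernel:\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(trace_text):
    """Return (function, module) pairs for frames not marked with '?'."""
    frames = []
    for match in FRAME.finditer(trace_text):
        if match.group("unreliable"):
            continue  # skip frames flagged as unreliable by the unwinder
        frames.append((match.group("func"), match.group("module")))
    return frames
```

Read from the bottom up, the reliable frames in this trace run from drm_sched_job_timedout through amdgpu_job_timedout and amdgpu_device_gpu_recover into the display suspend path, which suggests the warning was raised during recovery from a job timeout and is not necessarily the initial fault.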


* [Bug 206475] amdgpu under load drop signal to monitor until hard reset
  2020-02-09 20:36 [Bug 206475] New: amdgpu under load drop signal to monitor until hard reset bugzilla-daemon
                   ` (7 preceding siblings ...)
  2020-02-24 13:50 ` bugzilla-daemon
@ 2020-02-24 13:52 ` bugzilla-daemon
  2020-05-22 12:55 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2020-02-24 13:52 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206475

--- Comment #9 from Marco (rodomar705@protonmail.com) ---
Created attachment 287575
  --> https://bugzilla.kernel.org/attachment.cgi?id=287575&action=edit
Latest log with a warning.


* [Bug 206475] amdgpu under load drop signal to monitor until hard reset
  2020-02-09 20:36 [Bug 206475] New: amdgpu under load drop signal to monitor until hard reset bugzilla-daemon
                   ` (8 preceding siblings ...)
  2020-02-24 13:52 ` bugzilla-daemon
@ 2020-05-22 12:55 ` bugzilla-daemon
  2020-05-23 14:40 ` bugzilla-daemon
  2020-05-23 16:44 ` bugzilla-daemon
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2020-05-22 12:55 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206475

andrewammerlaan@riseup.net (andrewammerlaan@riseup.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrewammerlaan@riseup.net

--- Comment #10 from andrewammerlaan@riseup.net (andrewammerlaan@riseup.net) ---
Created attachment 289235
  --> https://bugzilla.kernel.org/attachment.cgi?id=289235&action=edit
syslog

I think I ran into this issue as well. It has happened twice. Both times it
happened 10 to 20 minutes *after* playing Minecraft. Both times I was in a
full-screen video meeting. Everything kept working, except that the screen went
black; I could finish the meeting, but without seeing anything.

Only the monitors connected to my RX 590 go black. The one connected to the
iGPU just freezes, and after a while the cursor becomes usable again on that
monitor, though all applications remain frozen and switching to a tty does not
work. REISUB'ing the machine makes it boot on the iGPU. It needs to be
completely powered off and on again to boot from the amdgpu.

It looks like it does a graphics reset (why though?):
[15554.332021] amdgpu 0000:01:00.0: GPU reset begin!

And from that point onwards everything goes wrong:
[15554.332296] amdgpu: [powerplay] 
[15554.332296]  last message was failed ret is 65535
[15554.332297] amdgpu: [powerplay] 
[15554.332297]  failed to send message 261 ret is 65535 
[15554.332297] amdgpu: [powerplay] 
[15554.332297]  last message was failed ret is 65535
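
In the excerpt above, each powerplay message is split across two entries that share a timestamp, which makes the log awkward to search. The Python sketch below re-joins such entries. It assumes that a continuation repeats the previous timestamp and that its text starts with extra whitespace, as seen here; the helper name `join_split_entries` is hypothetical.

```python
import re

# dmesg-style line: "[seconds.micros] text"
LINE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\] (?P<body>.*)$")


def join_split_entries(lines):
    """Merge entries that were split across two lines with one timestamp.

    Assumption: the second half repeats the timestamp and its body
    starts with whitespace, as in the excerpt quoted in this message.
    """
    records = []  # each item is [timestamp, text]
    for raw in lines:
        match = LINE.match(raw.rstrip("\n"))
        if not match:
            continue  # ignore lines without a dmesg-style timestamp
        ts, body = match.group("ts"), match.group("body")
        is_continuation = (
            bool(records) and records[-1][0] == ts and body[:1].isspace()
        )
        if is_continuation:
            records[-1][1] = records[-1][1].rstrip() + " " + body.strip()
        else:
            records.append([ts, body.rstrip()])
    return [(ts, text) for ts, text in records]
```

After joining, each record reads as a single line such as "amdgpu: [powerplay] failed to send message 261 ret is 65535", which single-line patterns can then match.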

This is kernel 5.6.14
xorg-1.20.8
mesa-20.1.0_rc3
xf86-video-amdgpu-19.1.0

Full log is attached.


* [Bug 206475] amdgpu under load drop signal to monitor until hard reset
  2020-02-09 20:36 [Bug 206475] New: amdgpu under load drop signal to monitor until hard reset bugzilla-daemon
                   ` (9 preceding siblings ...)
  2020-05-22 12:55 ` bugzilla-daemon
@ 2020-05-23 14:40 ` bugzilla-daemon
  2020-05-23 16:44 ` bugzilla-daemon
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2020-05-23 14:40 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206475

--- Comment #11 from Andrew Ammerlaan (andrewammerlaan@riseup.net) ---
Created attachment 289245
  --> https://bugzilla.kernel.org/attachment.cgi?id=289245&action=edit
messages

Happened again today, while playing GTA V. Same problems appear in the log
(attached). 

I think the title of this bug should be changed: there is more going on here
than just dropping the signal to the monitor, because the monitors connected to
the iGPU freeze as well (no signal drop, just a freeze).

It would be great if someone could give me some pointers as to where I could
find more useful logs. /var/log/messages doesn't seem to be very informative.
It just says a GPU reset began and that it failed to send some messages
afterwards. Or do I maybe need to set some boot parameters or kernel configs to
get more verbose logs?
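
On the question of more verbose logs: one commonly used knob is the `drm.debug` module parameter, a bitmask that can be given on the kernel command line or written at runtime to /sys/module/drm/parameters/debug. The Python sketch below only computes the mask. The category names and bit values are assumed from the DRM_UT_* debug categories in kernels of roughly this era and should be checked against the documentation of the running kernel; the helper name `drm_debug_param` is hypothetical.

```python
# Bit values assumed from the DRM_UT_* debug categories (drm_print.h).
DRM_DEBUG_BITS = {
    "core": 0x01,
    "driver": 0x02,
    "kms": 0x04,
    "prime": 0x08,
    "atomic": 0x10,
    "vbl": 0x20,
    "state": 0x40,
    "lease": 0x80,
    "dp": 0x100,
}


def drm_debug_param(*categories):
    """Build a 'drm.debug=0x..' kernel command-line fragment."""
    mask = 0
    for name in categories:
        mask |= DRM_DEBUG_BITS[name]  # raises KeyError for unknown names
    return f"drm.debug={mask:#x}"
```

For example, `drm_debug_param("driver", "kms", "prime", "atomic")` yields `drm.debug=0x1e`. Such settings can produce a very large volume of log output, so it may be worth enabling them only while reproducing the hang.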


* [Bug 206475] amdgpu under load drop signal to monitor until hard reset
  2020-02-09 20:36 [Bug 206475] New: amdgpu under load drop signal to monitor until hard reset bugzilla-daemon
                   ` (10 preceding siblings ...)
  2020-05-23 14:40 ` bugzilla-daemon
@ 2020-05-23 16:44 ` bugzilla-daemon
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2020-05-23 16:44 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206475

--- Comment #12 from Andrew Ammerlaan (andrewammerlaan@riseup.net) ---
Created attachment 289247
  --> https://bugzilla.kernel.org/attachment.cgi?id=289247&action=edit
messages (reset succesful this time)

And again, twice on the same day :(

But this time:
amdgpu 0000:01:00.0: GPU reset begin!
amdgpu 0000:01:00.0: GPU BACO reset
amdgpu 0000:01:00.0: GPU reset succeeded, trying to resume

This time the reset succeeded, however after restarting X, I got stuck on the
KDE login splash screen. The log (attached) shows some segfaults.

It seems to me that there are two issues here.

1) The GPU is (often) not successfully recovered after a reset, and if it is
recovered successfully segfaults follow in radeonsi_dri.so

2) It goes into a reset in the first place, for no apparent reason

I guess this bug report is mostly about the second issue: why does it go into a
reset? How do I debug this?
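
To keep the two issues apart when going through several logs, it may help to record for each reset attempt whether a success message followed. The Python sketch below pairs each "GPU reset begin!" line with a later "GPU reset succeeded" line. The marker strings are taken from the excerpts in this thread, and the function name `reset_outcomes` is an assumption for illustration.

```python
def reset_outcomes(lines):
    """Pair each 'GPU reset begin!' with whether a success line followed.

    Returns a list of booleans, one per reset attempt, in log order.
    """
    outcomes = []
    for line in lines:
        if "GPU reset begin!" in line:
            outcomes.append(False)  # assume failure until success is seen
        elif "GPU reset succeeded" in line and outcomes:
            outcomes[-1] = True  # the most recent attempt succeeded
    return outcomes
```

A result such as `[False, True]` would mean the first reset never reported success (issue 1) while the second did. The sketch says nothing about why the reset started (issue 2).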

It would be great if we could get this fixed, as it is getting kinda annoying.
(This is a brand new GPU, it is not overheating, what is wrong? )

