* [Bug 111633] amdgpu driver crash with kernel NULL pointer dereference
@ 2019-09-10 19:03 bugzilla-daemon
  2019-09-19 20:31 ` bugzilla-daemon
  2019-11-19  9:51 ` bugzilla-daemon
  0 siblings, 2 replies; 3+ messages in thread
From: bugzilla-daemon @ 2019-09-10 19:03 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=111633

            Bug ID: 111633
           Summary: amdgpu driver crash with kernel NULL pointer
                    dereference
           Product: DRI
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: not set
          Priority: not set
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: vakevk+freedesktopbugzilla@gmail.com

I am running Arch Linux: Linux arch 5.2.13-arch1-1-ARCH #1 SMP PREEMPT Fri
Sep 6 17:52:33 UTC 2019 x86_64 GNU/Linux

I am running Wayland via sway.

My GPU is a Radeon RX Vega 64.

While in my sway session, the image on my screen froze, but audio from a video
continued to play. I was able to ssh in from a different machine and found this
message with dmesg:

BUG: kernel NULL pointer dereference, address: 0000000000000360
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP PTI
CPU: 1 PID: 12766 Comm: kworker/u16:0 Not tainted 5.2.11-arch1-1-ARCH #1
Hardware name: ASUS All Series/Z87-PLUS, BIOS 2103 08/15/2014
Workqueue: events_unbound commit_work [drm_kms_helper]
RIP: 0010:dc_stream_retain+0x5/0x20 [amdgpu]
<Code and registers omitted. I can post them if they are important and someone
reassures me that they don't contain sensitive information, since they look like
a memory dump.>
Call Trace:
 dc_resource_state_copy_construct+0xa0/0xf0 [amdgpu]
 dc_commit_updates_for_stream+0xa63/0xc20 [amdgpu]
 amdgpu_dm_atomic_commit_tail+0xabe/0x19a0 [amdgpu]
 ? commit_tail+0x3c/0x70 [drm_kms_helper]
 commit_tail+0x3c/0x70 [drm_kms_helper]
 process_one_work+0x1d1/0x3e0
 worker_thread+0x4a/0x3d0
 kthread+0xfb/0x130
 ? process_one_work+0x3e0/0x3e0
 ? kthread_park+0x80/0x80
 ret_from_fork+0x35/0x40
Modules linked in: snd_seq_dummy snd_seq tun nft_ct nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 libcrc32c nf_tables_set cfg80211 nf_tables nfnetlink 8021q garp
mrp stp llc intel_rapl nls_iso8859_1 nls_cp437 vfat fat snd_hda_codec_realtek
snd_hda_codec_generic fuse ledtrig_audio ofpart snd_hda_codec_hdmi cmdlinepart
btusb intel_spi_platform snd_hda_intel btrtl x86_pkg_temp_thermal intel_spi
btbcm intel_powerclamp spi_nor btintel eeepc_wmi snd_usb_audio coretemp
snd_hda_codec uvcvideo asus_wmi bluetooth snd_usbmidi_lib iTCO_wdt kvm_intel
snd_hda_core videobuf2_vmalloc mei_hdcp mtd iTCO_vendor_support mxm_wmi
wmi_bmof sparse_keymap snd_hwdep snd_rawmidi snd_seq_device videobuf2_memops
snd_pcm videobuf2_v4l2 snd_timer videobuf2_common snd videodev kvm irqbypass
input_leds ecdh_generic intel_cstate mousedev rfkill intel_uncore mei_me joydev
cdc_acm media ecc e1000e intel_rapl_perf mei soundcore pcc_cpufreq i2c_i801
lpc_ich pcspkr wmi evdev mac_hid ip_tables x_tables ext4
 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid dm_crypt dm_mod
sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ahci
libahci aesni_intel libata aes_x86_64 xhci_pci crypto_simd cryptd glue_helper
xhci_hcd scsi_mod ehci_pci ehci_hcd amdgpu gpu_sched i2c_algo_bit ttm
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgart
CR2: 0000000000000360
---[ end trace 08eaa2e1d713ba4d ]---
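
For readers decoding the `#PF` lines in the oops above, here is a minimal sketch of how the error code 0x0002 maps to "supervisor write access" on a "not-present page". The bit meanings follow the x86 page-fault error code layout; the helper name `decode_pf_error_code` is hypothetical and not part of any kernel tooling. The faulting address (CR2) of 0x360 is a small offset rather than 0, which is commonly what a field access through a NULL structure pointer looks like, though the trace alone does not confirm which pointer was NULL.

```python
# Sketch: decode an x86 page-fault error code such as the 0x0002 in the oops.
# Each entry is (bit mask, meaning when set, meaning when clear).
# The function name is an illustrative assumption, not an existing tool.
PF_BITS = [
    (0x01, "protection violation", "not-present page"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code):
    """Return a list of human-readable properties for a page-fault error code."""
    parts = []
    for mask, if_set, if_clear in PF_BITS:
        text = if_set if code & mask else if_clear
        if text:
            parts.append(text)
    return parts


print(decode_pf_error_code(0x0002))
# prints ['not-present page', 'write access', 'supervisor mode']
```

Only bit 1 is set in 0x0002, so the fault was a write, from kernel (supervisor) mode, to a page that is not mapped, which matches the two `#PF` lines.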

At this point I tried killing the sway process but did not succeed, even with
`kill -9`. Even `sudo reboot` did not complete, although it terminated the ssh
session. I had to hard reset the machine.

Potentially related: for roughly a week I have been experiencing intermittent
screen freezes that resolve themselves after about 10 seconds and leave a
message like this in dmesg:

[drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR*
[CRTC:47:crtc-0] flip_done timed out
[drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:47:crtc-0] hw_done or
flip_done timed out

-- 
You are receiving this mail because:
You are the assignee for the bug.

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

* [Bug 111633] amdgpu driver crash with kernel NULL pointer dereference
From: bugzilla-daemon @ 2019-09-19 20:31 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=111633

--- Comment #1 from vakevk+freedesktopbugzilla@gmail.com ---
Another one, different stacktrace.

BUG: kernel NULL pointer dereference, address: 0000000000000360
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP PTI
CPU: 4 PID: 28005 Comm: kworker/u16:1 Not tainted 5.2.14-arch1-1-ARCH #1
Hardware name: ASUS All Series/Z87-PLUS, BIOS 2103 08/15/2014
Workqueue: events_unbound commit_work [drm_kms_helper]
RIP: 0010:dc_stream_retain+0x5/0x20 [amdgpu]
Call Trace:
dc_resource_state_copy_construct+0xa0/0xf0 [amdgpu]
dc_commit_updates_for_stream+0xa63/0xc20 [amdgpu]
amdgpu_dm_atomic_commit_tail+0xabe/0x19a0 [amdgpu]
? commit_tail+0x3c/0x70 [drm_kms_helper]
commit_tail+0x3c/0x70 [drm_kms_helper]
process_one_work+0x1d1/0x3e0
worker_thread+0x4a/0x3d0
kthread+0xfb/0x130
? process_one_work+0x3e0/0x3e0
? kthread_park+0x80/0x80
ret_from_fork+0x35/0x40
Modules linked in: tun nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
libcrc32c nf_tables_set cfg80211 8021q garp mrp nf_tables stp llc nfnetlink
intel_rapl ofpart nls_iso8859_1 nls_cp437 cmdlinepart vfat intel_spi_platform
fat fuse intel_spi mei_hdcp spi_nor iTCO_wdt x86_pkg_temp_thermal mtd
iTCO_vendor_support intel_powerclamp uvcvideo coretemp videobuf2_vmalloc
kvm_intel btusb snd_hda_codec_realtek videobuf2_memops btrtl btbcm
snd_hda_codec_generic videobuf2_v4l2 btintel ledtrig_audio snd_hda_codec_hdmi
videobuf2_common bluetooth eeepc_wmi kvm snd_usb_audio snd_hda_intel videodev
asus_wmi snd_hda_codec sparse_keymap wmi_bmof snd_usbmidi_lib mxm_wmi irqbypass
snd_hda_core snd_rawmidi snd_hwdep snd_seq_device intel_cstate ecdh_generic
snd_pcm intel_uncore mei_me i2c_i801 snd_timer intel_rapl_perf rfkill snd
pcspkr media cdc_acm pcc_cpufreq mousedev ecc joydev e1000e input_leds
soundcore mei lpc_ich evdev mac_hid wmi ip_tables x_tables ext4 crc32c_generic
crc16 mbcache jbd2
hid_generic usbhid hid dm_crypt dm_mod sd_mod crct10dif_pclmul crc32_pclmul
crc32c_intel ghash_clmulni_intel ahci libahci aesni_intel libata aes_x86_64
crypto_simd xhci_pci cryptd scsi_mod glue_helper xhci_hcd ehci_pci ehci_hcd
amdgpu gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect
sysimgblt fb_sys_fops drm agpgart
CR2: 0000000000000360
---[ end trace 3b3265e8a1ad7f82 ]---

* [Bug 111633] amdgpu driver crash with kernel NULL pointer dereference
From: bugzilla-daemon @ 2019-11-19  9:51 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=111633

Martin Peres <martin.peres@free.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|NEW                         |RESOLVED

--- Comment #2 from Martin Peres <martin.peres@free.fr> ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been
closed to further activity.

You can subscribe to and participate in the new bug through this link to our
GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/904.
