amd-gfx.lists.freedesktop.org archive mirror
* [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
@ 2021-04-12 22:05 Mikhail Gavrilov
  2021-04-12 23:55 ` Leo Liu
  2021-04-13  7:29 ` Christian König
  0 siblings, 2 replies; 15+ messages in thread
From: Mikhail Gavrilov @ 2021-04-12 22:05 UTC (permalink / raw)
  To: amd-gfx list, Christian König, Linux List Kernel Mailing

Video demonstration: https://youtu.be/3nkvUeB0GSw

Here is what the kernel traces look like.

1.
[ 7315.156460] amdgpu 0000:0b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:0 vmid:6 pasid:32779, for process obs pid 23963 thread obs:cs0 pid 23977)
[ 7315.156490] amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x80011fdf5000 from client 18
[ 7315.156495] amdgpu 0000:0b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00641A51
[ 7315.156500] amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: VCN1 (0xd)
[ 7315.156503] amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x1
[ 7315.156505] amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
[ 7315.156509] amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x5
[ 7315.156510] amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 7315.156513] amdgpu 0000:0b:00.0: amdgpu: RW: 0x1
[ 7315.156545] amdgpu 0000:0b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:0 vmid:6 pasid:32779, for process obs pid 23963 thread obs:cs0 pid 23977)
[ 7315.156549] amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x80011fdf6000 from client 18
[ 7315.156551] amdgpu 0000:0b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00641A51
[ 7315.156554] amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: VCN1 (0xd)
[ 7315.156556] amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x1
[ 7315.156559] amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
[ 7315.156561] amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x5
[ 7315.156564] amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 7315.156566] amdgpu 0000:0b:00.0: amdgpu: RW: 0x1
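
The per-field lines in the trace above are the driver's own breakdown of
the raw MMVM_L2_PROTECTION_FAULT_STATUS value. As a minimal sketch, the
same breakdown can be reproduced by hand. The bit layout below is an
assumption: it was chosen because it reproduces every value printed in
the log (including vmid:6), and the amdgpu register headers remain the
authoritative source. The names FIELDS and decode_fault_status are
hypothetical helpers, not driver symbols.

```python
# Assumed bit layout: (shift, width in bits) per field. It is inferred from
# the values the driver printed in the trace, not copied from the headers.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),    # UTCL2 client ID, reported as VCN1 (0xd) in the log
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status: int) -> dict[str, int]:
    """Split a raw fault status word into its named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


# Decode the value from the trace and print each field in hex.
for name, value in decode_fault_status(0x00641A51).items():
    print(f"{name}: {value:#x}")
```

With this layout, the decoded CID, RW and VMID fields can be compared
directly against the "Faulty UTCL2 client ID", "RW" and "vmid" values in
the log lines.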

This page fault is harmless to the system as a whole (it is not an
actual panic), but VAAPI stops working and the application that tried
to use the encoder crashes.

2.
If we try again and again to encode a 4K stream through VAAPI, we can
encounter the following trace:
[12341.860944] ------------[ cut here ]------------
[12341.860961] kernel BUG at drivers/dma-buf/dma-resv.c:287!
[12341.860968] invalid opcode: 0000 [#1] SMP NOPTI
[12341.860972] CPU: 28 PID: 18261 Comm: kworker/28:0 Tainted: G W        --------- ---  5.12.0-0.rc5.180.fc35.x86_64+debug #1
[12341.860977] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 3402 01/13/2021
[12341.860981] Workqueue: events amdgpu_irq_handle_ih_soft [amdgpu]
[12341.861102] RIP: 0010:dma_resv_add_shared_fence+0x2ab/0x2c0
[12341.861108] Code: fd ff ff be 01 00 00 00 e8 e2 74 dc ff e9 ac fd ff ff 48 83 c4 18 be 03 00 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f e9 c5 74 dc ff <0f> 0b 31 ed e9 73 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f
[12341.861112] RSP: 0018:ffffb2f084c87bb0 EFLAGS: 00010246
[12341.861115] RAX: 0000000000000002 RBX: ffff9f9551184998 RCX: 0000000000000000
[12341.861119] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff9f9551184a50
[12341.861122] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
[12341.861124] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9f91b9a18140
[12341.861127] R13: ffff9f91c9020740 R14: ffff9f91c9020768 R15: 0000000000000000
[12341.861130] FS:  0000000000000000(0000) GS:ffff9f984a200000(0000) knlGS:0000000000000000
[12341.861133] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12341.861136] CR2: 0000144e080d8000 CR3: 000000010e98c000 CR4: 0000000000350ee0
[12341.861139] Call Trace:
[12341.861143]  amdgpu_vm_sdma_commit+0x182/0x220 [amdgpu]
[12341.861251]  amdgpu_vm_bo_update_mapping.constprop.0+0x278/0x3c0 [amdgpu]
[12341.861356]  amdgpu_vm_handle_fault+0x145/0x290 [amdgpu]
[12341.861461]  gmc_v10_0_process_interrupt+0xb3/0x250 [amdgpu]
[12341.861571]  ? _raw_spin_unlock_irqrestore+0x37/0x40
[12341.861577]  ? lock_acquire+0x179/0x3a0
[12341.861583]  ? lock_acquire+0x179/0x3a0
[12341.861587]  ? amdgpu_irq_dispatch+0xc6/0x240 [amdgpu]
[12341.861692]  amdgpu_irq_dispatch+0xc6/0x240 [amdgpu]
[12341.861796]  amdgpu_ih_process+0x90/0x110 [amdgpu]
[12341.861900]  process_one_work+0x2b0/0x5e0
[12341.861906]  worker_thread+0x55/0x3c0
[12341.861910]  ? process_one_work+0x5e0/0x5e0
[12341.861915]  kthread+0x13a/0x150
[12341.861918]  ? __kthread_bind_mask+0x60/0x60
[12341.861922]  ret_from_fork+0x22/0x30
[12341.861928] Modules linked in: uinput snd_seq_dummy rfcomm snd_hrtimer netconsole nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink cmac bnep sunrpc vfat fat hid_logitech_hidpp joydev hid_logitech_dj mt76x2u mt76x2_common mt76x02_usb mt76_usb mt76x02_lib intel_rapl_msr intel_rapl_common mt76 iwlmvm mac80211 snd_hda_codec_realtek edac_mce_amd snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi btusb kvm_amd snd_hda_intel btrtl snd_intel_dspcfg btbcm snd_intel_sdw_acpi snd_usb_audio uvcvideo btintel snd_hda_codec videobuf2_vmalloc snd_usbmidi_lib videobuf2_memops iwlwifi kvm bluetooth snd_rawmidi snd_hda_core snd_seq videobuf2_v4l2 snd_hwdep videobuf2_common snd_seq_device eeepc_wmi snd_pcm videodev asus_wmi sparse_keymap libarc4 mc irqbypass snd_timer ecdh_generic cfg80211 video
[12341.861969]  wmi_bmof rapl ecc snd sp5100_tco k10temp i2c_piix4 soundcore rfkill acpi_cpufreq ip_tables amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched drm_kms_helper crct10dif_pclmul cec crc32_pclmul crc32c_intel drm ghash_clmulni_intel igb nvme ccp dca nvme_core i2c_algo_bit wmi pinctrl_amd fuse
[12341.862012] ---[ end trace 3503913ed2bda6ad ]---
[12341.862018] RIP: 0010:dma_resv_add_shared_fence+0x2ab/0x2c0
[12341.862024] Code: fd ff ff be 01 00 00 00 e8 e2 74 dc ff e9 ac fd ff ff 48 83 c4 18 be 03 00 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f e9 c5 74 dc ff <0f> 0b 31 ed e9 73 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f
[12341.862028] RSP: 0018:ffffb2f084c87bb0 EFLAGS: 00010246
[12341.862035] RAX: 0000000000000002 RBX: ffff9f9551184998 RCX: 0000000000000000
[12341.862039] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff9f9551184a50
[12341.862045] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
[12341.862049] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9f91b9a18140
[12341.862053] R13: ffff9f91c9020740 R14: ffff9f91c9020768 R15: 0000000000000000
[12341.862057] FS:  0000000000000000(0000) GS:ffff9f984a200000(0000) knlGS:0000000000000000
[12341.862062] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12341.862066] CR2: 0000144e080d8000 CR3: 000000010e98c000 CR4: 0000000000350ee0
[12341.862072] note: kworker/28:0[18261] exited with preempt_count 1
[12392.114261] BUG: workqueue lockup - pool cpus=28 node=0 flags=0x0 nice=0 stuck for 50s!
[12392.114276] Showing busy workqueues and worker pools:
[12392.114279] workqueue events: flags=0x0
[12392.114283]   pwq 56: cpus=28 node=0 flags=0x0 nice=0 active=12/256 refcnt=14
[12392.114289]     in-flight: 26978:amdgpu_irq_handle_ih_soft [amdgpu]
[12392.114411]     pending: amdgpu_irq_handle_ih_soft [amdgpu], free_work, kfree_rcu_monitor, free_obj_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, drm_mode_rmfb_work_fn [drm] BAR(2588), event_mall_stutter [amdgpu]
[12392.114679]   pwq 54: cpus=27 node=0 flags=0x0 nice=0 active=12/256 refcnt=13
[12392.114684]     pending: psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work
[12392.114704]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[12392.114708]     pending: kfree_rcu_monitor
[12392.114727] workqueue mm_percpu_wq: flags=0x8
[12392.114730]   pwq 56: cpus=28 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[12392.114734]     pending: vmstat_update
[12392.114739]   pwq 54: cpus=27 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[12392.114742]     pending: vmstat_update
[12392.114826] pool 56: cpus=28 node=0 flags=0x0 nice=0 hung=50s workers=3 idle: 25238 14400
[12394.038243] watchdog: BUG: soft lockup - CPU#27 stuck for 22s! [obs:27017]
[12394.038262] Modules linked in: uinput snd_seq_dummy rfcomm snd_hrtimer netconsole nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink cmac bnep sunrpc vfat fat hid_logitech_hidpp joydev hid_logitech_dj mt76x2u mt76x2_common mt76x02_usb mt76_usb mt76x02_lib intel_rapl_msr intel_rapl_common mt76 iwlmvm mac80211 snd_hda_codec_realtek edac_mce_amd snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi btusb kvm_amd snd_hda_intel btrtl snd_intel_dspcfg btbcm snd_intel_sdw_acpi snd_usb_audio uvcvideo btintel snd_hda_codec videobuf2_vmalloc snd_usbmidi_lib videobuf2_memops iwlwifi kvm bluetooth snd_rawmidi snd_hda_core snd_seq videobuf2_v4l2 snd_hwdep videobuf2_common snd_seq_device eeepc_wmi snd_pcm videodev asus_wmi sparse_keymap libarc4 mc irqbypass snd_timer ecdh_generic cfg80211 video
[12394.038306]  wmi_bmof rapl ecc snd sp5100_tco k10temp i2c_piix4 soundcore rfkill acpi_cpufreq ip_tables amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched drm_kms_helper crct10dif_pclmul cec crc32_pclmul crc32c_intel drm ghash_clmulni_intel igb nvme ccp dca nvme_core i2c_algo_bit wmi pinctrl_amd fuse
[12394.038328] irq event stamp: 0
[12394.038331] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[12394.038337] hardirqs last disabled at (0): [<ffffffffaa0ddafb>] copy_process+0x91b/0x1e10
[12394.038343] softirqs last  enabled at (0): [<ffffffffaa0ddafb>] copy_process+0x91b/0x1e10
[12394.038347] softirqs last disabled at (0): [<0000000000000000>] 0x0

This one is much more dangerous and, as a rule, freezes the system.
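
For triage, the interesting parts of an oops like the one above are the
BUG location, the faulting symbol (RIP) and the call-trace frames that
are not prefixed with "?" (those are unreliable stack leftovers). A
minimal sketch of pulling these out of a pasted log follows; the helper
name summarize_oops and the SAMPLE excerpt are illustrative only, and
the regular expressions assume the x86 oops layout seen in this report.

```python
import re

TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
FRAME = re.compile(r"(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oops(log: str) -> dict:
    """Return the BUG location, RIP symbol and reliable call-trace frames."""
    # Undo mail-client wrapping: a line without a "[timestamp]" prefix
    # is treated as the continuation of the previous log line.
    lines: list[str] = []
    for raw in log.splitlines():
        if TIMESTAMP.match(raw) or not lines:
            lines.append(raw)
        else:
            lines[-1] += " " + raw.strip()

    summary = {"bug_at": None, "rip": None, "frames": []}
    in_trace = False
    for line in lines:
        text = TIMESTAMP.sub("", line)
        if summary["bug_at"] is None:
            m = re.search(r"kernel BUG at (\S+)!", text)
            if m:
                summary["bug_at"] = m.group(1)
        if summary["rip"] is None:
            m = re.match(r"RIP: \S+?:(\S+)", text)
            if m:
                summary["rip"] = m.group(1)
        if text.startswith("Call Trace:"):
            in_trace = True
            continue
        if in_trace:
            m = FRAME.match(text)
            if m is None:
                in_trace = False          # first non-frame line ends the trace
            elif not m.group(1):
                summary["frames"].append(m.group(2))  # skip "?" frames
    return summary


# Short excerpt of the trace from this report, used as sample input.
SAMPLE = """\
[12341.860961] kernel BUG at drivers/dma-buf/dma-resv.c:287!
[12341.861102] RIP: 0010:dma_resv_add_shared_fence+0x2ab/0x2c0
[12341.861139] Call Trace:
[12341.861143]  amdgpu_vm_sdma_commit+0x182/0x220 [amdgpu]
[12341.861356]  amdgpu_vm_handle_fault+0x145/0x290 [amdgpu]
[12341.861587]  ? amdgpu_irq_dispatch+0xc6/0x240 [amdgpu]
[12341.861692]  amdgpu_irq_dispatch+0xc6/0x240 [amdgpu]
[12341.861928] Modules linked in: uinput snd_seq_dummy rfcomm
"""

print(summarize_oops(SAMPLE))
```

Read top to bottom, the reliable frames show the page-fault handler
(amdgpu_vm_handle_fault) reaching dma_resv_add_shared_fence through the
page-table update path.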

Full traces uploaded here:
[1] https://pastebin.com/aAeazCP8
[2] https://pastebin.com/ZLUP8Xvs

It is curious that ffmpeg does not cause such issues.
For example, the following command does not cause a kernel panic:
$ ffmpeg -f x11grab -framerate 60 -video_size 3840x2160 -i :0.0 \
    -vf 'format=nv12,hwupload' -vaapi_device /dev/dri/renderD128 \
    -vcodec h264_vaapi output3.mp4

1. Either way, a user application should not be able to break the kernel.
2. The command above does not work as expected: the resulting video is
not at 60 frames per second. It looks as if it was captured at 1 frame
per second or even less.

My hardware:
$ inxi -bM
System:    Host: fedora Kernel: 5.12.0-0.rc6.20210408git454859c552da.186.fc35.x86_64 x86_64 bits: 64 Desktop: GNOME 40.0
           Distro: Fedora release 35 (Rawhide)
Machine:   Type: Desktop Mobo: ASUSTeK model: ROG STRIX X570-I GAMING v: Rev X.0x serial: <superuser required>
           UEFI: American Megatrends v: 3603 date: 03/20/2021
Battery:   ID-1: hidpp_battery_0 charge: N/A condition: N/A
CPU:       Info: 16-Core (2-Die) AMD Ryzen 9 3950X [MT MCP MCM] speed: 3433 MHz min/max: 2200/3500 MHz
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] driver: amdgpu v: kernel
           Device-2: AVerMedia Live Streamer CAM 513 type: USB driver: hid-generic,usbhid,uvcvideo
           Device-3: AVerMedia Live Gamer Ultra-Video type: USB driver: hid-generic,snd-usb-audio,usbhid,uvcvideo
           Display: wayland server: X.Org 1.21.1 driver: loaded: amdgpu,ati unloaded: fbdev,modesetting,radeon,vesa
           resolution: 3840x2160~60Hz
           OpenGL: renderer: AMD SIENNA_CICHLID (DRM 3.40.0 5.12.0-0.rc6.20210408git454859c552da.186.fc35.x86_64 LLVM 12.0.0)
           v: 4.6 Mesa 21.1.0-devel
Network:   Device-1: Intel Wi-Fi 6 AX200 driver: iwlwifi
           Device-2: Intel I211 Gigabit Network driver: igb
Drives:    Local Storage: total: 11.36 TiB used: 10.84 TiB (95.4%)
Info:      Processes: 766 Uptime: 5h 12m Memory: 62.63 GiB used: 36.93 GiB (59.0%) Shell: Bash inxi: 3.3.03


-- 
Best Regards,
Mike Gavrilov.
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
  2021-04-12 22:05 [BUG] VAAPI encoder cause kernel panic if encoded video in 4K Mikhail Gavrilov
@ 2021-04-12 23:55 ` Leo Liu
  2021-04-13 21:24   ` Mikhail Gavrilov
  2021-04-13  7:29 ` Christian König
  1 sibling, 1 reply; 15+ messages in thread
From: Leo Liu @ 2021-04-12 23:55 UTC (permalink / raw)
  To: Mikhail Gavrilov, amd-gfx list, Christian König

>It is curious that ffmpeg does not cause such issues.
>For example, the following command does not cause a kernel panic:
>$ ffmpeg -f x11grab -framerate 60 -video_size 3840x2160 -i :0.0 \
>    -vf 'format=nv12,hwupload' -vaapi_device /dev/dri/renderD128 \
>    -vcodec h264_vaapi output3.mp4

What command are you using to see the issue or how can the issue be reproduced?
Please file a freedesktop gitlab issue, so we can keep track of it.




* Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
  2021-04-12 22:05 [BUG] VAAPI encoder cause kernel panic if encoded video in 4K Mikhail Gavrilov
  2021-04-12 23:55 ` Leo Liu
@ 2021-04-13  7:29 ` Christian König
  2021-04-13 21:19   ` Mikhail Gavrilov
  1 sibling, 1 reply; 15+ messages in thread
From: Christian König @ 2021-04-13  7:29 UTC (permalink / raw)
  To: Mikhail Gavrilov, amd-gfx list, Linux List Kernel Mailing

Hi Mikhail,

the crash is a known issue and should be fixed by:

commit f63da9ae7584280582cbc834b20cc18bfb203b14
Author: Philip Yang <Philip.Yang@amd.com>
Date:   Thu Apr 1 00:22:23 2021 -0400

     drm/amdgpu: reserve fence slot to update page table
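
For context on the commit title: the kernel's reservation-object API
expects callers to reserve a shared fence slot (dma_resv_reserve_shared)
before adding a fence (dma_resv_add_shared_fence), and the BUG in the
second trace fires inside dma_resv_add_shared_fence. The following is a
toy Python model of that reserve-then-add contract, not kernel code; the
class ToyReservation and its methods are hypothetical names, and the
model ignores locking and fence replacement.

```python
class ToyReservation:
    """Toy stand-in for a reservation object with a bounded fence list."""

    def __init__(self) -> None:
        self.fences: list[str] = []
        self.max_slots = 0  # capacity guaranteed by earlier reservations

    def reserve_shared(self, num_fences: int = 1) -> None:
        # Make sure num_fences more fences are guaranteed to fit.
        self.max_slots = max(self.max_slots, len(self.fences) + num_fences)

    def add_shared_fence(self, fence: str) -> None:
        # Adding without a reserved slot is treated as a caller bug,
        # which is the situation this toy model is meant to illustrate.
        if len(self.fences) >= self.max_slots:
            raise RuntimeError("no reserved fence slot")
        self.fences.append(fence)


resv = ToyReservation()
resv.reserve_shared()                       # correct order: reserve first
resv.add_shared_fence("page-table update")
try:
    resv.add_shared_fence("update without a reservation")
except RuntimeError as exc:
    print("caller bug:", exc)
```

In this model the fix corresponds to calling reserve_shared() on the
path that previously added a fence without reserving.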

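That said, it is perfectly possible for a userspace application to cause
a page fault: in general the kernel cannot decide in advance whether a
submitted command stream will touch an invalid address. See for example
https://en.wikipedia.org/wiki/Halting_problem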

What we do with misbehaving applications is log the incident and
prevent the offending queue from making any further submissions.
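
The policy described above can be pictured with a small toy model. This
is only a sketch of the idea (log, then refuse further work from the
offending queue); ToyQueue and handle_fault are hypothetical names and
do not correspond to amdgpu data structures or functions.

```python
import logging


class ToyQueue:
    """Toy submission queue that can be blocked after a fault."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocked = False
        self.accepted: list[str] = []

    def submit(self, job: str) -> bool:
        # A blocked queue rejects every new submission.
        if self.blocked:
            return False
        self.accepted.append(job)
        return True


def handle_fault(queue: ToyQueue, address: int) -> None:
    # Step 1: log the incident. Step 2: stop the queue from submitting more.
    logging.warning("page fault at %#x from queue %s", address, queue.name)
    queue.blocked = True


queue = ToyQueue("obs:cs0")
queue.submit("encode frame 1")
handle_fault(queue, 0x80011FDF5000)
print(queue.submit("encode frame 2"))  # rejected after the fault
```

Other queues are unaffected in this model, which matches the intent that
only the misbehaving submitter is stopped.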

Regards,
Christian.

> 90 0f
> [12341.862028] RSP: 0018:ffffb2f084c87bb0 EFLAGS: 00010246
> [12341.862035] RAX: 0000000000000002 RBX: ffff9f9551184998 RCX: 0000000000000000
> [12341.862039] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff9f9551184a50
> [12341.862045] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
> [12341.862049] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9f91b9a18140
> [12341.862053] R13: ffff9f91c9020740 R14: ffff9f91c9020768 R15: 0000000000000000
> [12341.862057] FS:  0000000000000000(0000) GS:ffff9f984a200000(0000)
> knlGS:0000000000000000
> [12341.862062] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [12341.862066] CR2: 0000144e080d8000 CR3: 000000010e98c000 CR4: 0000000000350ee0
> [12341.862072] note: kworker/28:0[18261] exited with preempt_count 1
> [12392.114261] BUG: workqueue lockup - pool cpus=28 node=0 flags=0x0
> nice=0 stuck for 50s!
> [12392.114276] Showing busy workqueues and worker pools:
> [12392.114279] workqueue events: flags=0x0
> [12392.114283]   pwq 56: cpus=28 node=0 flags=0x0 nice=0 active=12/256 refcnt=14
> [12392.114289]     in-flight: 26978:amdgpu_irq_handle_ih_soft [amdgpu]
> [12392.114411]     pending: amdgpu_irq_handle_ih_soft [amdgpu],
> free_work, kfree_rcu_monitor, free_obj_work, psi_avgs_work,
> psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work,
> drm_mode_rmfb_work_fn [drm] BAR(2588), event_mall_stutter [amdgpu]
> [12392.114679]   pwq 54: cpus=27 node=0 flags=0x0 nice=0 active=12/256 refcnt=13
> [12392.114684]     pending: psi_avgs_work, psi_avgs_work,
> psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work,
> psi_avgs_work, psi_avgs_work, psi_avgs_work, psi_avgs_work,
> psi_avgs_work, psi_avgs_work
> [12392.114704]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [12392.114708]     pending: kfree_rcu_monitor
> [12392.114727] workqueue mm_percpu_wq: flags=0x8
> [12392.114730]   pwq 56: cpus=28 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [12392.114734]     pending: vmstat_update
> [12392.114739]   pwq 54: cpus=27 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [12392.114742]     pending: vmstat_update
> [12392.114826] pool 56: cpus=28 node=0 flags=0x0 nice=0 hung=50s
> workers=3 idle: 25238 14400
> [12394.038243] watchdog: BUG: soft lockup - CPU#27 stuck for 22s! [obs:27017]
> [12394.038262] Modules linked in: uinput snd_seq_dummy rfcomm
> snd_hrtimer netconsole nft_objref nf_conntrack_netbios_ns
> nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
> nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set
> nf_tables nfnetlink cmac bnep sunrpc vfat fat hid_logitech_hidpp
> joydev hid_logitech_dj mt76x2u mt76x2_common mt76x02_usb mt76_usb
> mt76x02_lib intel_rapl_msr intel_rapl_common mt76 iwlmvm mac80211
> snd_hda_codec_realtek edac_mce_amd snd_hda_codec_generic ledtrig_audio
> snd_hda_codec_hdmi btusb kvm_amd snd_hda_intel btrtl snd_intel_dspcfg
> btbcm snd_intel_sdw_acpi snd_usb_audio uvcvideo btintel snd_hda_codec
> videobuf2_vmalloc snd_usbmidi_lib videobuf2_memops iwlwifi kvm
> bluetooth snd_rawmidi snd_hda_core snd_seq videobuf2_v4l2 snd_hwdep
> videobuf2_common snd_seq_device eeepc_wmi snd_pcm videodev asus_wmi
> sparse_keymap libarc4 mc irqbypass snd_timer ecdh_generic cfg80211
> video
> [12394.038306]  wmi_bmof rapl ecc snd sp5100_tco k10temp i2c_piix4
> soundcore rfkill acpi_cpufreq ip_tables amdgpu drm_ttm_helper ttm
> iommu_v2 gpu_sched drm_kms_helper crct10dif_pclmul cec crc32_pclmul
> crc32c_intel drm ghash_clmulni_intel igb nvme ccp dca nvme_core
> i2c_algo_bit wmi pinctrl_amd fuse
> [12394.038328] irq event stamp: 0
> [12394.038331] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> [12394.038337] hardirqs last disabled at (0): [<ffffffffaa0ddafb>]
> copy_process+0x91b/0x1e10
> [12394.038343] softirqs last  enabled at (0): [<ffffffffaa0ddafb>]
> copy_process+0x91b/0x1e10
> [12394.038347] softirqs last disabled at (0): [<0000000000000000>] 0x0
>
> This is much more dangerous and, as a rule, leads to system freezing.
>
> Full traces uploaded here:
> [1] https://pastebin.com/aAeazCP8
> [2] https://pastebin.com/ZLUP8Xvs
>
> It is curious why ffmpeg does not cause such issues.
> For example, this command does not cause a kernel panic:
> $ ffmpeg -f x11grab -framerate 60 -video_size 3840x2160 -i :0.0 -vf
> 'format=nv12,hwupload' -vaapi_device /dev/dri/renderD128 -vcodec
> h264_vaapi output3.mp4
>
> 1. Anyway, the user app shouldn't break the kernel.
> 2. The command above does not work as expected, because the frame rate
> of the resulting video is not 60. The final video looks as if it was
> grabbed at 1 frame per second or even less.
>
> My hardware:
> $ inxi -bM
> System:    Host: fedora Kernel:
> 5.12.0-0.rc6.20210408git454859c552da.186.fc35.x86_64 x86_64 bits: 64
> Desktop: GNOME 40.0
>             Distro: Fedora release 35 (Rawhide)
> Machine:   Type: Desktop Mobo: ASUSTeK model: ROG STRIX X570-I GAMING
> v: Rev X.0x serial: <superuser required>
>             UEFI: American Megatrends v: 3603 date: 03/20/2021
> Battery:   ID-1: hidpp_battery_0 charge: N/A condition: N/A
> CPU:       Info: 16-Core (2-Die) AMD Ryzen 9 3950X [MT MCP MCM] speed:
> 3433 MHz min/max: 2200/3500 MHz
> Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Navi 21 [Radeon
> RX 6800/6800 XT / 6900 XT] driver: amdgpu v: kernel
>             Device-2: AVerMedia Live Streamer CAM 513 type: USB driver:
> hid-generic,usbhid,uvcvideo
>             Device-3: AVerMedia Live Gamer Ultra-Video type: USB
> driver: hid-generic,snd-usb-audio,usbhid,uvcvideo
>             Display: wayland server: X.Org 1.21.1 driver: loaded:
> amdgpu,ati unloaded: fbdev,modesetting,radeon,vesa
>             resolution: 3840x2160~60Hz
>             OpenGL: renderer: AMD SIENNA_CICHLID (DRM 3.40.0
> 5.12.0-0.rc6.20210408git454859c552da.186.fc35.x86_64 LLVM 12.0.0)
>             v: 4.6 Mesa 21.1.0-devel
> Network:   Device-1: Intel Wi-Fi 6 AX200 driver: iwlwifi
>             Device-2: Intel I211 Gigabit Network driver: igb
> Drives:    Local Storage: total: 11.36 TiB used: 10.84 TiB (95.4%)
> Info:      Processes: 766 Uptime: 5h 12m Memory: 62.63 GiB used: 36.93
> GiB (59.0%) Shell: Bash inxi: 3.3.03
>
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

* Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
  2021-04-13  7:29 ` Christian König
@ 2021-04-13 21:19   ` Mikhail Gavrilov
  2021-04-14  6:48     ` Christian König
  0 siblings, 1 reply; 15+ messages in thread
From: Mikhail Gavrilov @ 2021-04-13 21:19 UTC (permalink / raw)
  To: Christian König; +Cc: Linux List Kernel Mailing, amd-gfx list

On Tue, 13 Apr 2021 at 12:29, Christian König <christian.koenig@amd.com> wrote:
>
> Hi Mikhail,
>
> the crash is a known issue and should be fixed by:
>
> commit f63da9ae7584280582cbc834b20cc18bfb203b14
> Author: Philip Yang <Philip.Yang@amd.com>
> Date:   Thu Apr 1 00:22:23 2021 -0400
>
>      drm/amdgpu: reserve fence slot to update page table
>

Unfortunately, this commit did not fix the initial problem.
1. The resulting video is jerky when it is grabbed and encoded with
ffmpeg (h264_vaapi codec).
2. OBS still crashes when I try to record or stream video.
3. The kernel log still shows the message "amdgpu: [mmhub] page
fault (src_id:0 ring:0 vmid:4 pasid:32770, for process obs" when I try
to record or stream video with OBS.
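
For anyone comparing logs, the page-fault header line can be picked apart mechanically. The following is only a minimal Python sketch, not an amdgpu tool: the regular expression, the function name parse_page_fault and the field names are assumptions based purely on the message format quoted in this thread, and the sample is the first fault line from the original report.

```python
import re

# Assumed pattern, derived only from the fault lines quoted in this thread.
PAGE_FAULT_RE = re.compile(
    r"\[(?P<hub>\w+)\] page fault \("
    r"src_id:(?P<src_id>\d+) ring:(?P<ring>\d+) "
    r"vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+)"
    r"(?:, for process (?P<process>\S+) pid (?P<pid>\d+))?"
)


def parse_page_fault(line):
    """Return the fields of a page-fault header line, or None if no match."""
    match = PAGE_FAULT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields; process/pid stay None if the line is cut off.
    for key in ("src_id", "ring", "vmid", "pasid", "pid"):
        if fields[key] is not None:
            fields[key] = int(fields[key])
    return fields


# First fault line from the original report, joined back into one line.
SAMPLE = (
    "[ 7315.156460] amdgpu 0000:0b:00.0: amdgpu: [mmhub] page fault "
    "(src_id:0 ring:0 vmid:6 pasid:32779, for process obs pid 23963 "
    "thread obs:cs0 pid 23977)"
)

if __name__ == "__main__":
    print(parse_page_fault(SAMPLE))
```

Feeding the output of `dmesg` through parse_page_fault line by line would be one way to see which process and VMID the faults belong to.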

-- 
Best Regards,
Mike Gavrilov.

* Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
  2021-04-12 23:55 ` Leo Liu
@ 2021-04-13 21:24   ` Mikhail Gavrilov
  2021-04-13 22:22     ` Leo Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Mikhail Gavrilov @ 2021-04-13 21:24 UTC (permalink / raw)
  To: Leo Liu; +Cc: Christian König, amd-gfx list

On Tue, 13 Apr 2021 at 04:55, Leo Liu <leo.liu@amd.com> wrote:
>
> >It is curious why ffmpeg does not cause such issues.
> >For example, this command does not cause a kernel panic:
> >$ ffmpeg -f x11grab -framerate 60 -video_size 3840x2160 -i :0.0 -vf
> >'format=nv12,hwupload' -vaapi_device /dev/dri/renderD128 -vcodec
> >h264_vaapi output3.mp4
>
> What command are you using to see the issue or how can the issue be reproduced?
$ mpv output4.mp4

And of course, I know how it should work, because when I encode the
video with the CPU encoder (libx264) everything is fine.
$ ffmpeg -f x11grab -framerate 60 -video_size 3840x2160 -i :0.0
-vcodec libx264 output3.mp4

> Please file a freedesktop gitlab issue, so we can keep track of it.
Here? https://gitlab.freedesktop.org/drm/amd/-/issues

Also, I found that other users face the same problem.
https://bbs.archlinux.org/viewtopic.php?id=261965

-- 
Best Regards,
Mike Gavrilov.

* Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
  2021-04-13 21:24   ` Mikhail Gavrilov
@ 2021-04-13 22:22     ` Leo Liu
  2021-04-14 23:10       ` Mikhail Gavrilov
  0 siblings, 1 reply; 15+ messages in thread
From: Leo Liu @ 2021-04-13 22:22 UTC (permalink / raw)
  To: Mikhail Gavrilov; +Cc: Christian König, amd-gfx list


On 2021-04-13 5:24 p.m., Mikhail Gavrilov wrote:
> On Tue, 13 Apr 2021 at 04:55, Leo Liu <leo.liu@amd.com> wrote:
>>> It is curious why ffmpeg does not cause such issues.
>>> For example, this command does not cause a kernel panic:
>>> $ ffmpeg -f x11grab -framerate 60 -video_size 3840x2160 -i :0.0 -vf
>>> 'format=nv12,hwupload' -vaapi_device /dev/dri/renderD128 -vcodec
>>> h264_vaapi output3.mp4
>> What command are you using to see the issue or how can the issue be reproduced?
> $ mpv output4.mp4

This is a decode command line. Are you seeing the issue with encode or
decode? You also said `ffmpeg -f x11grab -framerate 60 -video_size
3840x2160 -i :0.0 -vf 'format=nv12,hwupload' -vaapi_device
/dev/dri/renderD128 -vcodec h264_vaapi output3.mp4` doesn't cause such an
issue, right? What command line causes the issue then?


>
> And of course, I know how it should work, because when I encode the
> video with the CPU encoder (libx264) everything is fine.
> $ ffmpeg -f x11grab -framerate 60 -video_size 3840x2160 -i :0.0
> -vcodec libx264 output3.mp4
>
>> Please file a freedesktop gitlab issue, so we can keep track of it.
> Here? https://gitlab.freedesktop.org/drm/amd/-/issues

Yes.

>
> Also, I found that other users face the same problem.
> https://bbs.archlinux.org/viewtopic.php?id=261965
>

* Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
  2021-04-13 21:19   ` Mikhail Gavrilov
@ 2021-04-14  6:48     ` Christian König
  2021-04-20 18:03       ` Mikhail Gavrilov
  2021-09-14 18:19       ` Mikhail Gavrilov
  0 siblings, 2 replies; 15+ messages in thread
From: Christian König @ 2021-04-14  6:48 UTC (permalink / raw)
  To: Mikhail Gavrilov, Christian König
  Cc: Linux List Kernel Mailing, amd-gfx list



Am 13.04.21 um 23:19 schrieb Mikhail Gavrilov:
> On Tue, 13 Apr 2021 at 12:29, Christian König <christian.koenig@amd.com> wrote:
>> Hi Mikhail,
>>
>> the crash is a known issue and should be fixed by:
>>
>> commit f63da9ae7584280582cbc834b20cc18bfb203b14
>> Author: Philip Yang <Philip.Yang@amd.com>
>> Date:   Thu Apr 1 00:22:23 2021 -0400
>>
>>       drm/amdgpu: reserve fence slot to update page table
>>
> Unfortunately, this commit did not fix the initial problem.
> 1. The resulting video is jerky when it is grabbed and encoded with
> ffmpeg (h264_vaapi codec).
> 2. OBS still crashes when I try to record or stream video.
> 3. The kernel log still shows the message "amdgpu: [mmhub] page
> fault (src_id:0 ring:0 vmid:4 pasid:32770, for process obs" when I try
> to record or stream video with OBS.

That is expected behavior, the application is just buggy and causing a 
page fault on the GPU.

The kernel should just not crash with a backtrace.

Regards,
Christian.

* Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
  2021-04-13 22:22     ` Leo Liu
@ 2021-04-14 23:10       ` Mikhail Gavrilov
  0 siblings, 0 replies; 15+ messages in thread
From: Mikhail Gavrilov @ 2021-04-14 23:10 UTC (permalink / raw)
  To: Leo Liu; +Cc: Christian König, amd-gfx list

On Wed, 14 Apr 2021 at 03:22, Leo Liu <leo.liu@amd.com> wrote:
>
> This is a decode command line. Are you seeing the issue with encode or
> decode?

I meant that the kernel panic described above happens only when
OBS records or streams video with the VAAPI encoder.
Grabbing and encoding video with ffmpeg (the example command given) is
free from this issue, but the resulting video encoded with ffmpeg does
not play properly. I believe this is not a bug in ffmpeg itself,
because with the CPU encoder (libx264) the resulting video plays
properly.

> You also said `ffmpeg -f x11grab -framerate 60 -video_size
> 3840x2160 -i :0.0 -vf 'format=nv12,hwupload' -vaapi_device
> /dev/dri/renderD128 -vcodec h264_vaapi output3.mp4` doesn't cause such an
> issue, right?

This command does not cause the described kernel panic, but the
resulting video looks like it runs at 0.01 FPS.
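
To put a number on the frame rate of the encoded file rather than judging it by eye, the average rate of the first video stream can be read with ffprobe. A rough Python sketch, assuming ffprobe is installed and in PATH; the helper names parse_rate and average_fps are hypothetical, and output3.mp4 is the file from the command above:

```python
import subprocess
from fractions import Fraction


def parse_rate(text):
    """Convert ffprobe's rational frame rate (e.g. '60/1') to a float."""
    tokens = text.split()
    if not tokens:
        return 0.0
    num, _, den = tokens[0].partition("/")
    if not den:
        return float(num)
    if int(den) == 0:
        # ffprobe prints 0/0 when it cannot determine the rate.
        return 0.0
    return float(Fraction(int(num), int(den)))


def average_fps(path):
    """Ask ffprobe for avg_frame_rate of the first video stream in `path`."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "v:0",
            "-show_entries", "stream=avg_frame_rate",
            "-of", "default=noprint_wrappers=1:nokey=1",
            path,
        ],
        capture_output=True, text=True, check=True,
    )
    return parse_rate(result.stdout)


if __name__ == "__main__":
    print(average_fps("output3.mp4"))
```

On a healthy 60 FPS capture, average_fps("output3.mp4") should return a value close to 60; a value far below that would match the stutter described here.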

>
> Yes.
>

I filed a bug report about the VAAPI encoder in ffmpeg here:
https://gitlab.freedesktop.org/drm/amd/-/issues/1570

We can continue there.

-- 
Best Regards,
Mike Gavrilov.

* Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
  2021-04-14  6:48     ` Christian König
@ 2021-04-20 18:03       ` Mikhail Gavrilov
  2021-04-21  6:42         ` Christian König
  2021-09-14 18:19       ` Mikhail Gavrilov
  1 sibling, 1 reply; 15+ messages in thread
From: Mikhail Gavrilov @ 2021-04-20 18:03 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, amd-gfx list, Linux List Kernel Mailing

On Wed, 14 Apr 2021 at 11:48, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> >> commit f63da9ae7584280582cbc834b20cc18bfb203b14
> >> Author: Philip Yang <Philip.Yang@amd.com>
> >> Date:   Thu Apr 1 00:22:23 2021 -0400
> >>
> >>       drm/amdgpu: reserve fence slot to update page table
> >>
>
> That is expected behavior, the application is just buggy and causing a
> page fault on the GPU.
>
> The kernel should just not crash with a backtrace.
>

Is there any chance of this commit being backported to 5.12?
I plan to submit a bug report to the OBS devs and don't want my system to
hang again and again while I test their patches.

-- 
Best Regards,
Mike Gavrilov.

* Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
  2021-04-20 18:03       ` Mikhail Gavrilov
@ 2021-04-21  6:42         ` Christian König
  2021-04-21 13:58           ` Mikhail Gavrilov
  0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2021-04-21  6:42 UTC (permalink / raw)
  To: Mikhail Gavrilov, Christian König
  Cc: Linux List Kernel Mailing, amd-gfx list



Am 20.04.21 um 20:03 schrieb Mikhail Gavrilov:
> On Wed, 14 Apr 2021 at 11:48, Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>> commit f63da9ae7584280582cbc834b20cc18bfb203b14
>>>> Author: Philip Yang <Philip.Yang@amd.com>
>>>> Date:   Thu Apr 1 00:22:23 2021 -0400
>>>>
>>>>        drm/amdgpu: reserve fence slot to update page table
>>>>
>> That is expected behavior, the application is just buggy and causing a
>> page fault on the GPU.
>>
>> The kernel should just not crash with a backtrace.
>>
> Is there any chance of this commit being backported to 5.12?
> I plan to submit a bug report to the OBS devs and don't want my system to
> hang again and again while I test their patches.

I can try, but I'm not sure if we even have the full page fault handling 
for Navi in 5.12.

Regards,
Christian.

* Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
  2021-04-21  6:42         ` Christian König
@ 2021-04-21 13:58           ` Mikhail Gavrilov
  0 siblings, 0 replies; 15+ messages in thread
From: Mikhail Gavrilov @ 2021-04-21 13:58 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, Linux List Kernel Mailing, amd-gfx list

On Wed, 21 Apr 2021 at 11:42, Christian König <christian.koenig@amd.com> wrote:
> I can try, but I'm not sure if we even have the full page fault handling
> for Navi in 5.12.
>

That would be great. For me this patch works as expected, and I
haven't seen the panic "kernel BUG at
drivers/dma-buf/dma-resv.c:287!" for several days.
Anyway, I will be waiting for any news.

-- 
Best Regards,
Mike Gavrilov.

* Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
  2021-04-14  6:48     ` Christian König
  2021-04-20 18:03       ` Mikhail Gavrilov
@ 2021-09-14 18:19       ` Mikhail Gavrilov
  2021-09-15  9:55         ` Christian König
  1 sibling, 1 reply; 15+ messages in thread
From: Mikhail Gavrilov @ 2021-09-14 18:19 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, Linux List Kernel Mailing, amd-gfx list

On Wed, 14 Apr 2021 at 11:48, Christian König <
ckoenig.leichtzumerken@gmail.com> wrote:

>
> That is expected behavior, the application is just buggy and causing a
> page fault on the GPU.
>
> The kernel should just not crash with a backtrace.
>
> Regards,
> Christian.
>

If the GPU then hangs with the message "[drm:amdgpu_dm_atomic_commit_tail
[amdgpu]] *ERROR* Waiting for fences timed out!", is that also expected
behavior?
Kernel log: https://pastebin.com/WkhATKXX


-- 
Best Regards,
Mike Gavrilov.


* Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
  2021-09-14 18:19       ` Mikhail Gavrilov
@ 2021-09-15  9:55         ` Christian König
  2021-09-15 10:36           ` Mikhail Gavrilov
  0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2021-09-15  9:55 UTC (permalink / raw)
  To: Mikhail Gavrilov, Christian König
  Cc: Linux List Kernel Mailing, amd-gfx list

Am 14.09.21 um 20:19 schrieb Mikhail Gavrilov:
> On Wed, 14 Apr 2021 at 11:48, Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>
>
>     That is expected behavior, the application is just buggy and
>     causing a
>     page fault on the GPU.
>
>     The kernel should just not crash with a backtrace.
>
>     Regards,
>     Christian.
>
>
> If the GPU then hangs with the message
> "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
> fences timed out!", is that also expected behavior?

Yes, absolutely. You should see GPU resets and recovery in the system 
log after that.

Regards,
Christian.

> Kernel log: https://pastebin.com/WkhATKXX
>
>
> -- 
> Best Regards,
> Mike Gavrilov.



* Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
  2021-09-15  9:55         ` Christian König
@ 2021-09-15 10:36           ` Mikhail Gavrilov
  2021-09-15 11:47             ` Christian König
  0 siblings, 1 reply; 15+ messages in thread
From: Mikhail Gavrilov @ 2021-09-15 10:36 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, Linux List Kernel Mailing, amd-gfx list

On Wed, 15 Sept 2021 at 14:55, Christian König <christian.koenig@amd.com> wrote:
>
> Yes, absolutely. You should see GPU resets and recovery in the system log after that.

Unfortunately, not a single DE survives a GPU reset. All applications
terminate abnormally, so in fact this is equivalent to a reboot
(and a denial of service). :(

-- 
Best Regards,
Mike Gavrilov.

* Re: [BUG] VAAPI encoder cause kernel panic if encoded video in 4K
  2021-09-15 10:36           ` Mikhail Gavrilov
@ 2021-09-15 11:47             ` Christian König
  0 siblings, 0 replies; 15+ messages in thread
From: Christian König @ 2021-09-15 11:47 UTC (permalink / raw)
  To: Mikhail Gavrilov, Christian König
  Cc: Linux List Kernel Mailing, amd-gfx list



Am 15.09.21 um 12:36 schrieb Mikhail Gavrilov:
> On Wed, 15 Sept 2021 at 14:55, Christian König <christian.koenig@amd.com> wrote:
>> Yes, absolutely. You should see GPU resets and recovery in the system log after that.
> Unfortunately, not a single DE survives a GPU reset. All applications
> terminate abnormally, so in fact this is equivalent to a reboot
> (and a denial of service). :(

Yeah, that is a very well known bug in the DEs.

The OpenGL robustness extension and the Vulkan specification are pretty
clear about what drivers are supposed to do in such a situation; it's
just not handled very well in userspace.

Regards,
Christian.

Thread overview: 15+ messages
2021-04-12 22:05 [BUG] VAAPI encoder cause kernel panic if encoded video in 4K Mikhail Gavrilov
2021-04-12 23:55 ` Leo Liu
2021-04-13 21:24   ` Mikhail Gavrilov
2021-04-13 22:22     ` Leo Liu
2021-04-14 23:10       ` Mikhail Gavrilov
2021-04-13  7:29 ` Christian König
2021-04-13 21:19   ` Mikhail Gavrilov
2021-04-14  6:48     ` Christian König
2021-04-20 18:03       ` Mikhail Gavrilov
2021-04-21  6:42         ` Christian König
2021-04-21 13:58           ` Mikhail Gavrilov
2021-09-14 18:19       ` Mikhail Gavrilov
2021-09-15  9:55         ` Christian König
2021-09-15 10:36           ` Mikhail Gavrilov
2021-09-15 11:47             ` Christian König
