linux-kernel.vger.kernel.org archive mirror
* [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games stopped working.
@ 2022-04-03 18:39 Mikhail Gavrilov
  2022-04-04  6:30 ` Christian König
  0 siblings, 1 reply; 15+ messages in thread
From: Mikhail Gavrilov @ 2022-04-03 18:39 UTC
  To: amd-gfx list, Christian König, daniel.vetter,
	thomas.hellstrom, Linux List Kernel Mailing

Hi,
Something broke between commits ed4643521e6a and 34af78c4e616.
I noticed that the kernel log is flooded with the warning "WARNING: CPU: 31
PID: 51848 at drivers/dma-buf/dma-fence-array.c:191
dma_fence_array_create+0x101/0x120" while certain games are running:
"Resident Evil Village", "Marvel's Avengers", "The Dark Pictures
Anthology: House of Ashes".

[16999.958726] ------------[ cut here ]------------
[16999.958731] WARNING: CPU: 31 PID: 51848 at
drivers/dma-buf/dma-fence-array.c:191
dma_fence_array_create+0x101/0x120
[16999.958738] Modules linked in: xone_gip_chatpad(OE)
xone_gip_gamepad(OE) xone_gip_common(OE) ff_memless tls uinput rfcomm
snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_midi_event nft_objref
nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet
nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep
sunrpc binfmt_misc iwlmvm vfat intel_rapl_msr fat intel_rapl_common
snd_hda_codec_realtek mac80211 snd_hda_codec_generic ledtrig_audio
snd_hda_codec_hdmi libarc4 snd_hda_intel edac_mce_amd snd_intel_dspcfg
snd_usb_audio snd_intel_sdw_acpi btusb kvm_amd snd_hda_codec btrtl
btbcm iwlwifi btintel snd_hda_core snd_usbmidi_lib uvcvideo snd_hwdep
kvm iwlmei snd_rawmidi videobuf2_vmalloc xone_dongle(OE)
videobuf2_memops xone_gip_bus(OE) snd_seq btmtk videobuf2_v4l2
videobuf2_common snd_seq_device irqbypass bluetooth cfg80211 snd_pcm
rapl videodev
[16999.958799]  eeepc_wmi asus_wmi snd_timer sparse_keymap
platform_profile ecdh_generic video wmi_bmof pcspkr snd k10temp
i2c_piix4 joydev mc soundcore rfkill mei acpi_cpufreq zram
hid_logitech_hidpp hid_logitech_dj amdgpu drm_ttm_helper ttm
crct10dif_pclmul ccp crc32_pclmul ucsi_ccg iommu_v2 crc32c_intel
typec_ucsi gpu_sched ghash_clmulni_intel sp5100_tco drm_dp_helper
typec igb nvme nvme_core dca wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua
ip6_tables ip_tables dm_multipath ipmi_devintf ipmi_msghandler fuse
[16999.958862] CPU: 31 PID: 51848 Comm: GWT.exe Tainted: G    B   W
OEL   --------- ---
5.18.0-0.rc0.20220401gite8b767f5e04097a.15.fc37.x86_64 #1
[16999.958865] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4204 02/24/2022
[16999.958867] RIP: 0010:dma_fence_array_create+0x101/0x120
[16999.958871] Code: 45 85 e4 75 10 eb 2a 48 81 fa c0 aa 52 ab 74 1a
83 e8 01 72 1c 48 63 d0 48 8b 54 d5 00 48 8b 52 08 48 81 fa 60 aa 52
ab 75 dd <0f> 0b 83 e8 01 73 e4 48 83 c4 08 48 89 d8 5b 5d 41 5c 41 5d
41 5e
[16999.958874] RSP: 0018:ffffb03c071f7e08 EFLAGS: 00010246
[16999.958877] RAX: 0000000000000001 RBX: ffff98fdb03c6d00 RCX: 0000000000510e99
[16999.958879] RDX: ffffffffab52aac0 RSI: ffff98fdb03c6d10 RDI: ffff98fdb03c6d00
[16999.958880] RBP: ffff98fa31c59e40 R08: 0000000000000001 R09: 0000000000000000
[16999.958882] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
[16999.958883] R13: 0000000000000000 R14: ffff98fdb03c6d40 R15: 0000000000000001
[16999.958885] FS:  000000004789f640(0000) GS:ffff9907ea600000(0000)
knlGS:0000000029b70000
[16999.958887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[16999.958888] CR2: 00007ff41eee8000 CR3: 000000002856a000 CR4: 0000000000350ee0
[16999.958890] Call Trace:
[16999.958893]  <TASK>
[16999.958897]  sync_file_ioctl+0x83d/0x9f0
[16999.958904]  __x64_sys_ioctl+0x8d/0xc0
[16999.958908]  do_syscall_64+0x3a/0x80
[16999.958913]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[16999.958917] RIP: 0033:0x7ff5e850b29f
[16999.958941] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24
10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00
00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28
00 00
[16999.958943] RSP: 002b:000000004789d540 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[16999.958946] RAX: ffffffffffffffda RBX: 00007ff5d5637040 RCX: 00007ff5e850b29f
[16999.958948] RDX: 000000004789d740 RSI: 00000000c0303e03 RDI: 0000000000000260
[16999.958949] RBP: 0000000000000260 R08: 0000000000000001 R09: 0000000000000000
[16999.958951] R10: 0000000000000000 R11: 0000000000000246 R12: 000000004789d740
[16999.958953] R13: 0000000000000000 R14: 00000000c0303e03 R15: 0000000000000000
[16999.958958]  </TASK>
[16999.958959] irq event stamp: 0
[16999.958961] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[16999.958964] hardirqs last disabled at (0): [<ffffffffaa0e88c1>]
copy_process+0x9f1/0x1e20
[16999.958968] softirqs last  enabled at (0): [<ffffffffaa0e88c1>]
copy_process+0x9f1/0x1e20
[16999.958971] softirqs last disabled at (0): [<0000000000000000>] 0x0
[16999.958974] ---[ end trace 0000000000000000 ]---
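
For readers of the trace above: in the user-space register dump, ORIG_RAX is 0x10 (the ioctl syscall on x86_64) and RSI holds the second syscall argument, the ioctl request number 0xc0303e03. The sketch below splits such a number into its fields, following the x86 layout from asm-generic/ioctl.h. The struct and function names are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>

/*
 * Fields of a Linux ioctl request number (x86 layout):
 * bits 0-7 command number, bits 8-15 type ("magic"),
 * bits 16-29 argument size in bytes, bits 30-31 direction.
 * Hypothetical helper type, not a kernel structure.
 */
struct ioctl_fields {
	unsigned int nr;
	unsigned int type;
	unsigned int size;
	unsigned int dir;	/* 1 = write, 2 = read, 3 = read and write */
};

/* Split a raw request number, e.g. the RSI value from the trace. */
static struct ioctl_fields decode_ioctl(uint32_t req)
{
	struct ioctl_fields f;

	f.nr = req & 0xff;
	f.type = (req >> 8) & 0xff;
	f.size = (req >> 16) & 0x3fff;
	f.dir = (req >> 30) & 0x3;
	return f;
}
```

Calling decode_ioctl(0xc0303e03) gives type '>' (0x3e), command number 3, a 48-byte argument and a read/write direction. That appears to match SYNC_IOC_MERGE from include/uapi/linux/sync_file.h, which would fit the sync_file_ioctl frame in the call trace.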


The games "Forza Horizon 5", "Forza Horizon 4", "Cyberpunk 2077", and
"Ghostwire: Tokyo" stopped working. When these games crashed, I again
saw the same warning as above [2]. The only differences are the thread
name and the addresses.

[  643.442353] ------------[ cut here ]------------
[  643.442358] WARNING: CPU: 24 PID: 7824 at
drivers/dma-buf/dma-fence-array.c:191
dma_fence_array_create+0x101/0x120
[  643.442364] Modules linked in: uinput rfcomm snd_seq_dummy
snd_hrtimer snd_seq_midi snd_seq_midi_event nft_objref
nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet
nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep
sunrpc binfmt_misc iwlmvm snd_hda_codec_realtek mac80211
snd_hda_codec_generic vfat fat ledtrig_audio snd_hda_codec_hdmi
intel_rapl_msr snd_hda_intel intel_rapl_common snd_intel_dspcfg
libarc4 snd_intel_sdw_acpi snd_hda_codec edac_mce_amd snd_usb_audio
iwlwifi snd_hda_core btusb uvcvideo snd_usbmidi_lib btrtl snd_hwdep
snd_rawmidi btbcm videobuf2_vmalloc xone_dongle(OE) kvm_amd
videobuf2_memops xone_gip_bus(OE) iwlmei videobuf2_v4l2 snd_seq
btintel kvm eeepc_wmi btmtk asus_wmi snd_seq_device sparse_keymap
videobuf2_common irqbypass platform_profile rapl bluetooth snd_pcm
cfg80211 video pcspkr wmi_bmof k10temp i2c_piix4
[  643.442406]  videodev snd_timer snd ecdh_generic joydev mc
soundcore rfkill mei acpi_cpufreq scsi_dh_rdac scsi_dh_emc
scsi_dh_alua dm_multipath zram hid_logitech_hidpp hid_logitech_dj
amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel drm_ttm_helper ttm
iommu_v2 ucsi_ccg ccp ghash_clmulni_intel gpu_sched typec_ucsi
sp5100_tco typec drm_dp_helper igb nvme nvme_core dca wmi ip6_tables
ip_tables ipmi_devintf ipmi_msghandler fuse
[  643.442427] CPU: 24 PID: 7824 Comm: GameThread Tainted: G    B   W
OEL   --------- ---
5.18.0-0.rc0.20220325git34af78c4e616.7.fc37.x86_64 #1
[  643.442430] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4204 02/24/2022
[  643.442432] RIP: 0010:dma_fence_array_create+0x101/0x120
[  643.442434] Code: 45 85 e4 75 10 eb 2a 48 81 fa c0 6a 52 a8 74 1a
83 e8 01 72 1c 48 63 d0 48 8b 54 d5 00 48 8b 52 08 48 81 fa 60 6a 52
a8 75 dd <0f> 0b 83 e8 01 73 e4 48 83 c4 08 48 89 d8 5b 5d 41 5c 41 5d
41 5e
[  643.442436] RSP: 0018:ffffb0c783ea7e08 EFLAGS: 00010246
[  643.442437] RAX: 0000000000000001 RBX: ffffa0fe03e4d800 RCX: 0000000000003b48
[  643.442439] RDX: ffffffffa8526ac0 RSI: ffffa0fe03e4d810 RDI: ffffa0fe03e4d800
[  643.442440] RBP: ffffa0fb81c33e00 R08: 0000000000000001 R09: 0000000000000000
[  643.442441] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
[  643.442442] R13: 0000000000000000 R14: ffffa0fe03e4d840 R15: 0000000000000001
[  643.442443] FS:  000000007b59f640(0000) GS:ffffa10a68a00000(0000)
knlGS:000000007a4f0000
[  643.442445] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  643.442446] CR2: 00007f632016f000 CR3: 00000003787f8000 CR4: 0000000000350ee0
[  643.442448] Call Trace:
[  643.442449]  <TASK>
[  643.442453]  sync_file_ioctl+0x83d/0x9f0
[  643.442457]  __x64_sys_ioctl+0x8d/0xc0
[  643.442461]  do_syscall_64+0x3a/0x80
[  643.442464]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  643.442466] RIP: 0033:0x7f6377f0b29f
[  643.442484] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24
10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00
00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28
00 00
[  643.442486] RSP: 002b:000000007b59d540 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  643.442488] RAX: ffffffffffffffda RBX: 000000007f068600 RCX: 00007f6377f0b29f
[  643.442489] RDX: 000000007b59d740 RSI: 00000000c0303e03 RDI: 000000000000011c
[  643.442490] RBP: 000000000000011c R08: 0000000000000001 R09: 0000000000000000
[  643.442491] R10: 0000000000000000 R11: 0000000000000246 R12: 000000007b59d740
[  643.442492] R13: 0000000000000000 R14: 00000000c0303e03 R15: 0000000000000000
[  643.442495]  </TASK>
[  643.442496] irq event stamp: 0
[  643.442497] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[  643.442500] hardirqs last disabled at (0): [<ffffffffa70e8a5e>]
copy_process+0x9fe/0x1ed0
[  643.442503] softirqs last  enabled at (0): [<ffffffffa70e8a5e>]
copy_process+0x9fe/0x1ed0
[  643.442505] softirqs last disabled at (0): [<0000000000000000>] 0x0
[  643.442507] ---[ end trace 0000000000000000 ]---

Before 5.18 git34af78c4e616 I also saw a warning, but it was a
different one [1]: "WARNING: CPU: 29 PID: 6282 at
kernel/dma/debug.c:1162 debug_dma_map_sg+0x329/0x380". It did not
prevent the listed games from working.


[  572.507688] ------------[ cut here ]------------
[  572.507754] DMA-API: amdgpu 0000:0b:00.0: mapping sg segment longer
than device claims to support [len=516096] [max=65536]
[  572.507761] WARNING: CPU: 29 PID: 6282 at kernel/dma/debug.c:1162
debug_dma_map_sg+0x329/0x380
[  572.507768] Modules linked in: uinput rfcomm snd_seq_dummy
snd_hrtimer snd_seq_midi snd_seq_midi_event nft_objref
nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet
nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep
sunrpc binfmt_misc snd_hda_codec_realtek snd_hda_codec_generic iwlmvm
intel_rapl_msr ledtrig_audio snd_hda_codec_hdmi intel_rapl_common vfat
fat snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec
mac80211 edac_mce_amd snd_hda_core snd_usb_audio snd_usbmidi_lib
snd_hwdep snd_rawmidi btusb kvm_amd btrtl snd_seq btbcm libarc4
snd_seq_device btintel snd_pcm kvm iwlwifi uvcvideo xone_dongle(OE)
btmtk videobuf2_vmalloc xone_gip_bus(OE) videobuf2_memops eeepc_wmi
videobuf2_v4l2 asus_wmi iwlmei bluetooth sparse_keymap irqbypass
videobuf2_common snd_timer platform_profile rapl video pcspkr wmi_bmof
videodev k10temp
[  572.507848]  i2c_piix4 snd cfg80211 joydev ecdh_generic mc
soundcore rfkill mei acpi_cpufreq scsi_dh_rdac scsi_dh_emc
scsi_dh_alua dm_multipath zram hid_logitech_hidpp hid_logitech_dj
amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel ucsi_ccg
drm_ttm_helper ghash_clmulni_intel ttm sp5100_tco igb ccp typec_ucsi
nvme iommu_v2 typec gpu_sched nvme_core dca wmi ip6_tables ip_tables
ipmi_devintf ipmi_msghandler fuse
[  572.507889] CPU: 29 PID: 6282 Comm: GameThread Tainted: G        W
OEL   --------- ---
5.18.0-0.rc0.20220324gited4643521e6a.6.fc37.x86_64 #1
[  572.507893] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4204 02/24/2022
[  572.507895] RIP: 0010:debug_dma_map_sg+0x329/0x380
[  572.507899] Code: 5c 24 10 8b 4c 24 18 48 8b 54 24 20 48 89 c6 44
8b 44 24 2c 48 c7 c7 90 40 84 9f 4c 89 5c 24 10 4c 89 4c 24 08 e8 57
d6 c9 00 <0f> 0b 4c 8b 5c 24 10 4c 8b 4c 24 08 8b 15 75 4d 31 02 85 d2
0f 85
[  572.507902] RSP: 0018:ffffb748d2917b50 EFLAGS: 00010282
[  572.507906] RAX: 000000000000006e RBX: ffff9e1ad45540d0 RCX: 0000000000000000
[  572.507908] RDX: 0000000000000001 RSI: ffffffff9f8a4b50 RDI: 00000000ffffffff
[  572.507910] RBP: ffff9e1bfb936ea0 R08: 0000000000000000 R09: 00000000fff7ffff
[  572.507913] R10: ffffb748d2917980 R11: ffff9e29ee2fffe8 R12: 0000000000000001
[  572.507915] R13: 0000000000000004 R14: 0000000000000002 R15: ffff9e1ad22fe900
[  572.507917] FS:  00007fb5dd637fc0(0000) GS:ffff9e29a9e00000(0000)
knlGS:0000000067fe0000
[  572.507919] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  572.507922] CR2: 00007fb5db8af000 CR3: 00000002b492e000 CR4: 0000000000350ee0
[  572.507924] Call Trace:
[  572.507926]  <TASK>
[  572.507934]  __dma_map_sg_attrs+0xb8/0xf0
[  572.507939]  dma_map_sgtable+0x19/0x30
[  572.507943]  amdgpu_bo_move+0x57c/0x6f0 [amdgpu]
[  572.508064]  ? amdgpu_ttm_tt_populate+0x74/0x90 [amdgpu]
[  572.508177]  ttm_bo_handle_move_mem+0x8c/0x190 [ttm]
[  572.508186]  ttm_bo_validate+0xd7/0x150 [ttm]
[  572.508191]  ? ww_mutex_lock+0x38/0xa0
[  572.508197]  amdgpu_gem_userptr_ioctl+0x178/0x290 [amdgpu]
[  572.508296]  ? amdgpu_bo_vm_destroy+0x80/0x80 [amdgpu]
[  572.508399]  ? amdgpu_gem_create_ioctl+0x330/0x330 [amdgpu]
[  572.508494]  drm_ioctl_kernel+0xa1/0x150
[  572.508503]  drm_ioctl+0x21c/0x410
[  572.508508]  ? amdgpu_gem_create_ioctl+0x330/0x330 [amdgpu]
[  572.508605]  ? lock_release+0x14f/0x460
[  572.508611]  ? _raw_spin_unlock_irqrestore+0x30/0x60
[  572.508615]  ? lockdep_hardirqs_on+0x7d/0x100
[  572.508619]  ? _raw_spin_unlock_irqrestore+0x40/0x60
[  572.508624]  amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
[  572.508719]  __x64_sys_ioctl+0x8d/0xc0
[  572.508725]  do_syscall_64+0x3a/0x80
[  572.508730]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  572.508733] RIP: 0033:0x7fb5dd50b29f
[  572.508754] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24
10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00
00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28
00 00
[  572.508756] RSP: 002b:000000000027f680 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  572.508760] RAX: ffffffffffffffda RBX: 000000007f09bd10 RCX: 00007fb5dd50b29f
[  572.508762] RDX: 000000000027f730 RSI: 00000000c0186451 RDI: 00000000000000bd
[  572.508764] RBP: 000000000027f730 R08: 00007fb5dd5f7b00 R09: 0000000000000070
[  572.508766] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c0186451
[  572.508769] R13: 00000000000000bd R14: 000000000000000e R15: 000000007dcc0000
[  572.508780]  </TASK>
[  572.508782] irq event stamp: 4594699
[  572.508784] hardirqs last  enabled at (4594707):
[<ffffffff9e17a17e>] __up_console_sem+0x5e/0x70
[  572.508788] hardirqs last disabled at (4594714):
[<ffffffff9e17a163>] __up_console_sem+0x43/0x70
[  572.508791] softirqs last  enabled at (4594478):
[<ffffffff9e0f2cb1>] __irq_exit_rcu+0xd1/0x160
[  572.508795] softirqs last disabled at (4594473):
[<ffffffff9e0f2cb1>] __irq_exit_rcu+0xd1/0x160
[  572.508798] ---[ end trace 0000000000000000 ]---
[  577.607889] ------------[ cut here ]------------
[  577.608030] WARNING: CPU: 27 PID: 6485 at
drivers/gpu/drm/drm_syncobj.c:400 drm_syncobj_find_fence+0x224/0x2c0
[  577.608045] Modules linked in: uinput rfcomm snd_seq_dummy
snd_hrtimer snd_seq_midi snd_seq_midi_event nft_objref
nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet
nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep
sunrpc binfmt_misc snd_hda_codec_realtek snd_hda_codec_generic iwlmvm
intel_rapl_msr ledtrig_audio snd_hda_codec_hdmi intel_rapl_common vfat
fat snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec
mac80211 edac_mce_amd snd_hda_core snd_usb_audio snd_usbmidi_lib
snd_hwdep snd_rawmidi btusb kvm_amd btrtl snd_seq btbcm libarc4
snd_seq_device btintel snd_pcm kvm iwlwifi uvcvideo xone_dongle(OE)
btmtk videobuf2_vmalloc xone_gip_bus(OE) videobuf2_memops eeepc_wmi
videobuf2_v4l2 asus_wmi iwlmei bluetooth sparse_keymap irqbypass
videobuf2_common snd_timer platform_profile rapl video pcspkr wmi_bmof
videodev k10temp
[  577.609566]  i2c_piix4 snd cfg80211 joydev ecdh_generic mc
soundcore rfkill mei acpi_cpufreq scsi_dh_rdac scsi_dh_emc
scsi_dh_alua dm_multipath zram hid_logitech_hidpp hid_logitech_dj
amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel ucsi_ccg
drm_ttm_helper ghash_clmulni_intel ttm sp5100_tco igb ccp typec_ucsi
nvme iommu_v2 typec gpu_sched nvme_core dca wmi ip6_tables ip_tables
ipmi_devintf ipmi_msghandler fuse
[  577.609688] CPU: 27 PID: 6485 Comm: GameThread Tainted: G        W
OEL   --------- ---
5.18.0-0.rc0.20220324gited4643521e6a.6.fc37.x86_64 #1
[  577.609697] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4204 02/24/2022
[  577.609704] RIP: 0010:drm_syncobj_find_fence+0x224/0x2c0
[  577.609714] Code: 70 90 9f e8 1e 1c 76 ff e8 29 0f 50 00 8b 15 cf
d6 af 01 85 d2 74 15 65 48 8b 04 25 80 1e 02 00 8b 80 78 0e 00 00 85
c0 74 02 <0f> 0b 4c 89 e7 e8 62 e9 ff ff 49 89 45 00 48 85 c0 0f 85 2e
fe ff
[  577.609722] RSP: 0018:ffffb748d2d8fac0 EFLAGS: 00010202
[  577.609731] RAX: 0000000000000001 RBX: 0000000000000002 RCX: ffff9e1c51140000
[  577.609738] RDX: 0000000000000001 RSI: ffffffff9f81a22d RDI: ffffffff9f8bb1ce
[  577.609744] RBP: ffffb748d2d8fb40 R08: 0000000000000002 R09: 0000000024924a83
[  577.609751] R10: 0000000000000001 R11: 0000000000000000 R12: ffff9e1c74a3bf80
[  577.609757] R13: ffffb748d2d8fb50 R14: 0000000000000011 R15: 0000000000001388
[  577.609765] FS:  000000007d2af640(0000) GS:ffff9e29a9600000(0000)
knlGS:000000007a4b0000
[  577.609772] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  577.609778] CR2: 00007fb5bb69b000 CR3: 00000002b492e000 CR4: 0000000000350ee0
[  577.609786] Call Trace:
[  577.609791]  <TASK>
[  577.609801]  ? find_held_lock+0x32/0x80
[  577.609811]  ? sched_clock_cpu+0xb/0xc0
[  577.609824]  ? lock_release+0x14f/0x460
[  577.609842]  amdgpu_syncobj_lookup_and_add_to_sync+0x24/0xb0 [amdgpu]
[  577.610175]  amdgpu_cs_ioctl+0xcb5/0x20b0 [amdgpu]
[  577.610476]  ? __lock_acquire+0x387/0x1ee0
[  577.610554]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
[  577.610845]  drm_ioctl_kernel+0xa1/0x150
[  577.610865]  drm_ioctl+0x21c/0x410
[  577.610880]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
[  577.611208]  ? lock_release+0x14f/0x460
[  577.611220]  ? _raw_spin_unlock_irqrestore+0x30/0x60
[  577.611232]  ? lockdep_hardirqs_on+0x7d/0x100
[  577.611242]  ? _raw_spin_unlock_irqrestore+0x40/0x60
[  577.611260]  amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
[  577.611548]  __x64_sys_ioctl+0x8d/0xc0
[  577.611564]  do_syscall_64+0x3a/0x80
[  577.611576]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  577.611584] RIP: 0033:0x7fb5dd50b29f
[  577.611614] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24
10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00
00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28
00 00
[  577.611622] RSP: 002b:000000007d2ad3f0 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  577.611632] RAX: ffffffffffffffda RBX: 000000007d2ad6b8 RCX: 00007fb5dd50b29f
[  577.611639] RDX: 000000007d2ad4d0 RSI: 00000000c0186444 RDI: 00000000000000bd
[  577.611645] RBP: 000000007d2ad4d0 R08: 00007fb54c038b60 R09: 000000007d2ad490
[  577.611651] R10: 000000007fe71860 R11: 0000000000000246 R12: 00000000c0186444
[  577.611657] R13: 00000000000000bd R14: 000000007d2ad690 R15: 00007fb54c038c20
[  577.611692]  </TASK>
[  577.611698] irq event stamp: 7565
[  577.611703] hardirqs last  enabled at (7573): [<ffffffff9e17a17e>]
__up_console_sem+0x5e/0x70
[  577.611713] hardirqs last disabled at (7580): [<ffffffff9e17a163>]
__up_console_sem+0x43/0x70
[  577.611722] softirqs last  enabled at (7416): [<ffffffff9e0f2cb1>]
__irq_exit_rcu+0xd1/0x160
[  577.611731] softirqs last disabled at (7409): [<ffffffff9e0f2cb1>]
__irq_exit_rcu+0xd1/0x160
[  577.611739] ---[ end trace 0000000000000000 ]---
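
For context on the first of the two traces above: the DMA-API message is printed when a scatter-gather segment is longer than the maximum segment size the device claims to support (here len=516096 against max=65536). The following is a minimal user-space sketch of that comparison. It assumes plain arrays instead of the kernel's scatterlist, the function name is hypothetical, and it is not the code in kernel/dma/debug.c.

```c
#include <stddef.h>

/*
 * Return the index of the first segment whose length exceeds max_seg,
 * or -1 if every segment fits. Hypothetical helper for illustration.
 */
static long first_oversized_segment(const unsigned int *seg_len,
				    size_t nents, unsigned int max_seg)
{
	size_t i;

	for (i = 0; i < nents; i++) {
		if (seg_len[i] > max_seg)
			return (long)i;
	}
	return -1;
}
```

With the values from the log, a segment of 516096 bytes checked against a limit of 65536 bytes is reported as oversized, while a segment of exactly 65536 bytes is not.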

Unfortunately, git bisect did not give a usable result because of
the large number of failed builds [3].

git blame shows that the code that prints the warning was added by
Christian König.

$ git blame drivers/dma-buf/dma-fence-array.c -L 181,201 e8b767f5e04097a^
Blaming lines:   9% (21/221), done.
0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König 2022-01-19 11:40:21 +0100 181)          * containers or otherwise we run into recursion and potential kernel
0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König 2022-01-19 11:40:21 +0100 182)          * stack overflow on operations on the dma_fence_array.
0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König 2022-01-19 11:40:21 +0100 183)          *
0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König 2022-01-19 11:40:21 +0100 184)          * The correct way of handling this is to flatten out the array by the
0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König 2022-01-19 11:40:21 +0100 185)          * caller instead.
0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König 2022-01-19 11:40:21 +0100 186)          *
0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König 2022-01-19 11:40:21 +0100 187)          * Enforce this here by checking that we don't create a dma_fence_array
0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König 2022-01-19 11:40:21 +0100 188)          * with any container inside.
0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König 2022-01-19 11:40:21 +0100 189)          */
0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König 2022-01-19 11:40:21 +0100 190)         while (num_fences--)
0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König 2022-01-19 11:40:21 +0100 191)                 WARN_ON(dma_fence_is_container(fences[num_fences]));
0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König 2022-01-19 11:40:21 +0100 192)
b3dfbdf261e07 drivers/dma-buf/fence-array.c     (Gustavo Padovan 2016-06-01 15:10:03 +0200 193)         return array;
b3dfbdf261e07 drivers/dma-buf/fence-array.c     (Gustavo Padovan 2016-06-01 15:10:03 +0200 194) }
f54d1867005c3 drivers/dma-buf/dma-fence-array.c (Chris Wilson    2016-10-25 13:00:45 +0100 195) EXPORT_SYMBOL(dma_fence_array_create);
d5b72a2123dfa drivers/dma-buf/dma-fence-array.c (Philipp Zabel   2017-03-17 17:34:49 +0100 196)
d5b72a2123dfa drivers/dma-buf/dma-fence-array.c (Philipp Zabel   2017-03-17 17:34:49 +0100 197) /**
d5b72a2123dfa drivers/dma-buf/dma-fence-array.c (Philipp Zabel   2017-03-17 17:34:49 +0100 198)  * dma_fence_match_context - Check if all fences are from the given context
d5b72a2123dfa drivers/dma-buf/dma-fence-array.c (Philipp Zabel   2017-03-17 17:34:49 +0100 199)  * @fence:              [in]    fence or fence array
d5b72a2123dfa drivers/dma-buf/dma-fence-array.c (Philipp Zabel   2017-03-17 17:34:49 +0100 200)  * @context:            [in]    fence context to check all fences against
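
The blamed comment says that a dma_fence_array must not contain other containers and that the caller should flatten the array instead. The sketch below models that idea in user space. `struct fence`, `flatten()` and `has_container()` are hypothetical stand-ins, not the kernel's `struct dma_fence` API, and the recursion is used only for brevity in this toy model (the kernel comment warns against recursion on the real structures).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a fence; not struct dma_fence. */
struct fence {
	bool is_container;		/* true for an array-like container */
	struct fence **children;	/* valid only when is_container */
	size_t num_children;
};

/*
 * Append every leaf fence reachable from f to out[], starting at index n.
 * Returns the new number of entries. The caller must size out[] for all
 * leaves.
 */
static size_t flatten(struct fence *f, struct fence **out, size_t n)
{
	size_t i;

	if (!f->is_container) {
		out[n++] = f;
		return n;
	}
	for (i = 0; i < f->num_children; i++)
		n = flatten(f->children[i], out, n);
	return n;
}

/* Model of the check at lines 190-191: is any entry itself a container? */
static bool has_container(struct fence *const *fences, size_t num)
{
	while (num--)
		if (fences[num]->is_container)
			return true;
	return false;
}
```

In this model, an array holding a container of two fences plus one plain fence fails the check, while the three-entry array produced by calling `flatten()` on each element passes it.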

Christian, can you shed some light on what's going on here?
Thanks.

[1] https://pastebin.com/tSWvLBus
[2] https://pastebin.com/VqNmYDm2
[3] https://pastebin.com/efHf3UF3

-- 
Best Regards,
Mike Gavrilov.


* Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games stopped working.
  2022-04-03 18:39 [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games stopped working Mikhail Gavrilov
@ 2022-04-04  6:30 ` Christian König
  2022-04-04  8:22   ` Paul Menzel
  2022-04-08 11:01   ` Mikhail Gavrilov
  0 siblings, 2 replies; 15+ messages in thread
From: Christian König @ 2022-04-04  6:30 UTC
  To: Mikhail Gavrilov, amd-gfx list, daniel.vetter, thomas.hellstrom,
	Linux List Kernel Mailing

Hi Mikhail,

Those are two independent and already known problems.

The warning triggered from the sync_file is already fixed in
drm-misc-next-fixes, but so far I couldn't figure out why the games
suddenly don't work any more.

There is a bug report for that, but bisecting the changes hasn't yielded
anything valuable so far.

So if you can come up with something, that would be rather valuable.

Regards,
Christian.

On 03.04.22 at 20:39, Mikhail Gavrilov wrote:
> Hi,
> Something broke between commits ed4643521e6a and 34af78c4e616.
> I noticed that the kernel log is flooded with the warning "WARNING: CPU: 31
> PID: 51848 at drivers/dma-buf/dma-fence-array.c:191
> dma_fence_array_create+0x101/0x120" while certain games are running:
> "Resident Evil Village", "Marvel's Avengers", "The Dark Pictures
> Anthology: House of Ashes".
>
> [16999.958726] ------------[ cut here ]------------
> [16999.958731] WARNING: CPU: 31 PID: 51848 at
> drivers/dma-buf/dma-fence-array.c:191
> dma_fence_array_create+0x101/0x120
> [16999.958738] Modules linked in: xone_gip_chatpad(OE)
> xone_gip_gamepad(OE) xone_gip_common(OE) ff_memless tls uinput rfcomm
> snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_midi_event nft_objref
> nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet
> nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
> nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep
> sunrpc binfmt_misc iwlmvm vfat intel_rapl_msr fat intel_rapl_common
> snd_hda_codec_realtek mac80211 snd_hda_codec_generic ledtrig_audio
> snd_hda_codec_hdmi libarc4 snd_hda_intel edac_mce_amd snd_intel_dspcfg
> snd_usb_audio snd_intel_sdw_acpi btusb kvm_amd snd_hda_codec btrtl
> btbcm iwlwifi btintel snd_hda_core snd_usbmidi_lib uvcvideo snd_hwdep
> kvm iwlmei snd_rawmidi videobuf2_vmalloc xone_dongle(OE)
> videobuf2_memops xone_gip_bus(OE) snd_seq btmtk videobuf2_v4l2
> videobuf2_common snd_seq_device irqbypass bluetooth cfg80211 snd_pcm
> rapl videodev
> [16999.958799]  eeepc_wmi asus_wmi snd_timer sparse_keymap
> platform_profile ecdh_generic video wmi_bmof pcspkr snd k10temp
> i2c_piix4 joydev mc soundcore rfkill mei acpi_cpufreq zram
> hid_logitech_hidpp hid_logitech_dj amdgpu drm_ttm_helper ttm
> crct10dif_pclmul ccp crc32_pclmul ucsi_ccg iommu_v2 crc32c_intel
> typec_ucsi gpu_sched ghash_clmulni_intel sp5100_tco drm_dp_helper
> typec igb nvme nvme_core dca wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua
> ip6_tables ip_tables dm_multipath ipmi_devintf ipmi_msghandler fuse
> [16999.958862] CPU: 31 PID: 51848 Comm: GWT.exe Tainted: G    B   W
> OEL   --------- ---
> 5.18.0-0.rc0.20220401gite8b767f5e04097a.15.fc37.x86_64 #1
> [16999.958865] Hardware name: System manufacturer System Product
> Name/ROG STRIX X570-I GAMING, BIOS 4204 02/24/2022
> [16999.958867] RIP: 0010:dma_fence_array_create+0x101/0x120
> [16999.958871] Code: 45 85 e4 75 10 eb 2a 48 81 fa c0 aa 52 ab 74 1a
> 83 e8 01 72 1c 48 63 d0 48 8b 54 d5 00 48 8b 52 08 48 81 fa 60 aa 52
> ab 75 dd <0f> 0b 83 e8 01 73 e4 48 83 c4 08 48 89 d8 5b 5d 41 5c 41 5d
> 41 5e
> [16999.958874] RSP: 0018:ffffb03c071f7e08 EFLAGS: 00010246
> [16999.958877] RAX: 0000000000000001 RBX: ffff98fdb03c6d00 RCX: 0000000000510e99
> [16999.958879] RDX: ffffffffab52aac0 RSI: ffff98fdb03c6d10 RDI: ffff98fdb03c6d00
> [16999.958880] RBP: ffff98fa31c59e40 R08: 0000000000000001 R09: 0000000000000000
> [16999.958882] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
> [16999.958883] R13: 0000000000000000 R14: ffff98fdb03c6d40 R15: 0000000000000001
> [16999.958885] FS:  000000004789f640(0000) GS:ffff9907ea600000(0000)
> knlGS:0000000029b70000
> [16999.958887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [16999.958888] CR2: 00007ff41eee8000 CR3: 000000002856a000 CR4: 0000000000350ee0
> [16999.958890] Call Trace:
> [16999.958893]  <TASK>
> [16999.958897]  sync_file_ioctl+0x83d/0x9f0
> [16999.958904]  __x64_sys_ioctl+0x8d/0xc0
> [16999.958908]  do_syscall_64+0x3a/0x80
> [16999.958913]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [16999.958917] RIP: 0033:0x7ff5e850b29f
> [16999.958941] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24
> 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00
> 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28
> 00 00
> [16999.958943] RSP: 002b:000000004789d540 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [16999.958946] RAX: ffffffffffffffda RBX: 00007ff5d5637040 RCX: 00007ff5e850b29f
> [16999.958948] RDX: 000000004789d740 RSI: 00000000c0303e03 RDI: 0000000000000260
> [16999.958949] RBP: 0000000000000260 R08: 0000000000000001 R09: 0000000000000000
> [16999.958951] R10: 0000000000000000 R11: 0000000000000246 R12: 000000004789d740
> [16999.958953] R13: 0000000000000000 R14: 00000000c0303e03 R15: 0000000000000000
> [16999.958958]  </TASK>
> [16999.958959] irq event stamp: 0
> [16999.958961] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> [16999.958964] hardirqs last disabled at (0): [<ffffffffaa0e88c1>]
> copy_process+0x9f1/0x1e20
> [16999.958968] softirqs last  enabled at (0): [<ffffffffaa0e88c1>]
> copy_process+0x9f1/0x1e20
> [16999.958971] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [16999.958974] ---[ end trace 0000000000000000 ]---
>
>
> The games "Forza Horizon 5", "Forza Horizon 4", "Cyberpunk 2077", and
> "Ghostwire: Tokyo" stopped working. When these games crashed, I again
> saw the same warning as above [2]. The only differences are the thread
> name and the addresses.
>
> [  643.442353] ------------[ cut here ]------------
> [  643.442358] WARNING: CPU: 24 PID: 7824 at
> drivers/dma-buf/dma-fence-array.c:191
> dma_fence_array_create+0x101/0x120
> [  643.442364] Modules linked in: uinput rfcomm snd_seq_dummy
> snd_hrtimer snd_seq_midi snd_seq_midi_event nft_objref
> nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet
> nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
> nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep
> sunrpc binfmt_misc iwlmvm snd_hda_codec_realtek mac80211
> snd_hda_codec_generic vfat fat ledtrig_audio snd_hda_codec_hdmi
> intel_rapl_msr snd_hda_intel intel_rapl_common snd_intel_dspcfg
> libarc4 snd_intel_sdw_acpi snd_hda_codec edac_mce_amd snd_usb_audio
> iwlwifi snd_hda_core btusb uvcvideo snd_usbmidi_lib btrtl snd_hwdep
> snd_rawmidi btbcm videobuf2_vmalloc xone_dongle(OE) kvm_amd
> videobuf2_memops xone_gip_bus(OE) iwlmei videobuf2_v4l2 snd_seq
> btintel kvm eeepc_wmi btmtk asus_wmi snd_seq_device sparse_keymap
> videobuf2_common irqbypass platform_profile rapl bluetooth snd_pcm
> cfg80211 video pcspkr wmi_bmof k10temp i2c_piix4
> [  643.442406]  videodev snd_timer snd ecdh_generic joydev mc
> soundcore rfkill mei acpi_cpufreq scsi_dh_rdac scsi_dh_emc
> scsi_dh_alua dm_multipath zram hid_logitech_hidpp hid_logitech_dj
> amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel drm_ttm_helper ttm
> iommu_v2 ucsi_ccg ccp ghash_clmulni_intel gpu_sched typec_ucsi
> sp5100_tco typec drm_dp_helper igb nvme nvme_core dca wmi ip6_tables
> ip_tables ipmi_devintf ipmi_msghandler fuse
> [  643.442427] CPU: 24 PID: 7824 Comm: GameThread Tainted: G    B   W
> OEL   --------- ---
> 5.18.0-0.rc0.20220325git34af78c4e616.7.fc37.x86_64 #1
> [  643.442430] Hardware name: System manufacturer System Product
> Name/ROG STRIX X570-I GAMING, BIOS 4204 02/24/2022
> [  643.442432] RIP: 0010:dma_fence_array_create+0x101/0x120
> [  643.442434] Code: 45 85 e4 75 10 eb 2a 48 81 fa c0 6a 52 a8 74 1a
> 83 e8 01 72 1c 48 63 d0 48 8b 54 d5 00 48 8b 52 08 48 81 fa 60 6a 52
> a8 75 dd <0f> 0b 83 e8 01 73 e4 48 83 c4 08 48 89 d8 5b 5d 41 5c 41 5d
> 41 5e
> [  643.442436] RSP: 0018:ffffb0c783ea7e08 EFLAGS: 00010246
> [  643.442437] RAX: 0000000000000001 RBX: ffffa0fe03e4d800 RCX: 0000000000003b48
> [  643.442439] RDX: ffffffffa8526ac0 RSI: ffffa0fe03e4d810 RDI: ffffa0fe03e4d800
> [  643.442440] RBP: ffffa0fb81c33e00 R08: 0000000000000001 R09: 0000000000000000
> [  643.442441] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
> [  643.442442] R13: 0000000000000000 R14: ffffa0fe03e4d840 R15: 0000000000000001
> [  643.442443] FS:  000000007b59f640(0000) GS:ffffa10a68a00000(0000)
> knlGS:000000007a4f0000
> [  643.442445] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  643.442446] CR2: 00007f632016f000 CR3: 00000003787f8000 CR4: 0000000000350ee0
> [  643.442448] Call Trace:
> [  643.442449]  <TASK>
> [  643.442453]  sync_file_ioctl+0x83d/0x9f0
> [  643.442457]  __x64_sys_ioctl+0x8d/0xc0
> [  643.442461]  do_syscall_64+0x3a/0x80
> [  643.442464]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  643.442466] RIP: 0033:0x7f6377f0b29f
> [  643.442484] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24
> 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00
> 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28
> 00 00
> [  643.442486] RSP: 002b:000000007b59d540 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [  643.442488] RAX: ffffffffffffffda RBX: 000000007f068600 RCX: 00007f6377f0b29f
> [  643.442489] RDX: 000000007b59d740 RSI: 00000000c0303e03 RDI: 000000000000011c
> [  643.442490] RBP: 000000000000011c R08: 0000000000000001 R09: 0000000000000000
> [  643.442491] R10: 0000000000000000 R11: 0000000000000246 R12: 000000007b59d740
> [  643.442492] R13: 0000000000000000 R14: 00000000c0303e03 R15: 0000000000000000
> [  643.442495]  </TASK>
> [  643.442496] irq event stamp: 0
> [  643.442497] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> [  643.442500] hardirqs last disabled at (0): [<ffffffffa70e8a5e>]
> copy_process+0x9fe/0x1ed0
> [  643.442503] softirqs last  enabled at (0): [<ffffffffa70e8a5e>]
> copy_process+0x9fe/0x1ed0
> [  643.442505] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [  643.442507] ---[ end trace 0000000000000000 ]---
>
> Before 5.18 git34af78c4e616 I also saw a warning, but it was a
> different one [1]: "WARNING: CPU: 29 PID: 6282 at
> kernel/dma/debug.c:1162 debug_dma_map_sg+0x329/0x380". It did not
> affect how the listed games worked.
>
>
> [  572.507688] ------------[ cut here ]------------
> [  572.507754] DMA-API: amdgpu 0000:0b:00.0: mapping sg segment longer
> than device claims to support [len=516096] [max=65536]
> [  572.507761] WARNING: CPU: 29 PID: 6282 at kernel/dma/debug.c:1162
> debug_dma_map_sg+0x329/0x380
> [  572.507768] Modules linked in: uinput rfcomm snd_seq_dummy
> snd_hrtimer snd_seq_midi snd_seq_midi_event nft_objref
> nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet
> nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
> nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep
> sunrpc binfmt_misc snd_hda_codec_realtek snd_hda_codec_generic iwlmvm
> intel_rapl_msr ledtrig_audio snd_hda_codec_hdmi intel_rapl_common vfat
> fat snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec
> mac80211 edac_mce_amd snd_hda_core snd_usb_audio snd_usbmidi_lib
> snd_hwdep snd_rawmidi btusb kvm_amd btrtl snd_seq btbcm libarc4
> snd_seq_device btintel snd_pcm kvm iwlwifi uvcvideo xone_dongle(OE)
> btmtk videobuf2_vmalloc xone_gip_bus(OE) videobuf2_memops eeepc_wmi
> videobuf2_v4l2 asus_wmi iwlmei bluetooth sparse_keymap irqbypass
> videobuf2_common snd_timer platform_profile rapl video pcspkr wmi_bmof
> videodev k10temp
> [  572.507848]  i2c_piix4 snd cfg80211 joydev ecdh_generic mc
> soundcore rfkill mei acpi_cpufreq scsi_dh_rdac scsi_dh_emc
> scsi_dh_alua dm_multipath zram hid_logitech_hidpp hid_logitech_dj
> amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel ucsi_ccg
> drm_ttm_helper ghash_clmulni_intel ttm sp5100_tco igb ccp typec_ucsi
> nvme iommu_v2 typec gpu_sched nvme_core dca wmi ip6_tables ip_tables
> ipmi_devintf ipmi_msghandler fuse
> [  572.507889] CPU: 29 PID: 6282 Comm: GameThread Tainted: G        W
> OEL   --------- ---
> 5.18.0-0.rc0.20220324gited4643521e6a.6.fc37.x86_64 #1
> [  572.507893] Hardware name: System manufacturer System Product
> Name/ROG STRIX X570-I GAMING, BIOS 4204 02/24/2022
> [  572.507895] RIP: 0010:debug_dma_map_sg+0x329/0x380
> [  572.507899] Code: 5c 24 10 8b 4c 24 18 48 8b 54 24 20 48 89 c6 44
> 8b 44 24 2c 48 c7 c7 90 40 84 9f 4c 89 5c 24 10 4c 89 4c 24 08 e8 57
> d6 c9 00 <0f> 0b 4c 8b 5c 24 10 4c 8b 4c 24 08 8b 15 75 4d 31 02 85 d2
> 0f 85
> [  572.507902] RSP: 0018:ffffb748d2917b50 EFLAGS: 00010282
> [  572.507906] RAX: 000000000000006e RBX: ffff9e1ad45540d0 RCX: 0000000000000000
> [  572.507908] RDX: 0000000000000001 RSI: ffffffff9f8a4b50 RDI: 00000000ffffffff
> [  572.507910] RBP: ffff9e1bfb936ea0 R08: 0000000000000000 R09: 00000000fff7ffff
> [  572.507913] R10: ffffb748d2917980 R11: ffff9e29ee2fffe8 R12: 0000000000000001
> [  572.507915] R13: 0000000000000004 R14: 0000000000000002 R15: ffff9e1ad22fe900
> [  572.507917] FS:  00007fb5dd637fc0(0000) GS:ffff9e29a9e00000(0000)
> knlGS:0000000067fe0000
> [  572.507919] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  572.507922] CR2: 00007fb5db8af000 CR3: 00000002b492e000 CR4: 0000000000350ee0
> [  572.507924] Call Trace:
> [  572.507926]  <TASK>
> [  572.507934]  __dma_map_sg_attrs+0xb8/0xf0
> [  572.507939]  dma_map_sgtable+0x19/0x30
> [  572.507943]  amdgpu_bo_move+0x57c/0x6f0 [amdgpu]
> [  572.508064]  ? amdgpu_ttm_tt_populate+0x74/0x90 [amdgpu]
> [  572.508177]  ttm_bo_handle_move_mem+0x8c/0x190 [ttm]
> [  572.508186]  ttm_bo_validate+0xd7/0x150 [ttm]
> [  572.508191]  ? ww_mutex_lock+0x38/0xa0
> [  572.508197]  amdgpu_gem_userptr_ioctl+0x178/0x290 [amdgpu]
> [  572.508296]  ? amdgpu_bo_vm_destroy+0x80/0x80 [amdgpu]
> [  572.508399]  ? amdgpu_gem_create_ioctl+0x330/0x330 [amdgpu]
> [  572.508494]  drm_ioctl_kernel+0xa1/0x150
> [  572.508503]  drm_ioctl+0x21c/0x410
> [  572.508508]  ? amdgpu_gem_create_ioctl+0x330/0x330 [amdgpu]
> [  572.508605]  ? lock_release+0x14f/0x460
> [  572.508611]  ? _raw_spin_unlock_irqrestore+0x30/0x60
> [  572.508615]  ? lockdep_hardirqs_on+0x7d/0x100
> [  572.508619]  ? _raw_spin_unlock_irqrestore+0x40/0x60
> [  572.508624]  amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
> [  572.508719]  __x64_sys_ioctl+0x8d/0xc0
> [  572.508725]  do_syscall_64+0x3a/0x80
> [  572.508730]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  572.508733] RIP: 0033:0x7fb5dd50b29f
> [  572.508754] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24
> 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00
> 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28
> 00 00
> [  572.508756] RSP: 002b:000000000027f680 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [  572.508760] RAX: ffffffffffffffda RBX: 000000007f09bd10 RCX: 00007fb5dd50b29f
> [  572.508762] RDX: 000000000027f730 RSI: 00000000c0186451 RDI: 00000000000000bd
> [  572.508764] RBP: 000000000027f730 R08: 00007fb5dd5f7b00 R09: 0000000000000070
> [  572.508766] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c0186451
> [  572.508769] R13: 00000000000000bd R14: 000000000000000e R15: 000000007dcc0000
> [  572.508780]  </TASK>
> [  572.508782] irq event stamp: 4594699
> [  572.508784] hardirqs last  enabled at (4594707):
> [<ffffffff9e17a17e>] __up_console_sem+0x5e/0x70
> [  572.508788] hardirqs last disabled at (4594714):
> [<ffffffff9e17a163>] __up_console_sem+0x43/0x70
> [  572.508791] softirqs last  enabled at (4594478):
> [<ffffffff9e0f2cb1>] __irq_exit_rcu+0xd1/0x160
> [  572.508795] softirqs last disabled at (4594473):
> [<ffffffff9e0f2cb1>] __irq_exit_rcu+0xd1/0x160
> [  572.508798] ---[ end trace 0000000000000000 ]---
> [  577.607889] ------------[ cut here ]------------
> [  577.608030] WARNING: CPU: 27 PID: 6485 at
> drivers/gpu/drm/drm_syncobj.c:400 drm_syncobj_find_fence+0x224/0x2c0
> [  577.608045] Modules linked in: uinput rfcomm snd_seq_dummy
> snd_hrtimer snd_seq_midi snd_seq_midi_event nft_objref
> nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet
> nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
> nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep
> sunrpc binfmt_misc snd_hda_codec_realtek snd_hda_codec_generic iwlmvm
> intel_rapl_msr ledtrig_audio snd_hda_codec_hdmi intel_rapl_common vfat
> fat snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec
> mac80211 edac_mce_amd snd_hda_core snd_usb_audio snd_usbmidi_lib
> snd_hwdep snd_rawmidi btusb kvm_amd btrtl snd_seq btbcm libarc4
> snd_seq_device btintel snd_pcm kvm iwlwifi uvcvideo xone_dongle(OE)
> btmtk videobuf2_vmalloc xone_gip_bus(OE) videobuf2_memops eeepc_wmi
> videobuf2_v4l2 asus_wmi iwlmei bluetooth sparse_keymap irqbypass
> videobuf2_common snd_timer platform_profile rapl video pcspkr wmi_bmof
> videodev k10temp
> [  577.609566]  i2c_piix4 snd cfg80211 joydev ecdh_generic mc
> soundcore rfkill mei acpi_cpufreq scsi_dh_rdac scsi_dh_emc
> scsi_dh_alua dm_multipath zram hid_logitech_hidpp hid_logitech_dj
> amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel ucsi_ccg
> drm_ttm_helper ghash_clmulni_intel ttm sp5100_tco igb ccp typec_ucsi
> nvme iommu_v2 typec gpu_sched nvme_core dca wmi ip6_tables ip_tables
> ipmi_devintf ipmi_msghandler fuse
> [  577.609688] CPU: 27 PID: 6485 Comm: GameThread Tainted: G        W
> OEL   --------- ---
> 5.18.0-0.rc0.20220324gited4643521e6a.6.fc37.x86_64 #1
> [  577.609697] Hardware name: System manufacturer System Product
> Name/ROG STRIX X570-I GAMING, BIOS 4204 02/24/2022
> [  577.609704] RIP: 0010:drm_syncobj_find_fence+0x224/0x2c0
> [  577.609714] Code: 70 90 9f e8 1e 1c 76 ff e8 29 0f 50 00 8b 15 cf
> d6 af 01 85 d2 74 15 65 48 8b 04 25 80 1e 02 00 8b 80 78 0e 00 00 85
> c0 74 02 <0f> 0b 4c 89 e7 e8 62 e9 ff ff 49 89 45 00 48 85 c0 0f 85 2e
> fe ff
> [  577.609722] RSP: 0018:ffffb748d2d8fac0 EFLAGS: 00010202
> [  577.609731] RAX: 0000000000000001 RBX: 0000000000000002 RCX: ffff9e1c51140000
> [  577.609738] RDX: 0000000000000001 RSI: ffffffff9f81a22d RDI: ffffffff9f8bb1ce
> [  577.609744] RBP: ffffb748d2d8fb40 R08: 0000000000000002 R09: 0000000024924a83
> [  577.609751] R10: 0000000000000001 R11: 0000000000000000 R12: ffff9e1c74a3bf80
> [  577.609757] R13: ffffb748d2d8fb50 R14: 0000000000000011 R15: 0000000000001388
> [  577.609765] FS:  000000007d2af640(0000) GS:ffff9e29a9600000(0000)
> knlGS:000000007a4b0000
> [  577.609772] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  577.609778] CR2: 00007fb5bb69b000 CR3: 00000002b492e000 CR4: 0000000000350ee0
> [  577.609786] Call Trace:
> [  577.609791]  <TASK>
> [  577.609801]  ? find_held_lock+0x32/0x80
> [  577.609811]  ? sched_clock_cpu+0xb/0xc0
> [  577.609824]  ? lock_release+0x14f/0x460
> [  577.609842]  amdgpu_syncobj_lookup_and_add_to_sync+0x24/0xb0 [amdgpu]
> [  577.610175]  amdgpu_cs_ioctl+0xcb5/0x20b0 [amdgpu]
> [  577.610476]  ? __lock_acquire+0x387/0x1ee0
> [  577.610554]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
> [  577.610845]  drm_ioctl_kernel+0xa1/0x150
> [  577.610865]  drm_ioctl+0x21c/0x410
> [  577.610880]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
> [  577.611208]  ? lock_release+0x14f/0x460
> [  577.611220]  ? _raw_spin_unlock_irqrestore+0x30/0x60
> [  577.611232]  ? lockdep_hardirqs_on+0x7d/0x100
> [  577.611242]  ? _raw_spin_unlock_irqrestore+0x40/0x60
> [  577.611260]  amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
> [  577.611548]  __x64_sys_ioctl+0x8d/0xc0
> [  577.611564]  do_syscall_64+0x3a/0x80
> [  577.611576]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  577.611584] RIP: 0033:0x7fb5dd50b29f
> [  577.611614] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24
> 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00
> 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28
> 00 00
> [  577.611622] RSP: 002b:000000007d2ad3f0 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [  577.611632] RAX: ffffffffffffffda RBX: 000000007d2ad6b8 RCX: 00007fb5dd50b29f
> [  577.611639] RDX: 000000007d2ad4d0 RSI: 00000000c0186444 RDI: 00000000000000bd
> [  577.611645] RBP: 000000007d2ad4d0 R08: 00007fb54c038b60 R09: 000000007d2ad490
> [  577.611651] R10: 000000007fe71860 R11: 0000000000000246 R12: 00000000c0186444
> [  577.611657] R13: 00000000000000bd R14: 000000007d2ad690 R15: 00007fb54c038c20
> [  577.611692]  </TASK>
> [  577.611698] irq event stamp: 7565
> [  577.611703] hardirqs last  enabled at (7573): [<ffffffff9e17a17e>]
> __up_console_sem+0x5e/0x70
> [  577.611713] hardirqs last disabled at (7580): [<ffffffff9e17a163>]
> __up_console_sem+0x43/0x70
> [  577.611722] softirqs last  enabled at (7416): [<ffffffff9e0f2cb1>]
> __irq_exit_rcu+0xd1/0x160
> [  577.611731] softirqs last disabled at (7409): [<ffffffff9e0f2cb1>]
> __irq_exit_rcu+0xd1/0x160
> [  577.611739] ---[ end trace 0000000000000000 ]---
>
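A note for reading the three traces above: on x86-64 the second ioctl() argument is passed in RSI, so the user-space register dumps carry the ioctl request numbers (0xc0303e03, 0xc0186451 and 0xc0186444). The following is a minimal sketch for splitting such a value into its fields, assuming the generic Linux _IOC() bit layout; the names ioc_fields and ioc_decode are made up for this illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a Linux ioctl request number, following the generic _IOC()
 * encoding that x86 uses (assumption: no architecture-specific layout). */
struct ioc_fields {
    unsigned int dir;   /* bits 30-31: 1 = write, 2 = read, 3 = read and write */
    unsigned int size;  /* bits 16-29: size of the argument structure in bytes */
    unsigned int type;  /* bits 8-15: the driver's "magic" character */
    unsigned int nr;    /* bits 0-7: command number within that driver */
};

/* Split the value printed for RSI into its four fields. */
struct ioc_fields ioc_decode(uint64_t rsi)
{
    struct ioc_fields f;

    /* Request numbers are 32 bits wide, so the upper half must be zero. */
    assert((rsi >> 32) == 0);

    f.dir  = (unsigned int)((rsi >> 30) & 0x3);
    f.size = (unsigned int)((rsi >> 16) & 0x3fff);
    f.type = (unsigned int)((rsi >> 8) & 0xff);
    f.nr   = (unsigned int)(rsi & 0xff);
    return f;
}
```

Applied to the values in the traces, 0xc0303e03 decodes to type '>' (0x3e), command 3 and a 48-byte argument, which appears to match SYNC_IOC_MERGE from the sync_file UAPI. The other two decode to type 'd' (0x64) with commands 0x51 and 0x44, that is DRM driver commands 0x11 and 0x04 once DRM_COMMAND_BASE (0x40) is subtracted, which is consistent with amdgpu_gem_userptr_ioctl and amdgpu_cs_ioctl in the respective call traces.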
> Unfortunately, git bisect did not provide the expected result due to
> the large number of failed builds [3].
>
> git blame says that the code that prints the warnings was added by
> Christian König.
>
> $ git blame drivers/dma-buf/dma-fence-array.c -L 181,201 e8b767f5e04097a^
> Blaming lines:   9% (21/221), done.
> 0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König
> 2022-01-19 11:40:21 +0100 181)          * containers or otherwise we
> run into recursion and potential kernel
> 0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König
> 2022-01-19 11:40:21 +0100 182)          * stack overflow on operations
> on the dma_fence_array.
> 0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König
> 2022-01-19 11:40:21 +0100 183)          *
> 0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König
> 2022-01-19 11:40:21 +0100 184)          * The correct way of handling
> this is to flatten out the array by the
> 0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König
> 2022-01-19 11:40:21 +0100 185)          * caller instead.
> 0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König
> 2022-01-19 11:40:21 +0100 186)          *
> 0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König
> 2022-01-19 11:40:21 +0100 187)          * Enforce this here by
> checking that we don't create a dma_fence_array
> 0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König
> 2022-01-19 11:40:21 +0100 188)          * with any container inside.
> 0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König
> 2022-01-19 11:40:21 +0100 189)          */
> 0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König
> 2022-01-19 11:40:21 +0100 190)         while (num_fences--)
> 0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König
> 2022-01-19 11:40:21 +0100 191)
> WARN_ON(dma_fence_is_container(fences[num_fences]));
> 0fd9803b985e5 drivers/dma-buf/dma-fence-array.c (Christian König
> 2022-01-19 11:40:21 +0100 192)
> b3dfbdf261e07 drivers/dma-buf/fence-array.c     (Gustavo Padovan
> 2016-06-01 15:10:03 +0200 193)         return array;
> b3dfbdf261e07 drivers/dma-buf/fence-array.c     (Gustavo Padovan
> 2016-06-01 15:10:03 +0200 194) }
> f54d1867005c3 drivers/dma-buf/dma-fence-array.c (Chris Wilson
> 2016-10-25 13:00:45 +0100 195) EXPORT_SYMBOL(dma_fence_array_create);
> d5b72a2123dfa drivers/dma-buf/dma-fence-array.c (Philipp Zabel
> 2017-03-17 17:34:49 +0100 196)
> d5b72a2123dfa drivers/dma-buf/dma-fence-array.c (Philipp Zabel
> 2017-03-17 17:34:49 +0100 197) /**
> d5b72a2123dfa drivers/dma-buf/dma-fence-array.c (Philipp Zabel
> 2017-03-17 17:34:49 +0100 198)  * dma_fence_match_context - Check if
> all fences are from the given context
> d5b72a2123dfa drivers/dma-buf/dma-fence-array.c (Philipp Zabel
> 2017-03-17 17:34:49 +0100 199)  * @fence:              [in]    fence
> or fence array
> d5b72a2123dfa drivers/dma-buf/dma-fence-array.c (Philipp Zabel
> 2017-03-17 17:34:49 +0100 200)  * @context:            [in]    fence
> context to check all fences against
>
> Christian, can you shed some light on what's going on here?
> Thanks.
>
> [1] https://pastebin.com/tSWvLBus
> [2] https://pastebin.com/VqNmYDm2
> [3] https://pastebin.com/efHf3UF3
>
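The comment shown in the git blame output above says that a dma_fence_array must not contain other containers and that the caller should flatten its inputs instead. The following is a user-space sketch of that idea only, not the kernel implementation: struct fence, fences_contain_container and fence_flatten are hypothetical stand-ins invented for this example. It relies on the same invariant the WARN_ON enforces: if every existing array is already flat, merging only has to unpack one level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dma_fence. */
struct fence {
    bool is_container;        /* models dma_fence_is_container() */
    struct fence **children;  /* only meaningful for containers */
    size_t num_children;
};

/* Mirrors the check at dma-fence-array.c:191: is any entry a container? */
bool fences_contain_container(struct fence *const *fences, size_t n)
{
    while (n--)
        if (fences[n]->is_container)
            return true;
    return false;
}

/*
 * Copy the leaf fences of "in" into "out", unpacking containers by one
 * level. Returns the number of fences written, or SIZE_MAX if "out" has
 * fewer than the required number of slots.
 */
size_t fence_flatten(struct fence *const *in, size_t n_in,
                     struct fence **out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < n_in; i++) {
        struct fence *f = in[i];

        if (!f->is_container) {
            if (n == cap)
                return SIZE_MAX;
            out[n++] = f;
            continue;
        }
        for (size_t j = 0; j < f->num_children; j++) {
            /* Invariant: existing containers are already flat. */
            assert(!f->children[j]->is_container);
            if (n == cap)
                return SIZE_MAX;
            out[n++] = f->children[j];
        }
    }
    return n;
}
```

Merging an array of two fences with a third plain fence this way yields three plain fences, so a check like fences_contain_container() no longer fires on the result.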



* Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games stopped working.
  2022-04-04  6:30 ` Christian König
@ 2022-04-04  8:22   ` Paul Menzel
  2022-04-04  8:38     ` Christian König
  2022-04-08 11:01   ` Mikhail Gavrilov
  1 sibling, 1 reply; 15+ messages in thread
From: Paul Menzel @ 2022-04-04  8:22 UTC (permalink / raw)
  To: Christian König
  Cc: Mikhail Gavrilov, amd-gfx, daniel.vetter, thomas.hellstrom, LKML

Dear Christian,


Am 04.04.22 um 08:30 schrieb Christian König:

> those are two independent and already known problems.
> 
> The warning triggered from the sync_file is already fixed in 
> drm-misc-next-fixes, but so far I couldn't figure out why the games 
> suddenly don't work any more.
> 
> There is a bug report for that, but bisecting the changes didn't yield 
> anything valuable so far.
> 
> So if you can come up with something that would be rather valuable.

It’d be great, if you (or somebody else) could provide the URL to that 
issue.


Kind regards,

Paul


* Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games stopped working.
  2022-04-04  8:22   ` Paul Menzel
@ 2022-04-04  8:38     ` Christian König
  0 siblings, 0 replies; 15+ messages in thread
From: Christian König @ 2022-04-04  8:38 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Mikhail Gavrilov, amd-gfx, daniel.vetter, thomas.hellstrom, LKML

Am 04.04.22 um 10:22 schrieb Paul Menzel:
> Dear Christian,
>
>
> Am 04.04.22 um 08:30 schrieb Christian König:
>
>> those are two independent and already known problems.
>>
>> The warning triggered from the sync_file is already fixed in 
>> drm-misc-next-fixes, but so far I couldn't figure out why the games 
>> suddenly don't work any more.
>>
>> There is a bug report for that, but bisecting the changes didn't 
>> yield anything valuable so far.
>>
>> So if you can come up with something that would be rather valuable.
>
> It’d be great, if you (or somebody else) could provide the URL to that 
> issue.

I don't have the URL at hand either; just search the mailing list.

Regards,
Christian.

>
>
> Kind regards,
>
> Paul



* Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games stopped working.
  2022-04-04  6:30 ` Christian König
  2022-04-04  8:22   ` Paul Menzel
@ 2022-04-08 11:01   ` Mikhail Gavrilov
  2022-04-08 11:13     ` Christian König
  1 sibling, 1 reply; 15+ messages in thread
From: Mikhail Gavrilov @ 2022-04-08 11:01 UTC (permalink / raw)
  To: Christian König, Ken.Xue, Deucher, Alexander
  Cc: amd-gfx list, Daniel Vetter, thomas.hellstrom, Linux List Kernel Mailing

Hi Christian

> those are two independent and already known problems.
>
> The warning triggered from the sync_file is already fixed in
> drm-misc-next-fixes, but so far I couldn't figure out why the games
> suddenly don't work any more.

I thought these warnings were related to the listed games getting stuck.

> There is a bug report for that, but bisecting the changes didn't yield
> anything valuable so far.
>
> So if you can come up with something that would be rather valuable.

I found out how to fix my build problems; they were all related to gcc12.
I then ran git bisect again and found the commit that causes the
games "Forza Horizon 5", "Forza Horizon 4" and "Cyberpunk 2077" to get stuck.
At least the Radeon 6900 XT, Radeon 6800M and Radeon VII are affected.

$ git bisect log
git bisect start
# good: [ed4643521e6af8ab8ed1e467630a85884d2696cf] Merge tag
'arm-dt-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect good ed4643521e6af8ab8ed1e467630a85884d2696cf
# bad: [34af78c4e616c359ed428d79fe4758a35d2c5473] Merge tag
'iommu-updates-v5.18' of
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
git bisect bad 34af78c4e616c359ed428d79fe4758a35d2c5473
# good: [4a0cb83ba6e0cd73a50fa4f84736846bf0029f2b] netdevice: add
missing dm_private kdoc
git bisect good 4a0cb83ba6e0cd73a50fa4f84736846bf0029f2b
# skip: [2ab82efeeed885c0210a0029df93bb95a316e8c7] Merge tag
'drm-intel-gt-next-2022-03-03' of
git://anongit.freedesktop.org/drm/drm-intel into drm-next
git bisect skip 2ab82efeeed885c0210a0029df93bb95a316e8c7
# good: [00598b056aa6d46c7a6819efa850ec9d0d690d76] scsi: smartpqi:
Expose SAS address for SATA drives
git bisect good 00598b056aa6d46c7a6819efa850ec9d0d690d76
# good: [00598b056aa6d46c7a6819efa850ec9d0d690d76] scsi: smartpqi:
Expose SAS address for SATA drives
git bisect good 00598b056aa6d46c7a6819efa850ec9d0d690d76
# skip: [c674c5b9342e5cb0f3d9e9bcaf37dbe2087845e5] drm/i915/xehp: CCS
should use RCS setup functions
git bisect skip c674c5b9342e5cb0f3d9e9bcaf37dbe2087845e5
# good: [f0d4ce59f4d48622044933054a0e0cefa91ba15e] drm/i915: Disable
DRRS on IVB/HSW port != A
git bisect good f0d4ce59f4d48622044933054a0e0cefa91ba15e
# skip: [6de7e4f02640fba2ffa6ac04e2be13785d614175] Merge tag
'drm-msm-next-2022-03-01' of https://gitlab.freedesktop.org/drm/msm
into drm-next
git bisect skip 6de7e4f02640fba2ffa6ac04e2be13785d614175
# bad: [868f4357ed0d1e2f96bbd67d4ac862aa6335effe] drm/amd/display: Add
DMUB support for DCN316
git bisect bad 868f4357ed0d1e2f96bbd67d4ac862aa6335effe
# good: [39da460fd4c0f8e7290dcc9cbfc9375de9d0eeca] drm/amd/display:
Fix DP LT sequence on EQ fail
git bisect good 39da460fd4c0f8e7290dcc9cbfc9375de9d0eeca
# good: [3f268ef06f8cf3c481dbd5843d564f5170c6df54] drm/ttm: add back a
reference to the bdev to the res manager
git bisect good 3f268ef06f8cf3c481dbd5843d564f5170c6df54
# bad: [123db17ddff007080d464e785689fb14f94cbc7a] Merge tag
'amd-drm-next-5.18-2022-02-11-1' of
https://gitlab.freedesktop.org/agd5f/linux into drm-next
git bisect bad 123db17ddff007080d464e785689fb14f94cbc7a
# bad: [24992ab0b8b0d2521caa9c3dcbed0e2a56cbe3d0] drm/amdkfd: Fix
prototype warning for get_process_num_bos
git bisect bad 24992ab0b8b0d2521caa9c3dcbed0e2a56cbe3d0
# good: [1cbbc8d4f788af4c260ef3cae05902ef7b191197] drm/radeon/uvd: Fix
forgotten unmap buffer objects
git bisect good 1cbbc8d4f788af4c260ef3cae05902ef7b191197
# good: [69f915cc97c4bb82b34105a47abf613f7c87215d] drm/amdgpu: loose
check for umc poison mode
git bisect good 69f915cc97c4bb82b34105a47abf613f7c87215d
# good: [8bbd4d83a68beaf54ae01b2e2aa2024ff1dfc0ba] drm/amdgpu: Reset
OOB table error count info
git bisect good 8bbd4d83a68beaf54ae01b2e2aa2024ff1dfc0ba
# bad: [1915a433954262ac7466469d1a4684ac54218af4] drm/amdgpu: adjust
register address calculation
git bisect bad 1915a433954262ac7466469d1a4684ac54218af4
# bad: [461fa7b0ac565ef25c1da0ced31005dd437883a7] drm/amdgpu: remove ctx->lock
git bisect bad 461fa7b0ac565ef25c1da0ced31005dd437883a7
# first bad commit: [461fa7b0ac565ef25c1da0ced31005dd437883a7]
drm/amdgpu: remove ctx->lock

461fa7b0ac565ef25c1da0ced31005dd437883a7 is the first bad commit
commit 461fa7b0ac565ef25c1da0ced31005dd437883a7
Author: Ken Xue <Ken.Xue@amd.com>
Date:   Fri Feb 11 16:18:46 2022 -0500

    drm/amdgpu: remove ctx->lock

    KMD reports a warning on holding a lock from drm_syncobj_find_fence,
    when running amdgpu_test case “syncobj timeline test”.

    ctx->lock was designed to prevent concurrent "amdgpu_ctx_wait_prev_fence"
    calls and avoid dead reservation lock from GPU reset. since no reservation
    lock is held in latest GPU reset any more, ctx->lock can be simply removed
    and concurrent "amdgpu_ctx_wait_prev_fence" call also can be prevented by
    PD root bo reservation lock.

    call stacks:
    =================
    //hold lock
    amdgpu_cs_ioctl->amdgpu_cs_parser_init->mutex_lock(&parser->ctx->lock);
    …
    //report warning
    amdgpu_cs_dependencies->amdgpu_cs_process_syncobj_timeline_in_dep \
    ->amdgpu_syncobj_lookup_and_add_to_sync -> drm_syncobj_find_fence \
    -> lockdep_assert_none_held_once
    …
    amdgpu_cs_ioctl->amdgpu_cs_parser_fini->mutex_unlock(&parser->ctx->lock);

    Signed-off-by: Ken Xue <Ken.Xue@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 16 +++++++++++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c |  2 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h |  1 -
 3 files changed, 11 insertions(+), 8 deletions(-)

After reverting commits 57230f0ce6eda6d47a2029b7b3a39cc5bb63fe32 and
461fa7b0ac565ef25c1da0ced31005dd437883a7, the games "Forza Horizon 5",
"Forza Horizon 4" and "Cyberpunk 2077" started working again.
Reverting commit 57230f0ce6eda6d47a2029b7b3a39cc5bb63fe32 isn't really
needed; I did it only because I didn't want to resolve conflicts.
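
For readers unfamiliar with the warning that the bisected commit was meant to silence, here is a small user-space model of the call stack in its commit message. It is a sketch under loud assumptions: task_model and the model_* and submit_* functions are hypothetical names invented here, a plain counter stands in for lockdep's tracking of held locks, and every violation is counted rather than reported once as lockdep_assert_none_held_once does. It says nothing about why the games get stuck.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for lockdep's per-task bookkeeping. */
struct task_model {
    int held_locks;  /* number of locks the task currently holds */
    int warnings;    /* how often the "no locks held" check failed */
};

void model_lock(struct task_model *t)
{
    t->held_locks++;
}

void model_unlock(struct task_model *t)
{
    assert(t->held_locks > 0);  /* unlocking without a lock is a bug */
    t->held_locks--;
}

/* Models the check in drm_syncobj_find_fence: no lock may be held here. */
bool model_assert_none_held(struct task_model *t)
{
    if (t->held_locks != 0) {
        t->warnings++;
        return false;
    }
    return true;
}

/* Old path: ctx->lock is held from parser_init to parser_fini. */
int submit_with_ctx_lock(struct task_model *t)
{
    model_lock(t);              /* amdgpu_cs_parser_init */
    model_assert_none_held(t);  /* drm_syncobj_find_fence */
    model_unlock(t);            /* amdgpu_cs_parser_fini */
    return t->warnings;
}

/* Path after "remove ctx->lock": the lookup runs with no lock held. */
int submit_without_ctx_lock(struct task_model *t)
{
    model_assert_none_held(t);  /* drm_syncobj_find_fence */
    return t->warnings;
}
```

In this model the first path records one warning per submission and the second records none, which is all the commit message claims.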


-- 
Best Regards,
Mike Gavrilov.


* Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games stopped working.
  2022-04-08 11:01   ` Mikhail Gavrilov
@ 2022-04-08 11:13     ` Christian König
  2022-04-08 12:24       ` Mikhail Gavrilov
  0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2022-04-08 11:13 UTC (permalink / raw)
  To: Mikhail Gavrilov, Ken.Xue, Deucher, Alexander
  Cc: amd-gfx list, Daniel Vetter, thomas.hellstrom, Linux List Kernel Mailing

Am 08.04.22 um 13:01 schrieb Mikhail Gavrilov:
> Hi Christian
>
>> those are two independent and already known problems.
>>
>> The warning triggered from the sync_file is already fixed in
>> drm-misc-next-fixes, but so far I couldn't figure out why the games
>> suddenly don't work any more.
> I thought these warnings were related to the listed games getting stuck.
>
>> There is a bug report for that, but bisecting the changes didn't yield
>> anything valuable so far.
>>
>> So if you can come up with something that would be rather valuable.
> I found out how to fix my build problems; they were all related to gcc12.
> I then ran git bisect again and found the commit that causes the
> games "Forza Horizon 5", "Forza Horizon 4" and "Cyberpunk 2077" to get stuck.
> At least the Radeon 6900 XT, Radeon 6800M and Radeon VII are affected.

I owe you a beer.

I still don't know what happens here, but that makes at least a bit more 
sense than a patch which only changes comments :)

Looks like we are missing something here. Can I send you a patch to try 
something later today?

Thanks,
Christian.

>
> $ git bisect log
> git bisect start
> # good: [ed4643521e6af8ab8ed1e467630a85884d2696cf] Merge tag
> 'arm-dt-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
> git bisect good ed4643521e6af8ab8ed1e467630a85884d2696cf
> # bad: [34af78c4e616c359ed428d79fe4758a35d2c5473] Merge tag
> 'iommu-updates-v5.18' of
> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
> git bisect bad 34af78c4e616c359ed428d79fe4758a35d2c5473
> # good: [4a0cb83ba6e0cd73a50fa4f84736846bf0029f2b] netdevice: add
> missing dm_private kdoc
> git bisect good 4a0cb83ba6e0cd73a50fa4f84736846bf0029f2b
> # skip: [2ab82efeeed885c0210a0029df93bb95a316e8c7] Merge tag
> 'drm-intel-gt-next-2022-03-03' of
> git://anongit.freedesktop.org/drm/drm-intel into drm-next
> git bisect skip 2ab82efeeed885c0210a0029df93bb95a316e8c7
> # good: [00598b056aa6d46c7a6819efa850ec9d0d690d76] scsi: smartpqi:
> Expose SAS address for SATA drives
> git bisect good 00598b056aa6d46c7a6819efa850ec9d0d690d76
> # good: [00598b056aa6d46c7a6819efa850ec9d0d690d76] scsi: smartpqi:
> Expose SAS address for SATA drives
> git bisect good 00598b056aa6d46c7a6819efa850ec9d0d690d76
> # skip: [c674c5b9342e5cb0f3d9e9bcaf37dbe2087845e5] drm/i915/xehp: CCS
> should use RCS setup functions
> git bisect skip c674c5b9342e5cb0f3d9e9bcaf37dbe2087845e5
> # good: [f0d4ce59f4d48622044933054a0e0cefa91ba15e] drm/i915: Disable
> DRRS on IVB/HSW port != A
> git bisect good f0d4ce59f4d48622044933054a0e0cefa91ba15e
> # skip: [6de7e4f02640fba2ffa6ac04e2be13785d614175] Merge tag
> 'drm-msm-next-2022-03-01' of https://gitlab.freedesktop.org/drm/msm
> into drm-next
> git bisect skip 6de7e4f02640fba2ffa6ac04e2be13785d614175
> # bad: [868f4357ed0d1e2f96bbd67d4ac862aa6335effe] drm/amd/display: Add
> DMUB support for DCN316
> git bisect bad 868f4357ed0d1e2f96bbd67d4ac862aa6335effe
> # good: [39da460fd4c0f8e7290dcc9cbfc9375de9d0eeca] drm/amd/display:
> Fix DP LT sequence on EQ fail
> git bisect good 39da460fd4c0f8e7290dcc9cbfc9375de9d0eeca
> # good: [3f268ef06f8cf3c481dbd5843d564f5170c6df54] drm/ttm: add back a
> reference to the bdev to the res manager
> git bisect good 3f268ef06f8cf3c481dbd5843d564f5170c6df54
> # bad: [123db17ddff007080d464e785689fb14f94cbc7a] Merge tag
> 'amd-drm-next-5.18-2022-02-11-1' of
> https://gitlab.freedesktop.org/agd5f/linux into drm-next
> git bisect bad 123db17ddff007080d464e785689fb14f94cbc7a
> # bad: [24992ab0b8b0d2521caa9c3dcbed0e2a56cbe3d0] drm/amdkfd: Fix
> prototype warning for get_process_num_bos
> git bisect bad 24992ab0b8b0d2521caa9c3dcbed0e2a56cbe3d0
> # good: [1cbbc8d4f788af4c260ef3cae05902ef7b191197] drm/radeon/uvd: Fix
> forgotten unmap buffer objects
> git bisect good 1cbbc8d4f788af4c260ef3cae05902ef7b191197
> # good: [69f915cc97c4bb82b34105a47abf613f7c87215d] drm/amdgpu: loose
> check for umc poison mode
> git bisect good 69f915cc97c4bb82b34105a47abf613f7c87215d
> # good: [8bbd4d83a68beaf54ae01b2e2aa2024ff1dfc0ba] drm/amdgpu: Reset
> OOB table error count info
> git bisect good 8bbd4d83a68beaf54ae01b2e2aa2024ff1dfc0ba
> # bad: [1915a433954262ac7466469d1a4684ac54218af4] drm/amdgpu: adjust
> register address calculation
> git bisect bad 1915a433954262ac7466469d1a4684ac54218af4
> # bad: [461fa7b0ac565ef25c1da0ced31005dd437883a7] drm/amdgpu: remove ctx->lock
> git bisect bad 461fa7b0ac565ef25c1da0ced31005dd437883a7
> # first bad commit: [461fa7b0ac565ef25c1da0ced31005dd437883a7]
> drm/amdgpu: remove ctx->lock
>
> 461fa7b0ac565ef25c1da0ced31005dd437883a7 is the first bad commit
> commit 461fa7b0ac565ef25c1da0ced31005dd437883a7
> Author: Ken Xue <Ken.Xue@amd.com>
> Date:   Fri Feb 11 16:18:46 2022 -0500
>
>      drm/amdgpu: remove ctx->lock
>
>      KMD reports a warning on holding a lock from drm_syncobj_find_fence,
>      when running amdgpu_test case “syncobj timeline test”.
>
>      ctx->lock was designed to prevent concurrent "amdgpu_ctx_wait_prev_fence"
>      calls and avoid dead reservation lock from GPU reset. since no reservation
>      lock is held in latest GPU reset any more, ctx->lock can be simply removed
>      and concurrent "amdgpu_ctx_wait_prev_fence" call also can be prevented by
>      PD root bo reservation lock.
>
>      call stacks:
>      =================
>      //hold lock
>      amdgpu_cs_ioctl->amdgpu_cs_parser_init->mutex_lock(&parser->ctx->lock);
>      …
>      //report warning
>      amdgpu_cs_dependencies->amdgpu_cs_process_syncobj_timeline_in_dep \
>      ->amdgpu_syncobj_lookup_and_add_to_sync -> drm_syncobj_find_fence \
>      -> lockdep_assert_none_held_once
>      …
>      amdgpu_cs_ioctl->amdgpu_cs_parser_fini->mutex_unlock(&parser->ctx->lock);
>
>      Signed-off-by: Ken Xue <Ken.Xue@amd.com>
>      Reviewed-by: Christian König <christian.koenig@amd.com>
>      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 16 +++++++++++-----
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c |  2 --
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h |  1 -
>   3 files changed, 11 insertions(+), 8 deletions(-)
>
> After reverting commits 57230f0ce6eda6d47a2029b7b3a39cc5bb63fe32,
> 461fa7b0ac565ef25c1da0ced31005dd437883a7 the games "Forza Horizon 5",
> "Forza Horizon 4", "Cyberpunk 2077" start working again.
> Reverting commit 57230f0ce6eda6d47a2029b7b3a39cc5bb63fe32 isn't strictly
> needed; I reverted it only because I didn't want to resolve conflicts.
>
>



* Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games stopped working.
  2022-04-08 11:13     ` Christian König
@ 2022-04-08 12:24       ` Mikhail Gavrilov
  2022-04-08 14:27         ` Christian König
  0 siblings, 1 reply; 15+ messages in thread
From: Mikhail Gavrilov @ 2022-04-08 12:24 UTC (permalink / raw)
  To: Christian König
  Cc: Ken.Xue, Deucher, Alexander, amd-gfx list, Daniel Vetter,
	thomas.hellstrom, Linux List Kernel Mailing

On Fri, 8 Apr 2022 at 16:13, Christian König <christian.koenig@amd.com> wrote:

> I owe you a beer.
>
> I still don't know what happens here, but that makes at least a bit more
> sense than a patch which only changes comments :)
>
> Looks like we are missing something here. Can I send you a patch to try
> something later today?

Yes, please feel free to send me a patch for testing.

-- 
Best Regards,
Mike Gavrilov.


* Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games stopped working.
  2022-04-08 12:24       ` Mikhail Gavrilov
@ 2022-04-08 14:27         ` Christian König
  2022-04-08 17:25           ` Mikhail Gavrilov
  0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2022-04-08 14:27 UTC (permalink / raw)
  To: Mikhail Gavrilov
  Cc: Ken.Xue, Deucher, Alexander, amd-gfx list, Daniel Vetter,
	thomas.hellstrom, Linux List Kernel Mailing

[-- Attachment #1: Type: text/plain, Size: 1208 bytes --]

Am 08.04.22 um 14:24 schrieb Mikhail Gavrilov:
> On Fri, 8 Apr 2022 at 16:13, Christian König <christian.koenig@amd.com> wrote:
>
>> I owe you a beer.
>>
>> I still don't know what happens here, but that makes at least a bit more
>> sense than a patch which only changes comments :)
>>
>> Looks like we are missing something here. Can I send you a patch to try
>> something later today?
> Yes, please feel free to send me a patch for testing.
>

Please test the attached patch; it just re-introduces the lock without
doing much else.

And does your branch contain the following patch:

commit d18b8eadd83e3d8d63a45f9479478640dbcfca02
Author: Christian König <christian.koenig@amd.com>
Date:   Wed Feb 23 14:35:31 2022 +0100

     drm/amdgpu: install ctx entities with cmpxchg

     Since we removed the context lock we need to make sure that not two threads
     are trying to install an entity at the same time.

     Signed-off-by: Christian König <christian.koenig@amd.com>
     Fixes: 461fa7b0ac565e ("drm/amdgpu: remove ctx->lock")
     Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
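
Editorial note: the commit quoted above describes installing an entity with a compare-and-exchange so that two threads cannot both install one once the context lock is gone. A minimal userspace sketch of that idea follows, using C11 atomics. It is not the driver's code: `struct entity`, `struct slot` and `slot_get_entity()` are hypothetical names chosen for illustration, and the kernel's own cmpxchg() primitive is replaced here by atomic_compare_exchange_strong().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-context entity; not the amdgpu type. */
struct entity {
	int id;
};

/* One slot that several threads may try to fill concurrently. */
struct slot {
	_Atomic(struct entity *) entity;
};

/*
 * Return the entity installed in the slot, creating it on first use.
 * Exactly one caller wins the compare-and-exchange; a loser frees its
 * own allocation and uses the winner's entity instead.
 */
static struct entity *slot_get_entity(struct slot *s, int id)
{
	struct entity *cur = atomic_load(&s->entity);
	struct entity *fresh;

	if (cur)
		return cur;

	fresh = malloc(sizeof(*fresh));
	if (!fresh)
		return NULL;
	fresh->id = id;

	cur = NULL;
	if (atomic_compare_exchange_strong(&s->entity, &cur, fresh))
		return fresh;	/* we installed it */

	free(fresh);		/* somebody else was faster */
	return cur;		/* 'cur' now holds the winner's pointer */
}
```

With a slot initialised to NULL, a first call such as slot_get_entity(&s, 1) installs a new entity, and every later call returns that same pointer whatever id it passes. No lock is needed because the slot changes from NULL to non-NULL exactly once.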

Thanks,
Christian.

[-- Attachment #2: 0001-drm-amdgpu-partial-revert-remove-ctx-lock.patch --]
[-- Type: text/x-patch, Size: 2754 bytes --]

From e2e39cb1a4a1c7c0e3ff2e4e0188394b0eda0ba6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= <christian.koenig@amd.com>
Date: Fri, 8 Apr 2022 16:22:55 +0200
Subject: [PATCH] drm/amdgpu: partial revert "remove ctx->lock"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This reverts commit 461fa7b0ac565ef25c1da0ced31005dd437883a7.

We are missing some inter dependencies here.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 4 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h | 1 +
 3 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 8de283997769..5471b93f6808 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -128,6 +128,8 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, union drm_amdgpu_cs
 		goto free_chunk;
 	}
 
+	mutex_lock(&p->ctx->lock);
+
 	/* skip guilty context job */
 	if (atomic_read(&p->ctx->guilty) == 1) {
 		ret = -ECANCELED;
@@ -688,6 +690,7 @@ static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser *parser, int error,
 	dma_fence_put(parser->fence);
 
 	if (parser->ctx) {
+		mutex_unlock(&parser->ctx->lock);
 		amdgpu_ctx_put(parser->ctx);
 	}
 	if (parser->bo_list)
@@ -1332,6 +1335,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 		goto out;
 
 	r = amdgpu_cs_submit(&parser, cs);
+
 out:
 	amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 5981c7d9bd48..8f0e6d93bb9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -237,6 +237,7 @@ static int amdgpu_ctx_init(struct amdgpu_device *adev,
 
 	kref_init(&ctx->refcount);
 	spin_lock_init(&ctx->ring_lock);
+	mutex_init(&ctx->lock);
 
 	ctx->reset_counter = atomic_read(&adev->gpu_reset_counter);
 	ctx->reset_counter_query = ctx->reset_counter;
@@ -357,6 +358,7 @@ static void amdgpu_ctx_fini(struct kref *ref)
 		drm_dev_exit(idx);
 	}
 
+	mutex_destroy(&ctx->lock);
 	kfree(ctx);
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
index d0cbfcea90f7..142f2f87d44c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
@@ -49,6 +49,7 @@ struct amdgpu_ctx {
 	bool				preamble_presented;
 	int32_t				init_priority;
 	int32_t				override_priority;
+	struct mutex			lock;
 	atomic_t			guilty;
 	unsigned long			ras_counter_ce;
 	unsigned long			ras_counter_ue;
-- 
2.25.1



* Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games stopped working.
  2022-04-08 14:27         ` Christian König
@ 2022-04-08 17:25           ` Mikhail Gavrilov
  2022-04-09 14:27             ` Christian König
  0 siblings, 1 reply; 15+ messages in thread
From: Mikhail Gavrilov @ 2022-04-08 17:25 UTC (permalink / raw)
  To: Christian König
  Cc: Ken.Xue, Deucher, Alexander, amd-gfx list, Daniel Vetter,
	thomas.hellstrom, Linux List Kernel Mailing

On Fri, 8 Apr 2022 at 19:27, Christian König <christian.koenig@amd.com> wrote:
>
> Please test the attached patch; it just re-introduces the lock without
> doing much else.
>
> And does your branch contain the following patch:
>
> commit d18b8eadd83e3d8d63a45f9479478640dbcfca02
> Author: Christian König <christian.koenig@amd.com>
> Date:   Wed Feb 23 14:35:31 2022 +0100
>
>      drm/amdgpu: install ctx entities with cmpxchg
>
>      Since we removed the context lock we need to make sure that not two threads
>      are trying to install an entity at the same time.
>
>      Signed-off-by: Christian König <christian.koenig@amd.com>
>      Fixes: 461fa7b0ac565e ("drm/amdgpu: remove ctx->lock")
>      Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

All the listed games now work with the attached patch.
The flood of "WARNING: CPU: 31 PID: 51848 at
drivers/dma-buf/dma-fence-array.c:191
dma_fence_array_create+0x101/0x120" messages is also gone.

Thanks.

Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>

-- 
Best Regards,
Mike Gavrilov.


* Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games stopped working.
  2022-04-08 17:25           ` Mikhail Gavrilov
@ 2022-04-09 14:27             ` Christian König
  2022-04-15  5:38               ` Mikhail Gavrilov
  0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2022-04-09 14:27 UTC (permalink / raw)
  To: Mikhail Gavrilov, Christian König
  Cc: thomas.hellstrom, Daniel Vetter, Linux List Kernel Mailing,
	amd-gfx list, Deucher, Alexander, Ken.Xue

Am 08.04.22 um 19:25 schrieb Mikhail Gavrilov:
> On Fri, 8 Apr 2022 at 19:27, Christian König <christian.koenig@amd.com> wrote:
>> Please test the attached patch; it just re-introduces the lock without
>> doing much else.
>>
>> And does your branch contain the following patch:
>>
>> commit d18b8eadd83e3d8d63a45f9479478640dbcfca02
>> Author: Christian König <christian.koenig@amd.com>
>> Date:   Wed Feb 23 14:35:31 2022 +0100
>>
>>       drm/amdgpu: install ctx entities with cmpxchg
>>
>>       Since we removed the context lock we need to make sure that not two threads
>>       are trying to install an entity at the same time.
>>
>>       Signed-off-by: Christian König <christian.koenig@amd.com>
>>       Fixes: 461fa7b0ac565e ("drm/amdgpu: remove ctx->lock")
>>       Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>       Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> All the listed games now work with the attached patch.
> The flood of "WARNING: CPU: 31 PID: 51848 at
> drivers/dma-buf/dma-fence-array.c:191
> dma_fence_array_create+0x101/0x120" messages is also gone.

That's unfortunately not the end of the story.

This fixes your problem, but it reintroduces the original problem: we
call into the syncobj code with a lock held, which can crash badly as well.
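
Editorial note: the call stack quoted earlier in the thread ends in lockdep_assert_none_held_once, which is reached from drm_syncobj_find_fence. The toy model below sketches what a "no locks held" rule means for a caller. It is only an illustration under stated assumptions: `toy_lock`, `held_locks`, `none_held()` and `lookup_that_may_wait()` are hypothetical names, the lock is a plain flag with no real blocking, and the idea that the lookup may have to wait is an assumption of the sketch, not something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of toy locks the current thread holds (per-thread counter). */
static _Thread_local int held_locks;

struct toy_lock {
	bool locked;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->locked);	/* toy model: no contention, no recursion */
	l->locked = true;
	held_locks++;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->locked);
	l->locked = false;
	held_locks--;
}

/* True when the calling thread holds no toy lock at all. */
static bool none_held(void)
{
	return held_locks == 0;
}

/*
 * Hypothetical lookup that may have to wait for another thread.
 * It refuses to run while the caller holds any lock, which is the
 * kind of rule a "none held" assertion enforces.
 */
static int lookup_that_may_wait(void)
{
	if (!none_held())
		return -1;	/* caller violated the "no locks held" rule */
	return 0;
}
```

In this model, taking a context lock before calling lookup_that_may_wait() makes the call fail, and releasing it first makes the call succeed. That is the conflict described here: the re-introduced ctx->lock is held across the syncobj lookup.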

Going to take a closer look on Monday. I hope you can test a few more 
patches to help narrow down what's actually going wrong here.

Thanks,
Christian.

>
> Thanks.
>
> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
>



* Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games stopped working.
  2022-04-09 14:27             ` Christian König
@ 2022-04-15  5:38               ` Mikhail Gavrilov
  2022-04-15  8:04                 ` Christian König
  0 siblings, 1 reply; 15+ messages in thread
From: Mikhail Gavrilov @ 2022-04-15  5:38 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, thomas.hellstrom, Daniel Vetter,
	Linux List Kernel Mailing, amd-gfx list, Deucher, Alexander,
	Ken.Xue

On Sat, Apr 9, 2022 at 7:27 PM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> That's unfortunately not the end of the story.
>
> This fixes your problem, but it reintroduces the original problem: we
> call into the syncobj code with a lock held, which can crash badly as well.
>
> Going to take a closer look on Monday. I hope you can test a few more
> patches to help narrow down what's actually going wrong here.
>
> Thanks,
> Christian.
>

Hi Christian.
I'm sorry to trouble you.
Have you forgotten about this issue?

-- 
Best Regards,
Mike Gavrilov.


* Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games stopped working.
  2022-04-15  5:38               ` Mikhail Gavrilov
@ 2022-04-15  8:04                 ` Christian König
  2022-05-11  9:05                   ` Mikhail Gavrilov
  0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2022-04-15  8:04 UTC (permalink / raw)
  To: Mikhail Gavrilov
  Cc: Christian König, thomas.hellstrom, Daniel Vetter,
	Linux List Kernel Mailing, amd-gfx list, Deucher, Alexander,
	Ken.Xue

Am 15.04.22 um 07:38 schrieb Mikhail Gavrilov:
> On Sat, Apr 9, 2022 at 7:27 PM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> That's unfortunately not the end of the story.
>>
>> This fixes your problem, but it reintroduces the original problem: we
>> call into the syncobj code with a lock held, which can crash badly as well.
>>
>> Going to take a closer look on Monday. I hope you can test a few more
>> patches to help narrow down what's actually going wrong here.
>>
>> Thanks,
>> Christian.
>>
> Hi Christian.
> I'm sorry to trouble you.
> Have you forgotten about this issue?
>

No, I just couldn't find time during all that bug fixing :)

Sorry for the delay, going to take a look after the Easter holiday here.

Christian.


* Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games stopped working.
  2022-04-15  8:04                 ` Christian König
@ 2022-05-11  9:05                   ` Mikhail Gavrilov
  2022-05-11 12:01                     ` Christian König
  0 siblings, 1 reply; 15+ messages in thread
From: Mikhail Gavrilov @ 2022-05-11  9:05 UTC (permalink / raw)
  Cc: Linux List Kernel Mailing, amd-gfx list

On Fri, Apr 15, 2022 at 1:04 PM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> No, I just couldn't find time during all that bug fixing :)
>
> Sorry for the delay, going to take a look after the Easter holiday here.
>
> Christian.

This message is just for the record: the issue was fixed between
b253435746d9a4a and 5.18-rc4.

-- 
Best Regards,
Mike Gavrilov.


* Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games stopped working.
  2022-05-11  9:05                   ` Mikhail Gavrilov
@ 2022-05-11 12:01                     ` Christian König
  2022-10-17 22:43                       ` Mikhail Gavrilov
  0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2022-05-11 12:01 UTC (permalink / raw)
  To: Mikhail Gavrilov; +Cc: Linux List Kernel Mailing, amd-gfx list

Am 11.05.22 um 11:05 schrieb Mikhail Gavrilov:
> On Fri, Apr 15, 2022 at 1:04 PM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> No, I just couldn't find time during all that bug fixing :)
>>
>> Sorry for the delay, going to take a look after the Easter holiday here.
>>
>> Christian.
> This message is just for the record: the issue was fixed between
> b253435746d9a4a and 5.18-rc4.

We have implemented a workaround, but still don't know the exact root cause.

If anybody wants to look into this it would be rather helpful to be able 
to reproduce the issue.

Regards,
Christian.


* Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games stopped working.
  2022-05-11 12:01                     ` Christian König
@ 2022-10-17 22:43                       ` Mikhail Gavrilov
  0 siblings, 0 replies; 15+ messages in thread
From: Mikhail Gavrilov @ 2022-10-17 22:43 UTC (permalink / raw)
  To: Christian König, Deucher, Alexander
  Cc: Linux List Kernel Mailing, amd-gfx list, dri-devel

On Wed, May 11, 2022 at 5:01 PM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
>
> We have implemented a workaround, but still don't know the exact root cause.
>
> If anybody wants to look into this it would be rather helpful to be able
> to reproduce the issue.
>
> Regards,
> Christian.

I see that the issue returned after this commit:
dd80d9c8eecac8c516da5b240d01a35660ba6cb6 is the first bad commit
commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6
Author: Christian König <christian.koenig@amd.com>
Date:   Thu Jul 14 10:23:38 2022 +0200

    drm/amdgpu: revert "partial revert "remove ctx->lock" v2"

    This reverts commit 94f4c4965e5513ba624488f4b601d6b385635aec.

    We found that the bo_list is missing a protection for its list entries.
    Since that is fixed now this workaround can be removed again.

    Signed-off-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 21 ++++++---------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c |  2 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h |  1 -
 3 files changed, 6 insertions(+), 18 deletions(-)

The games Forza Horizon 4 and Cyberpunk 2077 hang at start again.


-- 
Best Regards,
Mike Gavrilov.


end of thread, other threads:[~2022-10-17 22:44 UTC | newest]

Thread overview: 15+ messages
2022-04-03 18:39 [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games stopped working Mikhail Gavrilov
2022-04-04  6:30 ` Christian König
2022-04-04  8:22   ` Paul Menzel
2022-04-04  8:38     ` Christian König
2022-04-08 11:01   ` Mikhail Gavrilov
2022-04-08 11:13     ` Christian König
2022-04-08 12:24       ` Mikhail Gavrilov
2022-04-08 14:27         ` Christian König
2022-04-08 17:25           ` Mikhail Gavrilov
2022-04-09 14:27             ` Christian König
2022-04-15  5:38               ` Mikhail Gavrilov
2022-04-15  8:04                 ` Christian König
2022-05-11  9:05                   ` Mikhail Gavrilov
2022-05-11 12:01                     ` Christian König
2022-10-17 22:43                       ` Mikhail Gavrilov
