dri-devel.lists.freedesktop.org archive mirror
* [BUG][5.20] refcount_t: underflow; use-after-free
@ 2022-08-14 21:11 Mikhail Gavrilov
  2022-08-15  0:20 ` Maíra Canal
  0 siblings, 1 reply; 13+ messages in thread
From: Mikhail Gavrilov @ 2022-08-14 21:11 UTC (permalink / raw)
  To: dri-devel, amd-gfx list, Christian König, Linux List Kernel Mailing

Hi folks.
I started testing 5.20 today (7ebfc85e2cd7).
I am seeing frequent GPU freezes, after each of which the following message
appears in the kernel logs:
[ 220.280990] ------------[ cut here ]------------
[ 220.281000] refcount_t: underflow; use-after-free.
[ 220.281019] WARNING: CPU: 1 PID: 3746 at lib/refcount.c:28
refcount_warn_saturate+0xba/0x110
[ 220.281029] Modules linked in: uinput rfcomm snd_seq_dummy
snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
qrtr bnep sunrpc snd_seq_midi snd_seq_midi_event vfat intel_rapl_msr
fat intel_rapl_common snd_hda_codec_realtek mt76x2u
snd_hda_codec_generic snd_hda_codec_hdmi mt76x2_common iwlmvm
mt76x02_usb edac_mce_amd mt76_usb snd_hda_intel snd_intel_dspcfg
mt76x02_lib snd_intel_sdw_acpi snd_usb_audio snd_hda_codec mt76
kvm_amd uvcvideo mac80211 snd_hda_core btusb eeepc_wmi snd_usbmidi_lib
videobuf2_vmalloc videobuf2_memops kvm btrtl snd_rawmidi asus_wmi
snd_hwdep videobuf2_v4l2 btbcm iwlwifi ledtrig_audio libarc4 btintel
snd_seq videobuf2_common sparse_keymap btmtk irqbypass videodev
snd_seq_device joydev xpad iwlmei platform_profile bluetooth
ff_memless snd_pcm mc rapl
[ 220.281185] video snd_timer cfg80211 wmi_bmof snd pcspkr soundcore
k10temp i2c_piix4 rfkill mei asus_ec_sensors acpi_cpufreq zram
hid_logitech_hidpp amdgpu igb dca drm_ttm_helper ttm crct10dif_pclmul
iommu_v2 crc32_pclmul gpu_sched crc32c_intel ucsi_ccg drm_buddy nvme
typec_ucsi ghash_clmulni_intel drm_display_helper ccp nvme_core typec
sp5100_tco cec wmi ip6_tables ip_tables fuse
[ 220.281258] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1
[ 220.281388] pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
fjes():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[ 220.281415] CPU: 1 PID: 3746 Comm: chrome:cs0 Tainted: G W L -------
--- 5.20.0-0.rc0.20220812git7ebfc85e2cd7.10.fc38.x86_64 #1
[ 220.281421] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
[ 220.281426] RIP: 0010:refcount_warn_saturate+0xba/0x110
[ 220.281431] Code: 01 01 e8 79 4a 6f 00 0f 0b e9 42 47 a5 00 80 3d de
7e be 01 00 75 85 48 c7 c7 f8 98 8e 98 c6 05 ce 7e be 01 01 e8 56 4a
6f 00 <0f> 0b e9 1f 47 a5 00 80 3d b9 7e be 01 00 0f 85 5e ff ff ff 48
c7
[ 220.281437] RSP: 0018:ffffb4b0d18d7a80 EFLAGS: 00010282
[ 220.281443] RAX: 0000000000000026 RBX: 0000000000000003 RCX: 0000000000000000
[ 220.281448] RDX: 0000000000000001 RSI: ffffffff988d06dc RDI: 00000000ffffffff
[ 220.281452] RBP: 00000000ffffffff R08: 0000000000000000 R09: ffffb4b0d18d7930
[ 220.281457] R10: 0000000000000003 R11: ffffa0672e2fffe8 R12: ffffa058ca360400
[ 220.281461] R13: ffffa05846c50a18 R14: 00000000fffffe00 R15: 0000000000000003
[ 220.281465] FS: 00007f82683e06c0(0000) GS:ffffa066e2e00000(0000)
knlGS:0000000000000000
[ 220.281470] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 220.281475] CR2: 00003590005cc000 CR3: 00000001fca46000 CR4: 0000000000350ee0
[ 220.281480] Call Trace:
[ 220.281485] <TASK>
[ 220.281490] amdgpu_cs_ioctl+0x4e2/0x2070 [amdgpu]
[ 220.281806] ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
[ 220.282028] drm_ioctl_kernel+0xa4/0x150
[ 220.282043] drm_ioctl+0x21f/0x420
[ 220.282053] ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
[ 220.282275] ? lock_release+0x14f/0x460
[ 220.282282] ? _raw_spin_unlock_irqrestore+0x30/0x60
[ 220.282290] ? _raw_spin_unlock_irqrestore+0x30/0x60
[ 220.282297] ? lockdep_hardirqs_on+0x7d/0x100
[ 220.282305] ? _raw_spin_unlock_irqrestore+0x40/0x60
[ 220.282317] amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
[ 220.282534] __x64_sys_ioctl+0x90/0xd0
[ 220.282545] do_syscall_64+0x5b/0x80
[ 220.282551] ? futex_wake+0x6c/0x150
[ 220.282568] ? lock_is_held_type+0xe8/0x140
[ 220.282580] ? do_syscall_64+0x67/0x80
[ 220.282585] ? lockdep_hardirqs_on+0x7d/0x100
[ 220.282592] ? do_syscall_64+0x67/0x80
[ 220.282597] ? do_syscall_64+0x67/0x80
[ 220.282602] ? lockdep_hardirqs_on+0x7d/0x100
[ 220.282609] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 220.282616] RIP: 0033:0x7f8282a4f8bf
[ 220.282639] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10
00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00
0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00
00
[ 220.282644] RSP: 002b:00007f82683df410 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 220.282651] RAX: ffffffffffffffda RBX: 00007f82683df588 RCX: 00007f8282a4f8bf
[ 220.282655] RDX: 00007f82683df4d0 RSI: 00000000c0186444 RDI: 0000000000000018
[ 220.282659] RBP: 00007f82683df4d0 R08: 00007f82683df5e0 R09: 00007f82683df4b0
[ 220.282663] R10: 00001d04000a0600 R11: 0000000000000246 R12: 00000000c0186444
[ 220.282667] R13: 0000000000000018 R14: 00007f82683df588 R15: 0000000000000003
[ 220.282689] </TASK>
[ 220.282693] irq event stamp: 6232311
[ 220.282697] hardirqs last enabled at (6232319): [<ffffffff9718cd7e>]
__up_console_sem+0x5e/0x70
[ 220.282704] hardirqs last disabled at (6232326):
[<ffffffff9718cd63>] __up_console_sem+0x43/0x70
[ 220.282709] softirqs last enabled at (6232072): [<ffffffff970ff669>]
__irq_exit_rcu+0xf9/0x170
[ 220.282716] softirqs last disabled at (6232061):
[<ffffffff970ff669>] __irq_exit_rcu+0xf9/0x170
[ 220.282722] ---[ end trace 0000000000000000 ]---


Full kernel log is here:
https://pastebin.com/gn01DVxE

My GPU is an AMD Radeon 6900XT.

-- 
Best Regards,
Mike Gavrilov.


* Re: [BUG][5.20] refcount_t: underflow; use-after-free
  2022-08-14 21:11 [BUG][5.20] refcount_t: underflow; use-after-free Mikhail Gavrilov
@ 2022-08-15  0:20 ` Maíra Canal
  2022-08-15 10:37   ` Mikhail Gavrilov
  2022-08-15 10:55   ` Melissa Wen
  0 siblings, 2 replies; 13+ messages in thread
From: Maíra Canal @ 2022-08-15  0:20 UTC (permalink / raw)
  To: Mikhail Gavrilov, dri-devel, amd-gfx list, Christian König,
	Linux List Kernel Mailing

Hi Mikhail

It looks like this use-after-free problem was introduced by commit
90af0ca047f3049c4b46e902f432ad6ef1e2ded6. Looking at that patch, it seems
that if amdgpu_cs_vm_handling returns r != 0, bo_list_mutex is unlocked
inside amdgpu_cs_vm_handling and then unlocked again in
amdgpu_cs_parser_fini.
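
To make the double unlock concrete, here is a minimal userspace sketch of the
pattern. It is not the amdgpu code: toy_mutex, toy_vm_handling and toy_submit
are hypothetical names, and the "mutex" is only a counter so that the
unbalanced unlock can be observed instead of causing undefined behaviour.

```c
#include <assert.h>

/* Toy stand-in for a mutex: it only tracks whether it is held and
 * counts unlock calls made while it was not held. Hypothetical. */
struct toy_mutex {
	int held;        /* 1 while locked, 0 when free */
	int bad_unlocks; /* unlocks issued while the lock was not held */
};

static void toy_lock(struct toy_mutex *m)
{
	m->held = 1;
}

static void toy_unlock(struct toy_mutex *m)
{
	if (!m->held)
		m->bad_unlocks++; /* unbalanced unlock */
	m->held = 0;
}

/* Same shape as the helper described above: it drops the lock on its
 * own error path before returning. */
static int toy_vm_handling(struct toy_mutex *m, int fail)
{
	if (fail) {
		toy_unlock(m);
		return -1;
	}
	return 0;
}

/* The caller takes the lock, runs the helper, and its cleanup always
 * unlocks. Returns the number of unbalanced unlocks seen. */
int toy_submit(int fail)
{
	struct toy_mutex m = { 0, 0 };

	toy_lock(&m);
	(void)toy_vm_handling(&m, fail);
	toy_unlock(&m); /* cleanup path: second unlock if the helper failed */
	return m.bad_unlocks;
}

/* Small self-check that can be called from a main(). */
void toy_demo(void)
{
	assert(toy_submit(0) == 0); /* success: one lock, one unlock */
	assert(toy_submit(1) == 1); /* error: helper and cleanup both unlock */
}
```

In this toy model the error path ends with one unlock too many, which is the
imbalance I suspect in the real code.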

Maybe the following patch will help:

---
From 71d718c0f53a334bb59bcd5dabd29bbe92c724af Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ma=C3=ADra=20Canal?= <mairacanal@riseup.net>
Date: Sun, 14 Aug 2022 21:12:24 -0300
Subject: [PATCH] drm/amdgpu: Fix use-after-free on amdgpu_bo_list mutex
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fixes: 90af0ca047f3 ("drm/amdgpu: Protect the amdgpu_bo_list list with a
mutex v2")
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Maíra Canal <mairacanal@riseup.net>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d8f1335bc68f..a7fce7b14321 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -837,17 +837,14 @@ static int amdgpu_cs_vm_handling(struct
amdgpu_cs_parser *p)
 			continue;

 		r = amdgpu_vm_bo_update(adev, bo_va, false);
-		if (r) {
-			mutex_unlock(&p->bo_list->bo_list_mutex);
+		if (r)
 			return r;
-		}

 		r = amdgpu_sync_fence(&p->job->sync, bo_va->last_pt_update);
-		if (r) {
-			mutex_unlock(&p->bo_list->bo_list_mutex);
+		if (r)
 			return r;
-		}
 	}
+	mutex_unlock(&p->bo_list->bo_list_mutex);

 	r = amdgpu_vm_handle_moved(adev, vm);
 	if (r)
-- 
2.37.1
---
Best Regards,
- Maíra Canal



* Re: [BUG][5.20] refcount_t: underflow; use-after-free
  2022-08-15  0:20 ` Maíra Canal
@ 2022-08-15 10:37   ` Mikhail Gavrilov
  2022-08-16 19:14     ` Mikhail Gavrilov
  2022-08-15 10:55   ` Melissa Wen
  1 sibling, 1 reply; 13+ messages in thread
From: Mikhail Gavrilov @ 2022-08-15 10:37 UTC (permalink / raw)
  To: Maíra Canal
  Cc: Christian König, amd-gfx list, dri-devel, Linux List Kernel Mailing

On Mon, Aug 15, 2022 at 5:20 AM Maíra Canal <mairacanal@riseup.net> wrote:
>
> Hi Mikhail
>
> Looks like this use-after-free problem was introduced on
> 90af0ca047f3049c4b46e902f432ad6ef1e2ded6. Checking this patch it seems
> like: if amdgpu_cs_vm_handling return r != 0, then it will unlock
> bo_list_mutex inside the function amdgpu_cs_vm_handling and again on
> amdgpu_cs_parser_fini.
>
> Maybe the following patch will help:

Thanks, I tested this patch.
But with this patch applied, the use-after-free now happens in another place:

[  894.012920] ------------[ cut here ]------------
[  894.012939] refcount_t: underflow; use-after-free.
[  894.012968] WARNING: CPU: 14 PID: 205 at lib/refcount.c:28
refcount_warn_saturate+0xba/0x110
[  894.012999] Modules linked in: tls uinput rfcomm snd_seq_dummy
snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
qrtr bnep sunrpc snd_seq_midi snd_seq_midi_event snd_hda_codec_realtek
mt76x2u mt76x2_common snd_hda_codec_generic snd_hda_codec_hdmi
intel_rapl_msr mt76x02_usb intel_rapl_common snd_hda_intel mt76_usb
snd_intel_dspcfg vfat iwlmvm snd_intel_sdw_acpi mt76x02_lib fat
snd_usb_audio snd_hda_codec mt76 edac_mce_amd snd_usbmidi_lib
snd_hda_core btusb snd_rawmidi snd_hwdep mac80211 mc iwlwifi btrtl
eeepc_wmi asus_wmi btbcm snd_seq kvm_amd libarc4 ledtrig_audio
snd_seq_device btintel iwlmei sparse_keymap btmtk kvm snd_pcm
irqbypass platform_profile snd_timer xpad joydev cfg80211 rapl
hid_logitech_hidpp bluetooth ff_memless wmi_bmof video pcspkr snd
k10temp i2c_piix4
[  894.013086]  soundcore rfkill mei asus_ec_sensors acpi_cpufreq zram
amdgpu drm_ttm_helper ttm iommu_v2 crct10dif_pclmul ucsi_ccg gpu_sched
crc32_pclmul crc32c_intel typec_ucsi drm_buddy typec
drm_display_helper ghash_clmulni_intel igb ccp cec nvme sp5100_tco
nvme_core dca wmi ip6_tables ip_tables fuse
[  894.013322] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
[  894.013455]  pcc_cpufreq():1 pcc_cpufreq():1 fjes():1
pcc_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[  894.013690] CPU: 14 PID: 205 Comm: kworker/14:1 Tainted: G        W
   L    -------  ---
5.20.0-0.rc0.20220812git7ebfc85e2cd7.11.fc38.x86_64 #1
[  894.013725] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
[  894.013756] Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]
[  894.013779] RIP: 0010:refcount_warn_saturate+0xba/0x110
[  894.013796] Code: 01 01 e8 79 4a 6f 00 0f 0b e9 42 47 a5 00 80 3d
de 7e be 01 00 75 85 48 c7 c7 f8 98 8e 9c c6 05 ce 7e be 01 01 e8 56
4a 6f 00 <0f> 0b e9 1f 47 a5 00 80 3d b9 7e be 01 00 0f 85 5e ff ff ff
48 c7
[  894.013842] RSP: 0018:ffffb48681153e60 EFLAGS: 00010286
[  894.013858] RAX: 0000000000000026 RBX: ffff9bad16f1f028 RCX: 0000000000000000
[  894.013878] RDX: 0000000000000001 RSI: ffffffff9c8d06dc RDI: 00000000ffffffff
[  894.013897] RBP: ffff9bba663f5600 R08: 0000000000000000 R09: ffffb48681153d10
[  894.013916] R10: 0000000000000003 R11: ffff9bbaae2fffe8 R12: ffff9bba663fc800
[  894.013934] R13: ffff9bab93fcab40 R14: ffff9bba663fc805 R15: ffff9bad16f1f030
[  894.013954] FS:  0000000000000000(0000) GS:ffff9bba66200000(0000)
knlGS:0000000000000000
[  894.013975] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  894.013991] CR2: 00001aa46b2ec008 CR3: 0000000101516000 CR4: 0000000000350ee0
[  894.014011] Call Trace:
[  894.014022]  <TASK>
[  894.014030]  process_one_work+0x2a0/0x600
[  894.014051]  worker_thread+0x4f/0x3a0
[  894.014065]  ? process_one_work+0x600/0x600
[  894.014079]  kthread+0xf5/0x120
[  894.014092]  ? kthread_complete_and_exit+0x20/0x20
[  894.014109]  ret_from_fork+0x22/0x30
[  894.014129]  </TASK>
[  894.014137] irq event stamp: 5802
[  894.014148] hardirqs last  enabled at (5801): [<ffffffff9bf2a9e4>]
_raw_spin_unlock_irq+0x24/0x50
[  894.014178] hardirqs last disabled at (5802): [<ffffffff9bf21d8c>]
__schedule+0xe2c/0x16d0
[  894.014206] softirqs last  enabled at (4350): [<ffffffff9b7acb88>]
rht_deferred_worker+0x708/0xc00
[  894.014235] softirqs last disabled at (4348): [<ffffffff9b7ac677>]
rht_deferred_worker+0x1f7/0xc00
[  894.014264] ---[ end trace 0000000000000000 ]---

Full kernel log is here:
https://pastebin.com/wwWkXQJZ


-- 
Best Regards,
Mike Gavrilov.


* Re: [BUG][5.20] refcount_t: underflow; use-after-free
  2022-08-15  0:20 ` Maíra Canal
  2022-08-15 10:37   ` Mikhail Gavrilov
@ 2022-08-15 10:55   ` Melissa Wen
  2022-08-15 10:58     ` Christian König
  1 sibling, 1 reply; 13+ messages in thread
From: Melissa Wen @ 2022-08-15 10:55 UTC (permalink / raw)
  To: Maíra Canal
  Cc: Mikhail Gavrilov, Christian König, amd-gfx list, dri-devel,
	Linux List Kernel Mailing


On 08/14, Maíra Canal wrote:
> Hi Mikhail
> 
> Looks like this use-after-free problem was introduced on
> 90af0ca047f3049c4b46e902f432ad6ef1e2ded6. Checking this patch it seems
> like: if amdgpu_cs_vm_handling return r != 0, then it will unlock
> bo_list_mutex inside the function amdgpu_cs_vm_handling and again on
> amdgpu_cs_parser_fini.
> 
> Maybe the following patch will help:
> 
> ---
> From 71d718c0f53a334bb59bcd5dabd29bbe92c724af Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Ma=C3=ADra=20Canal?= <mairacanal@riseup.net>
> Date: Sun, 14 Aug 2022 21:12:24 -0300
> Subject: [PATCH] drm/amdgpu: Fix use-after-free on amdgpu_bo_list mutex
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Fixes: 90af0ca047f3 ("drm/amdgpu: Protect the amdgpu_bo_list list with a
> mutex v2")
> Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
> Signed-off-by: Maíra Canal <mairacanal@riseup.net>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index d8f1335bc68f..a7fce7b14321 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -837,17 +837,14 @@ static int amdgpu_cs_vm_handling(struct
> amdgpu_cs_parser *p)
>  			continue;
> 
>  		r = amdgpu_vm_bo_update(adev, bo_va, false);
> -		if (r) {
> -			mutex_unlock(&p->bo_list->bo_list_mutex);
> +		if (r)
>  			return r;
> -		}
> 
>  		r = amdgpu_sync_fence(&p->job->sync, bo_va->last_pt_update);
> -		if (r) {
> -			mutex_unlock(&p->bo_list->bo_list_mutex);
> +		if (r)
>  			return r;
> -		}
>  	}
> +	mutex_unlock(&p->bo_list->bo_list_mutex);

I think we don't need to unlock the bo_list_mutex here. If the return value
is != 0, amdgpu_cs_parser_fini() will unlock it; otherwise, amdgpu_cs_submit()
unlocks it at the end.
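
As a rough illustration of that ownership rule, here is a small userspace
sketch. All names (demo_lock, demo_helper, demo_run) are hypothetical and the
"lock" is just a counter; it only models who calls unlock, assuming the
cleanup path unlocks on error and the submit step unlocks on success.

```c
#include <assert.h>

/* Toy lock that records unlock calls made while it is not held. */
struct demo_lock {
	int held;
	int bad_unlocks;
};

static void demo_acquire(struct demo_lock *l)
{
	l->held = 1;
}

static void demo_release(struct demo_lock *l)
{
	if (!l->held)
		l->bad_unlocks++; /* unbalanced unlock */
	l->held = 0;
}

enum helper_variant {
	HELPER_NEVER_UNLOCKS,      /* lock is left entirely to the caller */
	HELPER_UNLOCKS_AFTER_LOOP, /* helper unlocks once its loop succeeded */
};

/* Stand-in for the helper: on error it returns with the lock held. */
static int demo_helper(struct demo_lock *l, enum helper_variant v, int fail)
{
	if (fail)
		return -1;
	if (v == HELPER_UNLOCKS_AFTER_LOOP)
		demo_release(l);
	return 0;
}

/* Returns the number of unbalanced unlocks for one simulated submission. */
int demo_run(enum helper_variant v, int fail)
{
	struct demo_lock l = { 0, 0 };

	demo_acquire(&l);
	if (demo_helper(&l, v, fail)) {
		demo_release(&l); /* error: cleanup path owns the unlock */
		return l.bad_unlocks;
	}
	demo_release(&l); /* success: submit step owns the unlock */
	return l.bad_unlocks;
}
```

In this model only the variant where the helper never unlocks stays balanced
on both paths; unlocking after the loop turns the success path into a double
unlock once the submit step runs.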

BR,

Melissa
> 
>  	r = amdgpu_vm_handle_moved(adev, vm);
>  	if (r)
> -- 
> 2.37.1
> ---
> Best Regards,
> - Maíra Canal
> 



* Re: [BUG][5.20] refcount_t: underflow; use-after-free
  2022-08-15 10:55   ` Melissa Wen
@ 2022-08-15 10:58     ` Christian König
  0 siblings, 0 replies; 13+ messages in thread
From: Christian König @ 2022-08-15 10:58 UTC (permalink / raw)
  To: Melissa Wen, Maíra Canal
  Cc: Mikhail Gavrilov, amd-gfx list, dri-devel, Linux List Kernel Mailing

On 15.08.22 at 12:55, Melissa Wen wrote:
> On 08/14, Maíra Canal wrote:
>> Hi Mikhail
>>
>> It looks like this use-after-free was introduced by commit
>> 90af0ca047f3049c4b46e902f432ad6ef1e2ded6. Looking at that patch, if
>> amdgpu_cs_vm_handling() returns r != 0, bo_list_mutex is unlocked inside
>> amdgpu_cs_vm_handling() and then again in amdgpu_cs_parser_fini().
>>
>> Maybe the following patch will help:
>>
>> ---
>>  From 71d718c0f53a334bb59bcd5dabd29bbe92c724af Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?Ma=C3=ADra=20Canal?= <mairacanal@riseup.net>
>> Date: Sun, 14 Aug 2022 21:12:24 -0300
>> Subject: [PATCH] drm/amdgpu: Fix use-after-free on amdgpu_bo_list mutex
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset=UTF-8
>> Content-Transfer-Encoding: 8bit
>>
>> Fixes: 90af0ca047f3 ("drm/amdgpu: Protect the amdgpu_bo_list list with a mutex v2")
>> Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
>> Signed-off-by: Maíra Canal <mairacanal@riseup.net>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 +++------
>>   1 file changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index d8f1335bc68f..a7fce7b14321 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -837,17 +837,14 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
>>   			continue;
>>
>>   		r = amdgpu_vm_bo_update(adev, bo_va, false);
>> -		if (r) {
>> -			mutex_unlock(&p->bo_list->bo_list_mutex);
>> +		if (r)
>>   			return r;
>> -		}
>>
>>   		r = amdgpu_sync_fence(&p->job->sync, bo_va->last_pt_update);
>> -		if (r) {
>> -			mutex_unlock(&p->bo_list->bo_list_mutex);
>> +		if (r)
>>   			return r;
>> -		}
>>   	}
>> +	mutex_unlock(&p->bo_list->bo_list_mutex);
> I think we don't need to unlock the bo_list_mutex here. If the return
> value is non-zero, amdgpu_cs_parser_fini() will unlock it; otherwise,
> amdgpu_cs_submit() unlocks it at the end.

Yeah, exactly that.

Apart from that, the patch looks good to me. We moved the mutex unlocking
around a few times during review; this is probably just fallout from that.

Thanks for fixing this,
Christian.

>
> BR,
>
> Melissa
>>   	r = amdgpu_vm_handle_moved(adev, vm);
>>   	if (r)
>> -- 
>> 2.37.1
>> ---
>> Best Regards,
>> - Maíra Canal

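The locking convention discussed in this reply (on error,
amdgpu_cs_parser_fini() drops bo_list_mutex; on success, amdgpu_cs_submit()
drops it) can be illustrated with a small userspace sketch. This is not
amdgpu code: struct parser, run_cs_flow() and the vm_handling_*() helpers are
hypothetical stand-ins, and a POSIX error-checking mutex is assumed so that
the extra unlock is reported instead of being undefined behaviour.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the CS parser that owns the list lock. */
struct parser {
	pthread_mutex_t lock;
	int unlock_errors;	/* unlock calls rejected by the mutex */
};

static void parser_init(struct parser *p)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	/* Error-checking mutexes fail an unlock of an unlocked mutex. */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&p->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	p->unlock_errors = 0;
}

static void parser_unlock(struct parser *p)
{
	if (pthread_mutex_unlock(&p->lock) != 0)
		p->unlock_errors++;
}

/* Buggy variant: unlocks on error although the cleanup path does too. */
static int vm_handling_buggy(struct parser *p, int fail)
{
	if (fail) {
		parser_unlock(p);
		return -EINVAL;
	}
	return 0;
}

/* Fixed variant: never touches the lock, the caller's paths own it. */
static int vm_handling_fixed(struct parser *p, int fail)
{
	(void)p;
	return fail ? -EINVAL : 0;
}

/* Error path: plays the role of amdgpu_cs_parser_fini(). */
static void parser_fini(struct parser *p)
{
	parser_unlock(p);
}

/* Success path: plays the role of amdgpu_cs_submit(). */
static void submit(struct parser *p)
{
	parser_unlock(p);
}

/* Runs one ioctl-like flow and returns the number of bad unlocks. */
static int run_cs_flow(int (*vm_handling)(struct parser *, int), int fail)
{
	struct parser p;
	int r, errors;

	parser_init(&p);
	r = pthread_mutex_lock(&p.lock);
	assert(r == 0);

	r = vm_handling(&p, fail);
	if (r)
		parser_fini(&p);
	else
		submit(&p);

	errors = p.unlock_errors;
	pthread_mutex_destroy(&p.lock);
	return errors;
}
```

With the buggy helper and a forced failure, run_cs_flow() reports one rejected
unlock. The fixed helper reports none on either path, which matches the rule
that only the two caller-side paths release the lock.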


* Re: [BUG][5.20] refcount_t: underflow; use-after-free
  2022-08-15 10:37   ` Mikhail Gavrilov
@ 2022-08-16 19:14     ` Mikhail Gavrilov
  2022-08-17 16:07       ` Melissa Wen
  0 siblings, 1 reply; 13+ messages in thread
From: Mikhail Gavrilov @ 2022-08-16 19:14 UTC (permalink / raw)
  To: Maíra Canal
  Cc: Christian König, amd-gfx list, dri-devel, Linux List Kernel Mailing

On Mon, Aug 15, 2022 at 3:37 PM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> Thanks, I tested this patch.
> But with this patch the use-after-free problem happens in another place:

Does anyone have an idea why the second use-after-free happened?
From the trace I don't understand which code is related.
I don't quite understand what the "Workqueue" entry in the trace means.

[ 408.358737] ------------[ cut here ]------------
[ 408.358743] refcount_t: underflow; use-after-free.
[ 408.358760] WARNING: CPU: 9 PID: 62 at lib/refcount.c:28
refcount_warn_saturate+0xba/0x110
[ 408.358769] Modules linked in: uinput snd_seq_dummy rfcomm
snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
qrtr bnep sunrpc binfmt_misc snd_seq_midi snd_seq_midi_event mt76x2u
mt76x2_common snd_hda_codec_realtek mt76x02_usb snd_hda_codec_generic
iwlmvm snd_hda_codec_hdmi mt76_usb intel_rapl_msr snd_hda_intel
mt76x02_lib intel_rapl_common snd_intel_dspcfg snd_intel_sdw_acpi mt76
snd_hda_codec vfat fat snd_usb_audio snd_hda_core edac_mce_amd
mac80211 snd_usbmidi_lib snd_hwdep snd_rawmidi mc snd_seq btusb
kvm_amd iwlwifi snd_seq_device btrtl btbcm libarc4 btintel eeepc_wmi
snd_pcm iwlmei kvm btmtk asus_wmi ledtrig_audio irqbypass joydev
snd_timer sparse_keymap bluetooth platform_profile rapl cfg80211 snd
video wmi_bmof soundcore i2c_piix4 k10temp rfkill mei
[ 408.358853] asus_ec_sensors acpi_cpufreq zram hid_logitech_hidpp
amdgpu igb dca drm_ttm_helper ttm iommu_v2 crct10dif_pclmul gpu_sched
crc32_pclmul ucsi_ccg crc32c_intel drm_buddy nvme typec_ucsi
drm_display_helper ghash_clmulni_intel ccp typec nvme_core sp5100_tco
cec wmi ip6_tables ip_tables fuse
[ 408.358880] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
[ 408.358953] pcc_cpufreq():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1
fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[ 408.358967] CPU: 9 PID: 62 Comm: kworker/9:0 Tainted: G W L -------
--- 6.0.0-0.rc1.13.fc38.x86_64+debug #1
[ 408.358971] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
[ 408.358974] Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]
[ 408.358982] RIP: 0010:refcount_warn_saturate+0xba/0x110
[ 408.358987] Code: 01 01 e8 d9 59 6f 00 0f 0b e9 a2 46 a5 00 80 3d 3e
7e be 01 00 75 85 48 c7 c7 70 99 8e 92 c6 05 2e 7e be 01 01 e8 b6 59
6f 00 <0f> 0b e9 7f 46 a5 00 80 3d 19 7e be 01 00 0f 85 5e ff ff ff 48
c7
[ 408.358990] RSP: 0018:ffffb124003efe60 EFLAGS: 00010286
[ 408.358994] RAX: 0000000000000026 RBX: ffff9987a025d428 RCX: 0000000000000000
[ 408.358997] RDX: 0000000000000001 RSI: ffffffff928d0754 RDI: 00000000ffffffff
[ 408.358999] RBP: ffff9994e4ff5600 R08: 0000000000000000 R09: ffffb124003efd10
[ 408.359001] R10: 0000000000000003 R11: ffff99952e2fffe8 R12: ffff9994e4ffc800
[ 408.359004] R13: ffff998600228cc0 R14: ffff9994e4ffc805 R15: ffff9987a025d430
[ 408.359006] FS: 0000000000000000(0000) GS:ffff9994e4e00000(0000)
knlGS:0000000000000000
[ 408.359009] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 408.359012] CR2: 000027ac39e78000 CR3: 00000001a66d8000 CR4: 0000000000350ee0
[ 408.359015] Call Trace:
[ 408.359017] <TASK>
[ 408.359020] process_one_work+0x2a0/0x600
[ 408.359032] worker_thread+0x4f/0x3a0
[ 408.359036] ? process_one_work+0x600/0x600
[ 408.359039] kthread+0xf5/0x120
[ 408.359044] ? kthread_complete_and_exit+0x20/0x20
[ 408.359049] ret_from_fork+0x22/0x30
[ 408.359061] </TASK>
[ 408.359063] irq event stamp: 5468
[ 408.359064] hardirqs last enabled at (5467): [<ffffffff91f2b9e4>]
_raw_spin_unlock_irq+0x24/0x50
[ 408.359071] hardirqs last disabled at (5468): [<ffffffff91f22d8c>]
__schedule+0xe2c/0x16d0
[ 408.359076] softirqs last enabled at (2482): [<ffffffff917acc28>]
rht_deferred_worker+0x708/0xc00
[ 408.359079] softirqs last disabled at (2480): [<ffffffff917ac717>]
rht_deferred_worker+0x1f7/0xc00
[ 408.359082] ---[ end trace 0000000000000000 ]---


Full kernel log is here: https://pastebin.com/Lam9CRLV

-- 
Best Regards,
Mike Gavrilov.


* Re: [BUG][5.20] refcount_t: underflow; use-after-free
  2022-08-16 19:14     ` Mikhail Gavrilov
@ 2022-08-17 16:07       ` Melissa Wen
  2022-08-17 17:44         ` Mikhail Gavrilov
  0 siblings, 1 reply; 13+ messages in thread
From: Melissa Wen @ 2022-08-17 16:07 UTC (permalink / raw)
  To: Mikhail Gavrilov
  Cc: Maíra Canal, Linux List Kernel Mailing, dri-devel,
	amd-gfx list, Christian König


On 08/17, Mikhail Gavrilov wrote:
> On Mon, Aug 15, 2022 at 3:37 PM Mikhail Gavrilov
> <mikhail.v.gavrilov@gmail.com> wrote:
> >
> > Thanks, I tested this patch.
> > But with this patch the use-after-free problem happens in another place:
> 
> Does anyone have an idea why the second use-after-free happened?
> From the trace I don't understand which code is related.
> I don't quite understand what the "Workqueue" entry in the trace means.

Hi Mikhail,

IIUC, you got this second use-after-free by applying the first version
of Maíra's patch, right? That version added another unbalanced unlock
to the CS ioctl flow, but this was fixed in the latest version, which
you can find here: https://patchwork.freedesktop.org/patch/497680/
If that is the case, can you test the latest version?

Thanks,

Melissa




* Re: [BUG][5.20] refcount_t: underflow; use-after-free
  2022-08-17 16:07       ` Melissa Wen
@ 2022-08-17 17:44         ` Mikhail Gavrilov
  2022-08-17 18:43           ` Maíra Canal
  0 siblings, 1 reply; 13+ messages in thread
From: Mikhail Gavrilov @ 2022-08-17 17:44 UTC (permalink / raw)
  To: Melissa Wen
  Cc: Maíra Canal, Linux List Kernel Mailing, dri-devel,
	amd-gfx list, Christian König

On Wed, Aug 17, 2022 at 9:08 PM Melissa Wen <mwen@igalia.com> wrote:
>
> Hi Mikhail,
>
> IIUC, you got this second use-after-free by applying the first version
> of Maíra's patch, right? That version added another unbalanced unlock
> to the CS ioctl flow, but this was fixed in the latest version, which
> you can find here: https://patchwork.freedesktop.org/patch/497680/
> If that is the case, can you test the latest version?
>
> Thanks,
>
> Melissa

With the latest version, the "bad unlock balance detected!" warning is gone,
but the use-after-free issue remains.
The trace again shows "Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]".

[  297.834779] ------------[ cut here ]------------
[  297.834818] refcount_t: underflow; use-after-free.
[  297.834831] WARNING: CPU: 30 PID: 2377 at lib/refcount.c:28
refcount_warn_saturate+0xba/0x110
[  297.834838] Modules linked in: uinput rfcomm snd_seq_dummy
snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
qrtr bnep sunrpc binfmt_misc snd_seq_midi snd_seq_midi_event mt76x2u
mt76x2_common mt76x02_usb mt76_usb mt76x02_lib snd_hda_codec_realtek
iwlmvm intel_rapl_msr snd_hda_codec_generic snd_hda_codec_hdmi mt76
vfat fat snd_hda_intel intel_rapl_common mac80211 snd_intel_dspcfg
snd_intel_sdw_acpi snd_usb_audio snd_hda_codec snd_usbmidi_lib btusb
edac_mce_amd iwlwifi libarc4 uvcvideo snd_hda_core btrtl snd_rawmidi
snd_hwdep videobuf2_vmalloc btbcm kvm_amd videobuf2_memops snd_seq
iwlmei btintel videobuf2_v4l2 eeepc_wmi snd_seq_device
videobuf2_common btmtk kvm xpad videodev joydev irqbypass snd_pcm
asus_wmi hid_logitech_hidpp ff_memless cfg80211 bluetooth rapl mc
[  297.834932]  ledtrig_audio snd_timer sparse_keymap platform_profile
wmi_bmof snd video pcspkr k10temp i2c_piix4 rfkill soundcore mei
asus_ec_sensors acpi_cpufreq zram amdgpu drm_ttm_helper ttm
crct10dif_pclmul crc32_pclmul crc32c_intel iommu_v2 ucsi_ccg gpu_sched
typec_ucsi drm_buddy ghash_clmulni_intel drm_display_helper ccp igb
typec sp5100_tco nvme cec nvme_core dca wmi ip6_tables ip_tables fuse
[  297.834978] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 fjes():1
[  297.835055]  pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
pcc_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[  297.835071] CPU: 30 PID: 2377 Comm: kworker/30:6 Tainted: G
W    L    -------  ---
6.0.0-0.rc1.20220817git3cc40a443a04.14.fc38.x86_64 #1
[  297.835075] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
[  297.835078] Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]
[  297.835085] RIP: 0010:refcount_warn_saturate+0xba/0x110
[  297.835088] Code: 01 01 e8 59 59 6f 00 0f 0b e9 22 46 a5 00 80 3d
be 7d be 01 00 75 85 48 c7 c7 c0 99 8e aa c6 05 ae 7d be 01 01 e8 36
59 6f 00 <0f> 0b e9 ff 45 a5 00 80 3d 99 7d be 01 00 0f 85 5e ff ff ff
48 c7
[  297.835091] RSP: 0018:ffffbd3506df7e60 EFLAGS: 00010286
[  297.835095] RAX: 0000000000000026 RBX: ffff961b250cbc28 RCX: 0000000000000000
[  297.835097] RDX: 0000000000000001 RSI: ffffffffaa8d07a4 RDI: 00000000ffffffff
[  297.835100] RBP: ffff96276a3f5600 R08: 0000000000000000 R09: ffffbd3506df7d10
[  297.835102] R10: 0000000000000003 R11: ffff9627ae2fffe8 R12: ffff96276a3fc800
[  297.835105] R13: ffff9618c03e6600 R14: ffff96276a3fc805 R15: ffff961b250cbc30
[  297.835108] FS:  0000000000000000(0000) GS:ffff96276a200000(0000)
knlGS:0000000000000000
[  297.835110] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  297.835113] CR2: 0000621001e4a000 CR3: 000000018d958000 CR4: 0000000000350ee0
[  297.835116] Call Trace:
[  297.835118]  <TASK>
[  297.835121]  process_one_work+0x2a0/0x600
[  297.835133]  worker_thread+0x4f/0x3a0
[  297.835139]  ? process_one_work+0x600/0x600
[  297.835142]  kthread+0xf5/0x120
[  297.835145]  ? kthread_complete_and_exit+0x20/0x20
[  297.835151]  ret_from_fork+0x22/0x30
[  297.835166]  </TASK>
[  297.835168] irq event stamp: 198245
[  297.835171] hardirqs last  enabled at (198253):
[<ffffffffa918ce7e>] __up_console_sem+0x5e/0x70
[  297.835175] hardirqs last disabled at (198260):
[<ffffffffa918ce63>] __up_console_sem+0x43/0x70
[  297.835177] softirqs last  enabled at (196454):
[<ffffffffa9de3a4e>] addrconf_verify_rtnl+0x23e/0x920
[  297.835182] softirqs last disabled at (196448):
[<ffffffffa9de3835>] addrconf_verify_rtnl+0x25/0x920
[  297.835185] ---[ end trace 0000000000000000 ]---


Full kernel log: https://pastebin.com/zbbY2zDU

-- 
Best Regards,
Mike Gavrilov.


* Re: [BUG][5.20] refcount_t: underflow; use-after-free
  2022-08-17 17:44         ` Mikhail Gavrilov
@ 2022-08-17 18:43           ` Maíra Canal
  2022-08-17 20:57             ` Mikhail Gavrilov
  0 siblings, 1 reply; 13+ messages in thread
From: Maíra Canal @ 2022-08-17 18:43 UTC (permalink / raw)
  To: Mikhail Gavrilov, Melissa Wen
  Cc: Christian König, dri-devel, amd-gfx list, Linux List Kernel Mailing



On 8/17/22 14:44, Mikhail Gavrilov wrote:
> On Wed, Aug 17, 2022 at 9:08 PM Melissa Wen <mwen@igalia.com> wrote:
>>
>> Hi Mikhail,
>>
>> IIUC, you got this second use-after-free by applying the first version
>> of Maíra's patch, right? That version added another unbalanced unlock
>> to the CS ioctl flow, but this was fixed in the latest version, which
>> you can find here: https://patchwork.freedesktop.org/patch/497680/
>> If that is the case, can you test the latest version?
>>
>> Thanks,
>>
>> Melissa
> 
> With the latest version, the "bad unlock balance detected!" warning is gone,
> but the use-after-free issue remains.
> The trace again shows "Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]".

Hi Mikhail,

It looks like commit 45ecaea738830b9d521c93520c8f201359dcbd95 ("drm/sched:
Partial revert of 'drm/sched: Keep s_fence->parent pointer'") introduced the
error. Could you try reverting it and check whether the use-after-free still
happens?

Best Regards,
- Maíra Canal
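
For context on the warning itself: "refcount_t: underflow; use-after-free"
is reported when a reference is dropped on a counter that has already reached
zero, i.e. one put too many. The following userspace sketch shows that
pattern only. It is a toy counter built on C11 atomics, not the kernel's
refcount_t implementation, and struct toy_ref and its helpers are
hypothetical names that say nothing about where the extra put in drm/sched
comes from.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy reference counter; NOT the kernel's refcount_t. */
struct toy_ref {
	atomic_int count;
	bool underflow_seen;	/* set when a put finds no reference left */
};

static void toy_ref_init(struct toy_ref *r)
{
	atomic_init(&r->count, 1);	/* the creator holds one reference */
	r->underflow_seen = false;
}

static void toy_ref_get(struct toy_ref *r)
{
	atomic_fetch_add(&r->count, 1);
}

/*
 * Drops one reference. Returns true when the caller dropped the last
 * reference and would be responsible for freeing the object.
 */
static bool toy_ref_put(struct toy_ref *r)
{
	int old = atomic_fetch_sub(&r->count, 1);

	if (old == 1)
		return true;

	if (old <= 0) {
		/*
		 * Nothing was left to drop: the object may already have been
		 * freed by whoever dropped the last reference. Record the
		 * underflow and pin the counter at zero.
		 */
		r->underflow_seen = true;
		atomic_store(&r->count, 0);
	}
	return false;
}
```

In this sketch, one get followed by two puts is balanced, and the second put
returns true. A third put is the unbalanced one and sets underflow_seen,
which is the situation the warning in the traces above describes.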

> 
> [  297.834779] ------------[ cut here ]------------
> [  297.834818] refcount_t: underflow; use-after-free.
> [  297.834831] WARNING: CPU: 30 PID: 2377 at lib/refcount.c:28
> refcount_warn_saturate+0xba/0x110
> [  297.834838] Modules linked in: uinput rfcomm snd_seq_dummy
> snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
> qrtr bnep sunrpc binfmt_misc snd_seq_midi snd_seq_midi_event mt76x2u
> mt76x2_common mt76x02_usb mt76_usb mt76x02_lib snd_hda_codec_realtek
> iwlmvm intel_rapl_msr snd_hda_codec_generic snd_hda_codec_hdmi mt76
> vfat fat snd_hda_intel intel_rapl_common mac80211 snd_intel_dspcfg
> snd_intel_sdw_acpi snd_usb_audio snd_hda_codec snd_usbmidi_lib btusb
> edac_mce_amd iwlwifi libarc4 uvcvideo snd_hda_core btrtl snd_rawmidi
> snd_hwdep videobuf2_vmalloc btbcm kvm_amd videobuf2_memops snd_seq
> iwlmei btintel videobuf2_v4l2 eeepc_wmi snd_seq_device
> videobuf2_common btmtk kvm xpad videodev joydev irqbypass snd_pcm
> asus_wmi hid_logitech_hidpp ff_memless cfg80211 bluetooth rapl mc
> [  297.834932]  ledtrig_audio snd_timer sparse_keymap platform_profile
> wmi_bmof snd video pcspkr k10temp i2c_piix4 rfkill soundcore mei
> asus_ec_sensors acpi_cpufreq zram amdgpu drm_ttm_helper ttm
> crct10dif_pclmul crc32_pclmul crc32c_intel iommu_v2 ucsi_ccg gpu_sched
> typec_ucsi drm_buddy ghash_clmulni_intel drm_display_helper ccp igb
> typec sp5100_tco nvme cec nvme_core dca wmi ip6_tables ip_tables fuse
> [  297.834978] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
> amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
> amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
> amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
> amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
> amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
> amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 fjes():1
> [  297.835055]  pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
> pcc_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
> [  297.835071] CPU: 30 PID: 2377 Comm: kworker/30:6 Tainted: G
> W    L    -------  ---
> 6.0.0-0.rc1.20220817git3cc40a443a04.14.fc38.x86_64 #1
> [  297.835075] Hardware name: System manufacturer System Product
> Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
> [  297.835078] Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]
> [  297.835085] RIP: 0010:refcount_warn_saturate+0xba/0x110
> [  297.835088] Code: 01 01 e8 59 59 6f 00 0f 0b e9 22 46 a5 00 80 3d
> be 7d be 01 00 75 85 48 c7 c7 c0 99 8e aa c6 05 ae 7d be 01 01 e8 36
> 59 6f 00 <0f> 0b e9 ff 45 a5 00 80 3d 99 7d be 01 00 0f 85 5e ff ff ff
> 48 c7
> [  297.835091] RSP: 0018:ffffbd3506df7e60 EFLAGS: 00010286
> [  297.835095] RAX: 0000000000000026 RBX: ffff961b250cbc28 RCX: 0000000000000000
> [  297.835097] RDX: 0000000000000001 RSI: ffffffffaa8d07a4 RDI: 00000000ffffffff
> [  297.835100] RBP: ffff96276a3f5600 R08: 0000000000000000 R09: ffffbd3506df7d10
> [  297.835102] R10: 0000000000000003 R11: ffff9627ae2fffe8 R12: ffff96276a3fc800
> [  297.835105] R13: ffff9618c03e6600 R14: ffff96276a3fc805 R15: ffff961b250cbc30
> [  297.835108] FS:  0000000000000000(0000) GS:ffff96276a200000(0000)
> knlGS:0000000000000000
> [  297.835110] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  297.835113] CR2: 0000621001e4a000 CR3: 000000018d958000 CR4: 0000000000350ee0
> [  297.835116] Call Trace:
> [  297.835118]  <TASK>
> [  297.835121]  process_one_work+0x2a0/0x600
> [  297.835133]  worker_thread+0x4f/0x3a0
> [  297.835139]  ? process_one_work+0x600/0x600
> [  297.835142]  kthread+0xf5/0x120
> [  297.835145]  ? kthread_complete_and_exit+0x20/0x20
> [  297.835151]  ret_from_fork+0x22/0x30
> [  297.835166]  </TASK>
> [  297.835168] irq event stamp: 198245
> [  297.835171] hardirqs last  enabled at (198253):
> [<ffffffffa918ce7e>] __up_console_sem+0x5e/0x70
> [  297.835175] hardirqs last disabled at (198260):
> [<ffffffffa918ce63>] __up_console_sem+0x43/0x70
> [  297.835177] softirqs last  enabled at (196454):
> [<ffffffffa9de3a4e>] addrconf_verify_rtnl+0x23e/0x920
> [  297.835182] softirqs last disabled at (196448):
> [<ffffffffa9de3835>] addrconf_verify_rtnl+0x25/0x920
> [  297.835185] ---[ end trace 0000000000000000 ]---
> 
> 
> Full kernel log: https://pastebin.com/zbbY2zDU
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG][5.20] refcount_t: underflow; use-after-free
  2022-08-17 18:43           ` Maíra Canal
@ 2022-08-17 20:57             ` Mikhail Gavrilov
  2022-08-19 12:13               ` Maíra Canal
  0 siblings, 1 reply; 13+ messages in thread
From: Mikhail Gavrilov @ 2022-08-17 20:57 UTC (permalink / raw)
  To: Maíra Canal
  Cc: Melissa Wen, Christian König, dri-devel, amd-gfx list,
	Linux List Kernel Mailing

On Wed, Aug 17, 2022 at 11:43 PM Maíra Canal <mairacanal@riseup.net> wrote:
>
> Hi Mikhail,
>
> Looks like 45ecaea738830b9d521c93520c8f201359dcbd95 ("drm/sched: Partial
> revert of 'drm/sched: Keep s_fence->parent pointer'") introduced the
> error. Try reverting it and check if the use-after-free still happens.

Thanks, but unfortunately the revert did not give the expected result.
The use-after-free happens again, in a context that is hard to interpret.
What is new: a "suspicious RCU usage" warning appeared, but it looks
completely unrelated to the use-after-free issue.

[ 215.434115] ------------[ cut here ]------------
[ 215.434184] refcount_t: underflow; use-after-free.
[ 215.434204] WARNING: CPU: 7 PID: 1258 at lib/refcount.c:28
refcount_warn_saturate+0xba/0x110
[ 215.434214] Modules linked in: uinput rfcomm snd_seq_dummy
snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
qrtr bnep sunrpc binfmt_misc snd_seq_midi snd_seq_midi_event
intel_rapl_msr intel_rapl_common snd_hda_codec_realtek vfat
snd_hda_codec_generic snd_hda_codec_hdmi mt76x2u fat mt76x2_common
snd_hda_intel mt76x02_usb snd_intel_dspcfg snd_intel_sdw_acpi mt76_usb
iwlmvm edac_mce_amd snd_usb_audio snd_hda_codec mt76x02_lib
snd_hda_core snd_usbmidi_lib snd_hwdep snd_rawmidi uvcvideo mt76
kvm_amd snd_seq videobuf2_vmalloc videobuf2_memops snd_seq_device
mac80211 videobuf2_v4l2 videobuf2_common kvm btusb iwlwifi snd_pcm
btrtl videodev libarc4 eeepc_wmi btbcm asus_wmi iwlmei btintel
ledtrig_audio xpad irqbypass sparse_keymap btmtk platform_profile
joydev
[ 215.434436] hid_logitech_hidpp rapl ff_memless mc snd_timer
bluetooth cfg80211 video pcspkr wmi_bmof snd soundcore k10temp
i2c_piix4 rfkill mei asus_ec_sensors acpi_cpufreq zram amdgpu
drm_ttm_helper ttm iommu_v2 ucsi_ccg gpu_sched crct10dif_pclmul
crc32_pclmul typec_ucsi drm_buddy crc32c_intel ghash_clmulni_intel ccp
igb sp5100_tco typec drm_display_helper nvme dca nvme_core cec wmi
ip6_tables ip_tables fuse
[ 215.434528] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
pcc_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 fjes():1
[ 215.434672] pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
pcc_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[ 215.434702] CPU: 7 PID: 1258 Comm: kworker/7:3 Tainted: G W L
------- --- 6.0.0-0.rc1.20220817git3cc40a443a04.14.fc38.x86_64 #1
[ 215.434709] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
[ 215.434715] Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]
[ 215.434728] RIP: 0010:refcount_warn_saturate+0xba/0x110
[ 215.434734] Code: 01 01 e8 59 59 6f 00 0f 0b e9 22 46 a5 00 80 3d be
7d be 01 00 75 85 48 c7 c7 c0 99 8e 92 c6 05 ae 7d be 01 01 e8 36 59
6f 00 <0f> 0b e9 ff 45 a5 00 80 3d 99 7d be 01 00 0f 85 5e ff ff ff 48
c7
[ 215.434740] RSP: 0018:ffff9ccb0237fe60 EFLAGS: 00010286
[ 215.434747] RAX: 0000000000000026 RBX: ffff8d531f6f2828 RCX: 0000000000000000
[ 215.434753] RDX: 0000000000000001 RSI: ffffffff928d07a4 RDI: 00000000ffffffff
[ 215.434757] RBP: ffff8d61e47f5600 R08: 0000000000000000 R09: ffff9ccb0237fd10
[ 215.434762] R10: 0000000000000003 R11: ffff8d622e2fffe8 R12: ffff8d61e47fc800
[ 215.434767] R13: ffff8d5313e95500 R14: ffff8d61e47fc805 R15: ffff8d531f6f2830
[ 215.434772] FS: 0000000000000000(0000) GS:ffff8d61e4600000(0000)
knlGS:0000000000000000
[ 215.434777] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 215.434782] CR2: 00007f0c8b815048 CR3: 00000001ab0e8000 CR4: 0000000000350ee0
[ 215.434788] Call Trace:
[ 215.434792] <TASK>
[ 215.434797] process_one_work+0x2a0/0x600
[ 215.434819] worker_thread+0x4f/0x3a0
[ 215.434830] ? process_one_work+0x600/0x600
[ 215.434836] kthread+0xf5/0x120
[ 215.434842] ? kthread_complete_and_exit+0x20/0x20
[ 215.434854] ret_from_fork+0x22/0x30
[ 215.434881] </TASK>
[ 215.434885] irq event stamp: 134873
[ 215.434890] hardirqs last enabled at (134881): [<ffffffff9118ce7e>]
__up_console_sem+0x5e/0x70
[ 215.434897] hardirqs last disabled at (134888): [<ffffffff9118ce63>]
__up_console_sem+0x43/0x70
[ 215.434903] softirqs last enabled at (131264): [<ffffffff910ff769>]
__irq_exit_rcu+0xf9/0x170
[ 215.434910] softirqs last disabled at (131257): [<ffffffff910ff769>]
__irq_exit_rcu+0xf9/0x170
[ 215.434917] ---[ end trace 0000000000000000 ]---

Full kernel log: https://pastebin.com/qED477Pz

-- 
Best Regards,
Mike Gavrilov.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG][5.20] refcount_t: underflow; use-after-free
  2022-08-17 20:57             ` Mikhail Gavrilov
@ 2022-08-19 12:13               ` Maíra Canal
  2022-08-24 21:44                 ` Mikhail Gavrilov
  0 siblings, 1 reply; 13+ messages in thread
From: Maíra Canal @ 2022-08-19 12:13 UTC (permalink / raw)
  To: Mikhail Gavrilov
  Cc: Melissa Wen, Christian König, dri-devel, amd-gfx list,
	Linux List Kernel Mailing



On 8/17/22 17:57, Mikhail Gavrilov wrote:
> On Wed, Aug 17, 2022 at 11:43 PM Maíra Canal <mairacanal@riseup.net> wrote:
>>
>> Hi Mikhail,
>>
>> Looks like 45ecaea738830b9d521c93520c8f201359dcbd95 ("drm/sched: Partial
>> revert of 'drm/sched: Keep s_fence->parent pointer'") introduced the
>> error. Try reverting it and check if the use-after-free still happens.
> 
> Thanks, but unfortunately the revert did not give the expected result.
> The use-after-free happens again, in a context that is hard to interpret.
> What is new: a "suspicious RCU usage" warning appeared, but it looks
> completely unrelated to the use-after-free issue.
> 

Hi Mikhail,

Could you please specify the steps to reproduce this use-after-free? I
will try to reproduce it on the RX5700 XT and bisect the issue.

Best Regards,
- Maíra Canal


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG][5.20] refcount_t: underflow; use-after-free
  2022-08-19 12:13               ` Maíra Canal
@ 2022-08-24 21:44                 ` Mikhail Gavrilov
  2022-09-19 23:27                   ` Mikhail Gavrilov
  0 siblings, 1 reply; 13+ messages in thread
From: Mikhail Gavrilov @ 2022-08-24 21:44 UTC (permalink / raw)
  To: Maíra Canal
  Cc: Melissa Wen, Christian König, dri-devel, amd-gfx list,
	Linux List Kernel Mailing

On Fri, Aug 19, 2022 at 5:13 PM Maíra Canal <mairacanal@riseup.net> wrote:
>
> Hi Mikhail,
>
> Could you please specify the steps to reproduce this use-after-free? I
> will try to reproduce it on the RX5700 XT and bisect the issue.
>

Hi Maíra, thanks for the help.

I'm afraid it will be unrealistic to reproduce, because on a laptop
with a 6800M (also RDNA 2 graphics) the problem does not occur.

Sorry for the long silence; I was trying to bisect the problem myself.

git bisect start
# status: waiting for both good and bad commits
# good: [3d7cb6b04c3f3115719235cc6866b10326de34cd] Linux 5.19
git bisect good 3d7cb6b04c3f3115719235cc6866b10326de34cd
# status: waiting for bad commit, 1 good commit known
# bad: [7ebfc85e2cd7b08f518b526173e9a33b56b3913b] Merge tag
'net-6.0-rc1' of
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect bad 7ebfc85e2cd7b08f518b526173e9a33b56b3913b

# bad: [b44f2fd87919b5ae6e1756d4c7ba2cbba22238e1] Merge tag
'drm-next-2022-08-03' of git://anongit.freedesktop.org/drm/drm
# 001: GPU hangs + use-after-free issue - https://pastebin.com/z86E9ydx
git bisect bad b44f2fd87919b5ae6e1756d4c7ba2cbba22238e1

# good: [526942b8134cc34d25d27f95dfff98b8ce2f6fcd] Merge tag
'ata-5.20-rc1' of
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
# 002: good - https://pastebin.com/9qki65Sj
git bisect good 526942b8134cc34d25d27f95dfff98b8ce2f6fcd

# good: [45490ce2ff833c4ec0de66705e46ba41320860cb] nfp: flower: add
support for tunnel offload without key ID
# 003: good - https://pastebin.com/vHk5eRkw
git bisect good 45490ce2ff833c4ec0de66705e46ba41320860cb

# skip: [e23a5e14aa278858c2e3d81ec34e83aa9a4177c5] Backmerge tag
'v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
into drm-next
# 004: GPU not switched in graphic mode - https://pastebin.com/RmqCTMLD
git bisect skip e23a5e14aa278858c2e3d81ec34e83aa9a4177c5

# bad: [b2065fb21d9a789b14f737ea90facedabadeb8a4] drm/amdgpu: fix
i2s_pdata out of bound array access
# 005: GPU hangs + use-after-free issue - https://pastebin.com/Zgw5Hc48
git bisect bad b2065fb21d9a789b14f737ea90facedabadeb8a4

# skip: [344feb7ccf764756937cfd74fa4ac5caba069c99] Merge tag
'amd-drm-next-5.20-2022-07-05' of
https://gitlab.freedesktop.org/agd5f/linux into drm-next
# 006: GPU not switched in graphic mode - https://pastebin.com/b8BUBE7Q
git bisect skip 344feb7ccf764756937cfd74fa4ac5caba069c99

# skip: [869b10ac8d2300327f554d83f4dbab041bf27d49] drm/amdgpu: add dm
ip block for dcn 3.1.4
# 007: GPU not switched in graphic mode - https://pastebin.com/byd7HECH
git bisect skip 869b10ac8d2300327f554d83f4dbab041bf27d49

# skip: [676ad8e997036e2f815c293b76c356fb7cc97a08] drm: rcar-du: Lift
z-pos restriction on primary plane for Gen3
# 008: GPU not switched in graphic mode - https://pastebin.com/3fXCTinb
git bisect skip 676ad8e997036e2f815c293b76c356fb7cc97a08

# skip: [5c57cbc390b166950c2e6c2f0c4edaeb0f47e97d] drm/bridge: lt9211:
Convert to drm_of_get_data_lanes_count
# 009: Build error - https://pastebin.com/rxHe9QRB
git bisect skip 5c57cbc390b166950c2e6c2f0c4edaeb0f47e97d

# skip: [6db5e0c8692e590734a7ec7455365d9cbaa15ef1] Merge tag
'drm-intel-next-2022-07-06' of
git://anongit.freedesktop.org/drm/drm-intel into drm-next
# 010: GPU not switched in graphic mode - https://pastebin.com/rqubSuc8
git bisect skip 6db5e0c8692e590734a7ec7455365d9cbaa15ef1

# skip: [5d763a9955f0fbf2681a2f1fa87c416056bd0c89] drm/amd/display:
Remove compiler warning
# 011: GPU not switched in graphic mode - https://pastebin.com/BrJs6ybP
git bisect skip 5d763a9955f0fbf2681a2f1fa87c416056bd0c89

# skip: [e6c2db2be986158afb9991d9fa8a38fe65a88516] drm/i915: Don't use
DRM_DEBUG_WARN_ON for unexpected l3bank/mslice config
# 012: GPU not switched in graphic mode - https://pastebin.com/yxppyqbD
git bisect skip e6c2db2be986158afb9991d9fa8a38fe65a88516

# bad: [cb6b81b21bd9cf09d72b7fe711be1b55001eb166] Merge tag
'drm-misc-next-fixes-2022-07-21' of
git://anongit.freedesktop.org/drm/drm-misc into drm-next
# 013: GPU hangs without use-after-free issue - https://pastebin.com/iRek4bBy
git bisect bad cb6b81b21bd9cf09d72b7fe711be1b55001eb166

# skip: [48b927770f8ad3f8cf4a024a552abf272af9f592]
drm/exynos/exynos7_drm_decon: free resources when clk_set_parent()
failed.
# 014: GPU not switched in graphic mode - https://pastebin.com/ekp10xhP
git bisect skip 48b927770f8ad3f8cf4a024a552abf272af9f592

# skip: [c5da61cf5bab30059f22ea368702c445ee87171a] drm/amdgpu/display:
add missing FP_START/END checks dcn32_clk_mgr.c
# 015: GPU not switched in graphic mode - https://pastebin.com/YbskKWmA
git bisect skip c5da61cf5bab30059f22ea368702c445ee87171a

# skip: [a77f7c89e62c6dfe405a64995812746f27adc510] drm/edid: convert
drm_gtf_modes_for_range() to drm_edid
# 016: GPU not switched in graphic mode - https://pastebin.com/bA2AwkJ7
git bisect skip a77f7c89e62c6dfe405a64995812746f27adc510

# skip: [6fde8eec71796f3534f0c274066862829813b21f] drm/doc: Add KUnit
documentation
# 017: GPU not switched in graphic mode - https://pastebin.com/wiByMQDG
git bisect skip 6fde8eec71796f3534f0c274066862829813b21f

# skip: [bbded689680f0f2e65d4a57d0dfa654671052d56] drm/edid: convert
drm_edid_iter_begin() to drm_edid
# 018: GPU not switched in graphic mode - https://pastebin.com/wYjmXmHH
git bisect skip bbded689680f0f2e65d4a57d0dfa654671052d56

# skip: [c6dac00340fcd20b076cd2c3413610d1d7ade7bd] drm/vc4: hvs: Add
debugfs node that dumps the current display lists
# 019: GPU not switched in graphic mode - https://pastebin.com/JvuvfNt5
git bisect skip c6dac00340fcd20b076cd2c3413610d1d7ade7bd

# skip: [786a4f668550f8576c28d167fd50f4ef84af8ba4] drm/msm/dp: rename
second dp_display_enable()'s argument
# 020: GPU not switched in graphic mode - https://pastebin.com/428dBZad
git bisect skip 786a4f668550f8576c28d167fd50f4ef84af8ba4

# skip: [427a60c1c30e1c0e9d0800a63df51985aaf3a26a] drm/amd/display:
OVT Update on InfoFrame and Mode Management
# 021: GPU not switched in graphic mode - https://pastebin.com/m68JFsHe
git bisect skip 427a60c1c30e1c0e9d0800a63df51985aaf3a26a

# skip: [b87d39019651c9cae169396cf5ae525393084490] drm/i915/sseu:
Disassociate internal subslice mask representation from uapi
# 022: GPU not switched in graphic mode - https://pastebin.com/eVbTEj3M
git bisect skip b87d39019651c9cae169396cf5ae525393084490

# skip: [fb10dc451c0f15e3c19798a2f41d357f3f7576f5] drm/vc4: hdmi:
Correct HDMI timing registers for interlaced modes
# 023: GPU not switched in graphic mode - https://pastebin.com/dPJv2R6H
git bisect skip fb10dc451c0f15e3c19798a2f41d357f3f7576f5

# skip: [1c89b4b718168aa6cf136a984b474d663e4203b7] drm/gem-vram: Share
code between GEM VRAM's _{prepare, cleanup}_fb()
# 024: GPU not switched in graphic mode - https://pastebin.com/bjxKiY8f
git bisect skip 1c89b4b718168aa6cf136a984b474d663e4203b7

# skip: [bdd0d7e290e0e4c8f7545fff89770abbd22bd51a] drm/amd/display:
fix non-x86/PPC64 compilation
# 025: GPU not switched in graphic mode - https://pastebin.com/up0Hk998
git bisect skip bdd0d7e290e0e4c8f7545fff89770abbd22bd51a

# skip: [9569ff1a188fe48b46eb1ac2ae4543c271e0d4c2] drm/i915: Fix error
code in icl_compute_combo_phy_dpll()
# 026: GPU not switched in graphic mode - https://pastebin.com/Hq05Jnq1
git bisect skip 9569ff1a188fe48b46eb1ac2ae4543c271e0d4c2

# skip: [5074376822fe99fa4ce344b851c5016d00c0444f] drm/rockchip: Fix
an error handling path rockchip_dp_probe()
# 027: GPU not switched in graphic mode - https://pastebin.com/B6K3E7hh
git bisect skip 5074376822fe99fa4ce344b851c5016d00c0444f

# skip: [58eaa6b3fb636072a4f19e6b6c76bbf564e95b95] drm/i915/guc/slpc:
Use non-blocking H2G for waitboost
# 028: GPU not switched in graphic mode - https://pastebin.com/W9em564t
git bisect skip 58eaa6b3fb636072a4f19e6b6c76bbf564e95b95

# skip: [72bd9ea389c70ac948f48d20c0e4ae70c0153940] drm: Remove
linux/media-bus-format.h from drm_crtc.h
# 029: GPU not switched in graphic mode - https://pastebin.com/i2sDFXVc
git bisect skip 72bd9ea389c70ac948f48d20c0e4ae70c0153940

# skip: [851dd8625320fb626b6ab6399b2402fd84abcdfb] drm/amdgpu: fix
scratch register access method in SRIOV
# 030: GPU not switched in graphic mode - https://pastebin.com/0L7XA3dj
git bisect skip 851dd8625320fb626b6ab6399b2402fd84abcdfb

# skip: [d8b599bf625d1d818fdbb322a272fd2a5ea32e38] drm/bridge:
ti-sn65dsi86: Use atomic variants of drm_bridge_funcs
# 031: GPU not switched in graphic mode - https://pastebin.com/5V8KMZUv
git bisect skip d8b599bf625d1d818fdbb322a272fd2a5ea32e38

# skip: [89ed996b888faaf11c69bb4cbc19f21475c9050e]
drm/nouveau/kms/nv50-: remove unused functions
# 032: GPU not switched in graphic mode - https://pastebin.com/Md13jJmq
git bisect skip 89ed996b888faaf11c69bb4cbc19f21475c9050e

# skip: [c5cfd54e93f89c9cd5cf0f61408bf3e11c7e6684] drm/amdgpu: Fix
acronym typo in glossary
# 033: GPU not switched in graphic mode - https://pastebin.com/mX3QLTyp
git bisect skip c5cfd54e93f89c9cd5cf0f61408bf3e11c7e6684

# skip: [7fc83cd079bba8b96b0f46e31f26c8f31c814146] drm/amd/pm: support
BAMACO reset on smu_v13_0_7
# 034: GPU not switched in graphic mode - https://pastebin.com/gsLz4Q2w
git bisect skip 7fc83cd079bba8b96b0f46e31f26c8f31c814146

# skip: [3cffeffe051a961417bc26f2053bced4cff83119] drm/amd/display:
Add DCN314 DC resources
# 035: GPU not switched in graphic mode - https://pastebin.com/WnEaVEva
git bisect skip 3cffeffe051a961417bc26f2053bced4cff83119

# skip: [9e9fa6a9198b767b00f48160800128e83a038f9f] udmabuf: Set the
DMA mask for the udmabuf device (v2)
# 036: GPU not switched in graphic mode - https://pastebin.com/2wHWbaaL
git bisect skip 9e9fa6a9198b767b00f48160800128e83a038f9f

# skip: [3e7f74dcfb7233ba3f8b3879066fdd3e79f2f701] drm: rcar-du: Add
num_rpf to struct rcar_du_device_info
# 037: GPU not switched in graphic mode - https://pastebin.com/LE3K5KkF
git bisect skip 3e7f74dcfb7233ba3f8b3879066fdd3e79f2f701

# skip: [bb4f196b47b6554ba89f02ec60246f0c643a4bf8] drm/amdgpu/vcn:
support unified queue only in vcn4
# 038: GPU not switched in graphic mode - https://pastebin.com/uGEpH3Z5
git bisect skip bb4f196b47b6554ba89f02ec60246f0c643a4bf8

# skip: [e9794c88cd6cf4be4a79188916a75539751f532c] drm/i915: remove
single-use GEM_DEBUG_EXEC()
# 039: GPU not switched in graphic mode - https://pastebin.com/HF5YCX0B
git bisect skip e9794c88cd6cf4be4a79188916a75539751f532c

# skip: [06f2f7772dc7ff2e3734e654cb2d0b588076860d] drm/amd/display:
Fix eDP not light up on resume
# 040: GPU not switched in graphic mode - https://pastebin.com/grj2sMfN
git bisect skip 06f2f7772dc7ff2e3734e654cb2d0b588076860d

# skip: [8aa5bcb61612060429223d1fbb7a1c30a579fc1f] gpu: host1x: Add
context device management code
# 041: GPU not switched in graphic mode - https://pastebin.com/hWQH5ejq
git bisect skip 8aa5bcb61612060429223d1fbb7a1c30a579fc1f

# skip: [32e8ab05ed81c995b92f12b590c12ef951ca1129] drm/amd/display:
Update SW state correctly for FCLK
# 042: GPU not switched in graphic mode - https://pastebin.com/RzJdzsRJ
git bisect skip 32e8ab05ed81c995b92f12b590c12ef951ca1129

# skip: [9975af040a04ba9aef33f3ef1ca4e8f04c7223dd] drm/edid: convert
drm_detect_monitor_audio() to use cea db iter
# 043: GPU not switched in graphic mode - https://pastebin.com/z3z0xUip
git bisect skip 9975af040a04ba9aef33f3ef1ca4e8f04c7223dd

# skip: [ceb180361e3851007547c55035cd1de03f108f75] amdgpu/pm: Fix
possible array out-of-bounds if SCLK levels != 2
# 044: GPU not switched in graphic mode - https://pastebin.com/0t9V7LNE
git bisect skip ceb180361e3851007547c55035cd1de03f108f75

# skip: [8db73897698ccb4eb70ab103245372569ff5a5ec] drm/edid: detect
color formats and CTA revision in all CTA extensions
# 045: GPU not switched in graphic mode - https://pastebin.com/u8QhS8ru
git bisect skip 8db73897698ccb4eb70ab103245372569ff5a5ec

The bisect had to be aborted because gcc was updated to version 12.2
yesterday, which makes it impossible to build a kernel from commits
prior to 0af5cb349a2c97fbabb3cede96efcde9d54b7940.

As you can see, I marked a lot of steps as "skip" because at these
steps the GNOME (gdm) graphical login screen doesn't appear; the GPU
gets stuck during the boot phase.
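
To make the good/bad/skip proportions of a pasted bisect log easy to
see, here is a minimal Python sketch that tallies the verdicts. The
function name tally_bisect_log and the shortened sample are only
illustrative; the sample reuses a few commands from the log above and
is not the full log.

```python
import re
from collections import Counter

# Matches the command lines of a pasted bisect log, for example
# "git bisect skip e23a5e14aa278858c2e3d81ec34e83aa9a4177c5".
# Comment lines starting with "#" are ignored on purpose.
BISECT_CMD = re.compile(r"^git bisect (good|bad|skip) ([0-9a-f]{7,40})$")


def tally_bisect_log(text: str) -> Counter:
    """Count good/bad/skip verdicts in a bisect log (hypothetical helper)."""
    counts = Counter()
    for line in text.splitlines():
        match = BISECT_CMD.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Shortened sample taken from the log above, not the complete bisect.
sample = """\
git bisect start
# good: [3d7cb6b04c3f3115719235cc6866b10326de34cd] Linux 5.19
git bisect good 3d7cb6b04c3f3115719235cc6866b10326de34cd
git bisect bad 7ebfc85e2cd7b08f518b526173e9a33b56b3913b
git bisect skip e23a5e14aa278858c2e3d81ec34e83aa9a4177c5
git bisect skip 344feb7ccf764756937cfd74fa4ac5caba069c99
"""

print(dict(tally_bisect_log(sample)))  # {'good': 1, 'bad': 1, 'skip': 2}
```

A high share of skips means git can only narrow the culprit down to a
range of commits rather than a single one.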

The question I consider open is whether we are looking for the
use-after-free or for the GPU freeze that preceded it.

If we are searching for the use-after-free, then I should mark commit
cb6b81b21bd9cf09d72b7fe711be1b55001eb166 as good, because there the GPU
hangs without a use-after-free.
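
A rough Python sketch of that decision rule, assuming only the
use-after-free is tracked. The exit codes are the ones "git bisect run"
understands (0 = good, 125 = skip, any other code from 1 to 127 = bad).
The function name classify, the booted_to_gdm flag and the hang-only
sample text are hypothetical and only illustrate the idea.

```python
# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125

# The warning text that appears in the kernel logs in this thread.
UAF_MARKER = "refcount_t: underflow; use-after-free"


def classify(log_text: str, booted_to_gdm: bool) -> int:
    """Map one test boot to a bisect verdict (hypothetical helper).

    Only the use-after-free counts as "bad" here; a GPU hang without
    the warning is treated as "good".
    """
    if not booted_to_gdm:
        # The GPU never switched to graphics mode, so the step
        # cannot be tested at all.
        return SKIP
    if UAF_MARKER in log_text:
        return BAD
    return GOOD


print(classify("refcount_t: underflow; use-after-free.", True))  # 1
print(classify("placeholder text for a hang-only log", True))    # 0
print(classify("", False))                                       # 125
```

Under this rule, step 013 (a hang without the warning) would come out
as good; bisecting for the freeze instead would need a different marker.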

For each bisect step I added a link to the related kernel log, and to
the build log if the build was unsuccessful.

-- 
Best Regards,
Mike Gavrilov.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG][5.20] refcount_t: underflow; use-after-free
  2022-08-24 21:44                 ` Mikhail Gavrilov
@ 2022-09-19 23:27                   ` Mikhail Gavrilov
  0 siblings, 0 replies; 13+ messages in thread
From: Mikhail Gavrilov @ 2022-09-19 23:27 UTC (permalink / raw)
  To: Maíra Canal
  Cc: Melissa Wen, Christian König, dri-devel, amd-gfx list,
	Linux List Kernel Mailing

Hi!
Unfortunately the use-after-free issue still happens on the 6.0-rc5 kernel.
The issue has become hard to reproduce. I had spent the whole day at the
computer before the use-after-free happened again, while I was playing
the game Tiny Tina's Wonderlands.
So we can forget about reproducibility; all that remains is to rely on
logs and tracing.
I didn't see anything new in the logs. It seems that we need to
somehow extend the logging so that the next time this happens we have
more information.

Sep 18 20:52:16 primary-ws gnome-shell[2388]:
meta_window_set_stack_position_no_sync: assertion
'window->stack_position >= 0' failed
Sep 18 20:52:27 primary-ws gnome-shell[2388]:
meta_window_set_stack_position_no_sync: assertion
'window->stack_position >= 0' failed
Sep 18 20:53:44 primary-ws gnome-shell[2388]: Window manager warning:
Window 0x4e00003 sets an MWM hint indicating it isn't resizable, but
sets min size 1 x 1 and max size 2147483647 x 2147483647; this doesn't
make much sense.
Sep 18 20:53:45 primary-ws kernel: umip_printk: 11 callbacks suppressed
Sep 18 20:53:45 primary-ws kernel: umip: Wonderlands.exe[213853]
ip:14ebb0d03 sp:4ee528: SGDT instruction cannot be used by
applications.
Sep 18 20:53:45 primary-ws kernel: umip: Wonderlands.exe[213853]
ip:14ebb0d03 sp:4ee528: For now, expensive software emulation returns
the result.
Sep 18 20:53:53 primary-ws gnome-shell[2388]:
meta_window_set_stack_position_no_sync: assertion
'window->stack_position >= 0' failed
Sep 18 20:53:53 primary-ws kernel: umip: Wonderlands.exe[213853]
ip:14ebb0d03 sp:4ee528: SGDT instruction cannot be used by
applications.
Sep 18 20:53:53 primary-ws kernel: umip: Wonderlands.exe[213853]
ip:14ebb0d03 sp:4ee528: For now, expensive software emulation returns
the result.
Sep 18 20:54:15 primary-ws kernel: umip: Wonderlands.exe[214194]
ip:15a270815 sp:6eaef490: SGDT instruction cannot be used by
applications.
Sep 18 20:56:01 primary-ws kernel: umip_printk: 15 callbacks suppressed
Sep 18 20:56:01 primary-ws kernel: umip: Wonderlands.exe[213853]
ip:15e3a82b0 sp:4ed178: SGDT instruction cannot be used by
applications.
Sep 18 20:56:01 primary-ws kernel: umip: Wonderlands.exe[213853]
ip:15e3a82b0 sp:4ed178: For now, expensive software emulation returns
the result.
Sep 18 20:56:03 primary-ws kernel: umip: Wonderlands.exe[213853]
ip:15e3a82b0 sp:4edbe8: SGDT instruction cannot be used by
applications.
Sep 18 20:56:03 primary-ws kernel: umip: Wonderlands.exe[213853]
ip:15e3a82b0 sp:4edbe8: For now, expensive software emulation returns
the result.
Sep 18 20:56:03 primary-ws kernel: umip: Wonderlands.exe[213853]
ip:15e3a82b0 sp:4ebf18: SGDT instruction cannot be used by
applications.
Sep 18 20:57:55 primary-ws kernel: ------------[ cut here ]------------
Sep 18 20:57:55 primary-ws kernel: refcount_t: underflow; use-after-free.
Sep 18 20:57:55 primary-ws kernel: WARNING: CPU: 22 PID: 235114 at
lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
Sep 18 20:57:55 primary-ws kernel: Modules linked in: tls uinput
rfcomm snd_seq_dummy snd_hrtimer nft_objref nf_conntrack_netbios_ns
nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_>
Sep 18 20:57:55 primary-ws kernel:  asus_wmi ledtrig_audio
sparse_keymap platform_profile irqbypass rfkill mc rapl snd_timer
video wmi_bmof pcspkr snd k10temp i2c_piix4 soundcore acpi_cpufreq
zram amdgpu drm_ttm_helper ttm iommu_v2 crct1>
Sep 18 20:57:55 primary-ws kernel: Unloaded tainted modules:
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_eda>
Sep 18 20:57:55 primary-ws kernel:  pcc_cpufreq():1 pcc_cpufreq():1
fjes():1 fjes():1 pcc_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1
fjes():1
Sep 18 20:57:55 primary-ws kernel: CPU: 22 PID: 235114 Comm:
kworker/22:0 Tainted: G        W    L    -------  ---
6.0.0-0.rc5.20220914git3245cb65fd91.39.fc38.x86_64 #1
Sep 18 20:57:55 primary-ws kernel: Hardware name: System manufacturer
System Product Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
Sep 18 20:57:55 primary-ws kernel: Workqueue: events
drm_sched_entity_kill_jobs_work [gpu_sched]
Sep 18 20:57:55 primary-ws kernel: RIP: 0010:refcount_warn_saturate+0xba/0x110
Sep 18 20:57:55 primary-ws kernel: Code: 01 01 e8 69 6b 6f 00 0f 0b e9
32 38 a5 00 80 3d 4d 7d be 01 00 75 85 48 c7 c7 80 b7 8e 95 c6 05 3d
7d be 01 01 e8 46 6b 6f 00 <0f> 0b e9 0f 38 a5 00 80 3d 28 7d be 01 00
0f 85 5e ff ff ff 48 c7
Sep 18 20:57:55 primary-ws kernel: RSP: 0018:ffffa1a853ccbe60 EFLAGS: 00010286
Sep 18 20:57:55 primary-ws kernel: RAX: 0000000000000026 RBX:
ffff8e0e60a96c28 RCX: 0000000000000000
Sep 18 20:57:55 primary-ws kernel: RDX: 0000000000000001 RSI:
ffffffff958d255c RDI: 00000000ffffffff
Sep 18 20:57:55 primary-ws kernel: RBP: ffff8e19a83f5600 R08:
0000000000000000 R09: ffffa1a853ccbd10
Sep 18 20:57:55 primary-ws kernel: R10: 0000000000000003 R11:
ffff8e19ee2fffe8 R12: ffff8e19a83fc800
Sep 18 20:57:55 primary-ws kernel: R13: ffff8e0d44a4b440 R14:
ffff8e19a83fc805 R15: ffff8e0e60a96c30
Sep 18 20:57:55 primary-ws kernel: FS:  0000000000000000(0000)
GS:ffff8e19a8200000(0000) knlGS:0000000000000000
Sep 18 20:57:55 primary-ws kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Sep 18 20:57:55 primary-ws kernel: CR2: 00001adc05fb2000 CR3:
00000002cf050000 CR4: 0000000000350ee0
Sep 18 20:57:55 primary-ws kernel: Call Trace:
Sep 18 20:57:55 primary-ws kernel:  <TASK>
Sep 18 20:57:55 primary-ws kernel:  process_one_work+0x2a0/0x600
Sep 18 20:57:55 primary-ws kernel:  worker_thread+0x4f/0x3a0
Sep 18 20:57:55 primary-ws kernel:  ? process_one_work+0x600/0x600
Sep 18 20:57:55 primary-ws kernel:  kthread+0xf5/0x120
Sep 18 20:57:55 primary-ws kernel:  ? kthread_complete_and_exit+0x20/0x20
Sep 18 20:57:55 primary-ws kernel:  ret_from_fork+0x22/0x30
Sep 18 20:57:55 primary-ws kernel:  </TASK>
Sep 18 20:57:55 primary-ws kernel: irq event stamp: 63606683
Sep 18 20:57:55 primary-ws kernel: hardirqs last  enabled at
(63606691): [<ffffffff9418ce0e>] __up_console_sem+0x5e/0x70
Sep 18 20:57:55 primary-ws kernel: hardirqs last disabled at
(63606698): [<ffffffff9418cdf3>] __up_console_sem+0x43/0x70
Sep 18 20:57:55 primary-ws kernel: softirqs last  enabled at
(63490566): [<ffffffff940ff749>] __irq_exit_rcu+0xf9/0x170
Sep 18 20:57:55 primary-ws kernel: softirqs last disabled at
(63490561): [<ffffffff940ff749>] __irq_exit_rcu+0xf9/0x170
Sep 18 20:57:55 primary-ws kernel: ---[ end trace 0000000000000000 ]---
Sep 18 20:57:56 primary-ws abrt-dump-journal-oops[1409]:
abrt-dump-journal-oops: Found oopses: 1
Sep 18 20:57:56 primary-ws abrt-dump-journal-oops[1409]:
abrt-dump-journal-oops: Creating problem directories
Sep 18 20:57:57 primary-ws abrt-notification[261766]: [🡕] System
encountered a non-fatal error in kthread_complete_and_exit()
Sep 18 20:57:57 primary-ws abrt-dump-journal-oops[1409]: Reported 1
kernel oopses to Abrt
Sep 18 20:58:23 primary-ws gsd-power[2776]: Failed to acquire idle
monitor proxy: Timeout was reached
Sep 18 20:58:23 primary-ws gsd-power[2776]: Error setting property
'PowerSaveMode' on interface org.gnome.Mutter.DisplayConfig: Timeout
was reached (g-io-error-quark, 24)
Sep 18 20:58:53 primary-ws gsd-power[2776]: Failed to acquire idle
monitor proxy: Timeout was reached
Sep 18 20:58:53 primary-ws gsd-power[2776]: Error setting property
'PowerSaveMode' on interface org.gnome.Mutter.DisplayConfig: Timeout
was reached (g-io-error-quark, 24)
Sep 18 20:58:54 primary-ws gsd-power[2776]: Failed to acquire idle
monitor proxy: Timeout was reached

Full kernel log: https://pastebin.com/nj2syLPM

-- 
Best Regards,
Mike Gavrilov.

Thread overview: 13+ messages
2022-08-14 21:11 [BUG][5.20] refcount_t: underflow; use-after-free Mikhail Gavrilov
2022-08-15  0:20 ` Maíra Canal
2022-08-15 10:37   ` Mikhail Gavrilov
2022-08-16 19:14     ` Mikhail Gavrilov
2022-08-17 16:07       ` Melissa Wen
2022-08-17 17:44         ` Mikhail Gavrilov
2022-08-17 18:43           ` Maíra Canal
2022-08-17 20:57             ` Mikhail Gavrilov
2022-08-19 12:13               ` Maíra Canal
2022-08-24 21:44                 ` Mikhail Gavrilov
2022-09-19 23:27                   ` Mikhail Gavrilov
2022-08-15 10:55   ` Melissa Wen
2022-08-15 10:58     ` Christian König