* [6.2][regression] looks like commit aab9cf7b6954136f4339136a1a7fc0602a2c4d8b leads to use-after-free and random computer hangs
From: Mikhail Gavrilov @ 2022-12-18 13:28 UTC
  To: Christian König, amd-gfx list, Christian König,
	Linux List Kernel Mailing, dri-devel, luben.tuikov

Hi,
The kernel 6.2 development cycle has begun.
After the kernel was updated on my Fedora Rawhide system, I started
getting use-after-free errors followed by complete computer hangs.
A good reproducer of this behaviour is launching the game
"Marvel's Avengers".

The backtrace of the issue looks like:
[  550.435083] ------------[ cut here ]------------
[  550.435110] refcount_t: underflow; use-after-free.
[  550.435808] WARNING: CPU: 9 PID: 738 at lib/refcount.c:25
refcount_warn_saturate+0x97/0x110
[  550.435812] Modules linked in: uinput rfcomm snd_seq_dummy
snd_hrtimer netconsole nft_objref nf_conntrack_netbios_ns
nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack
[  550.435887] refcount_t: saturated; leaking memory.
[  550.435893]  nf_defrag_ipv6 nf_defrag_ipv4
[  550.435902] WARNING: CPU: 26 PID: 5032 at lib/refcount.c:19
refcount_warn_saturate+0x74/0x110
[  550.435907]  ip_set
[  550.435909] Modules linked in:
[  550.435910]  nf_tables
[  550.435912]  uinput rfcomm
[  550.435918]  nfnetlink
[  550.435919]  snd_seq_dummy snd_hrtimer
[  550.435925]  qrtr
[  550.435926]  netconsole nft_objref
[  550.435931]  bnep
[  550.435933]  nf_conntrack_netbios_ns nf_conntrack_broadcast
[  550.435938]  sunrpc
[  550.435939]  nft_fib_inet
[  550.435941]  binfmt_misc
[  550.435942]  nft_fib_ipv4
[  550.435943]  iwlmvm
[  550.435130] WARNING: CPU: 25 PID: 740 at lib/refcount.c:28
refcount_warn_saturate+0xba/0x110
[  550.435945]  nft_fib_ipv6
[  550.435946]  btusb
[  550.435947]  nft_fib nft_reject_inet
[  550.435954]  btrtl
[  550.435955]  nf_reject_ipv4 nf_reject_ipv6
[  550.435963]  btbcm
[  550.435964]  nft_reject nft_ct
[  550.435969]  btintel
[  550.435971]  nft_chain_nat nf_nat
[  550.435977]  btmtk
[  550.435979]  nf_conntrack nf_defrag_ipv6
[  550.435984]  snd_seq_midi
[  550.435985]  nf_defrag_ipv4 ip_set
[  550.435991]  snd_seq_midi_event
[  550.435992]  nf_tables
[  550.435993]  bluetooth
[  550.435995]  nfnetlink
[  550.435996]  hid_logitech_hidpp
[  550.435142] Modules linked in: uinput rfcomm snd_seq_dummy
snd_hrtimer netconsole nft_objref nf_conntrack_netbios_ns
nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set
nf_tables nfnetlink qrtr bnep sunrpc binfmt_misc iwlmvm btusb btrtl
btbcm btintel btmtk snd_seq_midi snd_seq_midi_event bluetooth
hid_logitech_hidpp snd_usb_audio iwlwifi xpad ff_memless
snd_usbmidi_lib snd_rawmidi mc ecdh_generic intel_rapl_msr
intel_rapl_common mt76x2u mt76x2_common joydev snd_hda_codec_realtek
mt76x02_usb edac_mce_amd snd_hda_codec_generic mt76_usb
snd_hda_codec_hdmi mt76x02_lib kvm_amd snd_hda_intel snd_intel_dspcfg
snd_intel_sdw_acpi snd_hda_codec mt76 vfat kvm snd_hda_core fat
snd_seq snd_hwdep irqbypass snd_seq_device mac80211 snd_pcm eeepc_wmi
asus_wmi ledtrig_audio sparse_keymap rapl platform_profile wmi_bmof
snd_timer snd pcspkr i2c_piix4
[  550.435997]  qrtr bnep
[  550.436003]  snd_usb_audio
[  550.436004]  sunrpc binfmt_misc
[  550.436010]  iwlwifi
[  550.436012]  iwlmvm btusb
[  550.436018]  xpad
[  550.436019]  btrtl btbcm
[  550.436025]  ff_memless
[  550.436026]  btintel
[  550.436027]  snd_usbmidi_lib
[  550.436029]  btmtk
[  550.436030]  snd_rawmidi
[  550.436031]  snd_seq_midi snd_seq_midi_event
[  550.436037]  mc
[  550.436038]  bluetooth
[  550.436039]  ecdh_generic
[  550.436041]  hid_logitech_hidpp snd_usb_audio
[  550.436046]  intel_rapl_msr
[  550.436048]  iwlwifi xpad
[  550.436054]  intel_rapl_common
[  550.436055]  ff_memless
[  550.436056]  mt76x2u
[  550.436058]  snd_usbmidi_lib snd_rawmidi
[  550.436063]  mt76x2_common
[  550.436064]  mc ecdh_generic
[  550.436070]  joydev
[  550.436071]  intel_rapl_msr intel_rapl_common
[  550.436076]  snd_hda_codec_realtek
[  550.436078]  mt76x2u
[  550.436079]  mt76x02_usb
[  550.436080]  mt76x2_common joydev
[  550.436086]  edac_mce_amd
[  550.436088]  snd_hda_codec_realtek mt76x02_usb
[  550.436094]  snd_hda_codec_generic
[  550.436095]  edac_mce_amd
[  550.436096]  mt76_usb
[  550.436098]  snd_hda_codec_generic mt76_usb
[  550.436104]  snd_hda_codec_hdmi
[  550.436106]  snd_hda_codec_hdmi
[  550.436107]  mt76x02_lib
[  550.435234]  k10temp soundcore libarc4 acpi_cpufreq cfg80211
hid_logitech_dj rfkill zram amdgpu drm_ttm_helper ttm video iommu_v2
gpu_sched drm_buddy crct10dif_pclmul crc32_pclmul crc32c_intel igb
ucsi_ccg drm_display_helper nvme typec_ucsi ghash_clmulni_intel ccp
typec cec sp5100_tco dca sha512_ssse3 nvme_core wmi ip6_tables
ip_tables fuse
[  550.436108]  mt76x02_lib kvm_amd
[  550.436115]  kvm_amd
[  550.436116]  snd_hda_intel snd_intel_dspcfg
[  550.436122]  snd_hda_intel
[  550.436123]  snd_intel_sdw_acpi
[  550.435284] CPU: 25 PID: 740 Comm: sdma2 Tainted: G        W    L
  6.1.0-rc1-13-aab9cf7b6954136f4339136a1a7fc0602a2c4d8b+ #178
[  550.436124]  snd_intel_dspcfg
[  550.436126]  snd_hda_codec
[  550.436127]  snd_intel_sdw_acpi
[  550.436128]  mt76 vfat
[  550.436134]  snd_hda_codec
[  550.436135]  kvm
[  550.436136]  mt76
[  550.436138]  snd_hda_core fat
[  550.436144]  vfat
[  550.436145]  snd_seq snd_hwdep
[  550.436152]  kvm
[  550.436153]  irqbypass
[  550.436154]  snd_hda_core
[  550.436156]  snd_seq_device mac80211
[  550.436161]  fat
[  550.436163]  snd_pcm eeepc_wmi
[  550.436168]  snd_seq
[  550.436170]  asus_wmi ledtrig_audio
[  550.436176]  snd_hwdep
[  550.436177]  sparse_keymap
[  550.436178]  irqbypass
[  550.436180]  rapl platform_profile
[  550.436185]  snd_seq_device
[  550.436186]  wmi_bmof
[  550.436188]  mac80211
[  550.436189]  snd_timer
[  550.436190]  snd_pcm
[  550.436192]  snd pcspkr
[  550.436198]  eeepc_wmi
[  550.436200]  i2c_piix4 k10temp
[  550.436205]  asus_wmi
[  550.435292] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4408 10/28/2022
[  550.436207]  soundcore
[  550.436208]  ledtrig_audio
[  550.436209]  libarc4 acpi_cpufreq
[  550.436215]  sparse_keymap
[  550.436216]  cfg80211
[  550.436217]  rapl
[  550.436219]  hid_logitech_dj rfkill
[  550.436224]  platform_profile
[  550.436225]  zram
[  550.436226]  wmi_bmof
[  550.436228]  amdgpu drm_ttm_helper
[  550.436234]  snd_timer
[  550.436235]  ttm
[  550.436236]  snd
[  550.436237]  video iommu_v2
[  550.436244]  pcspkr
[  550.436245]  gpu_sched
[  550.436246]  i2c_piix4
[  550.435299] RIP: 0010:refcount_warn_saturate+0xba/0x110
[  550.436248]  drm_buddy crct10dif_pclmul
[  550.436253]  k10temp
[  550.436255]  crc32_pclmul
[  550.436256]  soundcore
[  550.436257]  crc32c_intel igb
[  550.436263]  libarc4
[  550.436264]  ucsi_ccg
[  550.436265]  acpi_cpufreq
[  550.436267]  drm_display_helper nvme
[  550.436273]  cfg80211
[  550.436274]  typec_ucsi
[  550.436275]  hid_logitech_dj
[  550.436277]  ghash_clmulni_intel ccp
[  550.435308] Code: 01 01 e8 35 09 6f 00 0f 0b e9 02 af a5 00 80 3d
dd 84 be 01 00 75 85 48 c7 c7 70 83 87 b5 c6 05 cd 84 be 01 01 e8 12
09 6f 00 <0f> 0b e9 df ae a5 00 80 3d b8 84 be 01 00 0f 85 5e ff ff ff
48 c7
[  550.436283]  rfkill
[  550.436284]  typec
[  550.436285]  zram
[  550.436286]  cec sp5100_tco
[  550.436293]  amdgpu
[  550.436294]  dca
[  550.436295]  drm_ttm_helper
[  550.436297]  sha512_ssse3 nvme_core
[  550.436303]  ttm
[  550.436304]  wmi
[  550.436305]  video
[  550.436307]  ip6_tables ip_tables
[  550.436312]  iommu_v2
[  550.436314]  fuse
[  550.436315]  gpu_sched
[  550.436316]
[  550.436318] CPU: 26 PID: 5032 Comm: Avengers.exe Tainted: G
W    L     6.1.0-rc1-13-aab9cf7b6954136f4339136a1a7fc0602a2c4d8b+ #178
[  550.436322]  drm_buddy
[  550.436324] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4408 10/28/2022
[  550.436325] RIP: 0010:refcount_warn_saturate+0x74/0x110
[  550.436329]  crct10dif_pclmul
[  550.436331] Code: 01 01 e8 7b 09 6f 00 0f 0b e9 48 af a5 00 80 3d
26 85 be 01 00 75 cb 48 c7 c7 18 83 87 b5 c6 05 16 85 be 01 01 e8 58
09 6f 00 <0f> 0b e9 25 af a5 00 80 3d 01 85 be 01 00 75 a8 48 c7 c7 40
83 87
[  550.436336]  crc32_pclmul
[  550.436338] RSP: 0018:ffffb11a471cf968 EFLAGS: 00010296
[  550.435315] RSP: 0018:ffffb11a47d27e68 EFLAGS: 00010296
[  550.436344]  crc32c_intel
[  550.436345]
[  550.436346]  igb
[  550.436348] RAX: 0000000000000026 RBX: ffffb11a471cf9d8 RCX: 0000000000000000
[  550.436354]  ucsi_ccg
[  550.436355] RDX: 0000000000000001 RSI: ffffffffb585efed RDI: 00000000ffffffff
[  550.436356] RBP: ffff9a8223960640 R08: 0000000000000001 R09: 0000000000000001
[  550.436361]  drm_display_helper
[  550.436362] R10: 0000000000000001 R11: 0000000000000000 R12: 00000000ffffffff
[  550.436363] R13: ffffb11a471cfb48 R14: ffffb11a471cfb48 R15: 000000080010c600
[  550.435324] RAX: 0000000000000026 RBX: ffff9a820d62e360 RCX: 0000000000000000
[  550.436368]  nvme
[  550.436369] FS:  00000000124ff6c0(0000) GS:ffff9a90e9200000(0000)
knlGS:000000000bdb0000
[  550.436375]  typec_ucsi
[  550.436376] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  550.436378] CR2: 000000001af20870 CR3: 000000029c884000 CR4: 0000000000350ee0
[  550.436382]  ghash_clmulni_intel
[  550.436383] Call Trace:
[  550.436384]  <TASK>
[  550.436389]  ccp
[  550.436390]  dma_resv_iter_walk_unlocked.part.0+0x143/0x1b0
[  550.436396]  typec
[  550.436400]  dma_resv_iter_first_unlocked+0x5d/0x200
[  550.435331] RDX: 0000000000000001 RSI: ffffffffb585efed RDI: 00000000ffffffff
[  550.436405]  cec
[  550.436408]  amdgpu_vm_sdma_update+0x5d/0x360 [amdgpu]
[  550.436581]  sp5100_tco
[  550.436700]  amdgpu_vm_ptes_update+0x28f/0x850 [amdgpu]
[  550.436844]  dca
[  550.436859]  amdgpu_vm_update_range+0x1f9/0x6d0 [amdgpu]
[  550.436994]  sha512_ssse3
[  550.437007]  amdgpu_vm_bo_update+0x29c/0x670 [amdgpu]
[  550.437138]  nvme_core
[  550.437152]  amdgpu_gem_va_ioctl+0x4bd/0x500 [amdgpu]
[  550.437311]  wmi
[  550.437331]  ? amdgpu_gem_va_map_flags+0x70/0x70 [amdgpu]
[  550.437463]  ip6_tables
[  550.437467]  drm_ioctl_kernel+0xa4/0x150
[  550.437471]  ip_tables
[  550.437477]  drm_ioctl+0x1de/0x460
[  550.437480]  fuse
[  550.435338] RBP: ffff9a8210fa4400 R08: 0000000000000000 R09: ffffb11a47d27d18
[  550.437546] CPU: 9 PID: 738 Comm: sdma0 Tainted: G        W    L
 6.1.0-rc1-13-aab9cf7b6954136f4339136a1a7fc0602a2c4d8b+ #178
[  550.437484]  ? amdgpu_gem_va_map_flags+0x70/0x70 [amdgpu]
[  550.437566] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4408 10/28/2022
[  550.437567] RIP: 0010:refcount_warn_saturate+0x97/0x110
[  550.437570] Code: 01 01 e8 58 09 6f 00 0f 0b e9 25 af a5 00 80 3d
01 85 be 01 00 75 a8 48 c7 c7 40 83 87 b5 c6 05 f1 84 be 01 01 e8 35
09 6f 00 <0f> 0b e9 02 af a5 00 80 3d dd 84 be 01 00 75 85 48 c7 c7 70
83 87
[  550.437572] RSP: 0018:ffffb11a47d17e68 EFLAGS: 00010296
[  550.437574] RAX: 000000000000002a RBX: ffff9a822fac7520 RCX: 0000000000000000
[  550.437576] RDX: 0000000000000001 RSI: ffffffffb585efed RDI: 00000000ffffffff
[  550.437577] RBP: ffff9a8210fa2000 R08: 0000000000000001 R09: 0000000000000001
[  550.437578] R10: 0000000000000001 R11: 0000000000000000 R12: ffff9a8223960640
[  550.437580] R13: ffff9a8223961700 R14: ffff9a822fad0578 R15: ffff9a822fad0578
[  550.437581] FS:  0000000000000000(0000) GS:ffff9a90e4e00000(0000)
knlGS:0000000000000000
[  550.437583] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  550.437584] CR2: 00007f85fc297020 CR3: 000000029c884000 CR4: 0000000000350ee0
[  550.437586] Call Trace:
[  550.437587]  <TASK>
[  550.437588]  drm_sched_entity_pop_job+0x47e/0x490 [gpu_sched]
[  550.435345] R10: 0000000000000003 R11: ffff9a912e2fffe8 R12: 00000000ffffffff
[  550.437595]  drm_sched_main+0xa7/0x4a0 [gpu_sched]
[  550.437602]  ? prepare_to_wait_exclusive+0xd0/0xd0
[  550.437608]  ? trace_event_raw_event_drm_sched_job+0xf0/0xf0 [gpu_sched]
[  550.437612]  kthread+0xf5/0x120
[  550.435353] R13: ffff9a8223960700 R14: ffff9a822fad1e88 R15: ffff9a822fad1e88
[  550.437624]  ? kthread_complete_and_exit+0x20/0x20
[  550.437636]  ? lock_release+0x14f/0x460
[  550.437641]  ret_from_fork+0x22/0x30
[  550.437647]  ? _raw_spin_unlock_irqrestore+0x30/0x60
[  550.437662]  </TASK>
[  550.437666]  ? _raw_spin_unlock_irqrestore+0x30/0x60
[  550.437672] irq event stamp: 11657
[  550.437674]  ? lockdep_hardirqs_on+0x7d/0x100
[  550.437679] hardirqs last  enabled at (11667): [<ffffffffb418cbc2>]
console_emit_next_record.constprop.0+0x232/0x300
[  550.437687]  ? _raw_spin_unlock_irqrestore+0x40/0x60
[  550.435360] FS:  0000000000000000(0000) GS:ffff9a90e8e00000(0000)
knlGS:0000000000000000
[  550.437693] hardirqs last disabled at (11676): [<ffffffffb418cc01>]
console_emit_next_record.constprop.0+0x271/0x300
[  550.437695] softirqs last  enabled at (11498): [<ffffffffb40fe739>]
__irq_exit_rcu+0xf9/0x170
[  550.437704]  amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
[  550.437912] softirqs last disabled at (11493): [<ffffffffb40fe739>]
__irq_exit_rcu+0xf9/0x170
[  550.435367] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  550.437923]  __x64_sys_ioctl+0x90/0xd0
[  550.437929]  do_syscall_64+0x3a/0x90
[  550.437932] ---[ end trace 0000000000000000 ]---
[  550.437935]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  550.438006] ------------[ cut here ]------------
[  550.435375] CR2: 00007f2aa40330a8 CR3: 0000000119742000 CR4: 0000000000350ee0
[  550.438041] RIP: 0033:0x7f2b1e49065f
[  550.438062] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24
10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00
00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28
00 00
[  550.435382] Call Trace:
[  550.438064] RSP: 002b:00000000124fe120 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  550.438067] RAX: ffffffffffffffda RBX: 00007f2acca74540 RCX: 00007f2b1e49065f
[  550.438068] RDX: 00000000124fe1c0 RSI: 00000000c0286448 RDI: 0000000000000021
[  550.438070] RBP: 00000000124fe1c0 R08: ffff80010a600000 R09: 000000000000000e
[  550.438071] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c0286448
[  550.438073] R13: 0000000000000021 R14: 0000000000000000 R15: 000000007e132290
[  550.438086]  </TASK>
[  550.438166] refcount_t: saturated; leaking memory.
[  550.438184] WARNING: CPU: 30 PID: 0 at lib/refcount.c:22
refcount_warn_saturate+0x51/0x110
[  550.438194] Modules linked in: uinput
[  550.438202] irq event stamp: 233045
[  550.438204] hardirqs last  enabled at (233051):
[<ffffffffb418e2c2>] vprintk_emit+0x342/0x350
[  550.435389]  <TASK>
[  550.438207] hardirqs last disabled at (233056):
[<ffffffffb418e271>] vprintk_emit+0x2f1/0x350
[  550.438404]  rfcomm snd_seq_dummy
[  550.438414] softirqs last  enabled at (231056):
[<ffffffffb40fe739>] __irq_exit_rcu+0xf9/0x170
[  550.438417] softirqs last disabled at (231049):
[<ffffffffb40fe739>] __irq_exit_rcu+0xf9/0x170
[  550.438477]  snd_hrtimer
[  550.438479] ---[ end trace 0000000000000000 ]---
[  550.435396]  drm_sched_entity_pop_job+0x10c/0x490 [gpu_sched]
[  550.438483]  netconsole nft_objref nf_conntrack_netbios_ns
nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set
nf_tables nfnetlink qrtr bnep sunrpc binfmt_misc iwlmvm btusb btrtl
btbcm btintel btmtk snd_seq_midi snd_seq_midi_event bluetooth
hid_logitech_hidpp snd_usb_audio iwlwifi xpad ff_memless
snd_usbmidi_lib snd_rawmidi mc ecdh_generic intel_rapl_msr
intel_rapl_common mt76x2u mt76x2_common joydev snd_hda_codec_realtek
mt76x02_usb edac_mce_amd snd_hda_codec_generic mt76_usb
snd_hda_codec_hdmi mt76x02_lib kvm_amd snd_hda_intel snd_intel_dspcfg
snd_intel_sdw_acpi snd_hda_codec mt76 vfat kvm snd_hda_core fat
snd_seq snd_hwdep irqbypass snd_seq_device mac80211 snd_pcm eeepc_wmi
asus_wmi ledtrig_audio sparse_keymap rapl platform_profile wmi_bmof
snd_timer snd pcspkr i2c_piix4 k10temp soundcore libarc4 acpi_cpufreq
cfg80211 hid_logitech_dj
[  550.435410]  drm_sched_main+0xa7/0x4a0 [gpu_sched]
[  550.438792]  rfkill zram amdgpu drm_ttm_helper ttm video iommu_v2
gpu_sched drm_buddy crct10dif_pclmul crc32_pclmul crc32c_intel igb
ucsi_ccg drm_display_helper nvme typec_ucsi ghash_clmulni_intel ccp
typec cec sp5100_tco dca sha512_ssse3 nvme_core wmi ip6_tables
ip_tables fuse
[  550.438870] CPU: 30 PID: 0 Comm: swapper/30 Tainted: G        W
L     6.1.0-rc1-13-aab9cf7b6954136f4339136a1a7fc0602a2c4d8b+ #178
[  550.438929] RSP: 0018:ffffb11a40798dc0 EFLAGS: 00010082
[  550.438943] RAX: 0000000000000026 RBX: ffff9a8223960600 RCX: 0000000000000000
[  550.438956] RDX: 0000000000010003 RSI: ffffffffb585efed RDI: 00000000ffffffff
[  550.435423]  ? prepare_to_wait_exclusive+0xd0/0xd0
[  550.438969] RBP: ffff9a822fad0578 R08: 0000000000000002 R09: 0000000024924924
[  550.438981] R10: 0000000000000001 R11: 0000000000000000 R12: ffff9a8223960678
[  550.438993] R13: 0000000000000002 R14: 00000000ffffffff R15: ffff9a822fad0488
[  550.439006] FS:  0000000000000000(0000) GS:ffff9a90ea200000(0000)
knlGS:0000000000000000
[  550.439019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  550.439032] CR2: 000000007febe698 CR3: 000000031dfee000 CR4: 0000000000350ee0
[  550.439045] Call Trace:
[  550.439057]  <IRQ>
[  550.435435]  ? trace_event_raw_event_drm_sched_job+0xf0/0xf0 [gpu_sched]
[  550.439069]  drm_sched_job_done.isra.0+0x1ad/0x1e0 [gpu_sched]
[  550.439093]  dma_fence_signal_timestamp_locked+0x9e/0x1c0
[  550.439113]  dma_fence_signal+0x36/0x70
[  550.435445]  kthread+0xf5/0x120
[  550.435453]  ? kthread_complete_and_exit+0x20/0x20
[  550.439130]  amdgpu_fence_process+0xd5/0x140 [amdgpu]
[  550.435464]  ret_from_fork+0x22/0x30
[  550.435536]  </TASK>
[  550.439402]  sdma_v5_2_process_trap_irq+0x103/0x150 [amdgpu]
[  550.435542] irq event stamp: 22001
[  550.435571] hardirqs last  enabled at (22023): [<ffffffffb418b64e>]
__up_console_sem+0x5e/0x70
[  550.435581] hardirqs last disabled at (22032): [<ffffffffb418b633>]
__up_console_sem+0x43/0x70
[  550.439695]  amdgpu_irq_dispatch+0x10c/0x2b0 [amdgpu]
[  550.435589] softirqs last  enabled at (22020): [<ffffffffb40fe739>]
__irq_exit_rcu+0xf9/0x170
[  550.435598] softirqs last disabled at (22011): [<ffffffffb40fe739>]
__irq_exit_rcu+0xf9/0x170
[  550.439980]  amdgpu_ih_process+0x66/0x100 [amdgpu]
[  550.435606] ---[ end trace 0000000000000000 ]---
[  550.440255]  amdgpu_irq_handler+0x1f/0x60 [amdgpu]
[  550.440525]  __handle_irq_event_percpu+0x93/0x330
[  550.440546]  handle_irq_event+0x34/0x70
[  550.440563]  handle_edge_irq+0x9f/0x240
[  550.440580]  __common_interrupt+0x71/0x150
[  550.440598]  common_interrupt+0xb4/0xd0
[  550.440612]  </IRQ>
[  550.440623]  <TASK>
[  550.440638]  asm_common_interrupt+0x22/0x40
[  550.440654] RIP: 0010:cpuidle_enter_state+0xeb/0x320
[  550.440670] Code: 1b ae 5a ff 45 84 ff 74 16 9c 58 0f 1f 40 00 f6
c4 02 0f 85 05 02 00 00 31 ff e8 20 d0 62 ff e8 3b b4 6a ff fb 0f 1f
44 00 00 <45> 85 ed 0f 88 e2 00 00 00 49 63 cd 4c 89 f2 48 8d 04 49 48
8d 04
[  550.440683] RSP: 0018:ffffb11a4025fe98 EFLAGS: 00000206
[  550.440698] RAX: 00000000001058c9 RBX: ffff9a822f5bb000 RCX: 0000000000000001
[  550.440711] RDX: 0000000000000000 RSI: ffffffffb590ab9f RDI: ffffffffb587927e
[  550.440723] RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000001
[  550.440736] R10: 0000000000000018 R11: 0000000000000191 R12: ffffffffb62f7820
[  550.440748] R13: 0000000000000002 R14: 0000008028a95e72 R15: 0000000000000000
[  550.435771] ------------[ cut here ]------------
[  550.440897]  </TASK>
[  550.440909] irq event stamp: 1071306
[  550.440921] hardirqs last  enabled at (1071305):
[<ffffffffb4bb03e5>] cpuidle_enter_state+0xe5/0x320
[  550.440936] hardirqs last disabled at (1071306):
[<ffffffffb4f09895>] common_interrupt+0x15/0xd0
[  550.440950] softirqs last  enabled at (1071294):
[<ffffffffb40fe739>] __irq_exit_rcu+0xf9/0x170
[  550.435771] ------------[ cut here ]------------
[  550.440965] softirqs last disabled at (1071289):
[<ffffffffb40fe739>] __irq_exit_rcu+0xf9/0x170
[  550.440980] ---[ end trace 0000000000000000 ]---
[  550.440998] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  550.441004] #PF: supervisor write access in kernel mode
[  550.441011] #PF: error_code(0x0002) - not-present page
[  550.441017] PGD 309296067 P4D 309296067 PUD 31a01b067 PMD 31a019067 PTE 0
[  550.441028] Oops: 0002 [#1] PREEMPT SMP NOPTI
[  550.441036] CPU: 30 PID: 0 Comm: swapper/30 Tainted: G        W
L     6.1.0-rc1-13-aab9cf7b6954136f4339136a1a7fc0602a2c4d8b+ #178
[  550.441043] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4408 10/28/2022
[  550.441049] RIP: 0010:dma_fence_signal_timestamp_locked+0x35/0x1c0
[  550.441057] Code: 53 48 89 f3 48 83 ec 10 8b 05 e3 94 9b 01 85 c0
0f 85 ef 00 00 00 f0 48 0f ba 6d 30 00 0f 82 63 01 00 00 48 8b 45 10
49 89 e4 <4c> 89 60 08 48 89 04 24 48 8b 45 18 48 89 44 24 08 4c 89 20
48 89
[  550.441064] RSP: 0018:ffffb11a40798d70 EFLAGS: 00010002
[  550.441071] RAX: 0000000000000000 RBX: 0000008046b8fe15 RCX: 0000000000000001
[  550.441078] RDX: 0000000000000000 RSI: ffffffffb57ceec9 RDI: ffffffffb587927e
[  550.441084] RBP: ffff9a8223960640 R08: 0000000000000002 R09: 0000000024924924
[  550.441090] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb11a40798d70
[  550.441096] R13: 0000000000000002 R14: 00000000ffffffff R15: ffff9a822fad0488
[  550.441102] FS:  0000000000000000(0000) GS:ffff9a90ea200000(0000)
knlGS:0000000000000000
[  550.441133]  ? recalibrate_cpu_khz+0x10/0x10
[  550.441140]  ? ktime_get+0x4d/0xb0
[  550.441149]  dma_fence_signal+0x36/0x70
[  550.441157]  drm_sched_job_done.isra.0+0x1b5/0x1e0 [gpu_sched]
[  550.441170]  dma_fence_signal_timestamp_locked+0x9e/0x1c0
[  550.441179]  dma_fence_signal+0x36/0x70
[  550.435792] refcount_t: addition on 0; use-after-free.
[  550.441187]  amdgpu_fence_process+0xd5/0x140 [amdgpu]
[  550.441442]  sdma_v5_2_process_trap_irq+0x103/0x150 [amdgpu]
[  550.441714]  amdgpu_irq_dispatch+0x10c/0x2b0 [amdgpu]
[  550.441979]  amdgpu_ih_process+0x66/0x100 [amdgpu]
[  550.442241]  amdgpu_irq_handler+0x1f/0x60 [amdgpu]
[  550.442501]  __handle_irq_event_percpu+0x93/0x330
[  550.442512]  handle_irq_event+0x34/0x70
[  550.442522]  handle_edge_irq+0x9f/0x240
[  550.442531]  __common_interrupt+0x71/0x150
[  550.442539]  common_interrupt+0xb4/0xd0
[  550.442546]  </IRQ>
[  550.442551]  <TASK>
[  550.442558]  asm_common_interrupt+0x22/0x40
[  550.442567] RIP: 0010:cpuidle_enter_state+0xeb/0x320
[  550.442575] Code: 1b ae 5a ff 45 84 ff 74 16 9c 58 0f 1f 40 00 f6
c4 02 0f 85 05 02 00 00 31 ff e8 20 d0 62 ff e8 3b b4 6a ff fb 0f 1f
44 00 00 <45> 85 ed 0f 88 e2 00 00 00 49 63 cd 4c 89 f2 48 8d 04 49 48
8d 04
[  550.442582] RSP: 0018:ffffb11a4025fe98 EFLAGS: 00000206
[  550.442589] RAX: 00000000001058c9 RBX: ffff9a822f5bb000 RCX: 0000000000000001
[  550.442823] RSP: 0018:ffffb11a40798d70 EFLAGS: 00010002
[  550.442830] RAX: 0000000000000000 RBX: 0000008046b8fe15 RCX: 0000000000000001
[  550.442835] RDX: 0000000000000000 RSI: ffffffffb57ceec9 RDI: ffffffffb587927e
[  550.442841] RBP: ffff9a8223960640 R08: 0000000000000002 R09: 0000000024924924
[  550.442848] R10: 0000000000000001 R11: 0000000000000000 R12: ffffb11a40798d70
[  550.442854] R13: 0000000000000002 R14: 00000000ffffffff R15: ffff9a822fad0488
[  550.442861] FS:  0000000000000000(0000) GS:ffff9a90ea200000(0000)
knlGS:0000000000000000
[  550.442867] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  550.442873] CR2: 0000000000000008 CR3: 000000031dfee000 CR4: 0000000000350ee0
[  550.442880] Kernel panic - not syncing: Fatal exception in interrupt
[  551.913615] Shutting down cpus with NMI
[  551.913641] Kernel Offset: 0x33000000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  551.913652] ---[ end Kernel panic - not syncing: Fatal exception in
interrupt ]---

Bisecting blames this commit:
aab9cf7b6954136f4339136a1a7fc0602a2c4d8b is the first bad commit
commit aab9cf7b6954136f4339136a1a7fc0602a2c4d8b
Author: Christian König <christian.koenig@amd.com>
Date:   Thu Sep 29 13:34:34 2022 +0200

    drm/amdgpu: use scheduler dependencies for VM updates

    Instead of putting that into the job sync object.

    Signed-off-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-9-christian.koenig@amd.com

 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c    | 56 +++++++++++++++++++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h    |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 10 ++++--
 3 files changed, 52 insertions(+), 16 deletions(-)

Unfortunately, I couldn't verify the bisect result: with this commit
reverted, the kernel no longer builds and fails with this
compilation error:
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c: In function ‘amdgpu_cs_sync_rings’:
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1209:21: error: implicit
declaration of function ‘amdgpu_sync_push_to_job’
[-Werror=implicit-function-declaration]
 1209 |                 r = amdgpu_sync_push_to_job(&p->sync, p->jobs[i]);
      |                     ^~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
make[5]: *** [scripts/Makefile.build:250:
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.o] Error 1
make[4]: *** [scripts/Makefile.build:502: drivers/gpu/drm/amd/amdgpu] Error 2
make[4]: *** Waiting for unfinished jobs....

Christian, can you help me verify this finding?

Thanks.

-- 
Best Regards,
Mike Gavrilov.

* Re: [6.2][regression] looks like commit aab9cf7b6954136f4339136a1a7fc0602a2c4d8b leads to use-after-free and random computer hangs
From: Thorsten Leemhuis @ 2022-12-22 12:31 UTC
  To: Mikhail Gavrilov, Christian König, amd-gfx list,
	Christian König, Linux List Kernel Mailing, dri-devel,
	luben.tuikov, regressions

Hi, this is your Linux kernel regression tracker.

On 18.12.22 14:28, Mikhail Gavrilov wrote:
>
> The kernel 6.2 preparation cycle has begun.
> And after the kernel was updated on my Fedora Rawhide I started
> receiving use-after-free errors with complete computer hangs.
> At least a good reproducer of this behaviour is launch of the game
> "Marvel's Avengers".
> 
> The backtrace of the issue looks like:
> [...]

Thanks for your report. I'm not one of the developers for this area of
the kernel, but to my untrained eyes it looks like this patch might fix
your problem:

https://lore.kernel.org/all/20221219104718.21677-1-christian.koenig@amd.com/

Anyway, to be sure the issue doesn't fall through the cracks unnoticed,
I'm adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced aab9cf7b695413
#regzbot title drm: amdgpu: use-after-free and random computer hangs
#regzbot monitor:
https://lore.kernel.org/all/20221219104718.21677-1-christian.koenig@amd.com/
#regzbot fix: drm/amdgpu: grab extra fence reference for
drm_sched_job_add_dependency
#regzbot ignore-activity
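
For readers unfamiliar with this class of bug, here is a minimal
userspace sketch of how a "refcount underflow; use-after-free" can arise
when a function consumes a reference that its caller only borrowed, and
why grabbing an extra reference first avoids it. This is an illustration
only, not the actual patch: toy_fence, toy_fence_get, toy_fence_put,
toy_add_dependency and run_case are hypothetical names, and the idea
that the callee consumes the reference is an assumption drawn from the
title of the proposed fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted fence.
 * This is NOT the kernel's struct dma_fence. */
struct toy_fence {
    int refcount;      /* number of outstanding references */
    bool underflowed;  /* set when a put is attempted on a count of 0 */
};

static void toy_fence_get(struct toy_fence *f)
{
    f->refcount++;
}

static void toy_fence_put(struct toy_fence *f)
{
    if (f->refcount == 0) {
        /* Roughly what the kernel reports as
         * "refcount_t: underflow; use-after-free." */
        f->underflowed = true;
        return;
    }
    f->refcount--;
}

/* Hypothetical consumer (assumption): it takes ownership of one
 * reference and drops it once the dependency is no longer needed. */
static void toy_add_dependency(struct toy_fence *f)
{
    /* ... the dependency would be tracked here ... */
    toy_fence_put(f);
}

/* A caller that only borrows the fence but passes it to the consumer.
 * Returns true if the refcount underflowed. */
static bool run_case(bool grab_extra_ref)
{
    struct toy_fence f = { .refcount = 1, .underflowed = false };

    if (grab_extra_ref)
        toy_fence_get(&f);  /* reference handed over to the consumer */
    toy_add_dependency(&f);
    toy_fence_put(&f);      /* the original owner drops its own reference */
    return f.underflowed;
}
```

In this sketch run_case(false) ends with one put too many, while
run_case(true) keeps gets and puts balanced.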

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.
