From: Matt Fagnani <matt.fagnani@bell.net>
To: Thorsten Leemhuis <regressions@leemhuis.info>,
	Lu Baolu <baolu.lu@linux.intel.com>
Cc: Joerg Roedel <jroedel@suse.de>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	LKML <linux-kernel@vger.kernel.org>,
	"regressions@lists.linux.dev" <regressions@lists.linux.dev>,
	Linux PCI <linux-pci@vger.kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>
Subject: Re: [regression, bisected, pci/iommu] Bug 216865 - Black screen when amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled
Date: Tue, 3 Jan 2023 14:06:25 -0500
Message-ID: <52583644-d875-a454-7288-8b00ea0566ae@bell.net>
In-Reply-To: <15d0f9ff-2a56-b3e9-5b45-e6b23300ae3b@leemhuis.info>

I reproduced the problem with 6.2-rc1 in a Fedora 37 installation with 
early kdump enabled as described at 
https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes 
https://github.com/k-hagio/fedora-kexec-tools/blob/master/early-kdump-howto.txt 
I panicked the kernel with sysrq+alt+c. The dmesg saved by kdump
showed warnings at drivers/pci/ats.c:251 pci_disable_pri+0x75/0x80 and
at drivers/pci/ats.c:419 pci_disable_pasid+0x45/0x50, with AMD IOMMU
and amdgpu functions in both traces. Those warnings come from the checks
if (WARN_ON(!pdev->pri_enabled)) and if (WARN_ON(!pdev->pasid_enabled)),
so pci_disable_pri and pci_disable_pasid appear to have been called while
pdev->pri_enabled and pdev->pasid_enabled were both false. Right after
that, a NULL pointer dereference occurred in report_iommu_fault on the
irq/24-AMD-Vi thread, which made amdgpu crash.

[   13.132368] [drm] amdgpu kernel modesetting enabled.
[   13.133766] amdgpu: Topology: Add APU node [0x0:0x0]
[   13.137596] Console: switching to colour dummy device 80x25
[   13.143717] amdgpu 0000:00:01.0: vgaarb: deactivate vga console
[   13.143970] [drm] initializing kernel modesetting (CARRIZO 
0x1002:0x9874 0x103C:0x8332 0xCA).
[   13.144205] [drm] register mmio base: 0xF0400000
[   13.144209] [drm] register mmio size: 262144
[   13.144310] [drm] add ip block number 0 <vi_common>
[   13.144316] [drm] add ip block number 1 <gmc_v8_0>
[   13.144320] [drm] add ip block number 2 <cz_ih>
[   13.144324] [drm] add ip block number 3 <gfx_v8_0>
[   13.144328] [drm] add ip block number 4 <sdma_v3_0>
[   13.144332] [drm] add ip block number 5 <powerplay>
[   13.144336] [drm] add ip block number 6 <dm>
[   13.144340] [drm] add ip block number 7 <uvd_v6_0>
[   13.144343] [drm] add ip block number 8 <vce_v3_0>
[   13.144347] [drm] add ip block number 9 <acp_ip>
[   13.144388] amdgpu 0000:00:01.0: amdgpu: Fetched VBIOS from VFCT
[   13.144397] amdgpu: ATOM BIOS: 113-C75100-031
[   13.144425] [drm] UVD is enabled in physical mode
[   13.144431] [drm] VCE enabled in physical mode
[   13.144435] amdgpu 0000:00:01.0: amdgpu: Trusted Memory Zone (TMZ) 
feature not supported
[   13.144491] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, 
fragment size is 9-bit
[   13.144503] amdgpu 0000:00:01.0: amdgpu: VRAM: 512M 
0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
[   13.144511] amdgpu 0000:00:01.0: amdgpu: GART: 1024M 
0x000000FF00000000 - 0x000000FF3FFFFFFF
[   13.144524] [drm] Detected VRAM RAM=512M, BAR=512M
[   13.144529] [drm] RAM width 64bits UNKNOWN
[   13.144623] [drm] amdgpu: 512M of VRAM memory ready
[   13.144630] [drm] amdgpu: 3572M of GTT memory ready.
[   13.144653] [drm] GART: num cpu pages 262144, num gpu pages 262144
[   13.144705] [drm] PCIE GART of 1024M enabled (table at 
0x000000F400600000).
[   13.158820] amdgpu: hwmgr_sw_init smu backed is smu8_smu
[   13.175036] [drm] Found UVD firmware Version: 1.91 Family ID: 11
[   13.175097] [drm] UVD ENC is disabled
[   13.186675] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
[   13.187879] amdgpu: smu version 27.18.00
[   13.193760] [drm] DM_PPLIB: values for Engine clock
[   13.193773] [drm] DM_PPLIB:     300000
[   13.193776] [drm] DM_PPLIB:     480000
[   13.193779] [drm] DM_PPLIB:     533340
[   13.193781] [drm] DM_PPLIB:     576000
[   13.193784] [drm] DM_PPLIB:     626090
[   13.193786] [drm] DM_PPLIB:     685720
[   13.193788] [drm] DM_PPLIB:     720000
[   13.193791] [drm] DM_PPLIB:     757900
[   13.193793] [drm] DM_PPLIB: Validation clocks:
[   13.193796] [drm] DM_PPLIB:    engine_max_clock: 75790
[   13.193799] [drm] DM_PPLIB:    memory_max_clock: 93300
[   13.193802] [drm] DM_PPLIB:    level           : 8
[   13.193806] [drm] DM_PPLIB: values for Display clock
[   13.193809] [drm] DM_PPLIB:     300000
[   13.193811] [drm] DM_PPLIB:     400000
[   13.193814] [drm] DM_PPLIB:     496560
[   13.193816] [drm] DM_PPLIB:     626090
[   13.193819] [drm] DM_PPLIB:     685720
[   13.193821] [drm] DM_PPLIB:     757900
[   13.193823] [drm] DM_PPLIB:     800000
[   13.193825] [drm] DM_PPLIB:     847060
[   13.193828] [drm] DM_PPLIB: Validation clocks:
[   13.193830] [drm] DM_PPLIB:    engine_max_clock: 75790
[   13.193833] [drm] DM_PPLIB:    memory_max_clock: 93300
[   13.193836] [drm] DM_PPLIB:    level           : 8
[   13.193839] [drm] DM_PPLIB: values for Memory clock
[   13.193842] [drm] DM_PPLIB:     667000
[   13.193844] [drm] DM_PPLIB:     933000
[   13.193847] [drm] DM_PPLIB: Validation clocks:
[   13.193849] [drm] DM_PPLIB:    engine_max_clock: 75790
[   13.193852] [drm] DM_PPLIB:    memory_max_clock: 93300
[   13.193854] [drm] DM_PPLIB:    level           : 8
[   13.193973] [drm] Display Core initialized with v3.2.215!
[   13.309967] [drm] UVD initialized successfully.
[   13.511031] [drm] VCE initialized successfully.
[   13.515217] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[   13.515442] amdgpu: sdma_bitmap: f
[   13.515549] ------------[ cut here ]------------
[   13.515555] WARNING: CPU: 0 PID: 477 at drivers/pci/ats.c:251 
pci_disable_pri+0x75/0x80
[   13.515571] Modules linked in: amdgpu(+) drm_ttm_helper ttm iommu_v2 
hid_logitech_hidpp crct10dif_pclmul drm_buddy crc32_pclmul gpu_sched 
crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel 
sha512_ssse3 drm_display_helper wdat_wdt serio_raw hid_multitouch 
sp5100_tco hid_logitech_dj r8169 cec video wmi scsi_dh_rdac scsi_dh_emc 
scsi_dh_alua fuse dm_multipath
[   13.515620] CPU: 0 PID: 477 Comm: systemd-udevd Kdump: loaded Not 
tainted 6.2.0-0.rc1.14.fc38.x86_64 #1
[   13.515628] Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.52 
12/03/2019
[   13.515634] RIP: 0010:pci_disable_pri+0x75/0x80
[   13.515642] Code: 54 24 06 89 ee 48 89 df 83 e2 fe 66 89 54 24 06 0f 
b7 d2 e8 1d e1 fc ff 80 a3 4b 08 00 00 fd 48 83 c4 08 5b 5d e9 2b 8b 69 
00 <0f> 0b eb b6 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90
[   13.515651] RSP: 0018:ffffbaf4407ab8e8 EFLAGS: 00010046
[   13.515658] RAX: 0000000000000000 RBX: ffff90aa00ac4000 RCX: 
0000000000000009
[   13.515663] RDX: 0000000000000000 RSI: 0000000000000014 RDI: 
ffff90aa00ac4000
[   13.515668] RBP: ffff90aa0e0c3810 R08: 0000000000000002 R09: 
0000000000000000
[   13.515673] R10: 0000000000000000 R11: ffffffffade4e430 R12: 
ffff90aa011a8800
[   13.515678] R13: ffff90aa0e0c3800 R14: ffff90aa011a8800 R15: 
ffff90aa0e0c3960
[   13.515683] FS:  00007fabd67feb40(0000) GS:ffff90aaf7400000(0000) 
knlGS:0000000000000000
[   13.515689] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   13.515695] CR2: 00007f5689ff54c0 CR3: 0000000100f16000 CR4: 
00000000001506f0
[   13.515700] Call Trace:
[   13.515704]  <TASK>
[   13.515710]  amd_iommu_attach_device+0x2e0/0x300
[   13.515719]  __iommu_attach_device+0x1b/0x90
[   13.515727]  iommu_attach_group+0x65/0xa0
[   13.515735]  amd_iommu_init_device+0x16b/0x250 [iommu_v2]
[   13.515747]  kfd_iommu_resume+0x4c/0x1a0 [amdgpu]
[   13.517094]  kgd2kfd_resume_iommu+0x12/0x30 [amdgpu]
[   13.518419]  kgd2kfd_device_init.cold+0x346/0x49a [amdgpu]
[   13.519699]  amdgpu_amdkfd_device_init+0x142/0x1d0 [amdgpu]
[   13.520877]  amdgpu_device_init.cold+0x19f5/0x1e21 [amdgpu]
[   13.522118]  ? _raw_spin_lock_irqsave+0x23/0x50
[   13.522126]  amdgpu_driver_load_kms+0x15/0x110 [amdgpu]
[   13.523386]  amdgpu_pci_probe+0x161/0x370 [amdgpu]
[   13.524516]  local_pci_probe+0x41/0x80
[   13.524525]  pci_device_probe+0xb3/0x220
[   13.524533]  really_probe+0xde/0x380
[   13.524540]  ? pm_runtime_barrier+0x50/0x90
[   13.524546]  __driver_probe_device+0x78/0x170
[   13.524555]  driver_probe_device+0x1f/0x90
[   13.524560]  __driver_attach+0xce/0x1c0
[   13.524565]  ? __pfx___driver_attach+0x10/0x10
[   13.524570]  bus_for_each_dev+0x73/0xa0
[   13.524575]  bus_add_driver+0x1ae/0x200
[   13.524580]  driver_register+0x89/0xe0
[   13.524586]  ? __pfx_init_module+0x10/0x10 [amdgpu]
[   13.525819]  do_one_initcall+0x59/0x230
[   13.525828]  do_init_module+0x4a/0x200
[   13.525834]  __do_sys_init_module+0x157/0x180
[   13.525839]  do_syscall_64+0x5b/0x80
[   13.525845]  ? handle_mm_fault+0xff/0x2f0
[   13.525850]  ? do_user_addr_fault+0x1ef/0x690
[   13.525856]  ? exc_page_fault+0x70/0x170
[   13.525860]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[   13.525867] RIP: 0033:0x7fabd66cde4e
[   13.525872] Code: 48 8b 0d e5 5f 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 
66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b2 5f 0c 00 f7 d8 64 89 01 48
[   13.525878] RSP: 002b:00007ffdd89bc6a8 EFLAGS: 00000246 ORIG_RAX: 
00000000000000af
[   13.525884] RAX: ffffffffffffffda RBX: 0000563e4d23f0a0 RCX: 
00007fabd66cde4e
[   13.525887] RDX: 00007fabd6817453 RSI: 000000000174fb66 RDI: 
00007fabd3bd4010
[   13.525890] RBP: 00007fabd6817453 R08: 0000563e4d237c70 R09: 
00007fabd672f900
[   13.525893] R10: 0000000000000005 R11: 0000000000000246 R12: 
0000000000020000
[   13.525896] R13: 0000563e4d239060 R14: 0000000000000000 R15: 
0000563e4d23e450
[   13.525900]  </TASK>
[   13.525902] ---[ end trace 0000000000000000 ]---
[   13.525964] ------------[ cut here ]------------
[   13.525966] WARNING: CPU: 0 PID: 477 at drivers/pci/ats.c:419 
pci_disable_pasid+0x45/0x50
[   13.525974] Modules linked in: amdgpu(+) drm_ttm_helper ttm iommu_v2 
hid_logitech_hidpp crct10dif_pclmul drm_buddy crc32_pclmul gpu_sched 
crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel 
sha512_ssse3 drm_display_helper wdat_wdt serio_raw hid_multitouch 
sp5100_tco hid_logitech_dj r8169 cec video wmi scsi_dh_rdac scsi_dh_emc 
scsi_dh_alua fuse dm_multipath
[   13.526006] CPU: 0 PID: 477 Comm: systemd-udevd Kdump: loaded 
Tainted: G        W         -------  ---  6.2.0-0.rc1.14.fc38.x86_64 #1
[   13.526012] Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.52 
12/03/2019
[   13.526015] RIP: 0010:pci_disable_pasid+0x45/0x50
[   13.526020] Code: 53 48 89 fb 85 f6 75 06 5b e9 67 8c 69 00 83 c6 06 
31 d2 e8 3d e2 fc ff 80 a3 4b 08 00 00 fe 5b e9 50 8c 69 00 e9 4b 8c 69 
00 <0f> 0b e9 44 8c 69 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90
[   13.526025] RSP: 0018:ffffbaf4407ab900 EFLAGS: 00010046
[   13.526028] RAX: 0000000000000000 RBX: ffff90aa00ac4000 RCX: 
0000000000000009
[   13.526031] RDX: 0000000000000000 RSI: 0000000000000014 RDI: 
ffff90aa00ac4000
[   13.526034] RBP: ffff90aa0e0c3810 R08: 0000000000000002 R09: 
0000000000000000
[   13.526037] R10: 0000000000000000 R11: ffffffffade4e430 R12: 
ffff90aa011a8800
[   13.526040] R13: ffff90aa0e0c3800 R14: ffff90aa011a8800 R15: 
ffff90aa0e0c3960
[   13.526043] FS:  00007fabd67feb40(0000) GS:ffff90aaf7400000(0000) 
knlGS:0000000000000000
[   13.526047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   13.526050] CR2: 00007f5689ff54c0 CR3: 0000000100f16000 CR4: 
00000000001506f0
[   13.526053] Call Trace:
[   13.526056]  <TASK>
[   13.526058]  amd_iommu_attach_device+0x2e8/0x300
[   13.526064]  __iommu_attach_device+0x1b/0x90
[   13.526070]  iommu_attach_group+0x65/0xa0
[   13.526075]  amd_iommu_init_device+0x16b/0x250 [iommu_v2]
[   13.526083]  kfd_iommu_resume+0x4c/0x1a0 [amdgpu]
[   13.527397]  kgd2kfd_resume_iommu+0x12/0x30 [amdgpu]
[   13.528709]  kgd2kfd_device_init.cold+0x346/0x49a [amdgpu]
[   13.529877]  amdgpu_amdkfd_device_init+0x142/0x1d0 [amdgpu]
[   13.531039]  amdgpu_device_init.cold+0x19f5/0x1e21 [amdgpu]
[   13.532322]  ? _raw_spin_lock_irqsave+0x23/0x50
[   13.532333]  amdgpu_driver_load_kms+0x15/0x110 [amdgpu]
[   13.533642]  amdgpu_pci_probe+0x161/0x370 [amdgpu]
[   13.534758]  local_pci_probe+0x41/0x80
[   13.534766]  pci_device_probe+0xb3/0x220
[   13.534771]  really_probe+0xde/0x380
[   13.534776]  ? pm_runtime_barrier+0x50/0x90
[   13.534781]  __driver_probe_device+0x78/0x170
[   13.534785]  driver_probe_device+0x1f/0x90
[   13.534789]  __driver_attach+0xce/0x1c0
[   13.534793]  ? __pfx___driver_attach+0x10/0x10
[   13.534797]  bus_for_each_dev+0x73/0xa0
[   13.534801]  bus_add_driver+0x1ae/0x200
[   13.534805]  driver_register+0x89/0xe0
[   13.534809]  ? __pfx_init_module+0x10/0x10 [amdgpu]
[   13.536000]  do_one_initcall+0x59/0x230
[   13.536010]  do_init_module+0x4a/0x200
[   13.536015]  __do_sys_init_module+0x157/0x180
[   13.536020]  do_syscall_64+0x5b/0x80
[   13.536025]  ? handle_mm_fault+0xff/0x2f0
[   13.536030]  ? do_user_addr_fault+0x1ef/0x690
[   13.536036]  ? exc_page_fault+0x70/0x170
[   13.536040]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[   13.536047] RIP: 0033:0x7fabd66cde4e
[   13.536051] Code: 48 8b 0d e5 5f 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 
66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b2 5f 0c 00 f7 d8 64 89 01 48
[   13.536057] RSP: 002b:00007ffdd89bc6a8 EFLAGS: 00000246 ORIG_RAX: 
00000000000000af
[   13.536063] RAX: ffffffffffffffda RBX: 0000563e4d23f0a0 RCX: 
00007fabd66cde4e
[   13.536066] RDX: 00007fabd6817453 RSI: 000000000174fb66 RDI: 
00007fabd3bd4010
[   13.536069] RBP: 00007fabd6817453 R08: 0000563e4d237c70 R09: 
00007fabd672f900
[   13.536072] R10: 0000000000000005 R11: 0000000000000246 R12: 
0000000000020000
[   13.536075] R13: 0000563e4d239060 R14: 0000000000000000 R15: 
0000563e4d23e450
[   13.536079]  </TASK>
[   13.536081] ---[ end trace 0000000000000000 ]---
[   13.536117] kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:9874
[   13.537198] kfd kfd: amdgpu: device 1002:9874 NOT added due to errors
[   13.537218] amdgpu 0000:00:01.0: amdgpu: SE 1, SH per SE 1, CU per SH 
8, active_cu_number 6
[   13.537481] BUG: kernel NULL pointer dereference, address: 
0000000000000058
[   13.537499] #PF: supervisor read access in kernel mode
[   13.537504] #PF: error_code(0x0000) - not-present page
[   13.537509] PGD 0 P4D 0
[   13.537515] Oops: 0000 [#1] PREEMPT SMP NOPTI
[   13.537522] CPU: 2 PID: 56 Comm: irq/24-AMD-Vi Kdump: loaded Tainted: 
G        W         -------  ---  6.2.0-0.rc1.14.fc38.x86_64 #1
[   13.537530] Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.52 
12/03/2019
[   13.537534] RIP: 0010:report_iommu_fault+0x11/0x90
[   13.537548] Code: 0f 0b eb cd 0f 1f 44 00 00 90 90 90 90 90 90 90 90 
90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 55 41 54 41 89 cc 55 48 89 d5 
53 <48> 8b 47 48 48 89 f3 48 85 c0 74 64 4c 8b 47 50 e8 da 3f 57 00 41
[   13.537557] RSP: 0018:ffffbaf4403ebe08 EFLAGS: 00010246
[   13.537562] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 
0000000000000000
[   13.537567] RDX: 000000010e9b0400 RSI: ffff90aa00ac40d0 RDI: 
0000000000000010
[   13.537572] RBP: 000000010e9b0400 R08: ffff90aa011a8800 R09: 
0000000000000050
[   13.537576] R10: ffff90aa00244000 R11: 0000000000000000 R12: 
0000000000000000
[   13.537581] R13: ffff90aa0005b000 R14: 0000000000000008 R15: 
0000000000000000
[   13.537585] FS:  0000000000000000(0000) GS:ffff90aaf7500000(0000) 
knlGS:0000000000000000
[   13.537591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   13.537596] CR2: 0000000000000058 CR3: 000000010e22c000 CR4: 
00000000001506e0
[   13.537601] Call Trace:
[   13.537607]  <TASK>
[   13.537612]  amd_iommu_int_thread+0x60c/0x760
[   13.537620]  ? __pfx_irq_thread_fn+0x10/0x10
[   13.537627]  irq_thread_fn+0x1f/0x60
[   13.537633]  irq_thread+0xea/0x1a0
[   13.537638]  ? preempt_count_add+0x6a/0xa0
[   13.537647]  ? __pfx_irq_thread_dtor+0x10/0x10
[   13.537652]  ? __pfx_irq_thread+0x10/0x10
[   13.537657]  kthread+0xe9/0x110
[   13.537662]  ? __pfx_kthread+0x10/0x10
[   13.537667]  ret_from_fork+0x2c/0x50
[   13.537676]  </TASK>
[   13.537678] Modules linked in: amdgpu(+) drm_ttm_helper ttm iommu_v2 
hid_logitech_hidpp crct10dif_pclmul drm_buddy crc32_pclmul gpu_sched 
crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel 
sha512_ssse3 drm_display_helper wdat_wdt serio_raw hid_multitouch 
sp5100_tco hid_logitech_dj r8169 cec video wmi scsi_dh_rdac scsi_dh_emc 
scsi_dh_alua fuse dm_multipath
[   13.537723] CR2: 0000000000000058
[   13.537727] ---[ end trace 0000000000000000 ]---
[   13.537731] RIP: 0010:report_iommu_fault+0x11/0x90
[   13.537737] Code: 0f 0b eb cd 0f 1f 44 00 00 90 90 90 90 90 90 90 90 
90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 55 41 54 41 89 cc 55 48 89 d5 
53 <48> 8b 47 48 48 89 f3 48 85 c0 74 64 4c 8b 47 50 e8 da 3f 57 00 41
[   13.537746] RSP: 0018:ffffbaf4403ebe08 EFLAGS: 00010246
[   13.537751] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 
0000000000000000
[   13.537755] RDX: 000000010e9b0400 RSI: ffff90aa00ac40d0 RDI: 
0000000000000010
[   13.537759] RBP: 000000010e9b0400 R08: ffff90aa011a8800 R09: 
0000000000000050
[   13.537764] R10: ffff90aa00244000 R11: 0000000000000000 R12: 
0000000000000000
[   13.537768] R13: ffff90aa0005b000 R14: 0000000000000008 R15: 
0000000000000000
[   13.537773] FS:  0000000000000000(0000) GS:ffff90aaf7500000(0000) 
knlGS:0000000000000000
[   13.537779] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   13.537783] CR2: 0000000000000058 CR3: 000000010e22c000 CR4: 
00000000001506e0
[   13.537795] genirq: exiting task "irq/24-AMD-Vi" (56) is an active 
IRQ thread (irq 24)
[   13.537808] general protection fault, probably for non-canonical 
address 0x1ee201e8df8948: 0000 [#2] PREEMPT SMP NOPTI
[   13.537815] CPU: 2 PID: 56 Comm: irq/24-AMD-Vi Kdump: loaded Tainted: 
G      D W         -------  ---  6.2.0-0.rc1.14.fc38.x86_64 #1
[   13.537822] Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.52 
12/03/2019
[   13.537825] RIP: 0010:__x86_return_thunk+0x0/0x40
[   13.537833] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 
cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 
f6 <c3> cc 0f ae e8 eb f9 cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e
[   13.537840] RSP: 0018:ffffbaf4403ebeb0 EFLAGS: 00010282
[   13.537844] RAX: 001ee201e8df8948 RBX: fff38839e8df8948 RCX: 
0000000000000000
[   13.537848] RDX: 0000000080000000 RSI: ffff90aa00400b68 RDI: 
ffffffffad106b7f
[   13.537852] RBP: ffff90aa00aa0000 R08: ffff90aa00400c50 R09: 
ffffffffaf143f00
[   13.537856] R10: 0000000000000000 R11: 0000000000000000 R12: 
ffff90aa00aa0cac
[   13.537859] R13: ffff90aa00938001 R14: 0000000000000000 R15: 
0000000000000000
[   13.537863] FS:  0000000000000000(0000) GS:ffff90aaf7500000(0000) 
knlGS:0000000000000000
[   13.537868] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   13.537872] CR2: 0000000000000058 CR3: 000000010e22c000 CR4: 
00000000001506e0
[   13.537876] Call Trace:
[   13.537879]  <TASK>
[   13.537882]  ? task_work_run+0x59/0x90
[   13.537888]  ? do_exit+0x31f/0xaf0
[   13.537894]  ? __pfx_irq_thread_dtor+0x10/0x10
[   13.537900]  ? make_task_dead+0x7a/0x80
[   13.537905]  ? rewind_stack_and_make_dead+0x17/0x20
[   13.537912]  </TASK>
[   13.537914] Modules linked in: amdgpu(+) drm_ttm_helper ttm iommu_v2 
hid_logitech_hidpp crct10dif_pclmul drm_buddy crc32_pclmul gpu_sched 
crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel 
sha512_ssse3 drm_display_helper wdat_wdt serio_raw hid_multitouch 
sp5100_tco hid_logitech_dj r8169 cec video wmi scsi_dh_rdac scsi_dh_emc 
scsi_dh_alua fuse dm_multipath
[   13.537946] ---[ end trace 0000000000000000 ]---
[   13.537950] RIP: 0010:report_iommu_fault+0x11/0x90
[   13.537955] Code: 0f 0b eb cd 0f 1f 44 00 00 90 90 90 90 90 90 90 90 
90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 55 41 54 41 89 cc 55 48 89 d5 
53 <48> 8b 47 48 48 89 f3 48 85 c0 74 64 4c 8b 47 50 e8 da 3f 57 00 41
[   13.537962] RSP: 0018:ffffbaf4403ebe08 EFLAGS: 00010246
[   13.537967] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 
0000000000000000
[   13.537971] RDX: 000000010e9b0400 RSI: ffff90aa00ac40d0 RDI: 
0000000000000010
[   13.537974] RBP: 000000010e9b0400 R08: ffff90aa011a8800 R09: 
0000000000000050
[   13.537978] R10: ffff90aa00244000 R11: 0000000000000000 R12: 
0000000000000000
[   13.537982] R13: ffff90aa0005b000 R14: 0000000000000008 R15: 
0000000000000000
[   13.537986] FS:  0000000000000000(0000) GS:ffff90aaf7500000(0000) 
knlGS:0000000000000000
[   13.537991] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   13.537995] CR2: 0000000000000058 CR3: 000000010e22c000 CR4: 
00000000001506e0
[   13.537999] Fixing recursive fault but reboot is needed!
[   13.538003] check_preemption_disabled: 6 callbacks suppressed
[   13.538005] BUG: using smp_processor_id() in preemptible [00000000] 
code: irq/24-AMD-Vi/56
[   13.538012] caller is __schedule+0x30/0x1390
[   13.538017] CPU: 2 PID: 56 Comm: irq/24-AMD-Vi Kdump: loaded Tainted: 
G      D W         -------  ---  6.2.0-0.rc1.14.fc38.x86_64 #1
[   13.538023] Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.52 
12/03/2019
[   13.538027] Call Trace:
[   13.538030]  <TASK>
[   13.538032]  dump_stack_lvl+0x44/0x5c
[   13.538039]  check_preemption_disabled+0xe1/0xf0
[   13.538045]  __schedule+0x30/0x1390
[   13.538049]  ? __wake_up_klogd.part.0+0x56/0x80
[   13.538055]  ? vprintk_emit+0x11d/0x290
[   13.538061]  ? _printk+0x5a/0x60
[   13.538068]  do_task_dead+0x3f/0x50
[   13.538074]  make_task_dead.cold+0x51/0xba
[   13.538080]  rewind_stack_and_make_dead+0x17/0x20
[   13.538085] RIP: 0000:0x0
[   13.538092] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[   13.538096] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 
0000000000000000
[   13.538101] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 
0000000000000000
[   13.538105] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
0000000000000000
[   13.538108] RBP: 0000000000000000 R08: 0000000000000000 R09: 
0000000000000000
[   13.538112] R10: 0000000000000000 R11: 0000000000000000 R12: 
0000000000000000
[   13.538116] R13: 0000000000000000 R14: 0000000000000000 R15: 
0000000000000000
[   13.538121]  </TASK>
[   13.538124] BUG: scheduling while atomic: irq/24-AMD-Vi/56/0x00000000
[   13.538128] Modules linked in: amdgpu(+) drm_ttm_helper ttm iommu_v2 
hid_logitech_hidpp crct10dif_pclmul drm_buddy crc32_pclmul gpu_sched 
crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel 
sha512_ssse3 drm_display_helper wdat_wdt serio_raw hid_multitouch 
sp5100_tco hid_logitech_dj r8169 cec video wmi scsi_dh_rdac scsi_dh_emc 
scsi_dh_alua fuse dm_multipath
[   13.538159] Preemption disabled at:
[   13.538160] [<0000000000000000>] 0x0
[   13.538166] CPU: 2 PID: 56 Comm: irq/24-AMD-Vi Kdump: loaded Tainted: 
G      D W         -------  ---  6.2.0-0.rc1.14.fc38.x86_64 #1
[   13.538172] Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.52 
12/03/2019
[   13.538175] Call Trace:
[   13.538178]  <TASK>
[   13.538180]  dump_stack_lvl+0x44/0x5c
[   13.538185]  __schedule_bug.cold+0x80/0x8d
[   13.538191]  __schedule+0xf5c/0x1390
[   13.538195]  ? __wake_up_klogd.part.0+0x56/0x80
[   13.538200]  ? vprintk_emit+0x11d/0x290
[   13.538206]  ? _printk+0x5a/0x60
[   13.538211]  do_task_dead+0x3f/0x50
[   13.538216]  make_task_dead.cold+0x51/0xba
[   13.538221]  rewind_stack_and_make_dead+0x17/0x20
[   13.538226] RIP: 0000:0x0
[   13.538231] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[   13.538234] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 
0000000000000000
[   13.538240] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 
0000000000000000
[   13.538243] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
0000000000000000
[   13.538247] RBP: 0000000000000000 R08: 0000000000000000 R09: 
0000000000000000
[   13.538251] R10: 0000000000000000 R11: 0000000000000000 R12: 
0000000000000000
[   13.538254] R13: 0000000000000000 R14: 0000000000000000 R15: 
0000000000000000
[   13.538260]  </TASK>
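
As a rough cross-check of the NULL pointer dereference in the log above:
the bytes at the <48> marker in the report_iommu_fault Code: line are
48 8b 47 48, which decode as a 64-bit load from [rdi+0x48]. With
RDI = 0x10, that load targets 0x58, matching the reported address and CR2.
The Python sketch below only redoes that arithmetic. The helper name
decode_mov_disp8 is hypothetical and handles just this one instruction
form. Which structure field sits at offset 0x48 is not something the log
shows.

```python
# Values copied from the first oops in the log (report_iommu_fault+0x11).
code_bytes = bytes.fromhex("48 8b 47 48")  # bytes starting at the <48> marker
rdi = 0x0000000000000010                   # RDI from the register dump
cr2 = 0x0000000000000058                   # faulting address reported in CR2


def decode_mov_disp8(insn: bytes):
    """Decode only 'REX.W 8B /r' with a register base and an 8-bit displacement.

    Hypothetical helper for illustration; it is not a general x86 decoder.
    """
    rex, opcode, modrm, disp8 = insn
    assert rex == 0x48 and opcode == 0x8B, "only REX.W MOV r64, r/m64 is handled"
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    assert mod == 0b01 and rm != 0b100, "only [base + disp8] without SIB is handled"
    regs = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
    # disp8 is a signed byte in the encoding.
    disp = disp8 - 256 if disp8 >= 0x80 else disp8
    return regs[reg], regs[rm], disp


dest, base, disp = decode_mov_disp8(code_bytes)
fault_addr = rdi + disp
print(f"mov {dest}, [{base}+{disp:#x}] -> {fault_addr:#x}")
```

This prints "mov rax, [rdi+0x48] -> 0x58". A base register value as small
as 0x10 is not a valid kernel pointer, which would be consistent with a
pointer computed as a small offset from NULL, though the log alone does not
confirm where that value came from.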

I tried to use the crash program on the core dump, but it stopped with
this error:

crash: page excluded: kernel virtual address: ffff90aa0044db60 type: "xa_node shift"

I attached the full dmesg file vmcore-dmesg.txt at
https://bugzilla.kernel.org/show_bug.cgi?id=216865#c2
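
For anyone skimming a saved dmesg such as the one quoted in this message,
a small Python sketch along these lines can rejoin mail-wrapped lines and
list only the WARNING/BUG/Oops headlines. The function names unwrap and
headlines are made up for this example, and the timestamp pattern assumes
the "[   13.515555]" style seen above. The attached file may not be wrapped
at all, in which case the rejoining step is simply a no-op.

```python
import re

# Kernel log lines start with a "[ seconds.micros]" stamp; mail wrapping
# leaves continuation lines without one.
STAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")
HEADLINE = re.compile(r"^(WARNING:|BUG:|Oops:|general protection fault)")


def unwrap(lines):
    """Rejoin unstamped continuation lines onto the stamped line before them."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        if STAMP.match(line):
            records.append(line.rstrip())
        elif records and line.strip():
            records[-1] += " " + line.strip()
    return records


def headlines(lines):
    """Return the message text of every warning/bug/oops headline."""
    out = []
    for rec in unwrap(lines):
        msg = STAMP.sub("", rec, count=1)
        if HEADLINE.match(msg):
            out.append(msg)
    return out


# A few lines copied from the log in this message, wrapped as mailed.
sample = [
    "[   13.515555] WARNING: CPU: 0 PID: 477 at drivers/pci/ats.c:251 \n",
    "pci_disable_pri+0x75/0x80\n",
    "[   13.515634] RIP: 0010:pci_disable_pri+0x75/0x80\n",
    "[   13.537481] BUG: kernel NULL pointer dereference, address: \n",
    "0000000000000058\n",
]
for h in headlines(sample):
    print(h)
```

To run it over a whole file, pass an open file object instead of the
sample list, for example headlines(open("vmcore-dmesg.txt")). Any
unstamped text that follows a log line is treated as a continuation, so
prose mixed into the log should be trimmed first.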
