* [PATCH] drm/amdgpu: Fix NULL pointer issue
@ 2023-10-26  8:36 Jesse Zhang
  2023-10-26  8:54 ` Li, Candice
  0 siblings, 1 reply; 3+ messages in thread
From: Jesse Zhang @ 2023-10-26  8:36 UTC
  To: amd-gfx
  Cc: Alexander.Deucher, Philip.Yang, Felix.Kuehling, Jesse Zhang,
	Yifan1.Zhang

Add a NULL check for the ras pointer in amdgpu_ras_reset_error_count().
The issue was introduced by commit be5c7eb104067d61.

[ 2312.987618] BUG: kernel NULL pointer dereference, address: 00000000000000e8
[ 2312.987622] #PF: supervisor read access in kernel mode
[ 2312.987624] #PF: error_code(0x0000) - not-present page
[ 2312.987625] PGD 0 P4D 0
[ 2312.987627] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 2312.987630] CPU: 9 PID: 1749 Comm: modprobe Not tainted 6.3.7-38fc8aadcfb2 #1
[ 2312.987632] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS TLD1001Bb 12/01/2020
[ 2312.987634] RIP: 0010:amdgpu_ras_reset_error_count+0x126/0x140 [amdgpu]
[ 2312.987852] Code: 10 48 c7 c1 ec 6a 54 c1 77 08 4a 8b 0c ed c0 35 59 c1 48 8b 33 48 c7 c2 78 a7 4d c1 48 c7 c7 60 a4 5c c1 e8 8c 9e ca d0 eb bf <41> 8b 86 e8 00 00 00 85 c0 0f 84 37 ff ff ff e9 26 ff ff ff 31 c0
[ 2312.987855] RSP: 0018:ffffa40402e378e0 EFLAGS: 00010246
[ 2312.987856] RAX: 0000000000000000 RBX: ffff90cf09580000 RCX: 0000000000000000
[ 2312.987858] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff90cf09580000
[ 2312.987859] RBP: ffffa40402e37908 R08: 0000000000000000 R09: c0000000fffeffff
[ 2312.987860] R10: 0000000000000000 R11: ffffa40402e37640 R12: ffffffffc1593d80
[ 2312.987861] R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
[ 2312.987862] FS:  00007fb5d3b33c40(0000) GS:ffff90d006840000(0000) knlGS:0000000000000000
[ 2312.987864] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2312.987865] CR2: 00000000000000e8 CR3: 000000010ae24000 CR4: 0000000000750ee0
[ 2312.987867] PKRU: 55555554
[ 2312.987868] Call Trace:
[ 2312.987870]  <TASK>
[ 2312.987872]  ? show_regs+0x5b/0x70
[ 2312.987877]  ? __die_body+0x1f/0x70
[ 2312.987879]  ? __die+0x2a/0x40
[ 2312.987881]  ? page_fault_oops+0x156/0x470
[ 2312.987884]  ? dev_printk_emit+0x87/0xc0
[ 2312.987889]  ? do_user_addr_fault+0x34a/0x720
[ 2312.987891]  ? exc_page_fault+0x75/0x180
[ 2312.987895]  ? asm_exc_page_fault+0x27/0x30
[ 2312.987898]  ? amdgpu_ras_reset_error_count+0x126/0x140 [amdgpu]
[ 2312.987980]  gmc_v9_0_late_init+0x7f/0xc0 [amdgpu]
[ 2312.988064]  amdgpu_device_ip_late_init+0x49/0x2b0 [amdgpu]
[ 2312.988144]  ? mutex_lock+0x12/0x40
[ 2312.988148]  amdgpu_device_init+0x2253/0x24e0 [amdgpu]
[ 2312.988225]  ? pci_read_config_word+0x23/0x40
[ 2312.988230]  amdgpu_driver_load_kms+0x1a/0x1a0 [amdgpu]
[ 2312.988278]  amdgpu_pci_probe+0x16b/0x4a0 [amdgpu]
[ 2312.988278]  local_pci_probe+0x4a/0xb0
[ 2312.988278]  pci_device_probe+0xd9/0x240
[ 2312.988278]  really_probe+0x116/0x3e0
[ 2312.988278]  ? pm_runtime_barrier+0x55/0xa0
[ 2312.988278]  __driver_probe_device+0x81/0x160
[ 2312.988278]  driver_probe_device+0x24/0xb0
[ 2312.988278]  __driver_attach+0x10e/0x170
[ 2312.988278]  ? __device_attach_driver+0x120/0x120
[ 2312.988278]  bus_for_each_dev+0x7b/0xd0
[ 2312.988278]  driver_attach+0x1e/0x30
[ 2312.988278]  bus_add_driver+0x11d/0x220
[ 2312.988278]  ? 0xffffffffc0b56000
[ 2312.988278]  driver_register+0x5e/0x120
[ 2312.988278]  ? 0xffffffffc0b56000
[ 2312.988278]  __pci_register_driver+0x68/0x70
[ 2312.988278]  amdgpu_init+0x74/0x1000 [amdgpu]
[ 2312.988278]  do_one_initcall+0x48/0x210
[ 2312.988278]  ? kmalloc_trace+0x2a/0xa0
[ 2312.988278]  do_init_module+0x4f/0x1f3
[ 2312.988278]  load_module+0x21fe/0x23f0
[ 2312.988278]  ? kernel_read_file+0x291/0x310
[ 2312.988278]  __do_sys_finit_module+0xc0/0x130
[ 2312.988278]  ? __do_sys_finit_module+0xc0/0x130
[ 2312.988278]  __x64_sys_finit_module+0x1a/0x20
[ 2312.988278]  do_syscall_64+0x3a/0x90
[ 2312.988278]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 303fbb6a48b6..33801a5bb460 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1223,7 +1223,7 @@ int amdgpu_ras_reset_error_count(struct amdgpu_device *adev,
 	struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
 	const struct amdgpu_mca_smu_funcs *mca_funcs = adev->mca.mca_funcs;
 
-	if (!block_obj || !block_obj->hw_ops) {
+	if (!block_obj || !block_obj->hw_ops || !ras) {
 		dev_dbg_once(adev->dev, "%s doesn't config RAS function\n",
 				ras_block_str(block));
 		return -EOPNOTSUPP;
-- 
2.25.1
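
The oops is consistent with a field read through a NULL ras pointer: the
faulting bytes <41> 8b 86 e8 00 00 00 decode to mov 0xe8(%r14),%eax, R14 is 0,
and CR2 is 0xe8. A minimal sketch of the guard pattern in isolation follows.
The struct layout, the field name in_recovery, and the function name are
assumptions made for illustration; they are not the driver's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's RAS context; only the layout matters. */
struct ras_ctx {
	char pad[0xe8];   /* filler so that the next field lands at offset 0xe8 */
	int in_recovery;  /* hypothetical field consulted by the reset path */
};

/* A read of ras->in_recovery through a NULL ras would touch address 0xe8. */
static_assert(offsetof(struct ras_ctx, in_recovery) == 0xe8,
	      "field must sit at the faulting offset");

/* Hypothetical stand-in for the reset function; returns 0 on success. */
int reset_error_count(const struct ras_ctx *ras)
{
	/* Guard before any dereference, as the patch does with !ras. */
	if (!ras)
		return -EOPNOTSUPP;

	/* Safe now: ras is known to be non-NULL. */
	if (ras->in_recovery)
		return -EOPNOTSUPP;

	return 0;
}
```

Calling reset_error_count(NULL) returns -EOPNOTSUPP instead of faulting, which
mirrors the early return in the hunk above.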



* RE: [PATCH] drm/amdgpu: Fix NULL pointer issue
  2023-10-26  8:36 [PATCH] drm/amdgpu: Fix NULL pointer issue Jesse Zhang
@ 2023-10-26  8:54 ` Li, Candice
  2023-10-26  9:08   ` Zhang, Jesse(Jie)
  0 siblings, 1 reply; 3+ messages in thread
From: Li, Candice @ 2023-10-26  8:54 UTC
  To: Zhang, Jesse(Jie), amd-gfx
  Cc: Deucher, Alexander, Yang, Philip, Kuehling, Felix, Zhang,
	 Jesse(Jie),
	Zhang, Yifan

[AMD Official Use Only - General]

Looks like Tao's patch already fixed it: [PATCH] drm/amdgpu: check RAS supported first in ras_reset_error_count



Thanks,
Candice
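
The title of the referenced patch suggests an ordering fix: test whether RAS
is supported before touching the RAS context. The sketch below shows that
ordering in general terms. It is not the code of the referenced patch, and
every name in it (demo_dev, demo_ras, demo_ras_is_supported,
demo_reset_error_count) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical RAS context: one support bit per hardware block. */
struct demo_ras {
	unsigned int supported_mask;
	int in_recovery;
};

/* Hypothetical device: ras stays NULL when RAS was never set up. */
struct demo_dev {
	struct demo_ras *ras;
};

/* Capability test that tolerates a missing context. */
bool demo_ras_is_supported(const struct demo_dev *dev, unsigned int block)
{
	if (!dev->ras || block >= 32)
		return false;
	return (dev->ras->supported_mask >> block) & 1u;
}

int demo_reset_error_count(const struct demo_dev *dev, unsigned int block)
{
	assert(dev != NULL);

	/* Check support first; this also rules out a NULL dev->ras. */
	if (!demo_ras_is_supported(dev, block))
		return -EOPNOTSUPP;

	/* Only now is it safe to read fields of the context. */
	if (dev->ras->in_recovery)
		return -EOPNOTSUPP;

	return 0;
}
```

With this ordering the caller needs no separate pointer check, because the
support test fails for a device without a RAS context.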


* RE: [PATCH] drm/amdgpu: Fix NULL pointer issue
  2023-10-26  8:54 ` Li, Candice
@ 2023-10-26  9:08   ` Zhang, Jesse(Jie)
  0 siblings, 0 replies; 3+ messages in thread
From: Zhang, Jesse(Jie) @ 2023-10-26  9:08 UTC
  To: Li, Candice, amd-gfx
  Cc: Deucher, Alexander, Yang, Philip, Kuehling, Felix, Zhang, Yifan

Looks like Tao's patch already fixed it: [PATCH] drm/amdgpu: check RAS supported first in ras_reset_error_count

[Zhang, Jesse(Jie)] I see it. Thanks for your reminder.

Thanks,
Candice

