amd-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1] drm/amd/powerplay: Fix NULL dereference in lock_bus() on Vega20 w/o RAS
@ 2020-06-25 16:50 Ivan Mironov
  2020-06-25 16:56 ` Ivan Mironov
  2020-06-25 17:16 ` Alex Deucher
  0 siblings, 2 replies; 3+ messages in thread
From: Ivan Mironov @ 2020-06-25 16:50 UTC (permalink / raw)
  To: amd-gfx
  Cc: Andrey Grodzovsky, Ivan Mironov, Bjorn Nostvold, David Airlie,
	linux-kernel, dri-devel, stable, Daniel Vetter, Alex Deucher,
	Evan Quan, Christian König

I updated my system with Radeon VII from kernel 5.6 to kernel 5.7, and
following started to happen on each boot:

	...
	BUG: kernel NULL pointer dereference, address: 0000000000000128
	...
	CPU: 9 PID: 1940 Comm: modprobe Tainted: G            E     5.7.2-200.im0.fc32.x86_64 #1
	Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 1407 04/02/2020
	RIP: 0010:lock_bus+0x42/0x60 [amdgpu]
	...
	Call Trace:
	 i2c_smbus_xfer+0x3d/0xf0
	 i2c_default_probe+0xf3/0x130
	 i2c_detect.isra.0+0xfe/0x2b0
	 ? kfree+0xa3/0x200
	 ? kobject_uevent_env+0x11f/0x6a0
	 ? i2c_detect.isra.0+0x2b0/0x2b0
	 __process_new_driver+0x1b/0x20
	 bus_for_each_dev+0x64/0x90
	 ? 0xffffffffc0f34000
	 i2c_register_driver+0x73/0xc0
	 do_one_initcall+0x46/0x200
	 ? _cond_resched+0x16/0x40
	 ? kmem_cache_alloc_trace+0x167/0x220
	 ? do_init_module+0x23/0x260
	 do_init_module+0x5c/0x260
	 __do_sys_init_module+0x14f/0x170
	 do_syscall_64+0x5b/0xf0
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
	...

Error appears when some i2c device driver tries to probe for devices
using adapter registered by `smu_v11_0_i2c_eeprom_control_init()`.
Code supporting this adapter requires `adev->psp.ras.ras` to be not
NULL, which is true only when `amdgpu_ras_init()` detects HW support by
calling `amdgpu_ras_check_supported()`.

Before 9015d60c9ee1, adapter was registered by

	-> amdgpu_device_ip_init()
	  -> amdgpu_ras_recovery_init()
	    -> amdgpu_ras_eeprom_init()
	      -> smu_v11_0_i2c_eeprom_control_init()

after verifying that `adev->psp.ras.ras` is not NULL in
`amdgpu_ras_recovery_init()`. Currently it is registered
unconditionally by

	-> amdgpu_device_ip_init()
	  -> pp_sw_init()
	    -> hwmgr_sw_init()
	      -> vega20_smu_init()
	        -> smu_v11_0_i2c_eeprom_control_init()

Fix simply adds HW support check (ras == NULL => no support) before
calling `smu_v11_0_i2c_eeprom_control_{init,fini}()`.

Please note that there is a chance that similar fix is also required for
CHIP_ARCTURUS. I do not know whether any actual Arcturus hardware without
RAS exist, and whether calling `smu_i2c_eeprom_init()` makes any sense
when there is no HW support.

Cc: stable@vger.kernel.org
Fixes: 9015d60c9ee1 ("drm/amdgpu: Move EEPROM I2C adapter to amdgpu_device")
Signed-off-by: Ivan Mironov <mironov.ivan@gmail.com>
Tested-by: Bjorn Nostvold <bjorn.nostvold@gmail.com>
---
Changelog:

v1:
  - Added "Tested-by" for another user who used this patch to fix the
    same issue.

v0:
  - Patch introduced.
---
 drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c
index 2fb97554134f..c2e0fbbccf56 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c
@@ -522,9 +522,11 @@ static int vega20_smu_init(struct pp_hwmgr *hwmgr)
 	priv->smu_tables.entry[TABLE_ACTIVITY_MONITOR_COEFF].version = 0x01;
 	priv->smu_tables.entry[TABLE_ACTIVITY_MONITOR_COEFF].size = sizeof(DpmActivityMonitorCoeffInt_t);
 
-	ret = smu_v11_0_i2c_eeprom_control_init(&adev->pm.smu_i2c);
-	if (ret)
-		goto err4;
+	if (adev->psp.ras.ras) {
+		ret = smu_v11_0_i2c_eeprom_control_init(&adev->pm.smu_i2c);
+		if (ret)
+			goto err4;
+	}
 
 	return 0;
 
@@ -560,7 +562,8 @@ static int vega20_smu_fini(struct pp_hwmgr *hwmgr)
 			(struct vega20_smumgr *)(hwmgr->smu_backend);
 	struct amdgpu_device *adev = hwmgr->adev;
 
-	smu_v11_0_i2c_eeprom_control_fini(&adev->pm.smu_i2c);
+	if (adev->psp.ras.ras)
+		smu_v11_0_i2c_eeprom_control_fini(&adev->pm.smu_i2c);
 
 	if (priv) {
 		amdgpu_bo_free_kernel(&priv->smu_tables.entry[TABLE_PPTABLE].handle,
-- 
2.26.2

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v1] drm/amd/powerplay: Fix NULL dereference in lock_bus() on Vega20 w/o RAS
  2020-06-25 16:50 [PATCH v1] drm/amd/powerplay: Fix NULL dereference in lock_bus() on Vega20 w/o RAS Ivan Mironov
@ 2020-06-25 16:56 ` Ivan Mironov
  2020-06-25 17:16 ` Alex Deucher
  1 sibling, 0 replies; 3+ messages in thread
From: Ivan Mironov @ 2020-06-25 16:56 UTC (permalink / raw)
  To: amd-gfx
  Cc: Andrey Grodzovsky, Bjorn Nostvold, David Airlie, linux-kernel,
	dri-devel, stable, Daniel Vetter, Alex Deucher, Evan Quan,
	Christian König

Issue still reproduces on latest 5.8.0-rc2+
(8be3a53e18e0e1a98f288f6c7f5e9da3adbe9c49).

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v1] drm/amd/powerplay: Fix NULL dereference in lock_bus() on Vega20 w/o RAS
  2020-06-25 16:50 [PATCH v1] drm/amd/powerplay: Fix NULL dereference in lock_bus() on Vega20 w/o RAS Ivan Mironov
  2020-06-25 16:56 ` Ivan Mironov
@ 2020-06-25 17:16 ` Alex Deucher
  1 sibling, 0 replies; 3+ messages in thread
From: Alex Deucher @ 2020-06-25 17:16 UTC (permalink / raw)
  To: Ivan Mironov
  Cc: Andrey Grodzovsky, Bjorn Nostvold, David Airlie, LKML,
	Maling list - DRI developers, amd-gfx list, Daniel Vetter,
	Alex Deucher, for 3.8, Evan Quan, Christian König

Applied.  Thanks!

Alex

On Thu, Jun 25, 2020 at 1:14 PM Ivan Mironov <mironov.ivan@gmail.com> wrote:
>
> I updated my system with Radeon VII from kernel 5.6 to kernel 5.7, and
> following started to happen on each boot:
>
>         ...
>         BUG: kernel NULL pointer dereference, address: 0000000000000128
>         ...
>         CPU: 9 PID: 1940 Comm: modprobe Tainted: G            E     5.7.2-200.im0.fc32.x86_64 #1
>         Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 1407 04/02/2020
>         RIP: 0010:lock_bus+0x42/0x60 [amdgpu]
>         ...
>         Call Trace:
>          i2c_smbus_xfer+0x3d/0xf0
>          i2c_default_probe+0xf3/0x130
>          i2c_detect.isra.0+0xfe/0x2b0
>          ? kfree+0xa3/0x200
>          ? kobject_uevent_env+0x11f/0x6a0
>          ? i2c_detect.isra.0+0x2b0/0x2b0
>          __process_new_driver+0x1b/0x20
>          bus_for_each_dev+0x64/0x90
>          ? 0xffffffffc0f34000
>          i2c_register_driver+0x73/0xc0
>          do_one_initcall+0x46/0x200
>          ? _cond_resched+0x16/0x40
>          ? kmem_cache_alloc_trace+0x167/0x220
>          ? do_init_module+0x23/0x260
>          do_init_module+0x5c/0x260
>          __do_sys_init_module+0x14f/0x170
>          do_syscall_64+0x5b/0xf0
>          entry_SYSCALL_64_after_hwframe+0x44/0xa9
>         ...
>
> Error appears when some i2c device driver tries to probe for devices
> using adapter registered by `smu_v11_0_i2c_eeprom_control_init()`.
> Code supporting this adapter requires `adev->psp.ras.ras` to be not
> NULL, which is true only when `amdgpu_ras_init()` detects HW support by
> calling `amdgpu_ras_check_supported()`.
>
> Before 9015d60c9ee1, adapter was registered by
>
>         -> amdgpu_device_ip_init()
>           -> amdgpu_ras_recovery_init()
>             -> amdgpu_ras_eeprom_init()
>               -> smu_v11_0_i2c_eeprom_control_init()
>
> after verifying that `adev->psp.ras.ras` is not NULL in
> `amdgpu_ras_recovery_init()`. Currently it is registered
> unconditionally by
>
>         -> amdgpu_device_ip_init()
>           -> pp_sw_init()
>             -> hwmgr_sw_init()
>               -> vega20_smu_init()
>                 -> smu_v11_0_i2c_eeprom_control_init()
>
> Fix simply adds HW support check (ras == NULL => no support) before
> calling `smu_v11_0_i2c_eeprom_control_{init,fini}()`.
>
> Please note that there is a chance that similar fix is also required for
> CHIP_ARCTURUS. I do not know whether any actual Arcturus hardware without
> RAS exist, and whether calling `smu_i2c_eeprom_init()` makes any sense
> when there is no HW support.
>
> Cc: stable@vger.kernel.org
> Fixes: 9015d60c9ee1 ("drm/amdgpu: Move EEPROM I2C adapter to amdgpu_device")
> Signed-off-by: Ivan Mironov <mironov.ivan@gmail.com>
> Tested-by: Bjorn Nostvold <bjorn.nostvold@gmail.com>
> ---
> Changelog:
>
> v1:
>   - Added "Tested-by" for another user who used this patch to fix the
>     same issue.
>
> v0:
>   - Patch introduced.
> ---
>  drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c
> index 2fb97554134f..c2e0fbbccf56 100644
> --- a/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c
> @@ -522,9 +522,11 @@ static int vega20_smu_init(struct pp_hwmgr *hwmgr)
>         priv->smu_tables.entry[TABLE_ACTIVITY_MONITOR_COEFF].version = 0x01;
>         priv->smu_tables.entry[TABLE_ACTIVITY_MONITOR_COEFF].size = sizeof(DpmActivityMonitorCoeffInt_t);
>
> -       ret = smu_v11_0_i2c_eeprom_control_init(&adev->pm.smu_i2c);
> -       if (ret)
> -               goto err4;
> +       if (adev->psp.ras.ras) {
> +               ret = smu_v11_0_i2c_eeprom_control_init(&adev->pm.smu_i2c);
> +               if (ret)
> +                       goto err4;
> +       }
>
>         return 0;
>
> @@ -560,7 +562,8 @@ static int vega20_smu_fini(struct pp_hwmgr *hwmgr)
>                         (struct vega20_smumgr *)(hwmgr->smu_backend);
>         struct amdgpu_device *adev = hwmgr->adev;
>
> -       smu_v11_0_i2c_eeprom_control_fini(&adev->pm.smu_i2c);
> +       if (adev->psp.ras.ras)
> +               smu_v11_0_i2c_eeprom_control_fini(&adev->pm.smu_i2c);
>
>         if (priv) {
>                 amdgpu_bo_free_kernel(&priv->smu_tables.entry[TABLE_PPTABLE].handle,
> --
> 2.26.2
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-06-25 17:16 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-25 16:50 [PATCH v1] drm/amd/powerplay: Fix NULL dereference in lock_bus() on Vega20 w/o RAS Ivan Mironov
2020-06-25 16:56 ` Ivan Mironov
2020-06-25 17:16 ` Alex Deucher

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).