* [PATCH] drm/amdgpu: Clear garbage data in err_data before usage
@ 2022-01-06  9:16 Jiawei Gu
  2022-01-06 10:05 ` Zhou1, Tao
  0 siblings, 1 reply; 3+ messages in thread
From: Jiawei Gu @ 2022-01-06  9:16 UTC (permalink / raw)
  To: amd-gfx, John.Clements, Stanley.Yang, Emily.Deng; +Cc: Jiawei Gu

err_data must be cleared before each use when the RAS interrupt
handler (ih) processes multiple entries. Otherwise, garbage data
left over from the previous loop iteration will be used.

Signed-off-by: Jiawei Gu <Jiawei.Gu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 31bad1a20ed0..3f5bf5780ebf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1592,6 +1592,7 @@ static void amdgpu_ras_interrupt_handler(struct ras_manager *obj)
 				/* Let IP handle its data, maybe we need get the output
 				 * from the callback to udpate the error type/count, etc
 				 */
+				memset(&err_data, 0, sizeof(err_data));
 				ret = data->cb(obj->adev, &err_data, &entry);
 				/* ue will trigger an interrupt, and in that case
 				 * we need do a reset to recovery the whole system.
-- 
2.17.1



* RE: [PATCH] drm/amdgpu: Clear garbage data in err_data before usage
  2022-01-06  9:16 [PATCH] drm/amdgpu: Clear garbage data in err_data before usage Jiawei Gu
@ 2022-01-06 10:05 ` Zhou1, Tao
  2022-01-06 10:22   ` Gu, JiaWei (Will)
  0 siblings, 1 reply; 3+ messages in thread
From: Zhou1, Tao @ 2022-01-06 10:05 UTC (permalink / raw)
  To: Gu, JiaWei (Will), amd-gfx, Clements, John, Yang, Stanley, Deng, Emily
  Cc: Gu, JiaWei (Will)

[AMD Official Use Only]

Reviewed-by: Tao Zhou <tao.zhou1@amd.com>

May I know how you reproduced the issue?



* RE: [PATCH] drm/amdgpu: Clear garbage data in err_data before usage
  2022-01-06 10:05 ` Zhou1, Tao
@ 2022-01-06 10:22   ` Gu, JiaWei (Will)
  0 siblings, 0 replies; 3+ messages in thread
From: Gu, JiaWei (Will) @ 2022-01-06 10:22 UTC (permalink / raw)
  To: Zhou1, Tao, amd-gfx, Clements, John, Yang, Stanley, Deng, Emily

[AMD Official Use Only]

Injecting one uncorrectable error on Sienna Cichlid via the ras_ctrl sys node triggers two interrupts.
I was informed that the two interrupts are expected: when the error address is not 64-byte aligned, one 64-byte SDP request is split into two 32-byte requests in the UMC and sent to DRAM.

The handling of the second interrupt then reads the garbage data left in err_data.
As a consequence, the ue counter is increased by 2, and the page at address 0x0 is saved unexpectedly.
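
The effect described above can be illustrated with a minimal user-space sketch. It is not the driver code: `struct err_data_sketch`, `sketch_cb()` and `drain_entries()` are hypothetical names, and the callback behaviour (only the first entry reports an error, later entries leave the structure untouched) is an assumption made for illustration. It only shows why a structure reused across loop iterations needs to be cleared before each callback.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-interrupt error data. */
struct err_data_sketch {
	unsigned long ue_count;
	uint64_t bad_addr;
};

/*
 * Hypothetical callback (assumption): only the first entry reports
 * an uncorrectable error; later entries leave *err untouched.
 */
static void sketch_cb(struct err_data_sketch *err, int entry_index)
{
	assert(err != NULL);
	if (entry_index == 0) {
		err->ue_count = 1;
		err->bad_addr = 0x1000;
	}
}

/*
 * Process `entries` ring entries and return the accumulated ue count.
 * When clear_each is non-zero, err is zeroed before every callback,
 * mirroring the memset() added by the patch.
 */
static unsigned long drain_entries(int entries, int clear_each)
{
	struct err_data_sketch err = {0};
	unsigned long total = 0;
	int i;

	for (i = 0; i < entries; i++) {
		if (clear_each)
			memset(&err, 0, sizeof(err));
		sketch_cb(&err, i);
		/* Without clearing, a stale ue_count is counted again. */
		total += err.ue_count;
	}
	return total;
}
```

Under these assumptions, `drain_entries(2, 0)` counts the single injected error twice, while `drain_entries(2, 1)` counts it once.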

Best regards,
Jiawei  

