* [PATCH] drm/amdgpu: skip mes self test for gc 11.0.3 in recover
@ 2022-10-19  3:45 YuBiao Wang
  2022-10-19  3:53 ` Luben Tuikov
  0 siblings, 1 reply; 5+ messages in thread
From: YuBiao Wang @ 2022-10-19  3:45 UTC (permalink / raw)
  To: amd-gfx
  Cc: YuBiao Wang, Andrey Grodzovsky, Jack Xiao, Feifei Xu,
	horace.chen, Kevin Wang, Tuikov Luben, Deucher Alexander,
	Evan Quan, Christian König, Monk Liu, Hawking Zhang

Temporarily disable the MES self test for gc 11.0.3 during gpu_recovery.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e0445e8cc342..5b8362727226 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5381,7 +5381,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 			drm_sched_start(&ring->sched, !tmp_adev->asic_reset_res);
 		}
 
-		if (adev->enable_mes)
+		if (adev->enable_mes && adev->ip_versions[GC_HWIP][0] != IP_VERSION(11, 0, 3))
 			amdgpu_mes_self_test(tmp_adev);
 
 		if (!drm_drv_uses_atomic_modeset(adev_to_drm(tmp_adev)) && !job_signaled) {
-- 
2.25.1



* Re: [PATCH] drm/amdgpu: skip mes self test for gc 11.0.3 in recover
  2022-10-19  3:45 [PATCH] drm/amdgpu: skip mes self test for gc 11.0.3 in recover YuBiao Wang
@ 2022-10-19  3:53 ` Luben Tuikov
  2022-10-19  4:50   ` Wang, YuBiao
  2022-10-20  2:44   ` Wang, YuBiao
  0 siblings, 2 replies; 5+ messages in thread
From: Luben Tuikov @ 2022-10-19  3:53 UTC (permalink / raw)
  To: YuBiao Wang, amd-gfx
  Cc: Andrey Grodzovsky, Jack Xiao, Feifei Xu, horace.chen, Kevin Wang,
	Deucher Alexander, Evan Quan, Christian König, Monk Liu,
	Hawking Zhang

On 2022-10-18 23:45, YuBiao Wang wrote:
> Temporarily disable the MES self test for gc 11.0.3 during gpu_recovery.
> 

Is this "temporary" as in "we'll revert this commit later", or is
it "temporary" as in the code execution itself?

> Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index e0445e8cc342..5b8362727226 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5381,7 +5381,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>  			drm_sched_start(&ring->sched, !tmp_adev->asic_reset_res);
>  		}
>  
> -		if (adev->enable_mes)
> +		if (adev->enable_mes && adev->ip_versions[GC_HWIP][0] != IP_VERSION(11, 0, 3))
>  			amdgpu_mes_self_test(tmp_adev);

Is this just for this version of the IP or this and any newer versions?

Regards,
Luben

>  
>  		if (!drm_drv_uses_atomic_modeset(adev_to_drm(tmp_adev)) && !job_signaled) {


* RE: [PATCH] drm/amdgpu: skip mes self test for gc 11.0.3 in recover
  2022-10-19  3:53 ` Luben Tuikov
@ 2022-10-19  4:50   ` Wang, YuBiao
  2022-10-20  2:44   ` Wang, YuBiao
  1 sibling, 0 replies; 5+ messages in thread
From: Wang, YuBiao @ 2022-10-19  4:50 UTC (permalink / raw)
  To: Tuikov, Luben, amd-gfx
  Cc: Grodzovsky, Andrey, Xiao, Jack, Wang, Yang(Kevin),
	Xu, Feifei, Chen, Horace, Deucher, Alexander, Quan, Evan, Koenig,
	 Christian, Liu, Monk, Zhang,  Hawking

Hi Luben,

As far as I know, this is only for gc 11.0.3. The MES self test is also skipped in MES late init for this version of the IP.

Best Regards,
Yubiao Wang
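
The difference between "only this IP version" and "this and any newer versions" comes down to the comparison operator used against the packed version number. The following is a minimal user-space sketch of that distinction, not kernel code: the `IP_VERSION()` macro here is a stand-in assumed to pack major/minor/revision the same way the driver does, and `skip_exact()`, `skip_from()` and `self_check()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-in for the driver's version-packing macro (assumed layout). */
#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/* Exact match: skip the self test only on GC 11.0.3, as the patch does. */
static int skip_exact(uint32_t gc_ver)
{
	return gc_ver == IP_VERSION(11, 0, 3);
}

/* Hypothetical alternative: skip on GC 11.0.3 and every newer version. */
static int skip_from(uint32_t gc_ver)
{
	return gc_ver >= IP_VERSION(11, 0, 3);
}

/* Small self-check showing where the two predicates disagree. */
static void self_check(void)
{
	/* Both agree on 11.0.3 itself and on an older version. */
	assert(skip_exact(IP_VERSION(11, 0, 3)));
	assert(skip_from(IP_VERSION(11, 0, 3)));
	assert(!skip_exact(IP_VERSION(11, 0, 2)));
	assert(!skip_from(IP_VERSION(11, 0, 2)));

	/* They differ only for versions newer than 11.0.3. */
	assert(!skip_exact(IP_VERSION(11, 0, 4)));
	assert(skip_from(IP_VERSION(11, 0, 4)));
}
```

Calling `self_check()` from a `main()` exercises both predicates. The patch uses the exact-match form (`!=` in the `if` condition), so newer GC versions keep running the self test.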


* RE: [PATCH] drm/amdgpu: skip mes self test for gc 11.0.3 in recover
  2022-10-19  3:53 ` Luben Tuikov
  2022-10-19  4:50   ` Wang, YuBiao
@ 2022-10-20  2:44   ` Wang, YuBiao
  2022-10-20  7:03     ` Luben Tuikov
  1 sibling, 1 reply; 5+ messages in thread
From: Wang, YuBiao @ 2022-10-20  2:44 UTC (permalink / raw)
  To: Tuikov, Luben, amd-gfx, Zhang, Hawking
  Cc: Grodzovsky, Andrey, Xiao, Jack, Wang, Yang(Kevin),
	Xu, Feifei, Chen, Horace, Deucher, Alexander, Quan, Evan, Koenig,
	 Christian, Liu, Monk

Hi Luben,

>> Is this "temporary" as in "we'll revert this commit later", or is it "temporary" as in the code execution itself?
>> Is this just for this version of the IP or this and any newer versions?

I suppose that it is meant to be reverted later. A similar patch, commit c25a7a8bf19a98578ad27aaaa78082276ea1557c, also temporarily skips the MES self test during MES late init, only for gc_11.0.3; it was reviewed by @Zhang, Hawking. My patch also skips the MES self test during GPU recovery, since the self test causes a failure during reset as well.

Best Regards,
Yubiao Wang
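
Since the same version check now guards the self test in two places (MES late init and GPU recovery), one way to keep a later revert to a single spot is to route both call sites through one predicate. The sketch below is a user-space mock under stated assumptions, not the driver's code: `struct mock_adev`, `mes_self_test_allowed()`, `mock_mes_late_init()` and `mock_gpu_recover()` are hypothetical names, and `IP_VERSION()` is a stand-in for the driver macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-in for the driver's version-packing macro (assumed layout). */
#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/* Hypothetical, heavily reduced device state used only for this sketch. */
struct mock_adev {
	bool enable_mes;      /* mirrors the enable_mes flag checked in the patch */
	uint32_t gc_version;  /* mirrors ip_versions[GC_HWIP][0] */
	int self_tests_run;   /* counts how often the mock self test ran */
};

/* Single place that encodes the temporary GC 11.0.3 exception. */
static bool mes_self_test_allowed(const struct mock_adev *adev)
{
	return adev->enable_mes && adev->gc_version != IP_VERSION(11, 0, 3);
}

/* Mock of the late-init call site. */
static void mock_mes_late_init(struct mock_adev *adev)
{
	if (mes_self_test_allowed(adev))
		adev->self_tests_run++;
}

/* Mock of the GPU-recovery call site. */
static void mock_gpu_recover(struct mock_adev *adev)
{
	if (mes_self_test_allowed(adev))
		adev->self_tests_run++;
}
```

With this shape, dropping the exception later would mean editing only `mes_self_test_allowed()`. The actual patch instead open-codes the condition at the recovery call site, matching how the earlier late-init change was described.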


* Re: [PATCH] drm/amdgpu: skip mes self test for gc 11.0.3 in recover
  2022-10-20  2:44   ` Wang, YuBiao
@ 2022-10-20  7:03     ` Luben Tuikov
  0 siblings, 0 replies; 5+ messages in thread
From: Luben Tuikov @ 2022-10-20  7:03 UTC (permalink / raw)
  To: Wang, YuBiao, amd-gfx, Zhang, Hawking
  Cc: Grodzovsky, Andrey, Xiao, Jack, Wang, Yang(Kevin),
	Xu, Feifei, Chen, Horace, Deucher, Alexander, Quan, Evan, Koenig,
	Christian, Liu, Monk

Hi YuBiao,

Ah, okay, there's a precedent for such a change then.

Acked-by: Luben Tuikov <luben.tuikov@amd.com>

Regards,
Luben
