* [PATCH Review 1/1] drm/amdgpu: fix disable ras feature failed when unload drvier
@ 2021-11-26  9:48 Stanley.Yang
  2021-11-26 12:57 ` Zhang, Hawking
  0 siblings, 1 reply; 5+ messages in thread
From: Stanley.Yang @ 2021-11-26  9:48 UTC
  To: amd-gfx, Hawking.Zhang, John.Clements, tao.zhou1, candice.li,
	yipeng.chai
  Cc: Stanley.Yang

Function amdgpu_device_fini_hw is called before amdgpu_device_fini_sw,
so the ras ta is unloaded before the ras disable command is sent. The ras
disable operation must happen before hw fini.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    | 4 ----
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 73ec46140d68..d5e642e90010 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2838,8 +2838,6 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
 	if (amdgpu_sriov_vf(adev) && adev->virt.ras_init_done)
 		amdgpu_virt_release_ras_err_handler_data(adev);
 
-	amdgpu_ras_pre_fini(adev);
-
 	if (adev->gmc.xgmi.num_physical_nodes > 1)
 		amdgpu_xgmi_remove_device(adev);
 
@@ -3959,6 +3957,9 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 
 	amdgpu_fbdev_fini(adev);
 
+	/* ras features must be disabled before hw fini */
+	amdgpu_ras_pre_fini(adev);
+
 	amdgpu_device_ip_fini_early(adev);
 
 	amdgpu_irq_fini_hw(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 39dfd4d59881..65102d2a0a98 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2484,7 +2484,6 @@ void amdgpu_ras_late_fini(struct amdgpu_device *adev,
 	amdgpu_ras_sysfs_remove(adev, ras_block);
 	if (ih_info->cb)
 		amdgpu_ras_interrupt_remove_handler(adev, ih_info);
-	amdgpu_ras_feature_enable(adev, ras_block, 0);
 }
 
 /* do some init work after IP late init as dependence.
@@ -2564,9 +2563,6 @@ int amdgpu_ras_fini(struct amdgpu_device *adev)
 
 	WARN(con->features, "Feature mask is not cleared");
 
-	if (con->features)
-		amdgpu_ras_disable_all_features(adev, 1);
-
 	cancel_delayed_work_sync(&con->ras_counte_delay_work);
 
 	amdgpu_ras_set_context(adev, NULL);
-- 
2.17.1
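
The ordering problem described in the commit message can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: struct toy_dev and every toy_* function are hypothetical names invented for illustration and are not part of amdgpu. The model assumes only what the commit message states, namely that the disable command cannot be delivered once hw fini has unloaded the ras ta.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for device state; not the real struct amdgpu_device. */
struct toy_dev {
	bool ta_loaded;		/* is the ras ta still loaded? */
	unsigned int features;	/* bitmask of enabled ras features */
};

/* The disable command can only be delivered while the ta is loaded. */
static int toy_ras_disable_all(struct toy_dev *dev)
{
	assert(dev);
	if (!dev->ta_loaded)
		return -1;	/* nothing left to receive the command */
	dev->features = 0;
	return 0;
}

/* Models hw fini: the ta is unloaded here. */
static void toy_hw_fini(struct toy_dev *dev)
{
	assert(dev);
	dev->ta_loaded = false;
}

/* Old order: hw fini first, ras disable attempted afterwards. */
static int toy_unload_old_order(struct toy_dev *dev)
{
	toy_hw_fini(dev);
	return toy_ras_disable_all(dev);
}

/* Patched order: ras disable first, then hw fini. */
static int toy_unload_new_order(struct toy_dev *dev)
{
	int ret = toy_ras_disable_all(dev);

	toy_hw_fini(dev);
	return ret;
}
```

In this model, toy_unload_old_order() on a device with features still enabled fails and leaves the mask set, while toy_unload_new_order() succeeds and clears it before the ta goes away.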



* RE: [PATCH Review 1/1] drm/amdgpu: fix disable ras feature failed when unload drvier
  2021-11-26  9:48 [PATCH Review 1/1] drm/amdgpu: fix disable ras feature failed when unload drvier Stanley.Yang
@ 2021-11-26 12:57 ` Zhang, Hawking
  2021-11-26 13:07   ` RE: " Yang, Stanley
  0 siblings, 1 reply; 5+ messages in thread
From: Zhang, Hawking @ 2021-11-26 12:57 UTC
  To: Yang, Stanley, amd-gfx, Clements, John, Zhou1, Tao, Li, Candice,
	Chai, Thomas
  Cc: Yang, Stanley

[AMD Official Use Only]

Good catch. We still need to release the ras object at the end. Any reason this sequence was removed?

@@ -2564,9 +2563,6 @@ int amdgpu_ras_fini(struct amdgpu_device *adev)
 
 	WARN(con->features, "Feature mask is not cleared");
 
-	if (con->features)
-		amdgpu_ras_disable_all_features(adev, 1);
-
 	cancel_delayed_work_sync(&con->ras_counte_delay_work);

Regards,
Hawking


* RE: [PATCH Review 1/1] drm/amdgpu: fix disable ras feature failed when unload drvier
  2021-11-26 12:57 ` Zhang, Hawking
@ 2021-11-26 13:07   ` Yang, Stanley
  2021-11-26 13:11     ` Zhang, Hawking
  0 siblings, 1 reply; 5+ messages in thread
From: Yang, Stanley @ 2021-11-26 13:07 UTC
  To: Zhang, Hawking, amd-gfx, Clements, John, Zhou1, Tao, Li, Candice,
	Chai, Thomas

[AMD Official Use Only]

It's not necessary, because before hw fini, all ras features have been disabled and con->features is set to zero.

Regards,
Stanley

* RE: [PATCH Review 1/1] drm/amdgpu: fix disable ras feature failed when unload drvier
  2021-11-26 13:07   ` RE: " Yang, Stanley
@ 2021-11-26 13:11     ` Zhang, Hawking
  2021-11-26 13:21       ` RE: " Yang, Stanley
  0 siblings, 1 reply; 5+ messages in thread
From: Zhang, Hawking @ 2021-11-26 13:11 UTC
  To: Yang, Stanley, amd-gfx, Clements, John, Zhou1, Tao, Li, Candice,
	Chai, Thomas

[AMD Official Use Only]

I suspect it is still needed, especially when amdgpu_ras_fini is used to deal with ras initialization failure in psp_ras_initialize.

Regards,
Hawking
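
The concern raised here can be sketched with a toy model. Everything below is hypothetical and for illustration only: struct toy_ras and the toy_* functions are invented names, not amdgpu code. The sketch assumes, as suspected above rather than confirmed, that the final teardown can be reached from an initialization failure path that never ran the early disable step, so the feature mask may still be non-zero at that point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ras context; not the real amdgpu structure. */
struct toy_ras {
	unsigned int features;	/* bitmask of enabled ras features */
	bool released;		/* set once the context has been released */
};

/* Models the early disable step that the patch moves ahead of hw fini. */
static void toy_ras_pre_fini(struct toy_ras *ras)
{
	assert(ras);
	ras->features = 0;
}

/*
 * Models the final teardown. keep_guard selects whether the
 * "if (features) disable all" sequence is kept.
 * Returns the feature mask left behind when the context is released.
 */
static unsigned int toy_ras_fini(struct toy_ras *ras, bool keep_guard)
{
	assert(ras);
	if (keep_guard && ras->features)
		ras->features = 0;
	ras->released = true;
	return ras->features;
}
```

On the normal unload path, toy_ras_pre_fini() has already cleared the mask, so the guard is a no-op. If toy_ras_fini() is instead called directly with features still set, dropping the guard leaves a stale mask behind, which is the case the guarded sequence covers in this model.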


* RE: [PATCH Review 1/1] drm/amdgpu: fix disable ras feature failed when unload drvier
  2021-11-26 13:11     ` Zhang, Hawking
@ 2021-11-26 13:21       ` Yang, Stanley
  0 siblings, 0 replies; 5+ messages in thread
From: Yang, Stanley @ 2021-11-26 13:21 UTC
  To: Zhang, Hawking, amd-gfx, Clements, John, Zhou1, Tao, Li, Candice,
	Chai, Thomas

[AMD Official Use Only]

Yes, you are right. I overlooked the ras initialization failure case; I will update the patch soon. Thanks.

Regards,
Stanley
