All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/amdgpu: Add a filter condition to restrict the SW ras function to be registered only by asics whose hardware supports the ras function
@ 2022-01-12  7:48 yipechai
  2022-01-12  7:48 ` [PATCH 2/2] drm/amdgpu: No longer insert ras blocks into ras_list if it already exists in ras_list yipechai
  2022-01-12  8:27 ` [PATCH 1/2] drm/amdgpu: Add a filter condition to restrict the SW ras function to be registered only by asics whose hardware supports the ras function Zhou1, Tao
  0 siblings, 2 replies; 9+ messages in thread
From: yipechai @ 2022-01-12  7:48 UTC (permalink / raw)
  To: amd-gfx; +Cc: Tao.Zhou1, Hawking.Zhang, John.Clements, yipechai, yipechai

Add a filter condition to restrict the SW ras function to be registered only by asics whose hardware supports the ras function.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index b1bedfd4febc..62be0b4909b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2754,7 +2754,7 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device *adev)
 int amdgpu_ras_register_ras_block(struct amdgpu_device *adev,
 		struct amdgpu_ras_block_object* ras_block_obj)
 {
-	if (!adev || !ras_block_obj)
+	if (!adev || !amdgpu_ras_asic_supported(adev) || !ras_block_obj)
 		return -EINVAL;
 
 	INIT_LIST_HEAD(&ras_block_obj->node);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] drm/amdgpu: No longer insert ras blocks into ras_list if it already exists in ras_list
  2022-01-12  7:48 [PATCH 1/2] drm/amdgpu: Add a filter condition to restrict the SW ras function to be registered only by asics whose hardware supports the ras function yipechai
@ 2022-01-12  7:48 ` yipechai
  2022-01-12  8:36   ` Zhou1, Tao
  2022-01-12 16:39   ` Felix Kuehling
  2022-01-12  8:27 ` [PATCH 1/2] drm/amdgpu: Add a filter condition to restrict the SW ras function to be registered only by asics whose hardware supports the ras function Zhou1, Tao
  1 sibling, 2 replies; 9+ messages in thread
From: yipechai @ 2022-01-12  7:48 UTC (permalink / raw)
  To: amd-gfx; +Cc: Tao.Zhou1, Hawking.Zhang, John.Clements, yipechai, yipechai

No longer insert ras blocks into ras_list if it already exists in ras_list.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 62be0b4909b3..e6d3bb4b56e4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2754,9 +2754,17 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device *adev)
 int amdgpu_ras_register_ras_block(struct amdgpu_device *adev,
 		struct amdgpu_ras_block_object* ras_block_obj)
 {
+	struct amdgpu_ras_block_object *obj, *tmp;
 	if (!adev || !amdgpu_ras_asic_supported(adev) || !ras_block_obj)
 		return -EINVAL;
 
+	/* If the ras object had been in ras_list, doesn't add it to ras_list again */
+	list_for_each_entry_safe(obj, tmp, &adev->ras_list, node) {
+		if (obj == ras_block_obj) {
+			return 0;
+		}
+	}
+
 	INIT_LIST_HEAD(&ras_block_obj->node);
 	list_add_tail(&ras_block_obj->node, &adev->ras_list);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* RE: [PATCH 1/2] drm/amdgpu: Add a filter condition to restrict the SW ras function to be registered only by asics whose hardware supports the ras function
  2022-01-12  7:48 [PATCH 1/2] drm/amdgpu: Add a filter condition to restrict the SW ras function to be registered only by asics whose hardware supports the ras function yipechai
  2022-01-12  7:48 ` [PATCH 2/2] drm/amdgpu: No longer insert ras blocks into ras_list if it already exists in ras_list yipechai
@ 2022-01-12  8:27 ` Zhou1, Tao
  2022-01-12  9:10   ` Chai, Thomas
  1 sibling, 1 reply; 9+ messages in thread
From: Zhou1, Tao @ 2022-01-12  8:27 UTC (permalink / raw)
  To: Chai, Thomas, amd-gfx; +Cc: Clements, John, Zhang, Hawking

[AMD Official Use Only]



> -----Original Message-----
> From: Chai, Thomas <YiPeng.Chai@amd.com>
> Sent: Wednesday, January 12, 2022 3:48 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Chai, Thomas <YiPeng.Chai@amd.com>; Zhang, Hawking
> <Hawking.Zhang@amd.com>; Zhou1, Tao <Tao.Zhou1@amd.com>; Clements,
> John <John.Clements@amd.com>; Chai, Thomas <YiPeng.Chai@amd.com>
> Subject: [PATCH 1/2] drm/amdgpu: Add a filter condition to restrict the SW ras
> function to be registered only by asics whose hardware supports the ras function

[Tao] The subject is too long, I think "add ras supported check for register_ras_block" is enough.

> 
> Add a filter condition to restrict the SW ras function to be registered only by
> asics whose hardware supports the ras function.
> 
> Signed-off-by: yipechai <YiPeng.Chai@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index b1bedfd4febc..62be0b4909b3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2754,7 +2754,7 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device
> *adev)  int amdgpu_ras_register_ras_block(struct amdgpu_device *adev,
>  		struct amdgpu_ras_block_object* ras_block_obj)  {
> -	if (!adev || !ras_block_obj)
> +	if (!adev || !amdgpu_ras_asic_supported(adev) || !ras_block_obj)
>  		return -EINVAL;

[Tao] Can we return 0 if !amdgpu_ras_asic_supported(adev)? It's not an error.

> 
>  	INIT_LIST_HEAD(&ras_block_obj->node);
> --
> 2.25.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 2/2] drm/amdgpu: No longer insert ras blocks into ras_list if it already exists in ras_list
  2022-01-12  7:48 ` [PATCH 2/2] drm/amdgpu: No longer insert ras blocks into ras_list if it already exists in ras_list yipechai
@ 2022-01-12  8:36   ` Zhou1, Tao
  2022-01-12  9:13     ` Chai, Thomas
  2022-01-13 17:54     ` Alex Deucher
  2022-01-12 16:39   ` Felix Kuehling
  1 sibling, 2 replies; 9+ messages in thread
From: Zhou1, Tao @ 2022-01-12  8:36 UTC (permalink / raw)
  To: Chai, Thomas, amd-gfx; +Cc: Clements, John, Zhang, Hawking

[AMD Official Use Only]



> -----Original Message-----
> From: Chai, Thomas <YiPeng.Chai@amd.com>
> Sent: Wednesday, January 12, 2022 3:48 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Chai, Thomas <YiPeng.Chai@amd.com>; Zhang, Hawking
> <Hawking.Zhang@amd.com>; Zhou1, Tao <Tao.Zhou1@amd.com>; Clements,
> John <John.Clements@amd.com>; Chai, Thomas <YiPeng.Chai@amd.com>
> Subject: [PATCH 2/2] drm/amdgpu: No longer insert ras blocks into ras_list if it
> already exists in ras_list
> 
> No longer insert ras blocks into ras_list if it already exists in ras_list.
> 
> Signed-off-by: yipechai <YiPeng.Chai@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 62be0b4909b3..e6d3bb4b56e4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2754,9 +2754,17 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device
> *adev)  int amdgpu_ras_register_ras_block(struct amdgpu_device *adev,
>  		struct amdgpu_ras_block_object* ras_block_obj)  {
> +	struct amdgpu_ras_block_object *obj, *tmp;
>  	if (!adev || !amdgpu_ras_asic_supported(adev) || !ras_block_obj)
>  		return -EINVAL;
> 
> +	/* If the ras object had been in ras_list, doesn't add it to ras_list again */
[Tao] How about "If the ras object is in ras_list, don't add it again"

> +	list_for_each_entry_safe(obj, tmp, &adev->ras_list, node) {
> +		if (obj == ras_block_obj) {
> +			return 0;
> +		}
> +	}

[Tao] The patch is OK for me currently, but I think the root cause is we initialize adev->gmc.xgmi.ras in gmc_ras_late_init, the initialization should be called only in modprobe stage and we can create a general gmc_early_init for it.

> +
>  	INIT_LIST_HEAD(&ras_block_obj->node);
>  	list_add_tail(&ras_block_obj->node, &adev->ras_list);
> 
> --
> 2.25.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 1/2] drm/amdgpu: Add a filter condition to restrict the SW ras function to be registered only by asics whose hardware supports the ras function
  2022-01-12  8:27 ` [PATCH 1/2] drm/amdgpu: Add a filter condition to restrict the SW ras function to be registered only by asics whose hardware supports the ras function Zhou1, Tao
@ 2022-01-12  9:10   ` Chai, Thomas
  0 siblings, 0 replies; 9+ messages in thread
From: Chai, Thomas @ 2022-01-12  9:10 UTC (permalink / raw)
  To: Zhou1, Tao, amd-gfx; +Cc: Clements, John, Zhang, Hawking



-----Original Message-----
From: Zhou1, Tao <Tao.Zhou1@amd.com> 
Sent: Wednesday, January 12, 2022 4:28 PM
To: Chai, Thomas <YiPeng.Chai@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Clements, John <John.Clements@amd.com>
Subject: RE: [PATCH 1/2] drm/amdgpu: Add a filter condition to restrict the SW ras function to be registered only by asics whose hardware supports the ras function

[AMD Official Use Only]



> -----Original Message-----
> From: Chai, Thomas <YiPeng.Chai@amd.com>
> Sent: Wednesday, January 12, 2022 3:48 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Chai, Thomas <YiPeng.Chai@amd.com>; Zhang, Hawking 
> <Hawking.Zhang@amd.com>; Zhou1, Tao <Tao.Zhou1@amd.com>; Clements, 
> John <John.Clements@amd.com>; Chai, Thomas <YiPeng.Chai@amd.com>
> Subject: [PATCH 1/2] drm/amdgpu: Add a filter condition to restrict 
> the SW ras function to be registered only by asics whose hardware 
> supports the ras function

>[Tao] The subject is too long, I think "add ras supported check for register_ras_block" is enough.
[Thomas] Ok.

> 
> Add a filter condition to restrict the SW ras function to be 
> registered only by asics whose hardware supports the ras function.
> 
> Signed-off-by: yipechai <YiPeng.Chai@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index b1bedfd4febc..62be0b4909b3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2754,7 +2754,7 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device
> *adev)  int amdgpu_ras_register_ras_block(struct amdgpu_device *adev,
>  		struct amdgpu_ras_block_object* ras_block_obj)  {
> -	if (!adev || !ras_block_obj)
> +	if (!adev || !amdgpu_ras_asic_supported(adev) || !ras_block_obj)
>  		return -EINVAL;

>[Tao] Can we return 0 if !amdgpu_ras_asic_supported(adev)? It's not an error.
[Thomas] OK.

> 
>  	INIT_LIST_HEAD(&ras_block_obj->node);
> --
> 2.25.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 2/2] drm/amdgpu: No longer insert ras blocks into ras_list if it already exists in ras_list
  2022-01-12  8:36   ` Zhou1, Tao
@ 2022-01-12  9:13     ` Chai, Thomas
  2022-01-13 17:54     ` Alex Deucher
  1 sibling, 0 replies; 9+ messages in thread
From: Chai, Thomas @ 2022-01-12  9:13 UTC (permalink / raw)
  To: Zhou1, Tao, amd-gfx; +Cc: Clements, John, Zhang, Hawking



-----Original Message-----
From: Zhou1, Tao <Tao.Zhou1@amd.com> 
Sent: Wednesday, January 12, 2022 4:37 PM
To: Chai, Thomas <YiPeng.Chai@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Clements, John <John.Clements@amd.com>
Subject: RE: [PATCH 2/2] drm/amdgpu: No longer insert ras blocks into ras_list if it already exists in ras_list

[AMD Official Use Only]



> -----Original Message-----
> From: Chai, Thomas <YiPeng.Chai@amd.com>
> Sent: Wednesday, January 12, 2022 3:48 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Chai, Thomas <YiPeng.Chai@amd.com>; Zhang, Hawking 
> <Hawking.Zhang@amd.com>; Zhou1, Tao <Tao.Zhou1@amd.com>; Clements, 
> John <John.Clements@amd.com>; Chai, Thomas <YiPeng.Chai@amd.com>
> Subject: [PATCH 2/2] drm/amdgpu: No longer insert ras blocks into 
> ras_list if it already exists in ras_list
> 
> No longer insert ras blocks into ras_list if it already exists in ras_list.
> 
> Signed-off-by: yipechai <YiPeng.Chai@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 62be0b4909b3..e6d3bb4b56e4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2754,9 +2754,17 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device
> *adev)  int amdgpu_ras_register_ras_block(struct amdgpu_device *adev,
>  		struct amdgpu_ras_block_object* ras_block_obj)  {
> +	struct amdgpu_ras_block_object *obj, *tmp;
>  	if (!adev || !amdgpu_ras_asic_supported(adev) || !ras_block_obj)
>  		return -EINVAL;
> 
> +	/* If the ras object had been in ras_list, doesn't add it to 
> +ras_list again */
>[Tao] How about "If the ras object is in ras_list, don't add it again"

[Thomas] OK

> +	list_for_each_entry_safe(obj, tmp, &adev->ras_list, node) {
> +		if (obj == ras_block_obj) {
> +			return 0;
> +		}
> +	}

>[Tao] The patch is OK for me currently, but I think the root cause is we initialize adev->gmc.xgmi.ras in gmc_ras_late_init, the initialization should be called only in modprobe stage and we can create a general gmc_early_init for it.

[Thomas] This can create a new task to do it.

> +
>  	INIT_LIST_HEAD(&ras_block_obj->node);
>  	list_add_tail(&ras_block_obj->node, &adev->ras_list);
> 
> --
> 2.25.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: No longer insert ras blocks into ras_list if it already exists in ras_list
  2022-01-12  7:48 ` [PATCH 2/2] drm/amdgpu: No longer insert ras blocks into ras_list if it already exists in ras_list yipechai
  2022-01-12  8:36   ` Zhou1, Tao
@ 2022-01-12 16:39   ` Felix Kuehling
  2022-01-13  2:34     ` Chai, Thomas
  1 sibling, 1 reply; 9+ messages in thread
From: Felix Kuehling @ 2022-01-12 16:39 UTC (permalink / raw)
  To: yipechai, amd-gfx; +Cc: yipechai, Tao.Zhou1, John.Clements, Hawking.Zhang


Am 2022-01-12 um 2:48 a.m. schrieb yipechai:
> No longer insert ras blocks into ras_list if it already exists in ras_list.
>
> Signed-off-by: yipechai <YiPeng.Chai@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 62be0b4909b3..e6d3bb4b56e4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2754,9 +2754,17 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device *adev)
>  int amdgpu_ras_register_ras_block(struct amdgpu_device *adev,
>  		struct amdgpu_ras_block_object* ras_block_obj)
>  {
> +	struct amdgpu_ras_block_object *obj, *tmp;
>  	if (!adev || !amdgpu_ras_asic_supported(adev) || !ras_block_obj)
>  		return -EINVAL;
>  
> +	/* If the ras object had been in ras_list, doesn't add it to ras_list again */
> +	list_for_each_entry_safe(obj, tmp, &adev->ras_list, node) {
> +		if (obj == ras_block_obj) {
Instead of a loop, can't this be done more efficiently with "if
(!list_empty(&ras_block_obj->node))"?

Of course this would require that you move the INIT_LIST_HEAD to some
earlier stage so that list_empty is reliable.

Regards,
  Felix


> +			return 0;
> +		}
> +	}
> +
>  	INIT_LIST_HEAD(&ras_block_obj->node);
>  	list_add_tail(&ras_block_obj->node, &adev->ras_list);
>  

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 2/2] drm/amdgpu: No longer insert ras blocks into ras_list if it already exists in ras_list
  2022-01-12 16:39   ` Felix Kuehling
@ 2022-01-13  2:34     ` Chai, Thomas
  0 siblings, 0 replies; 9+ messages in thread
From: Chai, Thomas @ 2022-01-13  2:34 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx; +Cc: Zhou1, Tao, Clements, John, Zhang, Hawking

Hi Felix:
     amdgpu_ras_register_ras_block was called by all IP ras blocks,  and every ip also has different ras versions.  We do common work together, which can reduce the chance of the ras function going wrong.

-----Original Message-----
From: Kuehling, Felix <Felix.Kuehling@amd.com> 
Sent: Thursday, January 13, 2022 12:39 AM
To: Chai, Thomas <YiPeng.Chai@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Zhou1, Tao <Tao.Zhou1@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>; Clements, John <John.Clements@amd.com>; Chai, Thomas <YiPeng.Chai@amd.com>
Subject: Re: [PATCH 2/2] drm/amdgpu: No longer insert ras blocks into ras_list if it already exists in ras_list


Am 2022-01-12 um 2:48 a.m. schrieb yipechai:
> No longer insert ras blocks into ras_list if it already exists in ras_list.
>
> Signed-off-by: yipechai <YiPeng.Chai@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 62be0b4909b3..e6d3bb4b56e4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2754,9 +2754,17 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device 
> *adev)  int amdgpu_ras_register_ras_block(struct amdgpu_device *adev,
>  		struct amdgpu_ras_block_object* ras_block_obj)  {
> +	struct amdgpu_ras_block_object *obj, *tmp;
>  	if (!adev || !amdgpu_ras_asic_supported(adev) || !ras_block_obj)
>  		return -EINVAL;
>  
> +	/* If the ras object had been in ras_list, doesn't add it to ras_list again */
> +	list_for_each_entry_safe(obj, tmp, &adev->ras_list, node) {
> +		if (obj == ras_block_obj) {
Instead of a loop, can't this be done more efficiently with "if (!list_empty(&ras_block_obj->node))"?

Of course this would require that you move the INIT_LIST_HEAD to some earlier stage so that list_empty is reliable.

Regards,
  Felix


> +			return 0;
> +		}
> +	}
> +
>  	INIT_LIST_HEAD(&ras_block_obj->node);
>  	list_add_tail(&ras_block_obj->node, &adev->ras_list);
>  

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: No longer insert ras blocks into ras_list if it already exists in ras_list
  2022-01-12  8:36   ` Zhou1, Tao
  2022-01-12  9:13     ` Chai, Thomas
@ 2022-01-13 17:54     ` Alex Deucher
  1 sibling, 0 replies; 9+ messages in thread
From: Alex Deucher @ 2022-01-13 17:54 UTC (permalink / raw)
  To: Zhou1, Tao; +Cc: Chai, Thomas, Clements, John, amd-gfx, Zhang, Hawking

On Wed, Jan 12, 2022 at 3:36 AM Zhou1, Tao <Tao.Zhou1@amd.com> wrote:
>
> [AMD Official Use Only]
>
>
>
> > -----Original Message-----
> > From: Chai, Thomas <YiPeng.Chai@amd.com>
> > Sent: Wednesday, January 12, 2022 3:48 PM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Chai, Thomas <YiPeng.Chai@amd.com>; Zhang, Hawking
> > <Hawking.Zhang@amd.com>; Zhou1, Tao <Tao.Zhou1@amd.com>; Clements,
> > John <John.Clements@amd.com>; Chai, Thomas <YiPeng.Chai@amd.com>
> > Subject: [PATCH 2/2] drm/amdgpu: No longer insert ras blocks into ras_list if it
> > already exists in ras_list
> >
> > No longer insert ras blocks into ras_list if it already exists in ras_list.
> >
> > Signed-off-by: yipechai <YiPeng.Chai@amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > index 62be0b4909b3..e6d3bb4b56e4 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > @@ -2754,9 +2754,17 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device
> > *adev)  int amdgpu_ras_register_ras_block(struct amdgpu_device *adev,
> >               struct amdgpu_ras_block_object* ras_block_obj)  {
> > +     struct amdgpu_ras_block_object *obj, *tmp;
> >       if (!adev || !amdgpu_ras_asic_supported(adev) || !ras_block_obj)
> >               return -EINVAL;
> >
> > +     /* If the ras object had been in ras_list, doesn't add it to ras_list again */
> [Tao] How about "If the ras object is in ras_list, don't add it again"
>
> > +     list_for_each_entry_safe(obj, tmp, &adev->ras_list, node) {
> > +             if (obj == ras_block_obj) {
> > +                     return 0;
> > +             }
> > +     }
>
> [Tao] The patch is OK for me currently, but I think the root cause is we initialize adev->gmc.xgmi.ras in gmc_ras_late_init, the initialization should be called only in modprobe stage and we can create a general gmc_early_init for it.

Yes, please fix the root cause.  We should only be adding the blocks
once.  This is just papering over the actual problem.

Alex


>
> > +
> >       INIT_LIST_HEAD(&ras_block_obj->node);
> >       list_add_tail(&ras_block_obj->node, &adev->ras_list);
> >
> > --
> > 2.25.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-01-13 17:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-12  7:48 [PATCH 1/2] drm/amdgpu: Add a filter condition to restrict the SW ras function to be registered only by asics whose hardware supports the ras function yipechai
2022-01-12  7:48 ` [PATCH 2/2] drm/amdgpu: No longer insert ras blocks into ras_list if it already exists in ras_list yipechai
2022-01-12  8:36   ` Zhou1, Tao
2022-01-12  9:13     ` Chai, Thomas
2022-01-13 17:54     ` Alex Deucher
2022-01-12 16:39   ` Felix Kuehling
2022-01-13  2:34     ` Chai, Thomas
2022-01-12  8:27 ` [PATCH 1/2] drm/amdgpu: Add a filter condition to restrict the SW ras function to be registered only by asics whose hardware supports the ras function Zhou1, Tao
2022-01-12  9:10   ` Chai, Thomas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.