* [PATCH 1/2] drm/amdgpu: update ras support capability with different sram ecc configuration
@ 2020-03-12  2:54 Guchun Chen
  2020-03-12  2:54 ` [PATCH 2/2] drm/amdgpu: remove mem ecc check for vega20 and arcturus Guchun Chen
  2020-03-12  3:13 ` [PATCH 1/2] drm/amdgpu: update ras support capability with different sram ecc configuration Zhang, Hawking
  0 siblings, 2 replies; 5+ messages in thread
From: Guchun Chen @ 2020-03-12  2:54 UTC
  To: amd-gfx, Hawking.Zhang, Dennis.Li, Tao.Zhou1, John.Clements; +Cc: Guchun Chen

When sram ecc is disabled by vbios, the ras initialization
process in the corresponding IPs that support sram ecc
needs to be skipped. So update the ras support capability
accordingly, based on this configuration. This capability
will block further ras operations on the unsupported IPs.

v2: check HBM ECC enablement and set ras mask accordingly.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 37 +++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 69b02b9d4131..b08226c10d95 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1748,8 +1748,41 @@ static void amdgpu_ras_check_supported(struct amdgpu_device *adev,
 			 amdgpu_atomfirmware_sram_ecc_supported(adev)))
 		*hw_supported = AMDGPU_RAS_BLOCK_MASK;
 
-	*supported = amdgpu_ras_enable == 0 ?
-				0 : *hw_supported & amdgpu_ras_mask;
+	/* Both HBM and SRAM ECC are disabled in vbios. */
+	if (*hw_supported == 0) {
+		DRM_INFO("RAS HW support is disabled as HBM"
+			" and SRAM ECC are not present.\n");
+		return;
+	}
+
+	if (amdgpu_ras_enable) {
+		*supported = *hw_supported;
+
+		/*
+		 * When HBM ECC is disabled in vbios, remove
+		 * UMC's and DF's ras support.
+		 */
+		if (!amdgpu_atomfirmware_mem_ecc_supported(adev)) {
+			DRM_INFO("HBM ECC is disabled and "
+					"remove UMC and DF ras support.\n");
+			*supported &= ~(1 << AMDGPU_RAS_BLOCK__UMC |
+					1 << AMDGPU_RAS_BLOCK__DF);
+		}
+
+		/*
+		 * When sram ecc is disabled in vbios, bypass those IP
+		 * blocks that support sram ecc, and only hold UMC and DF.
+		 */
+		if (!amdgpu_atomfirmware_sram_ecc_supported(adev)) {
+			DRM_INFO("SRAM ECC is disabled and remove ras support "
+					"from IPs that support sram ecc.\n");
+			*supported &= (1 << AMDGPU_RAS_BLOCK__UMC |
+					1 << AMDGPU_RAS_BLOCK__DF);
+		}
+
+		/* ras support needs to align with module parameter */
+		*supported &= amdgpu_ras_mask;
+	}
 }
 
 int amdgpu_ras_init(struct amdgpu_device *adev)
-- 
2.17.1
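
The mask arithmetic in the hunk above can be modelled outside the kernel. The following is a minimal sketch, not the driver code: the block indices, the `ras_config` struct and the `ras_supported_v2` name are hypothetical stand-ins, and it assumes hw_supported is the full block mask whenever either ECC type is reported, as the visible context lines and the "Both HBM and SRAM ECC are disabled" comment suggest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical block indices for illustration only; the real
 * AMDGPU_RAS_BLOCK__* enum has more entries and different values. */
enum ras_block {
	RAS_BLOCK_UMC = 0,
	RAS_BLOCK_SDMA,
	RAS_BLOCK_GFX,
	RAS_BLOCK_DF,
	RAS_BLOCK_COUNT
};

#define RAS_BLOCK_MASK ((1u << RAS_BLOCK_COUNT) - 1u)
#define RAS_MEM_BLOCKS ((1u << RAS_BLOCK_UMC) | (1u << RAS_BLOCK_DF))

/* Stand-ins for the vbios query results and the module parameters. */
struct ras_config {
	bool mem_ecc;      /* HBM ECC reported by vbios */
	bool sram_ecc;     /* SRAM ECC reported by vbios */
	bool ras_enable;   /* models amdgpu_ras_enable */
	uint32_t ras_mask; /* models amdgpu_ras_mask */
};

/* Returns the software-visible mask; *hw_supported receives the HW mask. */
uint32_t ras_supported_v2(const struct ras_config *cfg, uint32_t *hw_supported)
{
	uint32_t supported;

	assert(cfg && hw_supported);

	/* Assumption: any ECC type present means the full HW mask. */
	*hw_supported = (cfg->mem_ecc || cfg->sram_ecc) ? RAS_BLOCK_MASK : 0;

	/* Both HBM and SRAM ECC disabled, or ras disabled by the user. */
	if (*hw_supported == 0 || !cfg->ras_enable)
		return 0;

	supported = *hw_supported;

	/* No HBM ECC: drop UMC and DF. */
	if (!cfg->mem_ecc)
		supported &= ~RAS_MEM_BLOCKS;

	/* No SRAM ECC: keep only UMC and DF. */
	if (!cfg->sram_ecc)
		supported &= RAS_MEM_BLOCKS;

	/* Align with the module parameter. */
	return supported & cfg->ras_mask;
}
```

With HBM ECC only, the result collapses to the UMC and DF bits; with SRAM ECC only, it is every block except those two.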

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* [PATCH 2/2] drm/amdgpu: remove mem ecc check for vega20 and arcturus
  2020-03-12  2:54 [PATCH 1/2] drm/amdgpu: update ras support capability with different sram ecc configuration Guchun Chen
@ 2020-03-12  2:54 ` Guchun Chen
  2020-03-12  3:14   ` Zhang, Hawking
  2020-03-12  3:13 ` [PATCH 1/2] drm/amdgpu: update ras support capability with different sram ecc configuration Zhang, Hawking
  1 sibling, 1 reply; 5+ messages in thread
From: Guchun Chen @ 2020-03-12  2:54 UTC
  To: amd-gfx, Hawking.Zhang, Dennis.Li, Tao.Zhou1, John.Clements; +Cc: Guchun Chen

The memory ecc check, covering both HBM and SRAM, is already
done in the ras init function for vega20 and arcturus. So remove
it from the gmc module and keep this check only for vega10.

Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 90216abf14a4..9bde66a6b432 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -887,28 +887,20 @@ static int gmc_v9_0_late_init(void *handle)
 		return r;
 	/* Check if ecc is available */
 	if (!amdgpu_sriov_vf(adev)) {
-		switch (adev->asic_type) {
-		case CHIP_VEGA10:
-		case CHIP_VEGA20:
-		case CHIP_ARCTURUS:
+		if (adev->asic_type == CHIP_VEGA10) {
 			r = amdgpu_atomfirmware_mem_ecc_supported(adev);
 			if (!r) {
 				DRM_INFO("ECC is not present.\n");
 				if (adev->df.funcs->enable_ecc_force_par_wr_rmw)
 					adev->df.funcs->enable_ecc_force_par_wr_rmw(adev, false);
-			} else {
+			} else
 				DRM_INFO("ECC is active.\n");
-			}
 
 			r = amdgpu_atomfirmware_sram_ecc_supported(adev);
 			if (!r) {
 				DRM_INFO("SRAM ECC is not present.\n");
-			} else {
+			} else
 				DRM_INFO("SRAM ECC is active.\n");
-			}
-			break;
-		default:
-			break;
 		}
 	}
 
-- 
2.17.1


* RE: [PATCH 1/2] drm/amdgpu: update ras support capability with different sram ecc configuration
  2020-03-12  2:54 [PATCH 1/2] drm/amdgpu: update ras support capability with different sram ecc configuration Guchun Chen
  2020-03-12  2:54 ` [PATCH 2/2] drm/amdgpu: remove mem ecc check for vega20 and arcturus Guchun Chen
@ 2020-03-12  3:13 ` Zhang, Hawking
  2020-03-12  3:47   ` Chen, Guchun
  1 sibling, 1 reply; 5+ messages in thread
From: Zhang, Hawking @ 2020-03-12  3:13 UTC
  To: Chen, Guchun, amd-gfx, Li, Dennis, Zhou1, Tao, Clements, John

[AMD Official Use Only - Internal Distribution Only]

Hi Guchun,

It seems to me we still have a redundant function call in amdgpu_ras_check_supported. The atomfirmware interfaces are possibly invoked twice?

As I listed in the steps in the last thread, we can initialize hw_supported to either 0 or 0xfffffff.

Check HBM ECC first, explicitly indicate whether it is present, and set the DF/UMC bits in hw_supported.
Check SRAM ECC, explicitly indicate whether it is present, and set the other IP blocks' mask bits.

After we run all the checks above, set the final ras mask in con->supported.

We'd better keep the messages consistent with what we had in gmc_v9_0_late. No need to highlight which IP blocks get disabled; that should be transparent to users.

Regards,
Hawking
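
The steps suggested above can be sketched as a standalone model. This is an illustration under stated assumptions rather than the eventual v3 patch: `fw_stub`, `fw_mem_ecc_supported`, `fw_sram_ecc_supported`, `ras_check_supported_sketch` and the block indices are all hypothetical names, and the call counters exist only to show that each firmware query runs once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical block indices; not the real AMDGPU_RAS_BLOCK__* values. */
enum { BLK_UMC = 0, BLK_SDMA, BLK_GFX, BLK_DF, BLK_COUNT };

#define ALL_BLOCKS ((1u << BLK_COUNT) - 1u)
#define MEM_BLOCKS ((1u << BLK_UMC) | (1u << BLK_DF))

/* Stub standing in for the atomfirmware queries, with call counters. */
struct fw_stub {
	bool mem_ecc;
	bool sram_ecc;
	unsigned int mem_calls;
	unsigned int sram_calls;
};

bool fw_mem_ecc_supported(struct fw_stub *fw)
{
	fw->mem_calls++;
	return fw->mem_ecc;
}

bool fw_sram_ecc_supported(struct fw_stub *fw)
{
	fw->sram_calls++;
	return fw->sram_ecc;
}

void ras_check_supported_sketch(struct fw_stub *fw, bool ras_enable,
				uint32_t ras_mask, uint32_t *hw_supported,
				uint32_t *supported)
{
	assert(fw && hw_supported && supported);

	/* Start from 0 and add bits as each ECC type is confirmed. */
	*hw_supported = 0;
	*supported = 0;

	/* Step 1: HBM ECC decides the UMC and DF bits. */
	if (fw_mem_ecc_supported(fw))
		*hw_supported |= MEM_BLOCKS;

	/* Step 2: SRAM ECC decides every other IP block bit. */
	if (fw_sram_ecc_supported(fw))
		*hw_supported |= ALL_BLOCKS & ~MEM_BLOCKS;

	/* Final step: apply the user-facing enable flag and mask. */
	if (ras_enable)
		*supported = *hw_supported & ras_mask;
}
```

Building hw_supported bit by bit means the later filtering needs no second firmware query, which is the redundancy raised in the review.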


* RE: [PATCH 2/2] drm/amdgpu: remove mem ecc check for vega20 and arcturus
  2020-03-12  2:54 ` [PATCH 2/2] drm/amdgpu: remove mem ecc check for vega20 and arcturus Guchun Chen
@ 2020-03-12  3:14   ` Zhang, Hawking
  0 siblings, 0 replies; 5+ messages in thread
From: Zhang, Hawking @ 2020-03-12  3:14 UTC
  To: Chen, Guchun, amd-gfx, Li, Dennis, Zhou1, Tao, Clements, John

[AMD Official Use Only - Internal Distribution Only]

I think we can merge this patch with the first one, as both refine the current logic for querying ras capability.

Regards,
Hawking


* RE: [PATCH 1/2] drm/amdgpu: update ras support capability with different sram ecc configuration
  2020-03-12  3:13 ` [PATCH 1/2] drm/amdgpu: update ras support capability with different sram ecc configuration Zhang, Hawking
@ 2020-03-12  3:47   ` Chen, Guchun
  0 siblings, 0 replies; 5+ messages in thread
From: Chen, Guchun @ 2020-03-12  3:47 UTC
  To: Zhang, Hawking, amd-gfx, Li, Dennis, Zhou1,  Tao, Clements, John

[AMD Public Use]

Thanks for your suggestion, Hawking.
I will send a v3 patch to address this.

Regards,
Guchun

