* [PATCH 0/2] add correctable error query support on arcturus
@ 2020-04-26  9:16 Guchun Chen
  2020-04-26  9:16 ` [PATCH 1/2] drm/amdgpu: switch to SMN interface to operate RSMU index mode Guchun Chen
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Guchun Chen @ 2020-04-26  9:16 UTC (permalink / raw)
  To: amd-gfx, Hawking.Zhang, Dennis.Li, Tao.Zhou1, John.Clements,
	alexander.deucher
  Cc: Candice.Li, Guchun Chen

The two patches below are submitted to make the UMC correctable error
query work correctly on Arcturus.
Patch 1 switches RSMU UMC index mode access to the SMN interface, which
makes it stable and consistent with the other register accesses in this
file.
Patch 2 decouples the EccErrCnt error count query from the clear
operation, to work around a hardware issue of unknown cause.

Both patches have been verified on Arcturus and Vega20.

Guchun Chen (2):
  drm/amdgpu: switch to SMN interface to operate RSMU index mode
  drm/amdgpu: decouple EccErrCnt query and clear operation.

 drivers/gpu/drm/amd/amdgpu/umc_v6_1.c | 112 +++++++++++++++++++++++---
 1 file changed, 103 insertions(+), 9 deletions(-)

-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* [PATCH 1/2] drm/amdgpu: switch to SMN interface to operate RSMU index mode
  2020-04-26  9:16 [PATCH 0/2] add correctable error query support on arcturus Guchun Chen
@ 2020-04-26  9:16 ` Guchun Chen
  2020-04-26  9:16 ` [PATCH 2/2] drm/amdgpu: decouple EccErrCnt query and clear operation Guchun Chen
  2020-04-26 12:55 ` [PATCH 0/2] add correctable error query support on arcturus Zhou1, Tao
  2 siblings, 0 replies; 4+ messages in thread
From: Guchun Chen @ 2020-04-26  9:16 UTC (permalink / raw)
  To: amd-gfx, Hawking.Zhang, Dennis.Li, Tao.Zhou1, John.Clements,
	alexander.deucher
  Cc: Candice.Li, Guchun Chen

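Access the RSMU UMC index register through the SMN interface
(RREG32_PCIE/WREG32_PCIE), as the other registers in this file already
are. This makes register access consistent within this module.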
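A minimal, self-contained sketch of the read-modify-write pattern the
patch uses for the index-mode enable bit. The register array, the offset
FAKE_RSMU_UMC_INDEX_REG and the bit position of the enable field are
hypothetical stand-ins (the real values come from the generated RSMU
register headers); only the structure mirrors the patch: a dword register
offset is converted to a byte address with "* 4", the register is read,
one field is changed, and the whole value is written back.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the indirect (SMN/PCIE) register space:
 * FAKE_NUM_REGS dword registers, accessed by byte address. */
#define FAKE_NUM_REGS 64u
static uint32_t fake_regs[FAKE_NUM_REGS];

/* Hypothetical dword offset and field layout, for illustration only. */
#define FAKE_RSMU_UMC_INDEX_REG   0x10u
#define FAKE_INDEX_MODE_EN_SHIFT  0u
#define FAKE_INDEX_MODE_EN_MASK   (1u << FAKE_INDEX_MODE_EN_SHIFT)

static uint32_t fake_rreg32_pcie(uint32_t byte_addr)
{
	/* Byte addresses must be dword aligned and inside the array. */
	assert(byte_addr % 4 == 0 && byte_addr / 4 < FAKE_NUM_REGS);
	return fake_regs[byte_addr / 4];
}

static void fake_wreg32_pcie(uint32_t byte_addr, uint32_t val)
{
	assert(byte_addr % 4 == 0 && byte_addr / 4 < FAKE_NUM_REGS);
	fake_regs[byte_addr / 4] = val;
}

/* Same idea as REG_SET_FIELD: clear the field, then OR in the new value. */
static uint32_t set_field(uint32_t orig, uint32_t mask, uint32_t shift,
			  uint32_t v)
{
	return (orig & ~mask) | ((v << shift) & mask);
}

/* Same idea as REG_GET_FIELD: isolate the field and shift it down. */
static uint32_t get_field(uint32_t val, uint32_t mask, uint32_t shift)
{
	return (val & mask) >> shift;
}

/* Read-modify-write of the enable bit; other bits are preserved. */
static void set_index_mode(uint32_t enable)
{
	uint32_t addr = FAKE_RSMU_UMC_INDEX_REG;     /* dword offset */
	uint32_t val = fake_rreg32_pcie(addr * 4);   /* byte address */

	val = set_field(val, FAKE_INDEX_MODE_EN_MASK,
			FAKE_INDEX_MODE_EN_SHIFT, enable);
	fake_wreg32_pcie(addr * 4, val);
}

static uint32_t get_index_mode(void)
{
	return get_field(fake_rreg32_pcie(FAKE_RSMU_UMC_INDEX_REG * 4),
			 FAKE_INDEX_MODE_EN_MASK, FAKE_INDEX_MODE_EN_SHIFT);
}
```

For example, starting from a register value of 0xA0, set_index_mode(1)
leaves 0xA1 and set_index_mode(0) restores 0xA0, so unrelated bits are
untouched by the read-modify-write.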

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/umc_v6_1.c | 29 ++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
index 616eac76eaa7..6d767970b2cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
@@ -56,24 +56,43 @@ const uint32_t
 
 static void umc_v6_1_enable_umc_index_mode(struct amdgpu_device *adev)
 {
-	WREG32_FIELD15(RSMU, 0, RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU,
+	uint32_t rsmu_umc_addr, rsmu_umc_val;
+
+	rsmu_umc_addr = SOC15_REG_OFFSET(RSMU, 0,
+			mmRSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU);
+	rsmu_umc_val = RREG32_PCIE(rsmu_umc_addr * 4);
+
+	rsmu_umc_val = REG_SET_FIELD(rsmu_umc_val,
+			RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU,
 			RSMU_UMC_INDEX_MODE_EN, 1);
+
+	WREG32_PCIE(rsmu_umc_addr * 4, rsmu_umc_val);
 }
 
 static void umc_v6_1_disable_umc_index_mode(struct amdgpu_device *adev)
 {
-	WREG32_FIELD15(RSMU, 0, RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU,
+	uint32_t rsmu_umc_addr, rsmu_umc_val;
+
+	rsmu_umc_addr = SOC15_REG_OFFSET(RSMU, 0,
+			mmRSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU);
+	rsmu_umc_val = RREG32_PCIE(rsmu_umc_addr * 4);
+
+	rsmu_umc_val = REG_SET_FIELD(rsmu_umc_val,
+			RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU,
 			RSMU_UMC_INDEX_MODE_EN, 0);
+
+	WREG32_PCIE(rsmu_umc_addr * 4, rsmu_umc_val);
 }
 
 static uint32_t umc_v6_1_get_umc_index_mode_state(struct amdgpu_device *adev)
 {
-	uint32_t rsmu_umc_index;
+	uint32_t rsmu_umc_addr, rsmu_umc_val;
 
-	rsmu_umc_index = RREG32_SOC15(RSMU, 0,
+	rsmu_umc_addr = SOC15_REG_OFFSET(RSMU, 0,
 			mmRSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU);
+	rsmu_umc_val = RREG32_PCIE(rsmu_umc_addr * 4);
 
-	return REG_GET_FIELD(rsmu_umc_index,
+	return REG_GET_FIELD(rsmu_umc_val,
 			RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU,
 			RSMU_UMC_INDEX_MODE_EN);
 }
-- 
2.17.1


* [PATCH 2/2] drm/amdgpu: decouple EccErrCnt query and clear operation.
  2020-04-26  9:16 [PATCH 0/2] add correctable error query support on arcturus Guchun Chen
  2020-04-26  9:16 ` [PATCH 1/2] drm/amdgpu: switch to SMN interface to operate RSMU index mode Guchun Chen
@ 2020-04-26  9:16 ` Guchun Chen
  2020-04-26 12:55 ` [PATCH 0/2] add correctable error query support on arcturus Zhou1, Tao
  2 siblings, 0 replies; 4+ messages in thread
From: Guchun Chen @ 2020-04-26  9:16 UTC (permalink / raw)
  To: amd-gfx, Hawking.Zhang, Dennis.Li, Tao.Zhou1, John.Clements,
	alexander.deucher
  Cc: Candice.Li, Guchun Chen

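Due to a hardware bug, when RSMU UMC index mode is disabled, clearing
EccErrCnt on the first UMC instance also clears the EccErrCnt registers
of all other instances at the same time. Because the query and clear
were interleaved per channel, this wiped the correctable error counts
logged in the other instances' EccErrCnt registers before they were
queried. So decouple the two operations, querying all instances first
and clearing them afterwards, to make the error count query work.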
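A toy model of the erratum described above, in plain C. It assumes,
hypothetically, four channel instances with a single counter each (the
real code also switches between the lower and higher chip selects) and a
made-up CE_CNT_INIT offset standing in for UMC_V6_1_CE_CNT_INIT. Clearing
instance 0 resets every counter, as the commit message describes. The
sketch compares the old interleaved query/clear flow with the decoupled
one.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_INST    4     /* hypothetical number of channel instances */
#define CE_CNT_INIT 100u  /* hypothetical stand-in for UMC_V6_1_CE_CNT_INIT */

static uint32_t cnt[NUM_INST];

/* Model of the erratum: clearing instance 0 clears every instance. */
static void clear_inst(int inst)
{
	if (inst == 0) {
		for (int i = 0; i < NUM_INST; i++)
			cnt[i] = CE_CNT_INIT;
	} else {
		cnt[inst] = CE_CNT_INIT;
	}
}

/* Errors logged on one instance, relative to the init value. */
static unsigned long query_inst(int inst)
{
	return cnt[inst] - CE_CNT_INIT;
}

/* Old flow: query and clear each instance in turn. */
static unsigned long query_coupled(void)
{
	unsigned long total = 0;

	for (int i = 0; i < NUM_INST; i++) {
		total += query_inst(i);
		clear_inst(i);   /* at i == 0 this also wipes instances 1..3 */
	}
	return total;
}

/* New flow: query every instance first, then clear them all. */
static unsigned long query_decoupled(void)
{
	unsigned long total = 0;

	for (int i = 0; i < NUM_INST; i++)
		total += query_inst(i);
	for (int i = 0; i < NUM_INST; i++)
		clear_inst(i);
	return total;
}

/* Load each counter with CE_CNT_INIT plus a given number of errors. */
static void inject(const uint32_t errs[NUM_INST])
{
	for (int i = 0; i < NUM_INST; i++)
		cnt[i] = CE_CNT_INIT + errs[i];
}
```

With per-instance errors {1, 2, 3, 4} injected, the interleaved flow
reports only the 1 error from instance 0, while the decoupled flow
reports all 10 and still leaves every counter cleared afterwards.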

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/umc_v6_1.c | 83 +++++++++++++++++++++++++--
 1 file changed, 79 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
index 6d767970b2cf..fa889eeb3a17 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
@@ -104,6 +104,81 @@ static inline uint32_t get_umc_6_reg_offset(struct amdgpu_device *adev,
 	return adev->umc.channel_offs*ch_inst + UMC_6_INST_DIST*umc_inst;
 }
 
+static void umc_v6_1_clear_error_count_per_channel(struct amdgpu_device *adev,
+					uint32_t umc_reg_offset)
+{
+	uint32_t ecc_err_cnt_addr;
+	uint32_t ecc_err_cnt_sel, ecc_err_cnt_sel_addr;
+
+	if (adev->asic_type == CHIP_ARCTURUS) {
+		/* UMC 6_1_2 registers */
+		ecc_err_cnt_sel_addr =
+			SOC15_REG_OFFSET(UMC, 0,
+					mmUMCCH0_0_EccErrCntSel_ARCT);
+		ecc_err_cnt_addr =
+			SOC15_REG_OFFSET(UMC, 0,
+					mmUMCCH0_0_EccErrCnt_ARCT);
+	} else {
+		/* UMC 6_1_1 registers */
+		ecc_err_cnt_sel_addr =
+			SOC15_REG_OFFSET(UMC, 0,
+					mmUMCCH0_0_EccErrCntSel);
+		ecc_err_cnt_addr =
+			SOC15_REG_OFFSET(UMC, 0,
+					mmUMCCH0_0_EccErrCnt);
+	}
+
+	/* select the lower chip */
+	ecc_err_cnt_sel = RREG32_PCIE((ecc_err_cnt_sel_addr +
+					umc_reg_offset) * 4);
+	ecc_err_cnt_sel = REG_SET_FIELD(ecc_err_cnt_sel,
+					UMCCH0_0_EccErrCntSel,
+					EccErrCntCsSel, 0);
+	WREG32_PCIE((ecc_err_cnt_sel_addr + umc_reg_offset) * 4,
+			ecc_err_cnt_sel);
+
+	/* clear lower chip error count */
+	WREG32_PCIE((ecc_err_cnt_addr + umc_reg_offset) * 4,
+			UMC_V6_1_CE_CNT_INIT);
+
+	/* select the higher chip */
+	ecc_err_cnt_sel = RREG32_PCIE((ecc_err_cnt_sel_addr +
+					umc_reg_offset) * 4);
+	ecc_err_cnt_sel = REG_SET_FIELD(ecc_err_cnt_sel,
+					UMCCH0_0_EccErrCntSel,
+					EccErrCntCsSel, 1);
+	WREG32_PCIE((ecc_err_cnt_sel_addr + umc_reg_offset) * 4,
+			ecc_err_cnt_sel);
+
+	/* clear higher chip error count */
+	WREG32_PCIE((ecc_err_cnt_addr + umc_reg_offset) * 4,
+			UMC_V6_1_CE_CNT_INIT);
+}
+
+static void umc_v6_1_clear_error_count(struct amdgpu_device *adev)
+{
+	uint32_t umc_inst        = 0;
+	uint32_t ch_inst         = 0;
+	uint32_t umc_reg_offset  = 0;
+	uint32_t rsmu_umc_index_state =
+				umc_v6_1_get_umc_index_mode_state(adev);
+
+	if (rsmu_umc_index_state)
+		umc_v6_1_disable_umc_index_mode(adev);
+
+	LOOP_UMC_INST_AND_CH(umc_inst, ch_inst) {
+		umc_reg_offset = get_umc_6_reg_offset(adev,
+						umc_inst,
+						ch_inst);
+
+		umc_v6_1_clear_error_count_per_channel(adev,
+						umc_reg_offset);
+	}
+
+	if (rsmu_umc_index_state)
+		umc_v6_1_enable_umc_index_mode(adev);
+}
+
 static void umc_v6_1_query_correctable_error_count(struct amdgpu_device *adev,
 						   uint32_t umc_reg_offset,
 						   unsigned long *error_count)
@@ -136,23 +211,21 @@ static void umc_v6_1_query_correctable_error_count(struct amdgpu_device *adev,
 	ecc_err_cnt_sel = REG_SET_FIELD(ecc_err_cnt_sel, UMCCH0_0_EccErrCntSel,
 					EccErrCntCsSel, 0);
 	WREG32_PCIE((ecc_err_cnt_sel_addr + umc_reg_offset) * 4, ecc_err_cnt_sel);
+
 	ecc_err_cnt = RREG32_PCIE((ecc_err_cnt_addr + umc_reg_offset) * 4);
 	*error_count +=
 		(REG_GET_FIELD(ecc_err_cnt, UMCCH0_0_EccErrCnt, EccErrCnt) -
 		 UMC_V6_1_CE_CNT_INIT);
-	/* clear the lower chip err count */
-	WREG32_PCIE((ecc_err_cnt_addr + umc_reg_offset) * 4, UMC_V6_1_CE_CNT_INIT);
 
 	/* select the higher chip and check the err counter */
 	ecc_err_cnt_sel = REG_SET_FIELD(ecc_err_cnt_sel, UMCCH0_0_EccErrCntSel,
 					EccErrCntCsSel, 1);
 	WREG32_PCIE((ecc_err_cnt_sel_addr + umc_reg_offset) * 4, ecc_err_cnt_sel);
+
 	ecc_err_cnt = RREG32_PCIE((ecc_err_cnt_addr + umc_reg_offset) * 4);
 	*error_count +=
 		(REG_GET_FIELD(ecc_err_cnt, UMCCH0_0_EccErrCnt, EccErrCnt) -
 		 UMC_V6_1_CE_CNT_INIT);
-	/* clear the higher chip err count */
-	WREG32_PCIE((ecc_err_cnt_addr + umc_reg_offset) * 4, UMC_V6_1_CE_CNT_INIT);
 
 	/* check for SRAM correctable error
 	  MCUMC_STATUS is a 64 bit register */
@@ -228,6 +301,8 @@ static void umc_v6_1_query_ras_error_count(struct amdgpu_device *adev,
 
 	if (rsmu_umc_index_state)
 		umc_v6_1_enable_umc_index_mode(adev);
+
+	umc_v6_1_clear_error_count(adev);
 }
 
 static void umc_v6_1_query_error_address(struct amdgpu_device *adev,
-- 
2.17.1


* RE: [PATCH 0/2] add correctable error query support on arcturus
  2020-04-26  9:16 [PATCH 0/2] add correctable error query support on arcturus Guchun Chen
  2020-04-26  9:16 ` [PATCH 1/2] drm/amdgpu: switch to SMN interface to operate RSMU index mode Guchun Chen
  2020-04-26  9:16 ` [PATCH 2/2] drm/amdgpu: decouple EccErrCnt query and clear operation Guchun Chen
@ 2020-04-26 12:55 ` Zhou1, Tao
  2 siblings, 0 replies; 4+ messages in thread
From: Zhou1, Tao @ 2020-04-26 12:55 UTC (permalink / raw)
  To: Chen, Guchun, amd-gfx, Zhang, Hawking, Li, Dennis, Clements,
	John, Deucher, Alexander
  Cc: Li, Candice

The series is:

Reviewed-by: Tao Zhou <tao.zhou1@amd.com>

> -----Original Message-----
> From: Chen, Guchun <Guchun.Chen@amd.com>
> Sent: April 26, 2020 17:17
> To: amd-gfx@lists.freedesktop.org; Zhang, Hawking
> <Hawking.Zhang@amd.com>; Li, Dennis <Dennis.Li@amd.com>; Zhou1, Tao
> <Tao.Zhou1@amd.com>; Clements, John <John.Clements@amd.com>;
> Deucher, Alexander <Alexander.Deucher@amd.com>
> Cc: Li, Candice <Candice.Li@amd.com>; Chen, Guchun
> <Guchun.Chen@amd.com>
> Subject: [PATCH 0/2] add correctable error query support on arcturus
> 
> The two patches below are submitted to make the UMC correctable error
> query work correctly on Arcturus.
> Patch 1 switches RSMU UMC index mode access to the SMN interface, which
> makes it stable and consistent with the other register accesses in this
> file.
> Patch 2 decouples the EccErrCnt error count query from the clear
> operation, to work around a hardware issue of unknown cause.
> 
> Both patches have been verified on Arcturus and Vega20.
> 
> Guchun Chen (2):
>   drm/amdgpu: switch to SMN interface to operate RSMU index mode
>   drm/amdgpu: decouple EccErrCnt query and clear operation.
> 
>  drivers/gpu/drm/amd/amdgpu/umc_v6_1.c | 112 +++++++++++++++++++++++---
>  1 file changed, 103 insertions(+), 9 deletions(-)
> 
> --
> 2.17.1