stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init
@ 2022-08-29  8:17 Lijo Lazar
  2022-08-29  8:17 ` [PATCH v2 2/2] drm/amdgpu: Init VF's HDP flush reg offset early Lijo Lazar
  2022-08-29 16:50 ` [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init Alex Deucher
  0 siblings, 2 replies; 13+ messages in thread
From: Lijo Lazar @ 2022-08-29  8:17 UTC (permalink / raw)
  To: amd-gfx
  Cc: Hawking.Zhang, Alexander.Deucher, Felix.Kuehling,
	Christian.Koenig, tseewald, helgaas, sr, stable

HDP flush is used early in the init sequence as part of memory controller
block initialization. Hence remapping of HDP registers needed for flush
needs to happen earlier.

This also fixes the Unsupported Request error reported through AER during
driver load. The error happens as a write happens to the remap offset
before real remapping is done.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373

The error was unnoticed before and got visible because of the commit
referenced below. This doesn't fix anything in the commit below, rather
fixes the issue in amdgpu exposed by the commit. The reference is only
to associate this commit with below one so that both go together.

Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")

Reported-by: Tom Seewald <tseewald@gmail.com>
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Cc: stable@vger.kernel.org
---
v2:
	Take care of IP resume cases (Alex Deucher)
	Add NULL check to nbio.funcs to cover older (GFXv8) ASICs (Felix Kuehling)
	Add more details in commit message and associate with AER patch (Bjorn
Helgaas)

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/nv.c            |  6 ------
 drivers/gpu/drm/amd/amdgpu/soc15.c         |  6 ------
 drivers/gpu/drm/amd/amdgpu/soc21.c         |  6 ------
 4 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ce7d117efdb5..e420118769a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2334,6 +2334,26 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
 	return 0;
 }
 
+/**
+ * amdgpu_device_prepare_ip - prepare IPs for hardware initialization
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Any common hardware initialization sequence that needs to be done before
+ * hw init of individual IPs is performed here. This is different from the
+ * 'common block' which initializes a set of IPs.
+ */
+static void amdgpu_device_prepare_ip(struct amdgpu_device *adev)
+{
+	/* Remap HDP registers to a hole in mmio space, for the purpose
+	 * of exposing those registers to process space. This needs to be
+	 * done before hw init of ip blocks to take care of HDP flush
+	 * operations through registers during hw_init.
+	 */
+	if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers &&
+	    !amdgpu_sriov_vf(adev))
+		adev->nbio.funcs->remap_hdp_registers(adev);
+}
 
 /**
  * amdgpu_device_ip_init - run init for hardware IPs
@@ -2376,6 +2396,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 				DRM_ERROR("amdgpu_vram_scratch_init failed %d\n", r);
 				goto init_failed;
 			}
+
+			amdgpu_device_prepare_ip(adev);
 			r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
 			if (r) {
 				DRM_ERROR("hw_init %d failed %d\n", i, r);
@@ -3058,6 +3080,7 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
 		AMD_IP_BLOCK_TYPE_IH,
 	};
 
+	amdgpu_device_prepare_ip(adev);
 	for (i = 0; i < adev->num_ip_blocks; i++) {
 		int j;
 		struct amdgpu_ip_block *block;
@@ -3139,6 +3162,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev)
 {
 	int i, r;
 
+	amdgpu_device_prepare_ip(adev);
 	for (i = 0; i < adev->num_ip_blocks; i++) {
 		if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw)
 			continue;
diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index b3fba8dea63c..3ac7fef74277 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -1032,12 +1032,6 @@ static int nv_common_hw_init(void *handle)
 	nv_program_aspm(adev);
 	/* setup nbio registers */
 	adev->nbio.funcs->init_registers(adev);
-	/* remap HDP registers to a hole in mmio space,
-	 * for the purpose of expose those registers
-	 * to process space
-	 */
-	if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
-		adev->nbio.funcs->remap_hdp_registers(adev);
 	/* enable the doorbell aperture */
 	nv_enable_doorbell_aperture(adev, true);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index fde6154f2009..a0481e37d7cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -1240,12 +1240,6 @@ static int soc15_common_hw_init(void *handle)
 	soc15_program_aspm(adev);
 	/* setup nbio registers */
 	adev->nbio.funcs->init_registers(adev);
-	/* remap HDP registers to a hole in mmio space,
-	 * for the purpose of expose those registers
-	 * to process space
-	 */
-	if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
-		adev->nbio.funcs->remap_hdp_registers(adev);
 
 	/* enable the doorbell aperture */
 	soc15_enable_doorbell_aperture(adev, true);
diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
index 55284b24f113..16b447055102 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -660,12 +660,6 @@ static int soc21_common_hw_init(void *handle)
 	soc21_program_aspm(adev);
 	/* setup nbio registers */
 	adev->nbio.funcs->init_registers(adev);
-	/* remap HDP registers to a hole in mmio space,
-	 * for the purpose of expose those registers
-	 * to process space
-	 */
-	if (adev->nbio.funcs->remap_hdp_registers)
-		adev->nbio.funcs->remap_hdp_registers(adev);
 	/* enable the doorbell aperture */
 	soc21_enable_doorbell_aperture(adev, true);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 2/2] drm/amdgpu: Init VF's HDP flush reg offset early
  2022-08-29  8:17 [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init Lijo Lazar
@ 2022-08-29  8:17 ` Lijo Lazar
  2022-08-29 16:50 ` [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init Alex Deucher
  1 sibling, 0 replies; 13+ messages in thread
From: Lijo Lazar @ 2022-08-29  8:17 UTC (permalink / raw)
  To: amd-gfx
  Cc: Hawking.Zhang, Alexander.Deucher, Felix.Kuehling,
	Christian.Koenig, tseewald, helgaas, sr, stable

Make sure the register offsets used for HDP flush in VF is
initialized early so that it works fine during any early HDP flush
sequence. For that, move the offset initialization to *_remap_hdp.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +--
 drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c     | 23 +++++++++++++--------
 drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c     | 12 +++++++----
 drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c     | 23 +++++++++++++--------
 drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c     | 21 ++++++++++++-------
 drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c     | 24 ++++++++++++++--------
 drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c     | 23 +++++++++++++--------
 7 files changed, 84 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e420118769a5..9d698b9f4e54 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2350,8 +2350,7 @@ static void amdgpu_device_prepare_ip(struct amdgpu_device *adev)
 	 * done before hw init of ip blocks to take care of HDP flush
 	 * operations through registers during hw_init.
 	 */
-	if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers &&
-	    !amdgpu_sriov_vf(adev))
+	if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers)
 		adev->nbio.funcs->remap_hdp_registers(adev);
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c b/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
index b465baa26762..20fa2c5ad510 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
@@ -65,10 +65,21 @@
 
 static void nbio_v2_3_remap_hdp_registers(struct amdgpu_device *adev)
 {
-	WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
-	WREG32_SOC15(NBIO, 0, mmREMAP_HDP_REG_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	if (amdgpu_sriov_vf(adev))
+		adev->rmmio_remap.reg_offset =
+			SOC15_REG_OFFSET(
+				NBIO, 0,
+				mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL)
+			<< 2;
+
+	if (!amdgpu_sriov_vf(adev)) {
+		WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset +
+				     KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
+		WREG32_SOC15(NBIO, 0, mmREMAP_HDP_REG_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset +
+				     KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	}
 }
 
 static u32 nbio_v2_3_get_rev_id(struct amdgpu_device *adev)
@@ -338,10 +349,6 @@ static void nbio_v2_3_init_registers(struct amdgpu_device *adev)
 
 	if (def != data)
 		WREG32_PCIE(smnPCIE_CONFIG_CNTL, data);
-
-	if (amdgpu_sriov_vf(adev))
-		adev->rmmio_remap.reg_offset = SOC15_REG_OFFSET(NBIO, 0,
-			mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL) << 2;
 }
 
 #define NAVI10_PCIE__LC_L0S_INACTIVITY_DEFAULT		0x00000000 // off by default, no gains over L1
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c b/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c
index 982a89f841d5..e011d9856794 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c
@@ -30,10 +30,14 @@
 
 static void nbio_v4_3_remap_hdp_registers(struct amdgpu_device *adev)
 {
-	WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_MEM_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
-	WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_REG_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	if (!amdgpu_sriov_vf(adev)) {
+		WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_MEM_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset +
+				     KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
+		WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_REG_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset +
+				     KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	}
 }
 
 static u32 nbio_v4_3_get_rev_id(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
index f7f6ddebd3e4..7536ca3fcd69 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
@@ -55,10 +55,21 @@
 
 static void nbio_v6_1_remap_hdp_registers(struct amdgpu_device *adev)
 {
-	WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
-	WREG32_SOC15(NBIO, 0, mmREMAP_HDP_REG_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	if (amdgpu_sriov_vf(adev))
+		adev->rmmio_remap.reg_offset =
+			SOC15_REG_OFFSET(
+				NBIO, 0,
+				mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL)
+			<< 2;
+
+	if (!amdgpu_sriov_vf(adev)) {
+		WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset +
+				     KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
+		WREG32_SOC15(NBIO, 0, mmREMAP_HDP_REG_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset +
+				     KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	}
 }
 
 static u32 nbio_v6_1_get_rev_id(struct amdgpu_device *adev)
@@ -276,10 +287,6 @@ static void nbio_v6_1_init_registers(struct amdgpu_device *adev)
 
 	if (def != data)
 		WREG32_PCIE(smnPCIE_CI_CNTL, data);
-
-	if (amdgpu_sriov_vf(adev))
-		adev->rmmio_remap.reg_offset = SOC15_REG_OFFSET(NBIO, 0,
-			mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL) << 2;
 }
 
 static void nbio_v6_1_program_ltr(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c
index aa0326d00c72..6b4ac16a8466 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c
@@ -35,10 +35,20 @@
 
 static void nbio_v7_0_remap_hdp_registers(struct amdgpu_device *adev)
 {
-	WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
-	WREG32_SOC15(NBIO, 0, mmREMAP_HDP_REG_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	if (amdgpu_sriov_vf(adev))
+		adev->rmmio_remap.reg_offset =
+			SOC15_REG_OFFSET(NBIO, 0,
+					 mmHDP_MEM_COHERENCY_FLUSH_CNTL)
+			<< 2;
+
+	if (!amdgpu_sriov_vf(adev)) {
+		WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset +
+				     KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
+		WREG32_SOC15(NBIO, 0, mmREMAP_HDP_REG_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset +
+				     KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	}
 }
 
 static u32 nbio_v7_0_get_rev_id(struct amdgpu_device *adev)
@@ -273,9 +283,6 @@ const struct nbio_hdp_flush_reg nbio_v7_0_hdp_flush_reg = {
 
 static void nbio_v7_0_init_registers(struct amdgpu_device *adev)
 {
-	if (amdgpu_sriov_vf(adev))
-		adev->rmmio_remap.reg_offset =
-			SOC15_REG_OFFSET(NBIO, 0, mmHDP_MEM_COHERENCY_FLUSH_CNTL) << 2;
 }
 
 const struct amdgpu_nbio_funcs nbio_v7_0_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c
index 31776b12e4c4..fb4be72eade7 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c
@@ -49,10 +49,21 @@
 
 static void nbio_v7_2_remap_hdp_registers(struct amdgpu_device *adev)
 {
-	WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_MEM_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
-	WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_REG_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	if (amdgpu_sriov_vf(adev))
+		adev->rmmio_remap.reg_offset =
+			SOC15_REG_OFFSET(
+				NBIO, 0,
+				regBIF_BX_PF0_HDP_MEM_COHERENCY_FLUSH_CNTL)
+			<< 2;
+
+	if (!amdgpu_sriov_vf(adev)) {
+		WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_MEM_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset +
+				     KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
+		WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_REG_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset +
+				     KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	}
 }
 
 static u32 nbio_v7_2_get_rev_id(struct amdgpu_device *adev)
@@ -369,6 +380,7 @@ const struct nbio_hdp_flush_reg nbio_v7_2_hdp_flush_reg = {
 static void nbio_v7_2_init_registers(struct amdgpu_device *adev)
 {
 	uint32_t def, data;
+
 	switch (adev->ip_versions[NBIO_HWIP][0]) {
 	case IP_VERSION(7, 2, 1):
 	case IP_VERSION(7, 3, 0):
@@ -393,10 +405,6 @@ static void nbio_v7_2_init_registers(struct amdgpu_device *adev)
 			WREG32_PCIE_PORT(SOC15_REG_OFFSET(NBIO, 0, regPCIE_CONFIG_CNTL), data);
 		break;
 	}
-
-	if (amdgpu_sriov_vf(adev))
-		adev->rmmio_remap.reg_offset = SOC15_REG_OFFSET(NBIO, 0,
-			regBIF_BX_PF0_HDP_MEM_COHERENCY_FLUSH_CNTL) << 2;
 }
 
 const struct amdgpu_nbio_funcs nbio_v7_2_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
index 11848d1e238b..3c11af99582f 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
@@ -101,10 +101,21 @@ static void nbio_v7_4_query_ras_error_count(struct amdgpu_device *adev,
 
 static void nbio_v7_4_remap_hdp_registers(struct amdgpu_device *adev)
 {
-	WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
-	WREG32_SOC15(NBIO, 0, mmREMAP_HDP_REG_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	if (amdgpu_sriov_vf(adev))
+		adev->rmmio_remap.reg_offset =
+			SOC15_REG_OFFSET(
+				NBIO, 0,
+				mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL)
+			<< 2;
+
+	if (!amdgpu_sriov_vf(adev)) {
+		WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset +
+				     KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
+		WREG32_SOC15(NBIO, 0, mmREMAP_HDP_REG_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset +
+				     KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	}
 }
 
 static u32 nbio_v7_4_get_rev_id(struct amdgpu_device *adev)
@@ -343,10 +354,6 @@ static void nbio_v7_4_init_registers(struct amdgpu_device *adev)
 {
 	uint32_t baco_cntl;
 
-	if (amdgpu_sriov_vf(adev))
-		adev->rmmio_remap.reg_offset = SOC15_REG_OFFSET(NBIO, 0,
-			mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL) << 2;
-
 	if (adev->ip_versions[NBIO_HWIP][0] == IP_VERSION(7, 4, 4) &&
 	    !amdgpu_sriov_vf(adev)) {
 		baco_cntl = RREG32_SOC15(NBIO, 0, mmBACO_CNTL);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init
  2022-08-29  8:17 [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init Lijo Lazar
  2022-08-29  8:17 ` [PATCH v2 2/2] drm/amdgpu: Init VF's HDP flush reg offset early Lijo Lazar
@ 2022-08-29 16:50 ` Alex Deucher
  2022-08-30  4:04   ` Lazar, Lijo
  1 sibling, 1 reply; 13+ messages in thread
From: Alex Deucher @ 2022-08-29 16:50 UTC (permalink / raw)
  To: Lijo Lazar
  Cc: amd-gfx, Felix.Kuehling, stable, tseewald, helgaas,
	Alexander.Deucher, sr, Christian.Koenig, Hawking.Zhang

[-- Attachment #1: Type: text/plain, Size: 6613 bytes --]

On Mon, Aug 29, 2022 at 4:18 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>
> HDP flush is used early in the init sequence as part of memory controller
> block initialization. Hence remapping of HDP registers needed for flush
> needs to happen earlier.
>
> This also fixes the Unsupported Request error reported through AER during
> driver load. The error happens as a write happens to the remap offset
> before real remapping is done.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373
>
> The error was unnoticed before and got visible because of the commit
> referenced below. This doesn't fix anything in the commit below, rather
> fixes the issue in amdgpu exposed by the commit. The reference is only
> to associate this commit with below one so that both go together.
>
> Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
>
> Reported-by: Tom Seewald <tseewald@gmail.com>
> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
> Cc: stable@vger.kernel.org

How about something like the attached patch rather than these two
patches?  It's a bit bigger but seems cleaner and more defensive in my
opinion.

Alex

> ---
> v2:
>         Take care of IP resume cases (Alex Deucher)
>         Add NULL check to nbio.funcs to cover older (GFXv8) ASICs (Felix Kuehling)
>         Add more details in commit message and associate with AER patch (Bjorn
> Helgaas)
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/nv.c            |  6 ------
>  drivers/gpu/drm/amd/amdgpu/soc15.c         |  6 ------
>  drivers/gpu/drm/amd/amdgpu/soc21.c         |  6 ------
>  4 files changed, 24 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index ce7d117efdb5..e420118769a5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2334,6 +2334,26 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
>         return 0;
>  }
>
> +/**
> + * amdgpu_device_prepare_ip - prepare IPs for hardware initialization
> + *
> + * @adev: amdgpu_device pointer
> + *
> + * Any common hardware initialization sequence that needs to be done before
> + * hw init of individual IPs is performed here. This is different from the
> + * 'common block' which initializes a set of IPs.
> + */
> +static void amdgpu_device_prepare_ip(struct amdgpu_device *adev)
> +{
> +       /* Remap HDP registers to a hole in mmio space, for the purpose
> +        * of exposing those registers to process space. This needs to be
> +        * done before hw init of ip blocks to take care of HDP flush
> +        * operations through registers during hw_init.
> +        */
> +       if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers &&
> +           !amdgpu_sriov_vf(adev))
> +               adev->nbio.funcs->remap_hdp_registers(adev);
> +}
>
>  /**
>   * amdgpu_device_ip_init - run init for hardware IPs
> @@ -2376,6 +2396,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>                                 DRM_ERROR("amdgpu_vram_scratch_init failed %d\n", r);
>                                 goto init_failed;
>                         }
> +
> +                       amdgpu_device_prepare_ip(adev);
>                         r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
>                         if (r) {
>                                 DRM_ERROR("hw_init %d failed %d\n", i, r);
> @@ -3058,6 +3080,7 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
>                 AMD_IP_BLOCK_TYPE_IH,
>         };
>
> +       amdgpu_device_prepare_ip(adev);
>         for (i = 0; i < adev->num_ip_blocks; i++) {
>                 int j;
>                 struct amdgpu_ip_block *block;
> @@ -3139,6 +3162,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev)
>  {
>         int i, r;
>
> +       amdgpu_device_prepare_ip(adev);
>         for (i = 0; i < adev->num_ip_blocks; i++) {
>                 if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw)
>                         continue;
> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
> index b3fba8dea63c..3ac7fef74277 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> @@ -1032,12 +1032,6 @@ static int nv_common_hw_init(void *handle)
>         nv_program_aspm(adev);
>         /* setup nbio registers */
>         adev->nbio.funcs->init_registers(adev);
> -       /* remap HDP registers to a hole in mmio space,
> -        * for the purpose of expose those registers
> -        * to process space
> -        */
> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
> -               adev->nbio.funcs->remap_hdp_registers(adev);
>         /* enable the doorbell aperture */
>         nv_enable_doorbell_aperture(adev, true);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index fde6154f2009..a0481e37d7cf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -1240,12 +1240,6 @@ static int soc15_common_hw_init(void *handle)
>         soc15_program_aspm(adev);
>         /* setup nbio registers */
>         adev->nbio.funcs->init_registers(adev);
> -       /* remap HDP registers to a hole in mmio space,
> -        * for the purpose of expose those registers
> -        * to process space
> -        */
> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
> -               adev->nbio.funcs->remap_hdp_registers(adev);
>
>         /* enable the doorbell aperture */
>         soc15_enable_doorbell_aperture(adev, true);
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
> index 55284b24f113..16b447055102 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
> @@ -660,12 +660,6 @@ static int soc21_common_hw_init(void *handle)
>         soc21_program_aspm(adev);
>         /* setup nbio registers */
>         adev->nbio.funcs->init_registers(adev);
> -       /* remap HDP registers to a hole in mmio space,
> -        * for the purpose of expose those registers
> -        * to process space
> -        */
> -       if (adev->nbio.funcs->remap_hdp_registers)
> -               adev->nbio.funcs->remap_hdp_registers(adev);
>         /* enable the doorbell aperture */
>         soc21_enable_doorbell_aperture(adev, true);
>
> --
> 2.25.1
>

[-- Attachment #2: 0001-drm-amdgpu-cleanup-HDP-flush-remap-handling.patch --]
[-- Type: text/x-patch, Size: 18784 bytes --]

From 8ea0dca4f496d448a11abc6651c45a56a99b8303 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher@amd.com>
Date: Mon, 29 Aug 2022 12:37:02 -0400
Subject: [PATCH] drm/amdgpu: cleanup HDP flush remap handling

Use the original register location if the remap has not
happened yet.  HDP flush happens early as part of the
GMC IP setup which happens before the HDP registers are
remapped.  Fix the HDP flush function to use either the
remapped or the original offset depending on whether the
remap has happened or not.

This fixes the Unsupported Request error reported through
AER during driver load. The error happens as a write happens
to the remap offset before real remapping is done.  This should
also fix ASICs with NBIO 7.7 which do not current remap the
HDP register.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373

The error was unnoticed before and got visible because of the commit
referenced below. This doesn't fix anything in the commit below, rather
fixes the issue in amdgpu exposed by the commit. The reference is only
to associate this commit with below one so that both go together.

Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c  | 13 +++++++++++--
 drivers/gpu/drm/amd/amdgpu/hdp_v5_0.c  | 13 +++++++++++--
 drivers/gpu/drm/amd/amdgpu/hdp_v5_2.c  | 16 +++++++++++-----
 drivers/gpu/drm/amd/amdgpu/hdp_v6_0.c  | 13 +++++++++++--
 drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c | 18 ++++++++++--------
 drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c | 14 ++++++++++----
 drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c | 18 ++++++++++--------
 drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c | 17 ++++++++++-------
 drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c | 18 ++++++++++--------
 drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 18 ++++++++++--------
 drivers/gpu/drm/amd/amdgpu/nv.c        |  7 +------
 drivers/gpu/drm/amd/amdgpu/soc15.c     |  7 +------
 drivers/gpu/drm/amd/amdgpu/soc21.c     |  3 ---
 13 files changed, 106 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c b/drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c
index adf89680f53e..96768bbda91e 100644
--- a/drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c
@@ -27,6 +27,7 @@
 
 #include "hdp/hdp_4_0_offset.h"
 #include "hdp/hdp_4_0_sh_mask.h"
+#include "nbio/nbio_6_1_offset.h"
 #include <uapi/linux/kfd_ioctl.h>
 
 /* for Vega20 register name change */
@@ -40,10 +41,18 @@
 static void hdp_v4_0_flush_hdp(struct amdgpu_device *adev,
 				struct amdgpu_ring *ring)
 {
+	u32 reg_offset;
+
+	if (adev->rmmio_remap.reg_offset)
+		reg_offset = (adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2;
+	else
+		reg_offset = SOC15_REG_OFFSET(NBIO, 0,
+					      mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL) << 2;
+
 	if (!ring || !ring->funcs->emit_wreg)
-		WREG32_NO_KIQ((adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
+		WREG32_NO_KIQ(reg_offset, 0);
 	else
-		amdgpu_ring_emit_wreg(ring, (adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
+		amdgpu_ring_emit_wreg(ring, reg_offset, 0);
 }
 
 static void hdp_v4_0_invalidate_hdp(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/hdp_v5_0.c b/drivers/gpu/drm/amd/amdgpu/hdp_v5_0.c
index a9ea23fa0def..4797103e1b14 100644
--- a/drivers/gpu/drm/amd/amdgpu/hdp_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/hdp_v5_0.c
@@ -26,15 +26,24 @@
 
 #include "hdp/hdp_5_0_0_offset.h"
 #include "hdp/hdp_5_0_0_sh_mask.h"
+#include "nbio/nbio_2_3_offset.h"
 #include <uapi/linux/kfd_ioctl.h>
 
 static void hdp_v5_0_flush_hdp(struct amdgpu_device *adev,
 				struct amdgpu_ring *ring)
 {
+	u32 reg_offset;
+
+	if (adev->rmmio_remap.reg_offset)
+		reg_offset = (adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2;
+	else
+		reg_offset = SOC15_REG_OFFSET(NBIO, 0,
+					      mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL) << 2;
+
 	if (!ring || !ring->funcs->emit_wreg)
-		WREG32_NO_KIQ((adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
+		WREG32_NO_KIQ(reg_offset, 0);
 	else
-		amdgpu_ring_emit_wreg(ring, (adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
+		amdgpu_ring_emit_wreg(ring, reg_offset, 0);
 }
 
 static void hdp_v5_0_invalidate_hdp(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/hdp_v5_2.c b/drivers/gpu/drm/amd/amdgpu/hdp_v5_2.c
index 29c3484ae1f1..b26b1d6fe6a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/hdp_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/hdp_v5_2.c
@@ -26,18 +26,24 @@
 
 #include "hdp/hdp_5_2_1_offset.h"
 #include "hdp/hdp_5_2_1_sh_mask.h"
+#include "nbio/nbio_2_3_offset.h"
 #include <uapi/linux/kfd_ioctl.h>
 
 static void hdp_v5_2_flush_hdp(struct amdgpu_device *adev,
 				struct amdgpu_ring *ring)
 {
+	u32 reg_offset;
+
+	if (adev->rmmio_remap.reg_offset)
+		reg_offset = (adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2;
+	else
+		reg_offset = SOC15_REG_OFFSET(NBIO, 0,
+					      mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL) << 2;
+
 	if (!ring || !ring->funcs->emit_wreg)
-		WREG32_NO_KIQ((adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2,
-			0);
+		WREG32_NO_KIQ(reg_offset, 0);
 	else
-		amdgpu_ring_emit_wreg(ring,
-			(adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2,
-			0);
+		amdgpu_ring_emit_wreg(ring, reg_offset, 0);
 }
 
 static void hdp_v5_2_update_mem_power_gating(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/hdp_v6_0.c b/drivers/gpu/drm/amd/amdgpu/hdp_v6_0.c
index 063eba619f2f..d635249822fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/hdp_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/hdp_v6_0.c
@@ -26,15 +26,24 @@
 
 #include "hdp/hdp_6_0_0_offset.h"
 #include "hdp/hdp_6_0_0_sh_mask.h"
+#include "nbio/nbio_4_3_0_offset.h"
 #include <uapi/linux/kfd_ioctl.h>
 
 static void hdp_v6_0_flush_hdp(struct amdgpu_device *adev,
 				struct amdgpu_ring *ring)
 {
+	u32 reg_offset;
+
+	if (adev->rmmio_remap.reg_offset)
+		reg_offset = (adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2;
+	else
+		reg_offset = SOC15_REG_OFFSET(NBIO, 0,
+					      regBIF_BX_PF0_HDP_MEM_COHERENCY_FLUSH_CNTL) << 2;
+
 	if (!ring || !ring->funcs->emit_wreg)
-		WREG32_NO_KIQ((adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
+		WREG32_NO_KIQ(reg_offset, 0);
 	else
-		amdgpu_ring_emit_wreg(ring, (adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
+		amdgpu_ring_emit_wreg(ring, reg_offset, 0);
 }
 
 static void hdp_v6_0_update_clock_gating(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c b/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
index b465baa26762..ac0ea1a4263b 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
@@ -63,12 +63,18 @@
 #define GPU_HDP_FLUSH_DONE__RSVD_ENG7_MASK	0x00080000L
 #define GPU_HDP_FLUSH_DONE__RSVD_ENG8_MASK	0x00100000L
 
+#define MMIO_REG_HOLE_OFFSET (0x80000 - PAGE_SIZE)
+
 static void nbio_v2_3_remap_hdp_registers(struct amdgpu_device *adev)
 {
-	WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
-	WREG32_SOC15(NBIO, 0, mmREMAP_HDP_REG_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	if (!amdgpu_sriov_vf(adev)) {
+		adev->rmmio_remap.reg_offset = MMIO_REG_HOLE_OFFSET;
+		adev->rmmio_remap.bus_addr = adev->rmmio_base + MMIO_REG_HOLE_OFFSET;
+		WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
+		WREG32_SOC15(NBIO, 0, mmREMAP_HDP_REG_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	}
 }
 
 static u32 nbio_v2_3_get_rev_id(struct amdgpu_device *adev)
@@ -338,10 +344,6 @@ static void nbio_v2_3_init_registers(struct amdgpu_device *adev)
 
 	if (def != data)
 		WREG32_PCIE(smnPCIE_CONFIG_CNTL, data);
-
-	if (amdgpu_sriov_vf(adev))
-		adev->rmmio_remap.reg_offset = SOC15_REG_OFFSET(NBIO, 0,
-			mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL) << 2;
 }
 
 #define NAVI10_PCIE__LC_L0S_INACTIVITY_DEFAULT		0x00000000 // off by default, no gains over L1
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c b/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c
index 982a89f841d5..f1510514fc07 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c
@@ -28,12 +28,18 @@
 #include "nbio/nbio_4_3_0_sh_mask.h"
 #include <uapi/linux/kfd_ioctl.h>
 
+#define MMIO_REG_HOLE_OFFSET (0x80000 - PAGE_SIZE)
+
 static void nbio_v4_3_remap_hdp_registers(struct amdgpu_device *adev)
 {
-	WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_MEM_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
-	WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_REG_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	if (!amdgpu_sriov_vf(adev)) {
+		adev->rmmio_remap.reg_offset = MMIO_REG_HOLE_OFFSET;
+		adev->rmmio_remap.bus_addr = adev->rmmio_base + MMIO_REG_HOLE_OFFSET;
+		WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_MEM_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
+		WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_REG_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	}
 }
 
 static u32 nbio_v4_3_get_rev_id(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
index f7f6ddebd3e4..0c47b8b8f46a 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
@@ -53,12 +53,18 @@
 #define RCC_BIF_STRAP3__STRAP_VLINK_PM_L1_ENTRY_TIMER__SHIFT	0x10
 #define RCC_BIF_STRAP5__STRAP_VLINK_LDN_ENTRY_TIMER__SHIFT	0x0
 
+#define MMIO_REG_HOLE_OFFSET (0x80000 - PAGE_SIZE)
+
 static void nbio_v6_1_remap_hdp_registers(struct amdgpu_device *adev)
 {
-	WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
-	WREG32_SOC15(NBIO, 0, mmREMAP_HDP_REG_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	if (!amdgpu_sriov_vf(adev)) {
+		adev->rmmio_remap.reg_offset = MMIO_REG_HOLE_OFFSET;
+		adev->rmmio_remap.bus_addr = adev->rmmio_base + MMIO_REG_HOLE_OFFSET;
+		WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
+		WREG32_SOC15(NBIO, 0, mmREMAP_HDP_REG_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	}
 }
 
 static u32 nbio_v6_1_get_rev_id(struct amdgpu_device *adev)
@@ -276,10 +282,6 @@ static void nbio_v6_1_init_registers(struct amdgpu_device *adev)
 
 	if (def != data)
 		WREG32_PCIE(smnPCIE_CI_CNTL, data);
-
-	if (amdgpu_sriov_vf(adev))
-		adev->rmmio_remap.reg_offset = SOC15_REG_OFFSET(NBIO, 0,
-			mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL) << 2;
 }
 
 static void nbio_v6_1_program_ltr(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c
index aa0326d00c72..9afa0bf9bc12 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c
@@ -33,12 +33,18 @@
 
 #define smnNBIF_MGCG_CTRL_LCLK	0x1013a05c
 
+#define MMIO_REG_HOLE_OFFSET (0x80000 - PAGE_SIZE)
+
 static void nbio_v7_0_remap_hdp_registers(struct amdgpu_device *adev)
 {
-	WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
-	WREG32_SOC15(NBIO, 0, mmREMAP_HDP_REG_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	if (!amdgpu_sriov_vf(adev)) {
+		adev->rmmio_remap.reg_offset = MMIO_REG_HOLE_OFFSET;
+		adev->rmmio_remap.bus_addr = adev->rmmio_base + MMIO_REG_HOLE_OFFSET;
+		WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
+		WREG32_SOC15(NBIO, 0, mmREMAP_HDP_REG_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	}
 }
 
 static u32 nbio_v7_0_get_rev_id(struct amdgpu_device *adev)
@@ -273,9 +279,6 @@ const struct nbio_hdp_flush_reg nbio_v7_0_hdp_flush_reg = {
 
 static void nbio_v7_0_init_registers(struct amdgpu_device *adev)
 {
-	if (amdgpu_sriov_vf(adev))
-		adev->rmmio_remap.reg_offset =
-			SOC15_REG_OFFSET(NBIO, 0, mmHDP_MEM_COHERENCY_FLUSH_CNTL) << 2;
 }
 
 const struct amdgpu_nbio_funcs nbio_v7_0_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c
index 31776b12e4c4..132ca949bbab 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c
@@ -47,12 +47,18 @@
 #define BIF1_PCIE_TX_POWER_CTRL_1__MST_MEM_LS_EN_MASK		0x00000001L
 #define BIF1_PCIE_TX_POWER_CTRL_1__REPLAY_MEM_LS_EN_MASK	0x00000008L
 
+#define MMIO_REG_HOLE_OFFSET (0x80000 - PAGE_SIZE)
+
 static void nbio_v7_2_remap_hdp_registers(struct amdgpu_device *adev)
 {
-	WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_MEM_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
-	WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_REG_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	if (!amdgpu_sriov_vf(adev)) {
+		adev->rmmio_remap.reg_offset = MMIO_REG_HOLE_OFFSET;
+		adev->rmmio_remap.bus_addr = adev->rmmio_base + MMIO_REG_HOLE_OFFSET;
+		WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_MEM_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
+		WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_REG_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	}
 }
 
 static u32 nbio_v7_2_get_rev_id(struct amdgpu_device *adev)
@@ -393,10 +399,6 @@ static void nbio_v7_2_init_registers(struct amdgpu_device *adev)
 			WREG32_PCIE_PORT(SOC15_REG_OFFSET(NBIO, 0, regPCIE_CONFIG_CNTL), data);
 		break;
 	}
-
-	if (amdgpu_sriov_vf(adev))
-		adev->rmmio_remap.reg_offset = SOC15_REG_OFFSET(NBIO, 0,
-			regBIF_BX_PF0_HDP_MEM_COHERENCY_FLUSH_CNTL) << 2;
 }
 
 const struct amdgpu_nbio_funcs nbio_v7_2_funcs = {
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
index 11848d1e238b..bd3dee530150 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
@@ -99,12 +99,18 @@
 static void nbio_v7_4_query_ras_error_count(struct amdgpu_device *adev,
 					void *ras_error_status);
 
+#define MMIO_REG_HOLE_OFFSET (0x80000 - PAGE_SIZE)
+
 static void nbio_v7_4_remap_hdp_registers(struct amdgpu_device *adev)
 {
-	WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
-	WREG32_SOC15(NBIO, 0, mmREMAP_HDP_REG_FLUSH_CNTL,
-		adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	if (!amdgpu_sriov_vf(adev)) {
+		adev->rmmio_remap.reg_offset = MMIO_REG_HOLE_OFFSET;
+		adev->rmmio_remap.bus_addr = adev->rmmio_base + MMIO_REG_HOLE_OFFSET;
+		WREG32_SOC15(NBIO, 0, mmREMAP_HDP_MEM_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
+		WREG32_SOC15(NBIO, 0, mmREMAP_HDP_REG_FLUSH_CNTL,
+			     adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+	}
 }
 
 static u32 nbio_v7_4_get_rev_id(struct amdgpu_device *adev)
@@ -343,10 +349,6 @@ static void nbio_v7_4_init_registers(struct amdgpu_device *adev)
 {
 	uint32_t baco_cntl;
 
-	if (amdgpu_sriov_vf(adev))
-		adev->rmmio_remap.reg_offset = SOC15_REG_OFFSET(NBIO, 0,
-			mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL) << 2;
-
 	if (adev->ip_versions[NBIO_HWIP][0] == IP_VERSION(7, 4, 4) &&
 	    !amdgpu_sriov_vf(adev)) {
 		baco_cntl = RREG32_SOC15(NBIO, 0, mmBACO_CNTL);
diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index b3fba8dea63c..a2339cbedd32 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -677,13 +677,8 @@ static const struct amdgpu_asic_funcs nv_asic_funcs =
 
 static int nv_common_early_init(void *handle)
 {
-#define MMIO_REG_HOLE_OFFSET (0x80000 - PAGE_SIZE)
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
-	if (!amdgpu_sriov_vf(adev)) {
-		adev->rmmio_remap.reg_offset = MMIO_REG_HOLE_OFFSET;
-		adev->rmmio_remap.bus_addr = adev->rmmio_base + MMIO_REG_HOLE_OFFSET;
-	}
 	adev->smc_rreg = NULL;
 	adev->smc_wreg = NULL;
 	adev->pcie_rreg = &nv_pcie_rreg;
@@ -1036,7 +1031,7 @@ static int nv_common_hw_init(void *handle)
 	 * for the purpose of expose those registers
 	 * to process space
 	 */
-	if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
+	if (adev->nbio.funcs->remap_hdp_registers)
 		adev->nbio.funcs->remap_hdp_registers(adev);
 	/* enable the doorbell aperture */
 	nv_enable_doorbell_aperture(adev, true);
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index fde6154f2009..89926a014174 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -926,13 +926,8 @@ static const struct amdgpu_asic_funcs vega20_asic_funcs =
 
 static int soc15_common_early_init(void *handle)
 {
-#define MMIO_REG_HOLE_OFFSET (0x80000 - PAGE_SIZE)
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
-	if (!amdgpu_sriov_vf(adev)) {
-		adev->rmmio_remap.reg_offset = MMIO_REG_HOLE_OFFSET;
-		adev->rmmio_remap.bus_addr = adev->rmmio_base + MMIO_REG_HOLE_OFFSET;
-	}
 	adev->smc_rreg = NULL;
 	adev->smc_wreg = NULL;
 	adev->pcie_rreg = &soc15_pcie_rreg;
@@ -1244,7 +1239,7 @@ static int soc15_common_hw_init(void *handle)
 	 * for the purpose of expose those registers
 	 * to process space
 	 */
-	if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
+	if (adev->nbio.funcs->remap_hdp_registers)
 		adev->nbio.funcs->remap_hdp_registers(adev);
 
 	/* enable the doorbell aperture */
diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
index 55284b24f113..4d3ddcfd0769 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -532,11 +532,8 @@ static const struct amdgpu_asic_funcs soc21_asic_funcs =
 
 static int soc21_common_early_init(void *handle)
 {
-#define MMIO_REG_HOLE_OFFSET (0x80000 - PAGE_SIZE)
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
-	adev->rmmio_remap.reg_offset = MMIO_REG_HOLE_OFFSET;
-	adev->rmmio_remap.bus_addr = adev->rmmio_base + MMIO_REG_HOLE_OFFSET;
 	adev->smc_rreg = NULL;
 	adev->smc_wreg = NULL;
 	adev->pcie_rreg = &soc21_pcie_rreg;
-- 
2.37.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init
  2022-08-29 16:50 ` [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init Alex Deucher
@ 2022-08-30  4:04   ` Lazar, Lijo
  2022-08-30 13:48     ` Alex Deucher
  0 siblings, 1 reply; 13+ messages in thread
From: Lazar, Lijo @ 2022-08-30  4:04 UTC (permalink / raw)
  To: Alex Deucher
  Cc: amd-gfx, Felix.Kuehling, stable, tseewald, helgaas,
	Alexander.Deucher, sr, Christian.Koenig, Hawking.Zhang



On 8/29/2022 10:20 PM, Alex Deucher wrote:
> On Mon, Aug 29, 2022 at 4:18 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>>
>> HDP flush is used early in the init sequence as part of memory controller
>> block initialization. Hence remapping of HDP registers needed for flush
>> needs to happen earlier.
>>
>> This also fixes the Unsupported Request error reported through AER during
>> driver load. The error happens as a write happens to the remap offset
>> before real remapping is done.
>>
>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D216373&amp;data=05%7C01%7Clijo.lazar%40amd.com%7C0882d00080124386814a08da89de9bcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637973886457404198%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=dC%2BCY22cfix1VCcQINrvNWI5XW%2BYV5lleJX3Ju9A6Iw%3D&amp;reserved=0
>>
>> The error was unnoticed before and got visible because of the commit
>> referenced below. This doesn't fix anything in the commit below, rather
>> fixes the issue in amdgpu exposed by the commit. The reference is only
>> to associate this commit with below one so that both go together.
>>
>> Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
>>
>> Reported-by: Tom Seewald <tseewald@gmail.com>
>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
>> Cc: stable@vger.kernel.org
> 
> How about something like the attached patch rather than these two
> patches?  It's a bit bigger but seems cleaner and more defensive in my
> opinion.
> 

Whenever device goes to suspend/reset and then comes back, remap offset 
has to be set back to 0 to make sure it doesn't use the wrong offset 
when the register assumes default values again.

To avoid the if-check in hdp_flush (which is more frequent), another way 
is to initialize the remap offset to default offset during early init 
and hw fini/suspend sequences. It won't be obvious (even with this 
patch) as to when remap offset vs default offset is used though.

Thanks,
Lijo

> Alex
> 
>> ---
>> v2:
>>          Take care of IP resume cases (Alex Deucher)
>>          Add NULL check to nbio.funcs to cover older (GFXv8) ASICs (Felix Kuehling)
>>          Add more details in commit message and associate with AER patch (Bjorn
>> Helgaas)
>>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++
>>   drivers/gpu/drm/amd/amdgpu/nv.c            |  6 ------
>>   drivers/gpu/drm/amd/amdgpu/soc15.c         |  6 ------
>>   drivers/gpu/drm/amd/amdgpu/soc21.c         |  6 ------
>>   4 files changed, 24 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index ce7d117efdb5..e420118769a5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2334,6 +2334,26 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
>>          return 0;
>>   }
>>
>> +/**
>> + * amdgpu_device_prepare_ip - prepare IPs for hardware initialization
>> + *
>> + * @adev: amdgpu_device pointer
>> + *
>> + * Any common hardware initialization sequence that needs to be done before
>> + * hw init of individual IPs is performed here. This is different from the
>> + * 'common block' which initializes a set of IPs.
>> + */
>> +static void amdgpu_device_prepare_ip(struct amdgpu_device *adev)
>> +{
>> +       /* Remap HDP registers to a hole in mmio space, for the purpose
>> +        * of exposing those registers to process space. This needs to be
>> +        * done before hw init of ip blocks to take care of HDP flush
>> +        * operations through registers during hw_init.
>> +        */
>> +       if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers &&
>> +           !amdgpu_sriov_vf(adev))
>> +               adev->nbio.funcs->remap_hdp_registers(adev);
>> +}
>>
>>   /**
>>    * amdgpu_device_ip_init - run init for hardware IPs
>> @@ -2376,6 +2396,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>>                                  DRM_ERROR("amdgpu_vram_scratch_init failed %d\n", r);
>>                                  goto init_failed;
>>                          }
>> +
>> +                       amdgpu_device_prepare_ip(adev);
>>                          r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
>>                          if (r) {
>>                                  DRM_ERROR("hw_init %d failed %d\n", i, r);
>> @@ -3058,6 +3080,7 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
>>                  AMD_IP_BLOCK_TYPE_IH,
>>          };
>>
>> +       amdgpu_device_prepare_ip(adev);
>>          for (i = 0; i < adev->num_ip_blocks; i++) {
>>                  int j;
>>                  struct amdgpu_ip_block *block;
>> @@ -3139,6 +3162,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev)
>>   {
>>          int i, r;
>>
>> +       amdgpu_device_prepare_ip(adev);
>>          for (i = 0; i < adev->num_ip_blocks; i++) {
>>                  if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw)
>>                          continue;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
>> index b3fba8dea63c..3ac7fef74277 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>> @@ -1032,12 +1032,6 @@ static int nv_common_hw_init(void *handle)
>>          nv_program_aspm(adev);
>>          /* setup nbio registers */
>>          adev->nbio.funcs->init_registers(adev);
>> -       /* remap HDP registers to a hole in mmio space,
>> -        * for the purpose of expose those registers
>> -        * to process space
>> -        */
>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
>> -               adev->nbio.funcs->remap_hdp_registers(adev);
>>          /* enable the doorbell aperture */
>>          nv_enable_doorbell_aperture(adev, true);
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
>> index fde6154f2009..a0481e37d7cf 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>> @@ -1240,12 +1240,6 @@ static int soc15_common_hw_init(void *handle)
>>          soc15_program_aspm(adev);
>>          /* setup nbio registers */
>>          adev->nbio.funcs->init_registers(adev);
>> -       /* remap HDP registers to a hole in mmio space,
>> -        * for the purpose of expose those registers
>> -        * to process space
>> -        */
>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
>> -               adev->nbio.funcs->remap_hdp_registers(adev);
>>
>>          /* enable the doorbell aperture */
>>          soc15_enable_doorbell_aperture(adev, true);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
>> index 55284b24f113..16b447055102 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
>> @@ -660,12 +660,6 @@ static int soc21_common_hw_init(void *handle)
>>          soc21_program_aspm(adev);
>>          /* setup nbio registers */
>>          adev->nbio.funcs->init_registers(adev);
>> -       /* remap HDP registers to a hole in mmio space,
>> -        * for the purpose of expose those registers
>> -        * to process space
>> -        */
>> -       if (adev->nbio.funcs->remap_hdp_registers)
>> -               adev->nbio.funcs->remap_hdp_registers(adev);
>>          /* enable the doorbell aperture */
>>          soc21_enable_doorbell_aperture(adev, true);
>>
>> --
>> 2.25.1
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init
  2022-08-30  4:04   ` Lazar, Lijo
@ 2022-08-30 13:48     ` Alex Deucher
  2022-08-30 14:45       ` Lazar, Lijo
  0 siblings, 1 reply; 13+ messages in thread
From: Alex Deucher @ 2022-08-30 13:48 UTC (permalink / raw)
  To: Lazar, Lijo
  Cc: amd-gfx, Felix.Kuehling, stable, tseewald, helgaas,
	Alexander.Deucher, sr, Christian.Koenig, Hawking.Zhang

On Tue, Aug 30, 2022 at 12:05 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>
>
>
> On 8/29/2022 10:20 PM, Alex Deucher wrote:
> > On Mon, Aug 29, 2022 at 4:18 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
> >>
> >> HDP flush is used early in the init sequence as part of memory controller
> >> block initialization. Hence remapping of HDP registers needed for flush
> >> needs to happen earlier.
> >>
> >> This also fixes the Unsupported Request error reported through AER during
> >> driver load. The error happens as a write happens to the remap offset
> >> before real remapping is done.
> >>
> >> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D216373&amp;data=05%7C01%7Clijo.lazar%40amd.com%7C0882d00080124386814a08da89de9bcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637973886457404198%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=dC%2BCY22cfix1VCcQINrvNWI5XW%2BYV5lleJX3Ju9A6Iw%3D&amp;reserved=0
> >>
> >> The error was unnoticed before and got visible because of the commit
> >> referenced below. This doesn't fix anything in the commit below, rather
> >> fixes the issue in amdgpu exposed by the commit. The reference is only
> >> to associate this commit with below one so that both go together.
> >>
> >> Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
> >>
> >> Reported-by: Tom Seewald <tseewald@gmail.com>
> >> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
> >> Cc: stable@vger.kernel.org
> >
> > How about something like the attached patch rather than these two
> > patches?  It's a bit bigger but seems cleaner and more defensive in my
> > opinion.
> >
>
> Whenever device goes to suspend/reset and then comes back, remap offset
> has to be set back to 0 to make sure it doesn't use the wrong offset
> when the register assumes default values again.
>
> To avoid the if-check in hdp_flush (which is more frequent), another way
> is to initialize the remap offset to default offset during early init
> and hw fini/suspend sequences. It won't be obvious (even with this
> patch) as to when remap offset vs default offset is used though.

On resume, the common IP is resumed first so it will always be set.
The only case that is a problem is init because we init GMC out of
order.  We could init common before GMC in amdgpu_device_ip_init().  I
think that should be fine, but I wasn't sure if there might be some
fallout from that on certain cards.

Alex

>
> Thanks,
> Lijo
>
> > Alex
> >
> >> ---
> >> v2:
> >>          Take care of IP resume cases (Alex Deucher)
> >>          Add NULL check to nbio.funcs to cover older (GFXv8) ASICs (Felix Kuehling)
> >>          Add more details in commit message and associate with AER patch (Bjorn
> >> Helgaas)
> >>
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++
> >>   drivers/gpu/drm/amd/amdgpu/nv.c            |  6 ------
> >>   drivers/gpu/drm/amd/amdgpu/soc15.c         |  6 ------
> >>   drivers/gpu/drm/amd/amdgpu/soc21.c         |  6 ------
> >>   4 files changed, 24 insertions(+), 18 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> index ce7d117efdb5..e420118769a5 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> @@ -2334,6 +2334,26 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
> >>          return 0;
> >>   }
> >>
> >> +/**
> >> + * amdgpu_device_prepare_ip - prepare IPs for hardware initialization
> >> + *
> >> + * @adev: amdgpu_device pointer
> >> + *
> >> + * Any common hardware initialization sequence that needs to be done before
> >> + * hw init of individual IPs is performed here. This is different from the
> >> + * 'common block' which initializes a set of IPs.
> >> + */
> >> +static void amdgpu_device_prepare_ip(struct amdgpu_device *adev)
> >> +{
> >> +       /* Remap HDP registers to a hole in mmio space, for the purpose
> >> +        * of exposing those registers to process space. This needs to be
> >> +        * done before hw init of ip blocks to take care of HDP flush
> >> +        * operations through registers during hw_init.
> >> +        */
> >> +       if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers &&
> >> +           !amdgpu_sriov_vf(adev))
> >> +               adev->nbio.funcs->remap_hdp_registers(adev);
> >> +}
> >>
> >>   /**
> >>    * amdgpu_device_ip_init - run init for hardware IPs
> >> @@ -2376,6 +2396,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
> >>                                  DRM_ERROR("amdgpu_vram_scratch_init failed %d\n", r);
> >>                                  goto init_failed;
> >>                          }
> >> +
> >> +                       amdgpu_device_prepare_ip(adev);
> >>                          r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
> >>                          if (r) {
> >>                                  DRM_ERROR("hw_init %d failed %d\n", i, r);
> >> @@ -3058,6 +3080,7 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
> >>                  AMD_IP_BLOCK_TYPE_IH,
> >>          };
> >>
> >> +       amdgpu_device_prepare_ip(adev);
> >>          for (i = 0; i < adev->num_ip_blocks; i++) {
> >>                  int j;
> >>                  struct amdgpu_ip_block *block;
> >> @@ -3139,6 +3162,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev)
> >>   {
> >>          int i, r;
> >>
> >> +       amdgpu_device_prepare_ip(adev);
> >>          for (i = 0; i < adev->num_ip_blocks; i++) {
> >>                  if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw)
> >>                          continue;
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
> >> index b3fba8dea63c..3ac7fef74277 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> >> @@ -1032,12 +1032,6 @@ static int nv_common_hw_init(void *handle)
> >>          nv_program_aspm(adev);
> >>          /* setup nbio registers */
> >>          adev->nbio.funcs->init_registers(adev);
> >> -       /* remap HDP registers to a hole in mmio space,
> >> -        * for the purpose of expose those registers
> >> -        * to process space
> >> -        */
> >> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
> >> -               adev->nbio.funcs->remap_hdp_registers(adev);
> >>          /* enable the doorbell aperture */
> >>          nv_enable_doorbell_aperture(adev, true);
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> >> index fde6154f2009..a0481e37d7cf 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> >> @@ -1240,12 +1240,6 @@ static int soc15_common_hw_init(void *handle)
> >>          soc15_program_aspm(adev);
> >>          /* setup nbio registers */
> >>          adev->nbio.funcs->init_registers(adev);
> >> -       /* remap HDP registers to a hole in mmio space,
> >> -        * for the purpose of expose those registers
> >> -        * to process space
> >> -        */
> >> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
> >> -               adev->nbio.funcs->remap_hdp_registers(adev);
> >>
> >>          /* enable the doorbell aperture */
> >>          soc15_enable_doorbell_aperture(adev, true);
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
> >> index 55284b24f113..16b447055102 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
> >> @@ -660,12 +660,6 @@ static int soc21_common_hw_init(void *handle)
> >>          soc21_program_aspm(adev);
> >>          /* setup nbio registers */
> >>          adev->nbio.funcs->init_registers(adev);
> >> -       /* remap HDP registers to a hole in mmio space,
> >> -        * for the purpose of expose those registers
> >> -        * to process space
> >> -        */
> >> -       if (adev->nbio.funcs->remap_hdp_registers)
> >> -               adev->nbio.funcs->remap_hdp_registers(adev);
> >>          /* enable the doorbell aperture */
> >>          soc21_enable_doorbell_aperture(adev, true);
> >>
> >> --
> >> 2.25.1
> >>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init
  2022-08-30 13:48     ` Alex Deucher
@ 2022-08-30 14:45       ` Lazar, Lijo
  2022-08-30 15:09         ` Alex Deucher
  0 siblings, 1 reply; 13+ messages in thread
From: Lazar, Lijo @ 2022-08-30 14:45 UTC (permalink / raw)
  To: Alex Deucher
  Cc: amd-gfx, Felix.Kuehling, stable, tseewald, helgaas,
	Alexander.Deucher, sr, Christian.Koenig, Hawking.Zhang



On 8/30/2022 7:18 PM, Alex Deucher wrote:
> On Tue, Aug 30, 2022 at 12:05 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>>
>>
>>
>> On 8/29/2022 10:20 PM, Alex Deucher wrote:
>>> On Mon, Aug 29, 2022 at 4:18 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>>>>
>>>> HDP flush is used early in the init sequence as part of memory controller
>>>> block initialization. Hence remapping of HDP registers needed for flush
>>>> needs to happen earlier.
>>>>
>>>> This also fixes the Unsupported Request error reported through AER during
>>>> driver load. The error happens as a write happens to the remap offset
>>>> before real remapping is done.
>>>>
>>>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D216373&amp;data=05%7C01%7Clijo.lazar%40amd.com%7C66066549213c45b1341508da8a8e4f1b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637974641082680316%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=ti%2FktO6Gmrio3TLDoteeJil28PY3D%2BLAinB6TU2mI9s%3D&amp;reserved=0
>>>>
>>>> The error was unnoticed before and got visible because of the commit
>>>> referenced below. This doesn't fix anything in the commit below, rather
>>>> fixes the issue in amdgpu exposed by the commit. The reference is only
>>>> to associate this commit with below one so that both go together.
>>>>
>>>> Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
>>>>
>>>> Reported-by: Tom Seewald <tseewald@gmail.com>
>>>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
>>>> Cc: stable@vger.kernel.org
>>>
>>> How about something like the attached patch rather than these two
>>> patches?  It's a bit bigger but seems cleaner and more defensive in my
>>> opinion.
>>>
>>
>> Whenever device goes to suspend/reset and then comes back, remap offset
>> has to be set back to 0 to make sure it doesn't use the wrong offset
>> when the register assumes default values again.
>>
>> To avoid the if-check in hdp_flush (which is more frequent), another way
>> is to initialize the remap offset to default offset during early init
>> and hw fini/suspend sequences. It won't be obvious (even with this
>> patch) as to when remap offset vs default offset is used though.
> 
> On resume, the common IP is resumed first so it will always be set.
> The only case that is a problem is init because we init GMC out of
> order.  We could init common before GMC in amdgpu_device_ip_init().  I
> think that should be fine, but I wasn't sure if there might be some
> fallout from that on certain cards.
> 

There are other places where an IP order is forced like in 
amdgpu_device_ip_reinit_early_sriov(). This also may not affect this 
case as remapping is not done for VF.

Agree that a better way is to have the common IP to be inited first.

Thanks,
Lijo

> Alex
> 
>>
>> Thanks,
>> Lijo
>>
>>> Alex
>>>
>>>> ---
>>>> v2:
>>>>           Take care of IP resume cases (Alex Deucher)
>>>>           Add NULL check to nbio.funcs to cover older (GFXv8) ASICs (Felix Kuehling)
>>>>           Add more details in commit message and associate with AER patch (Bjorn
>>>> Helgaas)
>>>>
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++
>>>>    drivers/gpu/drm/amd/amdgpu/nv.c            |  6 ------
>>>>    drivers/gpu/drm/amd/amdgpu/soc15.c         |  6 ------
>>>>    drivers/gpu/drm/amd/amdgpu/soc21.c         |  6 ------
>>>>    4 files changed, 24 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index ce7d117efdb5..e420118769a5 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -2334,6 +2334,26 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
>>>>           return 0;
>>>>    }
>>>>
>>>> +/**
>>>> + * amdgpu_device_prepare_ip - prepare IPs for hardware initialization
>>>> + *
>>>> + * @adev: amdgpu_device pointer
>>>> + *
>>>> + * Any common hardware initialization sequence that needs to be done before
>>>> + * hw init of individual IPs is performed here. This is different from the
>>>> + * 'common block' which initializes a set of IPs.
>>>> + */
>>>> +static void amdgpu_device_prepare_ip(struct amdgpu_device *adev)
>>>> +{
>>>> +       /* Remap HDP registers to a hole in mmio space, for the purpose
>>>> +        * of exposing those registers to process space. This needs to be
>>>> +        * done before hw init of ip blocks to take care of HDP flush
>>>> +        * operations through registers during hw_init.
>>>> +        */
>>>> +       if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers &&
>>>> +           !amdgpu_sriov_vf(adev))
>>>> +               adev->nbio.funcs->remap_hdp_registers(adev);
>>>> +}
>>>>
>>>>    /**
>>>>     * amdgpu_device_ip_init - run init for hardware IPs
>>>> @@ -2376,6 +2396,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>>>>                                   DRM_ERROR("amdgpu_vram_scratch_init failed %d\n", r);
>>>>                                   goto init_failed;
>>>>                           }
>>>> +
>>>> +                       amdgpu_device_prepare_ip(adev);
>>>>                           r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
>>>>                           if (r) {
>>>>                                   DRM_ERROR("hw_init %d failed %d\n", i, r);
>>>> @@ -3058,6 +3080,7 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
>>>>                   AMD_IP_BLOCK_TYPE_IH,
>>>>           };
>>>>
>>>> +       amdgpu_device_prepare_ip(adev);
>>>>           for (i = 0; i < adev->num_ip_blocks; i++) {
>>>>                   int j;
>>>>                   struct amdgpu_ip_block *block;
>>>> @@ -3139,6 +3162,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev)
>>>>    {
>>>>           int i, r;
>>>>
>>>> +       amdgpu_device_prepare_ip(adev);
>>>>           for (i = 0; i < adev->num_ip_blocks; i++) {
>>>>                   if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw)
>>>>                           continue;
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
>>>> index b3fba8dea63c..3ac7fef74277 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>>>> @@ -1032,12 +1032,6 @@ static int nv_common_hw_init(void *handle)
>>>>           nv_program_aspm(adev);
>>>>           /* setup nbio registers */
>>>>           adev->nbio.funcs->init_registers(adev);
>>>> -       /* remap HDP registers to a hole in mmio space,
>>>> -        * for the purpose of expose those registers
>>>> -        * to process space
>>>> -        */
>>>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>           /* enable the doorbell aperture */
>>>>           nv_enable_doorbell_aperture(adev, true);
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>> index fde6154f2009..a0481e37d7cf 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>> @@ -1240,12 +1240,6 @@ static int soc15_common_hw_init(void *handle)
>>>>           soc15_program_aspm(adev);
>>>>           /* setup nbio registers */
>>>>           adev->nbio.funcs->init_registers(adev);
>>>> -       /* remap HDP registers to a hole in mmio space,
>>>> -        * for the purpose of expose those registers
>>>> -        * to process space
>>>> -        */
>>>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>
>>>>           /* enable the doorbell aperture */
>>>>           soc15_enable_doorbell_aperture(adev, true);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>> index 55284b24f113..16b447055102 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>> @@ -660,12 +660,6 @@ static int soc21_common_hw_init(void *handle)
>>>>           soc21_program_aspm(adev);
>>>>           /* setup nbio registers */
>>>>           adev->nbio.funcs->init_registers(adev);
>>>> -       /* remap HDP registers to a hole in mmio space,
>>>> -        * for the purpose of expose those registers
>>>> -        * to process space
>>>> -        */
>>>> -       if (adev->nbio.funcs->remap_hdp_registers)
>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>           /* enable the doorbell aperture */
>>>>           soc21_enable_doorbell_aperture(adev, true);
>>>>
>>>> --
>>>> 2.25.1
>>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init
  2022-08-30 14:45       ` Lazar, Lijo
@ 2022-08-30 15:09         ` Alex Deucher
  2022-08-30 16:06           ` Lazar, Lijo
  0 siblings, 1 reply; 13+ messages in thread
From: Alex Deucher @ 2022-08-30 15:09 UTC (permalink / raw)
  To: Lazar, Lijo
  Cc: amd-gfx, Felix.Kuehling, stable, tseewald, helgaas,
	Alexander.Deucher, sr, Christian.Koenig, Hawking.Zhang

[-- Attachment #1: Type: text/plain, Size: 9508 bytes --]

On Tue, Aug 30, 2022 at 10:45 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>
>
>
> On 8/30/2022 7:18 PM, Alex Deucher wrote:
> > On Tue, Aug 30, 2022 at 12:05 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
> >>
> >>
> >>
> >> On 8/29/2022 10:20 PM, Alex Deucher wrote:
> >>> On Mon, Aug 29, 2022 at 4:18 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
> >>>>
> >>>> HDP flush is used early in the init sequence as part of memory controller
> >>>> block initialization. Hence remapping of HDP registers needed for flush
> >>>> needs to happen earlier.
> >>>>
> >>>> This also fixes the Unsupported Request error reported through AER during
> >>>> driver load. The error happens as a write happens to the remap offset
> >>>> before real remapping is done.
> >>>>
> >>>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D216373&amp;data=05%7C01%7Clijo.lazar%40amd.com%7C66066549213c45b1341508da8a8e4f1b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637974641082680316%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=ti%2FktO6Gmrio3TLDoteeJil28PY3D%2BLAinB6TU2mI9s%3D&amp;reserved=0
> >>>>
> >>>> The error was unnoticed before and got visible because of the commit
> >>>> referenced below. This doesn't fix anything in the commit below, rather
> >>>> fixes the issue in amdgpu exposed by the commit. The reference is only
> >>>> to associate this commit with below one so that both go together.
> >>>>
> >>>> Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
> >>>>
> >>>> Reported-by: Tom Seewald <tseewald@gmail.com>
> >>>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
> >>>> Cc: stable@vger.kernel.org
> >>>
> >>> How about something like the attached patch rather than these two
> >>> patches?  It's a bit bigger but seems cleaner and more defensive in my
> >>> opinion.
> >>>
> >>
> >> Whenever device goes to suspend/reset and then comes back, remap offset
> >> has to be set back to 0 to make sure it doesn't use the wrong offset
> >> when the register assumes default values again.
> >>
> >> To avoid the if-check in hdp_flush (which is more frequent), another way
> >> is to initialize the remap offset to default offset during early init
> >> and hw fini/suspend sequences. It won't be obvious (even with this
> >> patch) as to when remap offset vs default offset is used though.
> >
> > On resume, the common IP is resumed first so it will always be set.
> > The only case that is a problem is init because we init GMC out of
> > order.  We could init common before GMC in amdgpu_device_ip_init().  I
> > think that should be fine, but I wasn't sure if there might be some
> > fallout from that on certain cards.
> >
>
> There are other places where an IP order is forced like in
> amdgpu_device_ip_reinit_early_sriov(). This also may not affect this
> case as remapping is not done for VF.
>
> Agree that a better way is to have the common IP to be inited first.

How about these patches?

Alex


>
> Thanks,
> Lijo
>
> > Alex
> >
> >>
> >> Thanks,
> >> Lijo
> >>
> >>> Alex
> >>>
> >>>> ---
> >>>> v2:
> >>>>           Take care of IP resume cases (Alex Deucher)
> >>>>           Add NULL check to nbio.funcs to cover older (GFXv8) ASICs (Felix Kuehling)
> >>>>           Add more details in commit message and associate with AER patch (Bjorn
> >>>> Helgaas)
> >>>>
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++
> >>>>    drivers/gpu/drm/amd/amdgpu/nv.c            |  6 ------
> >>>>    drivers/gpu/drm/amd/amdgpu/soc15.c         |  6 ------
> >>>>    drivers/gpu/drm/amd/amdgpu/soc21.c         |  6 ------
> >>>>    4 files changed, 24 insertions(+), 18 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>> index ce7d117efdb5..e420118769a5 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>> @@ -2334,6 +2334,26 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
> >>>>           return 0;
> >>>>    }
> >>>>
> >>>> +/**
> >>>> + * amdgpu_device_prepare_ip - prepare IPs for hardware initialization
> >>>> + *
> >>>> + * @adev: amdgpu_device pointer
> >>>> + *
> >>>> + * Any common hardware initialization sequence that needs to be done before
> >>>> + * hw init of individual IPs is performed here. This is different from the
> >>>> + * 'common block' which initializes a set of IPs.
> >>>> + */
> >>>> +static void amdgpu_device_prepare_ip(struct amdgpu_device *adev)
> >>>> +{
> >>>> +       /* Remap HDP registers to a hole in mmio space, for the purpose
> >>>> +        * of exposing those registers to process space. This needs to be
> >>>> +        * done before hw init of ip blocks to take care of HDP flush
> >>>> +        * operations through registers during hw_init.
> >>>> +        */
> >>>> +       if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers &&
> >>>> +           !amdgpu_sriov_vf(adev))
> >>>> +               adev->nbio.funcs->remap_hdp_registers(adev);
> >>>> +}
> >>>>
> >>>>    /**
> >>>>     * amdgpu_device_ip_init - run init for hardware IPs
> >>>> @@ -2376,6 +2396,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
> >>>>                                   DRM_ERROR("amdgpu_vram_scratch_init failed %d\n", r);
> >>>>                                   goto init_failed;
> >>>>                           }
> >>>> +
> >>>> +                       amdgpu_device_prepare_ip(adev);
> >>>>                           r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
> >>>>                           if (r) {
> >>>>                                   DRM_ERROR("hw_init %d failed %d\n", i, r);
> >>>> @@ -3058,6 +3080,7 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
> >>>>                   AMD_IP_BLOCK_TYPE_IH,
> >>>>           };
> >>>>
> >>>> +       amdgpu_device_prepare_ip(adev);
> >>>>           for (i = 0; i < adev->num_ip_blocks; i++) {
> >>>>                   int j;
> >>>>                   struct amdgpu_ip_block *block;
> >>>> @@ -3139,6 +3162,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev)
> >>>>    {
> >>>>           int i, r;
> >>>>
> >>>> +       amdgpu_device_prepare_ip(adev);
> >>>>           for (i = 0; i < adev->num_ip_blocks; i++) {
> >>>>                   if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw)
> >>>>                           continue;
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
> >>>> index b3fba8dea63c..3ac7fef74277 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> >>>> @@ -1032,12 +1032,6 @@ static int nv_common_hw_init(void *handle)
> >>>>           nv_program_aspm(adev);
> >>>>           /* setup nbio registers */
> >>>>           adev->nbio.funcs->init_registers(adev);
> >>>> -       /* remap HDP registers to a hole in mmio space,
> >>>> -        * for the purpose of expose those registers
> >>>> -        * to process space
> >>>> -        */
> >>>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
> >>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
> >>>>           /* enable the doorbell aperture */
> >>>>           nv_enable_doorbell_aperture(adev, true);
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>>> index fde6154f2009..a0481e37d7cf 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>>> @@ -1240,12 +1240,6 @@ static int soc15_common_hw_init(void *handle)
> >>>>           soc15_program_aspm(adev);
> >>>>           /* setup nbio registers */
> >>>>           adev->nbio.funcs->init_registers(adev);
> >>>> -       /* remap HDP registers to a hole in mmio space,
> >>>> -        * for the purpose of expose those registers
> >>>> -        * to process space
> >>>> -        */
> >>>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
> >>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
> >>>>
> >>>>           /* enable the doorbell aperture */
> >>>>           soc15_enable_doorbell_aperture(adev, true);
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
> >>>> index 55284b24f113..16b447055102 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
> >>>> @@ -660,12 +660,6 @@ static int soc21_common_hw_init(void *handle)
> >>>>           soc21_program_aspm(adev);
> >>>>           /* setup nbio registers */
> >>>>           adev->nbio.funcs->init_registers(adev);
> >>>> -       /* remap HDP registers to a hole in mmio space,
> >>>> -        * for the purpose of expose those registers
> >>>> -        * to process space
> >>>> -        */
> >>>> -       if (adev->nbio.funcs->remap_hdp_registers)
> >>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
> >>>>           /* enable the doorbell aperture */
> >>>>           soc21_enable_doorbell_aperture(adev, true);
> >>>>
> >>>> --
> >>>> 2.25.1
> >>>>

[-- Attachment #2: 0002-drm-amdgpu-add-HDP-remap-functionality-to-nbio-7.7.patch --]
[-- Type: text/x-patch, Size: 1491 bytes --]

From db6523f284d9f356ade961e4e6a671c5a06e77d7 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher@amd.com>
Date: Tue, 30 Aug 2022 11:08:09 -0400
Subject: [PATCH 2/2] drm/amdgpu: add HDP remap functionality to nbio 7.7

Was missing before and would have resulted in a write to
a non-existant register.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c
index 1dc95ef21da6..21eac06bf1ed 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c
@@ -28,6 +28,14 @@
 #include "nbio/nbio_7_7_0_sh_mask.h"
 #include <uapi/linux/kfd_ioctl.h>
 
+static void nbio_v7_7_remap_hdp_registers(struct amdgpu_device *adev)
+{
+	WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_MEM_FLUSH_CNTL,
+		     adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
+	WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_REG_FLUSH_CNTL,
+		     adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+}
+
 static u32 nbio_v7_7_get_rev_id(struct amdgpu_device *adev)
 {
 	u32 tmp;
@@ -342,4 +350,5 @@ const struct amdgpu_nbio_funcs nbio_v7_7_funcs = {
 	.get_clockgating_state = nbio_v7_7_get_clockgating_state,
 	.ih_control = nbio_v7_7_ih_control,
 	.init_registers = nbio_v7_7_init_registers,
+	.remap_hdp_registers = nbio_v7_7_remap_hdp_registers,
 };
-- 
2.37.1


[-- Attachment #3: 0001-drm-amdgpu-make-sure-to-init-common-IP-before-gmc.patch --]
[-- Type: text/x-patch, Size: 2633 bytes --]

From c0dd4904aac3ed69ed5ba1f354bfc03c81c2723a Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher@amd.com>
Date: Tue, 30 Aug 2022 10:59:49 -0400
Subject: [PATCH 1/2] drm/amdgpu: make sure to init common IP before gmc

Common is mainly golden register setting and HDP register
remapping, it shouldn't allocate any GPU memory.  Make sure
common happens before gmc so that the HDP registers are
remapped before gmc attempts to access them.

This fixes the Unsupported Request error reported through
AER during driver load. The error happens as a write happens
to the remap offset before real remapping is done.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373

The error was unnoticed before and got visible because of the commit
referenced below. This doesn't fix anything in the commit below, rather
fixes the issue in amdgpu exposed by the commit. The reference is only
to associate this commit with below one so that both go together.

Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index d6424535e340..883645d81030 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2375,8 +2375,16 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 		}
 		adev->ip_blocks[i].status.sw = true;
 
-		/* need to do gmc hw init early so we can allocate gpu mem */
-		if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GMC) {
+		if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_COMMON) {
+			/* need to do common hw init early so everything is set up for gmc */
+			r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
+			if (r) {
+				DRM_ERROR("hw_init %d failed %d\n", i, r);
+				goto init_failed;
+			}
+			adev->ip_blocks[i].status.hw = true;
+		} else if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GMC) {
+			/* need to do gmc hw init early so we can allocate gpu mem */
 			/* Try to reserve bad pages early */
 			if (amdgpu_sriov_vf(adev))
 				amdgpu_virt_exchange_data(adev);
@@ -3062,8 +3070,8 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
 	int i, r;
 
 	static enum amd_ip_block_type ip_order[] = {
-		AMD_IP_BLOCK_TYPE_GMC,
 		AMD_IP_BLOCK_TYPE_COMMON,
+		AMD_IP_BLOCK_TYPE_GMC,
 		AMD_IP_BLOCK_TYPE_PSP,
 		AMD_IP_BLOCK_TYPE_IH,
 	};
-- 
2.37.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init
  2022-08-30 15:09         ` Alex Deucher
@ 2022-08-30 16:06           ` Lazar, Lijo
  2022-08-30 18:05             ` Alex Deucher
  0 siblings, 1 reply; 13+ messages in thread
From: Lazar, Lijo @ 2022-08-30 16:06 UTC (permalink / raw)
  To: Alex Deucher
  Cc: amd-gfx, Felix.Kuehling, stable, tseewald, helgaas,
	Alexander.Deucher, sr, Christian.Koenig, Hawking.Zhang



On 8/30/2022 8:39 PM, Alex Deucher wrote:
> On Tue, Aug 30, 2022 at 10:45 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>>
>>
>>
>> On 8/30/2022 7:18 PM, Alex Deucher wrote:
>>> On Tue, Aug 30, 2022 at 12:05 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>>>>
>>>>
>>>>
>>>> On 8/29/2022 10:20 PM, Alex Deucher wrote:
>>>>> On Mon, Aug 29, 2022 at 4:18 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>>>>>>
>>>>>> HDP flush is used early in the init sequence as part of memory controller
>>>>>> block initialization. Hence remapping of HDP registers needed for flush
>>>>>> needs to happen earlier.
>>>>>>
>>>>>> This also fixes the Unsupported Request error reported through AER during
>>>>>> driver load. The error happens as a write happens to the remap offset
>>>>>> before real remapping is done.
>>>>>>
>>>>>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D216373&amp;data=05%7C01%7Clijo.lazar%40amd.com%7Cbe8781fe1b0c41d3bad408da8a99b3d8%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637974690005710507%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=98WWFEcwi2tzyf%2BxnYC%2FK3UcCE5mfXI00qfYGUpt2Sk%3D&amp;reserved=0
>>>>>>
>>>>>> The error was unnoticed before and got visible because of the commit
>>>>>> referenced below. This doesn't fix anything in the commit below, rather
>>>>>> fixes the issue in amdgpu exposed by the commit. The reference is only
>>>>>> to associate this commit with below one so that both go together.
>>>>>>
>>>>>> Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
>>>>>>
>>>>>> Reported-by: Tom Seewald <tseewald@gmail.com>
>>>>>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
>>>>>> Cc: stable@vger.kernel.org
>>>>>
>>>>> How about something like the attached patch rather than these two
>>>>> patches?  It's a bit bigger but seems cleaner and more defensive in my
>>>>> opinion.
>>>>>
>>>>
>>>> Whenever device goes to suspend/reset and then comes back, remap offset
>>>> has to be set back to 0 to make sure it doesn't use the wrong offset
>>>> when the register assumes default values again.
>>>>
>>>> To avoid the if-check in hdp_flush (which is more frequent), another way
>>>> is to initialize the remap offset to default offset during early init
>>>> and hw fini/suspend sequences. It won't be obvious (even with this
>>>> patch) as to when remap offset vs default offset is used though.
>>>
>>> On resume, the common IP is resumed first so it will always be set.
>>> The only case that is a problem is init because we init GMC out of
>>> order.  We could init common before GMC in amdgpu_device_ip_init().  I
>>> think that should be fine, but I wasn't sure if there might be some
>>> fallout from that on certain cards.
>>>
>>
>> There are other places where an IP order is forced like in
>> amdgpu_device_ip_reinit_early_sriov(). This also may not affect this
>> case as remapping is not done for VF.
>>
>> Agree that a better way is to have the common IP to be inited first.
> 
> How about these patches?
> 

Looks good to me. BTW, is nbio 7.7 for an APU (in which case hdp flush 
is not expected to be used)?

Thanks,
Lijo

> Alex
> 
> 
>>
>> Thanks,
>> Lijo
>>
>>> Alex
>>>
>>>>
>>>> Thanks,
>>>> Lijo
>>>>
>>>>> Alex
>>>>>
>>>>>> ---
>>>>>> v2:
>>>>>>            Take care of IP resume cases (Alex Deucher)
>>>>>>            Add NULL check to nbio.funcs to cover older (GFXv8) ASICs (Felix Kuehling)
>>>>>>            Add more details in commit message and associate with AER patch (Bjorn
>>>>>> Helgaas)
>>>>>>
>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++
>>>>>>     drivers/gpu/drm/amd/amdgpu/nv.c            |  6 ------
>>>>>>     drivers/gpu/drm/amd/amdgpu/soc15.c         |  6 ------
>>>>>>     drivers/gpu/drm/amd/amdgpu/soc21.c         |  6 ------
>>>>>>     4 files changed, 24 insertions(+), 18 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> index ce7d117efdb5..e420118769a5 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> @@ -2334,6 +2334,26 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
>>>>>>            return 0;
>>>>>>     }
>>>>>>
>>>>>> +/**
>>>>>> + * amdgpu_device_prepare_ip - prepare IPs for hardware initialization
>>>>>> + *
>>>>>> + * @adev: amdgpu_device pointer
>>>>>> + *
>>>>>> + * Any common hardware initialization sequence that needs to be done before
>>>>>> + * hw init of individual IPs is performed here. This is different from the
>>>>>> + * 'common block' which initializes a set of IPs.
>>>>>> + */
>>>>>> +static void amdgpu_device_prepare_ip(struct amdgpu_device *adev)
>>>>>> +{
>>>>>> +       /* Remap HDP registers to a hole in mmio space, for the purpose
>>>>>> +        * of exposing those registers to process space. This needs to be
>>>>>> +        * done before hw init of ip blocks to take care of HDP flush
>>>>>> +        * operations through registers during hw_init.
>>>>>> +        */
>>>>>> +       if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers &&
>>>>>> +           !amdgpu_sriov_vf(adev))
>>>>>> +               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>>> +}
>>>>>>
>>>>>>     /**
>>>>>>      * amdgpu_device_ip_init - run init for hardware IPs
>>>>>> @@ -2376,6 +2396,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>>>>>>                                    DRM_ERROR("amdgpu_vram_scratch_init failed %d\n", r);
>>>>>>                                    goto init_failed;
>>>>>>                            }
>>>>>> +
>>>>>> +                       amdgpu_device_prepare_ip(adev);
>>>>>>                            r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
>>>>>>                            if (r) {
>>>>>>                                    DRM_ERROR("hw_init %d failed %d\n", i, r);
>>>>>> @@ -3058,6 +3080,7 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
>>>>>>                    AMD_IP_BLOCK_TYPE_IH,
>>>>>>            };
>>>>>>
>>>>>> +       amdgpu_device_prepare_ip(adev);
>>>>>>            for (i = 0; i < adev->num_ip_blocks; i++) {
>>>>>>                    int j;
>>>>>>                    struct amdgpu_ip_block *block;
>>>>>> @@ -3139,6 +3162,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev)
>>>>>>     {
>>>>>>            int i, r;
>>>>>>
>>>>>> +       amdgpu_device_prepare_ip(adev);
>>>>>>            for (i = 0; i < adev->num_ip_blocks; i++) {
>>>>>>                    if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw)
>>>>>>                            continue;
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
>>>>>> index b3fba8dea63c..3ac7fef74277 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>>>>>> @@ -1032,12 +1032,6 @@ static int nv_common_hw_init(void *handle)
>>>>>>            nv_program_aspm(adev);
>>>>>>            /* setup nbio registers */
>>>>>>            adev->nbio.funcs->init_registers(adev);
>>>>>> -       /* remap HDP registers to a hole in mmio space,
>>>>>> -        * for the purpose of expose those registers
>>>>>> -        * to process space
>>>>>> -        */
>>>>>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
>>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>>>            /* enable the doorbell aperture */
>>>>>>            nv_enable_doorbell_aperture(adev, true);
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>>>> index fde6154f2009..a0481e37d7cf 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>>>> @@ -1240,12 +1240,6 @@ static int soc15_common_hw_init(void *handle)
>>>>>>            soc15_program_aspm(adev);
>>>>>>            /* setup nbio registers */
>>>>>>            adev->nbio.funcs->init_registers(adev);
>>>>>> -       /* remap HDP registers to a hole in mmio space,
>>>>>> -        * for the purpose of expose those registers
>>>>>> -        * to process space
>>>>>> -        */
>>>>>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
>>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>>>
>>>>>>            /* enable the doorbell aperture */
>>>>>>            soc15_enable_doorbell_aperture(adev, true);
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>>>> index 55284b24f113..16b447055102 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>>>> @@ -660,12 +660,6 @@ static int soc21_common_hw_init(void *handle)
>>>>>>            soc21_program_aspm(adev);
>>>>>>            /* setup nbio registers */
>>>>>>            adev->nbio.funcs->init_registers(adev);
>>>>>> -       /* remap HDP registers to a hole in mmio space,
>>>>>> -        * for the purpose of expose those registers
>>>>>> -        * to process space
>>>>>> -        */
>>>>>> -       if (adev->nbio.funcs->remap_hdp_registers)
>>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>>>            /* enable the doorbell aperture */
>>>>>>            soc21_enable_doorbell_aperture(adev, true);
>>>>>>
>>>>>> --
>>>>>> 2.25.1
>>>>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init
  2022-08-30 16:06           ` Lazar, Lijo
@ 2022-08-30 18:05             ` Alex Deucher
  2022-09-01 19:39               ` Alex Deucher
  0 siblings, 1 reply; 13+ messages in thread
From: Alex Deucher @ 2022-08-30 18:05 UTC (permalink / raw)
  To: Lazar, Lijo
  Cc: amd-gfx, Felix.Kuehling, stable, tseewald, helgaas,
	Alexander.Deucher, sr, Christian.Koenig, Hawking.Zhang

On Tue, Aug 30, 2022 at 12:06 PM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>
>
>
> On 8/30/2022 8:39 PM, Alex Deucher wrote:
> > On Tue, Aug 30, 2022 at 10:45 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
> >>
> >>
> >>
> >> On 8/30/2022 7:18 PM, Alex Deucher wrote:
> >>> On Tue, Aug 30, 2022 at 12:05 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 8/29/2022 10:20 PM, Alex Deucher wrote:
> >>>>> On Mon, Aug 29, 2022 at 4:18 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
> >>>>>>
> >>>>>> HDP flush is used early in the init sequence as part of memory controller
> >>>>>> block initialization. Hence remapping of HDP registers needed for flush
> >>>>>> needs to happen earlier.
> >>>>>>
> >>>>>> This also fixes the Unsupported Request error reported through AER during
> >>>>>> driver load. The error happens as a write happens to the remap offset
> >>>>>> before real remapping is done.
> >>>>>>
> >>>>>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D216373&amp;data=05%7C01%7Clijo.lazar%40amd.com%7Cbe8781fe1b0c41d3bad408da8a99b3d8%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637974690005710507%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=98WWFEcwi2tzyf%2BxnYC%2FK3UcCE5mfXI00qfYGUpt2Sk%3D&amp;reserved=0
> >>>>>>
> >>>>>> The error was unnoticed before and got visible because of the commit
> >>>>>> referenced below. This doesn't fix anything in the commit below, rather
> >>>>>> fixes the issue in amdgpu exposed by the commit. The reference is only
> >>>>>> to associate this commit with below one so that both go together.
> >>>>>>
> >>>>>> Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
> >>>>>>
> >>>>>> Reported-by: Tom Seewald <tseewald@gmail.com>
> >>>>>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
> >>>>>> Cc: stable@vger.kernel.org
> >>>>>
> >>>>> How about something like the attached patch rather than these two
> >>>>> patches?  It's a bit bigger but seems cleaner and more defensive in my
> >>>>> opinion.
> >>>>>
> >>>>
> >>>> Whenever device goes to suspend/reset and then comes back, remap offset
> >>>> has to be set back to 0 to make sure it doesn't use the wrong offset
> >>>> when the register assumes default values again.
> >>>>
> >>>> To avoid the if-check in hdp_flush (which is more frequent), another way
> >>>> is to initialize the remap offset to default offset during early init
> >>>> and hw fini/suspend sequences. It won't be obvious (even with this
> >>>> patch) as to when remap offset vs default offset is used though.
> >>>
> >>> On resume, the common IP is resumed first so it will always be set.
> >>> The only case that is a problem is init because we init GMC out of
> >>> order.  We could init common before GMC in amdgpu_device_ip_init().  I
> >>> think that should be fine, but I wasn't sure if there might be some
> >>> fallout from that on certain cards.
> >>>
> >>
> >> There are other places where an IP order is forced like in
> >> amdgpu_device_ip_reinit_early_sriov(). This also may not affect this
> >> case as remapping is not done for VF.
> >>
> >> Agree that a better way is to have the common IP to be inited first.
> >
> > How about these patches?
> >
>
> Looks good to me. BTW, is nbio 7.7 for an APU (in which case hdp flush
> is not expected to be used)?

It would be used in some cases, e.g., GPU VM passthrough where we use
the BAR rather than the carve out.

Alex


>
> Thanks,
> Lijo
>
> > Alex
> >
> >
> >>
> >> Thanks,
> >> Lijo
> >>
> >>> Alex
> >>>
> >>>>
> >>>> Thanks,
> >>>> Lijo
> >>>>
> >>>>> Alex
> >>>>>
> >>>>>> ---
> >>>>>> v2:
> >>>>>>            Take care of IP resume cases (Alex Deucher)
> >>>>>>            Add NULL check to nbio.funcs to cover older (GFXv8) ASICs (Felix Kuehling)
> >>>>>>            Add more details in commit message and associate with AER patch (Bjorn
> >>>>>> Helgaas)
> >>>>>>
> >>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++
> >>>>>>     drivers/gpu/drm/amd/amdgpu/nv.c            |  6 ------
> >>>>>>     drivers/gpu/drm/amd/amdgpu/soc15.c         |  6 ------
> >>>>>>     drivers/gpu/drm/amd/amdgpu/soc21.c         |  6 ------
> >>>>>>     4 files changed, 24 insertions(+), 18 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>>>> index ce7d117efdb5..e420118769a5 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>>>> @@ -2334,6 +2334,26 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
> >>>>>>            return 0;
> >>>>>>     }
> >>>>>>
> >>>>>> +/**
> >>>>>> + * amdgpu_device_prepare_ip - prepare IPs for hardware initialization
> >>>>>> + *
> >>>>>> + * @adev: amdgpu_device pointer
> >>>>>> + *
> >>>>>> + * Any common hardware initialization sequence that needs to be done before
> >>>>>> + * hw init of individual IPs is performed here. This is different from the
> >>>>>> + * 'common block' which initializes a set of IPs.
> >>>>>> + */
> >>>>>> +static void amdgpu_device_prepare_ip(struct amdgpu_device *adev)
> >>>>>> +{
> >>>>>> +       /* Remap HDP registers to a hole in mmio space, for the purpose
> >>>>>> +        * of exposing those registers to process space. This needs to be
> >>>>>> +        * done before hw init of ip blocks to take care of HDP flush
> >>>>>> +        * operations through registers during hw_init.
> >>>>>> +        */
> >>>>>> +       if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers &&
> >>>>>> +           !amdgpu_sriov_vf(adev))
> >>>>>> +               adev->nbio.funcs->remap_hdp_registers(adev);
> >>>>>> +}
> >>>>>>
> >>>>>>     /**
> >>>>>>      * amdgpu_device_ip_init - run init for hardware IPs
> >>>>>> @@ -2376,6 +2396,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
> >>>>>>                                    DRM_ERROR("amdgpu_vram_scratch_init failed %d\n", r);
> >>>>>>                                    goto init_failed;
> >>>>>>                            }
> >>>>>> +
> >>>>>> +                       amdgpu_device_prepare_ip(adev);
> >>>>>>                            r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
> >>>>>>                            if (r) {
> >>>>>>                                    DRM_ERROR("hw_init %d failed %d\n", i, r);
> >>>>>> @@ -3058,6 +3080,7 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
> >>>>>>                    AMD_IP_BLOCK_TYPE_IH,
> >>>>>>            };
> >>>>>>
> >>>>>> +       amdgpu_device_prepare_ip(adev);
> >>>>>>            for (i = 0; i < adev->num_ip_blocks; i++) {
> >>>>>>                    int j;
> >>>>>>                    struct amdgpu_ip_block *block;
> >>>>>> @@ -3139,6 +3162,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev)
> >>>>>>     {
> >>>>>>            int i, r;
> >>>>>>
> >>>>>> +       amdgpu_device_prepare_ip(adev);
> >>>>>>            for (i = 0; i < adev->num_ip_blocks; i++) {
> >>>>>>                    if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw)
> >>>>>>                            continue;
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
> >>>>>> index b3fba8dea63c..3ac7fef74277 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> >>>>>> @@ -1032,12 +1032,6 @@ static int nv_common_hw_init(void *handle)
> >>>>>>            nv_program_aspm(adev);
> >>>>>>            /* setup nbio registers */
> >>>>>>            adev->nbio.funcs->init_registers(adev);
> >>>>>> -       /* remap HDP registers to a hole in mmio space,
> >>>>>> -        * for the purpose of expose those registers
> >>>>>> -        * to process space
> >>>>>> -        */
> >>>>>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
> >>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
> >>>>>>            /* enable the doorbell aperture */
> >>>>>>            nv_enable_doorbell_aperture(adev, true);
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>>>>> index fde6154f2009..a0481e37d7cf 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>>>>> @@ -1240,12 +1240,6 @@ static int soc15_common_hw_init(void *handle)
> >>>>>>            soc15_program_aspm(adev);
> >>>>>>            /* setup nbio registers */
> >>>>>>            adev->nbio.funcs->init_registers(adev);
> >>>>>> -       /* remap HDP registers to a hole in mmio space,
> >>>>>> -        * for the purpose of expose those registers
> >>>>>> -        * to process space
> >>>>>> -        */
> >>>>>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
> >>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
> >>>>>>
> >>>>>>            /* enable the doorbell aperture */
> >>>>>>            soc15_enable_doorbell_aperture(adev, true);
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
> >>>>>> index 55284b24f113..16b447055102 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
> >>>>>> @@ -660,12 +660,6 @@ static int soc21_common_hw_init(void *handle)
> >>>>>>            soc21_program_aspm(adev);
> >>>>>>            /* setup nbio registers */
> >>>>>>            adev->nbio.funcs->init_registers(adev);
> >>>>>> -       /* remap HDP registers to a hole in mmio space,
> >>>>>> -        * for the purpose of expose those registers
> >>>>>> -        * to process space
> >>>>>> -        */
> >>>>>> -       if (adev->nbio.funcs->remap_hdp_registers)
> >>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
> >>>>>>            /* enable the doorbell aperture */
> >>>>>>            soc21_enable_doorbell_aperture(adev, true);
> >>>>>>
> >>>>>> --
> >>>>>> 2.25.1
> >>>>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init
  2022-08-30 18:05             ` Alex Deucher
@ 2022-09-01 19:39               ` Alex Deucher
  2022-09-05  5:27                 ` Lazar, Lijo
  0 siblings, 1 reply; 13+ messages in thread
From: Alex Deucher @ 2022-09-01 19:39 UTC (permalink / raw)
  To: Lazar, Lijo
  Cc: amd-gfx, Felix.Kuehling, stable, tseewald, helgaas,
	Alexander.Deucher, sr, Christian.Koenig, Hawking.Zhang

[-- Attachment #1: Type: text/plain, Size: 10988 bytes --]

How about this?  We should just move HDP remap to gmc hw_init since
that is mainly what uses it anyway.

Alex

On Tue, Aug 30, 2022 at 2:05 PM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Tue, Aug 30, 2022 at 12:06 PM Lazar, Lijo <lijo.lazar@amd.com> wrote:
> >
> >
> >
> > On 8/30/2022 8:39 PM, Alex Deucher wrote:
> > > On Tue, Aug 30, 2022 at 10:45 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
> > >>
> > >>
> > >>
> > >> On 8/30/2022 7:18 PM, Alex Deucher wrote:
> > >>> On Tue, Aug 30, 2022 at 12:05 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
> > >>>>
> > >>>>
> > >>>>
> > >>>> On 8/29/2022 10:20 PM, Alex Deucher wrote:
> > >>>>> On Mon, Aug 29, 2022 at 4:18 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
> > >>>>>>
> > >>>>>> HDP flush is used early in the init sequence as part of memory controller
> > >>>>>> block initialization. Hence remapping of HDP registers needed for flush
> > >>>>>> needs to happen earlier.
> > >>>>>>
> > >>>>>> This also fixes the Unsupported Request error reported through AER during
> > >>>>>> driver load. The error happens as a write happens to the remap offset
> > >>>>>> before real remapping is done.
> > >>>>>>
> > >>>>>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D216373&amp;data=05%7C01%7Clijo.lazar%40amd.com%7Cbe8781fe1b0c41d3bad408da8a99b3d8%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637974690005710507%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=98WWFEcwi2tzyf%2BxnYC%2FK3UcCE5mfXI00qfYGUpt2Sk%3D&amp;reserved=0
> > >>>>>>
> > >>>>>> The error was unnoticed before and got visible because of the commit
> > >>>>>> referenced below. This doesn't fix anything in the commit below, rather
> > >>>>>> fixes the issue in amdgpu exposed by the commit. The reference is only
> > >>>>>> to associate this commit with below one so that both go together.
> > >>>>>>
> > >>>>>> Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
> > >>>>>>
> > >>>>>> Reported-by: Tom Seewald <tseewald@gmail.com>
> > >>>>>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
> > >>>>>> Cc: stable@vger.kernel.org
> > >>>>>
> > >>>>> How about something like the attached patch rather than these two
> > >>>>> patches?  It's a bit bigger but seems cleaner and more defensive in my
> > >>>>> opinion.
> > >>>>>
> > >>>>
> > >>>> Whenever device goes to suspend/reset and then comes back, remap offset
> > >>>> has to be set back to 0 to make sure it doesn't use the wrong offset
> > >>>> when the register assumes default values again.
> > >>>>
> > >>>> To avoid the if-check in hdp_flush (which is more frequent), another way
> > >>>> is to initialize the remap offset to default offset during early init
> > >>>> and hw fini/suspend sequences. It won't be obvious (even with this
> > >>>> patch) as to when remap offset vs default offset is used though.
> > >>>
> > >>> On resume, the common IP is resumed first so it will always be set.
> > >>> The only case that is a problem is init because we init GMC out of
> > >>> order.  We could init common before GMC in amdgpu_device_ip_init().  I
> > >>> think that should be fine, but I wasn't sure if there might be some
> > >>> fallout from that on certain cards.
> > >>>
> > >>
> > >> There are other places where an IP order is forced like in
> > >> amdgpu_device_ip_reinit_early_sriov(). This also may not affect this
> > >> case as remapping is not done for VF.
> > >>
> > >> Agree that a better way is to have the common IP to be inited first.
> > >
> > > How about these patches?
> > >
> >
> > Looks good to me. BTW, is nbio 7.7 for an APU (in which case hdp flush
> > is not expected to be used)?
>
> It would be used in some cases, e.g., GPU VM passthrough where we use
> the BAR rather than the carve out.
>
> Alex
>
>
> >
> > Thanks,
> > Lijo
> >
> > > Alex
> > >
> > >
> > >>
> > >> Thanks,
> > >> Lijo
> > >>
> > >>> Alex
> > >>>
> > >>>>
> > >>>> Thanks,
> > >>>> Lijo
> > >>>>
> > >>>>> Alex
> > >>>>>
> > >>>>>> ---
> > >>>>>> v2:
> > >>>>>>            Take care of IP resume cases (Alex Deucher)
> > >>>>>>            Add NULL check to nbio.funcs to cover older (GFXv8) ASICs (Felix Kuehling)
> > >>>>>>            Add more details in commit message and associate with AER patch (Bjorn
> > >>>>>> Helgaas)
> > >>>>>>
> > >>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++
> > >>>>>>     drivers/gpu/drm/amd/amdgpu/nv.c            |  6 ------
> > >>>>>>     drivers/gpu/drm/amd/amdgpu/soc15.c         |  6 ------
> > >>>>>>     drivers/gpu/drm/amd/amdgpu/soc21.c         |  6 ------
> > >>>>>>     4 files changed, 24 insertions(+), 18 deletions(-)
> > >>>>>>
> > >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > >>>>>> index ce7d117efdb5..e420118769a5 100644
> > >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > >>>>>> @@ -2334,6 +2334,26 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
> > >>>>>>            return 0;
> > >>>>>>     }
> > >>>>>>
> > >>>>>> +/**
> > >>>>>> + * amdgpu_device_prepare_ip - prepare IPs for hardware initialization
> > >>>>>> + *
> > >>>>>> + * @adev: amdgpu_device pointer
> > >>>>>> + *
> > >>>>>> + * Any common hardware initialization sequence that needs to be done before
> > >>>>>> + * hw init of individual IPs is performed here. This is different from the
> > >>>>>> + * 'common block' which initializes a set of IPs.
> > >>>>>> + */
> > >>>>>> +static void amdgpu_device_prepare_ip(struct amdgpu_device *adev)
> > >>>>>> +{
> > >>>>>> +       /* Remap HDP registers to a hole in mmio space, for the purpose
> > >>>>>> +        * of exposing those registers to process space. This needs to be
> > >>>>>> +        * done before hw init of ip blocks to take care of HDP flush
> > >>>>>> +        * operations through registers during hw_init.
> > >>>>>> +        */
> > >>>>>> +       if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers &&
> > >>>>>> +           !amdgpu_sriov_vf(adev))
> > >>>>>> +               adev->nbio.funcs->remap_hdp_registers(adev);
> > >>>>>> +}
> > >>>>>>
> > >>>>>>     /**
> > >>>>>>      * amdgpu_device_ip_init - run init for hardware IPs
> > >>>>>> @@ -2376,6 +2396,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
> > >>>>>>                                    DRM_ERROR("amdgpu_vram_scratch_init failed %d\n", r);
> > >>>>>>                                    goto init_failed;
> > >>>>>>                            }
> > >>>>>> +
> > >>>>>> +                       amdgpu_device_prepare_ip(adev);
> > >>>>>>                            r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
> > >>>>>>                            if (r) {
> > >>>>>>                                    DRM_ERROR("hw_init %d failed %d\n", i, r);
> > >>>>>> @@ -3058,6 +3080,7 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
> > >>>>>>                    AMD_IP_BLOCK_TYPE_IH,
> > >>>>>>            };
> > >>>>>>
> > >>>>>> +       amdgpu_device_prepare_ip(adev);
> > >>>>>>            for (i = 0; i < adev->num_ip_blocks; i++) {
> > >>>>>>                    int j;
> > >>>>>>                    struct amdgpu_ip_block *block;
> > >>>>>> @@ -3139,6 +3162,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev)
> > >>>>>>     {
> > >>>>>>            int i, r;
> > >>>>>>
> > >>>>>> +       amdgpu_device_prepare_ip(adev);
> > >>>>>>            for (i = 0; i < adev->num_ip_blocks; i++) {
> > >>>>>>                    if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw)
> > >>>>>>                            continue;
> > >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
> > >>>>>> index b3fba8dea63c..3ac7fef74277 100644
> > >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> > >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> > >>>>>> @@ -1032,12 +1032,6 @@ static int nv_common_hw_init(void *handle)
> > >>>>>>            nv_program_aspm(adev);
> > >>>>>>            /* setup nbio registers */
> > >>>>>>            adev->nbio.funcs->init_registers(adev);
> > >>>>>> -       /* remap HDP registers to a hole in mmio space,
> > >>>>>> -        * for the purpose of expose those registers
> > >>>>>> -        * to process space
> > >>>>>> -        */
> > >>>>>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
> > >>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
> > >>>>>>            /* enable the doorbell aperture */
> > >>>>>>            nv_enable_doorbell_aperture(adev, true);
> > >>>>>>
> > >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > >>>>>> index fde6154f2009..a0481e37d7cf 100644
> > >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > >>>>>> @@ -1240,12 +1240,6 @@ static int soc15_common_hw_init(void *handle)
> > >>>>>>            soc15_program_aspm(adev);
> > >>>>>>            /* setup nbio registers */
> > >>>>>>            adev->nbio.funcs->init_registers(adev);
> > >>>>>> -       /* remap HDP registers to a hole in mmio space,
> > >>>>>> -        * for the purpose of expose those registers
> > >>>>>> -        * to process space
> > >>>>>> -        */
> > >>>>>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
> > >>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
> > >>>>>>
> > >>>>>>            /* enable the doorbell aperture */
> > >>>>>>            soc15_enable_doorbell_aperture(adev, true);
> > >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
> > >>>>>> index 55284b24f113..16b447055102 100644
> > >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
> > >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
> > >>>>>> @@ -660,12 +660,6 @@ static int soc21_common_hw_init(void *handle)
> > >>>>>>            soc21_program_aspm(adev);
> > >>>>>>            /* setup nbio registers */
> > >>>>>>            adev->nbio.funcs->init_registers(adev);
> > >>>>>> -       /* remap HDP registers to a hole in mmio space,
> > >>>>>> -        * for the purpose of expose those registers
> > >>>>>> -        * to process space
> > >>>>>> -        */
> > >>>>>> -       if (adev->nbio.funcs->remap_hdp_registers)
> > >>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
> > >>>>>>            /* enable the doorbell aperture */
> > >>>>>>            soc21_enable_doorbell_aperture(adev, true);
> > >>>>>>
> > >>>>>> --
> > >>>>>> 2.25.1
> > >>>>>>

[-- Attachment #2: 0001-drm-amdgpu-init-HDP-remap-as-part-of-GMC-hw_init-for.patch --]
[-- Type: text/x-patch, Size: 4079 bytes --]

From ba71fb1b78340a91468fab44008af24ca548f38d Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher@amd.com>
Date: Thu, 1 Sep 2022 15:32:17 -0400
Subject: [PATCH 1/3] drm/amdgpu: init HDP remap as part of GMC hw_init for
 gmc9, 10

GMC is the primary user of this so do it as part of GMC init
rather than common init. This way the remap is always applied
before it's used.

This fixes the Unsupported Request error reported through
AER during driver load. The error happens as a write happens
to the remap offset before real remapping is done.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373

The error was unnoticed before and got visible because of the commit
referenced below. This doesn't fix anything in the commit below, rather
fixes the issue in amdgpu exposed by the commit. The reference is only
to associate this commit with below one so that both go together.

Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 7 +++++++
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 7 +++++++
 drivers/gpu/drm/amd/amdgpu/nv.c        | 6 ------
 drivers/gpu/drm/amd/amdgpu/soc15.c     | 7 -------
 4 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index f513e2c2e964..6f16852ea7c6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -1101,6 +1101,13 @@ static int gmc_v10_0_hw_init(void *handle)
 	if (adev->gfxhub.funcs && adev->gfxhub.funcs->utcl2_harvest)
 		adev->gfxhub.funcs->utcl2_harvest(adev);
 
+	/* remap HDP registers to a hole in mmio space,
+	 * for the purpose of expose those registers
+	 * to process space
+	 */
+	if (adev->nbio.funcs->remap_hdp_registers)
+		adev->nbio.funcs->remap_hdp_registers(adev);
+
 	r = gmc_v10_0_gart_enable(adev);
 	if (r)
 		return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 4603653916f5..31e9e8159aba 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1832,6 +1832,13 @@ static int gmc_v9_0_hw_init(void *handle)
 	if (adev->mmhub.funcs->update_power_gating)
 		adev->mmhub.funcs->update_power_gating(adev, true);
 
+	/* remap HDP registers to a hole in mmio space,
+	 * for the purpose of expose those registers
+	 * to process space
+	 */
+	if (adev->nbio.funcs->remap_hdp_registers)
+		adev->nbio.funcs->remap_hdp_registers(adev);
+
 	adev->hdp.funcs->init_registers(adev);
 
 	/* After HDP is initialized, flush HDP.*/
diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index a2339cbedd32..b97c7cc72349 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -1027,12 +1027,6 @@ static int nv_common_hw_init(void *handle)
 	nv_program_aspm(adev);
 	/* setup nbio registers */
 	adev->nbio.funcs->init_registers(adev);
-	/* remap HDP registers to a hole in mmio space,
-	 * for the purpose of expose those registers
-	 * to process space
-	 */
-	if (adev->nbio.funcs->remap_hdp_registers)
-		adev->nbio.funcs->remap_hdp_registers(adev);
 	/* enable the doorbell aperture */
 	nv_enable_doorbell_aperture(adev, true);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 603b01db69e2..b1d8e6767d41 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -1235,13 +1235,6 @@ static int soc15_common_hw_init(void *handle)
 	soc15_program_aspm(adev);
 	/* setup nbio registers */
 	adev->nbio.funcs->init_registers(adev);
-	/* remap HDP registers to a hole in mmio space,
-	 * for the purpose of expose those registers
-	 * to process space
-	 */
-	if (adev->nbio.funcs->remap_hdp_registers)
-		adev->nbio.funcs->remap_hdp_registers(adev);
-
 	/* enable the doorbell aperture */
 	soc15_enable_doorbell_aperture(adev, true);
 	/* HW doorbell routing policy: doorbell writing not
-- 
2.37.2


[-- Attachment #3: 0003-drm-amdgpu-add-HDP-remap-functionality-to-nbio-7.7.patch --]
[-- Type: text/x-patch, Size: 1606 bytes --]

From 2b2fa1488ba8932cb7606343c81e2c59b9d86633 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher@amd.com>
Date: Tue, 30 Aug 2022 11:08:09 -0400
Subject: [PATCH 3/3] drm/amdgpu: add HDP remap functionality to nbio 7.7

Was missing before and would have resulted in a write to
a non-existant register. Normally APUs don't use HDP, but
other asics could use this code and APUs do use the HDP
when used in passthrough.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c
index 1dc95ef21da6..21eac06bf1ed 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c
@@ -28,6 +28,14 @@
 #include "nbio/nbio_7_7_0_sh_mask.h"
 #include <uapi/linux/kfd_ioctl.h>
 
+static void nbio_v7_7_remap_hdp_registers(struct amdgpu_device *adev)
+{
+	WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_MEM_FLUSH_CNTL,
+		     adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
+	WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_REG_FLUSH_CNTL,
+		     adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+}
+
 static u32 nbio_v7_7_get_rev_id(struct amdgpu_device *adev)
 {
 	u32 tmp;
@@ -342,4 +350,5 @@ const struct amdgpu_nbio_funcs nbio_v7_7_funcs = {
 	.get_clockgating_state = nbio_v7_7_get_clockgating_state,
 	.ih_control = nbio_v7_7_ih_control,
 	.init_registers = nbio_v7_7_init_registers,
+	.remap_hdp_registers = nbio_v7_7_remap_hdp_registers,
 };
-- 
2.37.2


[-- Attachment #4: 0002-drm-amdgpu-init-HDP-remap-as-part-of-GMC-hw_init-for.patch --]
[-- Type: text/x-patch, Size: 2552 bytes --]

From 3396c4a45c16cd1593d90f40a455ee30818cbf7c Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher@amd.com>
Date: Thu, 1 Sep 2022 15:36:29 -0400
Subject: [PATCH 2/3] drm/amdgpu: init HDP remap as part of GMC hw_init for
 gmc11

GMC is the primary user of this so do it as part of GMC init
rather than common init. This way the remap is always applied
before it's used.

This fixes the Unsupported Request error reported through
AER during driver load. The error happens as a write happens
to the remap offset before real remapping is done.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373

The error was unnoticed before and got visible because of the commit
referenced below. This doesn't fix anything in the commit below, rather
fixes the issue in amdgpu exposed by the commit. The reference is only
to associate this commit with below one so that both go together.

Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 7 +++++++
 drivers/gpu/drm/amd/amdgpu/soc21.c     | 6 ------
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index 846ccb6cf07d..ae907bbe70d7 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -894,6 +894,13 @@ static int gmc_v11_0_hw_init(void *handle)
 	/* The sequence of these two function calls matters.*/
 	gmc_v11_0_init_golden_registers(adev);
 
+	/* remap HDP registers to a hole in mmio space,
+	 * for the purpose of expose those registers
+	 * to process space
+	 */
+	if (adev->nbio.funcs->remap_hdp_registers)
+		adev->nbio.funcs->remap_hdp_registers(adev);
+
 	r = gmc_v11_0_gart_enable(adev);
 	if (r)
 		return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
index a9d5a784cc48..648cd1d64fd7 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -674,12 +674,6 @@ static int soc21_common_hw_init(void *handle)
 	soc21_program_aspm(adev);
 	/* setup nbio registers */
 	adev->nbio.funcs->init_registers(adev);
-	/* remap HDP registers to a hole in mmio space,
-	 * for the purpose of expose those registers
-	 * to process space
-	 */
-	if (adev->nbio.funcs->remap_hdp_registers)
-		adev->nbio.funcs->remap_hdp_registers(adev);
 	/* enable the doorbell aperture */
 	soc21_enable_doorbell_aperture(adev, true);
 
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init
  2022-09-01 19:39               ` Alex Deucher
@ 2022-09-05  5:27                 ` Lazar, Lijo
  2022-09-06 15:25                   ` Alex Deucher
  0 siblings, 1 reply; 13+ messages in thread
From: Lazar, Lijo @ 2022-09-05  5:27 UTC (permalink / raw)
  To: Alex Deucher
  Cc: amd-gfx, Felix.Kuehling, stable, tseewald, helgaas,
	Alexander.Deucher, sr, Christian.Koenig, Hawking.Zhang



On 9/2/2022 1:09 AM, Alex Deucher wrote:
> How about this?  We should just move HDP remap to gmc hw_init since
> that is mainly what uses it anyway.
> 

Sorry, I missed to R-B the previous version. Did you find any problem 
when common block is initialized first?

I think that patch provides a consistent IP init sequence during cold 
init and resume.

Thanks,
Lijo

> Alex
> 
> On Tue, Aug 30, 2022 at 2:05 PM Alex Deucher <alexdeucher@gmail.com> wrote:
>>
>> On Tue, Aug 30, 2022 at 12:06 PM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>>>
>>>
>>>
>>> On 8/30/2022 8:39 PM, Alex Deucher wrote:
>>>> On Tue, Aug 30, 2022 at 10:45 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 8/30/2022 7:18 PM, Alex Deucher wrote:
>>>>>> On Tue, Aug 30, 2022 at 12:05 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 8/29/2022 10:20 PM, Alex Deucher wrote:
>>>>>>>> On Mon, Aug 29, 2022 at 4:18 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>>>>>>>>>
>>>>>>>>> HDP flush is used early in the init sequence as part of memory controller
>>>>>>>>> block initialization. Hence remapping of HDP registers needed for flush
>>>>>>>>> needs to happen earlier.
>>>>>>>>>
>>>>>>>>> This also fixes the Unsupported Request error reported through AER during
>>>>>>>>> driver load. The error happens as a write happens to the remap offset
>>>>>>>>> before real remapping is done.
>>>>>>>>>
>>>>>>>>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D216373&amp;data=05%7C01%7Clijo.lazar%40amd.com%7C4e59ae0f8ed54aa7c5a208da8c51aa29%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637976579623485975%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=WTcd9ImY7Oba0MT6oQ7%2B7UEBkdS6azvqgYoK%2B%2F4mJPg%3D&amp;reserved=0
>>>>>>>>>
>>>>>>>>> The error was unnoticed before and got visible because of the commit
>>>>>>>>> referenced below. This doesn't fix anything in the commit below, rather
>>>>>>>>> fixes the issue in amdgpu exposed by the commit. The reference is only
>>>>>>>>> to associate this commit with below one so that both go together.
>>>>>>>>>
>>>>>>>>> Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
>>>>>>>>>
>>>>>>>>> Reported-by: Tom Seewald <tseewald@gmail.com>
>>>>>>>>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
>>>>>>>>> Cc: stable@vger.kernel.org
>>>>>>>>
>>>>>>>> How about something like the attached patch rather than these two
>>>>>>>> patches?  It's a bit bigger but seems cleaner and more defensive in my
>>>>>>>> opinion.
>>>>>>>>
>>>>>>>
>>>>>>> Whenever device goes to suspend/reset and then comes back, remap offset
>>>>>>> has to be set back to 0 to make sure it doesn't use the wrong offset
>>>>>>> when the register assumes default values again.
>>>>>>>
>>>>>>> To avoid the if-check in hdp_flush (which is more frequent), another way
>>>>>>> is to initialize the remap offset to default offset during early init
>>>>>>> and hw fini/suspend sequences. It won't be obvious (even with this
>>>>>>> patch) as to when remap offset vs default offset is used though.
>>>>>>
>>>>>> On resume, the common IP is resumed first so it will always be set.
>>>>>> The only case that is a problem is init because we init GMC out of
>>>>>> order.  We could init common before GMC in amdgpu_device_ip_init().  I
>>>>>> think that should be fine, but I wasn't sure if there might be some
>>>>>> fallout from that on certain cards.
>>>>>>
>>>>>
>>>>> There are other places where an IP order is forced like in
>>>>> amdgpu_device_ip_reinit_early_sriov(). This also may not affect this
>>>>> case as remapping is not done for VF.
>>>>>
>>>>> Agree that a better way is to have the common IP to be inited first.
>>>>
>>>> How about these patches?
>>>>
>>>
>>> Looks good to me. BTW, is nbio 7.7 for an APU (in which case hdp flush
>>> is not expected to be used)?
>>
>> It would be used in some cases, e.g., GPU VM passthrough where we use
>> the BAR rather than the carve out.
>>
>> Alex
>>
>>
>>>
>>> Thanks,
>>> Lijo
>>>
>>>> Alex
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Lijo
>>>>>
>>>>>> Alex
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Lijo
>>>>>>>
>>>>>>>> Alex
>>>>>>>>
>>>>>>>>> ---
>>>>>>>>> v2:
>>>>>>>>>             Take care of IP resume cases (Alex Deucher)
>>>>>>>>>             Add NULL check to nbio.funcs to cover older (GFXv8) ASICs (Felix Kuehling)
>>>>>>>>>             Add more details in commit message and associate with AER patch (Bjorn
>>>>>>>>> Helgaas)
>>>>>>>>>
>>>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++
>>>>>>>>>      drivers/gpu/drm/amd/amdgpu/nv.c            |  6 ------
>>>>>>>>>      drivers/gpu/drm/amd/amdgpu/soc15.c         |  6 ------
>>>>>>>>>      drivers/gpu/drm/amd/amdgpu/soc21.c         |  6 ------
>>>>>>>>>      4 files changed, 24 insertions(+), 18 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>> index ce7d117efdb5..e420118769a5 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>> @@ -2334,6 +2334,26 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
>>>>>>>>>             return 0;
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> +/**
>>>>>>>>> + * amdgpu_device_prepare_ip - prepare IPs for hardware initialization
>>>>>>>>> + *
>>>>>>>>> + * @adev: amdgpu_device pointer
>>>>>>>>> + *
>>>>>>>>> + * Any common hardware initialization sequence that needs to be done before
>>>>>>>>> + * hw init of individual IPs is performed here. This is different from the
>>>>>>>>> + * 'common block' which initializes a set of IPs.
>>>>>>>>> + */
>>>>>>>>> +static void amdgpu_device_prepare_ip(struct amdgpu_device *adev)
>>>>>>>>> +{
>>>>>>>>> +       /* Remap HDP registers to a hole in mmio space, for the purpose
>>>>>>>>> +        * of exposing those registers to process space. This needs to be
>>>>>>>>> +        * done before hw init of ip blocks to take care of HDP flush
>>>>>>>>> +        * operations through registers during hw_init.
>>>>>>>>> +        */
>>>>>>>>> +       if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers &&
>>>>>>>>> +           !amdgpu_sriov_vf(adev))
>>>>>>>>> +               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>>>>>> +}
>>>>>>>>>
>>>>>>>>>      /**
>>>>>>>>>       * amdgpu_device_ip_init - run init for hardware IPs
>>>>>>>>> @@ -2376,6 +2396,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>>>>>>>>>                                     DRM_ERROR("amdgpu_vram_scratch_init failed %d\n", r);
>>>>>>>>>                                     goto init_failed;
>>>>>>>>>                             }
>>>>>>>>> +
>>>>>>>>> +                       amdgpu_device_prepare_ip(adev);
>>>>>>>>>                             r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
>>>>>>>>>                             if (r) {
>>>>>>>>>                                     DRM_ERROR("hw_init %d failed %d\n", i, r);
>>>>>>>>> @@ -3058,6 +3080,7 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
>>>>>>>>>                     AMD_IP_BLOCK_TYPE_IH,
>>>>>>>>>             };
>>>>>>>>>
>>>>>>>>> +       amdgpu_device_prepare_ip(adev);
>>>>>>>>>             for (i = 0; i < adev->num_ip_blocks; i++) {
>>>>>>>>>                     int j;
>>>>>>>>>                     struct amdgpu_ip_block *block;
>>>>>>>>> @@ -3139,6 +3162,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev)
>>>>>>>>>      {
>>>>>>>>>             int i, r;
>>>>>>>>>
>>>>>>>>> +       amdgpu_device_prepare_ip(adev);
>>>>>>>>>             for (i = 0; i < adev->num_ip_blocks; i++) {
>>>>>>>>>                     if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw)
>>>>>>>>>                             continue;
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
>>>>>>>>> index b3fba8dea63c..3ac7fef74277 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>>>>>>>>> @@ -1032,12 +1032,6 @@ static int nv_common_hw_init(void *handle)
>>>>>>>>>             nv_program_aspm(adev);
>>>>>>>>>             /* setup nbio registers */
>>>>>>>>>             adev->nbio.funcs->init_registers(adev);
>>>>>>>>> -       /* remap HDP registers to a hole in mmio space,
>>>>>>>>> -        * for the purpose of expose those registers
>>>>>>>>> -        * to process space
>>>>>>>>> -        */
>>>>>>>>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
>>>>>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>>>>>>             /* enable the doorbell aperture */
>>>>>>>>>             nv_enable_doorbell_aperture(adev, true);
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>>>>>>> index fde6154f2009..a0481e37d7cf 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>>>>>>> @@ -1240,12 +1240,6 @@ static int soc15_common_hw_init(void *handle)
>>>>>>>>>             soc15_program_aspm(adev);
>>>>>>>>>             /* setup nbio registers */
>>>>>>>>>             adev->nbio.funcs->init_registers(adev);
>>>>>>>>> -       /* remap HDP registers to a hole in mmio space,
>>>>>>>>> -        * for the purpose of expose those registers
>>>>>>>>> -        * to process space
>>>>>>>>> -        */
>>>>>>>>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
>>>>>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>>>>>>
>>>>>>>>>             /* enable the doorbell aperture */
>>>>>>>>>             soc15_enable_doorbell_aperture(adev, true);
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>>>>>>> index 55284b24f113..16b447055102 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>>>>>>> @@ -660,12 +660,6 @@ static int soc21_common_hw_init(void *handle)
>>>>>>>>>             soc21_program_aspm(adev);
>>>>>>>>>             /* setup nbio registers */
>>>>>>>>>             adev->nbio.funcs->init_registers(adev);
>>>>>>>>> -       /* remap HDP registers to a hole in mmio space,
>>>>>>>>> -        * for the purpose of expose those registers
>>>>>>>>> -        * to process space
>>>>>>>>> -        */
>>>>>>>>> -       if (adev->nbio.funcs->remap_hdp_registers)
>>>>>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>>>>>>             /* enable the doorbell aperture */
>>>>>>>>>             soc21_enable_doorbell_aperture(adev, true);
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> 2.25.1
>>>>>>>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init
  2022-09-05  5:27                 ` Lazar, Lijo
@ 2022-09-06 15:25                   ` Alex Deucher
  2022-09-06 15:56                     ` Lazar, Lijo
  0 siblings, 1 reply; 13+ messages in thread
From: Alex Deucher @ 2022-09-06 15:25 UTC (permalink / raw)
  To: Lazar, Lijo
  Cc: amd-gfx, Felix.Kuehling, stable, tseewald, helgaas,
	Alexander.Deucher, sr, Christian.Koenig, Hawking.Zhang

On Mon, Sep 5, 2022 at 1:27 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>
>
>
> On 9/2/2022 1:09 AM, Alex Deucher wrote:
> > How about this?  We should just move HDP remap to gmc hw_init since
> > that is mainly what uses it anyway.
> >
>
> Sorry, I missed to R-B the previous version. Did you find any problem
> when common block is initialized first?
>

One of the users on the bug report reported an issue with that patch,
that said, that user seems to be seeing a slightly different issue
since he is on a gfx9 card and the original reporter was on gfx10.
https://bugzilla.kernel.org/show_bug.cgi?id=216373

Alex


> I think that patch provides a consistent IP init sequence during cold
> init and resume.
>
> Thanks,
> Lijo
>
> > Alex
> >
> > On Tue, Aug 30, 2022 at 2:05 PM Alex Deucher <alexdeucher@gmail.com> wrote:
> >>
> >> On Tue, Aug 30, 2022 at 12:06 PM Lazar, Lijo <lijo.lazar@amd.com> wrote:
> >>>
> >>>
> >>>
> >>> On 8/30/2022 8:39 PM, Alex Deucher wrote:
> >>>> On Tue, Aug 30, 2022 at 10:45 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 8/30/2022 7:18 PM, Alex Deucher wrote:
> >>>>>> On Tue, Aug 30, 2022 at 12:05 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 8/29/2022 10:20 PM, Alex Deucher wrote:
> >>>>>>>> On Mon, Aug 29, 2022 at 4:18 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
> >>>>>>>>>
> >>>>>>>>> HDP flush is used early in the init sequence as part of memory controller
> >>>>>>>>> block initialization. Hence remapping of HDP registers needed for flush
> >>>>>>>>> needs to happen earlier.
> >>>>>>>>>
> >>>>>>>>> This also fixes the Unsupported Request error reported through AER during
> >>>>>>>>> driver load. The error happens as a write happens to the remap offset
> >>>>>>>>> before real remapping is done.
> >>>>>>>>>
> >>>>>>>>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D216373&amp;data=05%7C01%7Clijo.lazar%40amd.com%7C4e59ae0f8ed54aa7c5a208da8c51aa29%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637976579623485975%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=WTcd9ImY7Oba0MT6oQ7%2B7UEBkdS6azvqgYoK%2B%2F4mJPg%3D&amp;reserved=0
> >>>>>>>>>
> >>>>>>>>> The error was unnoticed before and got visible because of the commit
> >>>>>>>>> referenced below. This doesn't fix anything in the commit below, rather
> >>>>>>>>> fixes the issue in amdgpu exposed by the commit. The reference is only
> >>>>>>>>> to associate this commit with below one so that both go together.
> >>>>>>>>>
> >>>>>>>>> Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
> >>>>>>>>>
> >>>>>>>>> Reported-by: Tom Seewald <tseewald@gmail.com>
> >>>>>>>>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
> >>>>>>>>> Cc: stable@vger.kernel.org
> >>>>>>>>
> >>>>>>>> How about something like the attached patch rather than these two
> >>>>>>>> patches?  It's a bit bigger but seems cleaner and more defensive in my
> >>>>>>>> opinion.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Whenever device goes to suspend/reset and then comes back, remap offset
> >>>>>>> has to be set back to 0 to make sure it doesn't use the wrong offset
> >>>>>>> when the register assumes default values again.
> >>>>>>>
> >>>>>>> To avoid the if-check in hdp_flush (which is more frequent), another way
> >>>>>>> is to initialize the remap offset to default offset during early init
> >>>>>>> and hw fini/suspend sequences. It won't be obvious (even with this
> >>>>>>> patch) as to when remap offset vs default offset is used though.
> >>>>>>
> >>>>>> On resume, the common IP is resumed first so it will always be set.
> >>>>>> The only case that is a problem is init because we init GMC out of
> >>>>>> order.  We could init common before GMC in amdgpu_device_ip_init().  I
> >>>>>> think that should be fine, but I wasn't sure if there might be some
> >>>>>> fallout from that on certain cards.
> >>>>>>
> >>>>>
> >>>>> There are other places where an IP order is forced like in
> >>>>> amdgpu_device_ip_reinit_early_sriov(). This also may not affect this
> >>>>> case as remapping is not done for VF.
> >>>>>
> >>>>> Agree that a better way is to have the common IP to be inited first.
> >>>>
> >>>> How about these patches?
> >>>>
> >>>
> >>> Looks good to me. BTW, is nbio 7.7 for an APU (in which case hdp flush
> >>> is not expected to be used)?
> >>
> >> It would be used in some cases, e.g., GPU VM passthrough where we use
> >> the BAR rather than the carve out.
> >>
> >> Alex
> >>
> >>
> >>>
> >>> Thanks,
> >>> Lijo
> >>>
> >>>> Alex
> >>>>
> >>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Lijo
> >>>>>
> >>>>>> Alex
> >>>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Lijo
> >>>>>>>
> >>>>>>>> Alex
> >>>>>>>>
> >>>>>>>>> ---
> >>>>>>>>> v2:
> >>>>>>>>>             Take care of IP resume cases (Alex Deucher)
> >>>>>>>>>             Add NULL check to nbio.funcs to cover older (GFXv8) ASICs (Felix Kuehling)
> >>>>>>>>>             Add more details in commit message and associate with AER patch (Bjorn
> >>>>>>>>> Helgaas)
> >>>>>>>>>
> >>>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++
> >>>>>>>>>      drivers/gpu/drm/amd/amdgpu/nv.c            |  6 ------
> >>>>>>>>>      drivers/gpu/drm/amd/amdgpu/soc15.c         |  6 ------
> >>>>>>>>>      drivers/gpu/drm/amd/amdgpu/soc21.c         |  6 ------
> >>>>>>>>>      4 files changed, 24 insertions(+), 18 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>>>>>>> index ce7d117efdb5..e420118769a5 100644
> >>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>>>>>>> @@ -2334,6 +2334,26 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
> >>>>>>>>>             return 0;
> >>>>>>>>>      }
> >>>>>>>>>
> >>>>>>>>> +/**
> >>>>>>>>> + * amdgpu_device_prepare_ip - prepare IPs for hardware initialization
> >>>>>>>>> + *
> >>>>>>>>> + * @adev: amdgpu_device pointer
> >>>>>>>>> + *
> >>>>>>>>> + * Any common hardware initialization sequence that needs to be done before
> >>>>>>>>> + * hw init of individual IPs is performed here. This is different from the
> >>>>>>>>> + * 'common block' which initializes a set of IPs.
> >>>>>>>>> + */
> >>>>>>>>> +static void amdgpu_device_prepare_ip(struct amdgpu_device *adev)
> >>>>>>>>> +{
> >>>>>>>>> +       /* Remap HDP registers to a hole in mmio space, for the purpose
> >>>>>>>>> +        * of exposing those registers to process space. This needs to be
> >>>>>>>>> +        * done before hw init of ip blocks to take care of HDP flush
> >>>>>>>>> +        * operations through registers during hw_init.
> >>>>>>>>> +        */
> >>>>>>>>> +       if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers &&
> >>>>>>>>> +           !amdgpu_sriov_vf(adev))
> >>>>>>>>> +               adev->nbio.funcs->remap_hdp_registers(adev);
> >>>>>>>>> +}
> >>>>>>>>>
> >>>>>>>>>      /**
> >>>>>>>>>       * amdgpu_device_ip_init - run init for hardware IPs
> >>>>>>>>> @@ -2376,6 +2396,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
> >>>>>>>>>                                     DRM_ERROR("amdgpu_vram_scratch_init failed %d\n", r);
> >>>>>>>>>                                     goto init_failed;
> >>>>>>>>>                             }
> >>>>>>>>> +
> >>>>>>>>> +                       amdgpu_device_prepare_ip(adev);
> >>>>>>>>>                             r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
> >>>>>>>>>                             if (r) {
> >>>>>>>>>                                     DRM_ERROR("hw_init %d failed %d\n", i, r);
> >>>>>>>>> @@ -3058,6 +3080,7 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
> >>>>>>>>>                     AMD_IP_BLOCK_TYPE_IH,
> >>>>>>>>>             };
> >>>>>>>>>
> >>>>>>>>> +       amdgpu_device_prepare_ip(adev);
> >>>>>>>>>             for (i = 0; i < adev->num_ip_blocks; i++) {
> >>>>>>>>>                     int j;
> >>>>>>>>>                     struct amdgpu_ip_block *block;
> >>>>>>>>> @@ -3139,6 +3162,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev)
> >>>>>>>>>      {
> >>>>>>>>>             int i, r;
> >>>>>>>>>
> >>>>>>>>> +       amdgpu_device_prepare_ip(adev);
> >>>>>>>>>             for (i = 0; i < adev->num_ip_blocks; i++) {
> >>>>>>>>>                     if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw)
> >>>>>>>>>                             continue;
> >>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
> >>>>>>>>> index b3fba8dea63c..3ac7fef74277 100644
> >>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> >>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> >>>>>>>>> @@ -1032,12 +1032,6 @@ static int nv_common_hw_init(void *handle)
> >>>>>>>>>             nv_program_aspm(adev);
> >>>>>>>>>             /* setup nbio registers */
> >>>>>>>>>             adev->nbio.funcs->init_registers(adev);
> >>>>>>>>> -       /* remap HDP registers to a hole in mmio space,
> >>>>>>>>> -        * for the purpose of expose those registers
> >>>>>>>>> -        * to process space
> >>>>>>>>> -        */
> >>>>>>>>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
> >>>>>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
> >>>>>>>>>             /* enable the doorbell aperture */
> >>>>>>>>>             nv_enable_doorbell_aperture(adev, true);
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>>>>>>>> index fde6154f2009..a0481e37d7cf 100644
> >>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> >>>>>>>>> @@ -1240,12 +1240,6 @@ static int soc15_common_hw_init(void *handle)
> >>>>>>>>>             soc15_program_aspm(adev);
> >>>>>>>>>             /* setup nbio registers */
> >>>>>>>>>             adev->nbio.funcs->init_registers(adev);
> >>>>>>>>> -       /* remap HDP registers to a hole in mmio space,
> >>>>>>>>> -        * for the purpose of expose those registers
> >>>>>>>>> -        * to process space
> >>>>>>>>> -        */
> >>>>>>>>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
> >>>>>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
> >>>>>>>>>
> >>>>>>>>>             /* enable the doorbell aperture */
> >>>>>>>>>             soc15_enable_doorbell_aperture(adev, true);
> >>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
> >>>>>>>>> index 55284b24f113..16b447055102 100644
> >>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
> >>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
> >>>>>>>>> @@ -660,12 +660,6 @@ static int soc21_common_hw_init(void *handle)
> >>>>>>>>>             soc21_program_aspm(adev);
> >>>>>>>>>             /* setup nbio registers */
> >>>>>>>>>             adev->nbio.funcs->init_registers(adev);
> >>>>>>>>> -       /* remap HDP registers to a hole in mmio space,
> >>>>>>>>> -        * for the purpose of expose those registers
> >>>>>>>>> -        * to process space
> >>>>>>>>> -        */
> >>>>>>>>> -       if (adev->nbio.funcs->remap_hdp_registers)
> >>>>>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
> >>>>>>>>>             /* enable the doorbell aperture */
> >>>>>>>>>             soc21_enable_doorbell_aperture(adev, true);
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> 2.25.1
> >>>>>>>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init
  2022-09-06 15:25                   ` Alex Deucher
@ 2022-09-06 15:56                     ` Lazar, Lijo
  0 siblings, 0 replies; 13+ messages in thread
From: Lazar, Lijo @ 2022-09-06 15:56 UTC (permalink / raw)
  To: Alex Deucher
  Cc: amd-gfx, Felix.Kuehling, stable, tseewald, helgaas,
	Alexander.Deucher, sr, Christian.Koenig, Hawking.Zhang



On 9/6/2022 8:55 PM, Alex Deucher wrote:
> On Mon, Sep 5, 2022 at 1:27 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>>
>>
>>
>> On 9/2/2022 1:09 AM, Alex Deucher wrote:
>>> How about this?  We should just move HDP remap to gmc hw_init since
>>> that is mainly what uses it anyway.
>>>
>>
>> Sorry, I missed to R-B the previous version. Did you find any problem
>> when common block is initialized first?
>>
> 
> One of the users on the bug report reported an issue with that patch,
> that said, that user seems to be seeing a slightly different issue
> since he is on a gfx9 card and the original reporter was on gfx10.
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D216373&amp;data=05%7C01%7Clijo.lazar%40amd.com%7C5d6d4da1be4a4194b7cc08da901c0bb4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637980747398632310%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=hHoh2YpQ2vpjsRTEqI1vIRY4YQmzfFi5%2FG%2Fnh6vNuds%3D&amp;reserved=0
> 

Yes, I see Bjorn already pointed to another potential issue in LTR 
enablement. So regardless of init-common-first patch, the new error will 
be reported and that is unrelated to the original reported error through 
AER. I still think init-common-first patch is good.

Thanks,
Lijo

> Alex
> 
> 
>> I think that patch provides a consistent IP init sequence during cold
>> init and resume.
>>
>> Thanks,
>> Lijo
>>
>>> Alex
>>>
>>> On Tue, Aug 30, 2022 at 2:05 PM Alex Deucher <alexdeucher@gmail.com> wrote:
>>>>
>>>> On Tue, Aug 30, 2022 at 12:06 PM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 8/30/2022 8:39 PM, Alex Deucher wrote:
>>>>>> On Tue, Aug 30, 2022 at 10:45 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 8/30/2022 7:18 PM, Alex Deucher wrote:
>>>>>>>> On Tue, Aug 30, 2022 at 12:05 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 8/29/2022 10:20 PM, Alex Deucher wrote:
>>>>>>>>>> On Mon, Aug 29, 2022 at 4:18 AM Lijo Lazar <lijo.lazar@amd.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> HDP flush is used early in the init sequence as part of memory controller
>>>>>>>>>>> block initialization. Hence remapping of HDP registers needed for flush
>>>>>>>>>>> needs to happen earlier.
>>>>>>>>>>>
>>>>>>>>>>> This also fixes the Unsupported Request error reported through AER during
>>>>>>>>>>> driver load. The error happens as a write happens to the remap offset
>>>>>>>>>>> before real remapping is done.
>>>>>>>>>>>
>>>>>>>>>>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D216373&amp;data=05%7C01%7Clijo.lazar%40amd.com%7C5d6d4da1be4a4194b7cc08da901c0bb4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637980747398632310%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=hHoh2YpQ2vpjsRTEqI1vIRY4YQmzfFi5%2FG%2Fnh6vNuds%3D&amp;reserved=0
>>>>>>>>>>>
>>>>>>>>>>> The error was unnoticed before and got visible because of the commit
>>>>>>>>>>> referenced below. This doesn't fix anything in the commit below, rather
>>>>>>>>>>> fixes the issue in amdgpu exposed by the commit. The reference is only
>>>>>>>>>>> to associate this commit with below one so that both go together.
>>>>>>>>>>>
>>>>>>>>>>> Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
>>>>>>>>>>>
>>>>>>>>>>> Reported-by: Tom Seewald <tseewald@gmail.com>
>>>>>>>>>>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
>>>>>>>>>>> Cc: stable@vger.kernel.org
>>>>>>>>>>
>>>>>>>>>> How about something like the attached patch rather than these two
>>>>>>>>>> patches?  It's a bit bigger but seems cleaner and more defensive in my
>>>>>>>>>> opinion.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Whenever device goes to suspend/reset and then comes back, remap offset
>>>>>>>>> has to be set back to 0 to make sure it doesn't use the wrong offset
>>>>>>>>> when the register assumes default values again.
>>>>>>>>>
>>>>>>>>> To avoid the if-check in hdp_flush (which is more frequent), another way
>>>>>>>>> is to initialize the remap offset to default offset during early init
>>>>>>>>> and hw fini/suspend sequences. It won't be obvious (even with this
>>>>>>>>> patch) as to when remap offset vs default offset is used though.
>>>>>>>>
>>>>>>>> On resume, the common IP is resumed first so it will always be set.
>>>>>>>> The only case that is a problem is init because we init GMC out of
>>>>>>>> order.  We could init common before GMC in amdgpu_device_ip_init().  I
>>>>>>>> think that should be fine, but I wasn't sure if there might be some
>>>>>>>> fallout from that on certain cards.
>>>>>>>>
>>>>>>>
>>>>>>> There are other places where an IP order is forced like in
>>>>>>> amdgpu_device_ip_reinit_early_sriov(). This also may not affect this
>>>>>>> case as remapping is not done for VF.
>>>>>>>
>>>>>>> Agree that a better way is to have the common IP to be inited first.
>>>>>>
>>>>>> How about these patches?
>>>>>>
>>>>>
>>>>> Looks good to me. BTW, is nbio 7.7 for an APU (in which case hdp flush
>>>>> is not expected to be used)?
>>>>
>>>> It would be used in some cases, e.g., GPU VM passthrough where we use
>>>> the BAR rather than the carve out.
>>>>
>>>> Alex
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Lijo
>>>>>
>>>>>> Alex
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Lijo
>>>>>>>
>>>>>>>> Alex
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Lijo
>>>>>>>>>
>>>>>>>>>> Alex
>>>>>>>>>>
>>>>>>>>>>> ---
>>>>>>>>>>> v2:
>>>>>>>>>>>              Take care of IP resume cases (Alex Deucher)
>>>>>>>>>>>              Add NULL check to nbio.funcs to cover older (GFXv8) ASICs (Felix Kuehling)
>>>>>>>>>>>              Add more details in commit message and associate with AER patch (Bjorn
>>>>>>>>>>> Helgaas)
>>>>>>>>>>>
>>>>>>>>>>>       drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++
>>>>>>>>>>>       drivers/gpu/drm/amd/amdgpu/nv.c            |  6 ------
>>>>>>>>>>>       drivers/gpu/drm/amd/amdgpu/soc15.c         |  6 ------
>>>>>>>>>>>       drivers/gpu/drm/amd/amdgpu/soc21.c         |  6 ------
>>>>>>>>>>>       4 files changed, 24 insertions(+), 18 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>> index ce7d117efdb5..e420118769a5 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>> @@ -2334,6 +2334,26 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
>>>>>>>>>>>              return 0;
>>>>>>>>>>>       }
>>>>>>>>>>>
>>>>>>>>>>> +/**
>>>>>>>>>>> + * amdgpu_device_prepare_ip - prepare IPs for hardware initialization
>>>>>>>>>>> + *
>>>>>>>>>>> + * @adev: amdgpu_device pointer
>>>>>>>>>>> + *
>>>>>>>>>>> + * Any common hardware initialization sequence that needs to be done before
>>>>>>>>>>> + * hw init of individual IPs is performed here. This is different from the
>>>>>>>>>>> + * 'common block' which initializes a set of IPs.
>>>>>>>>>>> + */
>>>>>>>>>>> +static void amdgpu_device_prepare_ip(struct amdgpu_device *adev)
>>>>>>>>>>> +{
>>>>>>>>>>> +       /* Remap HDP registers to a hole in mmio space, for the purpose
>>>>>>>>>>> +        * of exposing those registers to process space. This needs to be
>>>>>>>>>>> +        * done before hw init of ip blocks to take care of HDP flush
>>>>>>>>>>> +        * operations through registers during hw_init.
>>>>>>>>>>> +        */
>>>>>>>>>>> +       if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers &&
>>>>>>>>>>> +           !amdgpu_sriov_vf(adev))
>>>>>>>>>>> +               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>>>>>>>> +}
>>>>>>>>>>>
>>>>>>>>>>>       /**
>>>>>>>>>>>        * amdgpu_device_ip_init - run init for hardware IPs
>>>>>>>>>>> @@ -2376,6 +2396,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>>>>>>>>>>>                                      DRM_ERROR("amdgpu_vram_scratch_init failed %d\n", r);
>>>>>>>>>>>                                      goto init_failed;
>>>>>>>>>>>                              }
>>>>>>>>>>> +
>>>>>>>>>>> +                       amdgpu_device_prepare_ip(adev);
>>>>>>>>>>>                              r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
>>>>>>>>>>>                              if (r) {
>>>>>>>>>>>                                      DRM_ERROR("hw_init %d failed %d\n", i, r);
>>>>>>>>>>> @@ -3058,6 +3080,7 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
>>>>>>>>>>>                      AMD_IP_BLOCK_TYPE_IH,
>>>>>>>>>>>              };
>>>>>>>>>>>
>>>>>>>>>>> +       amdgpu_device_prepare_ip(adev);
>>>>>>>>>>>              for (i = 0; i < adev->num_ip_blocks; i++) {
>>>>>>>>>>>                      int j;
>>>>>>>>>>>                      struct amdgpu_ip_block *block;
>>>>>>>>>>> @@ -3139,6 +3162,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev)
>>>>>>>>>>>       {
>>>>>>>>>>>              int i, r;
>>>>>>>>>>>
>>>>>>>>>>> +       amdgpu_device_prepare_ip(adev);
>>>>>>>>>>>              for (i = 0; i < adev->num_ip_blocks; i++) {
>>>>>>>>>>>                      if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw)
>>>>>>>>>>>                              continue;
>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
>>>>>>>>>>> index b3fba8dea63c..3ac7fef74277 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>>>>>>>>>>> @@ -1032,12 +1032,6 @@ static int nv_common_hw_init(void *handle)
>>>>>>>>>>>              nv_program_aspm(adev);
>>>>>>>>>>>              /* setup nbio registers */
>>>>>>>>>>>              adev->nbio.funcs->init_registers(adev);
>>>>>>>>>>> -       /* remap HDP registers to a hole in mmio space,
>>>>>>>>>>> -        * for the purpose of expose those registers
>>>>>>>>>>> -        * to process space
>>>>>>>>>>> -        */
>>>>>>>>>>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
>>>>>>>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>>>>>>>>              /* enable the doorbell aperture */
>>>>>>>>>>>              nv_enable_doorbell_aperture(adev, true);
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>>>>>>>>> index fde6154f2009..a0481e37d7cf 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>>>>>>>>> @@ -1240,12 +1240,6 @@ static int soc15_common_hw_init(void *handle)
>>>>>>>>>>>              soc15_program_aspm(adev);
>>>>>>>>>>>              /* setup nbio registers */
>>>>>>>>>>>              adev->nbio.funcs->init_registers(adev);
>>>>>>>>>>> -       /* remap HDP registers to a hole in mmio space,
>>>>>>>>>>> -        * for the purpose of expose those registers
>>>>>>>>>>> -        * to process space
>>>>>>>>>>> -        */
>>>>>>>>>>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
>>>>>>>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>>>>>>>>
>>>>>>>>>>>              /* enable the doorbell aperture */
>>>>>>>>>>>              soc15_enable_doorbell_aperture(adev, true);
>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>>>>>>>>> index 55284b24f113..16b447055102 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>>>>>>>>> @@ -660,12 +660,6 @@ static int soc21_common_hw_init(void *handle)
>>>>>>>>>>>              soc21_program_aspm(adev);
>>>>>>>>>>>              /* setup nbio registers */
>>>>>>>>>>>              adev->nbio.funcs->init_registers(adev);
>>>>>>>>>>> -       /* remap HDP registers to a hole in mmio space,
>>>>>>>>>>> -        * for the purpose of expose those registers
>>>>>>>>>>> -        * to process space
>>>>>>>>>>> -        */
>>>>>>>>>>> -       if (adev->nbio.funcs->remap_hdp_registers)
>>>>>>>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>>>>>>>>              /* enable the doorbell aperture */
>>>>>>>>>>>              soc21_enable_doorbell_aperture(adev, true);
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> 2.25.1
>>>>>>>>>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-09-06 16:27 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-29  8:17 [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init Lijo Lazar
2022-08-29  8:17 ` [PATCH v2 2/2] drm/amdgpu: Init VF's HDP flush reg offset early Lijo Lazar
2022-08-29 16:50 ` [PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init Alex Deucher
2022-08-30  4:04   ` Lazar, Lijo
2022-08-30 13:48     ` Alex Deucher
2022-08-30 14:45       ` Lazar, Lijo
2022-08-30 15:09         ` Alex Deucher
2022-08-30 16:06           ` Lazar, Lijo
2022-08-30 18:05             ` Alex Deucher
2022-09-01 19:39               ` Alex Deucher
2022-09-05  5:27                 ` Lazar, Lijo
2022-09-06 15:25                   ` Alex Deucher
2022-09-06 15:56                     ` Lazar, Lijo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).