regressions.lists.linux.dev archive mirror
* [PATCH 0/7] fix PCI AER issues
@ 2022-09-09 16:47 Alex Deucher
  2022-09-09 16:47 ` [PATCH 1/7] drm/amdgpu: move nbio remap_hdp_registers() to gmc9 code Alex Deucher
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Alex Deucher @ 2022-09-09 16:47 UTC (permalink / raw)
  To: amd-gfx, helgaas
  Cc: regressions, airlied, linux-pci, tseewald, kai.heng.feng, daniel,
	sr, m.seyfarth, Alex Deucher

The first 3 patches fix the actual PCI AER issue by moving
the nbio HDP remap callback to the GMC code where it is used.
Lijo preferred calling the common hw init early, but that
ran into additional problems on vega systems because it
depends on doorbell apertures having been set up for IH
and SDMA so we run into an ordering problem there.  We
already call the other NBIO callbacks as well as the HDP
callbacks from other IPs when they are needed so it seems
logical to me to move the HDP remap into GMC since it's
mainly used as part of memory management anyway.
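
As a rough illustration of the ordering problem, here is a toy
model in plain C (not driver code). Every toy_* name is
hypothetical, and the model assumes that the early GMC hw init is
what issues the write to the remap offset. It counts writes that
land before the remap is programmed, with the remap done either in
the common hw init or in the GMC hw init:

```c
#include <stdbool.h>

/* Toy model only: every toy_* name is hypothetical, not an amdgpu symbol. */
struct toy_dev {
	bool hdp_remapped;        /* has the HDP remap been programmed? */
	int unsupported_requests; /* writes that hit the unmapped offset */
};

/* Stands in for the nbio remap_hdp_registers() callback. */
static void toy_remap_hdp(struct toy_dev *d)
{
	d->hdp_remapped = true;
}

/* Stands in for an HDP flush that writes to the remap offset. */
static void toy_hdp_flush(struct toy_dev *d)
{
	if (!d->hdp_remapped)
		d->unsupported_requests++;
}

static void toy_gmc_hw_init(struct toy_dev *d, bool remap_in_gmc)
{
	if (remap_in_gmc)
		toy_remap_hdp(d);
	toy_hdp_flush(d);
}

static void toy_common_hw_init(struct toy_dev *d, bool remap_in_gmc)
{
	if (!remap_in_gmc)
		toy_remap_hdp(d);
}

/* GMC hw init runs early, before common hw init; returns the error count. */
int toy_load(bool remap_in_gmc)
{
	struct toy_dev d = { false, 0 };

	toy_gmc_hw_init(&d, remap_in_gmc);
	toy_common_hw_init(&d, remap_in_gmc);
	return d.unsupported_requests;
}
```

In this model, toy_load(false) reports one stray write and
toy_load(true) reports none, which is the effect the first 3
patches aim for.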

The 4th patch just fixes up nbio 7.7 for consistency.

The next 2 patches are optional, but make the code more
consistent with what we do for other IPs.  See those
patches for some additional comments about the ordering
with respect to the doorbell setup.  I didn't notice
any problems in my testing and the VCN doorbell setup
already breaks that rule, so I'm not sure if it's actually
necessary or not.
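
For reference, the doorbell rule in question (doorbell writes
outside a programmed SDMA/IH range are routed to CP) can be
sketched as a small toy router. This is an illustration only: the
toy_* names, the range layout, and the index values are
assumptions, not the hardware interface.

```c
#include <stdbool.h>

/* Toy model only: all names and values here are hypothetical. */
enum toy_target { TOY_CP, TOY_SDMA, TOY_IH };

struct toy_range {
	bool enabled;       /* has this range been programmed yet? */
	unsigned int start; /* first doorbell index in the range */
	unsigned int size;  /* number of doorbell indices */
};

struct toy_router {
	struct toy_range sdma;
	struct toy_range ih;
};

static bool toy_in_range(const struct toy_range *r, unsigned int index)
{
	return r->enabled && index >= r->start && index - r->start < r->size;
}

/* Route a doorbell write; anything unclaimed falls through to CP. */
enum toy_target toy_route(const struct toy_router *rt, unsigned int index)
{
	if (toy_in_range(&rt->sdma, index))
		return TOY_SDMA;
	if (toy_in_range(&rt->ih, index))
		return TOY_IH;
	return TOY_CP;
}
```

With no ranges programmed, every index goes to CP, which is why
the original comment asks for the ranges to be set up before the
CP block is initialized and ring tested.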

Finally the last patch enables early common IP init to happen
before GMC.  It's not strictly necessary with the first 3
patches, but there may be value in it to have the common stuff
enabled before GMC.
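
A simplified sketch of what the last patch changes, as a toy
model rather than the real amdgpu_device_ip_init(): the IP block
list is walked once, some blocks get their hw init early during
that walk, and the remaining blocks are hw-initialized in a later
pass. The toy_* names, the block list, and the early_common flag
are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names here are hypothetical. */
enum toy_ip_type { TOY_IP_COMMON, TOY_IP_GMC, TOY_IP_IH, TOY_IP_SDMA };

#define TOY_MAX_BLOCKS 8

/*
 * Record the order in which blocks get their hw init.
 * Returns the number of entries written to order[], or 0 if n is too large.
 */
size_t toy_ip_init(const enum toy_ip_type *blocks, size_t n,
		   bool early_common, enum toy_ip_type *order)
{
	bool hw_done[TOY_MAX_BLOCKS] = { false };
	size_t count = 0;
	size_t i;

	if (n > TOY_MAX_BLOCKS)
		return 0;

	/* First pass: sw init for every block, hw init early for some. */
	for (i = 0; i < n; i++) {
		bool early = blocks[i] == TOY_IP_GMC ||
			     (early_common && blocks[i] == TOY_IP_COMMON);

		if (early) {
			order[count++] = blocks[i];
			hw_done[i] = true;
		}
	}

	/* Second pass: hw init for everything not handled early. */
	for (i = 0; i < n; i++) {
		if (!hw_done[i])
			order[count++] = blocks[i];
	}

	return count;
}
```

For a block list of COMMON, GMC, IH, SDMA, the model yields GMC
first when early_common is false and COMMON first when it is
true.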

Alex Deucher (7):
  drm/amdgpu: move nbio remap_hdp_registers() to gmc9 code
  drm/amdgpu: move nbio remap_hdp_registers() to gmc10 code
  drm/amdgpu: move nbio remap_hdp_registers() to gmc11 code
  drm/amdgpu: add HDP remap functionality to nbio 7.7
  drm/amdgpu: move nbio ih_doorbell_range() into ih code for vega
  drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega
  drm/amdgpu: make sure to init common IP before gmc

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 ++++++++--
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c     |  7 +++++
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c     |  7 +++++
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c      |  7 +++++
 drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c     |  9 ++++++
 drivers/gpu/drm/amd/amdgpu/nv.c            |  6 ----
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c     |  5 ++++
 drivers/gpu/drm/amd/amdgpu/soc15.c         | 32 ----------------------
 drivers/gpu/drm/amd/amdgpu/soc21.c         |  6 ----
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c     |  4 +++
 drivers/gpu/drm/amd/amdgpu/vega20_ih.c     |  4 +++
 11 files changed, 54 insertions(+), 47 deletions(-)

-- 
2.37.2



* [PATCH 1/7] drm/amdgpu: move nbio remap_hdp_registers() to gmc9 code
  2022-09-09 16:47 [PATCH 0/7] fix PCI AER issues Alex Deucher
@ 2022-09-09 16:47 ` Alex Deucher
  2022-09-09 17:17   ` Lazar, Lijo
  2022-09-09 16:47 ` [PATCH 2/7] drm/amdgpu: move nbio remap_hdp_registers() to gmc10 code Alex Deucher
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Alex Deucher @ 2022-09-09 16:47 UTC (permalink / raw)
  To: amd-gfx, helgaas
  Cc: regressions, airlied, linux-pci, tseewald, kai.heng.feng, daniel,
	sr, m.seyfarth, Alex Deucher

This is where it is used, so move it into gmc init so
that it will always be initialized in the right order.
We already do this for other nbio and hdp callbacks so
it's consistent with what we do on other IPs.

This fixes the Unsupported Request error reported through
AER during driver load. The error happens as a write happens
to the remap offset before real remapping is done.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373

The error was unnoticed before and got visible because of the commit
referenced below. This doesn't fix anything in the commit below, rather
fixes the issue in amdgpu exposed by the commit. The reference is only
to associate this commit with below one so that both go together.

Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +++++++
 drivers/gpu/drm/amd/amdgpu/soc15.c    | 7 -------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 4603653916f5..3a4b0a475672 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1819,6 +1819,13 @@ static int gmc_v9_0_hw_init(void *handle)
 	bool value;
 	int i, r;
 
+	/* remap HDP registers to a hole in mmio space,
+	 * for the purpose of expose those registers
+	 * to process space
+	 */
+	if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
+		adev->nbio.funcs->remap_hdp_registers(adev);
+
 	/* The sequence of these two function calls matters.*/
 	gmc_v9_0_init_golden_registers(adev);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 5188da87428d..39c3c6d65aef 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -1240,13 +1240,6 @@ static int soc15_common_hw_init(void *handle)
 	soc15_program_aspm(adev);
 	/* setup nbio registers */
 	adev->nbio.funcs->init_registers(adev);
-	/* remap HDP registers to a hole in mmio space,
-	 * for the purpose of expose those registers
-	 * to process space
-	 */
-	if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
-		adev->nbio.funcs->remap_hdp_registers(adev);
-
 	/* enable the doorbell aperture */
 	soc15_enable_doorbell_aperture(adev, true);
 	/* HW doorbell routing policy: doorbell writing not
-- 
2.37.2



* [PATCH 2/7] drm/amdgpu: move nbio remap_hdp_registers() to gmc10 code
  2022-09-09 16:47 [PATCH 0/7] fix PCI AER issues Alex Deucher
  2022-09-09 16:47 ` [PATCH 1/7] drm/amdgpu: move nbio remap_hdp_registers() to gmc9 code Alex Deucher
@ 2022-09-09 16:47 ` Alex Deucher
  2022-09-09 16:47 ` [PATCH 3/7] drm/amdgpu: move nbio remap_hdp_registers() to gmc11 code Alex Deucher
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Alex Deucher @ 2022-09-09 16:47 UTC (permalink / raw)
  To: amd-gfx, helgaas
  Cc: regressions, airlied, linux-pci, tseewald, kai.heng.feng, daniel,
	sr, m.seyfarth, Alex Deucher

This is where it is used, so move it into gmc init so
that it will always be initialized in the right order.
We already do this for other nbio and hdp callbacks so
it's consistent with what we do on other IPs.

This fixes the Unsupported Request error reported through
AER during driver load. The error happens as a write happens
to the remap offset before real remapping is done.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373

The error was unnoticed before and got visible because of the commit
referenced below. This doesn't fix anything in the commit below, rather
fixes the issue in amdgpu exposed by the commit. The reference is only
to associate this commit with below one so that both go together.

Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 7 +++++++
 drivers/gpu/drm/amd/amdgpu/nv.c        | 6 ------
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index f513e2c2e964..140eb47abce6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -1091,6 +1091,13 @@ static int gmc_v10_0_hw_init(void *handle)
 	int r;
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
+	/* remap HDP registers to a hole in mmio space,
+	 * for the purpose of expose those registers
+	 * to process space
+	 */
+	if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
+		adev->nbio.funcs->remap_hdp_registers(adev);
+
 	/* The sequence of these two function calls matters.*/
 	gmc_v10_0_init_golden_registers(adev);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index b3fba8dea63c..3ac7fef74277 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -1032,12 +1032,6 @@ static int nv_common_hw_init(void *handle)
 	nv_program_aspm(adev);
 	/* setup nbio registers */
 	adev->nbio.funcs->init_registers(adev);
-	/* remap HDP registers to a hole in mmio space,
-	 * for the purpose of expose those registers
-	 * to process space
-	 */
-	if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
-		adev->nbio.funcs->remap_hdp_registers(adev);
 	/* enable the doorbell aperture */
 	nv_enable_doorbell_aperture(adev, true);
 
-- 
2.37.2



* [PATCH 3/7] drm/amdgpu: move nbio remap_hdp_registers() to gmc11 code
  2022-09-09 16:47 [PATCH 0/7] fix PCI AER issues Alex Deucher
  2022-09-09 16:47 ` [PATCH 1/7] drm/amdgpu: move nbio remap_hdp_registers() to gmc9 code Alex Deucher
  2022-09-09 16:47 ` [PATCH 2/7] drm/amdgpu: move nbio remap_hdp_registers() to gmc10 code Alex Deucher
@ 2022-09-09 16:47 ` Alex Deucher
  2022-09-09 16:47 ` [PATCH 4/7] drm/amdgpu: add HDP remap functionality to nbio 7.7 Alex Deucher
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Alex Deucher @ 2022-09-09 16:47 UTC (permalink / raw)
  To: amd-gfx, helgaas
  Cc: regressions, airlied, linux-pci, tseewald, kai.heng.feng, daniel,
	sr, m.seyfarth, Alex Deucher

This is where it is used, so move it into gmc init so
that it will always be initialized in the right order.
We already do this for other nbio and hdp callbacks so
it's consistent with what we do on other IPs.

This fixes the Unsupported Request error reported through
AER during driver load. The error happens as a write happens
to the remap offset before real remapping is done.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373

The error was unnoticed before and got visible because of the commit
referenced below. This doesn't fix anything in the commit below, rather
fixes the issue in amdgpu exposed by the commit. The reference is only
to associate this commit with below one so that both go together.

Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 7 +++++++
 drivers/gpu/drm/amd/amdgpu/soc21.c     | 6 ------
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index 846ccb6cf07d..b0df27fea648 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -891,6 +891,13 @@ static int gmc_v11_0_hw_init(void *handle)
 	int r;
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
+	/* remap HDP registers to a hole in mmio space,
+	 * for the purpose of expose those registers
+	 * to process space
+	 */
+	if (adev->nbio.funcs->remap_hdp_registers)
+		adev->nbio.funcs->remap_hdp_registers(adev);
+
 	/* The sequence of these two function calls matters.*/
 	gmc_v11_0_init_golden_registers(adev);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
index a26c5723c46e..4dbcc2b4fda0 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -677,12 +677,6 @@ static int soc21_common_hw_init(void *handle)
 	soc21_program_aspm(adev);
 	/* setup nbio registers */
 	adev->nbio.funcs->init_registers(adev);
-	/* remap HDP registers to a hole in mmio space,
-	 * for the purpose of expose those registers
-	 * to process space
-	 */
-	if (adev->nbio.funcs->remap_hdp_registers)
-		adev->nbio.funcs->remap_hdp_registers(adev);
 	/* enable the doorbell aperture */
 	soc21_enable_doorbell_aperture(adev, true);
 
-- 
2.37.2



* [PATCH 4/7] drm/amdgpu: add HDP remap functionality to nbio 7.7
  2022-09-09 16:47 [PATCH 0/7] fix PCI AER issues Alex Deucher
                   ` (2 preceding siblings ...)
  2022-09-09 16:47 ` [PATCH 3/7] drm/amdgpu: move nbio remap_hdp_registers() to gmc11 code Alex Deucher
@ 2022-09-09 16:47 ` Alex Deucher
  2022-09-09 16:47 ` [PATCH 5/7] drm/amdgpu: move nbio ih_doorbell_range() into ih code for vega Alex Deucher
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Alex Deucher @ 2022-09-09 16:47 UTC (permalink / raw)
  To: amd-gfx, helgaas
  Cc: regressions, airlied, linux-pci, tseewald, kai.heng.feng, daniel,
	sr, m.seyfarth, Alex Deucher, Lijo Lazar

Was missing before and would have resulted in a write to
a non-existent register. Normally APUs don't use HDP, but
other asics could use this code and APUs do use the HDP
when used in passthrough.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c
index f30bc826a878..def89379b51a 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c
@@ -28,6 +28,14 @@
 #include "nbio/nbio_7_7_0_sh_mask.h"
 #include <uapi/linux/kfd_ioctl.h>
 
+static void nbio_v7_7_remap_hdp_registers(struct amdgpu_device *adev)
+{
+	WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_MEM_FLUSH_CNTL,
+		     adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL);
+	WREG32_SOC15(NBIO, 0, regBIF_BX0_REMAP_HDP_REG_FLUSH_CNTL,
+		     adev->rmmio_remap.reg_offset + KFD_MMIO_REMAP_HDP_REG_FLUSH_CNTL);
+}
+
 static u32 nbio_v7_7_get_rev_id(struct amdgpu_device *adev)
 {
 	u32 tmp;
@@ -336,4 +344,5 @@ const struct amdgpu_nbio_funcs nbio_v7_7_funcs = {
 	.get_clockgating_state = nbio_v7_7_get_clockgating_state,
 	.ih_control = nbio_v7_7_ih_control,
 	.init_registers = nbio_v7_7_init_registers,
+	.remap_hdp_registers = nbio_v7_7_remap_hdp_registers,
 };
-- 
2.37.2



* [PATCH 5/7] drm/amdgpu: move nbio ih_doorbell_range() into ih code for vega
  2022-09-09 16:47 [PATCH 0/7] fix PCI AER issues Alex Deucher
                   ` (3 preceding siblings ...)
  2022-09-09 16:47 ` [PATCH 4/7] drm/amdgpu: add HDP remap functionality to nbio 7.7 Alex Deucher
@ 2022-09-09 16:47 ` Alex Deucher
  2022-09-09 16:47 ` [PATCH 6/7] drm/amdgpu: move nbio sdma_doorbell_range() into sdma " Alex Deucher
  2022-09-09 16:47 ` [PATCH 7/7] drm/amdgpu: make sure to init common IP before gmc Alex Deucher
  6 siblings, 0 replies; 11+ messages in thread
From: Alex Deucher @ 2022-09-09 16:47 UTC (permalink / raw)
  To: amd-gfx, helgaas
  Cc: regressions, airlied, linux-pci, tseewald, kai.heng.feng, daniel,
	sr, m.seyfarth, Alex Deucher

This mirrors what we do for other asics and this way we are
sure the ih doorbell range is properly initialized.

There is a comment about the way doorbells on gfx9 work that
requires that they are initialized for other IPs before GFX
is initialized.  In this case IH is initialized before GFX,
so there should be no issue.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/soc15.c     | 3 ---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 4 ++++
 drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 4 ++++
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 39c3c6d65aef..1dbb2a3ac4c4 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -1224,9 +1224,6 @@ static void soc15_doorbell_range_init(struct amdgpu_device *adev)
 				ring->use_doorbell, ring->doorbell_index,
 				adev->doorbell_index.sdma_doorbell_range);
 		}
-
-		adev->nbio.funcs->ih_doorbell_range(adev, adev->irq.ih.use_doorbell,
-						adev->irq.ih.doorbell_index);
 	}
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 03b7066471f9..1e83db0c5438 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -289,6 +289,10 @@ static int vega10_ih_irq_init(struct amdgpu_device *adev)
 		}
 	}
 
+	if (!amdgpu_sriov_vf(adev))
+		adev->nbio.funcs->ih_doorbell_range(adev, adev->irq.ih.use_doorbell,
+						    adev->irq.ih.doorbell_index);
+
 	pci_set_master(adev->pdev);
 
 	/* enable interrupts */
diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
index 2022ffbb8dba..59dfca093155 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
@@ -340,6 +340,10 @@ static int vega20_ih_irq_init(struct amdgpu_device *adev)
 		}
 	}
 
+	if (!amdgpu_sriov_vf(adev))
+		adev->nbio.funcs->ih_doorbell_range(adev, adev->irq.ih.use_doorbell,
+						    adev->irq.ih.doorbell_index);
+
 	pci_set_master(adev->pdev);
 
 	/* enable interrupts */
-- 
2.37.2



* [PATCH 6/7] drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega
  2022-09-09 16:47 [PATCH 0/7] fix PCI AER issues Alex Deucher
                   ` (4 preceding siblings ...)
  2022-09-09 16:47 ` [PATCH 5/7] drm/amdgpu: move nbio ih_doorbell_range() into ih code for vega Alex Deucher
@ 2022-09-09 16:47 ` Alex Deucher
  2022-09-09 16:47 ` [PATCH 7/7] drm/amdgpu: make sure to init common IP before gmc Alex Deucher
  6 siblings, 0 replies; 11+ messages in thread
From: Alex Deucher @ 2022-09-09 16:47 UTC (permalink / raw)
  To: amd-gfx, helgaas
  Cc: regressions, airlied, linux-pci, tseewald, kai.heng.feng, daniel,
	sr, m.seyfarth, Alex Deucher

This mirrors what we do for other asics and this way we are
sure the sdma doorbell range is properly initialized.

There is a comment about the way doorbells on gfx9 work that
requires that they are initialized for other IPs before GFX
is initialized.  However, the statement says that it applies to
multimedia as well, but the VCN code currently initializes
doorbells after GFX and there are no known issues there.  In my
testing at least I don't see any problems on SDMA.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |  5 +++++
 drivers/gpu/drm/amd/amdgpu/soc15.c     | 22 ----------------------
 2 files changed, 5 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index 0cf9d3b486b2..7fe8bf3417db 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -1504,6 +1504,11 @@ static int sdma_v4_0_start(struct amdgpu_device *adev)
 		WREG32_SDMA(i, mmSDMA0_CNTL, temp);
 
 		if (!amdgpu_sriov_vf(adev)) {
+			ring = &adev->sdma.instance[i].ring;
+			adev->nbio.funcs->sdma_doorbell_range(adev, i,
+				ring->use_doorbell, ring->doorbell_index,
+				adev->doorbell_index.sdma_doorbell_range);
+
 			/* unhalt engine */
 			temp = RREG32_SDMA(i, mmSDMA0_F32_CNTL);
 			temp = REG_SET_FIELD(temp, SDMA0_F32_CNTL, HALT, 0);
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 1dbb2a3ac4c4..218571574fa8 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -1211,22 +1211,6 @@ static int soc15_common_sw_fini(void *handle)
 	return 0;
 }
 
-static void soc15_doorbell_range_init(struct amdgpu_device *adev)
-{
-	int i;
-	struct amdgpu_ring *ring;
-
-	/* sdma/ih doorbell range are programed by hypervisor */
-	if (!amdgpu_sriov_vf(adev)) {
-		for (i = 0; i < adev->sdma.num_instances; i++) {
-			ring = &adev->sdma.instance[i].ring;
-			adev->nbio.funcs->sdma_doorbell_range(adev, i,
-				ring->use_doorbell, ring->doorbell_index,
-				adev->doorbell_index.sdma_doorbell_range);
-		}
-	}
-}
-
 static int soc15_common_hw_init(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
@@ -1239,12 +1223,6 @@ static int soc15_common_hw_init(void *handle)
 	adev->nbio.funcs->init_registers(adev);
 	/* enable the doorbell aperture */
 	soc15_enable_doorbell_aperture(adev, true);
-	/* HW doorbell routing policy: doorbell writing not
-	 * in SDMA/IH/MM/ACV range will be routed to CP. So
-	 * we need to init SDMA/IH/MM/ACV doorbell range prior
-	 * to CP ip block init and ring test.
-	 */
-	soc15_doorbell_range_init(adev);
 
 	return 0;
 }
-- 
2.37.2



* [PATCH 7/7] drm/amdgpu: make sure to init common IP before gmc
  2022-09-09 16:47 [PATCH 0/7] fix PCI AER issues Alex Deucher
                   ` (5 preceding siblings ...)
  2022-09-09 16:47 ` [PATCH 6/7] drm/amdgpu: move nbio sdma_doorbell_range() into sdma " Alex Deucher
@ 2022-09-09 16:47 ` Alex Deucher
  6 siblings, 0 replies; 11+ messages in thread
From: Alex Deucher @ 2022-09-09 16:47 UTC (permalink / raw)
  To: amd-gfx, helgaas
  Cc: regressions, airlied, linux-pci, tseewald, kai.heng.feng, daniel,
	sr, m.seyfarth, Alex Deucher

This is not strictly necessary at this point since
we moved the HDP remap into GMC HW init, but at this
point it doesn't seem to cause any problems and it may
be beneficial to initialize the common stuff before
GMC.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 899564ea8b4b..4da85ce9e3b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2375,8 +2375,16 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 		}
 		adev->ip_blocks[i].status.sw = true;
 
-		/* need to do gmc hw init early so we can allocate gpu mem */
-		if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GMC) {
+		if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_COMMON) {
+			/* need to do common hw init early so everything is set up for gmc */
+			r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
+			if (r) {
+				DRM_ERROR("hw_init %d failed %d\n", i, r);
+				goto init_failed;
+			}
+			adev->ip_blocks[i].status.hw = true;
+		} else if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GMC) {
+			/* need to do gmc hw init early so we can allocate gpu mem */
 			/* Try to reserve bad pages early */
 			if (amdgpu_sriov_vf(adev))
 				amdgpu_virt_exchange_data(adev);
@@ -3062,8 +3070,8 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
 	int i, r;
 
 	static enum amd_ip_block_type ip_order[] = {
-		AMD_IP_BLOCK_TYPE_GMC,
 		AMD_IP_BLOCK_TYPE_COMMON,
+		AMD_IP_BLOCK_TYPE_GMC,
 		AMD_IP_BLOCK_TYPE_PSP,
 		AMD_IP_BLOCK_TYPE_IH,
 	};
-- 
2.37.2



* Re: [PATCH 1/7] drm/amdgpu: move nbio remap_hdp_registers() to gmc9 code
  2022-09-09 16:47 ` [PATCH 1/7] drm/amdgpu: move nbio remap_hdp_registers() to gmc9 code Alex Deucher
@ 2022-09-09 17:17   ` Lazar, Lijo
  2022-09-09 17:35     ` Alex Deucher
  0 siblings, 1 reply; 11+ messages in thread
From: Lazar, Lijo @ 2022-09-09 17:17 UTC (permalink / raw)
  To: Alex Deucher, amd-gfx, helgaas
  Cc: regressions, airlied, linux-pci, m.seyfarth, tseewald,
	kai.heng.feng, daniel, sr



On 9/9/2022 10:17 PM, Alex Deucher wrote:
> This is where it is used, so move it into gmc init so

It's only a *side effect* of the GMC IP init process, but that doesn't
mean these IPs are interlinked. Any IP init process which requires HDP
flush would also need this. It is not a good idea to couple HDP remap
with GMC, especially when there exists an HDP data path without
setting up GMC (MM INDEX/DATA).

From a generic software perspective, I think programming the prerequisite
for HDP flush needs to be standalone and the order needs to be guaranteed
before any client IPs that make use of it.

Thanks,
Lijo

> that it will always be initialized in the right order.
> We already do this for other nbio and hdp callbacks so
> it's consistent with what we do on other IPs.
> 
> This fixes the Unsupported Request error reported through
> AER during driver load. The error happens as a write happens
> to the remap offset before real remapping is done.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373
> 
> The error was unnoticed before and got visible because of the commit
> referenced below. This doesn't fix anything in the commit below, rather
> fixes the issue in amdgpu exposed by the commit. The reference is only
> to associate this commit with below one so that both go together.
> 
> Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
> 
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +++++++
>   drivers/gpu/drm/amd/amdgpu/soc15.c    | 7 -------
>   2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 4603653916f5..3a4b0a475672 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -1819,6 +1819,13 @@ static int gmc_v9_0_hw_init(void *handle)
>   	bool value;
>   	int i, r;
>   
> +	/* remap HDP registers to a hole in mmio space,
> +	 * for the purpose of expose those registers
> +	 * to process space
> +	 */
> +	if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
> +		adev->nbio.funcs->remap_hdp_registers(adev);
> +
>   	/* The sequence of these two function calls matters.*/
>   	gmc_v9_0_init_golden_registers(adev);
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 5188da87428d..39c3c6d65aef 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -1240,13 +1240,6 @@ static int soc15_common_hw_init(void *handle)
>   	soc15_program_aspm(adev);
>   	/* setup nbio registers */
>   	adev->nbio.funcs->init_registers(adev);
> -	/* remap HDP registers to a hole in mmio space,
> -	 * for the purpose of expose those registers
> -	 * to process space
> -	 */
> -	if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
> -		adev->nbio.funcs->remap_hdp_registers(adev);
> -
>   	/* enable the doorbell aperture */
>   	soc15_enable_doorbell_aperture(adev, true);
>   	/* HW doorbell routing policy: doorbell writing not
> 


* Re: [PATCH 1/7] drm/amdgpu: move nbio remap_hdp_registers() to gmc9 code
  2022-09-09 17:17   ` Lazar, Lijo
@ 2022-09-09 17:35     ` Alex Deucher
  2022-09-12  4:41       ` Lazar, Lijo
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Deucher @ 2022-09-09 17:35 UTC (permalink / raw)
  To: Lazar, Lijo
  Cc: Alex Deucher, amd-gfx, helgaas, regressions, airlied, linux-pci,
	m.seyfarth, tseewald, kai.heng.feng, daniel, sr

On Fri, Sep 9, 2022 at 1:17 PM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>
>
>
> On 9/9/2022 10:17 PM, Alex Deucher wrote:
> > This is where it is used, so move it into gmc init so
>
> It's only the *side effect* of GMC IP init process, but that doesn't
> mean these IPs are interlinked. Any IP init process which requires HDP
> flush also would need this. It is not a good idea to couple HDP remap
> with GMC especially when there exists a HDP data path way without
> setting up GMC (MM INDEX/DATA).

We have no need for HDP flush at all without vram, and we only have
access to vram once GMC is initialized so it is sort of a dependency
in that regard.  We also call a bunch of the HDP callbacks in the GMC
code and I think those are sort of in the same boat.  Also, the whole reason
we are in this situation is because we need to init GMC before all
other HW because all other hardware has a dependency on being able to
access GPU memory.

>
>  From a generic software perspective, I think programming pre-requisite
> for HDP flush need to be standalone and the order needs to be guaranteed
> before any client IPs that make use of it.

In that case patches 5, 6, 7 could be an alternative.

Alex

>
> Thanks,
> Lijo
>
> > that it will always be initialized in the right order.
> > We already do this for other nbio and hdp callbacks so
> > it's consistent with what we do on other IPs.
> >
> > This fixes the Unsupported Request error reported through
> > AER during driver load. The error happens as a write happens
> > to the remap offset before real remapping is done.
> >
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373
> >
> > The error was unnoticed before and got visible because of the commit
> > referenced below. This doesn't fix anything in the commit below, rather
> > fixes the issue in amdgpu exposed by the commit. The reference is only
> > to associate this commit with below one so that both go together.
> >
> > Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
> >
> > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +++++++
> >   drivers/gpu/drm/amd/amdgpu/soc15.c    | 7 -------
> >   2 files changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > index 4603653916f5..3a4b0a475672 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > @@ -1819,6 +1819,13 @@ static int gmc_v9_0_hw_init(void *handle)
> >       bool value;
> >       int i, r;
> >
> > +     /* remap HDP registers to a hole in mmio space,
> > +      * for the purpose of expose those registers
> > +      * to process space
> > +      */
> > +     if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
> > +             adev->nbio.funcs->remap_hdp_registers(adev);
> > +
> >       /* The sequence of these two function calls matters.*/
> >       gmc_v9_0_init_golden_registers(adev);
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > index 5188da87428d..39c3c6d65aef 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > @@ -1240,13 +1240,6 @@ static int soc15_common_hw_init(void *handle)
> >       soc15_program_aspm(adev);
> >       /* setup nbio registers */
> >       adev->nbio.funcs->init_registers(adev);
> > -     /* remap HDP registers to a hole in mmio space,
> > -      * for the purpose of expose those registers
> > -      * to process space
> > -      */
> > -     if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
> > -             adev->nbio.funcs->remap_hdp_registers(adev);
> > -
> >       /* enable the doorbell aperture */
> >       soc15_enable_doorbell_aperture(adev, true);
> >       /* HW doorbell routing policy: doorbell writing not
> >


* Re: [PATCH 1/7] drm/amdgpu: move nbio remap_hdp_registers() to gmc9 code
  2022-09-09 17:35     ` Alex Deucher
@ 2022-09-12  4:41       ` Lazar, Lijo
  0 siblings, 0 replies; 11+ messages in thread
From: Lazar, Lijo @ 2022-09-12  4:41 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Alex Deucher, amd-gfx, helgaas, regressions, airlied, linux-pci,
	m.seyfarth, tseewald, kai.heng.feng, daniel, sr



On 9/9/2022 11:05 PM, Alex Deucher wrote:
> On Fri, Sep 9, 2022 at 1:17 PM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>>
>>
>>
>> On 9/9/2022 10:17 PM, Alex Deucher wrote:
>>> This is where it is used, so move it into gmc init so
>>
>> It's only a *side effect* of the GMC IP init process, but that doesn't
>> mean these IPs are interlinked. Any IP init process which requires an
>> HDP flush would also need this. It is not a good idea to couple HDP
>> remap with GMC, especially when there is an HDP data path that works
>> without setting up GMC (MM INDEX/DATA).
> 
> We have no need for HDP flush at all without vram, and we only have
> access to vram once GMC is initialized so it is sort of a dependency
> in that regard.  We also call a bunch of the HDP callbacks in the GMC
> code and I think those are sort of in the same boat.  Also, the whole reason
> we are in this situation is because we need to init GMC before all
> other HW because all other hardware has a dependency on being able to
> access GPU memory.
> 

We do have early VRAM access use cases over HDP to fixed offsets for the
discovery region, 2-stage memory training, etc. So far there is no
requirement for a flush, or the flush might be happening indirectly because
of a register access. That doesn't rule out future requirements for an
explicit HDP flush. I'd prefer to keep HDP and GMC programming separate.

Thanks,
Lijo

>>
>> From a generic software perspective, I think the programming prerequisite
>> for HDP flush needs to be standalone, and its order needs to be guaranteed
>> before any client IPs that make use of it.
> 
> In that case patches 5, 6, 7 could be an alternative.
> 
> Alex
> 
>>
>> Thanks,
>> Lijo
>>
>>> that it will always be initialized in the right order.
>>> We already do this for other nbio and hdp callbacks so
>>> it's consistent with what we do on other IPs.
>>>
>>> This fixes the Unsupported Request error reported through
>>> AER during driver load. The error happens as a write happens
>>> to the remap offset before real remapping is done.
>>>
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373
>>>
>>> The error was unnoticed before and got visible because of the commit
>>> referenced below. This doesn't fix anything in the commit below, rather
>>> fixes the issue in amdgpu exposed by the commit. The reference is only
>>> to associate this commit with below one so that both go together.
>>>
>>> Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
>>>
>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +++++++
>>>    drivers/gpu/drm/amd/amdgpu/soc15.c    | 7 -------
>>>    2 files changed, 7 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> index 4603653916f5..3a4b0a475672 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> @@ -1819,6 +1819,13 @@ static int gmc_v9_0_hw_init(void *handle)
>>>        bool value;
>>>        int i, r;
>>>
>>> +     /* remap HDP registers to a hole in mmio space,
>>> +      * for the purpose of expose those registers
>>> +      * to process space
>>> +      */
>>> +     if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
>>> +             adev->nbio.funcs->remap_hdp_registers(adev);
>>> +
>>>        /* The sequence of these two function calls matters.*/
>>>        gmc_v9_0_init_golden_registers(adev);
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> index 5188da87428d..39c3c6d65aef 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> @@ -1240,13 +1240,6 @@ static int soc15_common_hw_init(void *handle)
>>>        soc15_program_aspm(adev);
>>>        /* setup nbio registers */
>>>        adev->nbio.funcs->init_registers(adev);
>>> -     /* remap HDP registers to a hole in mmio space,
>>> -      * for the purpose of expose those registers
>>> -      * to process space
>>> -      */
>>> -     if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
>>> -             adev->nbio.funcs->remap_hdp_registers(adev);
>>> -
>>>        /* enable the doorbell aperture */
>>>        soc15_enable_doorbell_aperture(adev, true);
>>>        /* HW doorbell routing policy: doorbell writing not
>>>
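
For readers following the ordering argument in this thread, here is a
minimal, self-contained sketch of why the remap has to be programmed before
the first HDP flush. Everything in it is hypothetical and for illustration
only: the toy_* names, the struct layout, and the unsupported_requests
counter are not amdgpu code, and the model assumes (as described in the
thread) that GMC hw init runs before the common block and issues an HDP
flush. The real patch additionally skips the remap on SR-IOV virtual
functions, which this toy model leaves out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not the amdgpu structures. */
struct toy_dev;

struct toy_nbio_funcs {
	/* Optional callback: may be NULL on parts without HDP remap support. */
	void (*remap_hdp_registers)(struct toy_dev *dev);
};

struct toy_dev {
	const struct toy_nbio_funcs *nbio_funcs;
	bool hdp_remapped;        /* set once the remap has been programmed */
	int unsupported_requests; /* writes that hit the offset before the remap */
};

static void toy_remap_hdp_registers(struct toy_dev *dev)
{
	dev->hdp_remapped = true;
}

/* Same guard shape as the patch: only call the callback if it exists. */
static void toy_hdp_remap(struct toy_dev *dev)
{
	if (dev->nbio_funcs->remap_hdp_registers)
		dev->nbio_funcs->remap_hdp_registers(dev);
}

/* A flush writes to the remapped offset; before the remap nothing claims it. */
static void toy_hdp_flush(struct toy_dev *dev)
{
	if (!dev->hdp_remapped)
		dev->unsupported_requests++;
}

/* Toy GMC init: optionally programs the remap, then flushes HDP. */
static void toy_gmc_hw_init(struct toy_dev *dev, bool remap_here)
{
	if (remap_here)
		toy_hdp_remap(dev);
	toy_hdp_flush(dev);
}

/* Toy common init: programs the remap only in the pre-patch arrangement. */
static void toy_common_hw_init(struct toy_dev *dev, bool remap_here)
{
	if (remap_here)
		toy_hdp_remap(dev);
}

/* Runs the toy init sequence and returns the number of bad writes seen. */
static int toy_run(bool remap_in_gmc)
{
	static const struct toy_nbio_funcs funcs = {
		.remap_hdp_registers = toy_remap_hdp_registers,
	};
	struct toy_dev dev = { .nbio_funcs = &funcs };

	/* In this model GMC is initialized before the common block. */
	toy_gmc_hw_init(&dev, remap_in_gmc);
	toy_common_hw_init(&dev, !remap_in_gmc);
	return dev.unsupported_requests;
}
```

In this model toy_run(false) corresponds to the remap living in the common
init, where the flush in GMC init lands before the remap, and toy_run(true)
corresponds to moving the remap into GMC init. Initializing the common block
before GMC instead (the approach of patch 7) would address the same ordering
problem from the other side; the sketch does not model that variant.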

Thread overview: 11+ messages
2022-09-09 16:47 [PATCH 0/7] fix PCI AER issues Alex Deucher
2022-09-09 16:47 ` [PATCH 1/7] drm/amdgpu: move nbio remap_hdp_registers() to gmc9 code Alex Deucher
2022-09-09 17:17   ` Lazar, Lijo
2022-09-09 17:35     ` Alex Deucher
2022-09-12  4:41       ` Lazar, Lijo
2022-09-09 16:47 ` [PATCH 2/7] drm/amdgpu: move nbio remap_hdp_registers() to gmc10 code Alex Deucher
2022-09-09 16:47 ` [PATCH 3/7] drm/amdgpu: move nbio remap_hdp_registers() to gmc11 code Alex Deucher
2022-09-09 16:47 ` [PATCH 4/7] drm/amdgpu: add HDP remap functionality to nbio 7.7 Alex Deucher
2022-09-09 16:47 ` [PATCH 5/7] drm/amdgpu: move nbio ih_doorbell_range() into ih code for vega Alex Deucher
2022-09-09 16:47 ` [PATCH 6/7] drm/amdgpu: move nbio sdma_doorbell_range() into sdma " Alex Deucher
2022-09-09 16:47 ` [PATCH 7/7] drm/amdgpu: make sure to init common IP before gmc Alex Deucher
