regressions.lists.linux.dev archive mirror
* [PATCH v3 0/3] fix PCI AER issues
@ 2022-09-13 14:48 Alex Deucher
  2022-09-13 14:48 ` [PATCH 1/3] drm/amdgpu: move nbio ih_doorbell_range() into ih code for vega Alex Deucher
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Alex Deucher @ 2022-09-13 14:48 UTC (permalink / raw)
  To: amd-gfx, helgaas
  Cc: regressions, airlied, linux-pci, tseewald, kai.heng.feng, daniel,
	sr, m.seyfarth, Alex Deucher

The first two patches fix ordering issues with the doorbells so
that the doorbell structures are initialized before we program
them.  The last patch fixes the PCI AER errors by moving common hw
init before GMC hw init so that the HDP register is remapped before
it is used, preventing a write to a non-existent register.
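
As a rough illustration of why the init order matters, the following is a
toy model only, not amdgpu code. All names (toy_dev, toy_run_init, and so
on) are hypothetical, and the assumption that GMC hw init touches the
remapped HDP register is taken from the description above. It counts a
write that lands before the remap as an unsupported request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the ordering problem; not amdgpu code. */
enum toy_block { TOY_COMMON, TOY_GMC };

struct toy_dev {
	bool hdp_remapped;    /* set by the "common" hw init */
	int unsupported_reqs; /* writes that hit a not-yet-remapped register */
};

/* Stand-in for common hw init: performs the HDP remap. */
static void toy_common_hw_init(struct toy_dev *dev)
{
	dev->hdp_remapped = true;
}

/* Stand-in for GMC hw init: assumed to write the remapped HDP register. */
static void toy_gmc_hw_init(struct toy_dev *dev)
{
	if (!dev->hdp_remapped)
		dev->unsupported_reqs++; /* write went to a non-existent register */
}

/* Run hw init in the given order and return the number of bad writes. */
static int toy_run_init(const enum toy_block *order, size_t n)
{
	struct toy_dev dev = { false, 0 };
	size_t i;

	assert(order != NULL);
	for (i = 0; i < n; i++) {
		if (order[i] == TOY_COMMON)
			toy_common_hw_init(&dev);
		else
			toy_gmc_hw_init(&dev);
	}
	return dev.unsupported_reqs;
}
```

In this model, running { TOY_GMC, TOY_COMMON } yields one bad write, while
{ TOY_COMMON, TOY_GMC } yields none, which is the reordering the last
patch makes.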

Drop the HDP remap in GMC init patches as per discussions with Lijo.

Alex Deucher (3):
  drm/amdgpu: move nbio ih_doorbell_range() into ih code for vega
  drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega
  drm/amdgpu: make sure to init common IP before gmc

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +++++++++---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c     |  5 +++++
 drivers/gpu/drm/amd/amdgpu/soc15.c         | 25 ----------------------
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c     |  4 ++++
 drivers/gpu/drm/amd/amdgpu/vega20_ih.c     |  4 ++++
 5 files changed, 24 insertions(+), 28 deletions(-)

-- 
2.37.2



* [PATCH 1/3] drm/amdgpu: move nbio ih_doorbell_range() into ih code for vega
  2022-09-13 14:48 [PATCH v3 0/3] fix PCI AER issues Alex Deucher
@ 2022-09-13 14:48 ` Alex Deucher
  2022-09-14  7:04   ` Lazar, Lijo
  2022-09-13 14:48 ` [PATCH 2/3] drm/amdgpu: move nbio sdma_doorbell_range() into sdma " Alex Deucher
  2022-09-13 14:48 ` [PATCH 3/3] drm/amdgpu: make sure to init common IP before gmc Alex Deucher
  2 siblings, 1 reply; 7+ messages in thread
From: Alex Deucher @ 2022-09-13 14:48 UTC (permalink / raw)
  To: amd-gfx, helgaas
  Cc: regressions, airlied, linux-pci, tseewald, kai.heng.feng, daniel,
	sr, m.seyfarth, Alex Deucher

This mirrors what we do for other asics and this way we are
sure the ih doorbell range is properly initialized.

There is a comment about the way doorbells on gfx9 work that
requires that they are initialized for other IPs before GFX
is initialized.  In this case IH is initialized before GFX,
so there should be no issue.

This fixes the Unsupported Request error reported through
AER during driver load. The error happens as a write happens
to the remap offset before real remapping is done.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373

The error went unnoticed before and became visible because of the commit
referenced below. This doesn't fix anything in that commit; rather, it
fixes the issue in amdgpu that the commit exposed. The reference is only
there to associate this commit with the one below so that both go together.

Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/soc15.c     | 3 ---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 4 ++++
 drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 4 ++++
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 5188da87428d..e6a4002fa67d 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -1224,9 +1224,6 @@ static void soc15_doorbell_range_init(struct amdgpu_device *adev)
 				ring->use_doorbell, ring->doorbell_index,
 				adev->doorbell_index.sdma_doorbell_range);
 		}
-
-		adev->nbio.funcs->ih_doorbell_range(adev, adev->irq.ih.use_doorbell,
-						adev->irq.ih.doorbell_index);
 	}
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 03b7066471f9..1e83db0c5438 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -289,6 +289,10 @@ static int vega10_ih_irq_init(struct amdgpu_device *adev)
 		}
 	}
 
+	if (!amdgpu_sriov_vf(adev))
+		adev->nbio.funcs->ih_doorbell_range(adev, adev->irq.ih.use_doorbell,
+						    adev->irq.ih.doorbell_index);
+
 	pci_set_master(adev->pdev);
 
 	/* enable interrupts */
diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
index 2022ffbb8dba..59dfca093155 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
@@ -340,6 +340,10 @@ static int vega20_ih_irq_init(struct amdgpu_device *adev)
 		}
 	}
 
+	if (!amdgpu_sriov_vf(adev))
+		adev->nbio.funcs->ih_doorbell_range(adev, adev->irq.ih.use_doorbell,
+						    adev->irq.ih.doorbell_index);
+
 	pci_set_master(adev->pdev);
 
 	/* enable interrupts */
-- 
2.37.2



* [PATCH 2/3] drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega
  2022-09-13 14:48 [PATCH v3 0/3] fix PCI AER issues Alex Deucher
  2022-09-13 14:48 ` [PATCH 1/3] drm/amdgpu: move nbio ih_doorbell_range() into ih code for vega Alex Deucher
@ 2022-09-13 14:48 ` Alex Deucher
  2022-09-13 14:48 ` [PATCH 3/3] drm/amdgpu: make sure to init common IP before gmc Alex Deucher
  2 siblings, 0 replies; 7+ messages in thread
From: Alex Deucher @ 2022-09-13 14:48 UTC (permalink / raw)
  To: amd-gfx, helgaas
  Cc: regressions, airlied, linux-pci, tseewald, kai.heng.feng, daniel,
	sr, m.seyfarth, Alex Deucher

This mirrors what we do for other asics and this way we are
sure the sdma doorbell range is properly initialized.

There is a comment about the way doorbells on gfx9 work that
requires that they are initialized for other IPs before GFX
is initialized.  However, the statement says that it applies to
multimedia as well, but the VCN code currently initializes
doorbells after GFX and there are no known issues there.  In my
testing at least I don't see any problems on SDMA.
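
The routing rule behind that comment (a doorbell write outside any
programmed SDMA/IH/MM range is routed to CP) can be sketched roughly as
follows. This is a toy model, not the hardware or the driver: the types,
the toy_route() helper, and the range values used with it are all
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of doorbell routing; not amdgpu code. */
enum toy_target { TOY_CP, TOY_SDMA, TOY_IH };

struct toy_range {
	bool enabled;           /* range has been programmed */
	unsigned int first;     /* first doorbell index in the range */
	unsigned int count;     /* number of doorbell indices in the range */
	enum toy_target target; /* engine that owns the range */
};

/*
 * Route a doorbell index: the first enabled range containing it wins,
 * anything else falls through to CP.
 */
static enum toy_target toy_route(const struct toy_range *ranges, size_t n,
				 unsigned int idx)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const struct toy_range *r = &ranges[i];

		if (r->enabled && idx >= r->first && idx - r->first < r->count)
			return r->target;
	}
	return TOY_CP;
}
```

With an SDMA range that has not been programmed yet, a doorbell index
inside it is routed to CP in this model; once the range is enabled, the
same index reaches SDMA. That is why the range has to be programmed
before the engine's doorbell is used.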

This fixes the Unsupported Request error reported through
AER during driver load. The error happens as a write happens
to the remap offset before real remapping is done.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373

The error went unnoticed before and became visible because of the commit
referenced below. This doesn't fix anything in that commit; rather, it
fixes the issue in amdgpu that the commit exposed. The reference is only
there to associate this commit with the one below so that both go together.

Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |  5 +++++
 drivers/gpu/drm/amd/amdgpu/soc15.c     | 22 ----------------------
 2 files changed, 5 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index 0cf9d3b486b2..7fe8bf3417db 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -1504,6 +1504,11 @@ static int sdma_v4_0_start(struct amdgpu_device *adev)
 		WREG32_SDMA(i, mmSDMA0_CNTL, temp);
 
 		if (!amdgpu_sriov_vf(adev)) {
+			ring = &adev->sdma.instance[i].ring;
+			adev->nbio.funcs->sdma_doorbell_range(adev, i,
+				ring->use_doorbell, ring->doorbell_index,
+				adev->doorbell_index.sdma_doorbell_range);
+
 			/* unhalt engine */
 			temp = RREG32_SDMA(i, mmSDMA0_F32_CNTL);
 			temp = REG_SET_FIELD(temp, SDMA0_F32_CNTL, HALT, 0);
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index e6a4002fa67d..d9914052d20d 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -1211,22 +1211,6 @@ static int soc15_common_sw_fini(void *handle)
 	return 0;
 }
 
-static void soc15_doorbell_range_init(struct amdgpu_device *adev)
-{
-	int i;
-	struct amdgpu_ring *ring;
-
-	/* sdma/ih doorbell range are programed by hypervisor */
-	if (!amdgpu_sriov_vf(adev)) {
-		for (i = 0; i < adev->sdma.num_instances; i++) {
-			ring = &adev->sdma.instance[i].ring;
-			adev->nbio.funcs->sdma_doorbell_range(adev, i,
-				ring->use_doorbell, ring->doorbell_index,
-				adev->doorbell_index.sdma_doorbell_range);
-		}
-	}
-}
-
 static int soc15_common_hw_init(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
@@ -1246,12 +1230,6 @@ static int soc15_common_hw_init(void *handle)
 
 	/* enable the doorbell aperture */
 	soc15_enable_doorbell_aperture(adev, true);
-	/* HW doorbell routing policy: doorbell writing not
-	 * in SDMA/IH/MM/ACV range will be routed to CP. So
-	 * we need to init SDMA/IH/MM/ACV doorbell range prior
-	 * to CP ip block init and ring test.
-	 */
-	soc15_doorbell_range_init(adev);
 
 	return 0;
 }
-- 
2.37.2



* [PATCH 3/3] drm/amdgpu: make sure to init common IP before gmc
  2022-09-13 14:48 [PATCH v3 0/3] fix PCI AER issues Alex Deucher
  2022-09-13 14:48 ` [PATCH 1/3] drm/amdgpu: move nbio ih_doorbell_range() into ih code for vega Alex Deucher
  2022-09-13 14:48 ` [PATCH 2/3] drm/amdgpu: move nbio sdma_doorbell_range() into sdma " Alex Deucher
@ 2022-09-13 14:48 ` Alex Deucher
  2022-09-14  9:03   ` Christian König
  2 siblings, 1 reply; 7+ messages in thread
From: Alex Deucher @ 2022-09-13 14:48 UTC (permalink / raw)
  To: amd-gfx, helgaas
  Cc: regressions, airlied, linux-pci, tseewald, kai.heng.feng, daniel,
	sr, m.seyfarth, Alex Deucher

Move common IP init before GMC init so that HDP gets
remapped before GMC init which uses it.

This fixes the Unsupported Request error reported through
AER during driver load. The error happens as a write happens
to the remap offset before real remapping is done.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373

The error went unnoticed before and became visible because of the commit
referenced below. This doesn't fix anything in that commit; rather, it
fixes the issue in amdgpu that the commit exposed. The reference is only
there to associate this commit with the one below so that both go together.

Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 899564ea8b4b..4da85ce9e3b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2375,8 +2375,16 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 		}
 		adev->ip_blocks[i].status.sw = true;
 
-		/* need to do gmc hw init early so we can allocate gpu mem */
-		if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GMC) {
+		if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_COMMON) {
+			/* need to do common hw init early so everything is set up for gmc */
+			r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
+			if (r) {
+				DRM_ERROR("hw_init %d failed %d\n", i, r);
+				goto init_failed;
+			}
+			adev->ip_blocks[i].status.hw = true;
+		} else if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GMC) {
+			/* need to do gmc hw init early so we can allocate gpu mem */
 			/* Try to reserve bad pages early */
 			if (amdgpu_sriov_vf(adev))
 				amdgpu_virt_exchange_data(adev);
@@ -3062,8 +3070,8 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
 	int i, r;
 
 	static enum amd_ip_block_type ip_order[] = {
-		AMD_IP_BLOCK_TYPE_GMC,
 		AMD_IP_BLOCK_TYPE_COMMON,
+		AMD_IP_BLOCK_TYPE_GMC,
 		AMD_IP_BLOCK_TYPE_PSP,
 		AMD_IP_BLOCK_TYPE_IH,
 	};
-- 
2.37.2



* Re: [PATCH 1/3] drm/amdgpu: move nbio ih_doorbell_range() into ih code for vega
  2022-09-13 14:48 ` [PATCH 1/3] drm/amdgpu: move nbio ih_doorbell_range() into ih code for vega Alex Deucher
@ 2022-09-14  7:04   ` Lazar, Lijo
  2022-09-14 13:43     ` Alex Deucher
  0 siblings, 1 reply; 7+ messages in thread
From: Lazar, Lijo @ 2022-09-14  7:04 UTC (permalink / raw)
  To: Alex Deucher, amd-gfx, helgaas
  Cc: regressions, airlied, linux-pci, m.seyfarth, tseewald,
	kai.heng.feng, daniel, sr



On 9/13/2022 8:18 PM, Alex Deucher wrote:
> This mirrors what we do for other asics and this way we are
> sure the ih doorbell range is properly initialized.
> 
> There is a comment about the way doorbells on gfx9 work that
> requires that they are initialized for other IPs before GFX
> is initialized.  In this case IH is initialized before GFX,
> so there should be no issue.
> 

Not sure about the association of patch 1 and 2 with AER as in the 
comment below. I thought the access would go through (PCIE errors may 
not be reported) and the only side effect is doorbell won't be hit/routed.

The comments may not be relevant to patches 1/2, apart from that -

Series is:
	Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>

Thanks,
Lijo



* Re: [PATCH 3/3] drm/amdgpu: make sure to init common IP before gmc
  2022-09-13 14:48 ` [PATCH 3/3] drm/amdgpu: make sure to init common IP before gmc Alex Deucher
@ 2022-09-14  9:03   ` Christian König
  0 siblings, 0 replies; 7+ messages in thread
From: Christian König @ 2022-09-14  9:03 UTC (permalink / raw)
  To: Alex Deucher, amd-gfx, helgaas
  Cc: regressions, airlied, linux-pci, m.seyfarth, tseewald,
	kai.heng.feng, daniel, sr

Am 13.09.22 um 16:48 schrieb Alex Deucher:
> Move common IP init before GMC init so that HDP gets
> remapped before GMC init which uses it.

At some point we should improve this so that we have the common and GMC 
stuff in the hardware init as first thing without those hacks.

But anyway Acked-by for now since this is higher level design work.

Regards,
Christian.




* Re: [PATCH 1/3] drm/amdgpu: move nbio ih_doorbell_range() into ih code for vega
  2022-09-14  7:04   ` Lazar, Lijo
@ 2022-09-14 13:43     ` Alex Deucher
  0 siblings, 0 replies; 7+ messages in thread
From: Alex Deucher @ 2022-09-14 13:43 UTC (permalink / raw)
  To: Lazar, Lijo
  Cc: Alex Deucher, amd-gfx, helgaas, regressions, airlied, linux-pci,
	m.seyfarth, tseewald, kai.heng.feng, daniel, sr

On Wed, Sep 14, 2022 at 3:05 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>
>
>
> On 9/13/2022 8:18 PM, Alex Deucher wrote:
> > This mirrors what we do for other asics and this way we are
> > sure the ih doorbell range is properly initialized.
> >
> > There is a comment about the way doorbells on gfx9 work that
> > requires that they are initialized for other IPs before GFX
> > is initialized.  In this case IH is initialized before GFX,
> > so there should be no issue.
> >
>
> Not sure about the association of patch 1 and 2 with AER as in the
> comment below. I thought the access would go through (PCIE errors may
> not be reported) and the only side effect is doorbell won't be hit/routed.
>
> The comments may not be relevant to patches 1/2, apart from that -

Patches 1 and 2 don't fix the actual issue, but they are prerequisites
for patch 3.  Without patches 1 and 2, patch 3 won't work on all
cards.  Seemed prudent to just mark all 3, but I could clarify that 1
and 2 are just prerequisites.

Thanks,

Alex


