* [PATCH] drm/amdgpu: fix sdma doorbell init ordering on APUs
@ 2022-10-20  3:48 Alex Deucher
  2022-10-20  5:59 ` Christian König
  2022-10-20 14:44 ` Shuah Khan
  0 siblings, 2 replies; 3+ messages in thread
From: Alex Deucher @ 2022-10-20  3:48 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, skhan

Commit 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
uncovered a bug in amdgpu that required a reordering of the driver
init sequence to avoid accessing a special register on the GPU
before it was properly set up, leading to a PCI AER error.  This
reordering uncovered a different hw programming ordering dependency
in some APUs where the SDMA doorbells need to be programmed before
the GFX doorbells. To fix this, move the SDMA doorbell programming
back into the soc15 common code, but use the actual doorbell range
values directly rather than the values stored in the ring structure
since those will not be initialized at this point.

This is a partial revert, but with the doorbell assignment
fixed so the proper doorbell index is set before it's used.

Fixes: e3163bc8ffdfdb ("drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: skhan@linuxfoundation.org
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |  5 -----
 drivers/gpu/drm/amd/amdgpu/soc15.c     | 21 +++++++++++++++++++++
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index 298fa11702e7..1122bd4eae98 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -1417,11 +1417,6 @@ static int sdma_v4_0_start(struct amdgpu_device *adev)
 		WREG32_SDMA(i, mmSDMA0_CNTL, temp);
 
 		if (!amdgpu_sriov_vf(adev)) {
-			ring = &adev->sdma.instance[i].ring;
-			adev->nbio.funcs->sdma_doorbell_range(adev, i,
-				ring->use_doorbell, ring->doorbell_index,
-				adev->doorbell_index.sdma_doorbell_range);
-
 			/* unhalt engine */
 			temp = RREG32_SDMA(i, mmSDMA0_F32_CNTL);
 			temp = REG_SET_FIELD(temp, SDMA0_F32_CNTL, HALT, 0);
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 183024d7c184..e3b2b6b4f1a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -1211,6 +1211,20 @@ static int soc15_common_sw_fini(void *handle)
 	return 0;
 }
 
+static void soc15_sdma_doorbell_range_init(struct amdgpu_device *adev)
+{
+	int i;
+
+	/* sdma doorbell range is programmed by hypervisor */
+	if (!amdgpu_sriov_vf(adev)) {
+		for (i = 0; i < adev->sdma.num_instances; i++) {
+			adev->nbio.funcs->sdma_doorbell_range(adev, i,
+				true, adev->doorbell_index.sdma_engine[i] << 1,
+				adev->doorbell_index.sdma_doorbell_range);
+		}
+	}
+}
+
 static int soc15_common_hw_init(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
@@ -1230,6 +1244,13 @@ static int soc15_common_hw_init(void *handle)
 
 	/* enable the doorbell aperture */
 	soc15_enable_doorbell_aperture(adev, true);
+	/* HW doorbell routing policy: doorbell writing not
+	 * in SDMA/IH/MM/ACV range will be routed to CP. So
+	 * we need to init SDMA doorbell range prior
+	 * to CP ip block init and ring test.  IH already
+	 * happens before CP.
+	 */
+	soc15_sdma_doorbell_range_init(adev);
 
 	return 0;
 }
-- 
2.37.3
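
The routing policy described in the soc15_common_hw_init() comment can be
illustrated with a small standalone model. This is only a sketch, not
driver code: struct db_router, db_program_sdma_range() and db_route() are
hypothetical names invented for the illustration, and the model covers
only the SDMA ranges (IH/MM/ACV are left out). It also assumes that the
"<< 1" in the patch converts a 64-bit doorbell slot index into a dword
offset, which is how I read the existing ring setup code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clients a doorbell write can be routed to. */
enum db_client { DB_CLIENT_CP, DB_CLIENT_SDMA };

/* Hypothetical model of one programmed SDMA doorbell range (dword units). */
struct db_range {
	bool enabled;
	uint32_t offset;	/* first doorbell dword offset */
	uint32_t size;		/* number of dwords covered */
};

#define MODEL_MAX_SDMA 2

struct db_router {
	struct db_range sdma[MODEL_MAX_SDMA];
};

/* Assumption: a doorbell slot is 64 bits wide, so slot -> dword is "<< 1". */
static uint32_t db_slot_to_dword(uint32_t slot)
{
	return slot << 1;
}

/* Models what programming the SDMA doorbell range achieves for one instance. */
static void db_program_sdma_range(struct db_router *r, int inst,
				  uint32_t slot, uint32_t size)
{
	r->sdma[inst].enabled = true;
	r->sdma[inst].offset = db_slot_to_dword(slot);
	r->sdma[inst].size = size;
}

/* A write that falls outside every programmed range defaults to CP. */
static enum db_client db_route(const struct db_router *r, uint32_t dword_off)
{
	int i;

	for (i = 0; i < MODEL_MAX_SDMA; i++) {
		const struct db_range *rg = &r->sdma[i];

		if (rg->enabled && dword_off >= rg->offset &&
		    dword_off < rg->offset + rg->size)
			return DB_CLIENT_SDMA;
	}
	return DB_CLIENT_CP;
}
```

In this model, a write to an SDMA doorbell offset made before
db_program_sdma_range() has run is routed to CP. That is the ordering
dependency the patch addresses by programming the ranges in the common
hw_init, ahead of the CP block.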


* Re: [PATCH] drm/amdgpu: fix sdma doorbell init ordering on APUs
  2022-10-20  3:48 [PATCH] drm/amdgpu: fix sdma doorbell init ordering on APUs Alex Deucher
@ 2022-10-20  5:59 ` Christian König
  2022-10-20 14:44 ` Shuah Khan
  1 sibling, 0 replies; 3+ messages in thread
From: Christian König @ 2022-10-20  5:59 UTC (permalink / raw)
  To: Alex Deucher, amd-gfx; +Cc: skhan

On 20.10.22 at 05:48, Alex Deucher wrote:
> Commit 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
> uncovered a bug in amdgpu that required a reordering of the driver
> init sequence to avoid accessing a special register on the GPU
> before it was properly set up, leading to a PCI AER error.  This
> reordering uncovered a different hw programming ordering dependency
> in some APUs where the SDMA doorbells need to be programmed before
> the GFX doorbells. To fix this, move the SDMA doorbell programming
> back into the soc15 common code, but use the actual doorbell range
> values directly rather than the values stored in the ring structure
> since those will not be initialized at this point.
>
> This is a partial revert, but with the doorbell assignment
> fixed so the proper doorbell index is set before it's used.
>
> Fixes: e3163bc8ffdfdb ("drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega")
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> Cc: skhan@linuxfoundation.org

Acked-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |  5 -----
>   drivers/gpu/drm/amd/amdgpu/soc15.c     | 21 +++++++++++++++++++++
>   2 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index 298fa11702e7..1122bd4eae98 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -1417,11 +1417,6 @@ static int sdma_v4_0_start(struct amdgpu_device *adev)
>   		WREG32_SDMA(i, mmSDMA0_CNTL, temp);
>   
>   		if (!amdgpu_sriov_vf(adev)) {
> -			ring = &adev->sdma.instance[i].ring;
> -			adev->nbio.funcs->sdma_doorbell_range(adev, i,
> -				ring->use_doorbell, ring->doorbell_index,
> -				adev->doorbell_index.sdma_doorbell_range);
> -
>   			/* unhalt engine */
>   			temp = RREG32_SDMA(i, mmSDMA0_F32_CNTL);
>   			temp = REG_SET_FIELD(temp, SDMA0_F32_CNTL, HALT, 0);
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 183024d7c184..e3b2b6b4f1a6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -1211,6 +1211,20 @@ static int soc15_common_sw_fini(void *handle)
>   	return 0;
>   }
>   
> +static void soc15_sdma_doorbell_range_init(struct amdgpu_device *adev)
> +{
> +	int i;
> +
> +	/* sdma doorbell range is programmed by hypervisor */
> +	if (!amdgpu_sriov_vf(adev)) {
> +		for (i = 0; i < adev->sdma.num_instances; i++) {
> +			adev->nbio.funcs->sdma_doorbell_range(adev, i,
> +				true, adev->doorbell_index.sdma_engine[i] << 1,
> +				adev->doorbell_index.sdma_doorbell_range);
> +		}
> +	}
> +}
> +
>   static int soc15_common_hw_init(void *handle)
>   {
>   	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> @@ -1230,6 +1244,13 @@ static int soc15_common_hw_init(void *handle)
>   
>   	/* enable the doorbell aperture */
>   	soc15_enable_doorbell_aperture(adev, true);
> +	/* HW doorbell routing policy: doorbell writing not
> +	 * in SDMA/IH/MM/ACV range will be routed to CP. So
> +	 * we need to init SDMA doorbell range prior
> +	 * to CP ip block init and ring test.  IH already
> +	 * happens before CP.
> +	 */
> +	soc15_sdma_doorbell_range_init(adev);
>   
>   	return 0;
>   }


* Re: [PATCH] drm/amdgpu: fix sdma doorbell init ordering on APUs
  2022-10-20  3:48 [PATCH] drm/amdgpu: fix sdma doorbell init ordering on APUs Alex Deucher
  2022-10-20  5:59 ` Christian König
@ 2022-10-20 14:44 ` Shuah Khan
  1 sibling, 0 replies; 3+ messages in thread
From: Shuah Khan @ 2022-10-20 14:44 UTC (permalink / raw)
  To: Alex Deucher, amd-gfx; +Cc: Shuah Khan

On 10/19/22 21:48, Alex Deucher wrote:
> Commit 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
> uncovered a bug in amdgpu that required a reordering of the driver
> init sequence to avoid accessing a special register on the GPU
> before it was properly set up, leading to a PCI AER error.  This
> reordering uncovered a different hw programming ordering dependency
> in some APUs where the SDMA doorbells need to be programmed before
> the GFX doorbells. To fix this, move the SDMA doorbell programming
> back into the soc15 common code, but use the actual doorbell range
> values directly rather than the values stored in the ring structure
> since those will not be initialized at this point.
> 
> This is a partial revert, but with the doorbell assignment
> fixed so the proper doorbell index is set before it's used.
> 
> Fixes: e3163bc8ffdfdb ("drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega")
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> Cc: skhan@linuxfoundation.org

Thank you for fixing this quickly and getting me back to 6.1-rc1
on my primary system.

Reported-and-Tested-by: Shuah Khan <skhan@linuxfoundation.org>

thanks,
-- Shuah


