* [PATCH] drm/amdgpu: clean wptr on wb when gpu recovery
@ 2020-02-28  6:31 Yintian Tao
From: Yintian Tao @ 2020-02-28  6:31 UTC
  To: christian.koenig, Alexander.Deucher; +Cc: amd-gfx, Yintian Tao

TDR randomly fails because the compute ring test fails. If the compute
ring wptr & 0x7ff (ring_buf_mask) is 0x100, then after the MQD is mapped
the compute ring rptr is synced to 0x100. The ring test packet size is
also 0x100, so after amdgpu_ring_commit() wptr equals rptr and the CP
never actually processes the packet in the ring buffer.

Signed-off-by: Yintian Tao <yttao@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 44f00ecea322..5df1a6d45457 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -3508,6 +3508,7 @@ static int gfx_v10_0_kcq_init_queue(struct amdgpu_ring *ring)
 
 		/* reset ring buffer */
 		ring->wptr = 0;
+		atomic64_set((atomic64_t *)&adev->wb.wb[ring->wptr_offs], 0);
 		amdgpu_ring_clear_ring(ring);
 	} else {
 		amdgpu_ring_clear_ring(ring);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 4135e4126e82..ac22490e8656 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3664,6 +3664,7 @@ static int gfx_v9_0_kcq_init_queue(struct amdgpu_ring *ring)
 
 		/* reset ring buffer */
 		ring->wptr = 0;
+		atomic64_set((atomic64_t *)&adev->wb.wb[ring->wptr_offs], 0);
 		amdgpu_ring_clear_ring(ring);
 	} else {
 		amdgpu_ring_clear_ring(ring);
-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH] drm/amdgpu: clean wptr on wb when gpu recovery
@ 2020-02-28  9:19 ` Christian König
From: Christian König @ 2020-02-28  9:19 UTC
  To: Yintian Tao, christian.koenig, Alexander.Deucher, monk.liu; +Cc: amd-gfx

On 28.02.20 at 07:31, Yintian Tao wrote:
> TDR randomly fails because the compute ring test fails. If the compute
> ring wptr & 0x7ff (ring_buf_mask) is 0x100, then after the MQD is mapped
> the compute ring rptr is synced to 0x100. The ring test packet size is
> also 0x100, so after amdgpu_ring_commit() wptr equals rptr and the CP
> never actually processes the packet in the ring buffer.
>
> Signed-off-by: Yintian Tao <yttao@amd.com>

Offhand that looks correct to me, but I can't fully judge whether it
could have any negative side effects. Patch is Acked-by: Christian König
<christian.koenig@amd.com> for now.

Monk, according to git you modified that function as well. Could this
have any negative effect on SRIOV? I don't think so, but better safe
than sorry.

Regards,
Christian.


* RE: [PATCH] drm/amdgpu: clean wptr on wb when gpu recovery
@ 2020-02-28  9:25   ` Liu, Monk
From: Liu, Monk @ 2020-02-28  9:25 UTC
  To: Koenig, Christian, Tao, Yintian, Deucher, Alexander; +Cc: amd-gfx

This is a clear fix:

After TDR, the compute ring's HQD is restored from its MQD, but the MQD only records
"WPTR_ADDR_LO/HI". Once the HQD is restored, the MEC immediately reads the value at
"WPTR_ADDR_LO/HI", which is WB memory, and that value is sometimes not "0" (TDR does
not clear WB, so it still holds whatever the hung process left there).
The MEC therefore considers there to be commands in the RB (since RPTR != WPTR),
which leads to a further hang.

Reviewed-by: Monk Liu <monk.liu@amd.com>

_____________________________________
Monk Liu|GPU Virtualization Team |AMD

