* [PATCH] drm/amdgpu: Fix compute ring 1.0.0 failure after reset
From: Andrey Grodzovsky @ 2018-10-25 20:16 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Alexander.Deucher-5C7GfCeVMHo, Andrey Grodzovsky,
	Hawking.Zhang-5C7GfCeVMHo

Problem: After a GPU reset on dGPUs with gfx8, compute ring 1.0.0
fails the ring test. Inspecting the ring registers shows that the
ring is active and no hang is observed (rptr == wptr). No significant
differences were observed between the CP_HQD* registers for the ring
in its good and bad states.

Fix: There is no clear reason why, but reversing the order of the
ring tests fixes the problem.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index b2e1376..02f8ca5 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -4811,8 +4811,10 @@ static int gfx_v8_0_kcq_resume(struct amdgpu_device *adev)
 	if (r)
 		goto done;
 
-	/* Test KCQs */
-	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
+	/* Test KCQs - reversing the order of rings seems to fix ring test failure
+	 * after GPU reset
+	 */
+	for (i = adev->gfx.num_compute_rings - 1; i >= 0; i--) {
 		ring = &adev->gfx.compute_ring[i];
 		r = amdgpu_ring_test_helper(ring);
 	}
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
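
For readers following the hunk above, here is a minimal userspace sketch of the reversed test loop. It is not the driver code: `struct ring`, `ring_test_stub()`, `test_rings_reversed()`, `test_order` and `MAX_RINGS` are hypothetical stand-ins invented for illustration. It shows two properties of the loop as written in the patch: the index has to be a signed type for `i >= 0` to ever become false, and because `r` is reassigned on every iteration, the value left in `r` is the result for the last ring tested, which in reverse order is ring 0.

```c
#include <assert.h>

#define MAX_RINGS 8 /* hypothetical upper bound for this sketch */

/* Hypothetical stand-in for struct amdgpu_ring; only an index is kept. */
struct ring {
	int idx;
};

static int test_order[MAX_RINGS]; /* ring indices in the order they were tested */
static int tested;                /* number of rings tested so far */

/* Hypothetical stand-in for amdgpu_ring_test_helper(): records the
 * order of the calls and always reports success (0). */
static int ring_test_stub(struct ring *ring)
{
	test_order[tested++] = ring->idx;
	return 0;
}

/* Same loop shape as the patched code. 'i' must be signed: with an
 * unsigned index, "i >= 0" is always true and the loop never ends. */
static int test_rings_reversed(struct ring *rings, int num)
{
	int i, r = 0;

	assert(num > 0 && num <= MAX_RINGS);

	for (i = num - 1; i >= 0; i--)
		r = ring_test_stub(&rings[i]);

	/* r holds only the result of the last ring tested (ring 0). */
	return r;
}
```

Calling `test_rings_reversed(rings, 4)` on four rings numbered 0..3 records the test order 3, 2, 1, 0.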


* Re: [PATCH] drm/amdgpu: Fix compute ring 1.0.0 failure after reset
From: Christian König @ 2018-10-26  8:05 UTC
  To: Andrey Grodzovsky, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Alexander.Deucher-5C7GfCeVMHo, Hawking.Zhang-5C7GfCeVMHo

On 25.10.18 at 22:16, Andrey Grodzovsky wrote:
> Problem: After a GPU reset on dGPUs with gfx8, compute ring 1.0.0
> fails the ring test. Inspecting the ring registers shows that the
> ring is active and no hang is observed (rptr == wptr). No significant
> differences were observed between the CP_HQD* registers for the ring
> in its good and bad states.
>
> Fix: There is no clear reason why, but reversing the order of the
> ring tests fixes the problem.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

Mhm, maybe try adding a delay before the ring test?

It could be that the rings are also started in reverse order and,
for some reason, the first one is tested too quickly after a reset.

Anyway patch is Acked-by: Christian König <christian.koenig@amd.com>

Thanks,
Christian.



* Re: [PATCH] drm/amdgpu: Fix compute ring 1.0.0 failure after reset
From: Grodzovsky, Andrey @ 2018-10-26 15:00 UTC
  To: Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Deucher, Alexander, Zhang, Hawking



On 10/26/2018 04:05 AM, Christian König wrote:
> On 25.10.18 at 22:16, Andrey Grodzovsky wrote:
>> Problem: After a GPU reset on dGPUs with gfx8, compute ring 1.0.0
>> fails the ring test. Inspecting the ring registers shows that the
>> ring is active and no hang is observed (rptr == wptr). No significant
>> differences were observed between the CP_HQD* registers for the ring
>> in its good and bad states.
>>
>> Fix: There is no clear reason why, but reversing the order of the
>> ring tests fixes the problem.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>
> Mhm, maybe try adding a delay before the ring test?
First thing I tried, didn't help.
>
> It could be that the rings are also started in reverse order and,
> for some reason, the first one is tested too quickly after a reset.

No, the KCQ queue mapping done just before the test goes in 0..max order.

Andrey
>
> Anyway patch is Acked-by: Christian König <christian.koenig@amd.com>
>
> Thanks,
> Christian.
>
>
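
The delay experiment discussed in this thread (and reported above as not helping) could be sketched roughly as follows. This is a userspace illustration only, not the change that was tried: `struct demo_ring`, `demo_ring_test()`, `settle_delay_ms()` and `test_rings_after_delay()` are hypothetical names, and the delay value is an arbitrary assumption. In kernel code the wait would typically be `msleep()` or `udelay()` from `<linux/delay.h>` rather than `nanosleep()`. Unlike the loop in the patch, this sketch keeps the first failure instead of overwriting the result on each iteration.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for struct amdgpu_ring. */
struct demo_ring {
	int idx;
	int tested; /* set to 1 once the stub test has run on this ring */
};

/* Hypothetical stand-in for the ring test; always reports success (0). */
static int demo_ring_test(struct demo_ring *ring)
{
	ring->tested = 1;
	return 0;
}

/* Userspace substitute for a kernel-side msleep(). */
static void settle_delay_ms(long ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
}

/* Wait once, then test the rings in the original 0..max order. */
static int test_rings_after_delay(struct demo_ring *rings, int num, long delay_ms)
{
	int i, r, ret = 0;

	assert(num > 0 && delay_ms >= 0);

	settle_delay_ms(delay_ms);

	for (i = 0; i < num; i++) {
		r = demo_ring_test(&rings[i]);
		if (r && !ret)
			ret = r; /* remember the first failure, keep testing */
	}

	return ret;
}
```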
