* [PATCH] drm/amdgpu: Fix compute ring 1.0.0 failure after reset
@ 2018-10-25 20:16 Andrey Grodzovsky
[not found] ` <1540498601-5270-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org>
0 siblings, 1 reply; 3+ messages in thread
From: Andrey Grodzovsky @ 2018-10-25 20:16 UTC (permalink / raw)
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Cc: Alexander.Deucher-5C7GfCeVMHo, Andrey Grodzovsky,
Hawking.Zhang-5C7GfCeVMHo
Problem: After a GPU reset on dGPUs with gfx8, compute ring 1.0.0
fails to pass the ring test. Inspecting the ring registers shows that
the ring is active and no hang is observed (rptr == wptr). No
significant differences were observed between the CP_HQD* registers
for the ring in the good and bad states.

Fix: There is no clear reason why, but reversing the order of the
ring tests fixes the problem.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index b2e1376..02f8ca5 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -4811,8 +4811,10 @@ static int gfx_v8_0_kcq_resume(struct amdgpu_device *adev)
 	if (r)
 		goto done;
 
-	/* Test KCQs */
-	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
+	/* Test KCQs - reversing the order of rings seems to fix ring test failure
+	 * after GPU reset
+	 */
+	for (i = adev->gfx.num_compute_rings - 1; i >= 0; i--) {
 		ring = &adev->gfx.compute_ring[i];
 		r = amdgpu_ring_test_helper(ring);
 	}
--
2.7.4
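
For illustration only, the reversed iteration from the hunk above can be
modelled in plain userspace C. This is a sketch under assumptions, not
driver code: visit_log, fake_ring_test, test_rings_reversed and MAX_RINGS
are hypothetical names that do not exist in amdgpu. It shows the loop shape
and why the index has to be a signed type. Keeping the first non-zero
result is a choice made for this sketch.

```c
#include <assert.h>

#define MAX_RINGS 8	/* hypothetical upper bound for this sketch */

/* Records the order in which rings were visited. */
struct visit_log {
	int order[MAX_RINGS];
	int count;
};

/* Hypothetical stand-in for a ring test; always reports success (0). */
static int fake_ring_test(struct visit_log *log, int ring_idx)
{
	assert(log->count < MAX_RINGS);
	log->order[log->count++] = ring_idx;
	return 0;
}

/*
 * Walk the rings from the highest index down to 0. The index must be
 * signed: with an unsigned type, "i >= 0" is always true and the loop
 * would never terminate.
 */
int test_rings_reversed(struct visit_log *log, int num_rings)
{
	int i, r, ret = 0;

	for (i = num_rings - 1; i >= 0; i--) {
		r = fake_ring_test(log, i);
		if (r && !ret)
			ret = r;	/* keep the first failure seen */
	}
	return ret;
}
```

Calling test_rings_reversed() with four rings visits indices 3, 2, 1, 0 in
that order, so ring 0 is tested last.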
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
* Re: [PATCH] drm/amdgpu: Fix compute ring 1.0.0 failure after reset
[not found] ` <1540498601-5270-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org>
@ 2018-10-26 8:05 ` Christian König
[not found] ` <c402ce16-e8e3-78ee-3fb2-666d09b0807b-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 3+ messages in thread
From: Christian König @ 2018-10-26 8:05 UTC (permalink / raw)
To: Andrey Grodzovsky, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Cc: Alexander.Deucher-5C7GfCeVMHo, Hawking.Zhang-5C7GfCeVMHo
Am 25.10.18 um 22:16 schrieb Andrey Grodzovsky:
> Problem: After a GPU reset on dGPUs with gfx8, compute ring 1.0.0
> fails to pass the ring test. Inspecting the ring registers shows that
> the ring is active and no hang is observed (rptr == wptr). No
> significant differences were observed between the CP_HQD* registers
> for the ring in the good and bad states.
>
> Fix: There is no clear reason why, but reversing the order of the
> ring tests fixes the problem.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Mhm, maybe try adding a delay before the ring test?
Could be that the rings are started in reverse order as well and for
some reason the first one is tested too quickly after a reset.
Anyway patch is Acked-by: Christian König <christian.koenig@amd.com>
Thanks,
Christian.
> ---
> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index b2e1376..02f8ca5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -4811,8 +4811,10 @@ static int gfx_v8_0_kcq_resume(struct amdgpu_device *adev)
>  	if (r)
>  		goto done;
>
> -	/* Test KCQs */
> -	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
> +	/* Test KCQs - reversing the order of rings seems to fix ring test failure
> +	 * after GPU reset
> +	 */
> +	for (i = adev->gfx.num_compute_rings - 1; i >= 0; i--) {
>  		ring = &adev->gfx.compute_ring[i];
>  		r = amdgpu_ring_test_helper(ring);
>  	}
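
The idea raised in this reply, waiting for the hardware to settle before
running the ring tests in the original ascending order, could be sketched
roughly as below. This is a userspace model under assumptions: delay_fn,
ring_test_fn, test_rings_after_delay, record_delay and ring0_fails are
hypothetical names, not amdgpu symbols, and the delay is injected as a
callback so the sketch stays deterministic. In kernel code the delay would
more likely be a helper such as msleep() or udelay() from <linux/delay.h>.
The follow-up in this thread reports that a delay was tried and did not
help, so treat this purely as an illustration of the suggestion.

```c
#include <assert.h>

/* Hypothetical callback types for this sketch. */
typedef void (*delay_fn)(unsigned int usecs);
typedef int (*ring_test_fn)(int ring_idx);

/* Total delay requested so far; lets a caller observe the wait. */
unsigned int total_delay_usecs;

/* Stub delay: records the request instead of sleeping. */
void record_delay(unsigned int usecs)
{
	total_delay_usecs += usecs;
}

/* Stub ring test: ring 0 fails with -1, every other ring passes. */
int ring0_fails(int ring_idx)
{
	return ring_idx == 0 ? -1 : 0;
}

/*
 * Wait once, then test rings in ascending order (0..num_rings-1).
 * Returns the first non-zero test result, or 0 if every ring passes.
 */
int test_rings_after_delay(int num_rings, unsigned int settle_usecs,
			   delay_fn delay, ring_test_fn test)
{
	int i, r, ret = 0;

	assert(delay && test);
	delay(settle_usecs);
	for (i = 0; i < num_rings; i++) {
		r = test(i);
		if (r && !ret)
			ret = r;
	}
	return ret;
}
```

With the stubs above, test_rings_after_delay(3, 500, record_delay,
ring0_fails) requests a single 500 us wait and still returns the failure
from ring 0, which mirrors the reported outcome that waiting alone does not
change the result.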
* Re: [PATCH] drm/amdgpu: Fix compute ring 1.0.0 failure after reset
[not found] ` <c402ce16-e8e3-78ee-3fb2-666d09b0807b-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-10-26 15:00 ` Grodzovsky, Andrey
0 siblings, 0 replies; 3+ messages in thread
From: Grodzovsky, Andrey @ 2018-10-26 15:00 UTC (permalink / raw)
To: Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Cc: Deucher, Alexander, Zhang, Hawking
On 10/26/2018 04:05 AM, Christian König wrote:
> Am 25.10.18 um 22:16 schrieb Andrey Grodzovsky:
>> Problem: After a GPU reset on dGPUs with gfx8, compute ring 1.0.0
>> fails to pass the ring test. Inspecting the ring registers shows that
>> the ring is active and no hang is observed (rptr == wptr). No
>> significant differences were observed between the CP_HQD* registers
>> for the ring in the good and bad states.
>>
>> Fix: There is no clear reason why, but reversing the order of the
>> ring tests fixes the problem.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>
> Mhm, maybe try adding a delay before the ring test?
First thing I tried, didn't help.
>
> Could be that the rings are started in reverse order as well and for
> some reason the first one is tested too quickly after a reset.
No, the KCQ mapping just before the test goes in 0..max order.
Andrey
>
> Anyway patch is Acked-by: Christian König <christian.koenig@amd.com>
>
> Thanks,
> Christian.
>
>> ---
>> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 6 ++++--
>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> index b2e1376..02f8ca5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> @@ -4811,8 +4811,10 @@ static int gfx_v8_0_kcq_resume(struct amdgpu_device *adev)
>>  	if (r)
>>  		goto done;
>>
>> -	/* Test KCQs */
>> -	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
>> +	/* Test KCQs - reversing the order of rings seems to fix ring test failure
>> +	 * after GPU reset
>> +	 */
>> +	for (i = adev->gfx.num_compute_rings - 1; i >= 0; i--) {
>>  		ring = &adev->gfx.compute_ring[i];
>>  		r = amdgpu_ring_test_helper(ring);
>>  	}
>