* [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure after resetting"
@ 2018-01-25 23:06 Andrey Grodzovsky
       [not found] ` <1516921601-15815-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Andrey Grodzovsky @ 2018-01-25 23:06 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Alexander.Deucher-5C7GfCeVMHo, Andrey Grodzovsky,
	Xiangliang.Yu-5C7GfCeVMHo, Christian.Koenig-5C7GfCeVMHo

This reverts commit 75737cb4eb78c7f185e4700b4aa20cf7a3381aca.

Fixes GFX ring test failure after HW reset.
No compute ring test failures were observed with the change reverted.
So it seems that whatever problem that change was addressing is no
longer present.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 1207f36..8a65b53 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -4847,6 +4847,9 @@ static int gfx_v8_0_kcq_init_queue(struct amdgpu_ring *ring)
 		/* reset MQD to a clean status */
 		if (adev->gfx.mec.mqd_backup[mqd_idx])
 			memcpy(mqd, adev->gfx.mec.mqd_backup[mqd_idx], sizeof(struct vi_mqd_allocation));
+		/* reset ring buffer */
+		ring->wptr = 0;
+		amdgpu_ring_clear_ring(ring);
 	} else {
 		amdgpu_ring_clear_ring(ring);
 	}
@@ -4921,13 +4924,6 @@ static int gfx_v8_0_kiq_resume(struct amdgpu_device *adev)
 	/* Test KCQs */
 	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
 		ring = &adev->gfx.compute_ring[i];
-		if (adev->in_gpu_reset) {
-			/* move reset ring buffer to here to workaround
-			 * compute ring test failed
-			 */
-			ring->wptr = 0;
-			amdgpu_ring_clear_ring(ring);
-		}
 		ring->ready = true;
 		r = amdgpu_ring_test_ring(ring);
 		if (r)
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* RE: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure after resetting"
       [not found] ` <1516921601-15815-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org>
@ 2018-01-26  1:59   ` Yu, Xiangliang
       [not found]     ` <BN3PR1201MB09307139AD6C494B5F502DC3EBE00-2IT6yTwQdphHRjfJ0jqoHGrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  2018-01-31 17:08   ` Christian König
  1 sibling, 1 reply; 7+ messages in thread
From: Yu, Xiangliang @ 2018-01-26  1:59 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Deucher, Alexander, Grodzovsky, Andrey, Koenig, Christian

Did you test the reset case under SR-IOV?

> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf
> Of Andrey Grodzovsky
> Sent: Friday, January 26, 2018 7:07 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Grodzovsky,
> Andrey <Andrey.Grodzovsky@amd.com>; Yu, Xiangliang
> <Xiangliang.Yu@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
> Subject: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure after
> resetting"
> 
> This reverts commit 75737cb4eb78c7f185e4700b4aa20cf7a3381aca.
> 
> Fixes GFX ring test failure after HW reset.
> No compute ring test failures were observed with the change reverted.
> So seems like whatever problem that change was addressing is not present
> anymore.
> 
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index 1207f36..8a65b53 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -4847,6 +4847,9 @@ static int gfx_v8_0_kcq_init_queue(struct
> amdgpu_ring *ring)
>  		/* reset MQD to a clean status */
>  		if (adev->gfx.mec.mqd_backup[mqd_idx])
>  			memcpy(mqd, adev-
> >gfx.mec.mqd_backup[mqd_idx], sizeof(struct vi_mqd_allocation));
> +		/* reset ring buffer */
> +		ring->wptr = 0;
> +		amdgpu_ring_clear_ring(ring);
>  	} else {
>  		amdgpu_ring_clear_ring(ring);
>  	}
> @@ -4921,13 +4924,6 @@ static int gfx_v8_0_kiq_resume(struct
> amdgpu_device *adev)
>  	/* Test KCQs */
>  	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
>  		ring = &adev->gfx.compute_ring[i];
> -		if (adev->in_gpu_reset) {
> -			/* move reset ring buffer to here to workaround
> -			 * compute ring test failed
> -			 */
> -			ring->wptr = 0;
> -			amdgpu_ring_clear_ring(ring);
> -		}
>  		ring->ready = true;
>  		r = amdgpu_ring_test_ring(ring);
>  		if (r)
> --
> 2.7.4
> 
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure after resetting"
       [not found]     ` <BN3PR1201MB09307139AD6C494B5F502DC3EBE00-2IT6yTwQdphHRjfJ0jqoHGrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2018-01-26  3:28       ` Grodzovsky, Andrey
       [not found]         ` <BN6PR1201MB0115889698061BCBB4545A79EAE00-6iU6OBHu2P+5DJ1TLF5OxWrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Grodzovsky, Andrey @ 2018-01-26  3:28 UTC (permalink / raw)
  To: Yu, Xiangliang, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Deucher, Alexander, Koenig, Christian

No, just bare metal. I assumed your problem was the compute ring test failure, which I didn't see. Can you please recheck whether the reset case still fails on SR-IOV with this commit reverted?
If so, we obviously need to keep looking for a way to fix it.

Thanks,
Andrey

________________________________________
From: Yu, Xiangliang
Sent: 25 January 2018 20:59:45
To: Grodzovsky, Andrey; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Grodzovsky, Andrey; Koenig, Christian
Subject: RE: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure after resetting"

Did you test reset case in sriov?

> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf
> Of Andrey Grodzovsky
> Sent: Friday, January 26, 2018 7:07 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Grodzovsky,
> Andrey <Andrey.Grodzovsky@amd.com>; Yu, Xiangliang
> <Xiangliang.Yu@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
> Subject: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure after
> resetting"
>
> This reverts commit 75737cb4eb78c7f185e4700b4aa20cf7a3381aca.
>
> Fixes GFX ring test failure after HW reset.
> No compute ring test failures were observed with the change reverted.
> So seems like whatever problem that change was addressing is not present
> anymore.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index 1207f36..8a65b53 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -4847,6 +4847,9 @@ static int gfx_v8_0_kcq_init_queue(struct
> amdgpu_ring *ring)
>               /* reset MQD to a clean status */
>               if (adev->gfx.mec.mqd_backup[mqd_idx])
>                       memcpy(mqd, adev-
> >gfx.mec.mqd_backup[mqd_idx], sizeof(struct vi_mqd_allocation));
> +             /* reset ring buffer */
> +             ring->wptr = 0;
> +             amdgpu_ring_clear_ring(ring);
>       } else {
>               amdgpu_ring_clear_ring(ring);
>       }
> @@ -4921,13 +4924,6 @@ static int gfx_v8_0_kiq_resume(struct
> amdgpu_device *adev)
>       /* Test KCQs */
>       for (i = 0; i < adev->gfx.num_compute_rings; i++) {
>               ring = &adev->gfx.compute_ring[i];
> -             if (adev->in_gpu_reset) {
> -                     /* move reset ring buffer to here to workaround
> -                      * compute ring test failed
> -                      */
> -                     ring->wptr = 0;
> -                     amdgpu_ring_clear_ring(ring);
> -             }
>               ring->ready = true;
>               r = amdgpu_ring_test_ring(ring);
>               if (r)
> --
> 2.7.4
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure after resetting"
       [not found]         ` <BN6PR1201MB0115889698061BCBB4545A79EAE00-6iU6OBHu2P+5DJ1TLF5OxWrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2018-01-26  4:33           ` Yu, Xiangliang
       [not found]             ` <BY2PR1201MB09351A0D0D0D09849008B3B9EBE00-O28G1zQ8oGkaqtME6NEo1mrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Yu, Xiangliang @ 2018-01-26  4:33 UTC (permalink / raw)
  To: Grodzovsky, Andrey, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Deng, Emily
  Cc: Deucher, Alexander, Koenig, Christian

You can add an amdgpu_sriov_vf() check to avoid breaking SR-IOV.
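
One way to read this suggestion is to keep the late ring reset only when the device is an SR-IOV virtual function and a GPU reset is in progress. The sketch below is a standalone illustration under that assumption, not driver code: `toy_device`, `TOY_SRIOV_CAPS_IS_VF`, `toy_sriov_vf()` and `toy_needs_late_ring_reset()` are hypothetical stand-ins for the driver's device structure and its amdgpu_sriov_vf() check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bit; the real driver defines its own flags. */
#define TOY_SRIOV_CAPS_IS_VF (1u << 2)

/* Hypothetical, heavily reduced stand-in for struct amdgpu_device. */
struct toy_device {
	uint32_t virt_caps;   /* virtualization capability flags */
	bool in_gpu_reset;    /* true while a GPU reset is being handled */
};

/* Stand-in for the amdgpu_sriov_vf() check named in the mail. */
static bool toy_sriov_vf(const struct toy_device *adev)
{
	return (adev->virt_caps & TOY_SRIOV_CAPS_IS_VF) != 0;
}

/*
 * Decide whether the late "reset ring buffer" workaround should run:
 * only during a GPU reset, and only on an SR-IOV virtual function.
 */
bool toy_needs_late_ring_reset(const struct toy_device *adev)
{
	return adev->in_gpu_reset && toy_sriov_vf(adev);
}
```

With a guard of this shape, bare metal would follow the reverted (original) sequence while SR-IOV guests would keep the workaround.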

> -----Original Message-----
> From: Grodzovsky, Andrey
> Sent: Friday, January 26, 2018 11:29 AM
> To: Yu, Xiangliang <Xiangliang.Yu@amd.com>; amd-
> gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>
> Subject: Re: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure
> after resetting"
> 
> No, just bare metal, I assumed your problem was with compute ring test
> failure which I didn't see. Can you please recheck if reverting this still failing
> on SRIOV ?
> If so we obviously need to keep looking how to fix it.
> 
> Thanks,
> Andrey
> 
> ________________________________________
> From: Yu, Xiangliang
> Sent: 25 January 2018 20:59:45
> To: Grodzovsky, Andrey; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander; Grodzovsky, Andrey; Koenig, Christian
> Subject: RE: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure
> after resetting"
> 
> Did you test reset case in sriov?
> 
> > -----Original Message-----
> > From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf
> > Of Andrey Grodzovsky
> > Sent: Friday, January 26, 2018 7:07 AM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Grodzovsky,
> Andrey
> > <Andrey.Grodzovsky@amd.com>; Yu, Xiangliang
> <Xiangliang.Yu@amd.com>;
> > Koenig, Christian <Christian.Koenig@amd.com>
> > Subject: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure
> > after resetting"
> >
> > This reverts commit 75737cb4eb78c7f185e4700b4aa20cf7a3381aca.
> >
> > Fixes GFX ring test failure after HW reset.
> > No compute ring test failures were observed with the change reverted.
> > So seems like whatever problem that change was addressing is not
> > present anymore.
> >
> > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 10 +++-------
> >  1 file changed, 3 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > index 1207f36..8a65b53 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > @@ -4847,6 +4847,9 @@ static int gfx_v8_0_kcq_init_queue(struct
> > amdgpu_ring *ring)
> >               /* reset MQD to a clean status */
> >               if (adev->gfx.mec.mqd_backup[mqd_idx])
> >                       memcpy(mqd, adev-
> > >gfx.mec.mqd_backup[mqd_idx], sizeof(struct vi_mqd_allocation));
> > +             /* reset ring buffer */
> > +             ring->wptr = 0;
> > +             amdgpu_ring_clear_ring(ring);
> >       } else {
> >               amdgpu_ring_clear_ring(ring);
> >       }
> > @@ -4921,13 +4924,6 @@ static int gfx_v8_0_kiq_resume(struct
> > amdgpu_device *adev)
> >       /* Test KCQs */
> >       for (i = 0; i < adev->gfx.num_compute_rings; i++) {
> >               ring = &adev->gfx.compute_ring[i];
> > -             if (adev->in_gpu_reset) {
> > -                     /* move reset ring buffer to here to workaround
> > -                      * compute ring test failed
> > -                      */
> > -                     ring->wptr = 0;
> > -                     amdgpu_ring_clear_ring(ring);
> > -             }
> >               ring->ready = true;
> >               r = amdgpu_ring_test_ring(ring);
> >               if (r)
> > --
> > 2.7.4
> >
> > _______________________________________________
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure after resetting"
       [not found]             ` <BY2PR1201MB09351A0D0D0D09849008B3B9EBE00-O28G1zQ8oGkaqtME6NEo1mrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2018-01-31 16:59               ` Andrey Grodzovsky
       [not found]                 ` <bb6b8b74-4587-8e56-9617-b03fe572eec7-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Andrey Grodzovsky @ 2018-01-31 16:59 UTC (permalink / raw)
  To: Yu, Xiangliang, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Deng, Emily
  Cc: Deucher, Alexander, Wu, Haisheng, Koenig, Christian


On 01/25/2018 11:33 PM, Yu, Xiangliang wrote:
> You can add amdgpu_sriov_vf() check to avoid breaking sriov.

+ Haisheng

After more debugging and discussion with Haisheng from the HW team, we
found that the sequence introduced by this change is wrong. It causes
compute ring test failures because "the ring buffer has to be filled
with valid packets (such as NOPs) first before submitting MAP_QUEUEs
packet into KIQ. Once a compute engine is mapped, it will immediately
execute the ring buffer if the RPTR is not equal to the WPTR from the
MQD. It could lead to engine hang if the ring buffer filled with random
data."

Hence we would like to revert this change in amd-staging-drm-next and
continue investigating on the SR-IOV side why the correct programming
sequence doesn't work there. I am currently working on an SR-IOV setup
to take a look at that.
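
The ordering constraint quoted above can be illustrated with a toy model in plain C. This is only a sketch under stated assumptions: `toy_ring`, `TOY_NOP`, the stale write pointer value and all helper names are hypothetical, and the model ignores everything about the real hardware except "a mapped engine executes whatever lies between the read and write pointers".

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_RING_DWORDS 8u
#define TOY_NOP 0x1u  /* hypothetical NOP encoding, not a real packet */

/* Hypothetical, heavily reduced model of a compute ring. */
struct toy_ring {
	uint32_t buf[TOY_RING_DWORDS];
	uint32_t rptr;  /* read pointer used by the engine */
	uint32_t wptr;  /* write pointer as restored from the MQD */
	bool hung;      /* set when the engine fetches a non-NOP dword */
};

/* State after a GPU reset: stale ring contents and a stale wptr. */
static void toy_ring_after_gpu_reset(struct toy_ring *r)
{
	for (uint32_t i = 0; i < TOY_RING_DWORDS; i++)
		r->buf[i] = 0xdeadbeefu;  /* "random data" */
	r->rptr = 0;
	r->wptr = 3;
	r->hung = false;
}

/* Mirrors "ring->wptr = 0; amdgpu_ring_clear_ring(ring);" in spirit. */
static void toy_ring_reset(struct toy_ring *r)
{
	for (uint32_t i = 0; i < TOY_RING_DWORDS; i++)
		r->buf[i] = TOY_NOP;
	r->wptr = 0;
}

/* Once mapped, the engine immediately executes from rptr up to wptr. */
static void toy_map_queue(struct toy_ring *r)
{
	while (r->rptr != r->wptr) {
		if (r->buf[r->rptr] != TOY_NOP) {
			r->hung = true;  /* fetched garbage */
			return;
		}
		r->rptr = (r->rptr + 1) % TOY_RING_DWORDS;
	}
}

/* Returns true if the engine survives the resume sequence. */
bool toy_resume(bool reset_before_map)
{
	struct toy_ring r;

	toy_ring_after_gpu_reset(&r);
	if (reset_before_map) {
		toy_ring_reset(&r);
		toy_map_queue(&r);
	} else {
		toy_map_queue(&r);   /* engine runs on stale data first */
		toy_ring_reset(&r);  /* too late: the hang already happened */
	}
	return !r.hung;
}
```

In this model `toy_resume(true)` survives because the pointers are equal and the buffer holds only NOPs by the time the queue is mapped, while `toy_resume(false)` hangs on the first stale dword.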

Thanks,
Andrey
>
>> -----Original Message-----
>> From: Grodzovsky, Andrey
>> Sent: Friday, January 26, 2018 11:29 AM
>> To: Yu, Xiangliang <Xiangliang.Yu-5C7GfCeVMHo@public.gmane.org>; amd-
>> gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>> Cc: Deucher, Alexander <Alexander.Deucher-5C7GfCeVMHo@public.gmane.org>; Koenig, Christian
>> <Christian.Koenig-5C7GfCeVMHo@public.gmane.org>
>> Subject: Re: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure
>> after resetting"
>>
>> No, just bare metal, I assumed your problem was with compute ring test
>> failure which I didn't see. Can you please recheck if reverting this still failing
>> on SRIOV ?
>> If so we obviously need to keep looking how to fix it.
>>
>> Thanks,
>> Andrey
>>
>> ________________________________________
>> From: Yu, Xiangliang
>> Sent: 25 January 2018 20:59:45
>> To: Grodzovsky, Andrey; amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>> Cc: Deucher, Alexander; Grodzovsky, Andrey; Koenig, Christian
>> Subject: RE: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure
>> after resetting"
>>
>> Did you test reset case in sriov?
>>
>>> -----Original Message-----
>>> From: amd-gfx [mailto:amd-gfx-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org] On Behalf
>>> Of Andrey Grodzovsky
>>> Sent: Friday, January 26, 2018 7:07 AM
>>> To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>>> Cc: Deucher, Alexander <Alexander.Deucher-5C7GfCeVMHo@public.gmane.org>; Grodzovsky,
>> Andrey
>>> <Andrey.Grodzovsky-5C7GfCeVMHo@public.gmane.org>; Yu, Xiangliang
>> <Xiangliang.Yu-5C7GfCeVMHo@public.gmane.org>;
>>> Koenig, Christian <Christian.Koenig-5C7GfCeVMHo@public.gmane.org>
>>> Subject: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure
>>> after resetting"
>>>
>>> This reverts commit 75737cb4eb78c7f185e4700b4aa20cf7a3381aca.
>>>
>>> Fixes GFX ring test failure after HW reset.
>>> No compute ring test failures were observed with the change reverted.
>>> So seems like whatever problem that change was addressing is not
>>> present anymore.
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 10 +++-------
>>>   1 file changed, 3 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> index 1207f36..8a65b53 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> @@ -4847,6 +4847,9 @@ static int gfx_v8_0_kcq_init_queue(struct
>>> amdgpu_ring *ring)
>>>                /* reset MQD to a clean status */
>>>                if (adev->gfx.mec.mqd_backup[mqd_idx])
>>>                        memcpy(mqd, adev-
>>>> gfx.mec.mqd_backup[mqd_idx], sizeof(struct vi_mqd_allocation));
>>> +             /* reset ring buffer */
>>> +             ring->wptr = 0;
>>> +             amdgpu_ring_clear_ring(ring);
>>>        } else {
>>>                amdgpu_ring_clear_ring(ring);
>>>        }
>>> @@ -4921,13 +4924,6 @@ static int gfx_v8_0_kiq_resume(struct
>>> amdgpu_device *adev)
>>>        /* Test KCQs */
>>>        for (i = 0; i < adev->gfx.num_compute_rings; i++) {
>>>                ring = &adev->gfx.compute_ring[i];
>>> -             if (adev->in_gpu_reset) {
>>> -                     /* move reset ring buffer to here to workaround
>>> -                      * compute ring test failed
>>> -                      */
>>> -                     ring->wptr = 0;
>>> -                     amdgpu_ring_clear_ring(ring);
>>> -             }
>>>                ring->ready = true;
>>>                r = amdgpu_ring_test_ring(ring);
>>>                if (r)
>>> --
>>> 2.7.4
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure after resetting"
       [not found] ` <1516921601-15815-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org>
  2018-01-26  1:59   ` Yu, Xiangliang
@ 2018-01-31 17:08   ` Christian König
  1 sibling, 0 replies; 7+ messages in thread
From: Christian König @ 2018-01-31 17:08 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Alexander.Deucher-5C7GfCeVMHo, Xiangliang.Yu-5C7GfCeVMHo

Am 26.01.2018 um 00:06 schrieb Andrey Grodzovsky:
> This reverts commit 75737cb4eb78c7f185e4700b4aa20cf7a3381aca.
>
> Fixes GFX ring test failure after HW reset.
> No compute ring test failures were observed with the change reverted.
> So seems like whatever problem that change was addressing is not
> present anymore.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 10 +++-------
>   1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index 1207f36..8a65b53 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -4847,6 +4847,9 @@ static int gfx_v8_0_kcq_init_queue(struct amdgpu_ring *ring)
>   		/* reset MQD to a clean status */
>   		if (adev->gfx.mec.mqd_backup[mqd_idx])
>   			memcpy(mqd, adev->gfx.mec.mqd_backup[mqd_idx], sizeof(struct vi_mqd_allocation));
> +		/* reset ring buffer */
> +		ring->wptr = 0;
> +		amdgpu_ring_clear_ring(ring);
>   	} else {
>   		amdgpu_ring_clear_ring(ring);
>   	}
> @@ -4921,13 +4924,6 @@ static int gfx_v8_0_kiq_resume(struct amdgpu_device *adev)
>   	/* Test KCQs */
>   	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
>   		ring = &adev->gfx.compute_ring[i];
> -		if (adev->in_gpu_reset) {
> -			/* move reset ring buffer to here to workaround
> -			 * compute ring test failed
> -			 */
> -			ring->wptr = 0;
> -			amdgpu_ring_clear_ring(ring);
> -		}
>   		ring->ready = true;
>   		r = amdgpu_ring_test_ring(ring);
>   		if (r)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure after resetting"
       [not found]                 ` <bb6b8b74-4587-8e56-9617-b03fe572eec7-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-01  2:04                   ` Yu, Xiangliang
  0 siblings, 0 replies; 7+ messages in thread
From: Yu, Xiangliang @ 2018-02-01  2:04 UTC (permalink / raw)
  To: Grodzovsky, Andrey, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Deng, Emily
  Cc: Deucher, Alexander, Wu, Haisheng, Koenig, Christian


Ok, thanks!


From: Grodzovsky, Andrey
Sent: Thursday, February 01, 2018 12:59 AM
To: Yu, Xiangliang <Xiangliang.Yu@amd.com>; amd-gfx@lists.freedesktop.org; Deng, Emily <Emily.Deng@amd.com>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Wu, Haisheng <Haisheng.Wu@amd.com>
Subject: Re: [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure after resetting"




On 01/25/2018 11:33 PM, Yu, Xiangliang wrote:

You can add amdgpu_sriov_vf() check to avoid breaking sriov.

+ Haisheng

After more debugging and discussion with Haisheng from the HW team, we found that the sequence introduced by this change is wrong. It causes compute ring test failures because "the ring buffer has to be filled with valid packets (such as NOPs) first before submitting MAP_QUEUEs packet into KIQ. Once a compute engine is mapped, it will immediately execute the ring buffer if the RPTR is not equal to the WPTR from the MQD. It could lead to engine hang if the ring buffer filled with random data."

Hence we would like to revert this change in amd-staging-drm-next and continue investigating on the SR-IOV side why the correct programming sequence doesn't work there. I am currently working on an SR-IOV setup to take a look at that.

Thanks,
Andrey









^ permalink raw reply	[flat|nested] 7+ messages in thread

Thread overview: 7+ messages
2018-01-25 23:06 [PATCH] Revert "drm/amdgpu/gfx8: Fix compute ring failure after resetting" Andrey Grodzovsky
     [not found] ` <1516921601-15815-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org>
2018-01-26  1:59   ` Yu, Xiangliang
     [not found]     ` <BN3PR1201MB09307139AD6C494B5F502DC3EBE00-2IT6yTwQdphHRjfJ0jqoHGrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2018-01-26  3:28       ` Grodzovsky, Andrey
     [not found]         ` <BN6PR1201MB0115889698061BCBB4545A79EAE00-6iU6OBHu2P+5DJ1TLF5OxWrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2018-01-26  4:33           ` Yu, Xiangliang
     [not found]             ` <BY2PR1201MB09351A0D0D0D09849008B3B9EBE00-O28G1zQ8oGkaqtME6NEo1mrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2018-01-31 16:59               ` Andrey Grodzovsky
     [not found]                 ` <bb6b8b74-4587-8e56-9617-b03fe572eec7-5C7GfCeVMHo@public.gmane.org>
2018-02-01  2:04                   ` Yu, Xiangliang
2018-01-31 17:08   ` Christian König
