* [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list
@ 2019-04-01  8:58 wentalou
  To: amd-gfx@lists.freedesktop.org
  Cc: Lou, Wentao

amdgpu_bo_restore_shadow would assign zero to r if it succeeded.
r would remain zero if there is only one node in shadow_list.
The current code would always return failure when r <= 0.

Change-Id: Iae6880e7c78b71fde6a6754c69665c2e312a80a5
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c4c61e9..5cf21a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3171,6 +3171,7 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
 	struct dma_fence *fence = NULL, *next = NULL;
 	struct amdgpu_bo *shadow;
 	long r = 1, tmo;
+	bool single_shadow = false;
 
 	if (amdgpu_sriov_runtime(adev))
 		tmo = msecs_to_jiffies(8000);
@@ -3194,10 +3195,12 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
 			r = dma_fence_wait_timeout(fence, false, tmo);
 			dma_fence_put(fence);
 			fence = next;
+			single_shadow = false;
 			if (r <= 0)
 				break;
 		} else {
 			fence = next;
+			single_shadow = true;
 		}
 	}
 	mutex_unlock(&adev->shadow_list_lock);
@@ -3206,7 +3209,8 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
 		tmo = dma_fence_wait_timeout(fence, false, tmo);
 	dma_fence_put(fence);
 
-	if (r <= 0 || tmo <= 0) {
+	/* r would be zero even if amdgpu_bo_restore_shadow succeeded when single shadow in list */
+	if (r < 0 || (r == 0 && !single_shadow) || tmo <= 0) {
 		DRM_ERROR("recover vram bo from shadow failed\n");
 		return -EIO;
 	}
-- 
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
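The failure mode the patch describes can be checked with a small userspace model of the recovery loop. This is an editor's sketch, not kernel code: `model_recover_vram` and its stand-in return values are hypothetical, with `0` modelling a successful amdgpu_bo_restore_shadow and a positive value a successful dma_fence_wait_timeout.

```c
/* Models the control flow of amdgpu_device_recover_vram() before the
 * patch: for each of n shadow BOs, restore_shadow succeeds (r = 0) and,
 * when a fence from the previous iteration exists, the wait also
 * succeeds (r > 0).  Returns the final value of r seen by the
 * "if (r <= 0 || tmo <= 0)" check. */
static long model_recover_vram(int n)
{
	long r = 1;          /* same initial value as in the driver */
	int have_fence = 0;  /* models fence != NULL */

	for (int i = 0; i < n; i++) {
		r = 0;                 /* amdgpu_bo_restore_shadow() succeeded */
		if (have_fence) {
			r = 100;       /* dma_fence_wait_timeout() succeeded */
			if (r <= 0)
				break;
		} else {
			have_fence = 1;  /* first entry only primes the fence */
		}
	}
	return r;
}
```

With one shadow the loop never reaches the inner wait, so r stays 0 and `r <= 0` misreports success as failure; with two or more shadows the last wait leaves r positive, matching the commit message.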
* RE: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list
@ 2019-04-02  4:03 Deng, Emily
  To: amd-gfx@lists.freedesktop.org
  Cc: Lou, Wentao

Maybe it will be better to add the following check, and change "if (r <= 0 || tmo <= 0)" to "if (r < 0 || tmo <= 0)":

	r = dma_fence_wait_timeout(f, false, timeout);
	if (r == 0) {
		r = -ETIMEDOUT;
		break;
	} else if (r < 0) {
		break;
	}

Best wishes
Emily Deng

>-----Original Message-----
>From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of wentalou
>Sent: Monday, April 1, 2019 4:59 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Lou, Wentao <Wentao.Lou@amd.com>
>Subject: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if
>only one node in shadow_list
>
>amdgpu_bo_restore_shadow would assign zero to r if it succeeded.
>r would remain zero if there is only one node in shadow_list.
>The current code would always return failure when r <= 0.
>
>[...]
* Re: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list
@ 2019-04-02  7:01 Christian König
  To: Deng, Emily; Lou, Wentao; amd-gfx@lists.freedesktop.org

Yeah, agree that is much closer to a correct solution.

But even better would be to correctly update the timeout as well, e.g.:

	tmo = dma_fence_wait_timeout(fence, false, tmo);
	dma_fence_put(fence);
	fence = next;
	if (tmo == 0) {
		r = -ETIMEDOUT;
		break;
	} else if (tmo < 0) {
		r = tmo;
		break;
	}

That we restart the timeout for each wait looks like a rather problematic bug to me as well.

Christian.

Am 02.04.19 um 06:03 schrieb Deng, Emily:
> Maybe it will be better to add the following check, and change "if (r <= 0 || tmo <= 0)" to "if (r < 0 || tmo <= 0)":
>	r = dma_fence_wait_timeout(f, false, timeout);
>	if (r == 0) {
>		r = -ETIMEDOUT;
>		break;
>	} else if (r < 0) {
>		break;
>	}
>
> Best wishes
> Emily Deng
>
> [...]
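Christian's suggestion combines two fixes: report a timeout explicitly instead of overloading r, and pass the remaining budget back into each wait. A hypothetical userspace model (editor's sketch: `fake_wait`, `wait_all`, and the fixed per-wait cost are made up; -110 is the Linux value of ETIMEDOUT):

```c
/* Stand-in for dma_fence_wait_timeout(): consumes `cost` jiffies of the
 * budget and returns the jiffies remaining, or 0 when the budget is
 * exhausted (a timeout). */
static long fake_wait(long tmo, long cost)
{
	return cost >= tmo ? 0 : tmo - cost;
}

/* Waits on n fences against one shared budget, as in Christian's
 * snippet: the remaining tmo is fed into the next wait, and a timeout
 * maps to -ETIMEDOUT instead of leaving r ambiguous. */
static long wait_all(int n, long tmo, long cost)
{
	for (int i = 0; i < n; i++) {
		tmo = fake_wait(tmo, cost);
		if (tmo == 0)
			return -110; /* -ETIMEDOUT */
		/* a negative tmo (interrupted wait) would be returned as-is */
	}
	return 0;
}
```

Three 20-jiffy waits fit in a 100-jiffy budget, while ten do not; with the original code each wait would restart with a fresh 100-jiffy budget and never time out here.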
* RE: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list
@ 2019-04-02  7:29 Lou, Wentao
  To: Koenig, Christian; Deng, Emily; amd-gfx@lists.freedesktop.org

Hi Christian,

If "tmo = dma_fence_wait_timeout(fence, false, tmo);" is executed inside list_for_each_entry, the value of tmo might be changed.
But "tmo = dma_fence_wait_timeout(fence, false, tmo);" is also called after exiting the list_for_each_entry loop.
That call might then pass a different value of tmo into dma_fence_wait_timeout.

BR,
Wentao

-----Original Message-----
From: Christian König <ckoenig.leichtzumerken@gmail.com>
Sent: Tuesday, April 2, 2019 3:01 PM
To: Deng, Emily <Emily.Deng@amd.com>; Lou, Wentao <Wentao.Lou@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list

Yeah, agree that is much closer to a correct solution.

But even better would be to correctly update the timeout as well, e.g.:

[...]

That we restart the timeout for each wait looks like a rather problematic bug to me as well.

Christian.

[...]
* Re: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list
@ 2019-04-02  7:39 Koenig, Christian
  To: Lou, Wentao; Deng, Emily; amd-gfx@lists.freedesktop.org

Yeah, exactly that's what should happen here.

The value of tmo SHOULD be changed, otherwise we wait tmo jiffies on each loop.

Christian.

Am 02.04.19 um 09:29 schrieb Lou, Wentao:
> Hi Christian,
>
> If "tmo = dma_fence_wait_timeout(fence, false, tmo);" is executed inside list_for_each_entry, the value of tmo might be changed.
> But "tmo = dma_fence_wait_timeout(fence, false, tmo);" is also called after exiting the list_for_each_entry loop.
> That call might then pass a different value of tmo into dma_fence_wait_timeout.
>
> BR,
> Wentao
>
> [...]
* RE: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list
@ 2019-04-02  9:23 Lou, Wentao
  To: Koenig, Christian; Deng, Emily; amd-gfx@lists.freedesktop.org

Thanks Christian.
You mean msecs_to_jiffies(8000) should be the time to complete the whole loop, not each iteration.
Just sent out another patch for review. Thanks.

BR,
Wentao

-----Original Message-----
From: Koenig, Christian <Christian.Koenig@amd.com>
Sent: Tuesday, April 2, 2019 3:39 PM
To: Lou, Wentao <Wentao.Lou@amd.com>; Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list

Yeah, exactly that's what should happen here.

The value of tmo SHOULD be changed, otherwise we wait tmo jiffies on each loop.

Christian.

[...]
* Re: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list
@ 2019-04-02 10:34 Koenig, Christian
  To: Lou, Wentao; Deng, Emily; amd-gfx@lists.freedesktop.org

Am 02.04.19 um 11:23 schrieb Lou, Wentao:
> Thanks Christian.
> You mean msecs_to_jiffies(8000) should be the time to complete the whole loop, not each iteration.

Yeah, a normal desktop system can easily have more than 10000 BOs in that list. So the total timeout could be more than a day, which is a bit long :)

> Just sent out another patch for review.

Going to take a look.

Regards,
Christian.

> Thanks.
>
> BR,
> Wentao
>
> [...]
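Christian's estimate can be spelled out as arithmetic. This is an editor's sketch with assumed numbers: a 10000-BO list and the 8000 ms SR-IOV timeout from the patch.

```c
/* Worst case when every wait restarts the full per-BO timeout:
 * num_bos * per_bo_ms, converted to whole seconds. */
static long worst_case_seconds(long num_bos, long per_bo_ms)
{
	return num_bos * per_bo_ms / 1000;
}

/* 10000 BOs * 8 s = 80000 s, roughly 22 hours of worst-case waiting,
 * versus a flat 8 s when the budget is shared across the whole loop. */
```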
end of thread, other threads: [~2019-04-02 10:34 UTC | newest]

Thread overview: 7 messages
2019-04-01  8:58 [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list wentalou
2019-04-02  4:03 ` Deng, Emily
2019-04-02  7:01   ` Christian König
2019-04-02  7:29     ` Lou, Wentao
2019-04-02  7:39       ` Koenig, Christian
2019-04-02  9:23         ` Lou, Wentao
2019-04-02 10:34           ` Koenig, Christian