dri-devel.lists.freedesktop.org archive mirror
* [PATCH 1/1] drm/ttm: return -EBUSY if waiting for busy BO fails
@ 2019-06-26  6:40 Kuehling, Felix
       [not found] ` <20190626063958.19941-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Kuehling, Felix @ 2019-06-26  6:40 UTC (permalink / raw)
  To: amd-gfx, dri-devel; +Cc: Kuehling, Felix, Koenig, Christian

Returning -EAGAIN prevents ttm_bo_mem_space from trying alternate
placements and can lead to live-locks in amdgpu_cs, retrying
indefinitely and never succeeding.

Fixes: cfcc52e477e4 ("drm/ttm: fix busy memory to fail other user v10")
CC: Christian Koenig <Christian.Koenig@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index c7de667d482a..58c403eda04e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -827,7 +827,7 @@ static int ttm_mem_evict_wait_busy(struct ttm_buffer_object *busy_bo,
 	if (!r)
 		reservation_object_unlock(busy_bo->resv);
 
-	return r == -EDEADLK ? -EAGAIN : r;
+	return r == -EDEADLK ? -EBUSY : r;
 }
 
 static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
-- 
2.17.1
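
Editorial sketch of why the error code matters. This is a hypothetical
userspace model, not the kernel code: model_mem_space(),
map_wait_busy_result() and model_submit() are made-up names. It assumes,
as the commit message implies, that the placement loop moves on to the
next placement only when it sees -EBUSY, and that the caller restarts the
whole submission on -EAGAIN. It also assumes the contention on the first
placement never clears, which is the worst case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model: results[i] is the error code that making room in
 * placement i would yield (0 means success).
 */
int model_mem_space(const int *results, size_t n, size_t *chosen)
{
	size_t i;

	assert(results != NULL && chosen != NULL);

	for (i = 0; i < n; i++) {
		int ret = results[i];

		if (ret == 0) {
			*chosen = i;
			return 0;
		}
		/* Only -EBUSY means "contended, try the next placement". */
		if (ret != -EBUSY)
			return ret;
	}
	return -ENOMEM;
}

/* Map a failed lock attempt the way the code did before and after the patch. */
int map_wait_busy_result(int r, int use_ebusy)
{
	if (r == -EDEADLK)
		return use_ebusy ? -EBUSY : -EAGAIN;
	return r;
}

/*
 * Model a caller that restarts on -EAGAIN. The retry count is bounded
 * only so that the model terminates.
 */
int model_submit(const int *results, size_t n, int max_tries)
{
	size_t chosen = 0;
	int ret = -EAGAIN;
	int tries;

	for (tries = 0; tries < max_tries && ret == -EAGAIN; tries++)
		ret = model_mem_space(results, n, &chosen);
	return ret;
}
```

With results {-EAGAIN, 0} the second placement is never tried and
model_submit() exhausts its retries. With {-EBUSY, 0} the loop falls
through to the second placement on the first attempt.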

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

* Re: [PATCH 1/1] drm/ttm: return -EBUSY if waiting for busy BO fails
       [not found] ` <20190626063958.19941-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2019-06-26  6:54   ` Koenig, Christian
       [not found]     ` <410e8232-4edc-78ea-dc5b-4385cda01266-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Koenig, Christian @ 2019-06-26  6:54 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx, dri-devel

On 26.06.19 at 08:40, Kuehling, Felix wrote:
> Returning -EAGAIN prevents ttm_bo_mem_space from trying alternate
> placements and can lead to live-locks in amdgpu_cs, retrying
> indefinitely and never succeeding.
>
> Fixes: cfcc52e477e4 ("drm/ttm: fix busy memory to fail other user v10")
> CC: Christian Koenig <Christian.Koenig@amd.com>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>

Crap, I feared that this could live-lock under some circumstances, but 
hoped that this would be a rather rare case.

How did you reproduce this?

Anyway patch is Reviewed-by: Christian König <christian.koenig@amd.com> 
for now.

> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index c7de667d482a..58c403eda04e 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -827,7 +827,7 @@ static int ttm_mem_evict_wait_busy(struct ttm_buffer_object *busy_bo,
>   	if (!r)
>   		reservation_object_unlock(busy_bo->resv);
>   
> -	return r == -EDEADLK ? -EAGAIN : r;
> +	return r == -EDEADLK ? -EBUSY : r;
>   }
>   
>   static int ttm_mem_evict_first(struct ttm_bo_device *bdev,

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

* Re: [PATCH 1/1] drm/ttm: return -EBUSY if waiting for busy BO fails
       [not found]     ` <410e8232-4edc-78ea-dc5b-4385cda01266-5C7GfCeVMHo@public.gmane.org>
@ 2019-06-26  7:04       ` Kuehling, Felix
       [not found]         ` <33c2b0f0-6747-1a36-117f-8e7fe12cbef0-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Kuehling, Felix @ 2019-06-26  7:04 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx, dri-devel

On 2019-06-26 2:54 a.m., Koenig, Christian wrote:
> On 26.06.19 at 08:40, Kuehling, Felix wrote:
>> Returning -EAGAIN prevents ttm_bo_mem_space from trying alternate
>> placements and can lead to live-locks in amdgpu_cs, retrying
>> indefinitely and never succeeding.
>>
>> Fixes: cfcc52e477e4 ("drm/ttm: fix busy memory to fail other user v10")
>> CC: Christian Koenig <Christian.Koenig@amd.com>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> Crap, I feared that this could live-lock under some circumstances, but
> hoped that this would be a rather rare case.
>
> How did you reproduce this?

kfdtest --gtest_filter=KFDEvictTest.* --gtest_repeat=10

It runs two processes, both of which do graphics CS and KFD compute 
queues at the same time with enough memory pressure to cause frequent 
KFD evictions. It's meant to test KFD eviction code paths, but ended up 
finding a problem in the graphics CS code path. :/

I was able to reproduce it right after your changes. With the latest 
version of the branch I can't reproduce it any more. Some other commit 
must have changed things enough to avoid the live lock.

I also tried writing a test that reproduced it only with amdgpu_cs calls 
(without KFD), but no luck yet.

Regards,
   Felix

>
> Anyway patch is Reviewed-by: Christian König <christian.koenig@amd.com>
> for now.
>
>> ---
>>    drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index c7de667d482a..58c403eda04e 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -827,7 +827,7 @@ static int ttm_mem_evict_wait_busy(struct ttm_buffer_object *busy_bo,
>>    	if (!r)
>>    		reservation_object_unlock(busy_bo->resv);
>>    
>> -	return r == -EDEADLK ? -EAGAIN : r;
>> +	return r == -EDEADLK ? -EBUSY : r;
>>    }
>>    
>>    static int ttm_mem_evict_first(struct ttm_bo_device *bdev,

* Re: [PATCH 1/1] drm/ttm: return -EBUSY if waiting for busy BO fails
       [not found]         ` <33c2b0f0-6747-1a36-117f-8e7fe12cbef0-5C7GfCeVMHo@public.gmane.org>
@ 2019-06-26 10:03           ` Michel Dänzer
  0 siblings, 0 replies; 4+ messages in thread
From: Michel Dänzer @ 2019-06-26 10:03 UTC (permalink / raw)
  To: Kuehling, Felix, Koenig, Christian
  Cc: dri-devel, amd-gfx

On 2019-06-26 9:04 a.m., Kuehling, Felix wrote:
> On 2019-06-26 2:54 a.m., Koenig, Christian wrote:
>> On 26.06.19 at 08:40, Kuehling, Felix wrote:
>>> Returning -EAGAIN prevents ttm_bo_mem_space from trying alternate
>>> placements and can lead to live-locks in amdgpu_cs, retrying
>>> indefinitely and never succeeding.
>>>
>>> Fixes: cfcc52e477e4 ("drm/ttm: fix busy memory to fail other user v10")
>>> CC: Christian Koenig <Christian.Koenig@amd.com>
>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>> Crap, I feared that this could live-lock under some circumstances, but
>> hoped that this would be a rather rare case.
>>
>> How did you reproduce this?
> 
> kfdtest --gtest_filter=KFDEvictTest.* --gtest_repeat=10
> 
> It runs two processes, both of which do graphics CS and KFD compute 
> queues at the same time with enough memory pressure to cause frequent 
> KFD evictions. It's meant to test KFD eviction code paths, but ended up 
> finding a problem in the graphics CS code path. :/
> 
> I was able to reproduce it right after your changes. With the latest 
> version of the branch I can't reproduce it any more. Some other commit 
> must have changed things enough to avoid the live lock.

Probably just luck, unless this was a very recent change. I'd also been
seeing live-locks between memory-heavy piglit tests, last time just this
Monday. But it didn't happen every time.

I'd been meaning to report this, but kept getting distracted by other
stuff. Thanks for beating me to it, and for even coming up with a solution!


-- 
Earthling Michel Dänzer               |              https://www.amd.com
Libre software enthusiast             |             Mesa and X developer
