From: zhoucm1 <zhoucm1@amd.com>
To: christian.koenig@amd.com, "Zhou,
	David(ChunMing)" <David1.Zhou@amd.com>,
	"Liang, Prike" <Prike.Liang@amd.com>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
Date: Wed, 24 Apr 2019 15:59:33 +0800	[thread overview]
Message-ID: <26341685-6e73-55c4-2609-52616d92c06a@amd.com> (raw)
In-Reply-To: <8db16fe8-78fe-7f25-4acd-51e37e645f3b@gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 6883 bytes --]

How about the newly attached patch?


-David


On 2019-04-24 15:30, Christian König wrote:
> That would change the semantics of ttm_bo_mem_space() and so could 
> change the return code in an IOCTL as well. Not a good idea, because 
> that could have a lot of side effects.
>
> Instead in the calling DC code you could check if you get an -ENOMEM 
> and then call schedule().
>
> If after the schedule() we see that we now have BOs on the LRU, we can 
> try again and see if pinning the frame buffer now succeeds.
>
> Christian.
>
> On 24.04.19 at 09:17, zhoucm1 wrote:
>>
>> I made a patch as attached.
>>
>> I'm not sure how to make a patch following your proposal. Could you 
>> make a patch for that if mine isn't enough?
>>
>> -David
>>
>> On 2019-04-24 15:12, Christian König wrote:
>>>> how about just adding a wrapper for the pin function as below?
>>> I considered this as well and don't think it will work reliably.
>>>
>>> We could use it as a band-aid for this specific problem, but in 
>>> general we need to improve the handling in TTM to resolve those kinds 
>>> of resource conflicts.
>>>
>>> Regards,
>>> Christian.
>>>
>>> On 23.04.19 at 17:09, Zhou, David(ChunMing) wrote:
>>>> >3. If we have a ticket we grab a reference to the first BO on the 
>>>> LRU, drop the LRU lock and try to grab the reservation lock with 
>>>> the ticket.
>>>>
>>>> The BO on the LRU is already locked by the CS user; can it be dropped 
>>>> here by the DC user? And if the DC user then grabs its lock with the 
>>>> ticket, how does CS grab it again?
>>>>
>>>> If you think waiting in TTM has this risk, how about just adding a 
>>>> wrapper for the pin function as below?
>>>> amdgpu_get_pin_bo_timeout()
>>>> {
>>>>         do {
>>>>                 amdgpu_bo_reserve();
>>>>                 r = amdgpu_bo_pin();
>>>>
>>>>                 if (!r)
>>>>                         break;
>>>>                 amdgpu_bo_unreserve();
>>>>                 timeout--;
>>>>
>>>>         } while (timeout > 0);
>>>> }
>>>>
>>>> -------- Original Message --------
>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>> From: Christian König
>>>> To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" 
>>>> ,dri-devel@lists.freedesktop.org
>>>> CC:
>>>>
>>>> Well, that's not so easy offhand.
>>>>
>>>> The basic problem here is that when you busy wait at this place you 
>>>> can easily run into situations where application A busy waits for B 
>>>> while B busy waits for A -> deadlock.
>>>>
>>>> So what we need here is the deadlock detection logic of the 
>>>> ww_mutex. To use this we at least need to do the following steps:
>>>>
>>>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>>>>
>>>> 2. If we then run into this EBUSY condition in TTM check if the BO 
>>>> we need memory for (or rather the ww_mutex of its reservation 
>>>> object) has a ticket assigned.
>>>>
>>>> 3. If we have a ticket we grab a reference to the first BO on the 
>>>> LRU, drop the LRU lock and try to grab the reservation lock with 
>>>> the ticket.
>>>>
>>>> 4. If getting the reservation lock with the ticket succeeded we 
>>>> check if the BO is still the first one on the LRU in question (the 
>>>> BO could have moved).
>>>>
>>>> 5. If the BO is still the first one on the LRU in question we try 
>>>> to evict it as we would evict any other BO.
>>>>
>>>> 6. If any of the "If's" above fail we just back off and return -EBUSY.
>>>>
>>>> Steps 2-5 are certainly not trivial, but doable as far as I can see.
>>>>
>>>> Have fun :)
>>>> Christian.
>>>>
>>>> On 23.04.19 at 15:19, Zhou, David(ChunMing) wrote:
>>>>> How about adding an additional ctx->resv condition inline to address 
>>>>> your concern? In addition, if we don't wait on BOs from the same 
>>>>> user, it shouldn't lead to deadlock.
>>>>>
>>>>> Otherwise, any other idea?
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>> From: Christian König
>>>>> To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
>>>>> ,dri-devel@lists.freedesktop.org
>>>>> CC:
>>>>>
>>>>> Well, that is certainly a NAK because it can lead to deadlocks in
>>>>> memory management.
>>>>>
>>>>> You can't just busy wait with all those locks held.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> On 23.04.19 at 03:45, Liang, Prike wrote:
>>>>> > Acked-by: Prike Liang <Prike.Liang@amd.com>
>>>>> >
>>>>> > Thanks,
>>>>> > Prike
>>>>> > -----Original Message-----
>>>>> > From: Chunming Zhou <david1.zhou@amd.com>
>>>>> > Sent: Monday, April 22, 2019 6:39 PM
>>>>> > To: dri-devel@lists.freedesktop.org
>>>>> > Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) 
>>>>> <David1.Zhou@amd.com>
>>>>> > Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>> >
>>>>> > A heavy GPU job could occupy memory for a long time, which could
>>>>> > lead to other users failing to get memory.
>>>>> >
>>>>> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
>>>>> > Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
>>>>> > ---
>>>>> >   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>>>>> >   1 file changed, 4 insertions(+), 2 deletions(-)
>>>>> >
>>>>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> > index 7c484729f9b2..6c596cc24bec 100644
>>>>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> > @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>>>>> >                if (mem->mm_node)
>>>>> >                        break;
>>>>> >                ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
>>>>> > -             if (unlikely(ret != 0))
>>>>> > -                     return ret;
>>>>> > +             if (unlikely(ret != 0)) {
>>>>> > +                     if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
>>>>> > +                             return ret;
>>>>> > +             }
>>>>> >        } while (1);
>>>>> >        mem->mem_type = mem_type;
>>>>> >        return ttm_bo_add_move_fence(bo, man, mem);
>>>>> > --
>>>>> > 2.17.1
>>>>> >
>>>>> > _______________________________________________
>>>>> > dri-devel mailing list
>>>>> > dri-devel@lists.freedesktop.org
>>>>> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> dri-devel mailing list
>>>>> dri-devel@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>
>>>>
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>
>>
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
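
For reference, a rough and untested sketch of steps 2-5 from the ww_mutex
proposal quoted above might look like the following. The function name, the
helper ttm_bo_evict_first_locked(), and the walk over a single LRU list are
assumptions for illustration only, not existing TTM interfaces:

#include <drm/ttm/ttm_bo_api.h>
#include <drm/ttm/ttm_bo_driver.h>

/*
 * Sketch of steps 2-5: try to evict the first BO on the LRU using the
 * caller's ww_acquire_ctx so the ww_mutex machinery can detect A<->B
 * wait cycles. Only the highest-priority LRU list is walked, for brevity.
 */
static int ttm_mem_evict_with_ticket(struct ttm_bo_device *bdev,
                                     struct ttm_mem_type_manager *man,
                                     struct ww_acquire_ctx *ticket)
{
        struct ttm_buffer_object *first;
        int ret;

        /* Step 3: grab a reference to the first BO on the LRU ... */
        spin_lock(&bdev->glob->lru_lock);
        first = list_first_entry_or_null(&man->lru[0],
                                         struct ttm_buffer_object, lru);
        if (!first) {
                spin_unlock(&bdev->glob->lru_lock);
                return -EBUSY;
        }
        ttm_bo_get(first);
        spin_unlock(&bdev->glob->lru_lock);

        /* ... drop the LRU lock and take its reservation with the ticket. */
        ret = ww_mutex_lock_interruptible(&first->resv->lock, ticket);
        if (ret) {
                /* Step 6: -EDEADLK or -EINTR, just back off. */
                ttm_bo_put(first);
                return ret == -EDEADLK ? -EBUSY : ret;
        }

        /* Step 4: the BO could have moved while the LRU lock was dropped. */
        spin_lock(&bdev->glob->lru_lock);
        if (first != list_first_entry_or_null(&man->lru[0],
                                              struct ttm_buffer_object, lru)) {
                spin_unlock(&bdev->glob->lru_lock);
                ret = -EBUSY;
                goto out_unlock;
        }
        spin_unlock(&bdev->glob->lru_lock);

        /* Step 5: evict it as any other BO would be evicted. */
        ret = ttm_bo_evict_first_locked(first);         /* assumed helper */

out_unlock:
        ww_mutex_unlock(&first->resv->lock);
        ttm_bo_put(first);
        return ret;
}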


[-- Attachment #1.2: Type: text/html, Size: 14393 bytes --]

[-- Attachment #2: 0001-drm-amdgpu-wait-memory-idle-when-fb-is-pinned-fail.patch --]
[-- Type: text/x-patch, Size: 2292 bytes --]

>From 61174dd44618bbfcb035484a44435b7ed4df86a5 Mon Sep 17 00:00:00 2001
From: Chunming Zhou <david1.zhou@amd.com>
Date: Wed, 24 Apr 2019 11:35:29 +0800
Subject: [PATCH] drm/amdgpu: wait memory idle when fb is pinned fail

A heavy GPU job's execution could occupy memory for a long time, which
could lead to other users failing to get memory.

Change-Id: I3857c4b99859c9d2f354cf2c486dbacb8520d0ed
Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index a5cacf846e1b..8577b629a640 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4093,6 +4093,8 @@ static const struct drm_plane_funcs dm_plane_funcs = {
 	.atomic_destroy_state = dm_drm_plane_destroy_state,
 };
 
+/* default 100s timeout to wait for pinning the fb */
+#define DM_PIN_MEM_TIMEOUT 100000000000ULL
 static int dm_plane_helper_prepare_fb(struct drm_plane *plane,
 				      struct drm_plane_state *new_state)
 {
@@ -4103,6 +4105,7 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane,
 	struct dm_plane_state *dm_plane_state_new, *dm_plane_state_old;
 	uint64_t tiling_flags;
 	uint32_t domain;
+	u64 timeout = nsecs_to_jiffies64(DM_PIN_MEM_TIMEOUT);
 	int r;
 
 	dm_plane_state_old = to_dm_plane_state(plane->state);
@@ -4117,6 +4120,8 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane,
 	obj = new_state->fb->obj[0];
 	rbo = gem_to_amdgpu_bo(obj);
 	adev = amdgpu_ttm_adev(rbo->tbo.bdev);
+
+retry:
 	r = amdgpu_bo_reserve(rbo, false);
 	if (unlikely(r != 0))
 		return r;
@@ -4131,8 +4136,18 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane,
 		if (r != -ERESTARTSYS)
 			DRM_ERROR("Failed to pin framebuffer with error %d\n", r);
 		amdgpu_bo_unreserve(rbo);
+		if (r == -ENOMEM) {
+			set_current_state(TASK_INTERRUPTIBLE);
+			if (timeout == 0)
+				return r;
+			if (signal_pending(current))
+				return r;
+			timeout = schedule_timeout(timeout);
+			goto retry;
+		}
 		return r;
 	}
+	__set_current_state(TASK_RUNNING);
 
 	r = amdgpu_ttm_alloc_gart(&rbo->tbo);
 	if (unlikely(r != 0)) {
-- 
2.17.1
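
As a rough sanity check on the retry loop in the patch above, assuming HZ=1000
(the exact jiffies count depends on the kernel configuration):

/*
 * DM_PIN_MEM_TIMEOUT = 100000000000ULL ns = 100 s
 * nsecs_to_jiffies64(100 s) ~= 100000 jiffies at HZ=1000.
 *
 * schedule_timeout() returns the number of jiffies still remaining, so
 * feeding its return value back into 'timeout' bounds the total time
 * spent retrying the pin to roughly 100 seconds before the -ENOMEM is
 * finally propagated to the caller.
 */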


[-- Attachment #3: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Thread overview: 17+ messages
2019-04-22 10:38 [PATCH] ttm: wait mem space if user allow while gpu busy Chunming Zhou
2019-04-23  1:45 ` Liang, Prike
2019-04-23 13:05   ` Christian König
2019-04-23 13:19     ` Zhou, David(ChunMing)
2019-04-23 14:42       ` [PATCH] " Christian König
2019-04-23 15:09         ` Zhou, David(ChunMing)
2019-04-24  7:12           ` [PATCH] " Christian König
2019-04-24  7:17             ` zhoucm1
2019-04-24  7:30               ` Christian König
2019-04-24  7:59                 ` zhoucm1 [this message]
2019-04-24  8:07                   ` Christian König
2019-04-24  8:11                     ` zhoucm1
2019-04-24  8:59                       ` Koenig, Christian
2019-04-24  9:03                         ` zhoucm1
2019-04-24 10:03                           ` Christian König
2019-04-24  8:01         ` Daniel Vetter
2019-04-24  8:06           ` Koenig, Christian
