From: zhoucm1 <zhoucm1@amd.com>
To: christian.koenig@amd.com, "Zhou,
	David(ChunMing)" <David1.Zhou@amd.com>,
	"Liang, Prike" <Prike.Liang@amd.com>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
Date: Wed, 24 Apr 2019 15:59:33 +0800	[thread overview]
Message-ID: <26341685-6e73-55c4-2609-52616d92c06a@amd.com> (raw)
In-Reply-To: <8db16fe8-78fe-7f25-4acd-51e37e645f3b@gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 6883 bytes --]

How about the newly attached patch?


-David


On 2019-04-24 15:30, Christian König wrote:
> That would change the semantics of ttm_bo_mem_space() and so could 
> change the return code in an IOCTL as well. Not a good idea, because 
> that could have a lot of side effects.
>
> Instead in the calling DC code you could check if you get an -ENOMEM 
> and then call schedule().
>
> If after the schedule() we see that we now have BOs on the LRU, we can 
> try again and see if pinning the frame buffer now succeeds.
>
> Christian.
>
> On 24.04.19 at 09:17, zhoucm1 wrote:
>>
>> I made a patch as attached.
>>
>> I'm not sure how to make a patch following your proposal. Could you 
>> make a patch for that if mine isn't enough?
>>
>> -David
>>
>> On 2019-04-24 15:12, Christian König wrote:
>>>> how about just adding a wrapper for the pin function as below?
>>> I considered this as well and don't think it will work reliably.
>>>
>>> We could use it as a band-aid for this specific problem, but in 
>>> general we need to improve the handling in TTM to resolve those kinds 
>>> of resource conflicts.
>>>
>>> Regards,
>>> Christian.
>>>
>>> On 23.04.19 at 17:09, Zhou, David(ChunMing) wrote:
>>>> >3. If we have a ticket we grab a reference to the first BO on the 
>>>> LRU, drop the LRU lock and try to grab the reservation lock with 
>>>> the ticket.
>>>>
>>>> The BO on the LRU is already locked by the CS user; can it be dropped 
>>>> here by the DC user? And if the DC user then grabs its lock with the 
>>>> ticket, how does CS grab it again?
>>>>
>>>> If you think waiting in TTM has this risk, how about just adding a 
>>>> wrapper for the pin function as below?
>>>> amdgpu_get_pin_bo_timeout()
>>>> {
>>>>         do {
>>>>                 amdgpu_bo_reserve();
>>>>                 r = amdgpu_bo_pin();
>>>>
>>>>                 if (!r)
>>>>                         break;
>>>>                 amdgpu_bo_unreserve();
>>>>                 timeout--;
>>>>
>>>>         } while (timeout > 0);
>>>> }
>>>>
>>>> -------- Original Message --------
>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>> From: Christian König
>>>> To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" 
>>>> ,dri-devel@lists.freedesktop.org
>>>> CC:
>>>>
>>>> Well, that's not so easy offhand.
>>>>
>>>> The basic problem here is that when you busy wait at this place you 
>>>> can easily run into situations where application A busy waits for B 
>>>> while B busy waits for A -> deadlock.
>>>>
>>>> So what we need here is the deadlock detection logic of the 
>>>> ww_mutex. To use this we at least need to do the following steps:
>>>>
>>>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>>>>
>>>> 2. If we then run into this EBUSY condition in TTM check if the BO 
>>>> we need memory for (or rather the ww_mutex of its reservation 
>>>> object) has a ticket assigned.
>>>>
>>>> 3. If we have a ticket we grab a reference to the first BO on the 
>>>> LRU, drop the LRU lock and try to grab the reservation lock with 
>>>> the ticket.
>>>>
>>>> 4. If getting the reservation lock with the ticket succeeded we 
>>>> check if the BO is still the first one on the LRU in question (the 
>>>> BO could have moved).
>>>>
>>>> 5. If the BO is still the first one on the LRU in question we try 
>>>> to evict it as we would evict any other BO.
>>>>
>>>> 6. If any of the "If's" above fail we just back off and return -EBUSY.
>>>>
>>>> Steps 2-5 are certainly not trivial, but doable as far as I can see.
>>>>
>>>> Have fun :)
>>>> Christian.
>>>>
>>>> On 23.04.19 at 15:19, Zhou, David(ChunMing) wrote:
>>>>> How about adding an additional ctx->resv condition inline to address 
>>>>> your concern? In addition, if we don't wait on BOs from the same 
>>>>> user, it shouldn't lead to deadlock.
>>>>>
>>>>> Otherwise, any other idea?
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>> From: Christian König
>>>>> To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
>>>>> ,dri-devel@lists.freedesktop.org
>>>>> CC:
>>>>>
>>>>> Well, that is certainly a NAK because it can lead to deadlocks in
>>>>> memory management.
>>>>>
>>>>> You can't just busy wait with all those locks held.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> On 23.04.19 at 03:45, Liang, Prike wrote:
>>>>> > Acked-by: Prike Liang <Prike.Liang@amd.com>
>>>>> >
>>>>> > Thanks,
>>>>> > Prike
>>>>> > -----Original Message-----
>>>>> > From: Chunming Zhou <david1.zhou@amd.com>
>>>>> > Sent: Monday, April 22, 2019 6:39 PM
>>>>> > To: dri-devel@lists.freedesktop.org
>>>>> > Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) 
>>>>> <David1.Zhou@amd.com>
>>>>> > Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>>>>> >
>>>>> > A heavy GPU job could occupy memory for a long time, which could
>>>>> > lead to other users failing to get memory.
>>>>> >
>>>>> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
>>>>> > Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
>>>>> > ---
>>>>> >   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>>>>> >   1 file changed, 4 insertions(+), 2 deletions(-)
>>>>> >
>>>>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> > index 7c484729f9b2..6c596cc24bec 100644
>>>>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> > @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>>>>> >                if (mem->mm_node)
>>>>> >                        break;
>>>>> >                ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
>>>>> > -             if (unlikely(ret != 0))
>>>>> > -                     return ret;
>>>>> > +             if (unlikely(ret != 0)) {
>>>>> > +                     if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
>>>>> > +                             return ret;
>>>>> > +             }
>>>>> >        } while (1);
>>>>> >        mem->mem_type = mem_type;
>>>>> >        return ttm_bo_add_move_fence(bo, man, mem);
>>>>> > --
>>>>> > 2.17.1
>>>>> >
>>>>> > _______________________________________________
>>>>> > dri-devel mailing list
>>>>> > dri-devel@lists.freedesktop.org
>>>>> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> dri-devel mailing list
>>>>> dri-devel@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>
>>>>
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>
>>
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
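
For reference, a rough and untested sketch of steps 2-5 from the ww_mutex
proposal quoted above might look like the following. The function name, the
helper ttm_bo_evict_first_locked(), and the walk over a single LRU list are
assumptions for illustration only, not existing TTM interfaces:

#include <drm/ttm/ttm_bo_api.h>
#include <drm/ttm/ttm_bo_driver.h>

/*
 * Sketch of steps 2-5: try to evict the first BO on the LRU using the
 * caller's ww_acquire_ctx so the ww_mutex machinery can detect A<->B
 * wait cycles. Only the highest-priority LRU list is walked, for brevity.
 */
static int ttm_mem_evict_with_ticket(struct ttm_bo_device *bdev,
                                     struct ttm_mem_type_manager *man,
                                     struct ww_acquire_ctx *ticket)
{
        struct ttm_buffer_object *first;
        int ret;

        /* Step 3: grab a reference to the first BO on the LRU ... */
        spin_lock(&bdev->glob->lru_lock);
        first = list_first_entry_or_null(&man->lru[0],
                                         struct ttm_buffer_object, lru);
        if (!first) {
                spin_unlock(&bdev->glob->lru_lock);
                return -EBUSY;
        }
        ttm_bo_get(first);
        spin_unlock(&bdev->glob->lru_lock);

        /* ... drop the LRU lock and take its reservation with the ticket. */
        ret = ww_mutex_lock_interruptible(&first->resv->lock, ticket);
        if (ret) {
                /* Step 6: -EDEADLK or -EINTR, just back off. */
                ttm_bo_put(first);
                return ret == -EDEADLK ? -EBUSY : ret;
        }

        /* Step 4: the BO could have moved while the LRU lock was dropped. */
        spin_lock(&bdev->glob->lru_lock);
        if (first != list_first_entry_or_null(&man->lru[0],
                                              struct ttm_buffer_object, lru)) {
                spin_unlock(&bdev->glob->lru_lock);
                ret = -EBUSY;
                goto out_unlock;
        }
        spin_unlock(&bdev->glob->lru_lock);

        /* Step 5: evict it as any other BO would be evicted. */
        ret = ttm_bo_evict_first_locked(first);         /* assumed helper */

out_unlock:
        ww_mutex_unlock(&first->resv->lock);
        ttm_bo_put(first);
        return ret;
}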


[-- Attachment #1.2: Type: text/html, Size: 14393 bytes --]

[-- Attachment #2: 0001-drm-amdgpu-wait-memory-idle-when-fb-is-pinned-fail.patch --]
[-- Type: text/x-patch, Size: 2292 bytes --]

>From 61174dd44618bbfcb035484a44435b7ed4df86a5 Mon Sep 17 00:00:00 2001
From: Chunming Zhou <david1.zhou@amd.com>
Date: Wed, 24 Apr 2019 11:35:29 +0800
Subject: [PATCH] drm/amdgpu: wait memory idle when fb is pinned fail

A heavy GPU job's execution could occupy memory for a long time, which
could lead to other users failing to get memory.

Change-Id: I3857c4b99859c9d2f354cf2c486dbacb8520d0ed
Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index a5cacf846e1b..8577b629a640 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4093,6 +4093,8 @@ static const struct drm_plane_funcs dm_plane_funcs = {
 	.atomic_destroy_state = dm_drm_plane_destroy_state,
 };
 
+/* default 100s timeout to wait for pinning the fb */
+#define DM_PIN_MEM_TIMEOUT 100000000000ULL
 static int dm_plane_helper_prepare_fb(struct drm_plane *plane,
 				      struct drm_plane_state *new_state)
 {
@@ -4103,6 +4105,7 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane,
 	struct dm_plane_state *dm_plane_state_new, *dm_plane_state_old;
 	uint64_t tiling_flags;
 	uint32_t domain;
+	u64 timeout = nsecs_to_jiffies64(DM_PIN_MEM_TIMEOUT);
 	int r;
 
 	dm_plane_state_old = to_dm_plane_state(plane->state);
@@ -4117,6 +4120,8 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane,
 	obj = new_state->fb->obj[0];
 	rbo = gem_to_amdgpu_bo(obj);
 	adev = amdgpu_ttm_adev(rbo->tbo.bdev);
+
+retry:
 	r = amdgpu_bo_reserve(rbo, false);
 	if (unlikely(r != 0))
 		return r;
@@ -4131,8 +4136,18 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane,
 		if (r != -ERESTARTSYS)
 			DRM_ERROR("Failed to pin framebuffer with error %d\n", r);
 		amdgpu_bo_unreserve(rbo);
+		if (r == -ENOMEM) {
+			set_current_state(TASK_INTERRUPTIBLE);
+			if (timeout == 0)
+				return r;
+			if (signal_pending(current))
+				return r;
+			timeout = schedule_timeout(timeout);
+			goto retry;
+		}
 		return r;
 	}
+	__set_current_state(TASK_RUNNING);
 
 	r = amdgpu_ttm_alloc_gart(&rbo->tbo);
 	if (unlikely(r != 0)) {
-- 
2.17.1
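
As a rough sanity check on the retry loop in the patch above, assuming HZ=1000
(the exact jiffies count depends on the kernel configuration):

/*
 * DM_PIN_MEM_TIMEOUT = 100000000000ULL ns = 100 s
 * nsecs_to_jiffies64(100 s) ~= 100000 jiffies at HZ=1000.
 *
 * schedule_timeout() returns the number of jiffies still remaining, so
 * feeding its return value back into 'timeout' bounds the total time
 * spent retrying the pin to roughly 100 seconds before the -ENOMEM is
 * finally propagated to the caller.
 */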


[-- Attachment #3: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Thread overview: 17+ messages
2019-04-22 10:38 [PATCH] ttm: wait mem space if user allow while gpu busy Chunming Zhou
2019-04-23  1:45 ` Liang, Prike
2019-04-23 13:05   ` Christian König
2019-04-23 13:19     ` Zhou, David(ChunMing)
2019-04-23 14:42       ` [PATCH] " Christian König
2019-04-23 15:09         ` Zhou, David(ChunMing)
2019-04-24  7:12           ` [PATCH] " Christian König
2019-04-24  7:17             ` zhoucm1
2019-04-24  7:30               ` Christian König
2019-04-24  7:59                 ` zhoucm1 [this message]
2019-04-24  8:07                   ` Christian König
2019-04-24  8:11                     ` zhoucm1
2019-04-24  8:59                       ` Koenig, Christian
2019-04-24  9:03                         ` zhoucm1
2019-04-24 10:03                           ` Christian König
2019-04-24  8:01         ` Daniel Vetter
2019-04-24  8:06           ` Koenig, Christian
