From mboxrd@z Thu Jan 1 00:00:00 1970 From: zhoucm1 Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy Date: Wed, 24 Apr 2019 17:03:59 +0800 Message-ID: References: <20190422103836.4300-1-david1.zhou@amd.com> <1c98b839-362d-c279-1abb-c022aed3abf1@gmail.com> <-jm8957cqk536djh1631fvvv-xx4wzb5q21ak-v8q7rd2l14aq-f68kxr7kbea18a7xceae626b-8s84c4d1mgrg53g0bhq3ahee89h70qrv4ly1-vf5a7d63x4mbquxnfmiehuk2-rwaw2m-qc2chu.1556032141262@email.android.com> <32f2a1b7-99f8-de9a-799c-f98af308842b@gmail.com> <8db16fe8-78fe-7f25-4acd-51e37e645f3b@gmail.com> <26341685-6e73-55c4-2609-52616d92c06a@amd.com> <9e6e865f-29dd-be69-fada-f6a6b35e6c69@amd.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0965436427==" Return-path: Received: from NAM04-SN1-obe.outbound.protection.outlook.com (mail-eopbgr700064.outbound.protection.outlook.com [40.107.70.64]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3D7A28920E for ; Wed, 24 Apr 2019 09:04:15 +0000 (UTC) In-Reply-To: <9e6e865f-29dd-be69-fada-f6a6b35e6c69@amd.com> Content-Language: en-US List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: "Koenig, Christian" , "Zhou, David(ChunMing)" , "Liang, Prike" , "dri-devel@lists.freedesktop.org" List-Id: dri-devel@lists.freedesktop.org --===============0965436427== Content-Type: multipart/alternative; boundary="------------6D3D0A982A70D98625DC07C2" Content-Language: en-US --------------6D3D0A982A70D98625DC07C2 Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit OK, Let's drop mine, then Could you draft a solution for that? -David On 2019年04月24日 16:59, Koenig, Christian wrote: > Am 24.04.19 um 10:11 schrieb zhoucm1: >> >> >> >> On 2019年04月24日 16:07, Christian König wrote: >>> This is used in a work item, so you don't need to check for signals. >> will remove. >>> >>> And checking if the LRU is populated is mandatory here >> How to check it outside of TTM? because the code is in dm. > > Well just use a static cast on the first entry of the LRU. > > We can't upstream that solution anyway, so just a hack should do for now. > >> >>> or otherwise you can run into an endless loop. >> I already add a timeout for that. > > Thinking more about it we most likely actually need to grab the lock > on the first BO entry from the LRU. > >> >> -David >>> >>> Christian. >>> >>> Am 24.04.19 um 09:59 schrieb zhoucm1: >>>> >>>> how about new attached? >>>> >>>> >>>> -David >>>> >>>> >>>> On 2019年04月24日 15:30, Christian König wrote: >>>>> That would change the semantics of ttm_bo_mem_space() and so could >>>>> change the return code in an IOCTL as well. Not a good idea, cause >>>>> that could have a lot of side effects. >>>>> >>>>> Instead in the calling DC code you could check if you get an >>>>> -ENOMEM and then call schedule(). >>>>> >>>>> If after the schedule() we see that we have now BOs on the LRU we >>>>> can try again and see if pinning the frame buffer now succeeds. >>>>> >>>>> Christian. >>>>> >>>>> Am 24.04.19 um 09:17 schrieb zhoucm1: >>>>>> >>>>>> I made a patch as attached. >>>>>> >>>>>> I'm not sure how to make patch as your proposal, Could you make a >>>>>> patch for that if mine isn't enough? >>>>>> >>>>>> -David >>>>>> >>>>>> On 2019年04月24日 15:12, Christian König wrote: >>>>>>>> how about just adding a wrapper for pin function as below? >>>>>>> I considered this as well and don't think it will work reliable. >>>>>>> >>>>>>> We could use it as a band aid for this specific problem, but in >>>>>>> general we need to improve the handling in TTM to resolve those >>>>>>> kind of resource conflicts. >>>>>>> >>>>>>> Regards, >>>>>>> Christian. >>>>>>> >>>>>>> Am 23.04.19 um 17:09 schrieb Zhou, David(ChunMing): >>>>>>>> >3. If we have a ticket we grab a reference to the first BO on >>>>>>>> the LRU, drop the LRU lock and try to grab the reservation lock >>>>>>>> with the ticket. >>>>>>>> >>>>>>>> The BO on LRU is already locked by cs user, can it be dropped >>>>>>>> here by DC user? and then DC user grab its lock with ticket, >>>>>>>> how does CS grab it again? >>>>>>>> >>>>>>>> If you think waiting in ttm has this risk, how about just >>>>>>>> adding a wrapper for pin function as below? >>>>>>>> amdgpu_get_pin_bo_timeout() >>>>>>>> { >>>>>>>> do { >>>>>>>> amdgpo_bo_reserve(); >>>>>>>> r=amdgpu_bo_pin(); >>>>>>>> >>>>>>>> if(!r) >>>>>>>>         break; >>>>>>>> amdgpu_bo_unreserve(); >>>>>>>> timeout--; >>>>>>>> >>>>>>>> } while(timeout>0); >>>>>>>> >>>>>>>> } >>>>>>>> >>>>>>>> -------- Original Message -------- >>>>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while >>>>>>>> gpu busy >>>>>>>> From: Christian König >>>>>>>> To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, >>>>>>>> Prike" ,dri-devel@lists.freedesktop.org >>>>>>>> CC: >>>>>>>> >>>>>>>> Well that's not so easy of hand. >>>>>>>> >>>>>>>> The basic problem here is that when you busy wait at this place >>>>>>>> you can easily run into situations where application A busy >>>>>>>> waits for B while B busy waits for A -> deadlock. >>>>>>>> >>>>>>>> So what we need here is the deadlock detection logic of the >>>>>>>> ww_mutex. To use this we at least need to do the following steps: >>>>>>>> >>>>>>>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial). >>>>>>>> >>>>>>>> 2. If we then run into this EBUSY condition in TTM check if the >>>>>>>> BO we need memory for (or rather the ww_mutex of its >>>>>>>> reservation object) has a ticket assigned. >>>>>>>> >>>>>>>> 3. If we have a ticket we grab a reference to the first BO on >>>>>>>> the LRU, drop the LRU lock and try to grab the reservation lock >>>>>>>> with the ticket. >>>>>>>> >>>>>>>> 4. If getting the reservation lock with the ticket succeeded we >>>>>>>> check if the BO is still the first one on the LRU in question >>>>>>>> (the BO could have moved). >>>>>>>> >>>>>>>> 5. If the BO is still the first one on the LRU in question we >>>>>>>> try to evict it as we would evict any other BO. >>>>>>>> >>>>>>>> 6. If any of the "If's" above fail we just back off and return >>>>>>>> -EBUSY. >>>>>>>> >>>>>>>> Steps 2-5 are certainly not trivial, but doable as far as I can >>>>>>>> see. >>>>>>>> >>>>>>>> Have fun :) >>>>>>>> Christian. >>>>>>>> >>>>>>>> Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing): >>>>>>>>> How about adding more condition ctx->resv inline to address >>>>>>>>> your concern? As well as don't wait from same user, shouldn't >>>>>>>>> lead to deadlock. >>>>>>>>> >>>>>>>>> Otherwise, any other idea? >>>>>>>>> >>>>>>>>> -------- Original Message -------- >>>>>>>>> Subject: Re: [PATCH] ttm: wait mem space if user allow while >>>>>>>>> gpu busy >>>>>>>>> From: Christian König >>>>>>>>> To: "Liang, Prike" ,"Zhou, David(ChunMing)" >>>>>>>>> ,dri-devel@lists.freedesktop.org >>>>>>>>> CC: >>>>>>>>> >>>>>>>>> Well that is certainly a NAK because it can lead to deadlock >>>>>>>>> in the >>>>>>>>> memory management. >>>>>>>>> >>>>>>>>> You can't just busy wait with all those locks held. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Christian. >>>>>>>>> >>>>>>>>> Am 23.04.19 um 03:45 schrieb Liang, Prike: >>>>>>>>> > Acked-by: Prike Liang >>>>>>>>> > >>>>>>>>> > Thanks, >>>>>>>>> > Prike >>>>>>>>> > -----Original Message----- >>>>>>>>> > From: Chunming Zhou >>>>>>>>> > Sent: Monday, April 22, 2019 6:39 PM >>>>>>>>> > To: dri-devel@lists.freedesktop.org >>>>>>>>> > Cc: Liang, Prike ; Zhou, >>>>>>>>> David(ChunMing) >>>>>>>>> > Subject: [PATCH] ttm: wait mem space if user allow while gpu >>>>>>>>> busy >>>>>>>>> > >>>>>>>>> > heavy gpu job could occupy memory long time, which could >>>>>>>>> lead to other user fail to get memory. >>>>>>>>> > >>>>>>>>> > Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11 >>>>>>>>> > Signed-off-by: Chunming Zhou >>>>>>>>> > --- >>>>>>>>> >   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++-- >>>>>>>>> >   1 file changed, 4 insertions(+), 2 deletions(-) >>>>>>>>> > >>>>>>>>> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c >>>>>>>>> b/drivers/gpu/drm/ttm/ttm_bo.c index >>>>>>>>> 7c484729f9b2..6c596cc24bec 100644 >>>>>>>>> > --- a/drivers/gpu/drm/ttm/ttm_bo.c >>>>>>>>> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c >>>>>>>>> > @@ -830,8 +830,10 @@ static int >>>>>>>>> ttm_bo_mem_force_space(struct ttm_buffer_object *bo, >>>>>>>>> >                if (mem->mm_node) >>>>>>>>> >                        break; >>>>>>>>> >                ret = ttm_mem_evict_first(bdev, mem_type, >>>>>>>>> place, ctx); >>>>>>>>> > -             if (unlikely(ret != 0)) >>>>>>>>> > -                     return ret; >>>>>>>>> > +             if (unlikely(ret != 0)) { >>>>>>>>> > +                     if (!ctx || ctx->no_wait_gpu || ret != >>>>>>>>> -EBUSY) >>>>>>>>> > +                             return ret; >>>>>>>>> > +             } >>>>>>>>> >        } while (1); >>>>>>>>> >        mem->mem_type = mem_type; >>>>>>>>> >        return ttm_bo_add_move_fence(bo, man, mem); >>>>>>>>> > -- >>>>>>>>> > 2.17.1 >>>>>>>>> > >>>>>>>>> > _______________________________________________ >>>>>>>>> > dri-devel mailing list >>>>>>>>> > dri-devel@lists.freedesktop.org >>>>>>>>> > https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> dri-devel mailing list >>>>>>>>> dri-devel@lists.freedesktop.org >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> dri-devel mailing list >>>>>>>> dri-devel@lists.freedesktop.org >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> dri-devel mailing list >>>>>> dri-devel@lists.freedesktop.org >>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>>> >>>> >>>> >>>> _______________________________________________ >>>> dri-devel mailing list >>>> dri-devel@lists.freedesktop.org >>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>> >> > --------------6D3D0A982A70D98625DC07C2 Content-Type: text/html; charset="utf-8" Content-Transfer-Encoding: 8bit

OK, Let's drop mine, then Could you draft a solution for that?


-David


On 2019年04月24日 16:59, Koenig, Christian wrote:
Am 24.04.19 um 10:11 schrieb zhoucm1:



On 2019年04月24日 16:07, Christian König wrote:
This is used in a work item, so you don't need to check for signals.
will remove.

And checking if the LRU is populated is mandatory here
How to check it outside of TTM? because the code is in dm.

Well just use a static cast on the first entry of the LRU.

We can't upstream that solution anyway, so just a hack should do for now.


or otherwise you can run into an endless loop.
I already add a timeout for that.

Thinking more about it we most likely actually need to grab the lock on the first BO entry from the LRU.


-David

Christian.

Am 24.04.19 um 09:59 schrieb zhoucm1:

how about new attached?


-David


On 2019年04月24日 15:30, Christian König wrote:
That would change the semantics of ttm_bo_mem_space() and so could change the return code in an IOCTL as well. Not a good idea, cause that could have a lot of side effects.

Instead in the calling DC code you could check if you get an -ENOMEM and then call schedule().

If after the schedule() we see that we have now BOs on the LRU we can try again and see if pinning the frame buffer now succeeds.

Christian.

Am 24.04.19 um 09:17 schrieb zhoucm1:

I made a patch as attached.

I'm not sure how to make patch as your proposal, Could you make a patch for that if mine isn't enough?

-David

On 2019年04月24日 15:12, Christian König wrote:
how about just adding a wrapper for pin function as below?
I considered this as well and don't think it will work reliable.

We could use it as a band aid for this specific problem, but in general we need to improve the handling in TTM to resolve those kind of resource conflicts.

Regards,
Christian.

Am 23.04.19 um 17:09 schrieb Zhou, David(ChunMing):
>3. If we have a ticket we grab a reference to the first BO on the LRU, drop the LRU lock and try to grab the reservation lock with the ticket.

The BO on LRU is already locked by cs user, can it be dropped here by DC user? and then DC user grab its lock with ticket, how does CS grab it again?

If you think waiting in ttm has this risk, how about just adding a wrapper for pin function as below?
amdgpu_get_pin_bo_timeout()
{
do {
amdgpo_bo_reserve();
r=amdgpu_bo_pin();

if(!r)
        break;
amdgpu_bo_unreserve();
timeout--;

} while(timeout>0);

}

-------- Original Message --------
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org
CC:

Well that's not so easy of hand.

The basic problem here is that when you busy wait at this place you can easily run into situations where application A busy waits for B while B busy waits for A -> deadlock.

So what we need here is the deadlock detection logic of the ww_mutex. To use this we at least need to do the following steps:

1. Reserve the BO in DC using a ww_mutex ticket (trivial).

2. If we then run into this EBUSY condition in TTM check if the BO we need memory for (or rather the ww_mutex of its reservation object) has a ticket assigned.

3. If we have a ticket we grab a reference to the first BO on the LRU, drop the LRU lock and try to grab the reservation lock with the ticket.

4. If getting the reservation lock with the ticket succeeded we check if the BO is still the first one on the LRU in question (the BO could have moved).

5. If the BO is still the first one on the LRU in question we try to evict it as we would evict any other BO.

6. If any of the "If's" above fail we just back off and return -EBUSY.

Steps 2-5 are certainly not trivial, but doable as far as I can see.

Have fun :)
Christian.

Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing):
How about adding more condition ctx->resv inline to address your concern? As well as don't wait from same user, shouldn't lead to deadlock.

Otherwise, any other idea?

-------- Original Message --------
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Liang, Prike" ,"Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org
CC:

Well that is certainly a NAK because it can lead to deadlock in the
memory management.

You can't just busy wait with all those locks held.

Regards,
Christian.

Am 23.04.19 um 03:45 schrieb Liang, Prike:
> Acked-by: Prike Liang <Prike.Liang@amd.com>
>
> Thanks,
> Prike
> -----Original Message-----
> From: Chunming Zhou <david1.zhou@amd.com>
> Sent: Monday, April 22, 2019 6:39 PM
> To: dri-devel@lists.freedesktop.org
> Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>
> Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>
> heavy gpu job could occupy memory long time, which could lead to other user fail to get memory.
>
> Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
> Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 7c484729f9b2..6c596cc24bec 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>                if (mem->mm_node)
>                        break;
>                ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
> -             if (unlikely(ret != 0))
> -                     return ret;
> +             if (unlikely(ret != 0)) {
> +                     if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
> +                             return ret;
> +             }
>        } while (1);
>        mem->mem_type = mem_type;
>        return ttm_bo_add_move_fence(bo, man, mem);
> --
> 2.17.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel



_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel



_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel




--------------6D3D0A982A70D98625DC07C2-- --===============0965436427== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVs --===============0965436427==--