From: "Zhou, David(ChunMing)" <David1.Zhou@amd.com>
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
Date: Tue, 23 Apr 2019 15:09:05 +0000
To: "Koenig, Christian", "Zhou, David(ChunMing)", "Liang, Prike", dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

> 3. If we have a ticket we grab a reference to the first BO on the LRU, drop the LRU lock and try to grab the reservation lock with the ticket.

The BO on the LRU is already locked by the CS user; can it be dropped here by the DC user? And if the DC user then grabs its lock with a ticket, how does CS grab it again?

If you think waiting in TTM carries this risk, how about just adding a wrapper for the pin function, as below?
amdgpu_get_pin_bo_timeout()
{
	do {
		amdgpu_bo_reserve();
		r = amdgpu_bo_pin();
		if (!r)
			break;
		amdgpu_bo_unreserve();
		timeout--;
	} while (timeout > 0);
}

-------- Original Message --------
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Zhou, David(ChunMing)", "Koenig, Christian", "Liang, Prike", dri-devel@lists.freedesktop.org
CC:

Well, that's not so easy off-hand.

The basic problem here is that when you busy-wait at this place you can easily run into situations where application A busy-waits for B while B busy-waits for A -> deadlock.

So what we need here is the deadlock-detection logic of the ww_mutex. To use this we at least need to do the following steps:

1. Reserve the BO in DC using a ww_mutex ticket (trivial).

2. If we then run into this EBUSY condition in TTM, check if the BO we need memory for (or rather the ww_mutex of its reservation object) has a ticket assigned.

3. If we have a ticket, we grab a reference to the first BO on the LRU, drop the LRU lock and try to grab the reservation lock with the ticket.

4. If getting the reservation lock with the ticket succeeded, we check if the BO is still the first one on the LRU in question (the BO could have moved).

5. If the BO is still the first one on the LRU in question, we try to evict it as we would evict any other BO.

6. If any of the "if"s above fail, we just back off and return -EBUSY.

Steps 2-5 are certainly not trivial, but doable as far as I can see.

Have fun :)
Christian.

On 23.04.19 at 15:19, Zhou, David(ChunMing) wrote:
How about adding one more condition, ctx->resv, inline to address your concern? Combined with not waiting on the same user, that shouldn't lead to deadlock.
Otherwise, any other idea?
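The back-off logic in steps 2-6 above can be sketched as a small userspace model. This is a simulation only: `bo_sim`, `try_evict_first` and `SIM_EBUSY` are invented names, and the plain fields stand in for the real reservation object / ww_mutex machinery in TTM.

```c
/* Userspace sketch of steps 2-6. All names here are hypothetical;
 * real TTM locks reservation objects (ww_mutex), not plain fields. */
#include <stdbool.h>

#define SIM_EBUSY 16

struct bo_sim {
	int lock_ticket;   /* 0 = unlocked, else id of the ticket holding the lock */
	bool on_lru_head;  /* is this BO still the first one on its LRU? */
};

/* Try to evict the first BO on the LRU using our ticket.
 * Returns 0 on success, -SIM_EBUSY when we must back off (step 6). */
int try_evict_first(struct bo_sim *first, int our_ticket)
{
	if (our_ticket == 0)
		return -SIM_EBUSY;              /* step 2: no ticket assigned */
	if (first->lock_ticket != 0 && first->lock_ticket != our_ticket)
		return -SIM_EBUSY;              /* step 3: reservation lock contended */
	first->lock_ticket = our_ticket;        /* step 3: lock grabbed with ticket */
	if (!first->on_lru_head) {
		first->lock_ticket = 0;         /* step 4: BO moved, drop and back off */
		return -SIM_EBUSY;
	}
	return 0;                               /* step 5: evict as any other BO */
}
```

With a valid ticket and an unmoved, unlocked BO this returns 0; every other branch backs off with -SIM_EBUSY, which is exactly the step-6 behavior the ww_mutex deadlock detection relies on.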
-------- Original Message --------
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Liang, Prike", "Zhou, David(ChunMing)", dri-devel@lists.freedesktop.org
CC:

Well, that is certainly a NAK, because it can lead to deadlock in the memory management.

You can't just busy-wait with all those locks held.

Regards,
Christian.

On 23.04.19 at 03:45, Liang, Prike wrote:
> Acked-by: Prike Liang <Prike.Liang@amd.com>
>
> Thanks,
> Prike
> -----Original Message-----
> From: Chunming Zhou <david1.zhou@amd.com>
> Sent: Monday, April 22, 2019 6:39 PM
> To: dri-devel@lists.freedesktop.org
> Cc: Liang, Prike <Prike.Liang@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>
> Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>
> A heavy GPU job could occupy memory for a long time, which could cause other users to fail to get memory.
>
> Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
> Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 7c484729f9b2..6c596cc24bec 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>  		if (mem->mm_node)
>  			break;
>  		ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
> -		if (unlikely(ret != 0))
> -			return ret;
> +		if (unlikely(ret != 0)) {
> +			if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
> +				return ret;
> +		}
>  	} while (1);
>  	mem->mem_type = mem_type;
>  	return ttm_bo_add_move_fence(bo, man, mem);
> --
> 2.17.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
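The effect of the patch's retry condition can be modelled in plain userspace C. This is a sketch under invented names: `force_space_sim`, `ctx_sim` and `SPACE_EBUSY` do not exist in TTM, where the loop calls ttm_mem_evict_first() and uses the kernel's -EBUSY.

```c
/* Userspace model of the patched ttm_bo_mem_force_space() retry loop.
 * busy_iters models how many eviction attempts still fail with -EBUSY
 * before memory frees up; names are hypothetical. */
#include <stdbool.h>
#include <stddef.h>

#define SPACE_EBUSY 16

struct ctx_sim {
	bool no_wait_gpu;
};

/* Returns 0 once "memory" becomes available, or -SPACE_EBUSY when the
 * caller disallows waiting (or passes no context at all). */
int force_space_sim(struct ctx_sim *ctx, int *busy_iters)
{
	do {
		if (*busy_iters == 0)
			return 0;                   /* mem->mm_node obtained */
		int ret = -SPACE_EBUSY;             /* eviction attempt failed */
		if (!ctx || ctx->no_wait_gpu || ret != -SPACE_EBUSY)
			return ret;                 /* the patch's bail-out test */
		(*busy_iters)--;                    /* user allows waiting: retry */
	} while (1);
}
```

With no_wait_gpu set the first -EBUSY goes straight back to the caller; otherwise the loop keeps retrying until eviction succeeds, which is precisely the unbounded busy-wait Christian objects to above.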