From: Christian König
Subject: Re: weird reservation issues
Date: Wed, 28 Feb 2018 11:49:11 +0100
To: "Liu, Monk", amd-gfx@lists.freedesktop.org
Hi Monk,

Yeah, I'm thinking about those issues currently as well. It indeed looks like we have some kind of race here.

When we call backoff_reservation, the next iteration of the loop will reserve all BOs again, so that can't be the issue.
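Roughly, the pattern in question looks like this (a heavily simplified sketch, not the actual driver code; collect_userptr_bos() and fault_in_userptr_pages() are made-up placeholders for the userptr handling):

/* Simplified sketch of the reserve/backoff retry loop in
 * amdgpu_cs_parser_bos(); validation, duplicates handling and most
 * error paths are left out.  collect_userptr_bos() and
 * fault_in_userptr_pages() are made-up placeholders.
 */
static int cs_parser_bos_sketch(struct amdgpu_cs_parser *p)
{
	struct list_head duplicates, need_pages;
	int r;

	INIT_LIST_HEAD(&duplicates);

	while (1) {
		INIT_LIST_HEAD(&need_pages);

		/* Reserve every BO of the CS, including the root PD. */
		r = ttm_eu_reserve_buffers(&p->ticket, &p->validated,
					   true, &duplicates);
		if (r)
			return r;

		collect_userptr_bos(p, &need_pages);
		if (list_empty(&need_pages))
			break;

		/* Userptr pages are missing: drop all reservations ... */
		ttm_eu_backoff_reservation(&p->ticket, &p->validated);

		/* ... fault the pages in without any lock held ... */
		r = fault_in_userptr_pages(&need_pages);
		if (r)
			return r;

		/* ... and the next loop iteration re-reserves everything. */
	}

	return 0;
}

So at least in this pattern nothing touches the reservation objects after the backoff until ttm_eu_reserve_buffers() has locked them again.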

Regards,
Christian.

On 28.02.2018 11:37, Liu, Monk wrote:

Hi Christian,

In amdgpu_cs_parser_bos(), ttm_eu_backoff_reservation() is called if @need_pages is not empty. Is that safe?

You see, if two threads run cs_parser_bos() in parallel and thread 1 calls backoff_reservation and then just continues, all following reservation functions on the root BO are executed without the protection of the lock. Meanwhile, if thread 2 calls cs_parser_bos() in parallel at that time, it can immediately get the lock of the reservation object on the root BO (assuming threads 1 and 2 share the same VM), so both threads consider themselves to be under the protection of the reservation lock of the root BO. Do you think it's a race issue?
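To spell out the interleaving I am worried about (a hypothetical timeline, assuming thread 1 and thread 2 use the same VM and therefore the same root PD reservation object):

/* Hypothetical interleaving; both threads share one VM and thus one
 * root PD reservation object (bo->tbo.resv):
 *
 *   thread 1                                thread 2
 *   --------                                --------
 *   ttm_eu_reserve_buffers() -> locked      blocked on the root PD resv
 *   need_pages not empty
 *   ttm_eu_backoff_reservation()            root PD resv is unlocked now
 *   continues and touches the root          ttm_eu_reserve_buffers()
 *   PD resv without holding the lock        succeeds and also touches it
 *
 * In that case both threads would believe they are protected by the
 * reservation lock of the root BO.
 */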

You know that I recently hit a BUG() in reservation.c, and I also found that some weird bugs can be "fixed" by removing the kfree(obj->staged) in reservation_object_reserve_shared(). I think you are right on the "staged" part: it shouldn't be freed if everything goes by the rules (i.e. the lock of the BO is always held).
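For reference, the part of reservation_object_reserve_shared() I was experimenting with looks roughly like this (trimmed down from my reading of drivers/dma-buf/reservation.c, so take the details with a grain of salt):

/* Roughly the current reservation_object_reserve_shared(), trimmed
 * down.  The caller is expected to hold the reservation lock of the
 * BO; only under that rule is it safe to free and replace obj->staged.
 */
int reservation_object_reserve_shared(struct reservation_object *obj)
{
	struct reservation_object_list *fobj, *old;
	u32 max;

	old = reservation_object_get_list(obj);
	if (old && old->shared_max) {
		if (old->shared_count < old->shared_max) {
			/* Enough room for an in-place update, so drop any
			 * previously staged list.  This is the kfree()
			 * I removed in my experiment.
			 */
			kfree(obj->staged);
			obj->staged = NULL;
			return 0;
		}
		max = old->shared_max * 2;
	} else {
		max = 4;
	}

	/* (Re)allocate the staged list consumed later by
	 * reservation_object_add_shared_fence().
	 */
	fobj = krealloc(obj->staged, offsetof(typeof(*fobj), shared[max]),
			GFP_KERNEL);
	if (!fobj)
		return -ENOMEM;

	obj->staged = fobj;
	fobj->shared_max = max;
	return 0;
}

If a caller ever runs this without really holding the lock, another thread's staged list can be freed under it, so removing the kfree() would only hide the real problem rather than fix it.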

Thanks

/Monk

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx