* [PATCH] drm/amdgpu: Use dma_resv_lock instead in BO release_notify
@ 2021-05-21  5:26 xinhui pan
  2021-05-21 18:24 ` Felix Kuehling
  0 siblings, 1 reply; 4+ messages in thread
From: xinhui pan @ 2021-05-21  5:26 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher, Felix.Kuehling, xinhui pan, christian.koenig

The reservation object might be locked again by evict/swap after it
has been individualized. The race looks like this:
cpu 0					cpu 1
BO release				BO evict or swap
ttm_bo_individualize_resv {resv = &_resv}
					ttm_bo_evict_swapout_allowable
						dma_resv_trylock(resv)
->release_notify() {BUG_ON(!trylock(resv))}
					if (!ttm_bo_get_unless_zero))
						dma_resv_unlock(resv)
A failed trylock here is therefore not actually a bug, so use
dma_resv_lock instead.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 928e8d57cd08..beacb46265f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -318,7 +318,7 @@ int amdgpu_amdkfd_remove_fence_on_pt_pd_bos(struct amdgpu_bo *bo)
 	ef = container_of(dma_fence_get(&info->eviction_fence->base),
 			struct amdgpu_amdkfd_fence, base);
 
-	BUG_ON(!dma_resv_trylock(bo->tbo.base.resv));
+	dma_resv_lock(bo->tbo.base.resv, NULL);
 	ret = amdgpu_amdkfd_remove_eviction_fence(bo, ef);
 	dma_resv_unlock(bo->tbo.base.resv);
 
-- 
2.25.1
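
The interleaving described in the commit message can be replayed deterministically in userspace. The following is a minimal sketch in plain C11, assuming a toy lock built on atomic_flag stands in for the reservation object. The names toy_resv, toy_resv_trylock, toy_resv_unlock and replay_race are hypothetical and exist only for illustration; they are not kernel APIs, and both "CPUs" run as ordered steps on a single thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a reservation lock; NOT the kernel's dma_resv. */
struct toy_resv {
	atomic_flag locked;
};

static bool toy_resv_trylock(struct toy_resv *r)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set(&r->locked);
}

static void toy_resv_unlock(struct toy_resv *r)
{
	atomic_flag_clear(&r->locked);
}

/*
 * Replay the interleaving from the commit message on one thread.
 * Returns true if the release path's trylock failed while the
 * evict/swap path held the lock (the case that would trip BUG_ON).
 */
static bool replay_race(void)
{
	struct toy_resv resv = { ATOMIC_FLAG_INIT };
	bool release_failed;

	/* cpu 1: evict/swap takes the lock before checking the refcount */
	bool evict_got = toy_resv_trylock(&resv);
	assert(evict_got);

	/* cpu 0: release_notify tries the same lock and loses */
	release_failed = !toy_resv_trylock(&resv);

	/* cpu 1: the refcount was zero, so it backs off and unlocks */
	toy_resv_unlock(&resv);

	/* cpu 0: a retry (or a blocking lock) would now succeed */
	bool retry_got = toy_resv_trylock(&resv);
	assert(retry_got);
	toy_resv_unlock(&resv);

	return release_failed;
}
```

In this model the failed trylock is transient: the other side releases the lock as soon as it sees the object is being destroyed, which is the argument the commit message makes for not treating the failure as fatal.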

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH] drm/amdgpu: Use dma_resv_lock instead in BO release_notify
  2021-05-21  5:26 [PATCH] drm/amdgpu: Use dma_resv_lock instead in BO release_notify xinhui pan
@ 2021-05-21 18:24 ` Felix Kuehling
  2021-05-21 18:27   ` Christian König
  2021-05-22  1:48   ` Re: " Pan, Xinhui
  0 siblings, 2 replies; 4+ messages in thread
From: Felix Kuehling @ 2021-05-21 18:24 UTC (permalink / raw)
  To: xinhui pan, amd-gfx; +Cc: alexander.deucher, christian.koenig


On 2021-05-21 at 1:26 a.m., xinhui pan wrote:
> The reservation object might be locked again by evict/swap after
> individualized. The race is like below.
> cpu 0					cpu 1
> BO release				BO evict or swap
> ttm_bo_individualize_resv {resv = &_resv}
> 					ttm_bo_evict_swapout_allowable
> 						dma_resv_trylock(resv)
> ->release_notify() {BUG_ON(!trylock(resv))}
> 					if (!ttm_bo_get_unless_zero))
> 						dma_resv_unlock(resv)
> Actually this is not a bug if trylock fails. So use dma_resv_lock
> instead.

Please test this with LOCKDEP enabled. I believe the trylock here was
needed to avoid potential deadlocks. Maybe Christian can fill in more
details.

Regards,
  Felix


>
> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 928e8d57cd08..beacb46265f8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -318,7 +318,7 @@ int amdgpu_amdkfd_remove_fence_on_pt_pd_bos(struct amdgpu_bo *bo)
>  	ef = container_of(dma_fence_get(&info->eviction_fence->base),
>  			struct amdgpu_amdkfd_fence, base);
>  
> -	BUG_ON(!dma_resv_trylock(bo->tbo.base.resv));
> +	dma_resv_lock(bo->tbo.base.resv, NULL);
>  	ret = amdgpu_amdkfd_remove_eviction_fence(bo, ef);
>  	dma_resv_unlock(bo->tbo.base.resv);
>  

* Re: [PATCH] drm/amdgpu: Use dma_resv_lock instead in BO release_notify
  2021-05-21 18:24 ` Felix Kuehling
@ 2021-05-21 18:27   ` Christian König
  2021-05-22  1:48   ` Re: " Pan, Xinhui
  1 sibling, 0 replies; 4+ messages in thread
From: Christian König @ 2021-05-21 18:27 UTC (permalink / raw)
  To: Felix Kuehling, xinhui pan, amd-gfx; +Cc: alexander.deucher

On 21.05.21 at 20:24, Felix Kuehling wrote:
> On 2021-05-21 at 1:26 a.m., xinhui pan wrote:
>> The reservation object might be locked again by evict/swap after
>> individualized. The race is like below.
>> cpu 0					cpu 1
>> BO release				BO evict or swap
>> ttm_bo_individualize_resv {resv = &_resv}
>> 					ttm_bo_evict_swapout_allowable
>> 						dma_resv_trylock(resv)
>> ->release_notify() {BUG_ON(!trylock(resv))}
>> 					if (!ttm_bo_get_unless_zero))
>> 						dma_resv_unlock(resv)
>> Actually this is not a bug if trylock fails. So use dma_resv_lock
>> instead.
> Please test this with LOCKDEP enabled. I believe the trylock here was
> needed to avoid potential deadlocks. Maybe Christian can fill in more
> details.

Unfortunately I don't remember why trylock was needed here either.

But yes, testing with lockdep enabled is a really good idea.

Regards,
Christian.

>
> Regards,
>    Felix
>
>
>> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index 928e8d57cd08..beacb46265f8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -318,7 +318,7 @@ int amdgpu_amdkfd_remove_fence_on_pt_pd_bos(struct amdgpu_bo *bo)
>>   	ef = container_of(dma_fence_get(&info->eviction_fence->base),
>>   			struct amdgpu_amdkfd_fence, base);
>>   
>> -	BUG_ON(!dma_resv_trylock(bo->tbo.base.resv));
>> +	dma_resv_lock(bo->tbo.base.resv, NULL);
>>   	ret = amdgpu_amdkfd_remove_eviction_fence(bo, ef);
>>   	dma_resv_unlock(bo->tbo.base.resv);
>>   


* Re: [PATCH] drm/amdgpu: Use dma_resv_lock instead in BO release_notify
  2021-05-21 18:24 ` Felix Kuehling
  2021-05-21 18:27   ` Christian König
@ 2021-05-22  1:48   ` Pan, Xinhui
  1 sibling, 0 replies; 4+ messages in thread
From: Pan, Xinhui @ 2021-05-22  1:48 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx; +Cc: Deucher, Alexander, Koenig, Christian

[AMD Official Use Only]

Oh, sorry about that. I see the lockdep warning too.
I think we use trylock elsewhere mostly because the lru_lock is held at those call sites.
So I think we can do something like the change below. Let me verify it later.

@@ -318,7 +318,9 @@ int amdgpu_amdkfd_remove_fence_on_pt_pd_bos(struct amdgpu_bo *bo)
        ef = container_of(dma_fence_get(&info->eviction_fence->base),
                        struct amdgpu_amdkfd_fence, base);

+       spin_lock(&bo->tbo.bdev->lru_lock);
        BUG_ON(!dma_resv_trylock(bo->tbo.base.resv));
+       spin_unlock(&bo->tbo.bdev->lru_lock);
        ret = amdgpu_amdkfd_remove_eviction_fence(bo, ef);
        dma_resv_unlock(bo->tbo.base.resv);
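
The point about trylock being used where a spinlock is held can be illustrated in userspace. The following is a minimal sketch, assuming a toy model in which a context counts the spinlocks it holds and a blocking acquire is only legal when that count is zero. All names here (toy_sleeping_lock, toy_context, toy_scan_under_spinlock and the helpers) are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model only: these types are hypothetical, not kernel APIs. */
struct toy_sleeping_lock {
	atomic_flag locked;
};

struct toy_context {
	int spinlocks_held;	/* >0 means sleeping is forbidden */
};

/* Never waits, so it is safe whether or not a spinlock is held. */
static bool toy_trylock(struct toy_sleeping_lock *l)
{
	return !atomic_flag_test_and_set(&l->locked);
}

static void toy_unlock(struct toy_sleeping_lock *l)
{
	atomic_flag_clear(&l->locked);
}

/* A blocking acquire may sleep, so it is only legal with no spinlock held. */
static bool toy_blocking_lock_allowed(const struct toy_context *ctx)
{
	return ctx->spinlocks_held == 0;
}

/*
 * Models a scan that runs under a spinlock (like an LRU walk):
 * it may only trylock each candidate and must skip busy ones.
 * Returns the number of candidates it managed to lock.
 */
static int toy_scan_under_spinlock(struct toy_sleeping_lock *locks, int n)
{
	struct toy_context ctx = { .spinlocks_held = 1 };
	int taken = 0;

	for (int i = 0; i < n; i++) {
		/* Blocking here would mean sleeping under a spinlock. */
		assert(!toy_blocking_lock_allowed(&ctx));
		if (toy_trylock(&locks[i])) {
			taken++;
			toy_unlock(&locks[i]);
		}
	}
	return taken;
}
```

The scan simply skips any candidate whose lock is busy, which is why a trylock at such call sites is expected to fail occasionally without indicating a bug.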


________________________________________
From: Kuehling, Felix <Felix.Kuehling@amd.com>
Sent: May 22, 2021 2:24
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Koenig, Christian
Subject: Re: [PATCH] drm/amdgpu: Use dma_resv_lock instead in BO release_notify


On 2021-05-21 at 1:26 a.m., xinhui pan wrote:
> The reservation object might be locked again by evict/swap after
> individualized. The race is like below.
> cpu 0                                 cpu 1
> BO release                            BO evict or swap
> ttm_bo_individualize_resv {resv = &_resv}
>                                       ttm_bo_evict_swapout_allowable
>                                               dma_resv_trylock(resv)
> ->release_notify() {BUG_ON(!trylock(resv))}
>                                       if (!ttm_bo_get_unless_zero))
>                                               dma_resv_unlock(resv)
> Actually this is not a bug if trylock fails. So use dma_resv_lock
> instead.

Please test this with LOCKDEP enabled. I believe the trylock here was
needed to avoid potential deadlocks. Maybe Christian can fill in more
details.

Regards,
  Felix


>
> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 928e8d57cd08..beacb46265f8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -318,7 +318,7 @@ int amdgpu_amdkfd_remove_fence_on_pt_pd_bos(struct amdgpu_bo *bo)
>       ef = container_of(dma_fence_get(&info->eviction_fence->base),
>                       struct amdgpu_amdkfd_fence, base);
>
> -     BUG_ON(!dma_resv_trylock(bo->tbo.base.resv));
> +     dma_resv_lock(bo->tbo.base.resv, NULL);
>       ret = amdgpu_amdkfd_remove_eviction_fence(bo, ef);
>       dma_resv_unlock(bo->tbo.base.resv);
>