amd-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] drm/kfd: fix ttm_bo_release warning
@ 2021-09-23  9:44 Lang Yu
  2021-09-23 11:39 ` Christian König
  0 siblings, 1 reply; 15+ messages in thread
From: Lang Yu @ 2021-09-23  9:44 UTC (permalink / raw)
  To: amd-gfx; +Cc: Felix Kuehling, Christian K nig, Huang Rui, Lang Yu

If a BO is pinned, unpin it before freeing it.

Call Trace:
	ttm_bo_put+0x30/0x50 [ttm]
	amdgpu_bo_unref+0x1e/0x30 [amdgpu]
	amdgpu_gem_object_free+0x34/0x50 [amdgpu]
	drm_gem_object_free+0x1d/0x30 [drm]
	amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu]
	kfd_process_device_free_bos+0xa3/0xf0 [amdgpu]
	kfd_process_wq_release+0x224/0x2e0 [amdgpu]
	process_one_work+0x220/0x3c0
	worker_thread+0x4d/0x3f0
	kthread+0x114/0x150
	process_one_work+0x3c0/0x3c0
	kthread_park+0x90/0x90
	ret_from_fork+0x22/0x30

Signed-off-by: Lang Yu <lang.yu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 2d6b2d77b738..7e693b064072 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1567,6 +1567,9 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
 	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
 		mem->va + bo_size * (1 + mem->aql_queue));
 
+	if (mem->bo->tbo.pin_count)
+		amdgpu_bo_unpin(mem->bo);
+
 	ret = unreserve_bo_and_vms(&ctx, false, false);
 
 	/* Remove from VM internal data structures */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] drm/kfd: fix ttm_bo_release warning
  2021-09-23  9:44 [PATCH] drm/kfd: fix ttm_bo_release warning Lang Yu
@ 2021-09-23 11:39 ` Christian König
  2021-09-23 12:09   ` Yu, Lang
  0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2021-09-23 11:39 UTC (permalink / raw)
  To: Lang Yu, amd-gfx; +Cc: Felix Kuehling, Christian K nig, Huang Rui



Am 23.09.21 um 11:44 schrieb Lang Yu:
> If a BO is pinned, unpin it before freeing it.
>
> Call Trace:
> 	ttm_bo_put+0x30/0x50 [ttm]
> 	amdgpu_bo_unref+0x1e/0x30 [amdgpu]
> 	amdgpu_gem_object_free+0x34/0x50 [amdgpu]
> 	drm_gem_object_free+0x1d/0x30 [drm]
> 	amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu]
> 	kfd_process_device_free_bos+0xa3/0xf0 [amdgpu]
> 	kfd_process_wq_release+0x224/0x2e0 [amdgpu]
> 	process_one_work+0x220/0x3c0
> 	worker_thread+0x4d/0x3f0
> 	kthread+0x114/0x150
> 	process_one_work+0x3c0/0x3c0
> 	kthread_park+0x90/0x90
> 	ret_from_fork+0x22/0x30
>
> Signed-off-by: Lang Yu <lang.yu@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 2d6b2d77b738..7e693b064072 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1567,6 +1567,9 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>   	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
>   		mem->va + bo_size * (1 + mem->aql_queue));
>   
> +	if (mem->bo->tbo.pin_count)
> +		amdgpu_bo_unpin(mem->bo);
> +

NAK, using mem->bo->tbo.pin_count like this is illegal.

Where was the BO pinned in the first place?

Christian.

>   	ret = unreserve_bo_and_vms(&ctx, false, false);
>   
>   	/* Remove from VM internal data structures */


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH] drm/kfd: fix ttm_bo_release warning
  2021-09-23 11:39 ` Christian König
@ 2021-09-23 12:09   ` Yu, Lang
  2021-09-23 12:23     ` Christian König
  2021-09-23 16:21     ` Felix Kuehling
  0 siblings, 2 replies; 15+ messages in thread
From: Yu, Lang @ 2021-09-23 12:09 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx; +Cc: Kuehling, Felix, Christian K nig, Huang, Ray

[AMD Official Use Only]



>-----Original Message-----
>From: Koenig, Christian <Christian.Koenig@amd.com>
>Sent: Thursday, September 23, 2021 7:40 PM
>To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig
><C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com>
>Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>
>
>
>Am 23.09.21 um 11:44 schrieb Lang Yu:
>> If a BO is pinned, unpin it before freeing it.
>>
>> Call Trace:
>> 	ttm_bo_put+0x30/0x50 [ttm]
>> 	amdgpu_bo_unref+0x1e/0x30 [amdgpu]
>> 	amdgpu_gem_object_free+0x34/0x50 [amdgpu]
>> 	drm_gem_object_free+0x1d/0x30 [drm]
>> 	amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu]
>> 	kfd_process_device_free_bos+0xa3/0xf0 [amdgpu]
>> 	kfd_process_wq_release+0x224/0x2e0 [amdgpu]
>> 	process_one_work+0x220/0x3c0
>> 	worker_thread+0x4d/0x3f0
>> 	kthread+0x114/0x150
>> 	process_one_work+0x3c0/0x3c0
>> 	kthread_park+0x90/0x90
>> 	ret_from_fork+0x22/0x30
>>
>> Signed-off-by: Lang Yu <lang.yu@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index 2d6b2d77b738..7e693b064072 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -1567,6 +1567,9 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>   	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
>>   		mem->va + bo_size * (1 + mem->aql_queue));
>>
>> +	if (mem->bo->tbo.pin_count)
>> +		amdgpu_bo_unpin(mem->bo);
>> +
>
>NAK, using mem->bo->tbo.pin_count like this is illegal.

I didn't  get your point. I referred to function-"void amdgpu_bo_unpin(struct amdgpu_bo *bo)",
which uses it like this.

>Where was the BO pinned in the first place?

I found two places:

	ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags,
				      &kaddr);

	ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base,
				      KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr);
Regards,
Lang

>Christian.
>
>>   	ret = unreserve_bo_and_vms(&ctx, false, false);
>>
>>   	/* Remove from VM internal data structures */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] drm/kfd: fix ttm_bo_release warning
  2021-09-23 12:09   ` Yu, Lang
@ 2021-09-23 12:23     ` Christian König
  2021-09-23 14:24       ` Yu, Lang
  2021-09-23 16:21     ` Felix Kuehling
  1 sibling, 1 reply; 15+ messages in thread
From: Christian König @ 2021-09-23 12:23 UTC (permalink / raw)
  To: Yu, Lang, amd-gfx; +Cc: Kuehling, Felix, Christian K nig, Huang, Ray

Am 23.09.21 um 14:09 schrieb Yu, Lang:
> [AMD Official Use Only]
>
>
>
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: Thursday, September 23, 2021 7:40 PM
>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig
>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com>
>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>
>>
>>
>> Am 23.09.21 um 11:44 schrieb Lang Yu:
>>> If a BO is pinned, unpin it before freeing it.
>>>
>>> Call Trace:
>>> 	ttm_bo_put+0x30/0x50 [ttm]
>>> 	amdgpu_bo_unref+0x1e/0x30 [amdgpu]
>>> 	amdgpu_gem_object_free+0x34/0x50 [amdgpu]
>>> 	drm_gem_object_free+0x1d/0x30 [drm]
>>> 	amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu]
>>> 	kfd_process_device_free_bos+0xa3/0xf0 [amdgpu]
>>> 	kfd_process_wq_release+0x224/0x2e0 [amdgpu]
>>> 	process_one_work+0x220/0x3c0
>>> 	worker_thread+0x4d/0x3f0
>>> 	kthread+0x114/0x150
>>> 	process_one_work+0x3c0/0x3c0
>>> 	kthread_park+0x90/0x90
>>> 	ret_from_fork+0x22/0x30
>>>
>>> Signed-off-by: Lang Yu <lang.yu@amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++
>>>    1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> index 2d6b2d77b738..7e693b064072 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> @@ -1567,6 +1567,9 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>>    	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
>>>    		mem->va + bo_size * (1 + mem->aql_queue));
>>>
>>> +	if (mem->bo->tbo.pin_count)
>>> +		amdgpu_bo_unpin(mem->bo);
>>> +
>> NAK, using mem->bo->tbo.pin_count like this is illegal.
> I didn't  get your point. I referred to function-"void amdgpu_bo_unpin(struct amdgpu_bo *bo)",
> which uses it like this.

What amdgpu_bo_unpin() does is the following:

         ttm_bo_unpin(&bo->tbo);
         if (bo->tbo.pin_count)
                 return;
....

In other words we take further actions based on if the buffer us is 
still pinned or not after an unpin operation.

What you try to do here is unpinning the BO when it is pinned 
independent if somebody else or our code has pinned it before.

That can lead to all kind of problems and is clearly illegal.

>> Where was the BO pinned in the first place?
> I found two places:
>
> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags,
> 				      &kaddr);
>
> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base,
> 				      KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr);

Well then you need to figure out where that memory is freed again and 
place the unpin appropriately.

General rule of thumb is that calls to amdgpu_bo_pin/amdgpu_bo_unpin 
should be balanced.

Regards,
Christian.

> Regards,
> Lang
>
>> Christian.
>>
>>>    	ret = unreserve_bo_and_vms(&ctx, false, false);
>>>
>>>    	/* Remove from VM internal data structures */


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH] drm/kfd: fix ttm_bo_release warning
  2021-09-23 12:23     ` Christian König
@ 2021-09-23 14:24       ` Yu, Lang
  2021-09-23 14:52         ` Christian König
  0 siblings, 1 reply; 15+ messages in thread
From: Yu, Lang @ 2021-09-23 14:24 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx; +Cc: Kuehling, Felix, Huang, Ray

[AMD Official Use Only]


>-----Original Message-----
>From: Koenig, Christian <Christian.Koenig@amd.com>
>Sent: Thursday, September 23, 2021 8:24 PM
>To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig
><C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com>
>Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>
>Am 23.09.21 um 14:09 schrieb Yu, Lang:
>> [AMD Official Use Only]
>>
>>
>>
>>> -----Original Message-----
>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>> Sent: Thursday, September 23, 2021 7:40 PM
>>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig
>>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com>
>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>>
>>>
>>>
>>> Am 23.09.21 um 11:44 schrieb Lang Yu:
>>>> If a BO is pinned, unpin it before freeing it.
>>>>
>>>> Call Trace:
>>>> 	ttm_bo_put+0x30/0x50 [ttm]
>>>> 	amdgpu_bo_unref+0x1e/0x30 [amdgpu]
>>>> 	amdgpu_gem_object_free+0x34/0x50 [amdgpu]
>>>> 	drm_gem_object_free+0x1d/0x30 [drm]
>>>> 	amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu]
>>>> 	kfd_process_device_free_bos+0xa3/0xf0 [amdgpu]
>>>> 	kfd_process_wq_release+0x224/0x2e0 [amdgpu]
>>>> 	process_one_work+0x220/0x3c0
>>>> 	worker_thread+0x4d/0x3f0
>>>> 	kthread+0x114/0x150
>>>> 	process_one_work+0x3c0/0x3c0
>>>> 	kthread_park+0x90/0x90
>>>> 	ret_from_fork+0x22/0x30
>>>>
>>>> Signed-off-by: Lang Yu <lang.yu@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++
>>>>    1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> index 2d6b2d77b738..7e693b064072 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> @@ -1567,6 +1567,9 @@ int
>amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>>>    	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
>>>>    		mem->va + bo_size * (1 + mem->aql_queue));
>>>>
>>>> +	if (mem->bo->tbo.pin_count)
>>>> +		amdgpu_bo_unpin(mem->bo);
>>>> +
>>> NAK, using mem->bo->tbo.pin_count like this is illegal.
>> I didn't  get your point. I referred to function-"void
>> amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this.
>
>What amdgpu_bo_unpin() does is the following:
>
>         ttm_bo_unpin(&bo->tbo);
>         if (bo->tbo.pin_count)
>                 return;
>....
>
>In other words we take further actions based on if the buffer us is still pinned or
>not after an unpin operation.
>
>What you try to do here is unpinning the BO when it is pinned independent if
>somebody else or our code has pinned it before.

Hi Christian,

Thanks for your explanation and advice. I got your point.
Actually, these BOs are allocated and pinned during a kfd process lifecycle.
I will try to add a flag into struct kgd_mem to indicate which BO is pined 
and should be unpinned. Which will make amdgpu_bo_pin/amdgpu_bo_unpin 
calls balanced. Thanks!

Regards,
Lang
>
>That can lead to all kind of problems and is clearly illegal.
>
>>> Where was the BO pinned in the first place?
>> I found two places:
>>
>> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags,
>> 				      &kaddr);
>>
>> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base,
>> 				      KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr);
>
>Well then you need to figure out where that memory is freed again and place the
>unpin appropriately.
>
>General rule of thumb is that calls to amdgpu_bo_pin/amdgpu_bo_unpin should
>be balanced.
>
>Regards,
>Christian.
>
>> Regards,
>> Lang
>>
>>> Christian.
>>>
>>>>    	ret = unreserve_bo_and_vms(&ctx, false, false);
>>>>
>>>>    	/* Remove from VM internal data structures */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] drm/kfd: fix ttm_bo_release warning
  2021-09-23 14:24       ` Yu, Lang
@ 2021-09-23 14:52         ` Christian König
  2021-09-24  5:35           ` Yu, Lang
  0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2021-09-23 14:52 UTC (permalink / raw)
  To: Yu, Lang, amd-gfx; +Cc: Kuehling, Felix, Huang, Ray

Am 23.09.21 um 16:24 schrieb Yu, Lang:
> [AMD Official Use Only]
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: Thursday, September 23, 2021 8:24 PM
>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig
>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com>
>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>
>> Am 23.09.21 um 14:09 schrieb Yu, Lang:
>>> [AMD Official Use Only]
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>>> Sent: Thursday, September 23, 2021 7:40 PM
>>>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>>>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig
>>>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com>
>>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>>>
>>>>
>>>>
>>>> Am 23.09.21 um 11:44 schrieb Lang Yu:
>>>>> If a BO is pinned, unpin it before freeing it.
>>>>>
>>>>> Call Trace:
>>>>> 	ttm_bo_put+0x30/0x50 [ttm]
>>>>> 	amdgpu_bo_unref+0x1e/0x30 [amdgpu]
>>>>> 	amdgpu_gem_object_free+0x34/0x50 [amdgpu]
>>>>> 	drm_gem_object_free+0x1d/0x30 [drm]
>>>>> 	amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu]
>>>>> 	kfd_process_device_free_bos+0xa3/0xf0 [amdgpu]
>>>>> 	kfd_process_wq_release+0x224/0x2e0 [amdgpu]
>>>>> 	process_one_work+0x220/0x3c0
>>>>> 	worker_thread+0x4d/0x3f0
>>>>> 	kthread+0x114/0x150
>>>>> 	process_one_work+0x3c0/0x3c0
>>>>> 	kthread_park+0x90/0x90
>>>>> 	ret_from_fork+0x22/0x30
>>>>>
>>>>> Signed-off-by: Lang Yu <lang.yu@amd.com>
>>>>> ---
>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++
>>>>>     1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>> index 2d6b2d77b738..7e693b064072 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>> @@ -1567,6 +1567,9 @@ int
>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>>>>     	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
>>>>>     		mem->va + bo_size * (1 + mem->aql_queue));
>>>>>
>>>>> +	if (mem->bo->tbo.pin_count)
>>>>> +		amdgpu_bo_unpin(mem->bo);
>>>>> +
>>>> NAK, using mem->bo->tbo.pin_count like this is illegal.
>>> I didn't  get your point. I referred to function-"void
>>> amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this.
>> What amdgpu_bo_unpin() does is the following:
>>
>>          ttm_bo_unpin(&bo->tbo);
>>          if (bo->tbo.pin_count)
>>                  return;
>> ....
>>
>> In other words we take further actions based on if the buffer us is still pinned or
>> not after an unpin operation.
>>
>> What you try to do here is unpinning the BO when it is pinned independent if
>> somebody else or our code has pinned it before.
> Hi Christian,
>
> Thanks for your explanation and advice. I got your point.
> Actually, these BOs are allocated and pinned during a kfd process lifecycle.
> I will try to add a flag into struct kgd_mem to indicate which BO is pined
> and should be unpinned. Which will make amdgpu_bo_pin/amdgpu_bo_unpin
> calls balanced. Thanks!

That isn't to much better. The real solution would be to unpin them when 
the kfd process is destroyed.

Regards,
Christian.

>
> Regards,
> Lang
>> That can lead to all kind of problems and is clearly illegal.
>>
>>>> Where was the BO pinned in the first place?
>>> I found two places:
>>>
>>> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags,
>>> 				      &kaddr);
>>>
>>> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base,
>>> 				      KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr);
>> Well then you need to figure out where that memory is freed again and place the
>> unpin appropriately.
>>
>> General rule of thumb is that calls to amdgpu_bo_pin/amdgpu_bo_unpin should
>> be balanced.
>>
>> Regards,
>> Christian.
>>
>>> Regards,
>>> Lang
>>>
>>>> Christian.
>>>>
>>>>>     	ret = unreserve_bo_and_vms(&ctx, false, false);
>>>>>
>>>>>     	/* Remove from VM internal data structures */


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] drm/kfd: fix ttm_bo_release warning
  2021-09-23 12:09   ` Yu, Lang
  2021-09-23 12:23     ` Christian König
@ 2021-09-23 16:21     ` Felix Kuehling
  2021-09-24  5:35       ` Yu, Lang
  1 sibling, 1 reply; 15+ messages in thread
From: Felix Kuehling @ 2021-09-23 16:21 UTC (permalink / raw)
  To: Yu, Lang, Koenig, Christian, amd-gfx; +Cc: Huang, Ray


On 2021-09-23 8:09 a.m., Yu, Lang wrote:
> [AMD Official Use Only]
>
>
>
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: Thursday, September 23, 2021 7:40 PM
>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig
>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com>
>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>
>>
>>
>> Am 23.09.21 um 11:44 schrieb Lang Yu:
>>> If a BO is pinned, unpin it before freeing it.
>>>
>>> Call Trace:
>>> 	ttm_bo_put+0x30/0x50 [ttm]
>>> 	amdgpu_bo_unref+0x1e/0x30 [amdgpu]
>>> 	amdgpu_gem_object_free+0x34/0x50 [amdgpu]
>>> 	drm_gem_object_free+0x1d/0x30 [drm]
>>> 	amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu]
>>> 	kfd_process_device_free_bos+0xa3/0xf0 [amdgpu]
>>> 	kfd_process_wq_release+0x224/0x2e0 [amdgpu]
>>> 	process_one_work+0x220/0x3c0
>>> 	worker_thread+0x4d/0x3f0
>>> 	kthread+0x114/0x150
>>> 	process_one_work+0x3c0/0x3c0
>>> 	kthread_park+0x90/0x90
>>> 	ret_from_fork+0x22/0x30
>>>
>>> Signed-off-by: Lang Yu <lang.yu@amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++
>>>    1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> index 2d6b2d77b738..7e693b064072 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> @@ -1567,6 +1567,9 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>>    	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
>>>    		mem->va + bo_size * (1 + mem->aql_queue));
>>>
>>> +	if (mem->bo->tbo.pin_count)
>>> +		amdgpu_bo_unpin(mem->bo);
>>> +
>> NAK, using mem->bo->tbo.pin_count like this is illegal.
> I didn't  get your point. I referred to function-"void amdgpu_bo_unpin(struct amdgpu_bo *bo)",
> which uses it like this.
>
>> Where was the BO pinned in the first place?
> I found two places:
>
> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags,
> 				      &kaddr);
>
> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base,
> 				      KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr);

These two allocations are created by the kernel mode driver. There is 
another case where a user-allocated BO can get pinned because we need to 
kmap it (in kfd_ioctl_create_event).

Regards,
   Felix


> Regards,
> Lang
>
>> Christian.
>>
>>>    	ret = unreserve_bo_and_vms(&ctx, false, false);
>>>
>>>    	/* Remove from VM internal data structures */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH] drm/kfd: fix ttm_bo_release warning
  2021-09-23 16:21     ` Felix Kuehling
@ 2021-09-24  5:35       ` Yu, Lang
  0 siblings, 0 replies; 15+ messages in thread
From: Yu, Lang @ 2021-09-24  5:35 UTC (permalink / raw)
  To: Kuehling, Felix, Koenig, Christian, amd-gfx; +Cc: Huang, Ray

[AMD Official Use Only]



>-----Original Message-----
>From: Kuehling, Felix <Felix.Kuehling@amd.com>
>Sent: Friday, September 24, 2021 12:21 AM
>To: Yu, Lang <Lang.Yu@amd.com>; Koenig, Christian
><Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
>Cc: Huang, Ray <Ray.Huang@amd.com>
>Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>
>
>On 2021-09-23 8:09 a.m., Yu, Lang wrote:
>> [AMD Official Use Only]
>>
>>
>>
>>> -----Original Message-----
>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>> Sent: Thursday, September 23, 2021 7:40 PM
>>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig
>>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com>
>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>>
>>>
>>>
>>> Am 23.09.21 um 11:44 schrieb Lang Yu:
>>>> If a BO is pinned, unpin it before freeing it.
>>>>
>>>> Call Trace:
>>>> 	ttm_bo_put+0x30/0x50 [ttm]
>>>> 	amdgpu_bo_unref+0x1e/0x30 [amdgpu]
>>>> 	amdgpu_gem_object_free+0x34/0x50 [amdgpu]
>>>> 	drm_gem_object_free+0x1d/0x30 [drm]
>>>> 	amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu]
>>>> 	kfd_process_device_free_bos+0xa3/0xf0 [amdgpu]
>>>> 	kfd_process_wq_release+0x224/0x2e0 [amdgpu]
>>>> 	process_one_work+0x220/0x3c0
>>>> 	worker_thread+0x4d/0x3f0
>>>> 	kthread+0x114/0x150
>>>> 	process_one_work+0x3c0/0x3c0
>>>> 	kthread_park+0x90/0x90
>>>> 	ret_from_fork+0x22/0x30
>>>>
>>>> Signed-off-by: Lang Yu <lang.yu@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++
>>>>    1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> index 2d6b2d77b738..7e693b064072 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> @@ -1567,6 +1567,9 @@ int
>amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>>>    	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
>>>>    		mem->va + bo_size * (1 + mem->aql_queue));
>>>>
>>>> +	if (mem->bo->tbo.pin_count)
>>>> +		amdgpu_bo_unpin(mem->bo);
>>>> +
>>> NAK, using mem->bo->tbo.pin_count like this is illegal.
>> I didn't  get your point. I referred to function-"void
>> amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this.
>>
>>> Where was the BO pinned in the first place?
>> I found two places:
>>
>> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags,
>> 				      &kaddr);
>>
>> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base,
>> 				      KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr);
>
>These two allocations are created by the kernel mode driver. There is another
>case where a user-allocated BO can get pinned because we need to kmap it (in
>kfd_ioctl_create_event).
>
>Regards,
>   Felix

Yes, these BOs will not be freed until a kfd process is destroyed.
I will make a v2 patch, please help review. Thanks!

Regards,
Lang 
>
>> Regards,
>> Lang
>>
>>> Christian.
>>>
>>>>    	ret = unreserve_bo_and_vms(&ctx, false, false);
>>>>
>>>>    	/* Remove from VM internal data structures */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH] drm/kfd: fix ttm_bo_release warning
  2021-09-23 14:52         ` Christian König
@ 2021-09-24  5:35           ` Yu, Lang
  2021-09-24  5:42             ` Christian König
  0 siblings, 1 reply; 15+ messages in thread
From: Yu, Lang @ 2021-09-24  5:35 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx; +Cc: Kuehling, Felix, Huang, Ray

[AMD Official Use Only]



>-----Original Message-----
>From: Koenig, Christian <Christian.Koenig@amd.com>
>Sent: Thursday, September 23, 2021 10:52 PM
>To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Huang, Ray
><Ray.Huang@amd.com>
>Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>
>Am 23.09.21 um 16:24 schrieb Yu, Lang:
>> [AMD Official Use Only]
>>> -----Original Message-----
>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>> Sent: Thursday, September 23, 2021 8:24 PM
>>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig
>>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com>
>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>>
>>> Am 23.09.21 um 14:09 schrieb Yu, Lang:
>>>> [AMD Official Use Only]
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>>>> Sent: Thursday, September 23, 2021 7:40 PM
>>>>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>>>>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig
>>>>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com>
>>>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>>>>
>>>>>
>>>>>
>>>>> Am 23.09.21 um 11:44 schrieb Lang Yu:
>>>>>> If a BO is pinned, unpin it before freeing it.
>>>>>>
>>>>>> Call Trace:
>>>>>> 	ttm_bo_put+0x30/0x50 [ttm]
>>>>>> 	amdgpu_bo_unref+0x1e/0x30 [amdgpu]
>>>>>> 	amdgpu_gem_object_free+0x34/0x50 [amdgpu]
>>>>>> 	drm_gem_object_free+0x1d/0x30 [drm]
>>>>>> 	amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu]
>>>>>> 	kfd_process_device_free_bos+0xa3/0xf0 [amdgpu]
>>>>>> 	kfd_process_wq_release+0x224/0x2e0 [amdgpu]
>>>>>> 	process_one_work+0x220/0x3c0
>>>>>> 	worker_thread+0x4d/0x3f0
>>>>>> 	kthread+0x114/0x150
>>>>>> 	process_one_work+0x3c0/0x3c0
>>>>>> 	kthread_park+0x90/0x90
>>>>>> 	ret_from_fork+0x22/0x30
>>>>>>
>>>>>> Signed-off-by: Lang Yu <lang.yu@amd.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++
>>>>>>     1 file changed, 3 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>> index 2d6b2d77b738..7e693b064072 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>> @@ -1567,6 +1567,9 @@ int
>>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>>>>>     	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
>>>>>>     		mem->va + bo_size * (1 + mem->aql_queue));
>>>>>>
>>>>>> +	if (mem->bo->tbo.pin_count)
>>>>>> +		amdgpu_bo_unpin(mem->bo);
>>>>>> +
>>>>> NAK, using mem->bo->tbo.pin_count like this is illegal.
>>>> I didn't  get your point. I referred to function-"void
>>>> amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this.
>>> What amdgpu_bo_unpin() does is the following:
>>>
>>>          ttm_bo_unpin(&bo->tbo);
>>>          if (bo->tbo.pin_count)
>>>                  return;
>>> ....
>>>
>>> In other words we take further actions based on if the buffer us is
>>> still pinned or not after an unpin operation.
>>>
>>> What you try to do here is unpinning the BO when it is pinned
>>> independent if somebody else or our code has pinned it before.
>> Hi Christian,
>>
>> Thanks for your explanation and advice. I got your point.
>> Actually, these BOs are allocated and pinned during a kfd process lifecycle.
>> I will try to add a flag into struct kgd_mem to indicate which BO is
>> pined and should be unpinned. Which will make
>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks!
>
>That isn't to much better. The real solution would be to unpin them when the kfd
>process is destroyed.

Yes, will unpin them when the kfd process is destroyed.
But we should indicate which BO is pinned and should be unpinned. Right?

Regards,
Lang
 
>Regards,
>Christian.
>
>>
>> Regards,
>> Lang
>>> That can lead to all kind of problems and is clearly illegal.
>>>
>>>>> Where was the BO pinned in the first place?
>>>> I found two places:
>>>>
>>>> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags,
>>>> 				      &kaddr);
>>>>
>>>> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base,
>>>> 				      KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr);
>>> Well then you need to figure out where that memory is freed again and
>>> place the unpin appropriately.
>>>
>>> General rule of thumb is that calls to amdgpu_bo_pin/amdgpu_bo_unpin
>>> should be balanced.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> Regards,
>>>> Lang
>>>>
>>>>> Christian.
>>>>>
>>>>>>     	ret = unreserve_bo_and_vms(&ctx, false, false);
>>>>>>
>>>>>>     	/* Remove from VM internal data structures */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] drm/kfd: fix ttm_bo_release warning
  2021-09-24  5:35           ` Yu, Lang
@ 2021-09-24  5:42             ` Christian König
  2021-09-24  5:50               ` Yu, Lang
  0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2021-09-24  5:42 UTC (permalink / raw)
  To: Yu, Lang, amd-gfx; +Cc: Kuehling, Felix, Huang, Ray

Am 24.09.21 um 07:35 schrieb Yu, Lang:
> [AMD Official Use Only]
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: Thursday, September 23, 2021 10:52 PM
>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Huang, Ray
>> <Ray.Huang@amd.com>
>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>
>> Am 23.09.21 um 16:24 schrieb Yu, Lang:
>>> [AMD Official Use Only]
>>>> -----Original Message-----
>>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>>> Sent: Thursday, September 23, 2021 8:24 PM
>>>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>>>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig
>>>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com>
>>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>>>
>>>> Am 23.09.21 um 14:09 schrieb Yu, Lang:
>>>>> [AMD Official Use Only]
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>>>>> Sent: Thursday, September 23, 2021 7:40 PM
>>>>>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>>>>>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig
>>>>>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com>
>>>>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>>>>>
>>>>>>
>>>>>>
>>>>>> Am 23.09.21 um 11:44 schrieb Lang Yu:
>>>>>>> If a BO is pinned, unpin it before freeing it.
>>>>>>>
>>>>>>> Call Trace:
>>>>>>> 	ttm_bo_put+0x30/0x50 [ttm]
>>>>>>> 	amdgpu_bo_unref+0x1e/0x30 [amdgpu]
>>>>>>> 	amdgpu_gem_object_free+0x34/0x50 [amdgpu]
>>>>>>> 	drm_gem_object_free+0x1d/0x30 [drm]
>>>>>>> 	amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu]
>>>>>>> 	kfd_process_device_free_bos+0xa3/0xf0 [amdgpu]
>>>>>>> 	kfd_process_wq_release+0x224/0x2e0 [amdgpu]
>>>>>>> 	process_one_work+0x220/0x3c0
>>>>>>> 	worker_thread+0x4d/0x3f0
>>>>>>> 	kthread+0x114/0x150
>>>>>>> 	process_one_work+0x3c0/0x3c0
>>>>>>> 	kthread_park+0x90/0x90
>>>>>>> 	ret_from_fork+0x22/0x30
>>>>>>>
>>>>>>> Signed-off-by: Lang Yu <lang.yu@amd.com>
>>>>>>> ---
>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++
>>>>>>>      1 file changed, 3 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>>> index 2d6b2d77b738..7e693b064072 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>>> @@ -1567,6 +1567,9 @@ int
>>>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>>>>>>      	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
>>>>>>>      		mem->va + bo_size * (1 + mem->aql_queue));
>>>>>>>
>>>>>>> +	if (mem->bo->tbo.pin_count)
>>>>>>> +		amdgpu_bo_unpin(mem->bo);
>>>>>>> +
>>>>>> NAK, using mem->bo->tbo.pin_count like this is illegal.
>>>>> I didn't  get your point. I referred to function-"void
>>>>> amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this.
>>>> What amdgpu_bo_unpin() does is the following:
>>>>
>>>>           ttm_bo_unpin(&bo->tbo);
>>>>           if (bo->tbo.pin_count)
>>>>                   return;
>>>> ....
>>>>
>>>> In other words we take further actions based on if the buffer us is
>>>> still pinned or not after an unpin operation.
>>>>
>>>> What you try to do here is unpinning the BO when it is pinned
>>>> independent if somebody else or our code has pinned it before.
>>> Hi Christian,
>>>
>>> Thanks for your explanation and advice. I got your point.
>>> Actually, these BOs are allocated and pinned during a kfd process lifecycle.
>>> I will try to add a flag into struct kgd_mem to indicate which BO is
>>> pined and should be unpinned. Which will make
>>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks!
>> That isn't to much better. The real solution would be to unpin them when the kfd
>> process is destroyed.
> Yes, will unpin them when the kfd process is destroyed.
> But we should indicate which BO is pinned and should be unpinned. Right?

Well not with a flag or something like that.

The knowledge which BO is pinned and needs to be unpinned should come 
from the control logic and not be papered over by some general handling. 
That's the background why we have removed the general handling for this 
from TTM in the first place.

In other words when you need to pin a BO because it is kmapped it should 
be unpinned when it is kunmapped and if you don't kunmap at all then 
there is something wrong with the code structure from a higher level 
point of view.

Regards,
Christian.

>
> Regards,
> Lang
>   
>> Regards,
>> Christian.
>>
>>> Regards,
>>> Lang
>>>> That can lead to all kind of problems and is clearly illegal.
>>>>
>>>>>> Where was the BO pinned in the first place?
>>>>> I found two places:
>>>>>
>>>>> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags,
>>>>> 				      &kaddr);
>>>>>
>>>>> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base,
>>>>> 				      KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr);
>>>> Well then you need to figure out where that memory is freed again and
>>>> place the unpin appropriately.
>>>>
>>>> General rule of thumb is that calls to amdgpu_bo_pin/amdgpu_bo_unpin
>>>> should be balanced.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Regards,
>>>>> Lang
>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>>      	ret = unreserve_bo_and_vms(&ctx, false, false);
>>>>>>>
>>>>>>>      	/* Remove from VM internal data structures */


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH] drm/kfd: fix ttm_bo_release warning
  2021-09-24  5:42             ` Christian König
@ 2021-09-24  5:50               ` Yu, Lang
  2021-09-24  5:54                 ` Christian König
  0 siblings, 1 reply; 15+ messages in thread
From: Yu, Lang @ 2021-09-24  5:50 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx; +Cc: Kuehling, Felix, Huang, Ray

[AMD Official Use Only]



>-----Original Message-----
>From: Koenig, Christian <Christian.Koenig@amd.com>
>Sent: Friday, September 24, 2021 1:42 PM
>To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Huang, Ray
><Ray.Huang@amd.com>
>Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>
>Am 24.09.21 um 07:35 schrieb Yu, Lang:
>> [AMD Official Use Only]
>>> -----Original Message-----
>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>> Sent: Thursday, September 23, 2021 10:52 PM
>>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Huang, Ray
>>> <Ray.Huang@amd.com>
>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>>
>>> Am 23.09.21 um 16:24 schrieb Yu, Lang:
>>>> [AMD Official Use Only]
>>>>> -----Original Message-----
>>>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>>>> Sent: Thursday, September 23, 2021 8:24 PM
>>>>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>>>>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig
>>>>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com>
>>>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>>>>
>>>>> Am 23.09.21 um 14:09 schrieb Yu, Lang:
>>>>>> [AMD Official Use Only]
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>>>>>> Sent: Thursday, September 23, 2021 7:40 PM
>>>>>>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>>>>>>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig
>>>>>>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com>
>>>>>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Am 23.09.21 um 11:44 schrieb Lang Yu:
>>>>>>>> If a BO is pinned, unpin it before freeing it.
>>>>>>>>
>>>>>>>> Call Trace:
>>>>>>>> 	ttm_bo_put+0x30/0x50 [ttm]
>>>>>>>> 	amdgpu_bo_unref+0x1e/0x30 [amdgpu]
>>>>>>>> 	amdgpu_gem_object_free+0x34/0x50 [amdgpu]
>>>>>>>> 	drm_gem_object_free+0x1d/0x30 [drm]
>>>>>>>> 	amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0
>[amdgpu]
>>>>>>>> 	kfd_process_device_free_bos+0xa3/0xf0 [amdgpu]
>>>>>>>> 	kfd_process_wq_release+0x224/0x2e0 [amdgpu]
>>>>>>>> 	process_one_work+0x220/0x3c0
>>>>>>>> 	worker_thread+0x4d/0x3f0
>>>>>>>> 	kthread+0x114/0x150
>>>>>>>> 	process_one_work+0x3c0/0x3c0
>>>>>>>> 	kthread_park+0x90/0x90
>>>>>>>> 	ret_from_fork+0x22/0x30
>>>>>>>>
>>>>>>>> Signed-off-by: Lang Yu <lang.yu@amd.com>
>>>>>>>> ---
>>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++
>>>>>>>>      1 file changed, 3 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>>>> index 2d6b2d77b738..7e693b064072 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>>>> @@ -1567,6 +1567,9 @@ int
>>>>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>>>>>>>      	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
>>>>>>>>      		mem->va + bo_size * (1 + mem->aql_queue));
>>>>>>>>
>>>>>>>> +	if (mem->bo->tbo.pin_count)
>>>>>>>> +		amdgpu_bo_unpin(mem->bo);
>>>>>>>> +
>>>>>>> NAK, using mem->bo->tbo.pin_count like this is illegal.
>>>>>> I didn't  get your point. I referred to function-"void
>>>>>> amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this.
>>>>> What amdgpu_bo_unpin() does is the following:
>>>>>
>>>>>           ttm_bo_unpin(&bo->tbo);
>>>>>           if (bo->tbo.pin_count)
>>>>>                   return;
>>>>> ....
>>>>>
>>>>> In other words we take further actions based on if the buffer us is
>>>>> still pinned or not after an unpin operation.
>>>>>
>>>>> What you try to do here is unpinning the BO when it is pinned
>>>>> independent if somebody else or our code has pinned it before.
>>>> Hi Christian,
>>>>
>>>> Thanks for your explanation and advice. I got your point.
>>>> Actually, these BOs are allocated and pinned during a kfd process lifecycle.
>>>> I will try to add a flag into struct kgd_mem to indicate which BO is
>>>> pined and should be unpinned. Which will make
>>>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks!
>>> That isn't to much better. The real solution would be to unpin them
>>> when the kfd process is destroyed.
>> Yes, will unpin them when the kfd process is destroyed.
>> But we should indicate which BO is pinned and should be unpinned. Right?
>
>Well not with a flag or something like that.
>
>The knowledge which BO is pinned and needs to be unpinned should come from
>the control logic and not be papered over by some general handling.
>That's the background why we have removed the general handling for this from
>TTM in the first place.
>
>In other words when you need to pin a BO because it is kmapped it should be
>unpinned when it is kunmapped and if you don't kunmap at all then there is
>something wrong with the code structure from a higher level point of view.

Yes, this function "amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel" did a kmap,
but without a kunmap when the kfd process is destroyed. The flag actually indicates kmap/kunmap.

Regards,
Lang

>Regards,
>Christian.
>
>>
>> Regards,
>> Lang
>>
>>> Regards,
>>> Christian.
>>>
>>>> Regards,
>>>> Lang
>>>>> That can lead to all kind of problems and is clearly illegal.
>>>>>
>>>>>>> Where was the BO pinned in the first place?
>>>>>> I found two places:
>>>>>>
>>>>>> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags,
>>>>>> 				      &kaddr);
>>>>>>
>>>>>> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base,
>>>>>> 				      KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr);
>>>>> Well then you need to figure out where that memory is freed again
>>>>> and place the unpin appropriately.
>>>>>
>>>>> General rule of thumb is that calls to
>>>>> amdgpu_bo_pin/amdgpu_bo_unpin should be balanced.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> Regards,
>>>>>> Lang
>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>>      	ret = unreserve_bo_and_vms(&ctx, false, false);
>>>>>>>>
>>>>>>>>      	/* Remove from VM internal data structures */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] drm/kfd: fix ttm_bo_release warning
  2021-09-24  5:50               ` Yu, Lang
@ 2021-09-24  5:54                 ` Christian König
  2021-09-24  6:34                   ` Yu, Lang
  0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2021-09-24  5:54 UTC (permalink / raw)
  To: Yu, Lang, amd-gfx; +Cc: Kuehling, Felix, Huang, Ray


Am 24.09.21 um 07:50 schrieb Yu, Lang:
> [AMD Official Use Only]
>> [SNIP]
>>>>> Hi Christian,
>>>>>
>>>>> Thanks for your explanation and advice. I got your point.
>>>>> Actually, these BOs are allocated and pinned during a kfd process lifecycle.
>>>>> I will try to add a flag into struct kgd_mem to indicate which BO is
>>>>> pined and should be unpinned. Which will make
>>>>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks!
>>>> That isn't to much better. The real solution would be to unpin them
>>>> when the kfd process is destroyed.
>>> Yes, will unpin them when the kfd process is destroyed.
>>> But we should indicate which BO is pinned and should be unpinned. Right?
>> Well not with a flag or something like that.
>>
>> The knowledge which BO is pinned and needs to be unpinned should come from
>> the control logic and not be papered over by some general handling.
>> That's the background why we have removed the general handling for this from
>> TTM in the first place.
>>
>> In other words when you need to pin a BO because it is kmapped it should be
>> unpinned when it is kunmapped and if you don't kunmap at all then there is
>> something wrong with the code structure from a higher level point of view.
> Yes, this function "amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel" did a kmap,
> but without a kunmap when the kfd process is destroyed. The flag actually indicates kmap/kunmap.

Well that's the wrong approach then. I mean you need to have the BO 
reference and the mapped pointer somewhere, don't you?

How do you clean those up?

Regards,
Christian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH] drm/kfd: fix ttm_bo_release warning
  2021-09-24  5:54                 ` Christian König
@ 2021-09-24  6:34                   ` Yu, Lang
  2021-09-24  6:37                     ` Christian König
  0 siblings, 1 reply; 15+ messages in thread
From: Yu, Lang @ 2021-09-24  6:34 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx; +Cc: Kuehling, Felix, Huang, Ray

[AMD Official Use Only]



>-----Original Message-----
>From: Koenig, Christian <Christian.Koenig@amd.com>
>Sent: Friday, September 24, 2021 1:54 PM
>To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Huang, Ray
><Ray.Huang@amd.com>
>Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>
>
>Am 24.09.21 um 07:50 schrieb Yu, Lang:
>> [AMD Official Use Only]
>>> [SNIP]
>>>>>> Hi Christian,
>>>>>>
>>>>>> Thanks for your explanation and advice. I got your point.
>>>>>> Actually, these BOs are allocated and pinned during a kfd process lifecycle.
>>>>>> I will try to add a flag into struct kgd_mem to indicate which BO
>>>>>> is pined and should be unpinned. Which will make
>>>>>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks!
>>>>> That isn't to much better. The real solution would be to unpin them
>>>>> when the kfd process is destroyed.
>>>> Yes, will unpin them when the kfd process is destroyed.
>>>> But we should indicate which BO is pinned and should be unpinned. Right?
>>> Well not with a flag or something like that.
>>>
>>> The knowledge which BO is pinned and needs to be unpinned should come
>>> from the control logic and not be papered over by some general handling.
>>> That's the background why we have removed the general handling for
>>> this from TTM in the first place.
>>>
>>> In other words when you need to pin a BO because it is kmapped it
>>> should be unpinned when it is kunmapped and if you don't kunmap at
>>> all then there is something wrong with the code structure from a higher level
>point of view.
>> Yes, this function "amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel" did a
>> kmap, but without a kunmap when the kfd process is destroyed. The flag
>actually indicates kmap/kunmap.
>
>Well that's the wrong approach then. I mean you need to have the BO reference
>and the mapped pointer somewhere, don't you?
>
>How do you clean those up?

They are respectively cleaned by " kfd_process_device_free_bos " and " kfd_process_destroy_pdds".
Let me put the code here. Thanks!

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index ec028cf963f5..d65b3bf13fd8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -81,6 +81,7 @@ struct kgd_mem {

        bool aql_queue;
        bool is_imported;
+       bool is_mapped_to_kernel;
 };

 /* KFD Memory Eviction */
@@ -278,6 +279,8 @@ int amdgpu_amdkfd_gpuvm_sync_memory(
                struct kgd_dev *kgd, struct kgd_mem *mem, bool intr);
 int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
                struct kgd_mem *mem, void **kptr, uint64_t *size);
+void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev *kgd,
+               struct kgd_mem *mem);
 int amdgpu_amdkfd_gpuvm_restore_process_bos(void *process_info,
                                            struct dma_fence **ef);
 int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 2d6b2d77b738..45ccbe9f63ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1857,6 +1857,8 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,

        amdgpu_bo_unreserve(bo);

+       mem->is_mapped_to_kernel = true;
+
        mutex_unlock(&mem->process_info->lock);
        return 0;

@@ -1870,6 +1872,20 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
        return ret;
 }

+void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev *kgd, struct kgd_mem *mem)
+{
+       struct amdgpu_bo *bo = mem->bo;
+
+       if (!mem->is_mapped_to_kernel)
+               return;
+
+       amdgpu_bo_reserve(bo, true);
+       amdgpu_bo_kunmap(bo);
+       amdgpu_bo_unpin(bo);
+       amdgpu_bo_unreserve(bo);
+       mem->is_mapped_to_kernel = false;
+}
+
 int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd,
                                              struct kfd_vm_fault_info *mem)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 21ec8a18cad2..f5506b153aed 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -941,6 +941,8 @@ static void kfd_process_device_free_bos(struct kfd_process_device *pdd)
                                peer_pdd->dev->kgd, mem, peer_pdd->drm_priv);
                }

+               amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(pdd->dev->kgd, mem);
+
                amdgpu_amdkfd_gpuvm_free_memory_of_gpu(pdd->dev->kgd, mem,
                                                       pdd->drm_priv, NULL);
                kfd_process_device_remove_obj_handle(pdd, id);

Regards,
Lang

>Regards,
>Christian.

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] drm/kfd: fix ttm_bo_release warning
  2021-09-24  6:34                   ` Yu, Lang
@ 2021-09-24  6:37                     ` Christian König
  2021-09-24 10:37                       ` Yu, Lang
  0 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2021-09-24  6:37 UTC (permalink / raw)
  To: Yu, Lang, Koenig, Christian, amd-gfx; +Cc: Kuehling, Felix, Huang, Ray

Am 24.09.21 um 08:34 schrieb Yu, Lang:
> [AMD Official Use Only]
>
>
>
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: Friday, September 24, 2021 1:54 PM
>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Huang, Ray
>> <Ray.Huang@amd.com>
>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>
>>
>> Am 24.09.21 um 07:50 schrieb Yu, Lang:
>>> [AMD Official Use Only]
>>>> [SNIP]
>>>>>>> Hi Christian,
>>>>>>>
>>>>>>> Thanks for your explanation and advice. I got your point.
>>>>>>> Actually, these BOs are allocated and pinned during a kfd process lifecycle.
>>>>>>> I will try to add a flag into struct kgd_mem to indicate which BO
>>>>>>> is pined and should be unpinned. Which will make
>>>>>>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks!
>>>>>> That isn't to much better. The real solution would be to unpin them
>>>>>> when the kfd process is destroyed.
>>>>> Yes, will unpin them when the kfd process is destroyed.
>>>>> But we should indicate which BO is pinned and should be unpinned. Right?
>>>> Well not with a flag or something like that.
>>>>
>>>> The knowledge which BO is pinned and needs to be unpinned should come
>>>> from the control logic and not be papered over by some general handling.
>>>> That's the background why we have removed the general handling for
>>>> this from TTM in the first place.
>>>>
>>>> In other words when you need to pin a BO because it is kmapped it
>>>> should be unpinned when it is kunmapped and if you don't kunmap at
>>>> all then there is something wrong with the code structure from a higher level
>> point of view.
>>> Yes, this function "amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel" did a
>>> kmap, but without a kunmap when the kfd process is destroyed. The flag
>> actually indicates kmap/kunmap.
>>
>> Well that's the wrong approach then. I mean you need to have the BO reference
>> and the mapped pointer somewhere, don't you?
>>
>> How do you clean those up?
> They are respectively cleaned by " kfd_process_device_free_bos " and " kfd_process_destroy_pdds".
> Let me put the code here. Thanks!
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index ec028cf963f5..d65b3bf13fd8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -81,6 +81,7 @@ struct kgd_mem {
>
>          bool aql_queue;
>          bool is_imported;
> +       bool is_mapped_to_kernel;

Yeah, that is exactly what you absolutely should NOT do.

>   };
>
>   /* KFD Memory Eviction */
> @@ -278,6 +279,8 @@ int amdgpu_amdkfd_gpuvm_sync_memory(
>                  struct kgd_dev *kgd, struct kgd_mem *mem, bool intr);
>   int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
>                  struct kgd_mem *mem, void **kptr, uint64_t *size);

The real question is who is calling this function here?

> +void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev *kgd,
> +               struct kgd_mem *mem);
>   int amdgpu_amdkfd_gpuvm_restore_process_bos(void *process_info,
>                                              struct dma_fence **ef);
>   int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 2d6b2d77b738..45ccbe9f63ee 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1857,6 +1857,8 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
>
>          amdgpu_bo_unreserve(bo);
>
> +       mem->is_mapped_to_kernel = true;
> +
>          mutex_unlock(&mem->process_info->lock);
>          return 0;
>
> @@ -1870,6 +1872,20 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
>          return ret;
>   }
>
> +void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev *kgd, struct kgd_mem *mem)
> +{
> +       struct amdgpu_bo *bo = mem->bo;
> +
> +       if (!mem->is_mapped_to_kernel)
> +               return;
> +
> +       amdgpu_bo_reserve(bo, true);
> +       amdgpu_bo_kunmap(bo);
> +       amdgpu_bo_unpin(bo);
> +       amdgpu_bo_unreserve(bo);
> +       mem->is_mapped_to_kernel = false;
> +}
> +
>   int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd,
>                                                struct kfd_vm_fault_info *mem)
>   {
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index 21ec8a18cad2..f5506b153aed 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -941,6 +941,8 @@ static void kfd_process_device_free_bos(struct kfd_process_device *pdd)
>                                  peer_pdd->dev->kgd, mem, peer_pdd->drm_priv);
>                  }
>
> +               amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(pdd->dev->kgd, mem);
> +

That's a general cleanup function for user space allocations and should 
not be abused for stuff like that.

Regards,
Christian.

>                  amdgpu_amdkfd_gpuvm_free_memory_of_gpu(pdd->dev->kgd, mem,
>                                                         pdd->drm_priv, NULL);
>                  kfd_process_device_remove_obj_handle(pdd, id);
>
> Regards,
> Lang
>
>> Regards,
>> Christian.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH] drm/kfd: fix ttm_bo_release warning
  2021-09-24  6:37                     ` Christian König
@ 2021-09-24 10:37                       ` Yu, Lang
  0 siblings, 0 replies; 15+ messages in thread
From: Yu, Lang @ 2021-09-24 10:37 UTC (permalink / raw)
  To: Christian König, Koenig, Christian, amd-gfx
  Cc: Kuehling, Felix, Huang, Ray

[AMD Official Use Only]



>-----Original Message-----
>From: Christian König <ckoenig.leichtzumerken@gmail.com>
>Sent: Friday, September 24, 2021 2:37 PM
>To: Yu, Lang <Lang.Yu@amd.com>; Koenig, Christian
><Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
>Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Huang, Ray
><Ray.Huang@amd.com>
>Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>
>Am 24.09.21 um 08:34 schrieb Yu, Lang:
>> [AMD Official Use Only]
>>
>>
>>
>>> -----Original Message-----
>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>> Sent: Friday, September 24, 2021 1:54 PM
>>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org
>>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Huang, Ray
>>> <Ray.Huang@amd.com>
>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>>
>>>
>>> Am 24.09.21 um 07:50 schrieb Yu, Lang:
>>>> [AMD Official Use Only]
>>>>> [SNIP]
>>>>>>>> Hi Christian,
>>>>>>>>
>>>>>>>> Thanks for your explanation and advice. I got your point.
>>>>>>>> Actually, these BOs are allocated and pinned during a kfd process
>lifecycle.
>>>>>>>> I will try to add a flag into struct kgd_mem to indicate which
>>>>>>>> BO is pined and should be unpinned. Which will make
>>>>>>>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks!
>>>>>>> That isn't to much better. The real solution would be to unpin
>>>>>>> them when the kfd process is destroyed.
>>>>>> Yes, will unpin them when the kfd process is destroyed.
>>>>>> But we should indicate which BO is pinned and should be unpinned. Right?
>>>>> Well not with a flag or something like that.
>>>>>
>>>>> The knowledge which BO is pinned and needs to be unpinned should
>>>>> come from the control logic and not be papered over by some general
>handling.
>>>>> That's the background why we have removed the general handling for
>>>>> this from TTM in the first place.
>>>>>
>>>>> In other words when you need to pin a BO because it is kmapped it
>>>>> should be unpinned when it is kunmapped and if you don't kunmap at
>>>>> all then there is something wrong with the code structure from a
>>>>> higher level
>>> point of view.
>>>> Yes, this function "amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel" did a
>>>> kmap, but without a kunmap when the kfd process is destroyed. The
>>>> flag
>>> actually indicates kmap/kunmap.
>>>
>>> Well that's the wrong approach then. I mean you need to have the BO
>>> reference and the mapped pointer somewhere, don't you?
>>>
>>> How do you clean those up?
>> They are respectively cleaned by " kfd_process_device_free_bos " and "
>kfd_process_destroy_pdds".
>> Let me put the code here. Thanks!
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> index ec028cf963f5..d65b3bf13fd8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> @@ -81,6 +81,7 @@ struct kgd_mem {
>>
>>          bool aql_queue;
>>          bool is_imported;
>> +       bool is_mapped_to_kernel;
>
>Yeah, that is exactly what you absolutely should NOT do.
>
>>   };
>>
>>   /* KFD Memory Eviction */
>> @@ -278,6 +279,8 @@ int amdgpu_amdkfd_gpuvm_sync_memory(
>>                  struct kgd_dev *kgd, struct kgd_mem *mem, bool intr);
>>   int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
>>                  struct kgd_mem *mem, void **kptr, uint64_t *size);
>
>The real question is who is calling this function here?

Currently  there are 3 places called function "amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel" to kmap a BO. 

1, kmap a ptr for pdd->qpd->cwsr_kaddr
Call stack:
amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel
kfd_process_alloc_gpuvm
kfd_process_device_init_cwsr_dgpu
kfd_process_device_init_vm
kfd_ioctl_acquire_vm

2, kmap a ptr for pdd->qpd->ib_kaddr
Call stack:
amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel
kfd_process_alloc_gpuvm
kfd_process_device_reserve_ib_mem
kfd_process_device_init_vm
kfd_ioctl_acquire_vm

3, kmap a ptr for p->signal_page->kernel_address
Call stack:
amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel
kfd_ioctl_create_event

The problem is these kmaped BOs were not kunmaped properly 
when the kfd process is destroyed.

Regards,
Lang

>> +void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev
>*kgd,
>> +               struct kgd_mem *mem);
>>   int amdgpu_amdkfd_gpuvm_restore_process_bos(void *process_info,
>>                                              struct dma_fence **ef);
>>   int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd, diff
>> --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index 2d6b2d77b738..45ccbe9f63ee 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -1857,6 +1857,8 @@ int
>> amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
>>
>>          amdgpu_bo_unreserve(bo);
>>
>> +       mem->is_mapped_to_kernel = true;
>> +
>>          mutex_unlock(&mem->process_info->lock);
>>          return 0;
>>
>> @@ -1870,6 +1872,20 @@ int
>amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
>>          return ret;
>>   }
>>
>> +void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev
>> +*kgd, struct kgd_mem *mem) {
>> +       struct amdgpu_bo *bo = mem->bo;
>> +
>> +       if (!mem->is_mapped_to_kernel)
>> +               return;
>> +
>> +       amdgpu_bo_reserve(bo, true);
>> +       amdgpu_bo_kunmap(bo);
>> +       amdgpu_bo_unpin(bo);
>> +       amdgpu_bo_unreserve(bo);
>> +       mem->is_mapped_to_kernel = false; }
>> +
>>   int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd,
>>                                                struct kfd_vm_fault_info *mem)
>>   {
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> index 21ec8a18cad2..f5506b153aed 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> @@ -941,6 +941,8 @@ static void kfd_process_device_free_bos(struct
>kfd_process_device *pdd)
>>                                  peer_pdd->dev->kgd, mem, peer_pdd->drm_priv);
>>                  }
>>
>> +
>> + amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(pdd->dev->kgd, mem);
>> +
>
>That's a general cleanup function for user space allocations and should not be
>abused for stuff like that.
>
>Regards,
>Christian.
>
>>                  amdgpu_amdkfd_gpuvm_free_memory_of_gpu(pdd->dev->kgd, mem,
>>                                                         pdd->drm_priv, NULL);
>>                  kfd_process_device_remove_obj_handle(pdd, id);
>>
>> Regards,
>> Lang
>>
>>> Regards,
>>> Christian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-09-24 10:37 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-23  9:44 [PATCH] drm/kfd: fix ttm_bo_release warning Lang Yu
2021-09-23 11:39 ` Christian König
2021-09-23 12:09   ` Yu, Lang
2021-09-23 12:23     ` Christian König
2021-09-23 14:24       ` Yu, Lang
2021-09-23 14:52         ` Christian König
2021-09-24  5:35           ` Yu, Lang
2021-09-24  5:42             ` Christian König
2021-09-24  5:50               ` Yu, Lang
2021-09-24  5:54                 ` Christian König
2021-09-24  6:34                   ` Yu, Lang
2021-09-24  6:37                     ` Christian König
2021-09-24 10:37                       ` Yu, Lang
2021-09-23 16:21     ` Felix Kuehling
2021-09-24  5:35       ` Yu, Lang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).