* [PATCH] drm/kfd: fix ttm_bo_release warning @ 2021-09-23 9:44 Lang Yu 2021-09-23 11:39 ` Christian König 0 siblings, 1 reply; 15+ messages in thread From: Lang Yu @ 2021-09-23 9:44 UTC (permalink / raw) To: amd-gfx; +Cc: Felix Kuehling, Christian K nig, Huang Rui, Lang Yu If a BO is pinned, unpin it before freeing it. Call Trace: ttm_bo_put+0x30/0x50 [ttm] amdgpu_bo_unref+0x1e/0x30 [amdgpu] amdgpu_gem_object_free+0x34/0x50 [amdgpu] drm_gem_object_free+0x1d/0x30 [drm] amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu] kfd_process_device_free_bos+0xa3/0xf0 [amdgpu] kfd_process_wq_release+0x224/0x2e0 [amdgpu] process_one_work+0x220/0x3c0 worker_thread+0x4d/0x3f0 kthread+0x114/0x150 process_one_work+0x3c0/0x3c0 kthread_park+0x90/0x90 ret_from_fork+0x22/0x30 Signed-off-by: Lang Yu <lang.yu@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 2d6b2d77b738..7e693b064072 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1567,6 +1567,9 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu( pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va, mem->va + bo_size * (1 + mem->aql_queue)); + if (mem->bo->tbo.pin_count) + amdgpu_bo_unpin(mem->bo); + ret = unreserve_bo_and_vms(&ctx, false, false); /* Remove from VM internal data structures */ -- 2.25.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] drm/kfd: fix ttm_bo_release warning 2021-09-23 9:44 [PATCH] drm/kfd: fix ttm_bo_release warning Lang Yu @ 2021-09-23 11:39 ` Christian König 2021-09-23 12:09 ` Yu, Lang 0 siblings, 1 reply; 15+ messages in thread From: Christian König @ 2021-09-23 11:39 UTC (permalink / raw) To: Lang Yu, amd-gfx; +Cc: Felix Kuehling, Christian K nig, Huang Rui Am 23.09.21 um 11:44 schrieb Lang Yu: > If a BO is pinned, unpin it before freeing it. > > Call Trace: > ttm_bo_put+0x30/0x50 [ttm] > amdgpu_bo_unref+0x1e/0x30 [amdgpu] > amdgpu_gem_object_free+0x34/0x50 [amdgpu] > drm_gem_object_free+0x1d/0x30 [drm] > amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu] > kfd_process_device_free_bos+0xa3/0xf0 [amdgpu] > kfd_process_wq_release+0x224/0x2e0 [amdgpu] > process_one_work+0x220/0x3c0 > worker_thread+0x4d/0x3f0 > kthread+0x114/0x150 > process_one_work+0x3c0/0x3c0 > kthread_park+0x90/0x90 > ret_from_fork+0x22/0x30 > > Signed-off-by: Lang Yu <lang.yu@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > index 2d6b2d77b738..7e693b064072 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > @@ -1567,6 +1567,9 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu( > pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va, > mem->va + bo_size * (1 + mem->aql_queue)); > > + if (mem->bo->tbo.pin_count) > + amdgpu_bo_unpin(mem->bo); > + NAK, using mem->bo->tbo.pin_count like this is illegal. Where was the BO pinned in the first place? Christian. > ret = unreserve_bo_and_vms(&ctx, false, false); > > /* Remove from VM internal data structures */ ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [PATCH] drm/kfd: fix ttm_bo_release warning 2021-09-23 11:39 ` Christian König @ 2021-09-23 12:09 ` Yu, Lang 2021-09-23 12:23 ` Christian König 2021-09-23 16:21 ` Felix Kuehling 0 siblings, 2 replies; 15+ messages in thread From: Yu, Lang @ 2021-09-23 12:09 UTC (permalink / raw) To: Koenig, Christian, amd-gfx; +Cc: Kuehling, Felix, Christian K nig, Huang, Ray [AMD Official Use Only] >-----Original Message----- >From: Koenig, Christian <Christian.Koenig@amd.com> >Sent: Thursday, September 23, 2021 7:40 PM >To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig ><C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com> >Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning > > > >Am 23.09.21 um 11:44 schrieb Lang Yu: >> If a BO is pinned, unpin it before freeing it. >> >> Call Trace: >> ttm_bo_put+0x30/0x50 [ttm] >> amdgpu_bo_unref+0x1e/0x30 [amdgpu] >> amdgpu_gem_object_free+0x34/0x50 [amdgpu] >> drm_gem_object_free+0x1d/0x30 [drm] >> amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu] >> kfd_process_device_free_bos+0xa3/0xf0 [amdgpu] >> kfd_process_wq_release+0x224/0x2e0 [amdgpu] >> process_one_work+0x220/0x3c0 >> worker_thread+0x4d/0x3f0 >> kthread+0x114/0x150 >> process_one_work+0x3c0/0x3c0 >> kthread_park+0x90/0x90 >> ret_from_fork+0x22/0x30 >> >> Signed-off-by: Lang Yu <lang.yu@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >> index 2d6b2d77b738..7e693b064072 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >> @@ -1567,6 +1567,9 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu( >> pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va, >> mem->va + bo_size * (1 + mem->aql_queue)); >> >> + if (mem->bo->tbo.pin_count) >> + amdgpu_bo_unpin(mem->bo); >> + > >NAK, using mem->bo->tbo.pin_count like this is illegal. I didn't get your point. I referred to function-"void amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this. >Where was the BO pinned in the first place? I found two places: ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags, &kaddr); ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base, KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr); Regards, Lang >Christian. > >> ret = unreserve_bo_and_vms(&ctx, false, false); >> >> /* Remove from VM internal data structures */ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] drm/kfd: fix ttm_bo_release warning 2021-09-23 12:09 ` Yu, Lang @ 2021-09-23 12:23 ` Christian König 2021-09-23 14:24 ` Yu, Lang 2021-09-23 16:21 ` Felix Kuehling 1 sibling, 1 reply; 15+ messages in thread From: Christian König @ 2021-09-23 12:23 UTC (permalink / raw) To: Yu, Lang, amd-gfx; +Cc: Kuehling, Felix, Christian K nig, Huang, Ray Am 23.09.21 um 14:09 schrieb Yu, Lang: > [AMD Official Use Only] > > > >> -----Original Message----- >> From: Koenig, Christian <Christian.Koenig@amd.com> >> Sent: Thursday, September 23, 2021 7:40 PM >> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig >> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com> >> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >> >> >> >> Am 23.09.21 um 11:44 schrieb Lang Yu: >>> If a BO is pinned, unpin it before freeing it. >>> >>> Call Trace: >>> ttm_bo_put+0x30/0x50 [ttm] >>> amdgpu_bo_unref+0x1e/0x30 [amdgpu] >>> amdgpu_gem_object_free+0x34/0x50 [amdgpu] >>> drm_gem_object_free+0x1d/0x30 [drm] >>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu] >>> kfd_process_device_free_bos+0xa3/0xf0 [amdgpu] >>> kfd_process_wq_release+0x224/0x2e0 [amdgpu] >>> process_one_work+0x220/0x3c0 >>> worker_thread+0x4d/0x3f0 >>> kthread+0x114/0x150 >>> process_one_work+0x3c0/0x3c0 >>> kthread_park+0x90/0x90 >>> ret_from_fork+0x22/0x30 >>> >>> Signed-off-by: Lang Yu <lang.yu@amd.com> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>> index 2d6b2d77b738..7e693b064072 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>> @@ -1567,6 +1567,9 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu( >>> pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va, >>> mem->va + bo_size * (1 + mem->aql_queue)); >>> >>> + if (mem->bo->tbo.pin_count) >>> + amdgpu_bo_unpin(mem->bo); >>> + >> NAK, using mem->bo->tbo.pin_count like this is illegal. > I didn't get your point. I referred to function-"void amdgpu_bo_unpin(struct amdgpu_bo *bo)", > which uses it like this. What amdgpu_bo_unpin() does is the following: ttm_bo_unpin(&bo->tbo); if (bo->tbo.pin_count) return; .... In other words we take further actions based on if the buffer us is still pinned or not after an unpin operation. What you try to do here is unpinning the BO when it is pinned independent if somebody else or our code has pinned it before. That can lead to all kind of problems and is clearly illegal. >> Where was the BO pinned in the first place? > I found two places: > > ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags, > &kaddr); > > ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base, > KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr); Well then you need to figure out where that memory is freed again and place the unpin appropriately. General rule of thumb is that calls to amdgpu_bo_pin/amdgpu_bo_unpin should be balanced. Regards, Christian. > Regards, > Lang > >> Christian. >> >>> ret = unreserve_bo_and_vms(&ctx, false, false); >>> >>> /* Remove from VM internal data structures */ ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [PATCH] drm/kfd: fix ttm_bo_release warning 2021-09-23 12:23 ` Christian König @ 2021-09-23 14:24 ` Yu, Lang 2021-09-23 14:52 ` Christian König 0 siblings, 1 reply; 15+ messages in thread From: Yu, Lang @ 2021-09-23 14:24 UTC (permalink / raw) To: Koenig, Christian, amd-gfx; +Cc: Kuehling, Felix, Huang, Ray [AMD Official Use Only] >-----Original Message----- >From: Koenig, Christian <Christian.Koenig@amd.com> >Sent: Thursday, September 23, 2021 8:24 PM >To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig ><C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com> >Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning > >Am 23.09.21 um 14:09 schrieb Yu, Lang: >> [AMD Official Use Only] >> >> >> >>> -----Original Message----- >>> From: Koenig, Christian <Christian.Koenig@amd.com> >>> Sent: Thursday, September 23, 2021 7:40 PM >>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig >>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com> >>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >>> >>> >>> >>> Am 23.09.21 um 11:44 schrieb Lang Yu: >>>> If a BO is pinned, unpin it before freeing it. >>>> >>>> Call Trace: >>>> ttm_bo_put+0x30/0x50 [ttm] >>>> amdgpu_bo_unref+0x1e/0x30 [amdgpu] >>>> amdgpu_gem_object_free+0x34/0x50 [amdgpu] >>>> drm_gem_object_free+0x1d/0x30 [drm] >>>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu] >>>> kfd_process_device_free_bos+0xa3/0xf0 [amdgpu] >>>> kfd_process_wq_release+0x224/0x2e0 [amdgpu] >>>> process_one_work+0x220/0x3c0 >>>> worker_thread+0x4d/0x3f0 >>>> kthread+0x114/0x150 >>>> process_one_work+0x3c0/0x3c0 >>>> kthread_park+0x90/0x90 >>>> ret_from_fork+0x22/0x30 >>>> >>>> Signed-off-by: Lang Yu <lang.yu@amd.com> >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++ >>>> 1 file changed, 3 insertions(+) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>> index 2d6b2d77b738..7e693b064072 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>> @@ -1567,6 +1567,9 @@ int >amdgpu_amdkfd_gpuvm_free_memory_of_gpu( >>>> pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va, >>>> mem->va + bo_size * (1 + mem->aql_queue)); >>>> >>>> + if (mem->bo->tbo.pin_count) >>>> + amdgpu_bo_unpin(mem->bo); >>>> + >>> NAK, using mem->bo->tbo.pin_count like this is illegal. >> I didn't get your point. I referred to function-"void >> amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this. > >What amdgpu_bo_unpin() does is the following: > > ttm_bo_unpin(&bo->tbo); > if (bo->tbo.pin_count) > return; >.... > >In other words we take further actions based on if the buffer us is still pinned or >not after an unpin operation. > >What you try to do here is unpinning the BO when it is pinned independent if >somebody else or our code has pinned it before. Hi Christian, Thanks for your explanation and advice. I got your point. Actually, these BOs are allocated and pinned during a kfd process lifecycle. I will try to add a flag into struct kgd_mem to indicate which BO is pined and should be unpinned. Which will make amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks! Regards, Lang > >That can lead to all kind of problems and is clearly illegal. > >>> Where was the BO pinned in the first place? >> I found two places: >> >> ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags, >> &kaddr); >> >> ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base, >> KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr); > >Well then you need to figure out where that memory is freed again and place the >unpin appropriately. > >General rule of thumb is that calls to amdgpu_bo_pin/amdgpu_bo_unpin should >be balanced. > >Regards, >Christian. > >> Regards, >> Lang >> >>> Christian. >>> >>>> ret = unreserve_bo_and_vms(&ctx, false, false); >>>> >>>> /* Remove from VM internal data structures */ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] drm/kfd: fix ttm_bo_release warning 2021-09-23 14:24 ` Yu, Lang @ 2021-09-23 14:52 ` Christian König 2021-09-24 5:35 ` Yu, Lang 0 siblings, 1 reply; 15+ messages in thread From: Christian König @ 2021-09-23 14:52 UTC (permalink / raw) To: Yu, Lang, amd-gfx; +Cc: Kuehling, Felix, Huang, Ray Am 23.09.21 um 16:24 schrieb Yu, Lang: > [AMD Official Use Only] >> -----Original Message----- >> From: Koenig, Christian <Christian.Koenig@amd.com> >> Sent: Thursday, September 23, 2021 8:24 PM >> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig >> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com> >> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >> >> Am 23.09.21 um 14:09 schrieb Yu, Lang: >>> [AMD Official Use Only] >>> >>> >>> >>>> -----Original Message----- >>>> From: Koenig, Christian <Christian.Koenig@amd.com> >>>> Sent: Thursday, September 23, 2021 7:40 PM >>>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >>>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig >>>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com> >>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >>>> >>>> >>>> >>>> Am 23.09.21 um 11:44 schrieb Lang Yu: >>>>> If a BO is pinned, unpin it before freeing it. >>>>> >>>>> Call Trace: >>>>> ttm_bo_put+0x30/0x50 [ttm] >>>>> amdgpu_bo_unref+0x1e/0x30 [amdgpu] >>>>> amdgpu_gem_object_free+0x34/0x50 [amdgpu] >>>>> drm_gem_object_free+0x1d/0x30 [drm] >>>>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu] >>>>> kfd_process_device_free_bos+0xa3/0xf0 [amdgpu] >>>>> kfd_process_wq_release+0x224/0x2e0 [amdgpu] >>>>> process_one_work+0x220/0x3c0 >>>>> worker_thread+0x4d/0x3f0 >>>>> kthread+0x114/0x150 >>>>> process_one_work+0x3c0/0x3c0 >>>>> kthread_park+0x90/0x90 >>>>> ret_from_fork+0x22/0x30 >>>>> >>>>> Signed-off-by: Lang Yu <lang.yu@amd.com> >>>>> --- >>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++ >>>>> 1 file changed, 3 insertions(+) >>>>> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>> index 2d6b2d77b738..7e693b064072 100644 >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>> @@ -1567,6 +1567,9 @@ int >> amdgpu_amdkfd_gpuvm_free_memory_of_gpu( >>>>> pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va, >>>>> mem->va + bo_size * (1 + mem->aql_queue)); >>>>> >>>>> + if (mem->bo->tbo.pin_count) >>>>> + amdgpu_bo_unpin(mem->bo); >>>>> + >>>> NAK, using mem->bo->tbo.pin_count like this is illegal. >>> I didn't get your point. I referred to function-"void >>> amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this. >> What amdgpu_bo_unpin() does is the following: >> >> ttm_bo_unpin(&bo->tbo); >> if (bo->tbo.pin_count) >> return; >> .... >> >> In other words we take further actions based on if the buffer us is still pinned or >> not after an unpin operation. >> >> What you try to do here is unpinning the BO when it is pinned independent if >> somebody else or our code has pinned it before. > Hi Christian, > > Thanks for your explanation and advice. I got your point. > Actually, these BOs are allocated and pinned during a kfd process lifecycle. > I will try to add a flag into struct kgd_mem to indicate which BO is pined > and should be unpinned. Which will make amdgpu_bo_pin/amdgpu_bo_unpin > calls balanced. Thanks! That isn't to much better. The real solution would be to unpin them when the kfd process is destroyed. Regards, Christian. > > Regards, > Lang >> That can lead to all kind of problems and is clearly illegal. >> >>>> Where was the BO pinned in the first place? >>> I found two places: >>> >>> ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags, >>> &kaddr); >>> >>> ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base, >>> KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr); >> Well then you need to figure out where that memory is freed again and place the >> unpin appropriately. >> >> General rule of thumb is that calls to amdgpu_bo_pin/amdgpu_bo_unpin should >> be balanced. >> >> Regards, >> Christian. >> >>> Regards, >>> Lang >>> >>>> Christian. >>>> >>>>> ret = unreserve_bo_and_vms(&ctx, false, false); >>>>> >>>>> /* Remove from VM internal data structures */ ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [PATCH] drm/kfd: fix ttm_bo_release warning 2021-09-23 14:52 ` Christian König @ 2021-09-24 5:35 ` Yu, Lang 2021-09-24 5:42 ` Christian König 0 siblings, 1 reply; 15+ messages in thread From: Yu, Lang @ 2021-09-24 5:35 UTC (permalink / raw) To: Koenig, Christian, amd-gfx; +Cc: Kuehling, Felix, Huang, Ray [AMD Official Use Only] >-----Original Message----- >From: Koenig, Christian <Christian.Koenig@amd.com> >Sent: Thursday, September 23, 2021 10:52 PM >To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Huang, Ray ><Ray.Huang@amd.com> >Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning > >Am 23.09.21 um 16:24 schrieb Yu, Lang: >> [AMD Official Use Only] >>> -----Original Message----- >>> From: Koenig, Christian <Christian.Koenig@amd.com> >>> Sent: Thursday, September 23, 2021 8:24 PM >>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig >>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com> >>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >>> >>> Am 23.09.21 um 14:09 schrieb Yu, Lang: >>>> [AMD Official Use Only] >>>> >>>> >>>> >>>>> -----Original Message----- >>>>> From: Koenig, Christian <Christian.Koenig@amd.com> >>>>> Sent: Thursday, September 23, 2021 7:40 PM >>>>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >>>>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig >>>>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com> >>>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >>>>> >>>>> >>>>> >>>>> Am 23.09.21 um 11:44 schrieb Lang Yu: >>>>>> If a BO is pinned, unpin it before freeing it. >>>>>> >>>>>> Call Trace: >>>>>> ttm_bo_put+0x30/0x50 [ttm] >>>>>> amdgpu_bo_unref+0x1e/0x30 [amdgpu] >>>>>> amdgpu_gem_object_free+0x34/0x50 [amdgpu] >>>>>> drm_gem_object_free+0x1d/0x30 [drm] >>>>>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu] >>>>>> kfd_process_device_free_bos+0xa3/0xf0 [amdgpu] >>>>>> kfd_process_wq_release+0x224/0x2e0 [amdgpu] >>>>>> process_one_work+0x220/0x3c0 >>>>>> worker_thread+0x4d/0x3f0 >>>>>> kthread+0x114/0x150 >>>>>> process_one_work+0x3c0/0x3c0 >>>>>> kthread_park+0x90/0x90 >>>>>> ret_from_fork+0x22/0x30 >>>>>> >>>>>> Signed-off-by: Lang Yu <lang.yu@amd.com> >>>>>> --- >>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++ >>>>>> 1 file changed, 3 insertions(+) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>>> index 2d6b2d77b738..7e693b064072 100644 >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>>> @@ -1567,6 +1567,9 @@ int >>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu( >>>>>> pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va, >>>>>> mem->va + bo_size * (1 + mem->aql_queue)); >>>>>> >>>>>> + if (mem->bo->tbo.pin_count) >>>>>> + amdgpu_bo_unpin(mem->bo); >>>>>> + >>>>> NAK, using mem->bo->tbo.pin_count like this is illegal. >>>> I didn't get your point. I referred to function-"void >>>> amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this. >>> What amdgpu_bo_unpin() does is the following: >>> >>> ttm_bo_unpin(&bo->tbo); >>> if (bo->tbo.pin_count) >>> return; >>> .... >>> >>> In other words we take further actions based on if the buffer us is >>> still pinned or not after an unpin operation. >>> >>> What you try to do here is unpinning the BO when it is pinned >>> independent if somebody else or our code has pinned it before. >> Hi Christian, >> >> Thanks for your explanation and advice. I got your point. >> Actually, these BOs are allocated and pinned during a kfd process lifecycle. >> I will try to add a flag into struct kgd_mem to indicate which BO is >> pined and should be unpinned. Which will make >> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks! > >That isn't to much better. The real solution would be to unpin them when the kfd >process is destroyed. Yes, will unpin them when the kfd process is destroyed. But we should indicate which BO is pinned and should be unpinned. Right? Regards, Lang >Regards, >Christian. > >> >> Regards, >> Lang >>> That can lead to all kind of problems and is clearly illegal. >>> >>>>> Where was the BO pinned in the first place? >>>> I found two places: >>>> >>>> ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags, >>>> &kaddr); >>>> >>>> ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base, >>>> KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr); >>> Well then you need to figure out where that memory is freed again and >>> place the unpin appropriately. >>> >>> General rule of thumb is that calls to amdgpu_bo_pin/amdgpu_bo_unpin >>> should be balanced. >>> >>> Regards, >>> Christian. >>> >>>> Regards, >>>> Lang >>>> >>>>> Christian. >>>>> >>>>>> ret = unreserve_bo_and_vms(&ctx, false, false); >>>>>> >>>>>> /* Remove from VM internal data structures */ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] drm/kfd: fix ttm_bo_release warning 2021-09-24 5:35 ` Yu, Lang @ 2021-09-24 5:42 ` Christian König 2021-09-24 5:50 ` Yu, Lang 0 siblings, 1 reply; 15+ messages in thread From: Christian König @ 2021-09-24 5:42 UTC (permalink / raw) To: Yu, Lang, amd-gfx; +Cc: Kuehling, Felix, Huang, Ray Am 24.09.21 um 07:35 schrieb Yu, Lang: > [AMD Official Use Only] >> -----Original Message----- >> From: Koenig, Christian <Christian.Koenig@amd.com> >> Sent: Thursday, September 23, 2021 10:52 PM >> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Huang, Ray >> <Ray.Huang@amd.com> >> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >> >> Am 23.09.21 um 16:24 schrieb Yu, Lang: >>> [AMD Official Use Only] >>>> -----Original Message----- >>>> From: Koenig, Christian <Christian.Koenig@amd.com> >>>> Sent: Thursday, September 23, 2021 8:24 PM >>>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >>>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig >>>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com> >>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >>>> >>>> Am 23.09.21 um 14:09 schrieb Yu, Lang: >>>>> [AMD Official Use Only] >>>>> >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: Koenig, Christian <Christian.Koenig@amd.com> >>>>>> Sent: Thursday, September 23, 2021 7:40 PM >>>>>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >>>>>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig >>>>>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com> >>>>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >>>>>> >>>>>> >>>>>> >>>>>> Am 23.09.21 um 11:44 schrieb Lang Yu: >>>>>>> If a BO is pinned, unpin it before freeing it. >>>>>>> >>>>>>> Call Trace: >>>>>>> ttm_bo_put+0x30/0x50 [ttm] >>>>>>> amdgpu_bo_unref+0x1e/0x30 [amdgpu] >>>>>>> amdgpu_gem_object_free+0x34/0x50 [amdgpu] >>>>>>> drm_gem_object_free+0x1d/0x30 [drm] >>>>>>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu] >>>>>>> kfd_process_device_free_bos+0xa3/0xf0 [amdgpu] >>>>>>> kfd_process_wq_release+0x224/0x2e0 [amdgpu] >>>>>>> process_one_work+0x220/0x3c0 >>>>>>> worker_thread+0x4d/0x3f0 >>>>>>> kthread+0x114/0x150 >>>>>>> process_one_work+0x3c0/0x3c0 >>>>>>> kthread_park+0x90/0x90 >>>>>>> ret_from_fork+0x22/0x30 >>>>>>> >>>>>>> Signed-off-by: Lang Yu <lang.yu@amd.com> >>>>>>> --- >>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++ >>>>>>> 1 file changed, 3 insertions(+) >>>>>>> >>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>>>> index 2d6b2d77b738..7e693b064072 100644 >>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>>>> @@ -1567,6 +1567,9 @@ int >>>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu( >>>>>>> pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va, >>>>>>> mem->va + bo_size * (1 + mem->aql_queue)); >>>>>>> >>>>>>> + if (mem->bo->tbo.pin_count) >>>>>>> + amdgpu_bo_unpin(mem->bo); >>>>>>> + >>>>>> NAK, using mem->bo->tbo.pin_count like this is illegal. >>>>> I didn't get your point. I referred to function-"void >>>>> amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this. >>>> What amdgpu_bo_unpin() does is the following: >>>> >>>> ttm_bo_unpin(&bo->tbo); >>>> if (bo->tbo.pin_count) >>>> return; >>>> .... >>>> >>>> In other words we take further actions based on if the buffer us is >>>> still pinned or not after an unpin operation. >>>> >>>> What you try to do here is unpinning the BO when it is pinned >>>> independent if somebody else or our code has pinned it before. >>> Hi Christian, >>> >>> Thanks for your explanation and advice. I got your point. >>> Actually, these BOs are allocated and pinned during a kfd process lifecycle. >>> I will try to add a flag into struct kgd_mem to indicate which BO is >>> pined and should be unpinned. Which will make >>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks! >> That isn't to much better. The real solution would be to unpin them when the kfd >> process is destroyed. > Yes, will unpin them when the kfd process is destroyed. > But we should indicate which BO is pinned and should be unpinned. Right? Well not with a flag or something like that. The knowledge which BO is pinned and needs to be unpinned should come from the control logic and not be papered over by some general handling. That's the background why we have removed the general handling for this from TTM in the first place. In other words when you need to pin a BO because it is kmapped it should be unpinned when it is kunmapped and if you don't kunmap at all then there is something wrong with the code structure from a higher level point of view. Regards, Christian. > > Regards, > Lang > >> Regards, >> Christian. >> >>> Regards, >>> Lang >>>> That can lead to all kind of problems and is clearly illegal. >>>> >>>>>> Where was the BO pinned in the first place? >>>>> I found two places: >>>>> >>>>> ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags, >>>>> &kaddr); >>>>> >>>>> ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base, >>>>> KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr); >>>> Well then you need to figure out where that memory is freed again and >>>> place the unpin appropriately. >>>> >>>> General rule of thumb is that calls to amdgpu_bo_pin/amdgpu_bo_unpin >>>> should be balanced. >>>> >>>> Regards, >>>> Christian. >>>> >>>>> Regards, >>>>> Lang >>>>> >>>>>> Christian. >>>>>> >>>>>>> ret = unreserve_bo_and_vms(&ctx, false, false); >>>>>>> >>>>>>> /* Remove from VM internal data structures */ ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [PATCH] drm/kfd: fix ttm_bo_release warning 2021-09-24 5:42 ` Christian König @ 2021-09-24 5:50 ` Yu, Lang 2021-09-24 5:54 ` Christian König 0 siblings, 1 reply; 15+ messages in thread From: Yu, Lang @ 2021-09-24 5:50 UTC (permalink / raw) To: Koenig, Christian, amd-gfx; +Cc: Kuehling, Felix, Huang, Ray [AMD Official Use Only] >-----Original Message----- >From: Koenig, Christian <Christian.Koenig@amd.com> >Sent: Friday, September 24, 2021 1:42 PM >To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Huang, Ray ><Ray.Huang@amd.com> >Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning > >Am 24.09.21 um 07:35 schrieb Yu, Lang: >> [AMD Official Use Only] >>> -----Original Message----- >>> From: Koenig, Christian <Christian.Koenig@amd.com> >>> Sent: Thursday, September 23, 2021 10:52 PM >>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Huang, Ray >>> <Ray.Huang@amd.com> >>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >>> >>> Am 23.09.21 um 16:24 schrieb Yu, Lang: >>>> [AMD Official Use Only] >>>>> -----Original Message----- >>>>> From: Koenig, Christian <Christian.Koenig@amd.com> >>>>> Sent: Thursday, September 23, 2021 8:24 PM >>>>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >>>>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig >>>>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com> >>>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >>>>> >>>>> Am 23.09.21 um 14:09 schrieb Yu, Lang: >>>>>> [AMD Official Use Only] >>>>>> >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Koenig, Christian <Christian.Koenig@amd.com> >>>>>>> Sent: Thursday, September 23, 2021 7:40 PM >>>>>>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >>>>>>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig >>>>>>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com> >>>>>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >>>>>>> >>>>>>> >>>>>>> >>>>>>> Am 23.09.21 um 11:44 schrieb Lang Yu: >>>>>>>> If a BO is pinned, unpin it before freeing it. >>>>>>>> >>>>>>>> Call Trace: >>>>>>>> ttm_bo_put+0x30/0x50 [ttm] >>>>>>>> amdgpu_bo_unref+0x1e/0x30 [amdgpu] >>>>>>>> amdgpu_gem_object_free+0x34/0x50 [amdgpu] >>>>>>>> drm_gem_object_free+0x1d/0x30 [drm] >>>>>>>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 >[amdgpu] >>>>>>>> kfd_process_device_free_bos+0xa3/0xf0 [amdgpu] >>>>>>>> kfd_process_wq_release+0x224/0x2e0 [amdgpu] >>>>>>>> process_one_work+0x220/0x3c0 >>>>>>>> worker_thread+0x4d/0x3f0 >>>>>>>> kthread+0x114/0x150 >>>>>>>> process_one_work+0x3c0/0x3c0 >>>>>>>> kthread_park+0x90/0x90 >>>>>>>> ret_from_fork+0x22/0x30 >>>>>>>> >>>>>>>> Signed-off-by: Lang Yu <lang.yu@amd.com> >>>>>>>> --- >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++ >>>>>>>> 1 file changed, 3 insertions(+) >>>>>>>> >>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>>>>> index 2d6b2d77b738..7e693b064072 100644 >>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>>>>> @@ -1567,6 +1567,9 @@ int >>>>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu( >>>>>>>> pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va, >>>>>>>> mem->va + bo_size * (1 + mem->aql_queue)); >>>>>>>> >>>>>>>> + if (mem->bo->tbo.pin_count) >>>>>>>> + amdgpu_bo_unpin(mem->bo); >>>>>>>> + >>>>>>> NAK, using mem->bo->tbo.pin_count like this is illegal. >>>>>> I didn't get your point. I referred to function-"void >>>>>> amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this. >>>>> What amdgpu_bo_unpin() does is the following: >>>>> >>>>> ttm_bo_unpin(&bo->tbo); >>>>> if (bo->tbo.pin_count) >>>>> return; >>>>> .... >>>>> >>>>> In other words we take further actions based on if the buffer us is >>>>> still pinned or not after an unpin operation. >>>>> >>>>> What you try to do here is unpinning the BO when it is pinned >>>>> independent if somebody else or our code has pinned it before. >>>> Hi Christian, >>>> >>>> Thanks for your explanation and advice. I got your point. >>>> Actually, these BOs are allocated and pinned during a kfd process lifecycle. >>>> I will try to add a flag into struct kgd_mem to indicate which BO is >>>> pined and should be unpinned. Which will make >>>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks! >>> That isn't to much better. The real solution would be to unpin them >>> when the kfd process is destroyed. >> Yes, will unpin them when the kfd process is destroyed. >> But we should indicate which BO is pinned and should be unpinned. Right? > >Well not with a flag or something like that. > >The knowledge which BO is pinned and needs to be unpinned should come from >the control logic and not be papered over by some general handling. >That's the background why we have removed the general handling for this from >TTM in the first place. > >In other words when you need to pin a BO because it is kmapped it should be >unpinned when it is kunmapped and if you don't kunmap at all then there is >something wrong with the code structure from a higher level point of view. Yes, this function "amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel" did a kmap, but without a kunmap when the kfd process is destroyed. The flag actually indicates kmap/kunmap. Regards, Lang >Regards, >Christian. > >> >> Regards, >> Lang >> >>> Regards, >>> Christian. >>> >>>> Regards, >>>> Lang >>>>> That can lead to all kind of problems and is clearly illegal. >>>>> >>>>>>> Where was the BO pinned in the first place? >>>>>> I found two places: >>>>>> >>>>>> ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags, >>>>>> &kaddr); >>>>>> >>>>>> ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base, >>>>>> KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr); >>>>> Well then you need to figure out where that memory is freed again >>>>> and place the unpin appropriately. >>>>> >>>>> General rule of thumb is that calls to >>>>> amdgpu_bo_pin/amdgpu_bo_unpin should be balanced. >>>>> >>>>> Regards, >>>>> Christian. >>>>> >>>>>> Regards, >>>>>> Lang >>>>>> >>>>>>> Christian. >>>>>>> >>>>>>>> ret = unreserve_bo_and_vms(&ctx, false, false); >>>>>>>> >>>>>>>> /* Remove from VM internal data structures */ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] drm/kfd: fix ttm_bo_release warning 2021-09-24 5:50 ` Yu, Lang @ 2021-09-24 5:54 ` Christian König 2021-09-24 6:34 ` Yu, Lang 0 siblings, 1 reply; 15+ messages in thread From: Christian König @ 2021-09-24 5:54 UTC (permalink / raw) To: Yu, Lang, amd-gfx; +Cc: Kuehling, Felix, Huang, Ray Am 24.09.21 um 07:50 schrieb Yu, Lang: > [AMD Official Use Only] >> [SNIP] >>>>> Hi Christian, >>>>> >>>>> Thanks for your explanation and advice. I got your point. >>>>> Actually, these BOs are allocated and pinned during a kfd process lifecycle. >>>>> I will try to add a flag into struct kgd_mem to indicate which BO is >>>>> pined and should be unpinned. Which will make >>>>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks! >>>> That isn't to much better. The real solution would be to unpin them >>>> when the kfd process is destroyed. >>> Yes, will unpin them when the kfd process is destroyed. >>> But we should indicate which BO is pinned and should be unpinned. Right? >> Well not with a flag or something like that. >> >> The knowledge which BO is pinned and needs to be unpinned should come from >> the control logic and not be papered over by some general handling. >> That's the background why we have removed the general handling for this from >> TTM in the first place. >> >> In other words when you need to pin a BO because it is kmapped it should be >> unpinned when it is kunmapped and if you don't kunmap at all then there is >> something wrong with the code structure from a higher level point of view. > Yes, this function "amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel" did a kmap, > but without a kunmap when the kfd process is destroyed. The flag actually indicates kmap/kunmap. Well that's the wrong approach then. I mean you need to have the BO reference and the mapped pointer somewhere, don't you? How do you clean those up? Regards, Christian. ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [PATCH] drm/kfd: fix ttm_bo_release warning 2021-09-24 5:54 ` Christian König @ 2021-09-24 6:34 ` Yu, Lang 2021-09-24 6:37 ` Christian König 0 siblings, 1 reply; 15+ messages in thread From: Yu, Lang @ 2021-09-24 6:34 UTC (permalink / raw) To: Koenig, Christian, amd-gfx; +Cc: Kuehling, Felix, Huang, Ray [AMD Official Use Only] >-----Original Message----- >From: Koenig, Christian <Christian.Koenig@amd.com> >Sent: Friday, September 24, 2021 1:54 PM >To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Huang, Ray ><Ray.Huang@amd.com> >Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning > > >Am 24.09.21 um 07:50 schrieb Yu, Lang: >> [AMD Official Use Only] >>> [SNIP] >>>>>> Hi Christian, >>>>>> >>>>>> Thanks for your explanation and advice. I got your point. >>>>>> Actually, these BOs are allocated and pinned during a kfd process lifecycle. >>>>>> I will try to add a flag into struct kgd_mem to indicate which BO >>>>>> is pined and should be unpinned. Which will make >>>>>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks! >>>>> That isn't to much better. The real solution would be to unpin them >>>>> when the kfd process is destroyed. >>>> Yes, will unpin them when the kfd process is destroyed. >>>> But we should indicate which BO is pinned and should be unpinned. Right? >>> Well not with a flag or something like that. >>> >>> The knowledge which BO is pinned and needs to be unpinned should come >>> from the control logic and not be papered over by some general handling. >>> That's the background why we have removed the general handling for >>> this from TTM in the first place. >>> >>> In other words when you need to pin a BO because it is kmapped it >>> should be unpinned when it is kunmapped and if you don't kunmap at >>> all then there is something wrong with the code structure from a higher level >point of view. >> Yes, this function "amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel" did a >> kmap, but without a kunmap when the kfd process is destroyed. The flag >actually indicates kmap/kunmap. > >Well that's the wrong approach then. I mean you need to have the BO reference >and the mapped pointer somewhere, don't you? > >How do you clean those up? They are respectively cleaned by " kfd_process_device_free_bos " and " kfd_process_destroy_pdds". Let me put the code here. Thanks! diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index ec028cf963f5..d65b3bf13fd8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -81,6 +81,7 @@ struct kgd_mem { bool aql_queue; bool is_imported; + bool is_mapped_to_kernel; }; /* KFD Memory Eviction */ @@ -278,6 +279,8 @@ int amdgpu_amdkfd_gpuvm_sync_memory( struct kgd_dev *kgd, struct kgd_mem *mem, bool intr); int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd, struct kgd_mem *mem, void **kptr, uint64_t *size); +void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev *kgd, + struct kgd_mem *mem); int amdgpu_amdkfd_gpuvm_restore_process_bos(void *process_info, struct dma_fence **ef); int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 2d6b2d77b738..45ccbe9f63ee 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1857,6 +1857,8 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd, amdgpu_bo_unreserve(bo); + mem->is_mapped_to_kernel = true; + mutex_unlock(&mem->process_info->lock); return 0; @@ -1870,6 +1872,20 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd, return ret; } +void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev *kgd, struct kgd_mem *mem) +{ + struct amdgpu_bo *bo = mem->bo; + + if (!mem->is_mapped_to_kernel) + return; + + amdgpu_bo_reserve(bo, true); + amdgpu_bo_kunmap(bo); + amdgpu_bo_unpin(bo); + amdgpu_bo_unreserve(bo); + mem->is_mapped_to_kernel = false; +} + int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd, struct kfd_vm_fault_info *mem) { diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index 21ec8a18cad2..f5506b153aed 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -941,6 +941,8 @@ static void kfd_process_device_free_bos(struct kfd_process_device *pdd) peer_pdd->dev->kgd, mem, peer_pdd->drm_priv); } + amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(pdd->dev->kgd, mem); + amdgpu_amdkfd_gpuvm_free_memory_of_gpu(pdd->dev->kgd, mem, pdd->drm_priv, NULL); kfd_process_device_remove_obj_handle(pdd, id); Regards, Lang >Regards, >Christian. ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] drm/kfd: fix ttm_bo_release warning 2021-09-24 6:34 ` Yu, Lang @ 2021-09-24 6:37 ` Christian König 2021-09-24 10:37 ` Yu, Lang 0 siblings, 1 reply; 15+ messages in thread From: Christian König @ 2021-09-24 6:37 UTC (permalink / raw) To: Yu, Lang, Koenig, Christian, amd-gfx; +Cc: Kuehling, Felix, Huang, Ray Am 24.09.21 um 08:34 schrieb Yu, Lang: > [AMD Official Use Only] > > > >> -----Original Message----- >> From: Koenig, Christian <Christian.Koenig@amd.com> >> Sent: Friday, September 24, 2021 1:54 PM >> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Huang, Ray >> <Ray.Huang@amd.com> >> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >> >> >> Am 24.09.21 um 07:50 schrieb Yu, Lang: >>> [AMD Official Use Only] >>>> [SNIP] >>>>>>> Hi Christian, >>>>>>> >>>>>>> Thanks for your explanation and advice. I got your point. >>>>>>> Actually, these BOs are allocated and pinned during a kfd process lifecycle. >>>>>>> I will try to add a flag into struct kgd_mem to indicate which BO >>>>>>> is pined and should be unpinned. Which will make >>>>>>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks! >>>>>> That isn't to much better. The real solution would be to unpin them >>>>>> when the kfd process is destroyed. >>>>> Yes, will unpin them when the kfd process is destroyed. >>>>> But we should indicate which BO is pinned and should be unpinned. Right? >>>> Well not with a flag or something like that. >>>> >>>> The knowledge which BO is pinned and needs to be unpinned should come >>>> from the control logic and not be papered over by some general handling. >>>> That's the background why we have removed the general handling for >>>> this from TTM in the first place. >>>> >>>> In other words when you need to pin a BO because it is kmapped it >>>> should be unpinned when it is kunmapped and if you don't kunmap at >>>> all then there is something wrong with the code structure from a higher level >> point of view. >>> Yes, this function "amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel" did a >>> kmap, but without a kunmap when the kfd process is destroyed. The flag >> actually indicates kmap/kunmap. >> >> Well that's the wrong approach then. I mean you need to have the BO reference >> and the mapped pointer somewhere, don't you? >> >> How do you clean those up? > They are respectively cleaned by " kfd_process_device_free_bos " and " kfd_process_destroy_pdds". > Let me put the code here. Thanks! > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h > index ec028cf963f5..d65b3bf13fd8 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h > @@ -81,6 +81,7 @@ struct kgd_mem { > > bool aql_queue; > bool is_imported; > + bool is_mapped_to_kernel; Yeah, that is exactly what you absolutely should NOT do. > }; > > /* KFD Memory Eviction */ > @@ -278,6 +279,8 @@ int amdgpu_amdkfd_gpuvm_sync_memory( > struct kgd_dev *kgd, struct kgd_mem *mem, bool intr); > int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd, > struct kgd_mem *mem, void **kptr, uint64_t *size); The real question is who is calling this function here? > +void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev *kgd, > + struct kgd_mem *mem); > int amdgpu_amdkfd_gpuvm_restore_process_bos(void *process_info, > struct dma_fence **ef); > int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd, > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > index 2d6b2d77b738..45ccbe9f63ee 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > @@ -1857,6 +1857,8 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd, > > amdgpu_bo_unreserve(bo); > > + mem->is_mapped_to_kernel = true; > + > mutex_unlock(&mem->process_info->lock); > return 0; > > @@ -1870,6 +1872,20 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd, > return ret; > } > > +void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev *kgd, struct kgd_mem *mem) > +{ > + struct amdgpu_bo *bo = mem->bo; > + > + if (!mem->is_mapped_to_kernel) > + return; > + > + amdgpu_bo_reserve(bo, true); > + amdgpu_bo_kunmap(bo); > + amdgpu_bo_unpin(bo); > + amdgpu_bo_unreserve(bo); > + mem->is_mapped_to_kernel = false; > +} > + > int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd, > struct kfd_vm_fault_info *mem) > { > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c > index 21ec8a18cad2..f5506b153aed 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c > @@ -941,6 +941,8 @@ static void kfd_process_device_free_bos(struct kfd_process_device *pdd) > peer_pdd->dev->kgd, mem, peer_pdd->drm_priv); > } > > + amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(pdd->dev->kgd, mem); > + That's a general cleanup function for user space allocations and should not be abused for stuff like that. Regards, Christian. > amdgpu_amdkfd_gpuvm_free_memory_of_gpu(pdd->dev->kgd, mem, > pdd->drm_priv, NULL); > kfd_process_device_remove_obj_handle(pdd, id); > > Regards, > Lang > >> Regards, >> Christian. ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [PATCH] drm/kfd: fix ttm_bo_release warning 2021-09-24 6:37 ` Christian König @ 2021-09-24 10:37 ` Yu, Lang 0 siblings, 0 replies; 15+ messages in thread From: Yu, Lang @ 2021-09-24 10:37 UTC (permalink / raw) To: Christian König, Koenig, Christian, amd-gfx Cc: Kuehling, Felix, Huang, Ray [AMD Official Use Only] >-----Original Message----- >From: Christian König <ckoenig.leichtzumerken@gmail.com> >Sent: Friday, September 24, 2021 2:37 PM >To: Yu, Lang <Lang.Yu@amd.com>; Koenig, Christian ><Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org >Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Huang, Ray ><Ray.Huang@amd.com> >Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning > >Am 24.09.21 um 08:34 schrieb Yu, Lang: >> [AMD Official Use Only] >> >> >> >>> -----Original Message----- >>> From: Koenig, Christian <Christian.Koenig@amd.com> >>> Sent: Friday, September 24, 2021 1:54 PM >>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Huang, Ray >>> <Ray.Huang@amd.com> >>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >>> >>> >>> Am 24.09.21 um 07:50 schrieb Yu, Lang: >>>> [AMD Official Use Only] >>>>> [SNIP] >>>>>>>> Hi Christian, >>>>>>>> >>>>>>>> Thanks for your explanation and advice. I got your point. >>>>>>>> Actually, these BOs are allocated and pinned during a kfd process >lifecycle. >>>>>>>> I will try to add a flag into struct kgd_mem to indicate which >>>>>>>> BO is pined and should be unpinned. Which will make >>>>>>>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks! >>>>>>> That isn't to much better. The real solution would be to unpin >>>>>>> them when the kfd process is destroyed. >>>>>> Yes, will unpin them when the kfd process is destroyed. >>>>>> But we should indicate which BO is pinned and should be unpinned. Right? >>>>> Well not with a flag or something like that. >>>>> >>>>> The knowledge which BO is pinned and needs to be unpinned should >>>>> come from the control logic and not be papered over by some general >handling. >>>>> That's the background why we have removed the general handling for >>>>> this from TTM in the first place. >>>>> >>>>> In other words when you need to pin a BO because it is kmapped it >>>>> should be unpinned when it is kunmapped and if you don't kunmap at >>>>> all then there is something wrong with the code structure from a >>>>> higher level >>> point of view. >>>> Yes, this function "amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel" did a >>>> kmap, but without a kunmap when the kfd process is destroyed. The >>>> flag >>> actually indicates kmap/kunmap. >>> >>> Well that's the wrong approach then. I mean you need to have the BO >>> reference and the mapped pointer somewhere, don't you? >>> >>> How do you clean those up? >> They are respectively cleaned by " kfd_process_device_free_bos " and " >kfd_process_destroy_pdds". >> Let me put the code here. Thanks! >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h >> index ec028cf963f5..d65b3bf13fd8 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h >> @@ -81,6 +81,7 @@ struct kgd_mem { >> >> bool aql_queue; >> bool is_imported; >> + bool is_mapped_to_kernel; > >Yeah, that is exactly what you absolutely should NOT do. > >> }; >> >> /* KFD Memory Eviction */ >> @@ -278,6 +279,8 @@ int amdgpu_amdkfd_gpuvm_sync_memory( >> struct kgd_dev *kgd, struct kgd_mem *mem, bool intr); >> int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd, >> struct kgd_mem *mem, void **kptr, uint64_t *size); > >The real question is who is calling this function here? Currently there are 3 places called function "amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel" to kmap a BO. 1, kmap a ptr for pdd->qpd->cwsr_kaddr Call stack: amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel kfd_process_alloc_gpuvm kfd_process_device_init_cwsr_dgpu kfd_process_device_init_vm kfd_ioctl_acquire_vm 2, kmap a ptr for pdd->qpd->ib_kaddr Call stack: amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel kfd_process_alloc_gpuvm kfd_process_device_reserve_ib_mem kfd_process_device_init_vm kfd_ioctl_acquire_vm 3, kmap a ptr for p->signal_page->kernel_address Call stack: amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel kfd_ioctl_create_event The problem is these kmaped BOs were not kunmaped properly when the kfd process is destroyed. Regards, Lang >> +void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev >*kgd, >> + struct kgd_mem *mem); >> int amdgpu_amdkfd_gpuvm_restore_process_bos(void *process_info, >> struct dma_fence **ef); >> int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd, diff >> --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >> index 2d6b2d77b738..45ccbe9f63ee 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >> @@ -1857,6 +1857,8 @@ int >> amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd, >> >> amdgpu_bo_unreserve(bo); >> >> + mem->is_mapped_to_kernel = true; >> + >> mutex_unlock(&mem->process_info->lock); >> return 0; >> >> @@ -1870,6 +1872,20 @@ int >amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd, >> return ret; >> } >> >> +void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev >> +*kgd, struct kgd_mem *mem) { >> + struct amdgpu_bo *bo = mem->bo; >> + >> + if (!mem->is_mapped_to_kernel) >> + return; >> + >> + amdgpu_bo_reserve(bo, true); >> + amdgpu_bo_kunmap(bo); >> + amdgpu_bo_unpin(bo); >> + amdgpu_bo_unreserve(bo); >> + mem->is_mapped_to_kernel = false; } >> + >> int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd, >> struct kfd_vm_fault_info *mem) >> { >> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c >> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c >> index 21ec8a18cad2..f5506b153aed 100644 >> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c >> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c >> @@ -941,6 +941,8 @@ static void kfd_process_device_free_bos(struct >kfd_process_device *pdd) >> peer_pdd->dev->kgd, mem, peer_pdd->drm_priv); >> } >> >> + >> + amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(pdd->dev->kgd, mem); >> + > >That's a general cleanup function for user space allocations and should not be >abused for stuff like that. > >Regards, >Christian. > >> amdgpu_amdkfd_gpuvm_free_memory_of_gpu(pdd->dev->kgd, mem, >> pdd->drm_priv, NULL); >> kfd_process_device_remove_obj_handle(pdd, id); >> >> Regards, >> Lang >> >>> Regards, >>> Christian. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] drm/kfd: fix ttm_bo_release warning 2021-09-23 12:09 ` Yu, Lang 2021-09-23 12:23 ` Christian König @ 2021-09-23 16:21 ` Felix Kuehling 2021-09-24 5:35 ` Yu, Lang 1 sibling, 1 reply; 15+ messages in thread From: Felix Kuehling @ 2021-09-23 16:21 UTC (permalink / raw) To: Yu, Lang, Koenig, Christian, amd-gfx; +Cc: Huang, Ray On 2021-09-23 8:09 a.m., Yu, Lang wrote: > [AMD Official Use Only] > > > >> -----Original Message----- >> From: Koenig, Christian <Christian.Koenig@amd.com> >> Sent: Thursday, September 23, 2021 7:40 PM >> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig >> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com> >> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >> >> >> >> Am 23.09.21 um 11:44 schrieb Lang Yu: >>> If a BO is pinned, unpin it before freeing it. >>> >>> Call Trace: >>> ttm_bo_put+0x30/0x50 [ttm] >>> amdgpu_bo_unref+0x1e/0x30 [amdgpu] >>> amdgpu_gem_object_free+0x34/0x50 [amdgpu] >>> drm_gem_object_free+0x1d/0x30 [drm] >>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu] >>> kfd_process_device_free_bos+0xa3/0xf0 [amdgpu] >>> kfd_process_wq_release+0x224/0x2e0 [amdgpu] >>> process_one_work+0x220/0x3c0 >>> worker_thread+0x4d/0x3f0 >>> kthread+0x114/0x150 >>> process_one_work+0x3c0/0x3c0 >>> kthread_park+0x90/0x90 >>> ret_from_fork+0x22/0x30 >>> >>> Signed-off-by: Lang Yu <lang.yu@amd.com> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>> index 2d6b2d77b738..7e693b064072 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>> @@ -1567,6 +1567,9 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu( >>> pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va, >>> mem->va + bo_size * (1 + mem->aql_queue)); >>> >>> + if (mem->bo->tbo.pin_count) >>> + amdgpu_bo_unpin(mem->bo); >>> + >> NAK, using mem->bo->tbo.pin_count like this is illegal. > I didn't get your point. I referred to function-"void amdgpu_bo_unpin(struct amdgpu_bo *bo)", > which uses it like this. > >> Where was the BO pinned in the first place? > I found two places: > > ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags, > &kaddr); > > ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base, > KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr); These two allocations are created by the kernel mode driver. There is another case where a user-allocated BO can get pinned because we need to kmap it (in kfd_ioctl_create_event). Regards, Felix > Regards, > Lang > >> Christian. >> >>> ret = unreserve_bo_and_vms(&ctx, false, false); >>> >>> /* Remove from VM internal data structures */ ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [PATCH] drm/kfd: fix ttm_bo_release warning 2021-09-23 16:21 ` Felix Kuehling @ 2021-09-24 5:35 ` Yu, Lang 0 siblings, 0 replies; 15+ messages in thread From: Yu, Lang @ 2021-09-24 5:35 UTC (permalink / raw) To: Kuehling, Felix, Koenig, Christian, amd-gfx; +Cc: Huang, Ray [AMD Official Use Only] >-----Original Message----- >From: Kuehling, Felix <Felix.Kuehling@amd.com> >Sent: Friday, September 24, 2021 12:21 AM >To: Yu, Lang <Lang.Yu@amd.com>; Koenig, Christian ><Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org >Cc: Huang, Ray <Ray.Huang@amd.com> >Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning > > >On 2021-09-23 8:09 a.m., Yu, Lang wrote: >> [AMD Official Use Only] >> >> >> >>> -----Original Message----- >>> From: Koenig, Christian <Christian.Koenig@amd.com> >>> Sent: Thursday, September 23, 2021 7:40 PM >>> To: Yu, Lang <Lang.Yu@amd.com>; amd-gfx@lists.freedesktop.org >>> Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Christian K nig >>> <C3B6christian.koenig@amd.com>; Huang, Ray <Ray.Huang@amd.com> >>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >>> >>> >>> >>> Am 23.09.21 um 11:44 schrieb Lang Yu: >>>> If a BO is pinned, unpin it before freeing it. >>>> >>>> Call Trace: >>>> ttm_bo_put+0x30/0x50 [ttm] >>>> amdgpu_bo_unref+0x1e/0x30 [amdgpu] >>>> amdgpu_gem_object_free+0x34/0x50 [amdgpu] >>>> drm_gem_object_free+0x1d/0x30 [drm] >>>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu] >>>> kfd_process_device_free_bos+0xa3/0xf0 [amdgpu] >>>> kfd_process_wq_release+0x224/0x2e0 [amdgpu] >>>> process_one_work+0x220/0x3c0 >>>> worker_thread+0x4d/0x3f0 >>>> kthread+0x114/0x150 >>>> process_one_work+0x3c0/0x3c0 >>>> kthread_park+0x90/0x90 >>>> ret_from_fork+0x22/0x30 >>>> >>>> Signed-off-by: Lang Yu <lang.yu@amd.com> >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++ >>>> 1 file changed, 3 insertions(+) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>> index 2d6b2d77b738..7e693b064072 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>> @@ -1567,6 +1567,9 @@ int >amdgpu_amdkfd_gpuvm_free_memory_of_gpu( >>>> pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va, >>>> mem->va + bo_size * (1 + mem->aql_queue)); >>>> >>>> + if (mem->bo->tbo.pin_count) >>>> + amdgpu_bo_unpin(mem->bo); >>>> + >>> NAK, using mem->bo->tbo.pin_count like this is illegal. >> I didn't get your point. I referred to function-"void >> amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this. >> >>> Where was the BO pinned in the first place? >> I found two places: >> >> ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags, >> &kaddr); >> >> ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base, >> KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr); > >These two allocations are created by the kernel mode driver. There is another >case where a user-allocated BO can get pinned because we need to kmap it (in >kfd_ioctl_create_event). > >Regards, > Felix Yes, these BOs will not be freed until a kfd process is destroyed. I will make a v2 patch, please help review. Thanks! Regards, Lang > >> Regards, >> Lang >> >>> Christian. >>> >>>> ret = unreserve_bo_and_vms(&ctx, false, false); >>>> >>>> /* Remove from VM internal data structures */ ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2021-09-24 10:37 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-09-23 9:44 [PATCH] drm/kfd: fix ttm_bo_release warning Lang Yu 2021-09-23 11:39 ` Christian König 2021-09-23 12:09 ` Yu, Lang 2021-09-23 12:23 ` Christian König 2021-09-23 14:24 ` Yu, Lang 2021-09-23 14:52 ` Christian König 2021-09-24 5:35 ` Yu, Lang 2021-09-24 5:42 ` Christian König 2021-09-24 5:50 ` Yu, Lang 2021-09-24 5:54 ` Christian König 2021-09-24 6:34 ` Yu, Lang 2021-09-24 6:37 ` Christian König 2021-09-24 10:37 ` Yu, Lang 2021-09-23 16:21 ` Felix Kuehling 2021-09-24 5:35 ` Yu, Lang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).