Hi Christian,
When we free an IB fence, we first call one
call_rcu in drm_sched_fence_release_scheduled as the call
trace one, then after the call trace one,
we call the call_rcu second in the amdgpu_fence_release
in call trace two, as below.
The kmem_cache_free(amdgpu_fence_slab, fence)'s call
trace as below:
1.drm_sched_entity_fini ->
drm_sched_entity_cleanup ->
dma_fence_put(entity->last_scheduled) ->
drm_sched_fence_release_finished ->
drm_sched_fence_release_scheduled ->
call_rcu(&fence->finished.rcu, drm_sched_fence_free)
2.drm_sched_fence_free ->
dma_fence_put(fence->parent) ->
amdgpu_fence_release ->
call_rcu(&f->rcu, amdgpu_fence_free) ->
kmem_cache_free(amdgpu_fence_slab, fence);
> -----Original Message-----
> From: Koenig, Christian
> Sent: Friday, May 18, 2018 5:46 PM
> Subject: Re: [PATCH] drm/amdgpu: add rcu_barrier
after entity fini
>
> Ok, I'm lost where do we use call_rcu() twice? Cause
that sounds incorrect in
> the first place.
>
> Christian.
>
> Am 18.05.2018 um 11:41 schrieb Deng, Emily:
> > Ping......
> >
> > Best Wishes,
> > Emily Deng
> >
> >
> >
> >
> >> -----Original Message-----
> >> Behalf Of Deng, Emily
> >> Sent: Friday, May 18, 2018 11:20 AM
> >> Subject: RE: [PATCH] drm/amdgpu: add
rcu_barrier after entity fini
> >>
> >> Hi Christian,
> >> Yes, it has already one rcu_barrier,
but it has called twice
> >> call_rcu, so the one rcu_barrier just could
barrier one call_rcu some time.
> >> After I added another rcu_barrier, the
kernel issue will disappear.
> >>
> >> Best Wishes,
> >> Emily Deng
> >>
> >>> -----Original Message-----
> >>> Sent: Thursday, May 17, 2018 7:08 PM
> >>> Subject: Re: [PATCH] drm/amdgpu: add
rcu_barrier after entity fini
> >>>
> >>> Am 17.05.2018 um 12:03 schrieb Emily
Deng:
> >>>> To free the fence from the
amdgpu_fence_slab, need twice call_rcu,
> >>>> to avoid the amdgpu_fence_slab_fini
call
> >>>>
kmem_cache_destroy(amdgpu_fence_slab) before
> >>> kmem_cache_free(amdgpu_fence_slab,
fence), add rcu_barrier after
> >>> drm_sched_entity_fini.
> >>>> The
kmem_cache_free(amdgpu_fence_slab, fence)'s call trace as
> below:
> >>>> 1.drm_sched_entity_fini ->
> >>>> drm_sched_entity_cleanup ->
> >>>>
dma_fence_put(entity->last_scheduled) ->
> >>>> drm_sched_fence_release_finished
->
> >>> drm_sched_fence_release_scheduled
> >>>> ->
call_rcu(&fence->finished.rcu, drm_sched_fence_free)
> >>>>
> >>>> 2.drm_sched_fence_free ->
> >>>> dma_fence_put(fence->parent)
->
> >>>> amdgpu_fence_release ->
> >>>> call_rcu(&f->rcu,
amdgpu_fence_free) ->
> >>>> kmem_cache_free(amdgpu_fence_slab,
fence);
> >>>>
> >>>> v2:put the barrier before the
kmem_cache_destroy
> >>>>
> >>>> Change-Id:
I8dcadd3372f97e72461bf46b41cc26d90f09b8df
> >>>> ---
> >>>>
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 +
> >>>> 1 file changed, 1 insertion(+)
> >>>>
> >>>> diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>>>
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>>> index 39ec6b8..42be65b 100644
> >>>> ---
a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>>> +++
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>>> @@ -69,6 +69,7 @@ int
amdgpu_fence_slab_init(void)
> >>>> void
amdgpu_fence_slab_fini(void)
> >>>> {
> >>>> rcu_barrier();
> >>>> + rcu_barrier();
> >>> Well, you should have noted that there
is already an rcu_barrier
> >>> here and adding another one shouldn't
have any additional effect. So
> >>> your explanation and the proposed
solution doesn't make to much
> sense.
> >>>
> >>> I think the problem you run into is
rather that the fence is
> >>> reference counted and might live longer
than the module who created it.
> >>>
> >>> Complicated issue, one possible
solution would be to release
> >>> fence->parent earlier in the
scheduler fence but that doesn't sound
> >>> fence->like
> >>> a general purpose solution.
> >>>
> >>> Christian.
> >>>
> >>>>
kmem_cache_destroy(amdgpu_fence_slab);
> >>>> }
> >>>> /*
> >>
_______________________________________________
> >> amd-gfx mailing list