* [PATCH] drm/amdgpu: fix fence slab teardown
@ 2016-10-23 18:31 Grazvydas Ignotas
  2016-10-24  2:34 ` zhoucm1
       [not found] ` <1477247507-11378-1-git-send-email-notasas-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 2 replies; 6+ messages in thread
From: Grazvydas Ignotas @ 2016-10-23 18:31 UTC (permalink / raw)
  To: dri-devel, amd-gfx

To free fences, call_rcu() is used, which calls amdgpu_fence_free()
after a grace period. During teardown, there is no guarantee all
callbacks have finished, so amdgpu_fence_slab may be destroyed before
all fences have been freed. If we are lucky, this results in some slab
warnings; if not, we get a crash in one of the RCU threads because a
callback is called after amdgpu has already been unloaded.

Fix it with an rcu_barrier().

Fixes: b44135351a3a ("drm/amdgpu: RCU protected amdgpu_fence_release")
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 3a2e42f..77b34ec 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -68,6 +68,7 @@ int amdgpu_fence_slab_init(void)
 
 void amdgpu_fence_slab_fini(void)
 {
+	rcu_barrier();
 	kmem_cache_destroy(amdgpu_fence_slab);
 }
 /*
-- 
2.7.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
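
The ordering problem described in the commit message can be modeled in user space. The following is a minimal sketch, not kernel code: `struct pool`, `defer_free()`, `drain_deferred()`, `pool_destroy()` and `teardown()` are hypothetical names that only stand in for the slab cache, call_rcu(), rcu_barrier() and kmem_cache_destroy(). It assumes a single thread and a simple pending list, so it shows only the leak at destroy time, not the crash that can occur once the module text is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a slab cache: it only counts live objects. */
struct pool {
	int live; /* objects allocated and not yet freed */
};

/* One deferred-free request, loosely modeling a queued RCU callback. */
struct deferred {
	struct deferred *next;
	struct pool *pool;
	void *obj;
};

static struct deferred *pending; /* callbacks queued but not yet run */

static void *pool_alloc(struct pool *p)
{
	void *obj = malloc(64);

	assert(obj);
	p->live++;
	return obj;
}

/* Models call_rcu(): the free is queued, not performed immediately. */
static void defer_free(struct pool *p, void *obj)
{
	struct deferred *d = malloc(sizeof(*d));

	assert(d);
	d->pool = p;
	d->obj = obj;
	d->next = pending;
	pending = d;
}

/* Models rcu_barrier(): returns only after every queued callback has run. */
static void drain_deferred(void)
{
	while (pending) {
		struct deferred *d = pending;

		pending = d->next;
		free(d->obj);
		d->pool->live--;
		free(d);
	}
}

/* Models kmem_cache_destroy(): reports objects still live at destroy time. */
static int pool_destroy(struct pool *p)
{
	return p->live;
}

/* Allocate and release n objects, then tear the pool down. */
static int teardown(int n, bool use_barrier)
{
	struct pool p = { 0 };
	int leaked;

	for (int i = 0; i < n; i++)
		defer_free(&p, pool_alloc(&p));
	if (use_barrier)
		drain_deferred();
	leaked = pool_destroy(&p);
	/* Late callbacks: in the kernel these would run against a dead cache. */
	drain_deferred();
	return leaked;
}
```

In this model, `teardown(3, false)` reports three objects still live when the pool is destroyed, which is analogous to a slab warning about remaining objects, while `teardown(3, true)` reports none because the pending frees are drained first.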


* Re: [PATCH] drm/amdgpu: fix fence slab teardown
  2016-10-23 18:31 [PATCH] drm/amdgpu: fix fence slab teardown Grazvydas Ignotas
@ 2016-10-24  2:34 ` zhoucm1
       [not found]   ` <580D731B.5050304-5C7GfCeVMHo@public.gmane.org>
       [not found] ` <1477247507-11378-1-git-send-email-notasas-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 6+ messages in thread
From: zhoucm1 @ 2016-10-24  2:34 UTC (permalink / raw)
  To: Grazvydas Ignotas, dri-devel, amd-gfx

Acked-by: Chunming Zhou <david1.zhou@amd.com>

On 2016-10-24 02:31, Grazvydas Ignotas wrote:
> To free fences, call_rcu() is used, which calls amdgpu_fence_free()
> after a grace period. During teardown, there is no guarantee all
> callbacks have finished, so amdgpu_fence_slab may be destroyed before
> all fences have been freed. If we are lucky, this results in some slab
> warnings, if not, we get a crash in one of rcu threads because callback
> is called after amdgpu has already been unloaded.
>
> Fix it with a rcu_barrier().
>
> Fixes: b44135351a3a ("drm/amdgpu: RCU protected amdgpu_fence_release")
> Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 3a2e42f..77b34ec 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -68,6 +68,7 @@ int amdgpu_fence_slab_init(void)
>   
>   void amdgpu_fence_slab_fini(void)
>   {
> +	rcu_barrier();
>   	kmem_cache_destroy(amdgpu_fence_slab);
>   }
>   /*


* Reply: [PATCH] drm/amdgpu: fix fence slab teardown
       [not found]   ` <580D731B.5050304-5C7GfCeVMHo@public.gmane.org>
@ 2016-10-24  3:35     ` Qu, Jim
  2016-10-24  9:32       ` Grazvydas Ignotas
  2016-10-24  9:05     ` Christian König
  1 sibling, 1 reply; 6+ messages in thread
From: Qu, Jim @ 2016-10-24  3:35 UTC (permalink / raw)
  To: Zhou, David(ChunMing),
	Grazvydas Ignotas, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

I did observe this issue when replacing the kernel module using DKMS. It may fail at reboot with the following call trace:

[ 3529.525360] =============================================================================
[ 3529.525361] BUG amd_sched_fence (Tainted: G    B      OE  ------------  ): Objects remaining in amd_sched_fence on kmem_cache_close()
[ 3529.525361] -----------------------------------------------------------------------------
[ 3529.525361] 
[ 3529.525361] INFO: Slab 0xffffea000094b200 objects=25 used=2 fp=0xffff8800252c9180 flags=0x1fffff00004080
[ 3529.525362] CPU: 0 PID: 18523 Comm: reboot Tainted: G    B      OE  ------------   3.10.0-512.el7.x86_64 #1
[ 3529.525362] Hardware name: ASUS All Series/Z87-PLUS, BIOS 1802 01/28/2014
[ 3529.525363]  ffffea000094b200 00000000b3b19dcf ffff880160827b50 ffffffff81685e8c
[ 3529.525363]  ffff880160827c28 ffffffff811d9e34 ffff880000000020 ffff880160827c38
[ 3529.525364]  ffff880160827be8 656a624f818de5f0 616d657220737463 6e6920676e696e69
[ 3529.525364] Call Trace:
[ 3529.525365]  [<ffffffff81685e8c>] dump_stack+0x19/0x1b
[ 3529.525366]  [<ffffffff811d9e34>] slab_err+0xb4/0xe0
[ 3529.525367]  [<ffffffff81088c29>] ? vprintk_default+0x29/0x40
[ 3529.525368]  [<ffffffff8167f434>] ? printk+0x5e/0x75
[ 3529.525369]  [<ffffffff811dd133>] ? __kmalloc+0x1f3/0x240
[ 3529.525370]  [<ffffffff811df80b>] ? kmem_cache_close+0x12b/0x2f0
[ 3529.525370]  [<ffffffff811df82c>] kmem_cache_close+0x14c/0x2f0
[ 3529.525371]  [<ffffffff811df9e4>] __kmem_cache_shutdown+0x14/0x80
[ 3529.525372]  [<ffffffff811a5704>] kmem_cache_destroy+0x44/0xf0
[ 3529.525387]  [<ffffffffa02bfb0c>] amd_sched_fini+0x3c/0x40 [amdgpu]
[ 3529.525395]  [<ffffffffa0231bfa>] amdgpu_fence_driver_fini+0x7a/0x110 [amdgpu]
[ 3529.525403]  [<ffffffffa02230dd>] amdgpu_device_fini+0x3d/0x1f0 [amdgpu]
[ 3529.525411]  [<ffffffffa0225673>] amdgpu_driver_unload_kms+0x43/0x80 [amdgpu]
[ 3529.525416]  [<ffffffffa005fb89>] drm_dev_unregister+0x29/0xb0 [drm]
[ 3529.525422]  [<ffffffffa0060273>] drm_put_dev+0x23/0x70 [drm]
[ 3529.525429]  [<ffffffffa021f3fd>] amdgpu_pci_shutdown+0x1d/0x20 [amdgpu]
[ 3529.525430]  [<ffffffff81359b56>] pci_device_shutdown+0x36/0x70
[ 3529.525431]  [<ffffffff8142a388>] device_shutdown+0xc8/0x180
[ 3529.525432]  [<ffffffff810a1536>] kernel_restart_prepare+0x36/0x40
[ 3529.525433]  [<ffffffff810a1552>] kernel_restart+0x12/0x60
[ 3529.525433]  [<ffffffff810a17c9>] SYSC_reboot+0x229/0x260
[ 3529.525435]  [<ffffffff81691971>] ? __do_page_fault+0x171/0x450
[ 3529.525436]  [<ffffffff810a186e>] SyS_reboot+0xe/0x10
[ 3529.525437]  [<ffffffff81696489>] system_call_fastpath+0x16/0x1b
[ 3529.525438] INFO: Object 0xffff8800252c8a00 @offset=2560
[ 3529.525438] INFO: Object 0xffff8800252c9540 @offset=5440


Does this patch series fix this issue?

Thanks
JimQu

________________________________________
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of zhoucm1 <david1.zhou@amd.com>
Sent: 2016-10-24 10:34
To: Grazvydas Ignotas; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: fix fence slab teardown

Acked-by: Chunming Zhou <david1.zhou@amd.com>

On 2016-10-24 02:31, Grazvydas Ignotas wrote:
> To free fences, call_rcu() is used, which calls amdgpu_fence_free()
> after a grace period. During teardown, there is no guarantee all
> callbacks have finished, so amdgpu_fence_slab may be destroyed before
> all fences have been freed. If we are lucky, this results in some slab
> warnings, if not, we get a crash in one of rcu threads because callback
> is called after amdgpu has already been unloaded.
>
> Fix it with a rcu_barrier().
>
> Fixes: b44135351a3a ("drm/amdgpu: RCU protected amdgpu_fence_release")
> Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 3a2e42f..77b34ec 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -68,6 +68,7 @@ int amdgpu_fence_slab_init(void)
>
>   void amdgpu_fence_slab_fini(void)
>   {
> +     rcu_barrier();
>       kmem_cache_destroy(amdgpu_fence_slab);
>   }
>   /*

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

* Re: [PATCH] drm/amdgpu: fix fence slab teardown
       [not found]   ` <580D731B.5050304-5C7GfCeVMHo@public.gmane.org>
  2016-10-24  3:35     ` Reply: " Qu, Jim
@ 2016-10-24  9:05     ` Christian König
  1 sibling, 0 replies; 6+ messages in thread
From: Christian König @ 2016-10-24  9:05 UTC (permalink / raw)
  To: zhoucm1, Grazvydas Ignotas,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Interesting catch. The patch is
Reviewed-by: Christian König <christian.koenig@amd.com>

On 24.10.2016 at 04:34, zhoucm1 wrote:
> Acked-by: Chunming Zhou <david1.zhou@amd.com>
>
> On 2016-10-24 02:31, Grazvydas Ignotas wrote:
>> To free fences, call_rcu() is used, which calls amdgpu_fence_free()
>> after a grace period. During teardown, there is no guarantee all
>> callbacks have finished, so amdgpu_fence_slab may be destroyed before
>> all fences have been freed. If we are lucky, this results in some slab
>> warnings, if not, we get a crash in one of rcu threads because callback
>> is called after amdgpu has already been unloaded.
>>
>> Fix it with a rcu_barrier().
>>
>> Fixes: b44135351a3a ("drm/amdgpu: RCU protected amdgpu_fence_release")
>> Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> index 3a2e42f..77b34ec 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> @@ -68,6 +68,7 @@ int amdgpu_fence_slab_init(void)
>>
>>  void amdgpu_fence_slab_fini(void)
>>  {
>> +	rcu_barrier();
>>  	kmem_cache_destroy(amdgpu_fence_slab);
>>  }
>>  /*

* Re: Reply: [PATCH] drm/amdgpu: fix fence slab teardown
  2016-10-24  3:35     ` Reply: " Qu, Jim
@ 2016-10-24  9:32       ` Grazvydas Ignotas
  0 siblings, 0 replies; 6+ messages in thread
From: Grazvydas Ignotas @ 2016-10-24  9:32 UTC (permalink / raw)
  To: Qu, Jim; +Cc: amd-gfx, dri-devel

On Mon, Oct 24, 2016 at 6:35 AM, Qu, Jim <Jim.Qu@amd.com> wrote:
> I did observe this issue when replacing the kernel module using DKMS. It may fail at reboot with the following call trace:
>
> [ 3529.525360] =============================================================================
> [ 3529.525361] BUG amd_sched_fence (Tainted: G    B      OE  ------------  ): Objects remaining in amd_sched_fence on kmem_cache_close()
> [ 3529.525361] -----------------------------------------------------------------------------
> [ 3529.525361]
> [ 3529.525361] INFO: Slab 0xffffea000094b200 objects=25 used=2 fp=0xffff8800252c9180 flags=0x1fffff00004080
> [ 3529.525362] CPU: 0 PID: 18523 Comm: reboot Tainted: G    B      OE  ------------   3.10.0-512.el7.x86_64 #1
> [ 3529.525362] Hardware name: ASUS All Series/Z87-PLUS, BIOS 1802 01/28/2014
> [ 3529.525363]  ffffea000094b200 00000000b3b19dcf ffff880160827b50 ffffffff81685e8c
> [ 3529.525363]  ffff880160827c28 ffffffff811d9e34 ffff880000000020 ffff880160827c38
> [ 3529.525364]  ffff880160827be8 656a624f818de5f0 616d657220737463 6e6920676e696e69
> [ 3529.525364] Call Trace:
> [ 3529.525365]  [<ffffffff81685e8c>] dump_stack+0x19/0x1b
> [ 3529.525366]  [<ffffffff811d9e34>] slab_err+0xb4/0xe0
> [ 3529.525367]  [<ffffffff81088c29>] ? vprintk_default+0x29/0x40
> [ 3529.525368]  [<ffffffff8167f434>] ? printk+0x5e/0x75
> [ 3529.525369]  [<ffffffff811dd133>] ? __kmalloc+0x1f3/0x240
> [ 3529.525370]  [<ffffffff811df80b>] ? kmem_cache_close+0x12b/0x2f0
> [ 3529.525370]  [<ffffffff811df82c>] kmem_cache_close+0x14c/0x2f0
> [ 3529.525371]  [<ffffffff811df9e4>] __kmem_cache_shutdown+0x14/0x80
> [ 3529.525372]  [<ffffffff811a5704>] kmem_cache_destroy+0x44/0xf0
> [ 3529.525387]  [<ffffffffa02bfb0c>] amd_sched_fini+0x3c/0x40 [amdgpu]
> [ 3529.525395]  [<ffffffffa0231bfa>] amdgpu_fence_driver_fini+0x7a/0x110 [amdgpu]
> [ 3529.525403]  [<ffffffffa02230dd>] amdgpu_device_fini+0x3d/0x1f0 [amdgpu]
> [ 3529.525411]  [<ffffffffa0225673>] amdgpu_driver_unload_kms+0x43/0x80 [amdgpu]
> [ 3529.525416]  [<ffffffffa005fb89>] drm_dev_unregister+0x29/0xb0 [drm]
> [ 3529.525422]  [<ffffffffa0060273>] drm_put_dev+0x23/0x70 [drm]
> [ 3529.525429]  [<ffffffffa021f3fd>] amdgpu_pci_shutdown+0x1d/0x20 [amdgpu]
> [ 3529.525430]  [<ffffffff81359b56>] pci_device_shutdown+0x36/0x70
> [ 3529.525431]  [<ffffffff8142a388>] device_shutdown+0xc8/0x180
> [ 3529.525432]  [<ffffffff810a1536>] kernel_restart_prepare+0x36/0x40
> [ 3529.525433]  [<ffffffff810a1552>] kernel_restart+0x12/0x60
> [ 3529.525433]  [<ffffffff810a17c9>] SYSC_reboot+0x229/0x260
> [ 3529.525435]  [<ffffffff81691971>] ? __do_page_fault+0x171/0x450
> [ 3529.525436]  [<ffffffff810a186e>] SyS_reboot+0xe/0x10
> [ 3529.525437]  [<ffffffff81696489>] system_call_fastpath+0x16/0x1b
> [ 3529.525438] INFO: Object 0xffff8800252c8a00 @offset=2560
> [ 3529.525438] INFO: Object 0xffff8800252c9540 @offset=5440
>
>
> Does this patch series fix this issue?

Yes, but only partially: there are still some leaked objects left.
When the kernel is built with CONFIG_SLUB_DEBUG, you can also set
CONFIG_SLUB_DEBUG_ON or add "slub_debug" to the kernel command line to
see the leak backtraces.

Gražvydas

* Re: [PATCH] drm/amdgpu: fix fence slab teardown
       [not found] ` <1477247507-11378-1-git-send-email-notasas-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2016-10-24 16:31   ` Alex Deucher
  0 siblings, 0 replies; 6+ messages in thread
From: Alex Deucher @ 2016-10-24 16:31 UTC (permalink / raw)
  To: Grazvydas Ignotas; +Cc: amd-gfx list, Mailing list - DRI developers

On Sun, Oct 23, 2016 at 2:31 PM, Grazvydas Ignotas <notasas@gmail.com> wrote:
> To free fences, call_rcu() is used, which calls amdgpu_fence_free()
> after a grace period. During teardown, there is no guarantee all
> callbacks have finished, so amdgpu_fence_slab may be destroyed before
> all fences have been freed. If we are lucky, this results in some slab
> warnings, if not, we get a crash in one of rcu threads because callback
> is called after amdgpu has already been unloaded.
>
> Fix it with a rcu_barrier().
>
> Fixes: b44135351a3a ("drm/amdgpu: RCU protected amdgpu_fence_release")
> Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>


Applied.  Thanks!

Alex


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 3a2e42f..77b34ec 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -68,6 +68,7 @@ int amdgpu_fence_slab_init(void)
>
>  void amdgpu_fence_slab_fini(void)
>  {
> +       rcu_barrier();
>         kmem_cache_destroy(amdgpu_fence_slab);
>  }
>  /*
> --
> 2.7.4

