All of lore.kernel.org
 help / color / mirror / Atom feed
* bfq: BUG bfq_queue: Objects remaining in bfq_queue on __kmem_cache_shutdown() after rmmod
@ 2017-12-08  0:34 Holger Hoffstätte
  2017-12-20  6:24 ` Paolo Valente
  2017-12-21  7:08 ` Guoqing Jiang
  0 siblings, 2 replies; 6+ messages in thread
From: Holger Hoffstätte @ 2017-12-08  0:34 UTC (permalink / raw)
  To: linux-block, Paolo Valente


So plugging in a device on USB with BFQ as scheduler now works without
hiccup (probably thanks to Ming Lei's last patch), but of course I found
another problem. Unmounting the device after use, changing the scheduler
back to deadline or kyber and rmmod'ing the BFQ module reproducibly gives me:

kernel: =============================================================================
kernel: BUG bfq_queue (Tainted: G    B          ): Objects remaining in bfq_queue on __kmem_cache_shutdown()
kernel: -----------------------------------------------------------------------------
kernel: 
kernel: INFO: Slab 0xffffea001601fc00 objects=37 used=3 fp=0xffff8805807f0360 flags=0x8000000000008100
kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: G    B           4.14.5 #1
kernel: Hardware name: Gigabyte Technology Co., Ltd. P67-DS3-B3/P67-DS3-B3, BIOS F1 05/06/2011
kernel: Call Trace:
kernel:  dump_stack+0x46/0x5e
kernel:  slab_err+0x9e/0xb0
kernel:  ? on_each_cpu_mask+0x35/0x40
kernel:  ? ksm_migrate_page+0x60/0x60
kernel:  ? __kmalloc+0x1c9/0x1d0
kernel:  __kmem_cache_shutdown+0x177/0x350
kernel:  shutdown_cache+0xf/0x130
kernel:  kmem_cache_destroy+0x19e/0x1b0
kernel:  SyS_delete_module+0x168/0x230
kernel:  ? exit_to_usermode_loop+0x39/0x80
kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
kernel: RIP: 0033:0x7f53e4136b97
kernel: RSP: 002b:00007ffd660061d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f53e4136b97
kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: 00000000006247f8
kernel: RBP: 0000000000000000 R08: 00007ffd66005171 R09: 0000000000000000
kernel: R10: 00007f53e41acbc0 R11: 0000000000000206 R12: 0000000000624790
kernel: R13: 00007ffd660051d0 R14: 0000000000624790 R15: 0000000000623260
kernel: INFO: Object 0xffff8805807f0000 @offset=0
kernel: INFO: Object 0xffff8805807f01b0 @offset=432
kernel: INFO: Object 0xffff8805807f3cc0 @offset=15552
kernel: kmem_cache_destroy bfq_queue: Slab cache still has objects
kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: G    B           4.14.5 #1
kernel: Hardware name: Gigabyte Technology Co., Ltd. P67-DS3-B3/P67-DS3-B3, BIOS F1 05/06/2011
kernel: Call Trace:
kernel:  dump_stack+0x46/0x5e
kernel:  kmem_cache_destroy+0x191/0x1b0
kernel:  SyS_delete_module+0x168/0x230
kernel:  ? exit_to_usermode_loop+0x39/0x80
kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
kernel: RIP: 0033:0x7f53e4136b97
kernel: RSP: 002b:00007ffd660061d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f53e4136b97
kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: 00000000006247f8
kernel: RBP: 0000000000000000 R08: 00007ffd66005171 R09: 0000000000000000
kernel: R10: 00007f53e41acbc0 R11: 0000000000000206 R12: 0000000000624790
kernel: R13: 00007ffd660051d0 R14: 0000000000624790 R15: 0000000000623260

Unloading e.g. kyber repeatably works fine, so I assume this is a bug. :)
I'm running 4.14.5 (currently in -rc1) + block/scsi picked from 4.15-git and
the list. Would be good to know if someone can reproduce it.

thanks,
Holger

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bfq: BUG bfq_queue: Objects remaining in bfq_queue on __kmem_cache_shutdown() after rmmod
  2017-12-08  0:34 bfq: BUG bfq_queue: Objects remaining in bfq_queue on __kmem_cache_shutdown() after rmmod Holger Hoffstätte
@ 2017-12-20  6:24 ` Paolo Valente
  2017-12-21  7:08 ` Guoqing Jiang
  1 sibling, 0 replies; 6+ messages in thread
From: Paolo Valente @ 2017-12-20  6:24 UTC (permalink / raw)
  To: Holger Hoffstätte; +Cc: linux-block, DAVIDE FERRARI



> Il giorno 08 dic 2017, alle ore 01:34, Holger Hoffst=C3=A4tte =
<holger@applied-asynchrony.com> ha scritto:
>=20
>=20
> So plugging in a device on USB with BFQ as scheduler now works without
> hiccup (probably thanks to Ming Lei's last patch), but of course I =
found
> another problem. Unmounting the device after use, changing the =
scheduler
> back to deadline or kyber and rmmod'ing the BFQ module reproducibly =
gives me:
>=20

Hi Holger,
this is just to tell you that we have not overlooked your report, and
we are already working on it.  We are proceeding slowly, because we
are also working on a large backlog that I hope we will clear soon.

Thanks,
Paolo

> kernel: =
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D
> kernel: BUG bfq_queue (Tainted: G    B          ): Objects remaining =
in bfq_queue on __kmem_cache_shutdown()
> kernel: =
--------------------------------------------------------------------------=
---
> kernel:=20
> kernel: INFO: Slab 0xffffea001601fc00 objects=3D37 used=3D3 =
fp=3D0xffff8805807f0360 flags=3D0x8000000000008100
> kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: G    B           4.14.5 =
#1
> kernel: Hardware name: Gigabyte Technology Co., Ltd. =
P67-DS3-B3/P67-DS3-B3, BIOS F1 05/06/2011
> kernel: Call Trace:
> kernel:  dump_stack+0x46/0x5e
> kernel:  slab_err+0x9e/0xb0
> kernel:  ? on_each_cpu_mask+0x35/0x40
> kernel:  ? ksm_migrate_page+0x60/0x60
> kernel:  ? __kmalloc+0x1c9/0x1d0
> kernel:  __kmem_cache_shutdown+0x177/0x350
> kernel:  shutdown_cache+0xf/0x130
> kernel:  kmem_cache_destroy+0x19e/0x1b0
> kernel:  SyS_delete_module+0x168/0x230
> kernel:  ? exit_to_usermode_loop+0x39/0x80
> kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
> kernel: RIP: 0033:0x7f53e4136b97
> kernel: RSP: 002b:00007ffd660061d8 EFLAGS: 00000206 ORIG_RAX: =
00000000000000b0
> kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: =
00007f53e4136b97
> kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: =
00000000006247f8
> kernel: RBP: 0000000000000000 R08: 00007ffd66005171 R09: =
0000000000000000
> kernel: R10: 00007f53e41acbc0 R11: 0000000000000206 R12: =
0000000000624790
> kernel: R13: 00007ffd660051d0 R14: 0000000000624790 R15: =
0000000000623260
> kernel: INFO: Object 0xffff8805807f0000 @offset=3D0
> kernel: INFO: Object 0xffff8805807f01b0 @offset=3D432
> kernel: INFO: Object 0xffff8805807f3cc0 @offset=3D15552
> kernel: kmem_cache_destroy bfq_queue: Slab cache still has objects
> kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: G    B           4.14.5 =
#1
> kernel: Hardware name: Gigabyte Technology Co., Ltd. =
P67-DS3-B3/P67-DS3-B3, BIOS F1 05/06/2011
> kernel: Call Trace:
> kernel:  dump_stack+0x46/0x5e
> kernel:  kmem_cache_destroy+0x191/0x1b0
> kernel:  SyS_delete_module+0x168/0x230
> kernel:  ? exit_to_usermode_loop+0x39/0x80
> kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
> kernel: RIP: 0033:0x7f53e4136b97
> kernel: RSP: 002b:00007ffd660061d8 EFLAGS: 00000206 ORIG_RAX: =
00000000000000b0
> kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: =
00007f53e4136b97
> kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: =
00000000006247f8
> kernel: RBP: 0000000000000000 R08: 00007ffd66005171 R09: =
0000000000000000
> kernel: R10: 00007f53e41acbc0 R11: 0000000000000206 R12: =
0000000000624790
> kernel: R13: 00007ffd660051d0 R14: 0000000000624790 R15: =
0000000000623260
>=20
> Unloading e.g. kyber repeatably works fine, so I assume this is a bug. =
:)
> I'm running 4.14.5 (currently in -rc1) + block/scsi picked from =
4.15-git and
> the list. Would be good to know if someone can reproduce it.
>=20
> thanks,
> Holger

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bfq: BUG bfq_queue: Objects remaining in bfq_queue on __kmem_cache_shutdown() after rmmod
  2017-12-08  0:34 bfq: BUG bfq_queue: Objects remaining in bfq_queue on __kmem_cache_shutdown() after rmmod Holger Hoffstätte
  2017-12-20  6:24 ` Paolo Valente
@ 2017-12-21  7:08 ` Guoqing Jiang
  2017-12-21  7:53   ` Paolo Valente
  1 sibling, 1 reply; 6+ messages in thread
From: Guoqing Jiang @ 2017-12-21  7:08 UTC (permalink / raw)
  To: Holger Hoffstätte, linux-block, Paolo Valente

Hi,


On 12/08/2017 08:34 AM, Holger Hoffstätte wrote:
> So plugging in a device on USB with BFQ as scheduler now works without
> hiccup (probably thanks to Ming Lei's last patch), but of course I found
> another problem. Unmounting the device after use, changing the scheduler
> back to deadline or kyber and rmmod'ing the BFQ module reproducibly gives me:
>
> kernel: =============================================================================
> kernel: BUG bfq_queue (Tainted: G    B          ): Objects remaining in bfq_queue on __kmem_cache_shutdown()
> kernel: -----------------------------------------------------------------------------
> kernel:
> kernel: INFO: Slab 0xffffea001601fc00 objects=37 used=3 fp=0xffff8805807f0360 flags=0x8000000000008100
> kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: G    B           4.14.5 #1
> kernel: Hardware name: Gigabyte Technology Co., Ltd. P67-DS3-B3/P67-DS3-B3, BIOS F1 05/06/2011
> kernel: Call Trace:
> kernel:  dump_stack+0x46/0x5e
> kernel:  slab_err+0x9e/0xb0
> kernel:  ? on_each_cpu_mask+0x35/0x40
> kernel:  ? ksm_migrate_page+0x60/0x60
> kernel:  ? __kmalloc+0x1c9/0x1d0
> kernel:  __kmem_cache_shutdown+0x177/0x350
> kernel:  shutdown_cache+0xf/0x130
> kernel:  kmem_cache_destroy+0x19e/0x1b0
> kernel:  SyS_delete_module+0x168/0x230
> kernel:  ? exit_to_usermode_loop+0x39/0x80
> kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
> kernel: RIP: 0033:0x7f53e4136b97
> kernel: RSP: 002b:00007ffd660061d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f53e4136b97
> kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: 00000000006247f8
> kernel: RBP: 0000000000000000 R08: 00007ffd66005171 R09: 0000000000000000
> kernel: R10: 00007f53e41acbc0 R11: 0000000000000206 R12: 0000000000624790
> kernel: R13: 00007ffd660051d0 R14: 0000000000624790 R15: 0000000000623260
> kernel: INFO: Object 0xffff8805807f0000 @offset=0
> kernel: INFO: Object 0xffff8805807f01b0 @offset=432
> kernel: INFO: Object 0xffff8805807f3cc0 @offset=15552
> kernel: kmem_cache_destroy bfq_queue: Slab cache still has objects
> kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: G    B           4.14.5 #1
> kernel: Hardware name: Gigabyte Technology Co., Ltd. P67-DS3-B3/P67-DS3-B3, BIOS F1 05/06/2011
> kernel: Call Trace:
> kernel:  dump_stack+0x46/0x5e
> kernel:  kmem_cache_destroy+0x191/0x1b0
> kernel:  SyS_delete_module+0x168/0x230
> kernel:  ? exit_to_usermode_loop+0x39/0x80
> kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
> kernel: RIP: 0033:0x7f53e4136b97
> kernel: RSP: 002b:00007ffd660061d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f53e4136b97
> kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: 00000000006247f8
> kernel: RBP: 0000000000000000 R08: 00007ffd66005171 R09: 0000000000000000
> kernel: R10: 00007f53e41acbc0 R11: 0000000000000206 R12: 0000000000624790
> kernel: R13: 00007ffd660051d0 R14: 0000000000624790 R15: 0000000000623260
>

I also encountered the similar bug, seems there are some async objects 
which are not freed with
BFQ_GROUP_IOSCHED build as "Y".

If revert part of commit e21b7a0b988772e82e7147e1c659a5afe2ae003c 
("block, bfq: add full hierarchical
scheduling and cgroups support")like below, then I can't see the bug 
again with simple test.

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 889a8549d97f..c640e64e042f 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4710,6 +4710,7 @@ static void bfq_exit_queue(struct elevator_queue *e)
         spin_lock_irq(&bfqd->lock);
         list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, bfqq_list)
                 bfq_deactivate_bfqq(bfqd, bfqq, false, false);
+       bfq_put_async_queues(bfqd, bfqd->root_group);
         spin_unlock_irq(&bfqd->lock);

         hrtimer_cancel(&bfqd->idle_slice_timer);
@@ -4718,7 +4719,6 @@ static void bfq_exit_queue(struct elevator_queue *e)
         blkcg_deactivate_policy(bfqd->queue, &blkcg_policy_bfq);
  #else
         spin_lock_irq(&bfqd->lock);
-       bfq_put_async_queues(bfqd, bfqd->root_group);
         kfree(bfqd->root_group);
         spin_unlock_irq(&bfqd->lock);

But perhaps we should do it inside blkcg_deactivate_policy, right?

Thanks,
Guoqing

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: bfq: BUG bfq_queue: Objects remaining in bfq_queue on __kmem_cache_shutdown() after rmmod
  2017-12-21  7:08 ` Guoqing Jiang
@ 2017-12-21  7:53   ` Paolo Valente
  2017-12-21  9:14     ` Guoqing Jiang
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Valente @ 2017-12-21  7:53 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Holger Hoffstätte, linux-block, DAVIDE FERRARI



> Il giorno 21 dic 2017, alle ore 08:08, Guoqing Jiang =
<gqjiang@suse.com> ha scritto:
>=20
> Hi,
>=20
>=20
> On 12/08/2017 08:34 AM, Holger Hoffst=C3=A4tte wrote:
>> So plugging in a device on USB with BFQ as scheduler now works =
without
>> hiccup (probably thanks to Ming Lei's last patch), but of course I =
found
>> another problem. Unmounting the device after use, changing the =
scheduler
>> back to deadline or kyber and rmmod'ing the BFQ module reproducibly =
gives me:
>>=20
>> kernel: =
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D
>> kernel: BUG bfq_queue (Tainted: G    B          ): Objects remaining =
in bfq_queue on __kmem_cache_shutdown()
>> kernel: =
--------------------------------------------------------------------------=
---
>> kernel:
>> kernel: INFO: Slab 0xffffea001601fc00 objects=3D37 used=3D3 =
fp=3D0xffff8805807f0360 flags=3D0x8000000000008100
>> kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: G    B           4.14.5 =
#1
>> kernel: Hardware name: Gigabyte Technology Co., Ltd. =
P67-DS3-B3/P67-DS3-B3, BIOS F1 05/06/2011
>> kernel: Call Trace:
>> kernel:  dump_stack+0x46/0x5e
>> kernel:  slab_err+0x9e/0xb0
>> kernel:  ? on_each_cpu_mask+0x35/0x40
>> kernel:  ? ksm_migrate_page+0x60/0x60
>> kernel:  ? __kmalloc+0x1c9/0x1d0
>> kernel:  __kmem_cache_shutdown+0x177/0x350
>> kernel:  shutdown_cache+0xf/0x130
>> kernel:  kmem_cache_destroy+0x19e/0x1b0
>> kernel:  SyS_delete_module+0x168/0x230
>> kernel:  ? exit_to_usermode_loop+0x39/0x80
>> kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
>> kernel: RIP: 0033:0x7f53e4136b97
>> kernel: RSP: 002b:00007ffd660061d8 EFLAGS: 00000206 ORIG_RAX: =
00000000000000b0
>> kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: =
00007f53e4136b97
>> kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: =
00000000006247f8
>> kernel: RBP: 0000000000000000 R08: 00007ffd66005171 R09: =
0000000000000000
>> kernel: R10: 00007f53e41acbc0 R11: 0000000000000206 R12: =
0000000000624790
>> kernel: R13: 00007ffd660051d0 R14: 0000000000624790 R15: =
0000000000623260
>> kernel: INFO: Object 0xffff8805807f0000 @offset=3D0
>> kernel: INFO: Object 0xffff8805807f01b0 @offset=3D432
>> kernel: INFO: Object 0xffff8805807f3cc0 @offset=3D15552
>> kernel: kmem_cache_destroy bfq_queue: Slab cache still has objects
>> kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: G    B           4.14.5 =
#1
>> kernel: Hardware name: Gigabyte Technology Co., Ltd. =
P67-DS3-B3/P67-DS3-B3, BIOS F1 05/06/2011
>> kernel: Call Trace:
>> kernel:  dump_stack+0x46/0x5e
>> kernel:  kmem_cache_destroy+0x191/0x1b0
>> kernel:  SyS_delete_module+0x168/0x230
>> kernel:  ? exit_to_usermode_loop+0x39/0x80
>> kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
>> kernel: RIP: 0033:0x7f53e4136b97
>> kernel: RSP: 002b:00007ffd660061d8 EFLAGS: 00000206 ORIG_RAX: =
00000000000000b0
>> kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: =
00007f53e4136b97
>> kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: =
00000000006247f8
>> kernel: RBP: 0000000000000000 R08: 00007ffd66005171 R09: =
0000000000000000
>> kernel: R10: 00007f53e41acbc0 R11: 0000000000000206 R12: =
0000000000624790
>> kernel: R13: 00007ffd660051d0 R14: 0000000000624790 R15: =
0000000000623260
>>=20
>=20
> I also encountered the similar bug, seems there are some async objects =
which are not freed with
> BFQ_GROUP_IOSCHED build as "Y".
>=20
> If revert part of commit e21b7a0b988772e82e7147e1c659a5afe2ae003c =
("block, bfq: add full hierarchical
> scheduling and cgroups support")like below, then I can't see the bug =
again with simple test.
>=20
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 889a8549d97f..c640e64e042f 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -4710,6 +4710,7 @@ static void bfq_exit_queue(struct elevator_queue =
*e)
>         spin_lock_irq(&bfqd->lock);
>         list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, bfqq_list)
>                 bfq_deactivate_bfqq(bfqd, bfqq, false, false);
> +       bfq_put_async_queues(bfqd, bfqd->root_group);
>         spin_unlock_irq(&bfqd->lock);
>=20
>         hrtimer_cancel(&bfqd->idle_slice_timer);
> @@ -4718,7 +4719,6 @@ static void bfq_exit_queue(struct elevator_queue =
*e)
>         blkcg_deactivate_policy(bfqd->queue, &blkcg_policy_bfq);
>  #else
>         spin_lock_irq(&bfqd->lock);
> -       bfq_put_async_queues(bfqd, bfqd->root_group);
>         kfree(bfqd->root_group);
>         spin_unlock_irq(&bfqd->lock);
>=20
> But perhaps we should do it inside blkcg_deactivate_policy, right?
>=20

Thank you very much for investigating this. And yes, I removed that =
bfq_put_async_queues invocation a long ago, because it had to be done in =
blkcg_deactivate_policy; and actually it was invoked as expected. We are =
trying to understand what is going wrong now.

Thanks,
Paolo

> Thanks,
> Guoqing

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bfq: BUG bfq_queue: Objects remaining in bfq_queue on __kmem_cache_shutdown() after rmmod
  2017-12-21  7:53   ` Paolo Valente
@ 2017-12-21  9:14     ` Guoqing Jiang
  2018-01-09 11:28       ` Paolo Valente
  0 siblings, 1 reply; 6+ messages in thread
From: Guoqing Jiang @ 2017-12-21  9:14 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Holger Hoffstätte, linux-block, DAVIDE FERRARI



On 12/21/2017 03:53 PM, Paolo Valente wrote:
>
>> Il giorno 21 dic 2017, alle ore 08:08, Guoqing Jiang <gqjiang@suse.com> ha scritto:
>>
>> Hi,
>>
>>
>> On 12/08/2017 08:34 AM, Holger Hoffstätte wrote:
>>> So plugging in a device on USB with BFQ as scheduler now works without
>>> hiccup (probably thanks to Ming Lei's last patch), but of course I found
>>> another problem. Unmounting the device after use, changing the scheduler
>>> back to deadline or kyber and rmmod'ing the BFQ module reproducibly gives me:
>>>
>>> kernel: =============================================================================
>>> kernel: BUG bfq_queue (Tainted: G    B          ): Objects remaining in bfq_queue on __kmem_cache_shutdown()
>>> kernel: -----------------------------------------------------------------------------
>>> kernel:
>>> kernel: INFO: Slab 0xffffea001601fc00 objects=37 used=3 fp=0xffff8805807f0360 flags=0x8000000000008100
>>> kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: G    B           4.14.5 #1
>>> kernel: Hardware name: Gigabyte Technology Co., Ltd. P67-DS3-B3/P67-DS3-B3, BIOS F1 05/06/2011
>>> kernel: Call Trace:
>>> kernel:  dump_stack+0x46/0x5e
>>> kernel:  slab_err+0x9e/0xb0
>>> kernel:  ? on_each_cpu_mask+0x35/0x40
>>> kernel:  ? ksm_migrate_page+0x60/0x60
>>> kernel:  ? __kmalloc+0x1c9/0x1d0
>>> kernel:  __kmem_cache_shutdown+0x177/0x350
>>> kernel:  shutdown_cache+0xf/0x130
>>> kernel:  kmem_cache_destroy+0x19e/0x1b0
>>> kernel:  SyS_delete_module+0x168/0x230
>>> kernel:  ? exit_to_usermode_loop+0x39/0x80
>>> kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
>>> kernel: RIP: 0033:0x7f53e4136b97
>>> kernel: RSP: 002b:00007ffd660061d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
>>> kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f53e4136b97
>>> kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: 00000000006247f8
>>> kernel: RBP: 0000000000000000 R08: 00007ffd66005171 R09: 0000000000000000
>>> kernel: R10: 00007f53e41acbc0 R11: 0000000000000206 R12: 0000000000624790
>>> kernel: R13: 00007ffd660051d0 R14: 0000000000624790 R15: 0000000000623260
>>> kernel: INFO: Object 0xffff8805807f0000 @offset=0
>>> kernel: INFO: Object 0xffff8805807f01b0 @offset=432
>>> kernel: INFO: Object 0xffff8805807f3cc0 @offset=15552
>>> kernel: kmem_cache_destroy bfq_queue: Slab cache still has objects
>>> kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: G    B           4.14.5 #1
>>> kernel: Hardware name: Gigabyte Technology Co., Ltd. P67-DS3-B3/P67-DS3-B3, BIOS F1 05/06/2011
>>> kernel: Call Trace:
>>> kernel:  dump_stack+0x46/0x5e
>>> kernel:  kmem_cache_destroy+0x191/0x1b0
>>> kernel:  SyS_delete_module+0x168/0x230
>>> kernel:  ? exit_to_usermode_loop+0x39/0x80
>>> kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
>>> kernel: RIP: 0033:0x7f53e4136b97
>>> kernel: RSP: 002b:00007ffd660061d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
>>> kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f53e4136b97
>>> kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: 00000000006247f8
>>> kernel: RBP: 0000000000000000 R08: 00007ffd66005171 R09: 0000000000000000
>>> kernel: R10: 00007f53e41acbc0 R11: 0000000000000206 R12: 0000000000624790
>>> kernel: R13: 00007ffd660051d0 R14: 0000000000624790 R15: 0000000000623260
>>>
>> I also encountered the similar bug, seems there are some async objects which are not freed with
>> BFQ_GROUP_IOSCHED build as "Y".
>>
>> If revert part of commit e21b7a0b988772e82e7147e1c659a5afe2ae003c ("block, bfq: add full hierarchical
>> scheduling and cgroups support")like below, then I can't see the bug again with simple test.
>>
>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>> index 889a8549d97f..c640e64e042f 100644
>> --- a/block/bfq-iosched.c
>> +++ b/block/bfq-iosched.c
>> @@ -4710,6 +4710,7 @@ static void bfq_exit_queue(struct elevator_queue *e)
>>          spin_lock_irq(&bfqd->lock);
>>          list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, bfqq_list)
>>                  bfq_deactivate_bfqq(bfqd, bfqq, false, false);
>> +       bfq_put_async_queues(bfqd, bfqd->root_group);
>>          spin_unlock_irq(&bfqd->lock);
>>
>>          hrtimer_cancel(&bfqd->idle_slice_timer);
>> @@ -4718,7 +4719,6 @@ static void bfq_exit_queue(struct elevator_queue *e)
>>          blkcg_deactivate_policy(bfqd->queue, &blkcg_policy_bfq);
>>   #else
>>          spin_lock_irq(&bfqd->lock);
>> -       bfq_put_async_queues(bfqd, bfqd->root_group);
>>          kfree(bfqd->root_group);
>>          spin_unlock_irq(&bfqd->lock);
>>
>> But perhaps we should do it inside blkcg_deactivate_policy, right?
>>
> Thank you very much for investigating this. And yes, I removed that bfq_put_async_queues invocation a long ago, because it had to be done in blkcg_deactivate_policy; and actually it was invoked as expected. We are trying to understand what is going wrong now.

Thanks for the in formations. I find another way, does it make sense?

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index da1525ec4c87..0a070daf96c7 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -474,7 +474,10 @@ static void bfq_pd_init(struct blkg_policy_data *pd)
  static void bfq_pd_free(struct blkg_policy_data *pd)
  {
         struct bfq_group *bfqg = pd_to_bfqg(pd);
+       struct bfq_data *bfqd = bfqg->bfqd;

+       if (bfqd)
+               bfq_put_async_queues(bfqd, bfqg);
         bfqg_stats_exit(&bfqg->stats);
         bfqg_put(bfqg);
  }

Regards,
Guoqing

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: bfq: BUG bfq_queue: Objects remaining in bfq_queue on __kmem_cache_shutdown() after rmmod
  2017-12-21  9:14     ` Guoqing Jiang
@ 2018-01-09 11:28       ` Paolo Valente
  0 siblings, 0 replies; 6+ messages in thread
From: Paolo Valente @ 2018-01-09 11:28 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Holger Hoffstätte, linux-block, DAVIDE FERRARI



> Il giorno 21 dic 2017, alle ore 10:14, Guoqing Jiang =
<gqjiang@suse.com> ha scritto:
>=20
>=20
>=20
> On 12/21/2017 03:53 PM, Paolo Valente wrote:
>>=20
>>> Il giorno 21 dic 2017, alle ore 08:08, Guoqing Jiang =
<gqjiang@suse.com> ha scritto:
>>>=20
>>> Hi,
>>>=20
>>>=20
>>> On 12/08/2017 08:34 AM, Holger Hoffst=C3=A4tte wrote:
>>>> So plugging in a device on USB with BFQ as scheduler now works =
without
>>>> hiccup (probably thanks to Ming Lei's last patch), but of course I =
found
>>>> another problem. Unmounting the device after use, changing the =
scheduler
>>>> back to deadline or kyber and rmmod'ing the BFQ module reproducibly =
gives me:
>>>>=20
>>>> kernel: =
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D
>>>> kernel: BUG bfq_queue (Tainted: G    B          ): Objects =
remaining in bfq_queue on __kmem_cache_shutdown()
>>>> kernel: =
--------------------------------------------------------------------------=
---
>>>> kernel:
>>>> kernel: INFO: Slab 0xffffea001601fc00 objects=3D37 used=3D3 =
fp=3D0xffff8805807f0360 flags=3D0x8000000000008100
>>>> kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: G    B           =
4.14.5 #1
>>>> kernel: Hardware name: Gigabyte Technology Co., Ltd. =
P67-DS3-B3/P67-DS3-B3, BIOS F1 05/06/2011
>>>> kernel: Call Trace:
>>>> kernel:  dump_stack+0x46/0x5e
>>>> kernel:  slab_err+0x9e/0xb0
>>>> kernel:  ? on_each_cpu_mask+0x35/0x40
>>>> kernel:  ? ksm_migrate_page+0x60/0x60
>>>> kernel:  ? __kmalloc+0x1c9/0x1d0
>>>> kernel:  __kmem_cache_shutdown+0x177/0x350
>>>> kernel:  shutdown_cache+0xf/0x130
>>>> kernel:  kmem_cache_destroy+0x19e/0x1b0
>>>> kernel:  SyS_delete_module+0x168/0x230
>>>> kernel:  ? exit_to_usermode_loop+0x39/0x80
>>>> kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
>>>> kernel: RIP: 0033:0x7f53e4136b97
>>>> kernel: RSP: 002b:00007ffd660061d8 EFLAGS: 00000206 ORIG_RAX: =
00000000000000b0
>>>> kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: =
00007f53e4136b97
>>>> kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: =
00000000006247f8
>>>> kernel: RBP: 0000000000000000 R08: 00007ffd66005171 R09: =
0000000000000000
>>>> kernel: R10: 00007f53e41acbc0 R11: 0000000000000206 R12: =
0000000000624790
>>>> kernel: R13: 00007ffd660051d0 R14: 0000000000624790 R15: =
0000000000623260
>>>> kernel: INFO: Object 0xffff8805807f0000 @offset=3D0
>>>> kernel: INFO: Object 0xffff8805807f01b0 @offset=3D432
>>>> kernel: INFO: Object 0xffff8805807f3cc0 @offset=3D15552
>>>> kernel: kmem_cache_destroy bfq_queue: Slab cache still has objects
>>>> kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: G    B           =
4.14.5 #1
>>>> kernel: Hardware name: Gigabyte Technology Co., Ltd. =
P67-DS3-B3/P67-DS3-B3, BIOS F1 05/06/2011
>>>> kernel: Call Trace:
>>>> kernel:  dump_stack+0x46/0x5e
>>>> kernel:  kmem_cache_destroy+0x191/0x1b0
>>>> kernel:  SyS_delete_module+0x168/0x230
>>>> kernel:  ? exit_to_usermode_loop+0x39/0x80
>>>> kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
>>>> kernel: RIP: 0033:0x7f53e4136b97
>>>> kernel: RSP: 002b:00007ffd660061d8 EFLAGS: 00000206 ORIG_RAX: =
00000000000000b0
>>>> kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: =
00007f53e4136b97
>>>> kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: =
00000000006247f8
>>>> kernel: RBP: 0000000000000000 R08: 00007ffd66005171 R09: =
0000000000000000
>>>> kernel: R10: 00007f53e41acbc0 R11: 0000000000000206 R12: =
0000000000624790
>>>> kernel: R13: 00007ffd660051d0 R14: 0000000000624790 R15: =
0000000000623260
>>>>=20
>>> I also encountered the similar bug, seems there are some async =
objects which are not freed with
>>> BFQ_GROUP_IOSCHED build as "Y".
>>>=20
>>> If revert part of commit e21b7a0b988772e82e7147e1c659a5afe2ae003c =
("block, bfq: add full hierarchical
>>> scheduling and cgroups support")like below, then I can't see the bug =
again with simple test.
>>>=20
>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>> index 889a8549d97f..c640e64e042f 100644
>>> --- a/block/bfq-iosched.c
>>> +++ b/block/bfq-iosched.c
>>> @@ -4710,6 +4710,7 @@ static void bfq_exit_queue(struct =
elevator_queue *e)
>>>         spin_lock_irq(&bfqd->lock);
>>>         list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, =
bfqq_list)
>>>                 bfq_deactivate_bfqq(bfqd, bfqq, false, false);
>>> +       bfq_put_async_queues(bfqd, bfqd->root_group);
>>>         spin_unlock_irq(&bfqd->lock);
>>>=20
>>>         hrtimer_cancel(&bfqd->idle_slice_timer);
>>> @@ -4718,7 +4719,6 @@ static void bfq_exit_queue(struct =
elevator_queue *e)
>>>         blkcg_deactivate_policy(bfqd->queue, &blkcg_policy_bfq);
>>>  #else
>>>         spin_lock_irq(&bfqd->lock);
>>> -       bfq_put_async_queues(bfqd, bfqd->root_group);
>>>         kfree(bfqd->root_group);
>>>         spin_unlock_irq(&bfqd->lock);
>>>=20
>>> But perhaps we should do it inside blkcg_deactivate_policy, right?
>>>=20
>> Thank you very much for investigating this. And yes, I removed that =
bfq_put_async_queues invocation a long ago, because it had to be done in =
blkcg_deactivate_policy; and actually it was invoked as expected. We are =
trying to understand what is going wrong now.
>=20
> Thanks for the in formations. I find another way, does it make sense?
>=20
> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
> index da1525ec4c87..0a070daf96c7 100644
> --- a/block/bfq-cgroup.c
> +++ b/block/bfq-cgroup.c
> @@ -474,7 +474,10 @@ static void bfq_pd_init(struct blkg_policy_data =
*pd)
>  static void bfq_pd_free(struct blkg_policy_data *pd)
>  {
>         struct bfq_group *bfqg =3D pd_to_bfqg(pd);
> +       struct bfq_data *bfqd =3D bfqg->bfqd;
>=20
> +       if (bfqd)
> +               bfq_put_async_queues(bfqd, bfqg);
>         bfqg_stats_exit(&bfqg->stats);
>         bfqg_put(bfqg);
>  }
>=20

Yes, thanks.  Actually, the mistake was a little bit more articulate.  I =
have just
submitted a hopefully clean fix:
https://www.spinics.net/lists/kernel/msg2692962.html

Any feedback is more than welcome.

Thanks again,
Paolo

> Regards,
> Guoqing

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-01-09 11:29 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-08  0:34 bfq: BUG bfq_queue: Objects remaining in bfq_queue on __kmem_cache_shutdown() after rmmod Holger Hoffstätte
2017-12-20  6:24 ` Paolo Valente
2017-12-21  7:08 ` Guoqing Jiang
2017-12-21  7:53   ` Paolo Valente
2017-12-21  9:14     ` Guoqing Jiang
2018-01-09 11:28       ` Paolo Valente

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.