* Sleeping while atomic regression around hd_struct_free() in 5.9-rc
@ 2020-08-28 10:32 Ilya Dryomov
From: Ilya Dryomov @ 2020-08-28 10:32 UTC (permalink / raw)
  To: Ming Lei; +Cc: Yufen Yu, Hou Tao, Christoph Hellwig, Jens Axboe, linux-block

Hi Ming,

There seems to be a sleeping while atomic bug in hd_struct_free():

288 static void hd_struct_free(struct percpu_ref *ref)
289 {
290         struct hd_struct *part = container_of(ref, struct hd_struct, ref);
291         struct gendisk *disk = part_to_disk(part);
292         struct disk_part_tbl *ptbl =
293                 rcu_dereference_protected(disk->part_tbl, 1);
294
295         rcu_assign_pointer(ptbl->last_lookup, NULL);
296         put_device(disk_to_dev(disk));
297
298         INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
299         queue_rcu_work(system_wq, &part->rcu_work);
300 }

hd_struct_free() is a percpu_ref release callback and must not sleep.
But put_device() can end up in disk_release(), resulting in anything
from "sleeping function called from invalid context" splats to actual
lockups if the queue ends up being released:

  BUG: scheduling while atomic: ksoftirqd/3/26/0x00000102
  INFO: lockdep is turned off.
  CPU: 3 PID: 26 Comm: ksoftirqd/3 Tainted: G        W         5.9.0-rc2-ceph-g2de49bea2ebc #1
  Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
  Call Trace:
   dump_stack+0x96/0xd0
   __schedule_bug.cold+0x64/0x71
   __schedule+0x8ea/0xac0
   ? wait_for_completion+0x86/0x110
   schedule+0x5f/0xd0
   schedule_timeout+0x212/0x2a0
   ? wait_for_completion+0x86/0x110
   wait_for_completion+0xb0/0x110
   __wait_rcu_gp+0x139/0x150
   synchronize_rcu+0x79/0xf0
   ? invoke_rcu_core+0xb0/0xb0
   ? rcu_read_lock_any_held+0xb0/0xb0
   blk_free_flush_queue+0x17/0x30
   blk_mq_hw_sysfs_release+0x32/0x70
   kobject_put+0x7d/0x1d0
   blk_mq_release+0xbe/0xf0
   blk_release_queue+0xb7/0x140
   kobject_put+0x7d/0x1d0
   disk_release+0xb0/0xc0
   device_release+0x25/0x80
   kobject_put+0x7d/0x1d0
   hd_struct_free+0x4c/0xc0
   percpu_ref_switch_to_atomic_rcu+0x1df/0x1f0
   rcu_core+0x3fd/0x660
   ? rcu_core+0x3cc/0x660
   __do_softirq+0xd5/0x45e
   ? smpboot_thread_fn+0x26/0x1d0
   run_ksoftirqd+0x30/0x60
   smpboot_thread_fn+0xfe/0x1d0
   ? sort_range+0x20/0x20
   kthread+0x11a/0x150
   ? kthread_delayed_work_timer_fn+0xa0/0xa0
   ret_from_fork+0x1f/0x30
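
percpu_ref release callbacks are invoked from an RCU callback, i.e. in
softirq context, which is why they must not sleep.  A minimal sketch of
the contract (illustrative only, not actual kernel source; "my_release"
and "my_init" are made-up names):

  #include <linux/percpu-refcount.h>

  static void my_release(struct percpu_ref *ref)
  {
          /*
           * Invoked via call_rcu() in softirq context: anything that
           * may sleep (mutexes, synchronize_rcu(), a put_device()
           * whose ->release() blocks) is off limits here.
           */
  }

  static int my_init(struct percpu_ref *ref)
  {
          /* Registers my_release; runs in process context. */
          return percpu_ref_init(ref, my_release, 0, GFP_KERNEL);
  }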

"git blame" points at your commit tb7d6c3033323 ("block: fix
use-after-free on cached last_lookup partition"), but there is
likely more to it because it went into 5.8 and I haven't seen
these lockups until we rebased to 5.9-rc.

Could you please take a look?

Thanks,

                Ilya


* Re: Sleeping while atomic regression around hd_struct_free() in 5.9-rc
@ 2020-08-28 13:05 ` Ming Lei
From: Ming Lei @ 2020-08-28 13:05 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: Yufen Yu, Hou Tao, Christoph Hellwig, Jens Axboe, linux-block

Hi Ilya,

Thanks for your report!

On Fri, Aug 28, 2020 at 12:32:48PM +0200, Ilya Dryomov wrote:
> Hi Ming,
> 
> There seems to be a sleeping while atomic bug in hd_struct_free():
> 
> 288 static void hd_struct_free(struct percpu_ref *ref)
> 289 {
> 290         struct hd_struct *part = container_of(ref, struct hd_struct, ref);
> 291         struct gendisk *disk = part_to_disk(part);
> 292         struct disk_part_tbl *ptbl =
> 293                 rcu_dereference_protected(disk->part_tbl, 1);
> 294
> 295         rcu_assign_pointer(ptbl->last_lookup, NULL);
> 296         put_device(disk_to_dev(disk));
> 297
> 298         INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
> 299         queue_rcu_work(system_wq, &part->rcu_work);
> 300 }
> 
> hd_struct_free() is a percpu_ref release callback and must not sleep.
> But put_device() can end up in disk_release(), resulting in anything
> from "sleeping function called from invalid context" splats to actual
> lockups if the queue ends up being released:
> 
>   BUG: scheduling while atomic: ksoftirqd/3/26/0x00000102
>   INFO: lockdep is turned off.
>   CPU: 3 PID: 26 Comm: ksoftirqd/3 Tainted: G        W         5.9.0-rc2-ceph-g2de49bea2ebc #1
>   Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
>   Call Trace:
>    dump_stack+0x96/0xd0
>    __schedule_bug.cold+0x64/0x71
>    __schedule+0x8ea/0xac0
>    ? wait_for_completion+0x86/0x110
>    schedule+0x5f/0xd0
>    schedule_timeout+0x212/0x2a0
>    ? wait_for_completion+0x86/0x110
>    wait_for_completion+0xb0/0x110
>    __wait_rcu_gp+0x139/0x150
>    synchronize_rcu+0x79/0xf0
>    ? invoke_rcu_core+0xb0/0xb0
>    ? rcu_read_lock_any_held+0xb0/0xb0
>    blk_free_flush_queue+0x17/0x30
>    blk_mq_hw_sysfs_release+0x32/0x70
>    kobject_put+0x7d/0x1d0
>    blk_mq_release+0xbe/0xf0
>    blk_release_queue+0xb7/0x140
>    kobject_put+0x7d/0x1d0
>    disk_release+0xb0/0xc0
>    device_release+0x25/0x80
>    kobject_put+0x7d/0x1d0
>    hd_struct_free+0x4c/0xc0
>    percpu_ref_switch_to_atomic_rcu+0x1df/0x1f0
>    rcu_core+0x3fd/0x660
>    ? rcu_core+0x3cc/0x660
>    __do_softirq+0xd5/0x45e
>    ? smpboot_thread_fn+0x26/0x1d0
>    run_ksoftirqd+0x30/0x60
>    smpboot_thread_fn+0xfe/0x1d0
>    ? sort_range+0x20/0x20
>    kthread+0x11a/0x150
>    ? kthread_delayed_work_timer_fn+0xa0/0xa0
>    ret_from_fork+0x1f/0x30
> 
> "git blame" points at your commit tb7d6c3033323 ("block: fix
> use-after-free on cached last_lookup partition"), but there is
> likely more to it because it went into 5.8 and I haven't seen
> these lockups until we rebased to 5.9-rc.

The commit that actually pulls the trigger is e8c7d14ac6c3 ("block:
revert back to synchronous request_queue removal"), which went into
5.9: with it, the final put_device() tears down the request queue
synchronously in whatever context the last reference is dropped,
instead of deferring the teardown to a workqueue.

> 
> Could you please take a look?

Can you try the following patch?

diff --git a/block/partitions/core.c b/block/partitions/core.c
index e62a98a8eeb7..b06fc3425802 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -278,6 +278,15 @@ static void hd_struct_free_work(struct work_struct *work)
 {
 	struct hd_struct *part =
 		container_of(to_rcu_work(work), struct hd_struct, rcu_work);
+	struct gendisk *disk = part_to_disk(part);
+
+	/*
+	 * Release the reference grabbed in delete_partition(). Ideally
+	 * this would be done in hd_struct_free(), but the device's
+	 * release handler may sleep, and percpu_ref's ->release()
+	 * callback runs in atomic context via call_rcu().
+	 */
+	put_device(disk_to_dev(disk));
 
 	part->start_sect = 0;
 	part->nr_sects = 0;
@@ -293,7 +302,6 @@ static void hd_struct_free(struct percpu_ref *ref)
 		rcu_dereference_protected(disk->part_tbl, 1);
 
 	rcu_assign_pointer(ptbl->last_lookup, NULL);
-	put_device(disk_to_dev(disk));
 
 	INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
 	queue_rcu_work(system_wq, &part->rcu_work);



Thanks,
Ming



* Re: Sleeping while atomic regression around hd_struct_free() in 5.9-rc
@ 2020-08-30  8:46   ` Ilya Dryomov
From: Ilya Dryomov @ 2020-08-30  8:46 UTC (permalink / raw)
  To: Ming Lei; +Cc: Yufen Yu, Hou Tao, Christoph Hellwig, Jens Axboe, linux-block

On Fri, Aug 28, 2020 at 3:05 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> Hi Ilya,
>
> Thanks for your report!
>
> On Fri, Aug 28, 2020 at 12:32:48PM +0200, Ilya Dryomov wrote:
> > Hi Ming,
> >
> > There seems to be a sleeping while atomic bug in hd_struct_free():
> >
> > 288 static void hd_struct_free(struct percpu_ref *ref)
> > 289 {
> > 290         struct hd_struct *part = container_of(ref, struct hd_struct, ref);
> > 291         struct gendisk *disk = part_to_disk(part);
> > 292         struct disk_part_tbl *ptbl =
> > 293                 rcu_dereference_protected(disk->part_tbl, 1);
> > 294
> > 295         rcu_assign_pointer(ptbl->last_lookup, NULL);
> > 296         put_device(disk_to_dev(disk));
> > 297
> > 298         INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
> > 299         queue_rcu_work(system_wq, &part->rcu_work);
> > 300 }
> >
> > hd_struct_free() is a percpu_ref release callback and must not sleep.
> > But put_device() can end up in disk_release(), resulting in anything
> > from "sleeping function called from invalid context" splats to actual
> > lockups if the queue ends up being released:
> >
> >   BUG: scheduling while atomic: ksoftirqd/3/26/0x00000102
> >   INFO: lockdep is turned off.
> >   CPU: 3 PID: 26 Comm: ksoftirqd/3 Tainted: G        W         5.9.0-rc2-ceph-g2de49bea2ebc #1
> >   Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
> >   Call Trace:
> >    dump_stack+0x96/0xd0
> >    __schedule_bug.cold+0x64/0x71
> >    __schedule+0x8ea/0xac0
> >    ? wait_for_completion+0x86/0x110
> >    schedule+0x5f/0xd0
> >    schedule_timeout+0x212/0x2a0
> >    ? wait_for_completion+0x86/0x110
> >    wait_for_completion+0xb0/0x110
> >    __wait_rcu_gp+0x139/0x150
> >    synchronize_rcu+0x79/0xf0
> >    ? invoke_rcu_core+0xb0/0xb0
> >    ? rcu_read_lock_any_held+0xb0/0xb0
> >    blk_free_flush_queue+0x17/0x30
> >    blk_mq_hw_sysfs_release+0x32/0x70
> >    kobject_put+0x7d/0x1d0
> >    blk_mq_release+0xbe/0xf0
> >    blk_release_queue+0xb7/0x140
> >    kobject_put+0x7d/0x1d0
> >    disk_release+0xb0/0xc0
> >    device_release+0x25/0x80
> >    kobject_put+0x7d/0x1d0
> >    hd_struct_free+0x4c/0xc0
> >    percpu_ref_switch_to_atomic_rcu+0x1df/0x1f0
> >    rcu_core+0x3fd/0x660
> >    ? rcu_core+0x3cc/0x660
> >    __do_softirq+0xd5/0x45e
> >    ? smpboot_thread_fn+0x26/0x1d0
> >    run_ksoftirqd+0x30/0x60
> >    smpboot_thread_fn+0xfe/0x1d0
> >    ? sort_range+0x20/0x20
> >    kthread+0x11a/0x150
> >    ? kthread_delayed_work_timer_fn+0xa0/0xa0
> >    ret_from_fork+0x1f/0x30
> >
> > "git blame" points at your commit tb7d6c3033323 ("block: fix
> > use-after-free on cached last_lookup partition"), but there is
> > likely more to it because it went into 5.8 and I haven't seen
> > these lockups until we rebased to 5.9-rc.
>
> The commit that actually pulls the trigger is e8c7d14ac6c3 ("block:
> revert back to synchronous request_queue removal"), which went into
> 5.9: with it, the final put_device() tears down the request queue
> synchronously in whatever context the last reference is dropped,
> instead of deferring the teardown to a workqueue.
>
> >
> > Could you please take a look?
>
> Can you try the following patch?
>
> diff --git a/block/partitions/core.c b/block/partitions/core.c
> index e62a98a8eeb7..b06fc3425802 100644
> --- a/block/partitions/core.c
> +++ b/block/partitions/core.c
> @@ -278,6 +278,15 @@ static void hd_struct_free_work(struct work_struct *work)
>  {
>         struct hd_struct *part =
>                 container_of(to_rcu_work(work), struct hd_struct, rcu_work);
> +       struct gendisk *disk = part_to_disk(part);
> +
> +       /*
> +        * Release the reference grabbed in delete_partition(). Ideally
> +        * this would be done in hd_struct_free(), but the device's
> +        * release handler may sleep, and percpu_ref's ->release()
> +        * callback runs in atomic context via call_rcu().
> +        */
> +       put_device(disk_to_dev(disk));
>
>         part->start_sect = 0;
>         part->nr_sects = 0;
> @@ -293,7 +302,6 @@ static void hd_struct_free(struct percpu_ref *ref)
>                 rcu_dereference_protected(disk->part_tbl, 1);
>
>         rcu_assign_pointer(ptbl->last_lookup, NULL);
> -       put_device(disk_to_dev(disk));
>
>         INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
>         queue_rcu_work(system_wq, &part->rcu_work);

This patch fixes it for me.

Thanks,

                Ilya
