* Sleeping while atomic regression around hd_struct_free() in 5.9-rc
@ 2020-08-28 10:32 Ilya Dryomov
2020-08-28 13:05 ` Ming Lei
0 siblings, 1 reply; 3+ messages in thread
From: Ilya Dryomov @ 2020-08-28 10:32 UTC (permalink / raw)
To: Ming Lei; +Cc: Yufen Yu, Hou Tao, Christoph Hellwig, Jens Axboe, linux-block
Hi Ming,
There seems to be a sleeping while atomic bug in hd_struct_free():
288 static void hd_struct_free(struct percpu_ref *ref)
289 {
290 struct hd_struct *part = container_of(ref, struct hd_struct, ref);
291 struct gendisk *disk = part_to_disk(part);
292 struct disk_part_tbl *ptbl =
293 rcu_dereference_protected(disk->part_tbl, 1);
294
295 rcu_assign_pointer(ptbl->last_lookup, NULL);
296 put_device(disk_to_dev(disk));
297
298 INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
299 queue_rcu_work(system_wq, &part->rcu_work);
300 }
hd_struct_free() is a percpu_ref release callback and must not sleep.
But put_device() can end up in disk_release(), resulting in anything
from "sleeping function called from invalid context" splats to actual
lockups if the queue ends up being released:
BUG: scheduling while atomic: ksoftirqd/3/26/0x00000102
INFO: lockdep is turned off.
CPU: 3 PID: 26 Comm: ksoftirqd/3 Tainted: G W
5.9.0-rc2-ceph-g2de49bea2ebc #1
Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
Call Trace:
dump_stack+0x96/0xd0
__schedule_bug.cold+0x64/0x71
__schedule+0x8ea/0xac0
? wait_for_completion+0x86/0x110
schedule+0x5f/0xd0
schedule_timeout+0x212/0x2a0
? wait_for_completion+0x86/0x110
wait_for_completion+0xb0/0x110
__wait_rcu_gp+0x139/0x150
synchronize_rcu+0x79/0xf0
? invoke_rcu_core+0xb0/0xb0
? rcu_read_lock_any_held+0xb0/0xb0
blk_free_flush_queue+0x17/0x30
blk_mq_hw_sysfs_release+0x32/0x70
kobject_put+0x7d/0x1d0
blk_mq_release+0xbe/0xf0
blk_release_queue+0xb7/0x140
kobject_put+0x7d/0x1d0
disk_release+0xb0/0xc0
device_release+0x25/0x80
kobject_put+0x7d/0x1d0
hd_struct_free+0x4c/0xc0
percpu_ref_switch_to_atomic_rcu+0x1df/0x1f0
rcu_core+0x3fd/0x660
? rcu_core+0x3cc/0x660
__do_softirq+0xd5/0x45e
? smpboot_thread_fn+0x26/0x1d0
run_ksoftirqd+0x30/0x60
smpboot_thread_fn+0xfe/0x1d0
? sort_range+0x20/0x20
kthread+0x11a/0x150
? kthread_delayed_work_timer_fn+0xa0/0xa0
ret_from_fork+0x1f/0x30
"git blame" points at your commit tb7d6c3033323 ("block: fix
use-after-free on cached last_lookup partition"), but there is
likely more to it because it went into 5.8 and I haven't seen
these lockups until we rebased to 5.9-rc.
Could you please take a look?
Thanks,
Ilya
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Sleeping while atomic regression around hd_struct_free() in 5.9-rc
2020-08-28 10:32 Sleeping while atomic regression around hd_struct_free() in 5.9-rc Ilya Dryomov
@ 2020-08-28 13:05 ` Ming Lei
2020-08-30 8:46 ` Ilya Dryomov
0 siblings, 1 reply; 3+ messages in thread
From: Ming Lei @ 2020-08-28 13:05 UTC (permalink / raw)
To: Ilya Dryomov
Cc: Yufen Yu, Hou Tao, Christoph Hellwig, Jens Axboe, linux-block
Hi Ilya,
Thanks for your report!
On Fri, Aug 28, 2020 at 12:32:48PM +0200, Ilya Dryomov wrote:
> Hi Ming,
>
> There seems to be a sleeping while atomic bug in hd_struct_free():
>
> 288 static void hd_struct_free(struct percpu_ref *ref)
> 289 {
> 290 struct hd_struct *part = container_of(ref, struct hd_struct, ref);
> 291 struct gendisk *disk = part_to_disk(part);
> 292 struct disk_part_tbl *ptbl =
> 293 rcu_dereference_protected(disk->part_tbl, 1);
> 294
> 295 rcu_assign_pointer(ptbl->last_lookup, NULL);
> 296 put_device(disk_to_dev(disk));
> 297
> 298 INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
> 299 queue_rcu_work(system_wq, &part->rcu_work);
> 300 }
>
> hd_struct_free() is a percpu_ref release callback and must not sleep.
> But put_device() can end up in disk_release(), resulting in anything
> from "sleeping function called from invalid context" splats to actual
> lockups if the queue ends up being released:
>
> BUG: scheduling while atomic: ksoftirqd/3/26/0x00000102
> INFO: lockdep is turned off.
> CPU: 3 PID: 26 Comm: ksoftirqd/3 Tainted: G W
> 5.9.0-rc2-ceph-g2de49bea2ebc #1
> Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
> Call Trace:
> dump_stack+0x96/0xd0
> __schedule_bug.cold+0x64/0x71
> __schedule+0x8ea/0xac0
> ? wait_for_completion+0x86/0x110
> schedule+0x5f/0xd0
> schedule_timeout+0x212/0x2a0
> ? wait_for_completion+0x86/0x110
> wait_for_completion+0xb0/0x110
> __wait_rcu_gp+0x139/0x150
> synchronize_rcu+0x79/0xf0
> ? invoke_rcu_core+0xb0/0xb0
> ? rcu_read_lock_any_held+0xb0/0xb0
> blk_free_flush_queue+0x17/0x30
> blk_mq_hw_sysfs_release+0x32/0x70
> kobject_put+0x7d/0x1d0
> blk_mq_release+0xbe/0xf0
> blk_release_queue+0xb7/0x140
> kobject_put+0x7d/0x1d0
> disk_release+0xb0/0xc0
> device_release+0x25/0x80
> kobject_put+0x7d/0x1d0
> hd_struct_free+0x4c/0xc0
> percpu_ref_switch_to_atomic_rcu+0x1df/0x1f0
> rcu_core+0x3fd/0x660
> ? rcu_core+0x3cc/0x660
> __do_softirq+0xd5/0x45e
> ? smpboot_thread_fn+0x26/0x1d0
> run_ksoftirqd+0x30/0x60
> smpboot_thread_fn+0xfe/0x1d0
> ? sort_range+0x20/0x20
> kthread+0x11a/0x150
> ? kthread_delayed_work_timer_fn+0xa0/0xa0
> ret_from_fork+0x1f/0x30
>
> "git blame" points at your commit tb7d6c3033323 ("block: fix
> use-after-free on cached last_lookup partition"), but there is
> likely more to it because it went into 5.8 and I haven't seen
> these lockups until we rebased to 5.9-rc.
The pull-the-trigger commit is actually e8c7d14ac6c3 ("block: revert back to
synchronous request_queue removal").
>
> Could you please take a look?
Can you try the following patch?
diff --git a/block/partitions/core.c b/block/partitions/core.c
index e62a98a8eeb7..b06fc3425802 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -278,6 +278,15 @@ static void hd_struct_free_work(struct work_struct *work)
{
struct hd_struct *part =
container_of(to_rcu_work(work), struct hd_struct, rcu_work);
+ struct gendisk *disk = part_to_disk(part);
+
+ /*
+ * Release the reference grabbed in delete_partition, and it should
+ * have been done in hd_struct_free(), however device's release
+ * handler can't be done in percpu_ref's ->release() callback
+ * because it is run via call_rcu().
+ */
+ put_device(disk_to_dev(disk));
part->start_sect = 0;
part->nr_sects = 0;
@@ -293,7 +302,6 @@ static void hd_struct_free(struct percpu_ref *ref)
rcu_dereference_protected(disk->part_tbl, 1);
rcu_assign_pointer(ptbl->last_lookup, NULL);
- put_device(disk_to_dev(disk));
INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
queue_rcu_work(system_wq, &part->rcu_work);
Thanks,
Ming
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: Sleeping while atomic regression around hd_struct_free() in 5.9-rc
2020-08-28 13:05 ` Ming Lei
@ 2020-08-30 8:46 ` Ilya Dryomov
0 siblings, 0 replies; 3+ messages in thread
From: Ilya Dryomov @ 2020-08-30 8:46 UTC (permalink / raw)
To: Ming Lei; +Cc: Yufen Yu, Hou Tao, Christoph Hellwig, Jens Axboe, linux-block
On Fri, Aug 28, 2020 at 3:05 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> Hi Ilya,
>
> Thanks for your report!
>
> On Fri, Aug 28, 2020 at 12:32:48PM +0200, Ilya Dryomov wrote:
> > Hi Ming,
> >
> > There seems to be a sleeping while atomic bug in hd_struct_free():
> >
> > 288 static void hd_struct_free(struct percpu_ref *ref)
> > 289 {
> > 290 struct hd_struct *part = container_of(ref, struct hd_struct, ref);
> > 291 struct gendisk *disk = part_to_disk(part);
> > 292 struct disk_part_tbl *ptbl =
> > 293 rcu_dereference_protected(disk->part_tbl, 1);
> > 294
> > 295 rcu_assign_pointer(ptbl->last_lookup, NULL);
> > 296 put_device(disk_to_dev(disk));
> > 297
> > 298 INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
> > 299 queue_rcu_work(system_wq, &part->rcu_work);
> > 300 }
> >
> > hd_struct_free() is a percpu_ref release callback and must not sleep.
> > But put_device() can end up in disk_release(), resulting in anything
> > from "sleeping function called from invalid context" splats to actual
> > lockups if the queue ends up being released:
> >
> > BUG: scheduling while atomic: ksoftirqd/3/26/0x00000102
> > INFO: lockdep is turned off.
> > CPU: 3 PID: 26 Comm: ksoftirqd/3 Tainted: G W
> > 5.9.0-rc2-ceph-g2de49bea2ebc #1
> > Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
> > Call Trace:
> > dump_stack+0x96/0xd0
> > __schedule_bug.cold+0x64/0x71
> > __schedule+0x8ea/0xac0
> > ? wait_for_completion+0x86/0x110
> > schedule+0x5f/0xd0
> > schedule_timeout+0x212/0x2a0
> > ? wait_for_completion+0x86/0x110
> > wait_for_completion+0xb0/0x110
> > __wait_rcu_gp+0x139/0x150
> > synchronize_rcu+0x79/0xf0
> > ? invoke_rcu_core+0xb0/0xb0
> > ? rcu_read_lock_any_held+0xb0/0xb0
> > blk_free_flush_queue+0x17/0x30
> > blk_mq_hw_sysfs_release+0x32/0x70
> > kobject_put+0x7d/0x1d0
> > blk_mq_release+0xbe/0xf0
> > blk_release_queue+0xb7/0x140
> > kobject_put+0x7d/0x1d0
> > disk_release+0xb0/0xc0
> > device_release+0x25/0x80
> > kobject_put+0x7d/0x1d0
> > hd_struct_free+0x4c/0xc0
> > percpu_ref_switch_to_atomic_rcu+0x1df/0x1f0
> > rcu_core+0x3fd/0x660
> > ? rcu_core+0x3cc/0x660
> > __do_softirq+0xd5/0x45e
> > ? smpboot_thread_fn+0x26/0x1d0
> > run_ksoftirqd+0x30/0x60
> > smpboot_thread_fn+0xfe/0x1d0
> > ? sort_range+0x20/0x20
> > kthread+0x11a/0x150
> > ? kthread_delayed_work_timer_fn+0xa0/0xa0
> > ret_from_fork+0x1f/0x30
> >
> > "git blame" points at your commit tb7d6c3033323 ("block: fix
> > use-after-free on cached last_lookup partition"), but there is
> > likely more to it because it went into 5.8 and I haven't seen
> > these lockups until we rebased to 5.9-rc.
>
> The pull-the-trigger commit is actually e8c7d14ac6c3 ("block: revert back to
> synchronous request_queue removal").
>
> >
> > Could you please take a look?
>
> Can you try the following patch?
>
> diff --git a/block/partitions/core.c b/block/partitions/core.c
> index e62a98a8eeb7..b06fc3425802 100644
> --- a/block/partitions/core.c
> +++ b/block/partitions/core.c
> @@ -278,6 +278,15 @@ static void hd_struct_free_work(struct work_struct *work)
> {
> struct hd_struct *part =
> container_of(to_rcu_work(work), struct hd_struct, rcu_work);
> + struct gendisk *disk = part_to_disk(part);
> +
> + /*
> + * Release the reference grabbed in delete_partition, and it should
> + * have been done in hd_struct_free(), however device's release
> + * handler can't be done in percpu_ref's ->release() callback
> + * because it is run via call_rcu().
> + */
> + put_device(disk_to_dev(disk));
>
> part->start_sect = 0;
> part->nr_sects = 0;
> @@ -293,7 +302,6 @@ static void hd_struct_free(struct percpu_ref *ref)
> rcu_dereference_protected(disk->part_tbl, 1);
>
> rcu_assign_pointer(ptbl->last_lookup, NULL);
> - put_device(disk_to_dev(disk));
>
> INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
> queue_rcu_work(system_wq, &part->rcu_work);
This patch fixes it for me.
Thanks,
Ilya
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2020-08-30 8:46 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-28 10:32 Sleeping while atomic regression around hd_struct_free() in 5.9-rc Ilya Dryomov
2020-08-28 13:05 ` Ming Lei
2020-08-30 8:46 ` Ilya Dryomov
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.