linux-kernel.vger.kernel.org archive mirror
* INFO: task hung in wb_shutdown (2)
From: syzbot @ 2018-04-01 17:10 UTC (permalink / raw)
  To: axboe, christophe.jaillet, jack, linux-kernel, linux-mm,
	syzkaller-bugs, zhangweiping

Hello,

syzbot hit the following crash on upstream commit
3eb2ce825ea1ad89d20f7a3b5780df850e4be274 (Sun Mar 25 22:44:30 2018 +0000)
Linux 4.16-rc7
syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=c0cf869505e03bdf1a24

So far this crash happened 179 times on upstream.
Unfortunately, I don't have any reproducer for this crash yet.
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=4738516814659584
Kernel config: https://syzkaller.appspot.com/x/.config?id=-8440362230543204781
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+c0cf869505e03bdf1a24@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.

unregister_netdevice: waiting for lo to become free. Usage count = 1
unregister_netdevice: waiting for lo to become free. Usage count = 1
unregister_netdevice: waiting for lo to become free. Usage count = 1
unregister_netdevice: waiting for lo to become free. Usage count = 1
unregister_netdevice: waiting for lo to become free. Usage count = 1
INFO: task kworker/0:5:16458 blocked for more than 120 seconds.
       Not tainted 4.16.0-rc7+ #368
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/0:5     D20928 16458      2 0x80000000
Workqueue: events cgwb_release_workfn
Call Trace:
  context_switch kernel/sched/core.c:2862 [inline]
  __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
  schedule+0xf5/0x430 kernel/sched/core.c:3499
  bit_wait+0x18/0x90 kernel/sched/wait_bit.c:250
  __wait_on_bit+0x88/0x130 kernel/sched/wait_bit.c:51
  out_of_line_wait_on_bit+0x204/0x3a0 kernel/sched/wait_bit.c:64
  wait_on_bit include/linux/wait_bit.h:84 [inline]
  wb_shutdown+0x335/0x430 mm/backing-dev.c:377
  cgwb_release_workfn+0x8b/0x61d mm/backing-dev.c:520
  process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
  worker_thread+0x223/0x1990 kernel/workqueue.c:2247
  kthread+0x33c/0x400 kernel/kthread.c:238
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406

Showing all locks held in the system:
3 locks held by kworker/u4:1/21:
  #0:  ((wq_completion)"%s""netns"){+.+.}, at: [<00000000c0d07d52>] work_static include/linux/workqueue.h:198 [inline]
  #0:  ((wq_completion)"%s""netns"){+.+.}, at: [<00000000c0d07d52>] set_work_data kernel/workqueue.c:619 [inline]
  #0:  ((wq_completion)"%s""netns"){+.+.}, at: [<00000000c0d07d52>] set_work_pool_and_clear_pending kernel/workqueue.c:646 [inline]
  #0:  ((wq_completion)"%s""netns"){+.+.}, at: [<00000000c0d07d52>] process_one_work+0xb12/0x1bb0 kernel/workqueue.c:2084
  #1:  (net_cleanup_work){+.+.}, at: [<000000006c4c2cfd>] process_one_work+0xb89/0x1bb0 kernel/workqueue.c:2088
  #2:  (net_mutex){+.+.}, at: [<0000000058427774>] cleanup_net+0x242/0xcb0 net/core/net_namespace.c:484
2 locks held by khungtaskd/869:
  #0:  (rcu_read_lock){....}, at: [<000000009066e5de>] check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
  #0:  (rcu_read_lock){....}, at: [<000000009066e5de>] watchdog+0x1c5/0xd60 kernel/hung_task.c:249
  #1:  (tasklist_lock){.+.+}, at: [<000000002117cbd8>] debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
2 locks held by getty/4407:
  #0:  (&tty->ldisc_sem){++++}, at: [<0000000015dafb41>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000841085d3>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4408:
  #0:  (&tty->ldisc_sem){++++}, at: [<0000000015dafb41>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000841085d3>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4409:
  #0:  (&tty->ldisc_sem){++++}, at: [<0000000015dafb41>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000841085d3>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4410:
  #0:  (&tty->ldisc_sem){++++}, at: [<0000000015dafb41>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000841085d3>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4411:
  #0:  (&tty->ldisc_sem){++++}, at: [<0000000015dafb41>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000841085d3>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4412:
  #0:  (&tty->ldisc_sem){++++}, at: [<0000000015dafb41>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000841085d3>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4413:
  #0:  (&tty->ldisc_sem){++++}, at: [<0000000015dafb41>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<00000000841085d3>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by kworker/0:5/16458:
  #0:  ((wq_completion)"events"){+.+.}, at: [<00000000c0d07d52>] work_static include/linux/workqueue.h:198 [inline]
  #0:  ((wq_completion)"events"){+.+.}, at: [<00000000c0d07d52>] set_work_data kernel/workqueue.c:619 [inline]
  #0:  ((wq_completion)"events"){+.+.}, at: [<00000000c0d07d52>] set_work_pool_and_clear_pending kernel/workqueue.c:646 [inline]
  #0:  ((wq_completion)"events"){+.+.}, at: [<00000000c0d07d52>] process_one_work+0xb12/0x1bb0 kernel/workqueue.c:2084
  #1:  ((work_completion)(&wb->release_work)){+.+.}, at: [<000000006c4c2cfd>] process_one_work+0xb89/0x1bb0 kernel/workqueue.c:2088
1 lock held by syz-executor6/23709:
  #0:  (net_mutex){+.+.}, at: [<00000000843d65a3>] copy_net_ns+0x1f5/0x580 net/core/net_namespace.c:417

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 869 Comm: khungtaskd Not tainted 4.16.0-rc7+ #368
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x24d lib/dump_stack.c:53
  nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
  nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
  trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
  check_hung_task kernel/hung_task.c:132 [inline]
  check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
  watchdog+0x90c/0xd60 kernel/hung_task.c:249
  kthread+0x33c/0x400 kernel/kthread.c:238
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug report.
Note: all commands must start from beginning of the line in the email body.


* Re: INFO: task hung in wb_shutdown (2)
From: Tetsuo Handa @ 2018-04-24 12:19 UTC (permalink / raw)
  To: Jens Axboe, Jan Kara, Tejun Heo
  Cc: syzbot, christophe.jaillet, LKML, linux-mm, syzkaller-bugs,
	weiping zhang

From 39ed6be8a2c12dfe54feaa5abbc2ec46103022bf Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Tue, 24 Apr 2018 11:59:08 +0900
Subject: [PATCH] bdi: wake up concurrent wb_shutdown() callers.

syzbot is reporting hung tasks at wait_on_bit(WB_shutting_down) in
wb_shutdown() [1]. This might be because commit 5318ce7d46866e1d ("bdi:
Shutdown writeback on all cgwbs in cgwb_bdi_destroy()") forgot to call
wake_up_bit(WB_shutting_down) after clear_bit(WB_shutting_down).

[1] https://syzkaller.appspot.com/bug?id=b297474817af98d5796bc544e1bb806fc3da0e5e

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+c0cf869505e03bdf1a24@syzkaller.appspotmail.com>
Fixes: 5318ce7d46866e1d ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()")
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
---
 mm/backing-dev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 023190c..dadac99 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -384,6 +384,8 @@ static void wb_shutdown(struct bdi_writeback *wb)
 	 */
 	smp_wmb();
 	clear_bit(WB_shutting_down, &wb->state);
+	smp_mb(); /* advised by wake_up_bit() */
+	wake_up_bit(&wb->state, WB_shutting_down);
 }
 
 static void wb_exit(struct bdi_writeback *wb)
-- 
1.8.3.1
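
The lost wake-up described in the changelog above can be sketched in userspace. This is only an illustration under stated assumptions, not kernel code: a pthread condition variable stands in for the bit wait queue, a 200 ms timeout stands in for the hung task detector, and `struct bit_waitq`, `SHUTTING_DOWN`, `waiter_fn()` and `run_shutdown()` are hypothetical names. The waiter rechecks the bit only after an explicit wake-up, which is roughly how a task sleeping in wait_on_bit() behaves.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define SHUTTING_DOWN 0x1UL	/* hypothetical stand-in for WB_shutting_down */

/* Hypothetical userspace stand-in for a kernel bit wait queue. */
struct bit_waitq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	atomic_ulong state;
	unsigned long wakeups;	/* bumped by every explicit wake-up */
	bool asleep;		/* set once the waiter has committed to sleeping */
	bool woken;		/* result reported by the waiter */
};

/* Sleep until explicitly woken with the bit clear, or give up after 200 ms. */
static void *waiter_fn(void *arg)
{
	struct bit_waitq *q = arg;
	struct timespec deadline;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += 200L * 1000000L;
	deadline.tv_sec += deadline.tv_nsec / 1000000000L;
	deadline.tv_nsec %= 1000000000L;

	pthread_mutex_lock(&q->lock);
	q->woken = true;
	while (q->woken && (atomic_load(&q->state) & SHUTTING_DOWN)) {
		unsigned long seen = q->wakeups;

		q->asleep = true;
		/* Like a sleeping task, recheck the bit only after a real wake-up. */
		while (q->woken && q->wakeups == seen)
			if (pthread_cond_timedwait(&q->cond, &q->lock, &deadline) == ETIMEDOUT)
				q->woken = false;	/* the "hung task" case */
	}
	pthread_mutex_unlock(&q->lock);
	return NULL;
}

/* Returns true if the waiter was woken, false if it slept until its timeout. */
static bool run_shutdown(bool call_wake)
{
	struct bit_waitq q = { .wakeups = 0 };
	struct timespec tick = { .tv_sec = 0, .tv_nsec = 1000000L };
	pthread_t waiter;
	bool asleep = false;
	int rc;

	pthread_mutex_init(&q.lock, NULL);
	pthread_cond_init(&q.cond, NULL);
	atomic_init(&q.state, SHUTTING_DOWN);
	rc = pthread_create(&waiter, NULL, waiter_fn, &q);
	assert(rc == 0);
	(void)rc;

	/* Once 'asleep' is seen under the lock, the waiter is blocked in timedwait. */
	while (!asleep) {
		nanosleep(&tick, NULL);
		pthread_mutex_lock(&q.lock);
		asleep = q.asleep;
		pthread_mutex_unlock(&q.lock);
	}

	/* Shutdown finished: clear the bit atomically, outside the waiter's lock. */
	atomic_fetch_and(&q.state, ~SHUTTING_DOWN);
	if (call_wake) {
		pthread_mutex_lock(&q.lock);
		q.wakeups++;
		pthread_cond_broadcast(&q.cond);
		pthread_mutex_unlock(&q.lock);
	}

	pthread_join(waiter, NULL);
	pthread_cond_destroy(&q.cond);
	pthread_mutex_destroy(&q.lock);
	return q.woken;
}
```

Under these assumptions, run_shutdown(true) should report that the waiter was woken, while run_shutdown(false) should report that it slept until the timeout even though the bit was already clear. The second case is the analogue of the hang in the report.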


* Re: INFO: task hung in wb_shutdown (2)
From: Tetsuo Handa @ 2018-05-01 10:27 UTC (permalink / raw)
  To: Jens Axboe, Jan Kara, Tejun Heo
  Cc: syzbot, christophe.jaillet, LKML, linux-mm, syzkaller-bugs,
	weiping zhang, Linus Torvalds, Andrew Morton, Dmitry Vyukov,
	linux-block

Tejun, Jan, Jens,

Can you review this patch? syzbot has hit this bug nearly 4000 times but
is still unable to find a reproducer. Therefore, the only way to test would be
to apply this patch upstream and test whether the problem is solved.

On 2018/04/24 21:19, Tetsuo Handa wrote:
> From 39ed6be8a2c12dfe54feaa5abbc2ec46103022bf Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Tue, 24 Apr 2018 11:59:08 +0900
> Subject: [PATCH] bdi: wake up concurrent wb_shutdown() callers.
> 
> syzbot is reporting hung tasks at wait_on_bit(WB_shutting_down) in
> wb_shutdown() [1]. This might be because commit 5318ce7d46866e1d ("bdi:
> Shutdown writeback on all cgwbs in cgwb_bdi_destroy()") forgot to call
> wake_up_bit(WB_shutting_down) after clear_bit(WB_shutting_down).
> 
> [1] https://syzkaller.appspot.com/bug?id=b297474817af98d5796bc544e1bb806fc3da0e5e
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Reported-by: syzbot <syzbot+c0cf869505e03bdf1a24@syzkaller.appspotmail.com>
> Fixes: 5318ce7d46866e1d ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()")
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Jens Axboe <axboe@fb.com>
> ---
>  mm/backing-dev.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 023190c..dadac99 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -384,6 +384,8 @@ static void wb_shutdown(struct bdi_writeback *wb)
>  	 */
>  	smp_wmb();
>  	clear_bit(WB_shutting_down, &wb->state);
> +	smp_mb(); /* advised by wake_up_bit() */
> +	wake_up_bit(&wb->state, WB_shutting_down);
>  }
>  
>  static void wb_exit(struct bdi_writeback *wb)
> 


* Re: INFO: task hung in wb_shutdown (2)
From: Linus Torvalds @ 2018-05-01 16:06 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jens Axboe, Jan Kara, Tejun Heo, syzbot+c0cf869505e03bdf1a24,
	christophe.jaillet, Linux Kernel Mailing List, linux-mm,
	syzkaller-bugs, zhangweiping, Andrew Morton, Dmitry Vyukov,
	linux-block

On Tue, May 1, 2018 at 3:27 AM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:

> Can you review this patch? syzbot has hit this bug nearly 4000 times but
> is still unable to find a reproducer. Therefore, the only way to test would be
> to apply this patch upstream and test whether the problem is solved.

Looks ok to me, except:

> >       smp_wmb();
> >       clear_bit(WB_shutting_down, &wb->state);
> > +     smp_mb(); /* advised by wake_up_bit() */
> > +     wake_up_bit(&wb->state, WB_shutting_down);

This whole sequence really should just be a pattern with a helper function.

And honestly, the pattern probably *should* be

     clear_bit_unlock(bit, &mem);
     smp_mb__after_atomic();
     wake_up_bit(&mem, bit);

which looks like it is a bit cleaner wrt memory ordering rules.

             Linus
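
As a rough userspace analogue of the three-step pattern above, C11 atomics can express the first two steps. The mapping is an assumption made for illustration: the kernel memory model is not the C11 one, and `clear_bit_then_fence()` is a hypothetical name, not a kernel or libc function. The release ordering on the clear publishes earlier stores to whoever later observes the bit clear, and the full fence orders the clear before the waker looks at the wait queue, which matters because wake_up_bit() may skip the wake-up when it sees no waiters queued.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper: clear one bit and fence before a wake-up call.
 * Returns true if the bit was set before the call.
 */
static bool clear_bit_then_fence(unsigned int bit, atomic_ulong *word)
{
	unsigned long mask = 1UL << bit;
	unsigned long old;

	/* Rough analogue of clear_bit_unlock(): atomic clear with release ordering. */
	old = atomic_fetch_and_explicit(word, ~mask, memory_order_release);

	/* Rough analogue of smp_mb__after_atomic(): full fence after the atomic op. */
	atomic_thread_fence(memory_order_seq_cst);

	/* A real waker would call its wake-up primitive at this point. */
	return (old & mask) != 0;
}
```

With a word holding 0x5, clearing bit 0 leaves 0x4 and reports that the bit was previously set; a second call on the same bit reports that it was already clear and leaves the other bits untouched.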


* Re: INFO: task hung in wb_shutdown (2)
From: Jens Axboe @ 2018-05-01 16:12 UTC (permalink / raw)
  To: Tetsuo Handa, Jan Kara, Tejun Heo
  Cc: syzbot, christophe.jaillet, LKML, linux-mm, syzkaller-bugs,
	weiping zhang, Linus Torvalds, Andrew Morton, Dmitry Vyukov,
	linux-block

On 5/1/18 4:27 AM, Tetsuo Handa wrote:
> Tejun, Jan, Jens,
> 
> Can you review this patch? syzbot has hit this bug nearly 4000 times but
> is still unable to find a reproducer. Therefore, the only way to test would be
> to apply this patch upstream and test whether the problem is solved.

I'll review it today.

-- 
Jens Axboe


* Re: INFO: task hung in wb_shutdown (2)
From: Jens Axboe @ 2018-05-01 21:30 UTC (permalink / raw)
  To: Linus Torvalds, Tetsuo Handa
  Cc: Jan Kara, Tejun Heo, syzbot+c0cf869505e03bdf1a24,
	christophe.jaillet, Linux Kernel Mailing List, linux-mm,
	syzkaller-bugs, zhangweiping, Andrew Morton, Dmitry Vyukov,
	linux-block

On 5/1/18 10:06 AM, Linus Torvalds wrote:
> On Tue, May 1, 2018 at 3:27 AM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:
> 
>> Can you review this patch? syzbot has hit this bug nearly 4000 times but
>> is still unable to find a reproducer. Therefore, the only way to test would be
>> to apply this patch upstream and test whether the problem is solved.
> 
> Looks ok to me, except:
> 
>>>       smp_wmb();
>>>       clear_bit(WB_shutting_down, &wb->state);
>>> +     smp_mb(); /* advised by wake_up_bit() */
>>> +     wake_up_bit(&wb->state, WB_shutting_down);
> 
> This whole sequence really should just be a pattern with a helper function.
> 
> And honestly, the pattern probably *should* be
> 
>      clear_bit_unlock(bit, &mem);
>      smp_mb__after_atomic();
>      wake_up_bit(&mem, bit);
> 
> which looks like it is a bit cleaner wrt memory ordering rules.

Agreed, that construct looks saner than introducing a "random"
smp_mb(). A pattern helper should probably be introduced
after the fact.

-- 
Jens Axboe


* Re: INFO: task hung in wb_shutdown (2)
From: Tetsuo Handa @ 2018-05-01 22:14 UTC (permalink / raw)
  To: axboe, torvalds
  Cc: jack, tj, syzbot+c0cf869505e03bdf1a24, christophe.jaillet,
	linux-kernel, linux-mm, syzkaller-bugs, zhangweiping, akpm,
	dvyukov, linux-block

From 1b90d7f71d60e743c69cdff3ba41edd1f9f86f93 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed, 2 May 2018 07:07:55 +0900
Subject: [PATCH v2] bdi: wake up concurrent wb_shutdown() callers.

syzbot is reporting hung tasks at wait_on_bit(WB_shutting_down) in
wb_shutdown() [1]. This seems to be because commit 5318ce7d46866e1d ("bdi:
Shutdown writeback on all cgwbs in cgwb_bdi_destroy()") forgot to call
wake_up_bit(WB_shutting_down) after clear_bit(WB_shutting_down).

Introduce a helper function clear_and_wake_up_bit() and use it, in order
to avoid similar errors in future.

[1] https://syzkaller.appspot.com/bug?id=b297474817af98d5796bc544e1bb806fc3da0e5e

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+c0cf869505e03bdf1a24@syzkaller.appspotmail.com>
Fixes: 5318ce7d46866e1d ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()")
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/wait_bit.h | 17 +++++++++++++++++
 mm/backing-dev.c         |  2 +-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/wait_bit.h b/include/linux/wait_bit.h
index 9318b21..2b0072f 100644
--- a/include/linux/wait_bit.h
+++ b/include/linux/wait_bit.h
@@ -305,4 +305,21 @@ struct wait_bit_queue_entry {
 	__ret;								\
 })
 
+/**
+ * clear_and_wake_up_bit - clear a bit and wake up anyone waiting on that bit
+ *
+ * @bit: the bit of the word being waited on
+ * @word: the word being waited on, a kernel virtual address
+ *
+ * You can use this helper if bitflags are manipulated atomically rather than
+ * non-atomically under a lock.
+ */
+static inline void clear_and_wake_up_bit(int bit, void *word)
+{
+	clear_bit_unlock(bit, word);
+	/* See wake_up_bit() for which memory barrier you need to use. */
+	smp_mb__after_atomic();
+	wake_up_bit(word, bit);
+}
+
 #endif /* _LINUX_WAIT_BIT_H */
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 023190c..fa5e6d7 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -383,7 +383,7 @@ static void wb_shutdown(struct bdi_writeback *wb)
 	 * the barrier provided by test_and_clear_bit() above.
 	 */
 	smp_wmb();
-	clear_bit(WB_shutting_down, &wb->state);
+	clear_and_wake_up_bit(WB_shutting_down, &wb->state);
 }
 
 static void wb_exit(struct bdi_writeback *wb)
-- 
1.8.3.1


* Re: INFO: task hung in wb_shutdown (2)
From: Jan Kara @ 2018-05-03 15:13 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: axboe, torvalds, jack, tj, syzbot+c0cf869505e03bdf1a24,
	christophe.jaillet, linux-kernel, linux-mm, syzkaller-bugs,
	zhangweiping, akpm, dvyukov, linux-block

On Wed 02-05-18 07:14:51, Tetsuo Handa wrote:
> From 1b90d7f71d60e743c69cdff3ba41edd1f9f86f93 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Wed, 2 May 2018 07:07:55 +0900
> Subject: [PATCH v2] bdi: wake up concurrent wb_shutdown() callers.
> 
> syzbot is reporting hung tasks at wait_on_bit(WB_shutting_down) in
> wb_shutdown() [1]. This seems to be because commit 5318ce7d46866e1d ("bdi:
> Shutdown writeback on all cgwbs in cgwb_bdi_destroy()") forgot to call
> wake_up_bit(WB_shutting_down) after clear_bit(WB_shutting_down).
> 
> Introduce a helper function clear_and_wake_up_bit() and use it, in order
> to avoid similar errors in future.
> 
> [1] https://syzkaller.appspot.com/bug?id=b297474817af98d5796bc544e1bb806fc3da0e5e
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Reported-by: syzbot <syzbot+c0cf869505e03bdf1a24@syzkaller.appspotmail.com>
> Fixes: 5318ce7d46866e1d ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()")
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Jens Axboe <axboe@fb.com>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>

Thanks for debugging this and for the fix, Tetsuo! The patch looks good to
me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  include/linux/wait_bit.h | 17 +++++++++++++++++
>  mm/backing-dev.c         |  2 +-
>  2 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/wait_bit.h b/include/linux/wait_bit.h
> index 9318b21..2b0072f 100644
> --- a/include/linux/wait_bit.h
> +++ b/include/linux/wait_bit.h
> @@ -305,4 +305,21 @@ struct wait_bit_queue_entry {
>  	__ret;								\
>  })
>  
> +/**
> + * clear_and_wake_up_bit - clear a bit and wake up anyone waiting on that bit
> + *
> + * @bit: the bit of the word being waited on
> + * @word: the word being waited on, a kernel virtual address
> + *
> + * You can use this helper if bitflags are manipulated atomically rather than
> + * non-atomically under a lock.
> + */
> +static inline void clear_and_wake_up_bit(int bit, void *word)
> +{
> +	clear_bit_unlock(bit, word);
> +	/* See wake_up_bit() for which memory barrier you need to use. */
> +	smp_mb__after_atomic();
> +	wake_up_bit(word, bit);
> +}
> +
>  #endif /* _LINUX_WAIT_BIT_H */
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 023190c..fa5e6d7 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -383,7 +383,7 @@ static void wb_shutdown(struct bdi_writeback *wb)
>  	 * the barrier provided by test_and_clear_bit() above.
>  	 */
>  	smp_wmb();
> -	clear_bit(WB_shutting_down, &wb->state);
> +	clear_and_wake_up_bit(WB_shutting_down, &wb->state);
>  }
>  
>  static void wb_exit(struct bdi_writeback *wb)
> -- 
> 1.8.3.1
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: INFO: task hung in wb_shutdown (2)
From: Jens Axboe @ 2018-05-03 15:25 UTC (permalink / raw)
  To: Tetsuo Handa, torvalds
  Cc: jack, tj, syzbot+c0cf869505e03bdf1a24, christophe.jaillet,
	linux-kernel, linux-mm, syzkaller-bugs, zhangweiping, akpm,
	dvyukov, linux-block

On 5/1/18 4:14 PM, Tetsuo Handa wrote:
> From 1b90d7f71d60e743c69cdff3ba41edd1f9f86f93 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Wed, 2 May 2018 07:07:55 +0900
> Subject: [PATCH v2] bdi: wake up concurrent wb_shutdown() callers.
> 
> syzbot is reporting hung tasks at wait_on_bit(WB_shutting_down) in
> wb_shutdown() [1]. This seems to be because commit 5318ce7d46866e1d ("bdi:
> Shutdown writeback on all cgwbs in cgwb_bdi_destroy()") forgot to call
> wake_up_bit(WB_shutting_down) after clear_bit(WB_shutting_down).
> 
> Introduce a helper function clear_and_wake_up_bit() and use it, in order
> to avoid similar errors in future.

Queued up, thanks Tetsuo!

-- 
Jens Axboe
