* general protection fault in wb_workfn
@ 2018-04-19 16:05 syzbot
  2018-04-23 10:09 ` Tetsuo Handa
  0 siblings, 1 reply; 5+ messages in thread
From: syzbot @ 2018-04-19 16:05 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, syzkaller-bugs, viro

Hello,

syzbot hit the following crash on upstream commit
3eb2ce825ea1ad89d20f7a3b5780df850e4be274 (Sun Mar 25 22:44:30 2018 +0000)
Linux 4.16-rc7
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=9873874c735f2892e7e9

So far this crash happened 29 times on upstream.
Unfortunately, I don't have any reproducer for this crash yet.
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=6617474409693184
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-8440362230543204781
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+9873874c735f2892e7e9@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.
If you forward the report, please keep this part and the footer.

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
    (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 28 Comm: kworker/u4:2 Not tainted 4.16.0-rc7+ #368
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: writeback wb_workfn
RIP: 0010:dev_name include/linux/device.h:981 [inline]
RIP: 0010:wb_workfn+0x1a2/0x16b0 fs/fs-writeback.c:1936
RSP: 0018:ffff8801d951f038 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff81bf6ea5
RDX: 000000000000000a RSI: ffffffff87b44840 RDI: 0000000000000050
RBP: ffff8801d951f558 R08: 1ffff1003b2a3def R09: 0000000000000004
R10: ffff8801d951f438 R11: 0000000000000004 R12: 0000000000000100
R13: ffff8801baee0dc0 R14: ffff8801d951f530 R15: ffff8801baee10d8
FS:  0000000000000000(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000047ff80 CR3: 0000000007a22006 CR4: 00000000001626f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
  process_scheduled_works kernel/workqueue.c:2173 [inline]
  worker_thread+0xa4b/0x1990 kernel/workqueue.c:2252
  kthread+0x33c/0x400 kernel/kthread.c:238
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
Code: fa 48 c1 ea 03 80 3c 02 00 0f 85 d9 14 00 00 48 8b 9b d0 05 00 00 48  
b8 00 00 00 00 00 fc ff df 48 8d 7b 50 48 89 fa 48 c1 ea 03 <80> 3c 02 00  
0f 85 a9 14 00 00 4c 8b 63 50 4d 85 e4 0f 84 49 0e
RIP: dev_name include/linux/device.h:981 [inline] RSP: ffff8801d951f038
RIP: wb_workfn+0x1a2/0x16b0 fs/fs-writeback.c:1936 RSP: ffff8801d951f038
---[ end trace 303e9927650f10c6 ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
    (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is
merged into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.
Note: all commands must start from beginning of the line in the email body.


* Re: general protection fault in wb_workfn
  2018-04-19 16:05 general protection fault in wb_workfn syzbot
@ 2018-04-23 10:09 ` Tetsuo Handa
  2018-04-23 21:43   ` Tetsuo Handa
  2018-05-03 16:03     ` Jan Kara
  0 siblings, 2 replies; 5+ messages in thread
From: Tetsuo Handa @ 2018-04-23 10:09 UTC (permalink / raw)
  To: syzbot, linux-kernel, syzkaller-bugs, Tejun Heo, Jan Kara,
	Jens Axboe, linux-block
  Cc: linux-fsdevel, viro

On 2018/04/20 1:05, syzbot wrote:
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [#1] SMP KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 28 Comm: kworker/u4:2 Not tainted 4.16.0-rc7+ #368
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Workqueue: writeback wb_workfn
> RIP: 0010:dev_name include/linux/device.h:981 [inline]
> RIP: 0010:wb_workfn+0x1a2/0x16b0 fs/fs-writeback.c:1936
> RSP: 0018:ffff8801d951f038 EFLAGS: 00010206
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff81bf6ea5
> RDX: 000000000000000a RSI: ffffffff87b44840 RDI: 0000000000000050
> RBP: ffff8801d951f558 R08: 1ffff1003b2a3def R09: 0000000000000004
> R10: ffff8801d951f438 R11: 0000000000000004 R12: 0000000000000100
> R13: ffff8801baee0dc0 R14: ffff8801d951f530 R15: ffff8801baee10d8
> FS:  0000000000000000(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000047ff80 CR3: 0000000007a22006 CR4: 00000000001626f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
>  process_scheduled_works kernel/workqueue.c:2173 [inline]
>  worker_thread+0xa4b/0x1990 kernel/workqueue.c:2252
>  kthread+0x33c/0x400 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406

This report says that wb->bdi->dev == NULL

  static inline const char *dev_name(const struct device *dev)
  {
    /* Use the init name until the kobject becomes available */
    if (dev->init_name)
      return dev->init_name;
  
    return kobject_name(&dev->kobj);
  }

  void wb_workfn(struct work_struct *work)
  {
  (...snipped...)
     set_worker_desc("flush-%s", dev_name(wb->bdi->dev));
  (...snipped...)
  }

immediately after ioctl(LOOP_CTL_REMOVE) was requested. It is plausible
because ioctl(LOOP_CTL_REMOVE) sets bdi->dev to NULL after returning from
wb_shutdown().
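
The register dump is consistent with a NULL dev. RBX is 0, RDI is 0x50
(the "48 8d 7b 50" bytes just before the faulting instruction decode as
lea 0x50(%rbx),%rdi), and RDX is RDI >> 3. Below is a minimal sketch of that
arithmetic. It assumes the inline KASAN check computes
shadow = (addr >> 3) + offset with the offset held in RAX, and that virtual
addresses are 48 bits wide. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: RAX in the dump holds the KASAN shadow offset. */
#define SHADOW_OFFSET      0xdffffc0000000000ULL
/* Assumption: one shadow byte covers 8 bytes of memory (the "shr $3"). */
#define SHADOW_SCALE_SHIFT 3

/* Hypothetical helper: address of the shadow byte checked for addr. */
static uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* Hypothetical helper: with 48-bit addressing, bits 63..47 must all be equal. */
static int is_canonical(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}

/* Print the shadow lookup for a field at field_offset inside struct base. */
static void explain_fault(uint64_t base, uint64_t field_offset)
{
	uint64_t field = base + field_offset;
	uint64_t shadow = shadow_addr(field);

	assert(field >= base); /* reject offsets that wrap around */
	printf("field address  : 0x%llx\n", (unsigned long long)field);
	printf("shadow address : 0x%llx (%s)\n", (unsigned long long)shadow,
	       is_canonical(shadow) ? "canonical" : "non-canonical");
}
```

Calling explain_fault(0, 0x50) gives a shadow address of 0xdffffc000000000a,
which is RAX + RDX from the dump. That address is non-canonical, so the
access raises a general protection fault rather than a page fault.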

loop_control_ioctl(LOOP_CTL_REMOVE) {
  loop_remove(lo) {
    del_gendisk(lo->lo_disk) {
      bdi_unregister(disk->queue->backing_dev_info) {
        bdi_remove_from_list(bdi);
        wb_shutdown(&bdi->wb);
        cgwb_bdi_unregister(bdi);
        if (bdi->dev) {
          bdi_debug_unregister(bdi);
          device_unregister(bdi->dev);
          bdi->dev = NULL;
        }
      }
    }
  }
}

For some reason, wb_shutdown() is not waiting for wb_workfn() to complete
(or something queues the work again after the WB_registered bit was cleared)?
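
The second possibility can be pictured with a single-threaded toy model. This
is only a sketch under stated assumptions, not the kernel code and not a
proposed patch. It assumes that flushing waits only for the instance of the
work that was queued when the flush started, and every name in it is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a writeback context; all names are hypothetical. */
struct toy_wb {
	bool registered;  /* models the WB_registered bit */
	bool queued;      /* models the work item being queued */
	bool dev_present; /* models bdi->dev != NULL */
	int rounds_left;  /* how many more times the work wants to run */
};

/* Models the work function: run one round, then possibly requeue itself. */
static void toy_workfn(struct toy_wb *wb, bool gate_requeue)
{
	assert(wb->dev_present); /* the real function would call dev_name() here */
	wb->queued = false;
	if (wb->rounds_left > 0) {
		wb->rounds_left--;
		/* Ungated: requeue even though shutdown already started. */
		if (!gate_requeue || wb->registered)
			wb->queued = true;
	}
}

/* Models unregistration: clear the bit, flush once, then drop the device. */
static void toy_unregister(struct toy_wb *wb, bool gate_requeue)
{
	wb->registered = false;
	if (wb->queued)
		toy_workfn(wb, gate_requeue); /* flush: waits for this run only */
	wb->dev_present = false;
}

/* Returns true if work is still queued after the device has gone away. */
static bool toy_work_survives_unregister(bool gate_requeue)
{
	struct toy_wb wb = { true, true, true, 1 };

	toy_unregister(&wb, gate_requeue);
	return wb.queued;
}
```

In this model toy_work_survives_unregister(false) returns true: the requeued
work would run later with dev_present already false. With the requeue gated
on the registered flag, toy_work_survives_unregister(true) returns false.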

Anyway, I think that this is a block layer problem rather than an fs layer problem.



By the way, I have a newbie question regarding commit 5318ce7d46866e1d ("bdi:
Shutdown writeback on all cgwbs in cgwb_bdi_destroy()"). It uses clear_bit()
to clear the WB_shutting_down bit so that threads waiting in wait_on_bit() will
wake up. But clear_bit() itself does not wake up threads, does it? Who wakes
them up (e.g. by calling wake_up_bit()) after clear_bit() was called?
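
To make the concern concrete, here is a toy model of a bit wait queue. It is
a sketch with hypothetical names rather than the kernel implementation. It
only illustrates that clearing a flag changes the word and nothing else, so a
sleeper stays parked until something explicitly wakes it.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SHUTTING_DOWN 0 /* hypothetical bit number */
#define TOY_NR_BITS 32

/* Toy wait queue: a flags word plus a count of parked sleepers. */
struct toy_bit_wq {
	unsigned long flags;
	int sleepers;
};

static bool toy_test_bit(const struct toy_bit_wq *wq, int bit)
{
	assert(bit >= 0 && bit < TOY_NR_BITS);
	return (wq->flags >> bit) & 1UL;
}

/* Models wait_on_bit(): park if the bit is set; returns true if parked. */
static bool toy_wait_on_bit(struct toy_bit_wq *wq, int bit)
{
	if (!toy_test_bit(wq, bit))
		return false;
	wq->sleepers++;
	return true;
}

/* Models clear_bit(): only the word changes, sleepers are not touched. */
static void toy_clear_bit(struct toy_bit_wq *wq, int bit)
{
	assert(bit >= 0 && bit < TOY_NR_BITS);
	wq->flags &= ~(1UL << bit);
}

/* Models wake_up_bit(): sleepers recheck the bit and leave if it is clear. */
static void toy_wake_up_bit(struct toy_bit_wq *wq, int bit)
{
	if (!toy_test_bit(wq, bit))
		wq->sleepers = 0;
}
```

After toy_wait_on_bit() followed by toy_clear_bit(), the sleeper count is
still 1. It drops to 0 only once toy_wake_up_bit() is called, which is the
step the question asks about.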


* Re: general protection fault in wb_workfn
  2018-04-23 10:09 ` Tetsuo Handa
@ 2018-04-23 21:43   ` Tetsuo Handa
  2018-05-03 16:03     ` Jan Kara
  1 sibling, 0 replies; 5+ messages in thread
From: Tetsuo Handa @ 2018-04-23 21:43 UTC (permalink / raw)
  To: Tejun Heo, Jan Kara, Jens Axboe, linux-block
  Cc: syzbot, linux-kernel, syzkaller-bugs, linux-fsdevel, viro

On 2018/04/23 19:09, Tetsuo Handa wrote:
> By the way, I got a newbie question regarding commit 5318ce7d46866e1d ("bdi:
> Shutdown writeback on all cgwbs in cgwb_bdi_destroy()"). It uses clear_bit()
> to clear WB_shutting_down bit so that threads waiting at wait_on_bit() will
> wake up. But clear_bit() itself does not wake up threads, does it? Who wakes
> them up (e.g. by calling wake_up_bit()) after clear_bit() was called?
> 

The report below might be waiting for wake_up_bit()?

  INFO: task hung in wb_shutdown (2)
  https://syzkaller.appspot.com/bug?id=b297474817af98d5796bc544e1bb806fc3da0e5e


* Re: general protection fault in wb_workfn
  2018-04-23 10:09 ` Tetsuo Handa
@ 2018-05-03 16:03     ` Jan Kara
  2018-05-03 16:03     ` Jan Kara
  1 sibling, 0 replies; 5+ messages in thread
From: Jan Kara @ 2018-05-03 16:03 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: syzbot, linux-kernel, syzkaller-bugs, Tejun Heo, Jan Kara,
	Jens Axboe, linux-block, linux-fsdevel, viro

On Mon 23-04-18 19:09:51, Tetsuo Handa wrote:
> On 2018/04/20 1:05, syzbot wrote:
> > kasan: CONFIG_KASAN_INLINE enabled
> > kasan: GPF could be caused by NULL-ptr deref or user memory access
> > general protection fault: 0000 [#1] SMP KASAN
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Modules linked in:
> > CPU: 0 PID: 28 Comm: kworker/u4:2 Not tainted 4.16.0-rc7+ #368
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > Workqueue: writeback wb_workfn
> > RIP: 0010:dev_name include/linux/device.h:981 [inline]
> > RIP: 0010:wb_workfn+0x1a2/0x16b0 fs/fs-writeback.c:1936
> > RSP: 0018:ffff8801d951f038 EFLAGS: 00010206
> > RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff81bf6ea5
> > RDX: 000000000000000a RSI: ffffffff87b44840 RDI: 0000000000000050
> > RBP: ffff8801d951f558 R08: 1ffff1003b2a3def R09: 0000000000000004
> > R10: ffff8801d951f438 R11: 0000000000000004 R12: 0000000000000100
> > R13: ffff8801baee0dc0 R14: ffff8801d951f530 R15: ffff8801baee10d8
> > FS:  0000000000000000(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 000000000047ff80 CR3: 0000000007a22006 CR4: 00000000001626f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> >  process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
> >  process_scheduled_works kernel/workqueue.c:2173 [inline]
> >  worker_thread+0xa4b/0x1990 kernel/workqueue.c:2252
> >  kthread+0x33c/0x400 kernel/kthread.c:238
> >  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
> 
> This report says that wb->bdi->dev == NULL
> 
>   static inline const char *dev_name(const struct device *dev)
>   {
>     /* Use the init name until the kobject becomes available */
>     if (dev->init_name)
>       return dev->init_name;
>   
>     return kobject_name(&dev->kobj);
>   }
> 
>   void wb_workfn(struct work_struct *work)
>   {
>   (...snipped...)
>      set_worker_desc("flush-%s", dev_name(wb->bdi->dev));
>   (...snipped...)
>   }
> 
> immediately after ioctl(LOOP_CTL_REMOVE) was requested. It is plausible
> because ioctl(LOOP_CTL_REMOVE) sets bdi->dev to NULL after returning from
> wb_shutdown().
> 
> loop_control_ioctl(LOOP_CTL_REMOVE) {
>   loop_remove(lo) {
>     del_gendisk(lo->lo_disk) {
>       bdi_unregister(disk->queue->backing_dev_info) {
>         bdi_remove_from_list(bdi);
>         wb_shutdown(&bdi->wb);
>         cgwb_bdi_unregister(bdi);
>         if (bdi->dev) {
>           bdi_debug_unregister(bdi);
>           device_unregister(bdi->dev);
>           bdi->dev = NULL;
>         }
>       }
>     }
>   }
> }
> 
> For some reason wb_shutdown() is not waiting for wb_workfn() to complete
> ( or something queues again after WB_registered bit was cleared ) ?
> 
> Anyway, I think that this is block layer problem rather than fs layer
> problem.

Thanks for the analysis. I think I can see where the problem is:
wb_workfn() can requeue the work while wb_shutdown() is running. I'll send a
patch shortly.

> By the way, I got a newbie question regarding commit 5318ce7d46866e1d ("bdi:
> Shutdown writeback on all cgwbs in cgwb_bdi_destroy()"). It uses clear_bit()
> to clear WB_shutting_down bit so that threads waiting at wait_on_bit() will
> wake up. But clear_bit() itself does not wake up threads, does it? Who wakes
> them up (e.g. by calling wake_up_bit()) after clear_bit() was called?

Yeah, that's a bug. Thanks for fixing it.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


