linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* INFO: task hung in __get_super
@ 2018-04-01 17:08 syzbot
  2018-06-19 11:44 ` Tetsuo Handa
  2019-04-28 18:14 ` syzbot
  0 siblings, 2 replies; 28+ messages in thread
From: syzbot @ 2018-04-01 17:08 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, syzkaller-bugs, viro

Hello,

syzbot hit the following crash on upstream commit
10b84daddbec72c6b440216a69de9a9605127f7a (Sat Mar 31 17:59:00 2018 +0000)
Merge branch 'perf-urgent-for-linus' of  
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=10007d66ca02b08f0e60

Unfortunately, I don't have any reproducer for this crash yet.
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5899419228569600
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-2760467897697295172
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+10007d66ca02b08f0e60@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.
If you forward the report, please keep this part and the footer.

IPv6: ADDRCONF(NETDEV_UP): veth0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
random: crng init done
INFO: task syz-executor3:13421 blocked for more than 120 seconds.
       Not tainted 4.16.0-rc7+ #9
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor3   D24672 13421   4481 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:2862 [inline]
  __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
  schedule+0xf5/0x430 kernel/sched/core.c:3499
  __rwsem_down_read_failed_common kernel/locking/rwsem-xadd.c:269 [inline]
  rwsem_down_read_failed+0x401/0x6e0 kernel/locking/rwsem-xadd.c:286
  call_rwsem_down_read_failed+0x18/0x30 arch/x86/lib/rwsem.S:94
  __down_read arch/x86/include/asm/rwsem.h:83 [inline]
  down_read+0xa4/0x150 kernel/locking/rwsem.c:26
  __get_super.part.9+0x1d3/0x280 fs/super.c:663
  __get_super include/linux/spinlock.h:310 [inline]
  get_super+0x2d/0x40 fs/super.c:692
  fsync_bdev+0x19/0x80 fs/block_dev.c:468
  invalidate_partition+0x35/0x60 block/genhd.c:1566
  drop_partitions.isra.12+0xcd/0x1d0 block/partition-generic.c:440
  rescan_partitions+0x72/0x900 block/partition-generic.c:513
  __blkdev_reread_part+0x15f/0x1e0 block/ioctl.c:173
  blkdev_reread_part+0x26/0x40 block/ioctl.c:193
  loop_reread_partitions+0x12f/0x1a0 drivers/block/loop.c:619
  loop_set_status+0x9bb/0xf60 drivers/block/loop.c:1161
  loop_set_status64+0x9d/0x110 drivers/block/loop.c:1271
  lo_ioctl+0xd86/0x1b70 drivers/block/loop.c:1381
  __blkdev_driver_ioctl block/ioctl.c:303 [inline]
  blkdev_ioctl+0x1759/0x1e00 block/ioctl.c:601
  block_ioctl+0xde/0x120 fs/block_dev.c:1875
  vfs_ioctl fs/ioctl.c:46 [inline]
  do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
  SYSC_ioctl fs/ioctl.c:701 [inline]
  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
  do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x454e79
RSP: 002b:00007fda691eec68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fda691ef6d4 RCX: 0000000000454e79
RDX: 00000000200001c0 RSI: 0000000000004c04 RDI: 0000000000000013
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000287 R14: 00000000006f5d48 R15: 0000000000000000

Showing all locks held in the system:
2 locks held by khungtaskd/878:
  #0:  (rcu_read_lock){....}, at: [<000000004cf2ddac>]  
check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
  #0:  (rcu_read_lock){....}, at: [<000000004cf2ddac>] watchdog+0x1c5/0xd60  
kernel/hung_task.c:249
  #1:  (tasklist_lock){.+.+}, at: [<00000000fc5e2248>]  
debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
2 locks held by getty/4404:
  #0:  (&tty->ldisc_sem){++++}, at: [<00000000c5139392>]  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<000000003da58a6e>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4405:
  #0:  (&tty->ldisc_sem){++++}, at: [<00000000c5139392>]  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<000000003da58a6e>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4406:
  #0:  (&tty->ldisc_sem){++++}, at: [<00000000c5139392>]  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<000000003da58a6e>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4407:
  #0:  (&tty->ldisc_sem){++++}, at: [<00000000c5139392>]  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<000000003da58a6e>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4408:
  #0:  (&tty->ldisc_sem){++++}, at: [<00000000c5139392>]  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<000000003da58a6e>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4409:
  #0:  (&tty->ldisc_sem){++++}, at: [<00000000c5139392>]  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<000000003da58a6e>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4410:
  #0:  (&tty->ldisc_sem){++++}, at: [<00000000c5139392>]  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<000000003da58a6e>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
3 locks held by syz-executor3/13421:
  #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<00000000834f78af>]  
lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
  #1:  (&bdev->bd_mutex){+.+.}, at: [<0000000003605603>]  
blkdev_reread_part+0x1e/0x40 block/ioctl.c:192
  #2:  (&type->s_umount_key#77){.+.+}, at: [<0000000077701649>]  
__get_super.part.9+0x1d3/0x280 fs/super.c:663
1 lock held by syz-executor3/13464:
  #0:  (&bdev->bd_mutex){+.+.}, at: [<00000000c39e77db>]  
__blkdev_get+0x176/0x13b0 fs/block_dev.c:1458
1 lock held by syz-executor3/13466:
  #0:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<00000000834f78af>]  
lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
1 lock held by syz-executor2/13423:
  #0:  (&bdev->bd_mutex){+.+.}, at: [<0000000032c86bf7>]  
blkdev_put+0x2a/0x4f0 fs/block_dev.c:1808
2 locks held by syz-executor0/13428:
  #0:  (&type->s_umount_key#76/1){+.+.}, at: [<00000000d25ba33a>]  
alloc_super fs/super.c:211 [inline]
  #0:  (&type->s_umount_key#76/1){+.+.}, at: [<00000000d25ba33a>]  
sget_userns+0x3a1/0xe40 fs/super.c:502
  #1:  (&lo->lo_ctl_mutex/1){+.+.}, at: [<00000000834f78af>]  
lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
1 lock held by syz-executor0/13465:
  #0:  (&bdev->bd_mutex){+.+.}, at: [<00000000c39e77db>]  
__blkdev_get+0x176/0x13b0 fs/block_dev.c:1458
1 lock held by blkid/13434:
  #0:  (&bdev->bd_mutex){+.+.}, at: [<00000000c39e77db>]  
__blkdev_get+0x176/0x13b0 fs/block_dev.c:1458
1 lock held by syz-executor2/13638:
  #0:  (&bdev->bd_mutex){+.+.}, at: [<00000000c39e77db>]  
__blkdev_get+0x176/0x13b0 fs/block_dev.c:1458
1 lock held by syz-executor2/13639:
  #0:  (&bdev->bd_mutex){+.+.}, at: [<00000000c39e77db>]  
__blkdev_get+0x176/0x13b0 fs/block_dev.c:1458

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 878 Comm: khungtaskd Not tainted 4.16.0-rc7+ #9
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x24d lib/dump_stack.c:53
  nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
  nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
  trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
  check_hung_task kernel/hung_task.c:132 [inline]
  check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
  watchdog+0x90c/0xd60 kernel/hung_task.c:249
  kthread+0x33c/0x400 kernel/kthread.c:238
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0x6/0x10  
arch/x86/include/asm/irqflags.h:54


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.
Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2018-04-01 17:08 INFO: task hung in __get_super syzbot
@ 2018-06-19 11:44 ` Tetsuo Handa
  2018-06-19 11:53   ` Dmitry Vyukov
  2019-04-28 18:14 ` syzbot
  1 sibling, 1 reply; 28+ messages in thread
From: Tetsuo Handa @ 2018-06-19 11:44 UTC (permalink / raw)
  To: syzbot, syzkaller-bugs; +Cc: linux-fsdevel, linux-kernel, viro

This bug report is getting no feedback, but I guess that this bug is in
block or mm or locking layer rather than fs layer.

NMI backtrace for this bug tends to report that sb_bread() from fill_super()
 from mount_bdev() is stalling is the cause of keep holding s_umount_key for
more than 120 seconds. What is strange is that NMI backtrace for this bug tends
to point at rcu_read_lock()/pagecache_get_page()/radix_tree_deref_slot()/
rcu_read_unlock() which is expected not to stall.

Since CONFIG_RCU_CPU_STALL_TIMEOUT is set to 120 (and actually +5 due to
CONFIG_PROVE_RCU=y) which is longer than CONFIG_DEFAULT_HUNG_TASK_TIMEOUT,
maybe setting CONFIG_RCU_CPU_STALL_TIMEOUT to smaller values (e.g. 25) can
give us some hints...

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2018-06-19 11:44 ` Tetsuo Handa
@ 2018-06-19 11:53   ` Dmitry Vyukov
  2018-06-19 14:10     ` Tetsuo Handa
  0 siblings, 1 reply; 28+ messages in thread
From: Dmitry Vyukov @ 2018-06-19 11:53 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: syzbot, syzkaller-bugs, linux-fsdevel, LKML, Al Viro

On Tue, Jun 19, 2018 at 1:44 PM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> This bug report is getting no feedback, but I guess that this bug is in
> block or mm or locking layer rather than fs layer.
>
> NMI backtrace for this bug tends to report that sb_bread() from fill_super()
>  from mount_bdev() is stalling is the cause of keep holding s_umount_key for
> more than 120 seconds. What is strange is that NMI backtrace for this bug tends
> to point at rcu_read_lock()/pagecache_get_page()/radix_tree_deref_slot()/
> rcu_read_unlock() which is expected not to stall.
>
> Since CONFIG_RCU_CPU_STALL_TIMEOUT is set to 120 (and actually +5 due to
> CONFIG_PROVE_RCU=y) which is longer than CONFIG_DEFAULT_HUNG_TASK_TIMEOUT,
> maybe setting CONFIG_RCU_CPU_STALL_TIMEOUT to smaller values (e.g. 25) can
> give us some hints...

If an rcu stall is the true root cause of this, then I guess would see
"rcu stall" bug too. Rcu stall is detected after 120 seconds, but task
hang after 120-240 seconds. So rcu stall has much higher chances to be
detected. Do you see the corresponding "rcu stall" bug?

But, yes, we need to tune all timeouts. There is
https://github.com/google/syzkaller/issues/516 for this.
We also need "kernel/hung_task.c: allow to set checking interval
separately from timeout" to be merged:
https://groups.google.com/forum/#!topic/syzkaller/rOr3WBE-POY
as currently it's very hard to tune task hung timeout. But maybe we
will need similar patches for other watchdogs too if they have the
same problem.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2018-06-19 11:53   ` Dmitry Vyukov
@ 2018-06-19 14:10     ` Tetsuo Handa
  2018-06-19 14:15       ` Dmitry Vyukov
  0 siblings, 1 reply; 28+ messages in thread
From: Tetsuo Handa @ 2018-06-19 14:10 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: syzbot, syzkaller-bugs, linux-fsdevel, LKML, Al Viro

On 2018/06/19 20:53, Dmitry Vyukov wrote:
> On Tue, Jun 19, 2018 at 1:44 PM, Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>> This bug report is getting no feedback, but I guess that this bug is in
>> block or mm or locking layer rather than fs layer.
>>
>> NMI backtrace for this bug tends to report that sb_bread() from fill_super()
>>  from mount_bdev() is stalling is the cause of keep holding s_umount_key for
>> more than 120 seconds. What is strange is that NMI backtrace for this bug tends
>> to point at rcu_read_lock()/pagecache_get_page()/radix_tree_deref_slot()/
>> rcu_read_unlock() which is expected not to stall.
>>
>> Since CONFIG_RCU_CPU_STALL_TIMEOUT is set to 120 (and actually +5 due to
>> CONFIG_PROVE_RCU=y) which is longer than CONFIG_DEFAULT_HUNG_TASK_TIMEOUT,
>> maybe setting CONFIG_RCU_CPU_STALL_TIMEOUT to smaller values (e.g. 25) can
>> give us some hints...
> 
> If an rcu stall is the true root cause of this, then I guess would see
> "rcu stall" bug too. Rcu stall is detected after 120 seconds, but task
> hang after 120-240 seconds. So rcu stall has much higher chances to be
> detected. Do you see the corresponding "rcu stall" bug?

RCU stall is detected after 125 seconds due to CONFIG_PROVE_RCU=y
(e.g. https://syzkaller.appspot.com/bug?id=1fac0fd91219f3f2a03d6fa7deafc95fbed79cc2 ).

I didn't find the corresponding "rcu stall" bug. But it is not required
that one RCU stall takes longer than 120 seconds.

down(); // Will take 120 seconds due to multiple RCU stalls
  rcu_read_lock():
  do_something();
  rcu_read_unlock(): // Took 30 seconds for unknown reason.
  rcu_read_lock():
  do_something();
  rcu_read_unlock(): // Took 30 seconds for unknown reason.
  rcu_read_lock():
  do_something();
  rcu_read_unlock(): // Took 30 seconds for unknown reason.
  rcu_read_lock():
  do_something();
  rcu_read_unlock(): // Took 30 seconds for unknown reason.
up();

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2018-06-19 14:10     ` Tetsuo Handa
@ 2018-06-19 14:15       ` Dmitry Vyukov
  2018-06-20  1:15         ` Tetsuo Handa
  0 siblings, 1 reply; 28+ messages in thread
From: Dmitry Vyukov @ 2018-06-19 14:15 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: syzbot, syzkaller-bugs, linux-fsdevel, LKML, Al Viro

On Tue, Jun 19, 2018 at 4:10 PM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> On 2018/06/19 20:53, Dmitry Vyukov wrote:
>> On Tue, Jun 19, 2018 at 1:44 PM, Tetsuo Handa
>> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>> This bug report is getting no feedback, but I guess that this bug is in
>>> block or mm or locking layer rather than fs layer.
>>>
>>> NMI backtrace for this bug tends to report that sb_bread() from fill_super()
>>>  from mount_bdev() is stalling is the cause of keep holding s_umount_key for
>>> more than 120 seconds. What is strange is that NMI backtrace for this bug tends
>>> to point at rcu_read_lock()/pagecache_get_page()/radix_tree_deref_slot()/
>>> rcu_read_unlock() which is expected not to stall.
>>>
>>> Since CONFIG_RCU_CPU_STALL_TIMEOUT is set to 120 (and actually +5 due to
>>> CONFIG_PROVE_RCU=y) which is longer than CONFIG_DEFAULT_HUNG_TASK_TIMEOUT,
>>> maybe setting CONFIG_RCU_CPU_STALL_TIMEOUT to smaller values (e.g. 25) can
>>> give us some hints...
>>
>> If an rcu stall is the true root cause of this, then I guess would see
>> "rcu stall" bug too. Rcu stall is detected after 120 seconds, but task
>> hang after 120-240 seconds. So rcu stall has much higher chances to be
>> detected. Do you see the corresponding "rcu stall" bug?
>
> RCU stall is detected after 125 seconds due to CONFIG_PROVE_RCU=y
> (e.g. https://syzkaller.appspot.com/bug?id=1fac0fd91219f3f2a03d6fa7deafc95fbed79cc2 ).
>
> I didn't find the corresponding "rcu stall" bug. But it is not required
> that one RCU stall takes longer than 120 seconds.
>
> down(); // Will take 120 seconds due to multiple RCU stalls
>   rcu_read_lock():
>   do_something();
>   rcu_read_unlock(): // Took 30 seconds for unknown reason.
>   rcu_read_lock():
>   do_something();
>   rcu_read_unlock(): // Took 30 seconds for unknown reason.
>   rcu_read_lock():
>   do_something();
>   rcu_read_unlock(): // Took 30 seconds for unknown reason.
>   rcu_read_lock():
>   do_something();
>   rcu_read_unlock(): // Took 30 seconds for unknown reason.
> up();


You think this is another false positive?
Like this one https://github.com/google/syzkaller/issues/516#issuecomment-395685629
?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2018-06-19 14:15       ` Dmitry Vyukov
@ 2018-06-20  1:15         ` Tetsuo Handa
  0 siblings, 0 replies; 28+ messages in thread
From: Tetsuo Handa @ 2018-06-20  1:15 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: syzbot, syzkaller-bugs, linux-fsdevel, LKML, Al Viro

Dmitry Vyukov wrote:
> On Tue, Jun 19, 2018 at 4:10 PM, Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > On 2018/06/19 20:53, Dmitry Vyukov wrote:
> >> On Tue, Jun 19, 2018 at 1:44 PM, Tetsuo Handa
> >> <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >>> This bug report is getting no feedback, but I guess that this bug is in
> >>> block or mm or locking layer rather than fs layer.
> >>>
> >>> NMI backtrace for this bug tends to report that sb_bread() from fill_super()
> >>>  from mount_bdev() is stalling is the cause of keep holding s_umount_key for
> >>> more than 120 seconds. What is strange is that NMI backtrace for this bug tends
> >>> to point at rcu_read_lock()/pagecache_get_page()/radix_tree_deref_slot()/
> >>> rcu_read_unlock() which is expected not to stall.
> >>>
> >>> Since CONFIG_RCU_CPU_STALL_TIMEOUT is set to 120 (and actually +5 due to
> >>> CONFIG_PROVE_RCU=y) which is longer than CONFIG_DEFAULT_HUNG_TASK_TIMEOUT,
> >>> maybe setting CONFIG_RCU_CPU_STALL_TIMEOUT to smaller values (e.g. 25) can
> >>> give us some hints...
> >>
> >> If an rcu stall is the true root cause of this, then I guess would see
> >> "rcu stall" bug too. Rcu stall is detected after 120 seconds, but task
> >> hang after 120-240 seconds. So rcu stall has much higher chances to be
> >> detected. Do you see the corresponding "rcu stall" bug?
> >
> > RCU stall is detected after 125 seconds due to CONFIG_PROVE_RCU=y
> > (e.g. https://syzkaller.appspot.com/bug?id=1fac0fd91219f3f2a03d6fa7deafc95fbed79cc2 ).
> >
> > I didn't find the corresponding "rcu stall" bug. But it is not required
> > that one RCU stall takes longer than 120 seconds.
> >
> > down(); // Will take 120 seconds due to multiple RCU stalls
> >   rcu_read_lock():
> >   do_something();
> >   rcu_read_unlock(): // Took 30 seconds for unknown reason.
> >   rcu_read_lock():
> >   do_something();
> >   rcu_read_unlock(): // Took 30 seconds for unknown reason.
> >   rcu_read_lock():
> >   do_something();
> >   rcu_read_unlock(): // Took 30 seconds for unknown reason.
> >   rcu_read_lock():
> >   do_something();
> >   rcu_read_unlock(): // Took 30 seconds for unknown reason.
> > up();
> 
> 
> You think this is another false positive?
> Like this one https://github.com/google/syzkaller/issues/516#issuecomment-395685629
> ?

According to https://syzkaller.appspot.com/text?tag=CrashLog&x=11db16c4400000 from
"INFO: rcu detected stall in __process_echoes":

[  859.630022] INFO: rcu_sched self-detected stall on CPU
[  859.635509] 	0-....: (124999 ticks this GP) idle=30e/1/4611686018427387906 softirq=287964/287964 fqs=31234 
[  859.645716] 	 (t=125000 jiffies g=156333 c=156332 q=555)
(...snipped...)
[  860.266660]  ? process_one_work+0x1ba0/0x1ba0
[  860.271135]  ? kthread_bind+0x40/0x40
[  860.274927]  ret_from_fork+0x3a/0x50
[  861.152252] INFO: task kworker/u4:2:59 blocked for more than 120 seconds.
[  861.159245]       Not tainted 4.18.0-rc1+ #109
[  861.163851] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

RCU stall was reported immediately before khungtaskd fires. Since syzbot assigns
only 2 CPUs, it might not be rare case that a hung process was unable to run just
because somebody else kept occupying CPU resources.

Well, "BUG: soft lockup in __process_echoes" will be a dup of
"INFO: rcu detected stall in __process_echoes". I wonder why
softlockup detector waited for 135 seconds.

Well, "BUG: soft lockup in shrink_dcache_parent (2)" and
"BUG: soft lockup in snd_virmidi_output_trigger" and
"BUG: soft lockup in smp_call_function_many" and
"BUG: soft lockup in do_raw_spin_unlock (2)" as well waited for 134 seconds
while "BUG: soft lockup in d_walk" waited for only 22 seconds...

Anyway, I think that in some cases RCU stalls/soft lockups are the cause of hung tasks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2018-04-01 17:08 INFO: task hung in __get_super syzbot
  2018-06-19 11:44 ` Tetsuo Handa
@ 2019-04-28 18:14 ` syzbot
  2019-04-28 18:51   ` Al Viro
                     ` (2 more replies)
  1 sibling, 3 replies; 28+ messages in thread
From: syzbot @ 2019-04-28 18:14 UTC (permalink / raw)
  To: axboe, dvyukov, jack, linux-fsdevel, linux-kernel,
	penguin-kernel, penguin-kernel, syzkaller-bugs, viro

syzbot has found a reproducer for the following crash on:

HEAD commit:    037904a2 Merge branch 'x86-urgent-for-linus' of git://git...
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=135ff034a00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=a42d110b47dd6b36
dashboard link: https://syzkaller.appspot.com/bug?extid=10007d66ca02b08f0e60
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1291b1f4a00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=135385a8a00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+10007d66ca02b08f0e60@syzkaller.appspotmail.com

INFO: task syz-executor274:8097 blocked for more than 143 seconds.
       Not tainted 5.1.0-rc6+ #89
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor274 D28008  8097   8041 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:2877 [inline]
  __schedule+0x813/0x1cc0 kernel/sched/core.c:3518
  schedule+0x92/0x180 kernel/sched/core.c:3562
  __rwsem_down_read_failed_common kernel/locking/rwsem-xadd.c:285 [inline]
  rwsem_down_read_failed+0x213/0x420 kernel/locking/rwsem-xadd.c:302
  call_rwsem_down_read_failed+0x18/0x30 arch/x86/lib/rwsem.S:94
  __down_read arch/x86/include/asm/rwsem.h:83 [inline]
  down_read+0x49/0x90 kernel/locking/rwsem.c:26
  __get_super.part.0+0x203/0x2e0 fs/super.c:788
  __get_super include/linux/spinlock.h:329 [inline]
  get_super+0x2e/0x50 fs/super.c:817
  fsync_bdev+0x19/0xd0 fs/block_dev.c:525
  invalidate_partition+0x36/0x60 block/genhd.c:1581
  drop_partitions block/partition-generic.c:443 [inline]
  rescan_partitions+0xef/0xa20 block/partition-generic.c:516
  __blkdev_reread_part+0x1a2/0x230 block/ioctl.c:173
  blkdev_reread_part+0x27/0x40 block/ioctl.c:193
  loop_reread_partitions+0x1c/0x40 drivers/block/loop.c:633
  loop_set_status+0xe57/0x1380 drivers/block/loop.c:1296
  loop_set_status64+0xc2/0x120 drivers/block/loop.c:1416
  lo_ioctl+0x8fc/0x2150 drivers/block/loop.c:1559
  __blkdev_driver_ioctl block/ioctl.c:303 [inline]
  blkdev_ioctl+0x6f2/0x1d10 block/ioctl.c:605
  block_ioctl+0xee/0x130 fs/block_dev.c:1933
  vfs_ioctl fs/ioctl.c:46 [inline]
  file_ioctl fs/ioctl.c:509 [inline]
  do_vfs_ioctl+0xd6e/0x1390 fs/ioctl.c:696
  ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
  __do_sys_ioctl fs/ioctl.c:720 [inline]
  __se_sys_ioctl fs/ioctl.c:718 [inline]
  __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x441937
Code: 48 83 c4 08 48 89 d8 5b 5d c3 66 0f 1f 84 00 00 00 00 00 48 89 e8 48  
f7 d8 48 39 c3 0f 92 c0 eb 92 66 90 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 8d 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffcabed4e08 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441937
RDX: 00007ffcabed4ea0 RSI: 0000000000004c04 RDI: 0000000000000004
RBP: 0000000000000003 R08: 0000000000000000 R09: 000000000000000a
R10: 0000000000000075 R11: 0000000000000202 R12: 0000000000000001
R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000
INFO: task blkid:8099 blocked for more than 143 seconds.
       Not tainted 5.1.0-rc6+ #89
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
blkid           D27504  8099   8021 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:2877 [inline]
  __schedule+0x813/0x1cc0 kernel/sched/core.c:3518
  schedule+0x92/0x180 kernel/sched/core.c:3562
  schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:3620
  __mutex_lock_common kernel/locking/mutex.c:1002 [inline]
  __mutex_lock+0x726/0x1310 kernel/locking/mutex.c:1072
  mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1087
  blkdev_put+0x34/0x560 fs/block_dev.c:1866
  blkdev_close+0x8b/0xb0 fs/block_dev.c:1915
  __fput+0x2e5/0x8d0 fs/file_table.c:278
  ____fput+0x16/0x20 fs/file_table.c:309
  task_work_run+0x14a/0x1c0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:188 [inline]
  exit_to_usermode_loop+0x273/0x2c0 arch/x86/entry/common.c:166
  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
  do_syscall_64+0x52d/0x610 arch/x86/entry/common.c:293
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f1ae9c432b0
Code: Bad RIP value.
RSP: 002b:00007ffc29ff6028 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f1ae9c432b0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000028 R09: 0000000001680000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000b9d030
R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000005

Showing all locks held in the system:
1 lock held by khungtaskd/1041:
  #0: 0000000027887009 (rcu_read_lock){....}, at:  
debug_show_all_locks+0x5f/0x27e kernel/locking/lockdep.c:5057
1 lock held by rs:main Q:Reg/7921:
  #0: 00000000e00580d7 (&rq->lock){-.-.}, at: rq_lock  
kernel/sched/sched.h:1168 [inline]
  #0: 00000000e00580d7 (&rq->lock){-.-.}, at: __schedule+0x1f8/0x1cc0  
kernel/sched/core.c:3456
1 lock held by rsyslogd/7923:
  #0: 00000000488dcec4 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0xee/0x110  
fs/file.c:801
2 locks held by getty/8014:
  #0: 000000001b56f3c3 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
  #1: 00000000da9faa8c (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/8015:
  #0: 00000000e00c81bb (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
  #1: 000000008d689a2e (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/8016:
  #0: 00000000177f6359 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
  #1: 0000000096437898 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/8017:
  #0: 000000002db00e12 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
  #1: 0000000071f2d88e (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/8018:
  #0: 00000000a41b6290 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
  #1: 00000000c340e26f (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/8019:
  #0: 00000000bca104ce (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
  #1: 000000007e045212 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/8020:
  #0: 0000000070fdafae (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
  #1: 00000000b4e26fa9 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
1 lock held by syz-executor274/8083:
2 locks held by syz-executor274/8097:
  #0: 000000007a5ed526 (&bdev->bd_mutex){+.+.}, at:  
blkdev_reread_part+0x1f/0x40 block/ioctl.c:192
  #1: 0000000067606e21 (&type->s_umount_key#39){.+.+}, at:  
__get_super.part.0+0x203/0x2e0 fs/super.c:788
1 lock held by blkid/8099:
  #0: 000000007a5ed526 (&bdev->bd_mutex){+.+.}, at: blkdev_put+0x34/0x560  
fs/block_dev.c:1866
2 locks held by syz-executor274/11705:
  #0: 000000002b6bbb34 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0xbb/0x810  
fs/block_dev.c:1833
  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_release+0x1f/0x200  
drivers/block/loop.c:1755
2 locks held by syz-executor274/11709:
  #0: 00000000a45cb906 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0xbb/0x810  
fs/block_dev.c:1833
  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_release+0x1f/0x200  
drivers/block/loop.c:1755
2 locks held by syz-executor274/11716:
  #0: 00000000a19e2025 (&type->s_umount_key#38/1){+.+.}, at:  
alloc_super+0x158/0x890 fs/super.c:228
  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_simple_ioctl  
drivers/block/loop.c:1514 [inline]
  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_ioctl+0x266/0x2150  
drivers/block/loop.c:1572
2 locks held by syz-executor274/11717:
  #0: 00000000e185c083 (&type->s_umount_key#38/1){+.+.}, at:  
alloc_super+0x158/0x890 fs/super.c:228
  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_simple_ioctl  
drivers/block/loop.c:1514 [inline]
  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_ioctl+0x266/0x2150  
drivers/block/loop.c:1572
2 locks held by blkid/11718:
  #0: 000000003d9e77b2 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0xbb/0x810  
fs/block_dev.c:1833
  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: __loop_clr_fd+0x88/0xd60  
drivers/block/loop.c:1046
2 locks held by blkid/11720:
  #0: 000000000c0297bc (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0xbb/0x810  
fs/block_dev.c:1833
  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_release+0x1f/0x200  
drivers/block/loop.c:1755

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 1041 Comm: khungtaskd Not tainted 5.1.0-rc6+ #89
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
  nmi_cpu_backtrace.cold+0x63/0xa4 lib/nmi_backtrace.c:101
  nmi_trigger_cpumask_backtrace+0x1be/0x236 lib/nmi_backtrace.c:62
  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
  trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
  check_hung_uninterruptible_tasks kernel/hung_task.c:204 [inline]
  watchdog+0x9b7/0xec0 kernel/hung_task.c:288
  kthread+0x357/0x430 kernel/kthread.c:253
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 0 to CPUs 1:


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-04-28 18:14 ` syzbot
@ 2019-04-28 18:51   ` Al Viro
  2019-04-29  1:38     ` Tetsuo Handa
                       ` (2 more replies)
  2019-05-14  5:15   ` Tetsuo Handa
  2019-05-15  1:02   ` Tetsuo Handa
  2 siblings, 3 replies; 28+ messages in thread
From: Al Viro @ 2019-04-28 18:51 UTC (permalink / raw)
  To: syzbot
  Cc: axboe, dvyukov, jack, linux-fsdevel, linux-kernel,
	penguin-kernel, syzkaller-bugs

On Sun, Apr 28, 2019 at 11:14:06AM -0700, syzbot wrote:
>  down_read+0x49/0x90 kernel/locking/rwsem.c:26
>  __get_super.part.0+0x203/0x2e0 fs/super.c:788
>  __get_super include/linux/spinlock.h:329 [inline]
>  get_super+0x2e/0x50 fs/super.c:817
>  fsync_bdev+0x19/0xd0 fs/block_dev.c:525
>  invalidate_partition+0x36/0x60 block/genhd.c:1581
>  drop_partitions block/partition-generic.c:443 [inline]
>  rescan_partitions+0xef/0xa20 block/partition-generic.c:516
>  __blkdev_reread_part+0x1a2/0x230 block/ioctl.c:173
>  blkdev_reread_part+0x27/0x40 block/ioctl.c:193
>  loop_reread_partitions+0x1c/0x40 drivers/block/loop.c:633
>  loop_set_status+0xe57/0x1380 drivers/block/loop.c:1296
>  loop_set_status64+0xc2/0x120 drivers/block/loop.c:1416
>  lo_ioctl+0x8fc/0x2150 drivers/block/loop.c:1559
>  __blkdev_driver_ioctl block/ioctl.c:303 [inline]
>  blkdev_ioctl+0x6f2/0x1d10 block/ioctl.c:605
>  block_ioctl+0xee/0x130 fs/block_dev.c:1933
>  vfs_ioctl fs/ioctl.c:46 [inline]
>  file_ioctl fs/ioctl.c:509 [inline]
>  do_vfs_ioctl+0xd6e/0x1390 fs/ioctl.c:696
>  ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
>  __do_sys_ioctl fs/ioctl.c:720 [inline]
>  __se_sys_ioctl fs/ioctl.c:718 [inline]
>  __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
>  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe

ioctl(..., BLKRRPART) blocked on ->s_umount in __get_super().
The trouble is, the only things holding ->s_umount appears to be
these:

> 2 locks held by syz-executor274/11716:
>  #0: 00000000a19e2025 (&type->s_umount_key#38/1){+.+.}, at:
> alloc_super+0x158/0x890 fs/super.c:228
>  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_simple_ioctl
> drivers/block/loop.c:1514 [inline]
>  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_ioctl+0x266/0x2150
> drivers/block/loop.c:1572

> 2 locks held by syz-executor274/11717:
>  #0: 00000000e185c083 (&type->s_umount_key#38/1){+.+.}, at:
> alloc_super+0x158/0x890 fs/super.c:228
>  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_simple_ioctl
> drivers/block/loop.c:1514 [inline]
>  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_ioctl+0x266/0x2150
> drivers/block/loop.c:1572

... and that's bollocks.  ->s_umount held there is that on freshly allocated
superblock.  It *MUST* be in mount(2); no other syscall should be able to
call alloc_super() in the first place.  So what the hell is that doing
trying to call lo_ioctl() inside mount(2)?  Something like isofs attempting
cdrom ioctls on the underlying device?

Why do we have loop_func_table->ioctl(), BTW?  All in-tree instances are
either NULL or return -EINVAL unconditionally.  Considering that the
caller is
                err = lo->ioctl ? lo->ioctl(lo, cmd, arg) : -EINVAL;
we could bloody well just get rid of cryptoloop_ioctl() (the only
non-NULL instance) and get rid of calling lo_simple_ioctl() in
lo_ioctl() switch's default.

Something like this:

diff --git a/drivers/block/cryptoloop.c b/drivers/block/cryptoloop.c
index 254ee7d54e91..f16468a562f5 100644
--- a/drivers/block/cryptoloop.c
+++ b/drivers/block/cryptoloop.c
@@ -167,12 +167,6 @@ cryptoloop_transfer(struct loop_device *lo, int cmd,
 }
 
 static int
-cryptoloop_ioctl(struct loop_device *lo, int cmd, unsigned long arg)
-{
-	return -EINVAL;
-}
-
-static int
 cryptoloop_release(struct loop_device *lo)
 {
 	struct crypto_sync_skcipher *tfm = lo->key_data;
@@ -188,7 +182,6 @@ cryptoloop_release(struct loop_device *lo)
 static struct loop_func_table cryptoloop_funcs = {
 	.number = LO_CRYPT_CRYPTOAPI,
 	.init = cryptoloop_init,
-	.ioctl = cryptoloop_ioctl,
 	.transfer = cryptoloop_transfer,
 	.release = cryptoloop_release,
 	.owner = THIS_MODULE
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index bf1c61cab8eb..2ec162b80562 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -955,7 +955,6 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	lo->lo_flags = lo_flags;
 	lo->lo_backing_file = file;
 	lo->transfer = NULL;
-	lo->ioctl = NULL;
 	lo->lo_sizelimit = 0;
 	lo->old_gfp_mask = mapping_gfp_mask(mapping);
 	mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
@@ -1064,7 +1063,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
 
 	loop_release_xfer(lo);
 	lo->transfer = NULL;
-	lo->ioctl = NULL;
 	lo->lo_device = NULL;
 	lo->lo_encryption = NULL;
 	lo->lo_offset = 0;
@@ -1262,7 +1260,6 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
 	if (!xfer)
 		xfer = &none_funcs;
 	lo->transfer = xfer->transfer;
-	lo->ioctl = xfer->ioctl;
 
 	if ((lo->lo_flags & LO_FLAGS_AUTOCLEAR) !=
 	     (info->lo_flags & LO_FLAGS_AUTOCLEAR))
@@ -1525,7 +1522,7 @@ static int lo_simple_ioctl(struct loop_device *lo, unsigned int cmd,
 		err = loop_set_block_size(lo, arg);
 		break;
 	default:
-		err = lo->ioctl ? lo->ioctl(lo, cmd, arg) : -EINVAL;
+		err = -EINVAL;
 	}
 	mutex_unlock(&loop_ctl_mutex);
 	return err;
@@ -1567,10 +1564,9 @@ static int lo_ioctl(struct block_device *bdev, fmode_t mode,
 	case LOOP_SET_BLOCK_SIZE:
 		if (!(mode & FMODE_WRITE) && !capable(CAP_SYS_ADMIN))
 			return -EPERM;
-		/* Fall through */
+		return lo_simple_ioctl(lo, cmd, arg);
 	default:
-		err = lo_simple_ioctl(lo, cmd, arg);
-		break;
+		return -EINVAL;
 	}
 
 	return err;
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index af75a5ee4094..56a9a0c161d7 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -84,7 +84,6 @@ struct loop_func_table {
 	int (*init)(struct loop_device *, const struct loop_info64 *); 
 	/* release is called from loop_unregister_transfer or clr_fd */
 	int (*release)(struct loop_device *); 
-	int (*ioctl)(struct loop_device *, int cmd, unsigned long arg);
 	struct module *owner;
 }; 
 

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-04-28 18:51   ` Al Viro
@ 2019-04-29  1:38     ` Tetsuo Handa
  2019-04-29  5:30     ` Dmitry Vyukov
  2019-04-30  2:55     ` Jan Kara
  2 siblings, 0 replies; 28+ messages in thread
From: Tetsuo Handa @ 2019-04-29  1:38 UTC (permalink / raw)
  To: Al Viro, syzbot
  Cc: axboe, dvyukov, jack, linux-fsdevel, linux-kernel, syzkaller-bugs

On 2019/04/29 3:51, Al Viro wrote:
> ioctl(..., BLKRRPART) blocked on ->s_umount in __get_super().
> The trouble is, the only things holding ->s_umount appears to be
> these:

Not always true. lockdep_print_held_locks() from debug_show_all_locks() can not
report locks held by TASK_RUNNING threads. Due to enabling CONFIG_PRINTK_CALLER=y,
the output from trigger_all_cpu_backtrace() is no longer included into the report
file (i.e. premature truncation) because NMI backtrace is printed from a different
printk() context. If we check the console output, we can understand that

>> 1 lock held by syz-executor274/8083:

was doing mount(2). Since there is a possibility that that thread was looping for
many seconds enough to trigger khungtaskd warnings, we can't tell whether this is
a locking dependency problem.

----------------------------------------
[ 1107.252933][ T1041] NMI backtrace for cpu 0
[ 1107.257402][ T1041] CPU: 0 PID: 1041 Comm: khungtaskd Not tainted 5.1.0-rc6+ #89
[ 1107.264960][ T1041] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 1107.275380][ T1041] Call Trace:
[ 1107.278691][ T1041]  dump_stack+0x172/0x1f0
[ 1107.283216][ T1041]  nmi_cpu_backtrace.cold+0x63/0xa4
[ 1107.288469][ T1041]  ? lapic_can_unplug_cpu.cold+0x38/0x38
[ 1107.294155][ T1041]  nmi_trigger_cpumask_backtrace+0x1be/0x236
[ 1107.300256][ T1041]  arch_trigger_cpumask_backtrace+0x14/0x20
[ 1107.306174][ T1041]  watchdog+0x9b7/0xec0
[ 1107.310362][ T1041]  kthread+0x357/0x430
[ 1107.314446][ T1041]  ? reset_hung_task_detector+0x30/0x30
[ 1107.320016][ T1041]  ? kthread_cancel_delayed_work_sync+0x20/0x20
[ 1107.326280][ T1041]  ret_from_fork+0x3a/0x50
[ 1107.331403][ T1041] Sending NMI from CPU 0 to CPUs 1:
[ 1107.337617][    C1] NMI backtrace for cpu 1
[ 1107.337625][    C1] CPU: 1 PID: 8083 Comm: syz-executor274 Not tainted 5.1.0-rc6+ #89
[ 1107.337631][    C1] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 1107.337636][    C1] RIP: 0010:debug_lockdep_rcu_enabled.part.0+0xb/0x60
[ 1107.337648][    C1] Code: 5b 5d c3 e8 67 71 e5 ff 0f 1f 80 00 00 00 00 55 48 89 e5 e8 37 ff ff ff 5d c3 0f 1f 44 00 00 48 b8 00 00 00 00 00 fc ff df 55 <48> 89 e5 53 65 48 8b 1c 25 00 ee 01 00 48 8d bb 7c 08 00 00 48 89
[ 1107.337652][    C1] RSP: 0018:ffff8880a85274c8 EFLAGS: 00000202
[ 1107.337661][    C1] RAX: dffffc0000000000 RBX: ffff8880a85275d8 RCX: 1ffffffff12bcd63
[ 1107.337666][    C1] RDX: 0000000000000000 RSI: ffffffff870d8f3c RDI: ffff8880a85275e0
[ 1107.337672][    C1] RBP: ffff8880a85274d8 R08: ffff888081e68540 R09: ffffed1015d25bc8
[ 1107.337677][    C1] R10: ffffed1015d25bc7 R11: ffff8880ae92de3b R12: 0000000000000000
[ 1107.337683][    C1] R13: ffff8880a694d640 R14: ffff88809541b942 R15: 0000000000000006
[ 1107.337689][    C1] FS:  0000000000e0b880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
[ 1107.337693][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1107.337699][    C1] CR2: ffffffffff600400 CR3: 0000000092d6f000 CR4: 00000000001406e0
[ 1107.337704][    C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1107.337710][    C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1107.337713][    C1] Call Trace:
[ 1107.337717][    C1]  ? debug_lockdep_rcu_enabled+0x71/0xa0
[ 1107.337721][    C1]  xas_descend+0xbf/0x370
[ 1107.337724][    C1]  xas_load+0xef/0x150
[ 1107.337728][    C1]  find_get_entry+0x13d/0x880
[ 1107.337733][    C1]  ? find_get_entries_tag+0xc10/0xc10
[ 1107.337736][    C1]  ? mark_held_locks+0xa4/0xf0
[ 1107.337741][    C1]  ? pagecache_get_page+0x1a8/0x740
[ 1107.337745][    C1]  pagecache_get_page+0x4c/0x740
[ 1107.337749][    C1]  __getblk_gfp+0x27e/0x970
[ 1107.337752][    C1]  __bread_gfp+0x2f/0x300
[ 1107.337756][    C1]  udf_tread+0xf1/0x140
[ 1107.337760][    C1]  udf_read_tagged+0x50/0x530
[ 1107.337764][    C1]  udf_check_anchor_block+0x1ef/0x680
[ 1107.337768][    C1]  ? blkpg_ioctl+0xa90/0xa90
[ 1107.337772][    C1]  ? udf_process_sequence+0x35d0/0x35d0
[ 1107.337776][    C1]  ? submit_bio+0xba/0x480
[ 1107.337780][    C1]  udf_scan_anchors+0x3f4/0x680
[ 1107.337784][    C1]  ? udf_check_anchor_block+0x680/0x680
[ 1107.337789][    C1]  ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 1107.337793][    C1]  ? udf_get_last_session+0x120/0x120
[ 1107.337797][    C1]  udf_load_vrs+0x67f/0xc80
[ 1107.337801][    C1]  ? udf_scan_anchors+0x680/0x680
[ 1107.337805][    C1]  ? udf_bread+0x260/0x260
[ 1107.337809][    C1]  ? lockdep_init_map+0x1be/0x6d0
[ 1107.337813][    C1]  udf_fill_super+0x7d8/0x16d1
[ 1107.337817][    C1]  ? udf_load_vrs+0xc80/0xc80
[ 1107.337820][    C1]  ? vsprintf+0x40/0x40
[ 1107.337824][    C1]  ? set_blocksize+0x2bf/0x340
[ 1107.337829][    C1]  ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 1107.337833][    C1]  mount_bdev+0x307/0x3c0
[ 1107.337837][    C1]  ? udf_load_vrs+0xc80/0xc80
[ 1107.337840][    C1]  udf_mount+0x35/0x40
[ 1107.337844][    C1]  ? udf_get_pblock_meta25+0x3a0/0x3a0
[ 1107.337848][    C1]  legacy_get_tree+0xf2/0x200
[ 1107.337853][    C1]  ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 1107.337857][    C1]  vfs_get_tree+0x123/0x450
[ 1107.337860][    C1]  do_mount+0x1436/0x2c40
[ 1107.337864][    C1]  ? copy_mount_string+0x40/0x40
[ 1107.337868][    C1]  ? _copy_from_user+0xdd/0x150
[ 1107.337873][    C1]  ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 1107.337877][    C1]  ? copy_mount_options+0x280/0x3a0
[ 1107.337881][    C1]  ksys_mount+0xdb/0x150
[ 1107.337885][    C1]  __x64_sys_mount+0xbe/0x150
[ 1107.337889][    C1]  do_syscall_64+0x103/0x610
[ 1107.337893][    C1]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1107.337897][    C1] RIP: 0033:0x441a49
[ 1107.337909][    C1] Code: ad 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00
[ 1107.337913][    C1] RSP: 002b:00007ffcabed5048 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ 1107.337923][    C1] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441a49
[ 1107.337929][    C1] RDX: 0000000020000140 RSI: 0000000020000080 RDI: 0000000020000000
[ 1107.337934][    C1] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 1107.337940][    C1] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[ 1107.337946][    C1] R13: 00000000004026e0 R14: 0000000000000000 R15: 0000000000000000
[ 1107.339387][ T1041] Kernel panic - not syncing: hung_task: blocked tasks
[ 1107.787589][ T1041] CPU: 0 PID: 1041 Comm: khungtaskd Not tainted 5.1.0-rc6+ #89
[ 1107.795319][ T1041] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 1107.806082][ T1041] Call Trace:
[ 1107.809403][ T1041]  dump_stack+0x172/0x1f0
[ 1107.813776][ T1041]  panic+0x2cb/0x65c
[ 1107.817695][ T1041]  ? __warn_printk+0xf3/0xf3
[ 1107.822391][ T1041]  ? lapic_can_unplug_cpu.cold+0x38/0x38
[ 1107.828236][ T1041]  ? ___preempt_schedule+0x16/0x18
[ 1107.833386][ T1041]  ? nmi_trigger_cpumask_backtrace+0x19e/0x236
[ 1107.839739][ T1041]  ? nmi_trigger_cpumask_backtrace+0x1fa/0x236
[ 1107.846024][ T1041]  ? nmi_trigger_cpumask_backtrace+0x204/0x236
[ 1107.852196][ T1041]  ? nmi_trigger_cpumask_backtrace+0x19e/0x236
[ 1107.858482][ T1041]  watchdog+0x9c8/0xec0
[ 1107.862858][ T1041]  kthread+0x357/0x430
[ 1107.866943][ T1041]  ? reset_hung_task_detector+0x30/0x30
[ 1107.872522][ T1041]  ? kthread_cancel_delayed_work_sync+0x20/0x20
[ 1107.878799][ T1041]  ret_from_fork+0x3a/0x50
[ 1107.884924][ T1041] Kernel Offset: disabled
[ 1107.889301][ T1041] Rebooting in 86400 seconds..
----------------------------------------

I don't know whether "it is not safe to print locks held by TASK_RUNNING threads"
remains true. But since a thread's state can change at any moment, there is no
guarantee that only locks held by !TASK_RUNNING threads will be printed by
debug_show_all_locks(), I guess that allow printing all locks at their own risk
using some kernel config is fine...

Also, we could replace trigger_all_cpu_backtrace() with a new function which scans
all threads and dumps threads with ->on_cpu == 1 so that the output comes from
the same printk() context.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-04-28 18:51   ` Al Viro
  2019-04-29  1:38     ` Tetsuo Handa
@ 2019-04-29  5:30     ` Dmitry Vyukov
  2019-04-30  2:55     ` Jan Kara
  2 siblings, 0 replies; 28+ messages in thread
From: Dmitry Vyukov @ 2019-04-29  5:30 UTC (permalink / raw)
  To: Al Viro
  Cc: syzbot, Jens Axboe, Jan Kara, linux-fsdevel, LKML, Tetsuo Handa,
	syzkaller-bugs, Peter Zijlstra

On Sun, Apr 28, 2019 at 8:51 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sun, Apr 28, 2019 at 11:14:06AM -0700, syzbot wrote:
> >  down_read+0x49/0x90 kernel/locking/rwsem.c:26
> >  __get_super.part.0+0x203/0x2e0 fs/super.c:788
> >  __get_super include/linux/spinlock.h:329 [inline]
> >  get_super+0x2e/0x50 fs/super.c:817
> >  fsync_bdev+0x19/0xd0 fs/block_dev.c:525
> >  invalidate_partition+0x36/0x60 block/genhd.c:1581
> >  drop_partitions block/partition-generic.c:443 [inline]
> >  rescan_partitions+0xef/0xa20 block/partition-generic.c:516
> >  __blkdev_reread_part+0x1a2/0x230 block/ioctl.c:173
> >  blkdev_reread_part+0x27/0x40 block/ioctl.c:193
> >  loop_reread_partitions+0x1c/0x40 drivers/block/loop.c:633
> >  loop_set_status+0xe57/0x1380 drivers/block/loop.c:1296
> >  loop_set_status64+0xc2/0x120 drivers/block/loop.c:1416
> >  lo_ioctl+0x8fc/0x2150 drivers/block/loop.c:1559
> >  __blkdev_driver_ioctl block/ioctl.c:303 [inline]
> >  blkdev_ioctl+0x6f2/0x1d10 block/ioctl.c:605
> >  block_ioctl+0xee/0x130 fs/block_dev.c:1933
> >  vfs_ioctl fs/ioctl.c:46 [inline]
> >  file_ioctl fs/ioctl.c:509 [inline]
> >  do_vfs_ioctl+0xd6e/0x1390 fs/ioctl.c:696
> >  ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
> >  __do_sys_ioctl fs/ioctl.c:720 [inline]
> >  __se_sys_ioctl fs/ioctl.c:718 [inline]
> >  __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
> >  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> ioctl(..., BLKRRPART) blocked on ->s_umount in __get_super().
> The trouble is, the only things holding ->s_umount appears to be
> these:
>
> > 2 locks held by syz-executor274/11716:
> >  #0: 00000000a19e2025 (&type->s_umount_key#38/1){+.+.}, at:
> > alloc_super+0x158/0x890 fs/super.c:228
> >  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_simple_ioctl
> > drivers/block/loop.c:1514 [inline]
> >  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_ioctl+0x266/0x2150
> > drivers/block/loop.c:1572
>
> > 2 locks held by syz-executor274/11717:
> >  #0: 00000000e185c083 (&type->s_umount_key#38/1){+.+.}, at:
> > alloc_super+0x158/0x890 fs/super.c:228
> >  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_simple_ioctl
> > drivers/block/loop.c:1514 [inline]
> >  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_ioctl+0x266/0x2150
> > drivers/block/loop.c:1572
>
> ... and that's bollocks.  ->s_umount held there is that on freshly allocated
> superblock.  It *MUST* be in mount(2); no other syscall should be able to
> call alloc_super() in the first place.  So what the hell is that doing
> trying to call lo_ioctl() inside mount(2)?  Something like isofs attempting
> cdrom ioctls on the underlying device?


How useful would it be to see full stacks in such lockdep reports?
Now that we have lib/stackdepot.c that is capable of memorizing large
number of stacks and converting them to a single u32, we could use it
in more debug facilities. I remember +Peter mentioned some problems
with interrupts/reenterancy of stackdepot, but I hope it's resolvable
(at least in some conservative way as we already call stackdepot from
interrupts).
I think ODEBUG facility have the same problem of showing only single
PC in reports for a past stack.
Should I file an issue for this?


> Why do we have loop_func_table->ioctl(), BTW?  All in-tree instances are
> either NULL or return -EINVAL unconditionally.  Considering that the
> caller is
>                 err = lo->ioctl ? lo->ioctl(lo, cmd, arg) : -EINVAL;
> we could bloody well just get rid of cryptoloop_ioctl() (the only
> non-NULL instance) and get rid of calling lo_simple_ioctl() in
> lo_ioctl() switch's default.
>
> Something like this:
>
> diff --git a/drivers/block/cryptoloop.c b/drivers/block/cryptoloop.c
> index 254ee7d54e91..f16468a562f5 100644
> --- a/drivers/block/cryptoloop.c
> +++ b/drivers/block/cryptoloop.c
> @@ -167,12 +167,6 @@ cryptoloop_transfer(struct loop_device *lo, int cmd,
>  }
>
>  static int
> -cryptoloop_ioctl(struct loop_device *lo, int cmd, unsigned long arg)
> -{
> -       return -EINVAL;
> -}
> -
> -static int
>  cryptoloop_release(struct loop_device *lo)
>  {
>         struct crypto_sync_skcipher *tfm = lo->key_data;
> @@ -188,7 +182,6 @@ cryptoloop_release(struct loop_device *lo)
>  static struct loop_func_table cryptoloop_funcs = {
>         .number = LO_CRYPT_CRYPTOAPI,
>         .init = cryptoloop_init,
> -       .ioctl = cryptoloop_ioctl,
>         .transfer = cryptoloop_transfer,
>         .release = cryptoloop_release,
>         .owner = THIS_MODULE
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index bf1c61cab8eb..2ec162b80562 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -955,7 +955,6 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
>         lo->lo_flags = lo_flags;
>         lo->lo_backing_file = file;
>         lo->transfer = NULL;
> -       lo->ioctl = NULL;
>         lo->lo_sizelimit = 0;
>         lo->old_gfp_mask = mapping_gfp_mask(mapping);
>         mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
> @@ -1064,7 +1063,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
>
>         loop_release_xfer(lo);
>         lo->transfer = NULL;
> -       lo->ioctl = NULL;
>         lo->lo_device = NULL;
>         lo->lo_encryption = NULL;
>         lo->lo_offset = 0;
> @@ -1262,7 +1260,6 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
>         if (!xfer)
>                 xfer = &none_funcs;
>         lo->transfer = xfer->transfer;
> -       lo->ioctl = xfer->ioctl;
>
>         if ((lo->lo_flags & LO_FLAGS_AUTOCLEAR) !=
>              (info->lo_flags & LO_FLAGS_AUTOCLEAR))
> @@ -1525,7 +1522,7 @@ static int lo_simple_ioctl(struct loop_device *lo, unsigned int cmd,
>                 err = loop_set_block_size(lo, arg);
>                 break;
>         default:
> -               err = lo->ioctl ? lo->ioctl(lo, cmd, arg) : -EINVAL;
> +               err = -EINVAL;
>         }
>         mutex_unlock(&loop_ctl_mutex);
>         return err;
> @@ -1567,10 +1564,9 @@ static int lo_ioctl(struct block_device *bdev, fmode_t mode,
>         case LOOP_SET_BLOCK_SIZE:
>                 if (!(mode & FMODE_WRITE) && !capable(CAP_SYS_ADMIN))
>                         return -EPERM;
> -               /* Fall through */
> +               return lo_simple_ioctl(lo, cmd, arg);
>         default:
> -               err = lo_simple_ioctl(lo, cmd, arg);
> -               break;
> +               return -EINVAL;
>         }
>
>         return err;
> diff --git a/drivers/block/loop.h b/drivers/block/loop.h
> index af75a5ee4094..56a9a0c161d7 100644
> --- a/drivers/block/loop.h
> +++ b/drivers/block/loop.h
> @@ -84,7 +84,6 @@ struct loop_func_table {
>         int (*init)(struct loop_device *, const struct loop_info64 *);
>         /* release is called from loop_unregister_transfer or clr_fd */
>         int (*release)(struct loop_device *);
> -       int (*ioctl)(struct loop_device *, int cmd, unsigned long arg);
>         struct module *owner;
>  };
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-04-28 18:51   ` Al Viro
  2019-04-29  1:38     ` Tetsuo Handa
  2019-04-29  5:30     ` Dmitry Vyukov
@ 2019-04-30  2:55     ` Jan Kara
  2019-04-30  3:11       ` Al Viro
  2 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2019-04-30  2:55 UTC (permalink / raw)
  To: Al Viro
  Cc: syzbot, axboe, dvyukov, jack, linux-fsdevel, linux-kernel,
	penguin-kernel, syzkaller-bugs

On Sun 28-04-19 19:51:09, Al Viro wrote:
> On Sun, Apr 28, 2019 at 11:14:06AM -0700, syzbot wrote:
> >  down_read+0x49/0x90 kernel/locking/rwsem.c:26
> >  __get_super.part.0+0x203/0x2e0 fs/super.c:788
> >  __get_super include/linux/spinlock.h:329 [inline]
> >  get_super+0x2e/0x50 fs/super.c:817
> >  fsync_bdev+0x19/0xd0 fs/block_dev.c:525
> >  invalidate_partition+0x36/0x60 block/genhd.c:1581
> >  drop_partitions block/partition-generic.c:443 [inline]
> >  rescan_partitions+0xef/0xa20 block/partition-generic.c:516
> >  __blkdev_reread_part+0x1a2/0x230 block/ioctl.c:173
> >  blkdev_reread_part+0x27/0x40 block/ioctl.c:193
> >  loop_reread_partitions+0x1c/0x40 drivers/block/loop.c:633
> >  loop_set_status+0xe57/0x1380 drivers/block/loop.c:1296
> >  loop_set_status64+0xc2/0x120 drivers/block/loop.c:1416
> >  lo_ioctl+0x8fc/0x2150 drivers/block/loop.c:1559
> >  __blkdev_driver_ioctl block/ioctl.c:303 [inline]
> >  blkdev_ioctl+0x6f2/0x1d10 block/ioctl.c:605
> >  block_ioctl+0xee/0x130 fs/block_dev.c:1933
> >  vfs_ioctl fs/ioctl.c:46 [inline]
> >  file_ioctl fs/ioctl.c:509 [inline]
> >  do_vfs_ioctl+0xd6e/0x1390 fs/ioctl.c:696
> >  ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
> >  __do_sys_ioctl fs/ioctl.c:720 [inline]
> >  __se_sys_ioctl fs/ioctl.c:718 [inline]
> >  __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
> >  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> ioctl(..., BLKRRPART) blocked on ->s_umount in __get_super().
> The trouble is, the only things holding ->s_umount appears to be
> these:
> 
> > 2 locks held by syz-executor274/11716:
> >  #0: 00000000a19e2025 (&type->s_umount_key#38/1){+.+.}, at:
> > alloc_super+0x158/0x890 fs/super.c:228
> >  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_simple_ioctl
> > drivers/block/loop.c:1514 [inline]
> >  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_ioctl+0x266/0x2150
> > drivers/block/loop.c:1572
> 
> > 2 locks held by syz-executor274/11717:
> >  #0: 00000000e185c083 (&type->s_umount_key#38/1){+.+.}, at:
> > alloc_super+0x158/0x890 fs/super.c:228
> >  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_simple_ioctl
> > drivers/block/loop.c:1514 [inline]
> >  #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_ioctl+0x266/0x2150
> > drivers/block/loop.c:1572
> 
> ... and that's bollocks.  ->s_umount held there is that on freshly allocated
> superblock.  It *MUST* be in mount(2); no other syscall should be able to
> call alloc_super() in the first place.  So what the hell is that doing
> trying to call lo_ioctl() inside mount(2)?  Something like isofs attempting
> cdrom ioctls on the underlying device?

Actually UDF also calls CDROMMULTISESSION ioctl during mount. So I could
see how we get to lo_simple_ioctl() and indeed that would acquire
loop_ctl_mutex under s_umount which is the other way around than in
BLKRRPART ioctl. 

> Why do we have loop_func_table->ioctl(), BTW?  All in-tree instances are
> either NULL or return -EINVAL unconditionally.  Considering that the
> caller is
>                 err = lo->ioctl ? lo->ioctl(lo, cmd, arg) : -EINVAL;
> we could bloody well just get rid of cryptoloop_ioctl() (the only
> non-NULL instance) and get rid of calling lo_simple_ioctl() in
> lo_ioctl() switch's default.

Yeah, you're right. And if we push the patch a bit further to not take
loop_ctl_mutex for invalid ioctl number, that would fix the problem. I
can send a fix.

								Honza

> 
> Something like this:
> 
> diff --git a/drivers/block/cryptoloop.c b/drivers/block/cryptoloop.c
> index 254ee7d54e91..f16468a562f5 100644
> --- a/drivers/block/cryptoloop.c
> +++ b/drivers/block/cryptoloop.c
> @@ -167,12 +167,6 @@ cryptoloop_transfer(struct loop_device *lo, int cmd,
>  }
>  
>  static int
> -cryptoloop_ioctl(struct loop_device *lo, int cmd, unsigned long arg)
> -{
> -	return -EINVAL;
> -}
> -
> -static int
>  cryptoloop_release(struct loop_device *lo)
>  {
>  	struct crypto_sync_skcipher *tfm = lo->key_data;
> @@ -188,7 +182,6 @@ cryptoloop_release(struct loop_device *lo)
>  static struct loop_func_table cryptoloop_funcs = {
>  	.number = LO_CRYPT_CRYPTOAPI,
>  	.init = cryptoloop_init,
> -	.ioctl = cryptoloop_ioctl,
>  	.transfer = cryptoloop_transfer,
>  	.release = cryptoloop_release,
>  	.owner = THIS_MODULE
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index bf1c61cab8eb..2ec162b80562 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -955,7 +955,6 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
>  	lo->lo_flags = lo_flags;
>  	lo->lo_backing_file = file;
>  	lo->transfer = NULL;
> -	lo->ioctl = NULL;
>  	lo->lo_sizelimit = 0;
>  	lo->old_gfp_mask = mapping_gfp_mask(mapping);
>  	mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
> @@ -1064,7 +1063,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
>  
>  	loop_release_xfer(lo);
>  	lo->transfer = NULL;
> -	lo->ioctl = NULL;
>  	lo->lo_device = NULL;
>  	lo->lo_encryption = NULL;
>  	lo->lo_offset = 0;
> @@ -1262,7 +1260,6 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
>  	if (!xfer)
>  		xfer = &none_funcs;
>  	lo->transfer = xfer->transfer;
> -	lo->ioctl = xfer->ioctl;
>  
>  	if ((lo->lo_flags & LO_FLAGS_AUTOCLEAR) !=
>  	     (info->lo_flags & LO_FLAGS_AUTOCLEAR))
> @@ -1525,7 +1522,7 @@ static int lo_simple_ioctl(struct loop_device *lo, unsigned int cmd,
>  		err = loop_set_block_size(lo, arg);
>  		break;
>  	default:
> -		err = lo->ioctl ? lo->ioctl(lo, cmd, arg) : -EINVAL;
> +		err = -EINVAL;
>  	}
>  	mutex_unlock(&loop_ctl_mutex);
>  	return err;
> @@ -1567,10 +1564,9 @@ static int lo_ioctl(struct block_device *bdev, fmode_t mode,
>  	case LOOP_SET_BLOCK_SIZE:
>  		if (!(mode & FMODE_WRITE) && !capable(CAP_SYS_ADMIN))
>  			return -EPERM;
> -		/* Fall through */
> +		return lo_simple_ioctl(lo, cmd, arg);
>  	default:
> -		err = lo_simple_ioctl(lo, cmd, arg);
> -		break;
> +		return -EINVAL;
>  	}
>  
>  	return err;
> diff --git a/drivers/block/loop.h b/drivers/block/loop.h
> index af75a5ee4094..56a9a0c161d7 100644
> --- a/drivers/block/loop.h
> +++ b/drivers/block/loop.h
> @@ -84,7 +84,6 @@ struct loop_func_table {
>  	int (*init)(struct loop_device *, const struct loop_info64 *); 
>  	/* release is called from loop_unregister_transfer or clr_fd */
>  	int (*release)(struct loop_device *); 
> -	int (*ioctl)(struct loop_device *, int cmd, unsigned long arg);
>  	struct module *owner;
>  }; 
>  
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-04-30  2:55     ` Jan Kara
@ 2019-04-30  3:11       ` Al Viro
  2019-04-30 13:07         ` Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: Al Viro @ 2019-04-30  3:11 UTC (permalink / raw)
  To: Jan Kara
  Cc: syzbot, axboe, dvyukov, linux-fsdevel, linux-kernel,
	penguin-kernel, syzkaller-bugs

On Tue, Apr 30, 2019 at 04:55:01AM +0200, Jan Kara wrote:

> Yeah, you're right. And if we push the patch a bit further to not take
> loop_ctl_mutex for invalid ioctl number, that would fix the problem. I
> can send a fix.

Huh?  We don't take it until in lo_simple_ioctl(), and that patch doesn't
get to its call on invalid ioctl numbers.  What am I missing here?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-04-30  3:11       ` Al Viro
@ 2019-04-30 13:07         ` Jan Kara
  2019-04-30 13:18           ` Al Viro
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2019-04-30 13:07 UTC (permalink / raw)
  To: Al Viro
  Cc: Jan Kara, syzbot, axboe, dvyukov, linux-fsdevel, linux-kernel,
	penguin-kernel, syzkaller-bugs

On Tue 30-04-19 04:11:44, Al Viro wrote:
> On Tue, Apr 30, 2019 at 04:55:01AM +0200, Jan Kara wrote:
> 
> > Yeah, you're right. And if we push the patch a bit further to not take
> > loop_ctl_mutex for invalid ioctl number, that would fix the problem. I
> > can send a fix.
> 
> Huh?  We don't take it until in lo_simple_ioctl(), and that patch doesn't
> get to its call on invalid ioctl numbers.  What am I missing here?

Doesn't it? blkdev_ioctl() calls into __blkdev_driver_ioctl() for
unrecognized ioctl numbers. That calls into lo_ioctl() in case of a loop
device. lo_ioctl() calls into lo_simple_ioctl() for ioctl numbers it
doesn't recognize and lo_simple_ioctl() will lock loop_ctl_mutex as you
say.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-04-30 13:07         ` Jan Kara
@ 2019-04-30 13:18           ` Al Viro
  2019-04-30 15:07             ` Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: Al Viro @ 2019-04-30 13:18 UTC (permalink / raw)
  To: Jan Kara
  Cc: syzbot, axboe, dvyukov, linux-fsdevel, linux-kernel,
	penguin-kernel, syzkaller-bugs

On Tue, Apr 30, 2019 at 03:07:39PM +0200, Jan Kara wrote:
> On Tue 30-04-19 04:11:44, Al Viro wrote:
> > On Tue, Apr 30, 2019 at 04:55:01AM +0200, Jan Kara wrote:
> > 
> > > Yeah, you're right. And if we push the patch a bit further to not take
> > > loop_ctl_mutex for invalid ioctl number, that would fix the problem. I
> > > can send a fix.
> > 
> > Huh?  We don't take it until in lo_simple_ioctl(), and that patch doesn't
> > get to its call on invalid ioctl numbers.  What am I missing here?
> 
> Doesn't it? blkdev_ioctl() calls into __blkdev_driver_ioctl() for
> unrecognized ioctl numbers. That calls into lo_ioctl() in case of a loop
> device. lo_ioctl() calls into lo_simple_ioctl() for ioctl numbers it
> doesn't recognize and lo_simple_ioctl() will lock loop_ctl_mutex as you
> say.

Not with the patch upthread.  lo_ioctl() part was

@@ -1567,10 +1564,9 @@ static int lo_ioctl(struct block_device *bdev, fmode_t mode,
 	case LOOP_SET_BLOCK_SIZE:
 		if (!(mode & FMODE_WRITE) && !capable(CAP_SYS_ADMIN))
 			return -EPERM;
-		/* Fall through */
+		return lo_simple_ioctl(lo, cmd, arg);
 	default:
-		err = lo_simple_ioctl(lo, cmd, arg);
-		break;
+		return -EINVAL;
 	}
 
 	return err;

so anything unrecognized doesn't make it to lo_simple_ioctl() at all.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-04-30 13:18           ` Al Viro
@ 2019-04-30 15:07             ` Jan Kara
  2019-04-30 15:34               ` Tetsuo Handa
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2019-04-30 15:07 UTC (permalink / raw)
  To: Al Viro
  Cc: Jan Kara, syzbot, axboe, dvyukov, linux-fsdevel, linux-kernel,
	penguin-kernel, syzkaller-bugs

On Tue 30-04-19 14:18:21, Al Viro wrote:
> On Tue, Apr 30, 2019 at 03:07:39PM +0200, Jan Kara wrote:
> > On Tue 30-04-19 04:11:44, Al Viro wrote:
> > > On Tue, Apr 30, 2019 at 04:55:01AM +0200, Jan Kara wrote:
> > > 
> > > > Yeah, you're right. And if we push the patch a bit further to not take
> > > > loop_ctl_mutex for invalid ioctl number, that would fix the problem. I
> > > > can send a fix.
> > > 
> > > Huh?  We don't take it until in lo_simple_ioctl(), and that patch doesn't
> > > get to its call on invalid ioctl numbers.  What am I missing here?
> > 
> > Doesn't it? blkdev_ioctl() calls into __blkdev_driver_ioctl() for
> > unrecognized ioctl numbers. That calls into lo_ioctl() in case of a loop
> > device. lo_ioctl() calls into lo_simple_ioctl() for ioctl numbers it
> > doesn't recognize and lo_simple_ioctl() will lock loop_ctl_mutex as you
> > say.
> 
> Not with the patch upthread.  lo_ioctl() part was
> 
> @@ -1567,10 +1564,9 @@ static int lo_ioctl(struct block_device *bdev, fmode_t mode,
>  	case LOOP_SET_BLOCK_SIZE:
>  		if (!(mode & FMODE_WRITE) && !capable(CAP_SYS_ADMIN))
>  			return -EPERM;
> -		/* Fall through */
> +		return lo_simple_ioctl(lo, cmd, arg);
>  	default:
> -		err = lo_simple_ioctl(lo, cmd, arg);
> -		break;
> +		return -EINVAL;
>  	}
>  
>  	return err;
> 
> so anything unrecognized doesn't make it to lo_simple_ioctl() at all.

Ah, right. I've missed that in your patch. So your patch should be really
fixing the problem. Will you post it officially? Thanks!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-04-30 15:07             ` Jan Kara
@ 2019-04-30 15:34               ` Tetsuo Handa
  2019-05-03 10:29                 ` Tetsuo Handa
  0 siblings, 1 reply; 28+ messages in thread
From: Tetsuo Handa @ 2019-04-30 15:34 UTC (permalink / raw)
  To: Jan Kara, Al Viro
  Cc: syzbot, axboe, dvyukov, linux-fsdevel, linux-kernel, syzkaller-bugs

On 2019/05/01 0:07, Jan Kara wrote:
> Ah, right. I've missed that in your patch. So your patch should be really
> fixing the problem. Will you post it officially? Thanks!

I still cannot understand what the problem is.
According to console output,

----------
INFO: task syz-executor274:8097 blocked for more than 143 seconds. 
INFO: task blkid:8099 blocked for more than 143 seconds. 

1 lock held by syz-executor274/8083:
2 locks held by syz-executor274/8097:
 #0: 000000007a5ed526 (&bdev->bd_mutex){+.+.}, at: blkdev_reread_part+0x1f/0x40 block/ioctl.c:192
 #1: 0000000067606e21 (&type->s_umount_key#39){.+.+}, at: __get_super.part.0+0x203/0x2e0 fs/super.c:788
1 lock held by blkid/8099:
 #0: 000000007a5ed526 (&bdev->bd_mutex){+.+.}, at: blkdev_put+0x34/0x560 fs/block_dev.c:1866 
----------

8099 was blocked for too long waiting for 000000007a5ed526 held by 8097.
8097 was blocked for too long waiting for 0000000067606e21 held by somebody.
Since there is nobody else holding 0000000067606e21,
I guessed that the "somebody" which is holding 0000000067606e21 is 8083.

----------
[ 1107.337625][    C1] CPU: 1 PID: 8083 Comm: syz-executor274 Not tainted 5.1.0-rc6+ #89
[ 1107.337631][    C1] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 1107.337636][    C1] RIP: 0010:debug_lockdep_rcu_enabled.part.0+0xb/0x60
[ 1107.337648][    C1] Code: 5b 5d c3 e8 67 71 e5 ff 0f 1f 80 00 00 00 00 55 48 89 e5 e8 37 ff ff ff 5d c3 0f 1f 44 00 00 48 b8 00 00 00 00 00 fc ff df 55 <48> 89 e5 53 65 48 8b 1c 25 00 ee 01 00 48 8d bb 7c 08 00 00 48 89
[ 1107.337652][    C1] RSP: 0018:ffff8880a85274c8 EFLAGS: 00000202
[ 1107.337661][    C1] RAX: dffffc0000000000 RBX: ffff8880a85275d8 RCX: 1ffffffff12bcd63
[ 1107.337666][    C1] RDX: 0000000000000000 RSI: ffffffff870d8f3c RDI: ffff8880a85275e0
[ 1107.337672][    C1] RBP: ffff8880a85274d8 R08: ffff888081e68540 R09: ffffed1015d25bc8
[ 1107.337677][    C1] R10: ffffed1015d25bc7 R11: ffff8880ae92de3b R12: 0000000000000000
[ 1107.337683][    C1] R13: ffff8880a694d640 R14: ffff88809541b942 R15: 0000000000000006
[ 1107.337689][    C1] FS:  0000000000e0b880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
[ 1107.337693][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1107.337699][    C1] CR2: ffffffffff600400 CR3: 0000000092d6f000 CR4: 00000000001406e0
[ 1107.337704][    C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1107.337710][    C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1107.337713][    C1] Call Trace:
[ 1107.337717][    C1]  ? debug_lockdep_rcu_enabled+0x71/0xa0
[ 1107.337721][    C1]  xas_descend+0xbf/0x370
[ 1107.337724][    C1]  xas_load+0xef/0x150
[ 1107.337728][    C1]  find_get_entry+0x13d/0x880
[ 1107.337733][    C1]  ? find_get_entries_tag+0xc10/0xc10
[ 1107.337736][    C1]  ? mark_held_locks+0xa4/0xf0
[ 1107.337741][    C1]  ? pagecache_get_page+0x1a8/0x740
[ 1107.337745][    C1]  pagecache_get_page+0x4c/0x740
[ 1107.337749][    C1]  __getblk_gfp+0x27e/0x970
[ 1107.337752][    C1]  __bread_gfp+0x2f/0x300
[ 1107.337756][    C1]  udf_tread+0xf1/0x140
[ 1107.337760][    C1]  udf_read_tagged+0x50/0x530
[ 1107.337764][    C1]  udf_check_anchor_block+0x1ef/0x680
[ 1107.337768][    C1]  ? blkpg_ioctl+0xa90/0xa90
[ 1107.337772][    C1]  ? udf_process_sequence+0x35d0/0x35d0
[ 1107.337776][    C1]  ? submit_bio+0xba/0x480
[ 1107.337780][    C1]  udf_scan_anchors+0x3f4/0x680
[ 1107.337784][    C1]  ? udf_check_anchor_block+0x680/0x680
[ 1107.337789][    C1]  ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 1107.337793][    C1]  ? udf_get_last_session+0x120/0x120
[ 1107.337797][    C1]  udf_load_vrs+0x67f/0xc80
[ 1107.337801][    C1]  ? udf_scan_anchors+0x680/0x680
[ 1107.337805][    C1]  ? udf_bread+0x260/0x260
[ 1107.337809][    C1]  ? lockdep_init_map+0x1be/0x6d0
[ 1107.337813][    C1]  udf_fill_super+0x7d8/0x16d1
[ 1107.337817][    C1]  ? udf_load_vrs+0xc80/0xc80
[ 1107.337820][    C1]  ? vsprintf+0x40/0x40
[ 1107.337824][    C1]  ? set_blocksize+0x2bf/0x340
[ 1107.337829][    C1]  ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 1107.337833][    C1]  mount_bdev+0x307/0x3c0
[ 1107.337837][    C1]  ? udf_load_vrs+0xc80/0xc80
[ 1107.337840][    C1]  udf_mount+0x35/0x40
[ 1107.337844][    C1]  ? udf_get_pblock_meta25+0x3a0/0x3a0
[ 1107.337848][    C1]  legacy_get_tree+0xf2/0x200
[ 1107.337853][    C1]  ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 1107.337857][    C1]  vfs_get_tree+0x123/0x450
[ 1107.337860][    C1]  do_mount+0x1436/0x2c40
[ 1107.337864][    C1]  ? copy_mount_string+0x40/0x40
[ 1107.337868][    C1]  ? _copy_from_user+0xdd/0x150
[ 1107.337873][    C1]  ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 1107.337877][    C1]  ? copy_mount_options+0x280/0x3a0
[ 1107.337881][    C1]  ksys_mount+0xdb/0x150
[ 1107.337885][    C1]  __x64_sys_mount+0xbe/0x150
[ 1107.337889][    C1]  do_syscall_64+0x103/0x610
[ 1107.337893][    C1]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
----------

8083 is doing mount(2) but is not holding 00000000bde6230e (loop_ctl_mutex).
I guessed that something went wrong with 8083 inside __getblk_gfp().
How can loop_ctl_mutex be relevant to this problem?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-04-30 15:34               ` Tetsuo Handa
@ 2019-05-03 10:29                 ` Tetsuo Handa
  0 siblings, 0 replies; 28+ messages in thread
From: Tetsuo Handa @ 2019-05-03 10:29 UTC (permalink / raw)
  To: Jan Kara, Al Viro
  Cc: syzbot, axboe, dvyukov, linux-fsdevel, linux-kernel, syzkaller-bugs

On 2019/05/01 0:34, Tetsuo Handa wrote:
> I still cannot understand what the problem is.
(...snipped...)
> I guessed that something went wrong with 8083 inside __getblk_gfp().
> How can loop_ctl_mutex be relevant to this problem?
> 

syzbot got similar NMI backtrace. No loop_ctl_mutex is involved.

INFO: task hung in mount_bdev (2)
https://syzkaller.appspot.com/bug?id=d9b9fa1428ff2466de64fc85256e769f516cea58

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-04-28 18:14 ` syzbot
  2019-04-28 18:51   ` Al Viro
@ 2019-05-14  5:15   ` Tetsuo Handa
  2019-05-15  1:02   ` Tetsuo Handa
  2 siblings, 0 replies; 28+ messages in thread
From: Tetsuo Handa @ 2019-05-14  5:15 UTC (permalink / raw)
  To: Jan Kara, Jens Axboe, Alexander Viro
  Cc: syzbot, dvyukov, linux-fsdevel, linux-kernel, syzkaller-bugs

Hello, Jan.

syzbot is still reporting livelocks inside __getblk_gfp() [1] (similar to
commit 04906b2f542c2362 ("blockdev: Fix livelocks on loop device")).

  [1] https://syzkaller.appspot.com/bug?id=835a0b9e75b14b55112661cbc61ca8b8f0edf767

A debug printk() patch shown below revealed that since bdev->bd_inode->i_blkbits
is unexpectedly modified, __find_get_block_slow() is failing to find buffer pages
created by grow_dev_page(). I guess that opening a loop device (while somebody is
doing mount operation on that loop device) can trigger set_init_blocksize() from
__blkdev_get().

----------------------------------------
diff --git a/fs/buffer.c b/fs/buffer.c
index 0faa41fb4c88..8e197c771476 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -190,7 +190,7 @@ EXPORT_SYMBOL(end_buffer_write_sync);
  * succeeds, there is no need to take private_lock.
  */
 static struct buffer_head *
-__find_get_block_slow(struct block_device *bdev, sector_t block)
+__find_get_block_slow(struct block_device *bdev, sector_t block, unsigned size)
 {
 	struct inode *bd_inode = bdev->bd_inode;
 	struct address_space *bd_mapping = bd_inode->i_mapping;
@@ -204,12 +204,20 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
 
 	index = block >> (PAGE_SHIFT - bd_inode->i_blkbits);
 	page = find_get_page_flags(bd_mapping, index, FGP_ACCESSED);
-	if (!page)
+	if (!page) {
+		if (fatal_signal_pending(current) && __ratelimit(&last_warned))
+			printk("%s:%u block=%llu bd_inode->i_blkbits=%u index=%lu size=%u\n",
+			       __FILE__, __LINE__, block, bd_inode->i_blkbits, index, size);
 		goto out;
+	}
 
 	spin_lock(&bd_mapping->private_lock);
-	if (!page_has_buffers(page))
+	if (!page_has_buffers(page)) {
+		if (fatal_signal_pending(current) && __ratelimit(&last_warned))
+			printk("%s:%u block=%llu bd_inode->i_blkbits=%u index=%lu size=%u page=%p\n",
+			       __FILE__, __LINE__, block, bd_inode->i_blkbits, index, size, page);
 		goto out_unlock;
+	}
 	head = page_buffers(page);
 	bh = head;
 	do {
@@ -934,6 +942,8 @@ grow_dev_page(struct block_device *bdev, sector_t block,
 	sector_t end_block;
 	int ret = 0;		/* Will call free_more_memory() */
 	gfp_t gfp_mask;
+	static DEFINE_RATELIMIT_STATE(last_warned1, HZ, 1);
+	static DEFINE_RATELIMIT_STATE(last_warned2, HZ, 1);
 
 	gfp_mask = mapping_gfp_constraint(inode->i_mapping, ~__GFP_FS) | gfp;
 
@@ -946,11 +956,15 @@ grow_dev_page(struct block_device *bdev, sector_t block,
 	gfp_mask |= __GFP_NOFAIL;
 
 	page = find_or_create_page(inode->i_mapping, index, gfp_mask);
+	if (fatal_signal_pending(current) && __ratelimit(&last_warned1))
+		printk("%s:%u block=%llu inode->i_blkbits=%u index=%lu size=%u\n", __FILE__, __LINE__, block, inode->i_blkbits, index, size);
 
 	BUG_ON(!PageLocked(page));
 
 	if (page_has_buffers(page)) {
 		bh = page_buffers(page);
+		if (fatal_signal_pending(current) && __ratelimit(&last_warned2))
+			printk("%s:%u block=%llu index=%lu size=%u bh->b_size=%lu\n", __FILE__, __LINE__, block, index, size, bh->b_size);
 		if (bh->b_size == size) {
 			end_block = init_page_buffers(page, bdev,
 						(sector_t)index << sizebits,
@@ -1292,7 +1306,7 @@ __find_get_block(struct block_device *bdev, sector_t block, unsigned size)
 
 	if (bh == NULL) {
 		/* __find_get_block_slow will mark the page accessed */
-		bh = __find_get_block_slow(bdev, block);
+		bh = __find_get_block_slow(bdev, block, size);
 		if (bh)
 			bh_lru_install(bh);
 	} else
----------------------------------------

----------------------------------------
[   89.805981][ T7388] __find_get_block_slow: 1278192 callbacks suppressed
[   89.805987][ T7388] fs/buffer.c:218 block=1 bd_inode->i_blkbits=12 index=1 size=512 page=00000000cd89a24d
[   90.716164][ T7388] grow_dev_page: 1295496 callbacks suppressed
[   90.716167][ T7388] fs/buffer.c:960 block=1 inode->i_blkbits=12 index=0 size=512
[   90.735989][ T7388] grow_dev_page: 1296140 callbacks suppressed
[   90.735991][ T7388] fs/buffer.c:967 block=1 index=0 size=512 bh->b_size=512
[   90.815859][ T7388] __find_get_block_slow: 1312165 callbacks suppressed
[   90.815862][ T7388] fs/buffer.c:218 block=1 bd_inode->i_blkbits=12 index=1 size=512 page=00000000cd89a24d
[   91.726006][ T7388] grow_dev_page: 1306138 callbacks suppressed
[   91.726022][ T7388] fs/buffer.c:960 block=1 inode->i_blkbits=12 index=0 size=512
[   91.745856][ T7388] grow_dev_page: 1305423 callbacks suppressed
[   91.745859][ T7388] fs/buffer.c:967 block=1 index=0 size=512 bh->b_size=512
[   91.825701][ T7388] __find_get_block_slow: 1305252 callbacks suppressed
[   91.825719][ T7388] fs/buffer.c:218 block=1 bd_inode->i_blkbits=12 index=1 size=512 page=00000000cd89a24d
[   92.247976][ T1094] INFO: task a.out:7388 can't die for more than 60 seconds.
[   92.251681][ T1094] a.out           R  running task        0  7388      1 0x80004004
[   92.255528][ T1094] Call Trace:
[   92.258142][ T1094]  __schedule+0x2eb/0x600
[   92.260950][ T1094]  ? xas_start+0xa7/0x110
[   92.263748][ T1094]  ? xas_load+0x3d/0xd0
[   92.266548][ T1094]  ? xas_start+0xa7/0x110
[   92.269598][ T1094]  ? xas_load+0x3d/0xd0
[   92.272499][ T1094]  ? __find_get_block+0xa6/0x610
[   92.275293][ T1094]  ? _raw_spin_trylock+0x12/0x50
[   92.278173][ T1094]  ? ___ratelimit+0x28/0x1a0
[   92.280865][ T1094]  ? __getblk_gfp+0x307/0x490
[   92.283687][ T1094]  ? __bread_gfp+0x2d/0x130
[   92.286857][ T1094]  ? udf_tread+0x48/0x70
[   92.289344][ T1094]  ? udf_read_tagged+0x40/0x1a0
[   92.292008][ T1094]  ? udf_check_anchor_block+0x94/0x190
[   92.294766][ T1094]  ? udf_scan_anchors+0x12b/0x250
[   92.297524][ T1094]  ? udf_load_vrs+0x2b7/0x500
[   92.300086][ T1094]  ? udf_fill_super+0x32e/0x890
[   92.302667][ T1094]  ? snprintf+0x66/0x90
[   92.305029][ T1094]  ? mount_bdev+0x1c7/0x210
[   92.307570][ T1094]  ? udf_load_vrs+0x500/0x500
[   92.309999][ T1094]  ? udf_mount+0x34/0x40
[   92.312295][ T1094]  ? legacy_get_tree+0x2d/0x80
[   92.314673][ T1094]  ? vfs_get_tree+0x30/0x140
[   92.317041][ T1094]  ? do_mount+0x830/0xc30
[   92.319463][ T1094]  ? copy_mount_options+0x152/0x1c0
[   92.321987][ T1094]  ? ksys_mount+0xab/0x120
[   92.324226][ T1094]  ? __x64_sys_mount+0x26/0x30
[   92.326956][ T1094]  ? do_syscall_64+0x7c/0x1a0
[   92.329278][ T1094]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   92.331880][ T1094] INFO: task a.out:7400 can't die for more than 60 seconds.
[   92.334877][ T1094] a.out           D    0  7400      1 0x00004004
[   92.337732][ T1094] Call Trace:
[   92.339595][ T1094]  __schedule+0x2e3/0x600
[   92.341668][ T1094]  schedule+0x32/0xc0
[   92.343578][ T1094]  rwsem_down_read_failed+0xf0/0x1a0
[   92.346142][ T1094]  down_read+0x2f/0x40
[   92.348047][ T1094]  ? down_read+0x2f/0x40
[   92.351045][ T1094]  __get_super.part.12+0x113/0x140
[   92.353673][ T1094]  get_super+0x2d/0x40
[   92.355705][ T1094]  fsync_bdev+0x19/0x80
[   92.357634][ T1094]  invalidate_partition+0x35/0x60
[   92.359829][ T1094]  rescan_partitions+0x53/0x4b0
[   92.361983][ T1094]  __blkdev_reread_part+0x6a/0xa0
[   92.364185][ T1094]  blkdev_reread_part+0x24/0x40
[   92.366804][ T1094]  loop_reread_partitions+0x1e/0x60
[   92.369377][ T1094]  loop_set_status+0x4fa/0x5a0
[   92.371571][ T1094]  loop_set_status64+0x55/0x90
[   92.373722][ T1094]  lo_ioctl+0x651/0xc60
[   92.375750][ T1094]  ? loop_queue_work+0xdb0/0xdb0
[   92.377951][ T1094]  blkdev_ioctl+0xb69/0xc10
[   92.380030][ T1094]  block_ioctl+0x56/0x70
[   92.381994][ T1094]  ? blkdev_fallocate+0x230/0x230
[   92.384240][ T1094]  do_vfs_ioctl+0xc1/0x7e0
[   92.386495][ T1094]  ? tomoyo_file_ioctl+0x23/0x30
[   92.388886][ T1094]  ksys_ioctl+0x94/0xb0
[   92.390889][ T1094]  __x64_sys_ioctl+0x1e/0x30
[   92.392963][ T1094]  do_syscall_64+0x7c/0x1a0
[   92.395057][ T1094]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   92.397695][ T1094] RIP: 0033:0x7fc3dd4195d7
[   92.399728][ T1094] Code: Bad RIP value.
[   92.401773][ T1094] RSP: 002b:00007ffd221788b8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[   92.405194][ T1094] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc3dd4195d7
[   92.409004][ T1094] RDX: 00007ffd22178970 RSI: 0000000000004c04 RDI: 0000000000000004
[   92.412289][ T1094] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[   92.415315][ T1094] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fc3dd90e480
[   92.418396][ T1094] R13: 00007ffd22178a60 R14: 00007ffd22178970 R15: 00007ffd22178b70
[   92.421509][ T1094] INFO: task a.out:7400 blocked for more than 60 seconds.
[   92.424267][ T1094]       Not tainted 5.1.0+ #193
[   92.426533][ T1094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[   92.429861][ T1094] a.out           D    0  7400      1 0x00004004
[   92.432589][ T1094] Call Trace:
[   92.434377][ T1094]  __schedule+0x2e3/0x600
[   92.436637][ T1094]  schedule+0x32/0xc0
[   92.438625][ T1094]  rwsem_down_read_failed+0xf0/0x1a0
[   92.441090][ T1094]  down_read+0x2f/0x40
[   92.443124][ T1094]  ? down_read+0x2f/0x40
[   92.445145][ T1094]  __get_super.part.12+0x113/0x140
[   92.447907][ T1094]  get_super+0x2d/0x40
[   92.449997][ T1094]  fsync_bdev+0x19/0x80
[   92.452049][ T1094]  invalidate_partition+0x35/0x60
[   92.454394][ T1094]  rescan_partitions+0x53/0x4b0
[   92.456792][ T1094]  __blkdev_reread_part+0x6a/0xa0
[   92.459063][ T1094]  blkdev_reread_part+0x24/0x40
[   92.461303][ T1094]  loop_reread_partitions+0x1e/0x60
[   92.463630][ T1094]  loop_set_status+0x4fa/0x5a0
[   92.465990][ T1094]  loop_set_status64+0x55/0x90
[   92.468377][ T1094]  lo_ioctl+0x651/0xc60
[   92.470605][ T1094]  ? loop_queue_work+0xdb0/0xdb0
[   92.472968][ T1094]  blkdev_ioctl+0xb69/0xc10
[   92.475107][ T1094]  block_ioctl+0x56/0x70
[   92.477221][ T1094]  ? blkdev_fallocate+0x230/0x230
[   92.479441][ T1094]  do_vfs_ioctl+0xc1/0x7e0
[   92.481720][ T1094]  ? tomoyo_file_ioctl+0x23/0x30
[   92.483973][ T1094]  ksys_ioctl+0x94/0xb0
[   92.486686][ T1094]  __x64_sys_ioctl+0x1e/0x30
[   92.489081][ T1094]  do_syscall_64+0x7c/0x1a0
[   92.491202][ T1094]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   92.493565][ T1094] RIP: 0033:0x7fc3dd4195d7
[   92.495690][ T1094] Code: Bad RIP value.
[   92.497609][ T1094] RSP: 002b:00007ffd221788b8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[   92.500723][ T1094] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc3dd4195d7
[   92.503822][ T1094] RDX: 00007ffd22178970 RSI: 0000000000004c04 RDI: 0000000000000004
[   92.506967][ T1094] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[   92.510022][ T1094] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fc3dd90e480
[   92.512985][ T1094] R13: 00007ffd22178a60 R14: 00007ffd22178970 R15: 00007ffd22178b70
[   92.516039][ T1094] INFO: task probe-bcache:7402 blocked for more than 60 seconds.
[   92.519171][ T1094]       Not tainted 5.1.0+ #193
[   92.521295][ T1094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[   92.524469][ T1094] probe-bcache    D    0  7402   7305 0x80000122
[   92.527614][ T1094] Call Trace:
[   92.529505][ T1094]  __schedule+0x2e3/0x600
[   92.531494][ T1094]  schedule+0x32/0xc0
[   92.533473][ T1094]  schedule_preempt_disabled+0x13/0x20
[   92.536036][ T1094]  __mutex_lock.isra.8+0x16b/0x500
[   92.538407][ T1094]  __mutex_lock_slowpath+0xe/0x10
[   92.540665][ T1094]  ? __mutex_lock_slowpath+0xe/0x10
[   92.542944][ T1094]  mutex_lock+0x1d/0x20
[   92.544945][ T1094]  blkdev_put+0x22/0x190
[   92.547434][ T1094]  ? blkdev_put+0x190/0x190
[   92.549519][ T1094]  blkdev_close+0x26/0x30
[   92.551667][ T1094]  __fput+0xe9/0x300
[   92.553711][ T1094]  ____fput+0x15/0x20
[   92.555690][ T1094]  task_work_run+0xa9/0xd0
[   92.557821][ T1094]  do_exit+0x37a/0xf40
[   92.559723][ T1094]  ? __secure_computing+0x72/0x150
[   92.562039][ T1094]  ? syscall_trace_enter+0x10a/0x3b0
[   92.564355][ T1094]  do_group_exit+0x57/0xe0
[   92.566780][ T1094]  __x64_sys_exit_group+0x1c/0x20
[   92.569307][ T1094]  do_syscall_64+0x7c/0x1a0
[   92.571461][ T1094]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   92.573874][ T1094] RIP: 0033:0x7f197f9e6e06
[   92.576039][ T1094] Code: Bad RIP value.
[   92.578264][ T1094] RSP: 002b:00007ffcfbcfce58 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[   92.581582][ T1094] RAX: ffffffffffffffda RBX: 00007f197fce9740 RCX: 00007f197f9e6e06
[   92.584878][ T1094] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[   92.588253][ T1094] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff30
[   92.591427][ T1094] R10: 0000000000000004 R11: 0000000000000246 R12: 00007f197fce9740
[   92.594646][ T1094] R13: 0000000000000001 R14: 00007f197fcf2628 R15: 0000000000000000
----------------------------------------

----------------------------------------
[  120.248453][ T7298] fs/buffer.c:960 block=7 inode->i_blkbits=12 index=3 size=2048
[  120.318363][ T7298] grow_dev_page: 1514869 callbacks suppressed
[  120.318365][ T7298] fs/buffer.c:967 block=7 index=3 size=2048 bh->b_size=2048
[  120.328408][ T7298] __find_get_block_slow: 1515516 callbacks suppressed
[  120.328410][ T7298] fs/buffer.c:210 block=7 bd_inode->i_blkbits=12 index=7 size=2048
[  121.258324][ T7298] grow_dev_page: 1514197 callbacks suppressed
[  121.258327][ T7298] fs/buffer.c:960 block=7 inode->i_blkbits=12 index=3 size=2048
[  121.328271][ T7298] grow_dev_page: 1508348 callbacks suppressed
[  121.328273][ T7298] fs/buffer.c:967 block=7 index=3 size=2048 bh->b_size=2048
[  121.338232][ T7298] __find_get_block_slow: 1507633 callbacks suppressed
[  121.338234][ T7298] fs/buffer.c:210 block=7 bd_inode->i_blkbits=12 index=7 size=2048
[  122.268187][ T7298] grow_dev_page: 1469440 callbacks suppressed
[  122.268190][ T7298] fs/buffer.c:960 block=7 inode->i_blkbits=12 index=3 size=2048
[  122.268307][ T1095] INFO: task a.out:7298 can't die for more than 30 seconds.
[  122.277909][ T1095] a.out           R  running task        0  7298      1 0x80004004
[  122.281569][ T1095] Call Trace:
[  122.283953][ T1095]  ? xas_start+0xa7/0x110
[  122.286658][ T1095]  ? xas_load+0x3d/0xd0
[  122.289248][ T1095]  ? xas_start+0xa7/0x110
[  122.291690][ T1095]  ? xas_load+0x3d/0xd0
[  122.294112][ T1095]  ? find_get_entry+0x172/0x220
[  122.296911][ T1095]  ? preempt_count_add+0x80/0xc0
[  122.299633][ T1095]  ? __getblk_gfp+0x439/0x490
[  122.302126][ T1095]  ? __bread_gfp+0x2d/0x130
[  122.305032][ T1095]  ? udf_tread+0x48/0x70
[  122.308252][ T1095]  ? udf_read_tagged+0x40/0x1a0
[  122.311864][ T1095]  ? udf_check_anchor_block+0x94/0x190
[  122.315502][ T1095]  ? udf_scan_anchors+0x12b/0x250
[  122.318995][ T1095]  ? udf_load_vrs+0x2b7/0x500
[  122.322278][ T1095]  ? udf_fill_super+0x32e/0x890
[  122.325533][ T1095]  ? snprintf+0x66/0x90
[  122.328862][ T1095]  ? mount_bdev+0x1c7/0x210
[  122.332195][ T1095]  ? udf_load_vrs+0x500/0x500
[  122.334499][ T1095]  ? udf_mount+0x34/0x40
[  122.336594][ T1095]  ? legacy_get_tree+0x2d/0x80
[  122.338071][ T7298] grow_dev_page: 1454158 callbacks suppressed
[  122.338074][ T7298] fs/buffer.c:967 block=7 index=3 size=2048 bh->b_size=2048
[  122.338945][ T1095]  ? vfs_get_tree+0x30/0x140
[  122.346797][ T1095]  ? do_mount+0x830/0xc30
[  122.348091][ T7298] __find_get_block_slow: 1459190 callbacks suppressed
[  122.348093][ T7298] fs/buffer.c:210 block=7 bd_inode->i_blkbits=12 index=7 size=2048
[  122.349493][ T1095]  ? copy_mount_options+0x152/0x1c0
[  122.357382][ T1095]  ? ksys_mount+0xab/0x120
[  122.359788][ T1095]  ? __x64_sys_mount+0x26/0x30
[  122.362247][ T1095]  ? do_syscall_64+0x7c/0x1a0
[  122.364517][ T1095]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  122.367426][ T1095] INFO: task systemd-udevd:7302 blocked for more than 30 seconds.
[  122.371445][ T1095]       Not tainted 5.1.0+ #193
[  122.373967][ T1095] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  122.377365][ T1095] systemd-udevd   D    0  7302   5781 0x00000120
[  122.380380][ T1095] Call Trace:
[  122.382268][ T1095]  __schedule+0x2e3/0x600
[  122.384422][ T1095]  schedule+0x32/0xc0
[  122.386491][ T1095]  schedule_preempt_disabled+0x13/0x20
[  122.389047][ T1095]  __mutex_lock.isra.8+0x16b/0x500
[  122.391466][ T1095]  ? get_disk_and_module+0x80/0x80
[  122.393879][ T1095]  __mutex_lock_slowpath+0xe/0x10
[  122.396457][ T1095]  ? __mutex_lock_slowpath+0xe/0x10
[  122.398992][ T1095]  mutex_lock+0x1d/0x20
[  122.401200][ T1095]  __blkdev_get+0x9a/0x720
[  122.403366][ T1095]  blkdev_get+0x18c/0x440
[  122.405591][ T1095]  ? preempt_count_add+0x80/0xc0
[  122.407966][ T1095]  blkdev_open+0xbd/0xf0
[  122.410221][ T1095]  do_dentry_open+0x2b2/0x500
[  122.412756][ T1095]  ? bd_acquire+0x120/0x120
[  122.415044][ T1095]  vfs_open+0x38/0x40
[  122.417109][ T1095]  path_openat+0x3c9/0x1a90
[  122.419546][ T1095]  ? ptep_set_access_flags+0x52/0x70
[  122.422031][ T1095]  do_filp_open+0xb9/0x150
[  122.424222][ T1095]  ? strncpy_from_user+0x19f/0x230
[  122.426670][ T1095]  ? _raw_spin_unlock+0x1f/0x40
[  122.429675][ T1095]  do_sys_open+0x2a2/0x380
[  122.431965][ T1095]  ? do_sys_open+0x2a2/0x380
[  122.434174][ T1095]  __x64_sys_openat+0x24/0x30
[  122.436449][ T1095]  do_syscall_64+0x7c/0x1a0
[  122.438747][ T1095]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  122.441343][ T1095] RIP: 0033:0x7f27aff41c8e
[  122.443511][ T1095] Code: Bad RIP value.
[  122.445632][ T1095] RSP: 002b:00007ffcabf214b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[  122.449006][ T1095] RAX: ffffffffffffffda RBX: 00005639f5178280 RCX: 00007f27aff41c8e
[  122.452025][ T1095] RDX: 0000000000080000 RSI: 00005639f5179310 RDI: 00000000ffffff9c
[  122.455064][ T1095] RBP: 0000000000000001 R08: 00005639f3dba346 R09: 00007f27b021dc40
[  122.458167][ T1095] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[  122.461701][ T1095] R13: 0000000000000000 R14: 00005639f517ab50 R15: 00000000ffffffff
[  122.465636][ T1095] INFO: task a.out:7319 can't die for more than 30 seconds.
[  122.469411][ T1095] a.out           D    0  7319      1 0x00004004
[  122.473056][ T1095] Call Trace:
[  122.475416][ T1095]  __schedule+0x2e3/0x600
[  122.478216][ T1095]  ? __switch_to_asm+0x34/0x70
[  122.480823][ T1095]  schedule+0x32/0xc0
[  122.482688][ T1095]  rwsem_down_read_failed+0xf0/0x1a0
[  122.484951][ T1095]  down_read+0x2f/0x40
[  122.486819][ T1095]  ? down_read+0x2f/0x40
[  122.488965][ T1095]  __get_super.part.12+0x113/0x140
[  122.491230][ T1095]  get_super+0x2d/0x40
[  122.493112][ T1095]  fsync_bdev+0x19/0x80
[  122.495157][ T1095]  invalidate_partition+0x35/0x60
[  122.497444][ T1095]  rescan_partitions+0x53/0x4b0
[  122.499634][ T1095]  __blkdev_reread_part+0x6a/0xa0
[  122.501926][ T1095]  blkdev_reread_part+0x24/0x40
[  122.504018][ T1095]  loop_reread_partitions+0x1e/0x60
[  122.506257][ T1095]  loop_set_status+0x4fa/0x5a0
[  122.508836][ T1095]  loop_set_status64+0x55/0x90
[  122.510962][ T1095]  lo_ioctl+0x651/0xc60
[  122.513809][ T1095]  ? loop_queue_work+0xdb0/0xdb0
[  122.516747][ T1095]  blkdev_ioctl+0xb69/0xc10
[  122.519569][ T1095]  block_ioctl+0x56/0x70
[  122.522421][ T1095]  ? blkdev_fallocate+0x230/0x230
[  122.524723][ T1095]  do_vfs_ioctl+0xc1/0x7e0
[  122.526859][ T1095]  ? tomoyo_file_ioctl+0x23/0x30
[  122.529657][ T1095]  ksys_ioctl+0x94/0xb0
[  122.531694][ T1095]  __x64_sys_ioctl+0x1e/0x30
[  122.533872][ T1095]  do_syscall_64+0x7c/0x1a0
[  122.536038][ T1095]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  122.538589][ T1095] RIP: 0033:0x7fe9a73645d7
[  122.540708][ T1095] Code: Bad RIP value.
[  122.542701][ T1095] RSP: 002b:00007ffe29d27c78 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[  122.546144][ T1095] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fe9a73645d7
[  122.549752][ T1095] RDX: 00007ffe29d27d30 RSI: 0000000000004c04 RDI: 0000000000000004
[  122.552970][ T1095] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[  122.556034][ T1095] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fe9a7859480
[  122.559177][ T1095] R13: 00007ffe29d27e20 R14: 00007ffe29d27d30 R15: 00007ffe29d27f30
[  122.562486][ T1095] INFO: task a.out:7319 blocked for more than 30 seconds.
[  122.565427][ T1095]       Not tainted 5.1.0+ #193
[  122.567733][ T1095] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  122.571274][ T1095] a.out           D    0  7319      1 0x00004004
[  122.573997][ T1095] Call Trace:
[  122.575858][ T1095]  __schedule+0x2e3/0x600
[  122.578070][ T1095]  ? __switch_to_asm+0x34/0x70
[  122.580532][ T1095]  schedule+0x32/0xc0
[  122.582647][ T1095]  rwsem_down_read_failed+0xf0/0x1a0
[  122.585117][ T1095]  down_read+0x2f/0x40
[  122.587168][ T1095]  ? down_read+0x2f/0x40
[  122.589792][ T1095]  __get_super.part.12+0x113/0x140
[  122.592223][ T1095]  get_super+0x2d/0x40
[  122.594273][ T1095]  fsync_bdev+0x19/0x80
[  122.596558][ T1095]  invalidate_partition+0x35/0x60
[  122.598959][ T1095]  rescan_partitions+0x53/0x4b0
[  122.601206][ T1095]  __blkdev_reread_part+0x6a/0xa0
[  122.603587][ T1095]  blkdev_reread_part+0x24/0x40
[  122.605875][ T1095]  loop_reread_partitions+0x1e/0x60
[  122.608326][ T1095]  loop_set_status+0x4fa/0x5a0
[  122.610707][ T1095]  loop_set_status64+0x55/0x90
[  122.613168][ T1095]  lo_ioctl+0x651/0xc60
[  122.615247][ T1095]  ? loop_queue_work+0xdb0/0xdb0
[  122.617606][ T1095]  blkdev_ioctl+0xb69/0xc10
[  122.619987][ T1095]  block_ioctl+0x56/0x70
[  122.622083][ T1095]  ? blkdev_fallocate+0x230/0x230
[  122.624626][ T1095]  do_vfs_ioctl+0xc1/0x7e0
[  122.627152][ T1095]  ? tomoyo_file_ioctl+0x23/0x30
[  122.631102][ T1095]  ksys_ioctl+0x94/0xb0
[  122.634123][ T1095]  __x64_sys_ioctl+0x1e/0x30
[  122.636641][ T1095]  do_syscall_64+0x7c/0x1a0
[  122.638877][ T1095]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  122.641414][ T1095] RIP: 0033:0x7fe9a73645d7
[  122.643578][ T1095] Code: Bad RIP value.
[  122.645736][ T1095] RSP: 002b:00007ffe29d27c78 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[  122.649091][ T1095] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fe9a73645d7
[  122.652392][ T1095] RDX: 00007ffe29d27d30 RSI: 0000000000004c04 RDI: 0000000000000004
[  122.655581][ T1095] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[  122.658769][ T1095] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fe9a7859480
[  122.662073][ T1095] R13: 00007ffe29d27e20 R14: 00007ffe29d27d30 R15: 00007ffe29d27f30
----------------------------------------

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-04-28 18:14 ` syzbot
  2019-04-28 18:51   ` Al Viro
  2019-05-14  5:15   ` Tetsuo Handa
@ 2019-05-15  1:02   ` Tetsuo Handa
  2019-05-15 10:21     ` Jan Kara
  2 siblings, 1 reply; 28+ messages in thread
From: Tetsuo Handa @ 2019-05-15  1:02 UTC (permalink / raw)
  To: Jan Kara, Jens Axboe, Alexander Viro
  Cc: syzbot, dvyukov, linux-fsdevel, linux-kernel, syzkaller-bugs

Since lo_ioctl() is calling sb_set_blocksize() immediately after udf_load_vrs()
called sb_set_blocksize(), udf_tread() can't use expected i_blkbits settings...

[   48.685672][ T7322] fs/block_dev.c:135 bdev=0000000014fa0ec2 12 -> 9
[   48.694675][ T7322] CPU: 4 PID: 7322 Comm: a.out Not tainted 5.1.0+ #196
[   48.701321][ T7322] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[   48.710265][ T7322] Call Trace:
[   48.710272][ T7322]  dump_stack+0xaa/0xd8
[   48.715633][ T7322]  set_blocksize+0xff/0x140
[   48.822094][ T7322]  sb_set_blocksize+0x27/0x70
[   48.824843][ T7322]  udf_load_vrs+0x4b/0x500
[   48.827322][ T7322]  udf_fill_super+0x32e/0x890
[   48.830125][ T7322]  ? snprintf+0x66/0x90
[   48.832572][ T7322]  mount_bdev+0x1c7/0x210
[   48.835293][ T7322]  ? udf_load_vrs+0x500/0x500
[   48.838009][ T7322]  udf_mount+0x34/0x40
[   48.840504][ T7322]  legacy_get_tree+0x2d/0x80
[   48.843192][ T7322]  vfs_get_tree+0x30/0x140
[   48.845787][ T7322]  do_mount+0x830/0xc30
[   48.848325][ T7322]  ? copy_mount_options+0x152/0x1c0
[   48.851066][ T7322]  ksys_mount+0xab/0x120
[   48.853627][ T7322]  __x64_sys_mount+0x26/0x30
[   48.856168][ T7322]  do_syscall_64+0x7c/0x1a0
[   48.858943][ T7322]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[   48.978376][ T7332] fs/block_dev.c:135 bdev=0000000014fa0ec2 9 -> 12
[   49.079394][ T7332] CPU: 6 PID: 7332 Comm: a.out Not tainted 5.1.0+ #196
[   49.082769][ T7332] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[   49.089007][ T7332] Call Trace:
[   49.091410][ T7332]  dump_stack+0xaa/0xd8
[   49.094053][ T7332]  set_blocksize+0xff/0x140
[   49.096734][ T7332]  lo_ioctl+0x570/0xc60
[   49.099366][ T7332]  ? loop_queue_work+0xdb0/0xdb0
[   49.102079][ T7332]  blkdev_ioctl+0xb69/0xc10
[   49.104667][ T7332]  block_ioctl+0x56/0x70
[   49.107267][ T7332]  ? blkdev_fallocate+0x230/0x230
[   49.110035][ T7332]  do_vfs_ioctl+0xc1/0x7e0
[   49.112728][ T7332]  ? tomoyo_file_ioctl+0x23/0x30
[   49.115452][ T7332]  ksys_ioctl+0x94/0xb0
[   49.118008][ T7332]  __x64_sys_ioctl+0x1e/0x30
[   49.120686][ T7332]  do_syscall_64+0x7c/0x1a0
[   49.123470][ T7332]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[   49.163269][ T7322] UDF-fs: error (device loop2): udf_read_tagged: read failed, block=256, location=256
[   51.009494][ T7322] grow_dev_page: 5 callbacks suppressed
[   51.009509][ T7322] fs/buffer.c:960 bdev=0000000014fa0ec2 block=32 inode->i_blkbits=12 index=4 size=512
[   51.016583][ T7322] fs/buffer.c:967 bdev=0000000014fa0ec2 block=32 index=4 size=512 bh->b_size=512

[   55.046361][ T7322] grow_dev_page: 1177059 callbacks suppressed
[   55.046366][ T7322] fs/buffer.c:960 bdev=0000000014fa0ec2 block=32 inode->i_blkbits=12 index=4 size=512
[   55.054903][ T7322] grow_dev_page: 1177037 callbacks suppressed
[   55.054906][ T7322] fs/buffer.c:967 bdev=0000000014fa0ec2 block=32 index=4 size=512 bh->b_size=512
[   56.055187][ T7322] grow_dev_page: 1279065 callbacks suppressed
[   56.055191][ T7322] fs/buffer.c:960 bdev=0000000014fa0ec2 block=32 inode->i_blkbits=12 index=4 size=512
[   56.064443][ T7322] grow_dev_page: 1279065 callbacks suppressed
[   56.064446][ T7322] fs/buffer.c:967 bdev=0000000014fa0ec2 block=32 index=4 size=512 bh->b_size=512

[   92.282798][ T1097] INFO: task a.out:7322 can't die for more than 30 seconds.
[   92.285926][ T1097] a.out           R  running task        0  7322      1 0x80004004
[   92.289143][ T1097] Call Trace:
[   92.291175][ T1097]  __schedule+0x2eb/0x600
[   92.293354][ T1097]  ? apic_timer_interrupt+0xa/0x20
[   92.295802][ T1097]  ? xas_load+0x13/0xd0
[   92.298079][ T1097]  ? __switch_to_asm+0x40/0x70
[   92.300466][ T1097]  ? __switch_to_asm+0x34/0x70
[   92.302890][ T1097]  ? __switch_to_asm+0x40/0x70
[   92.305299][ T1097]  ? __switch_to_asm+0x34/0x70
[   92.307621][ T1097]  ? __switch_to_asm+0x40/0x70
[   92.309908][ T1097]  ? __switch_to_asm+0x34/0x70
[   92.312161][ T1097]  ? debug_smp_processor_id+0x2b/0x130
[   92.314532][ T1097]  ? xas_start+0xa7/0x110
[   92.317139][ T1097]  ? xas_load+0x3d/0xd0
[   92.319674][ T1097]  ? xas_start+0xa7/0x110
[   92.321746][ T1097]  ? xas_load+0x3d/0xd0
[   92.323831][ T1097]  ? find_get_entry+0x172/0x220
[   92.326115][ T1097]  ? __this_cpu_preempt_check+0x37/0x120
[   92.328714][ T1097]  ? __find_get_block+0x485/0x610
[   92.331096][ T1097]  ? _raw_spin_trylock+0x12/0x50
[   92.333402][ T1097]  ? __getblk_gfp+0xd9/0x4a0
[   92.335617][ T1097]  ? __bread_gfp+0x2d/0x130
[   92.338410][ T1097]  ? udf_tread+0x48/0x70
[   92.341144][ T1097]  ? udf_read_tagged+0x40/0x1a0
[   92.343537][ T1097]  ? udf_check_anchor_block+0x94/0x190
[   92.346054][ T1097]  ? udf_scan_anchors+0x12b/0x250
[   92.348480][ T1097]  ? udf_load_vrs+0x2b7/0x500
[   92.350732][ T1097]  ? udf_fill_super+0x32e/0x890
[   92.353172][ T1097]  ? snprintf+0x66/0x90
[   92.355256][ T1097]  ? mount_bdev+0x1c7/0x210
[   92.358214][ T1097]  ? udf_load_vrs+0x500/0x500
[   92.361251][ T1097]  ? udf_mount+0x34/0x40
[   92.363661][ T1097]  ? legacy_get_tree+0x2d/0x80
[   92.366298][ T1097]  ? vfs_get_tree+0x30/0x140
[   92.368955][ T1097]  ? do_mount+0x830/0xc30
[   92.371468][ T1097]  ? copy_mount_options+0x152/0x1c0
[   92.374098][ T1097]  ? ksys_mount+0xab/0x120
[   92.376605][ T1097]  ? __x64_sys_mount+0x26/0x30
[   92.379553][ T1097]  ? do_syscall_64+0x7c/0x1a0
[   92.381615][ T1097]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   92.383942][ T1097] INFO: task a.out:7332 can't die for more than 30 seconds.
[   92.387037][ T1097] a.out           D    0  7332      1 0x00004004
[   92.389621][ T1097] Call Trace:
[   92.391297][ T1097]  __schedule+0x2e3/0x600
[   92.393313][ T1097]  ? __switch_to_asm+0x40/0x70
[   92.395486][ T1097]  schedule+0x32/0xc0
[   92.404871][ T1097]  rwsem_down_read_failed+0xf0/0x1a0
[   92.407605][ T1097]  down_read+0x2f/0x40
[   92.417196][ T1097]  ? down_read+0x2f/0x40
[   92.419579][ T1097]  __get_super.part.12+0x113/0x140
[   92.422165][ T1097]  get_super+0x2d/0x40
[   92.424559][ T1097]  fsync_bdev+0x19/0x80
[   92.427012][ T1097]  invalidate_partition+0x35/0x60
[   92.429588][ T1097]  rescan_partitions+0x53/0x4b0
[   92.432137][ T1097]  __blkdev_reread_part+0x6a/0xa0
[   92.434856][ T1097]  blkdev_reread_part+0x24/0x40
[   92.438203][ T1097]  loop_reread_partitions+0x1e/0x60
[   92.440909][ T1097]  loop_set_status+0x4fa/0x5a0
[   92.443427][ T1097]  loop_set_status64+0x55/0x90
[   92.445962][ T1097]  lo_ioctl+0x651/0xc60
[   92.448427][ T1097]  ? loop_queue_work+0xdb0/0xdb0
[   92.451260][ T1097]  blkdev_ioctl+0xb69/0xc10
[   92.453771][ T1097]  block_ioctl+0x56/0x70
[   92.456284][ T1097]  ? blkdev_fallocate+0x230/0x230
[   92.458957][ T1097]  do_vfs_ioctl+0xc1/0x7e0
[   92.461639][ T1097]  ? tomoyo_file_ioctl+0x23/0x30
[   92.464071][ T1097]  ksys_ioctl+0x94/0xb0
[   92.466394][ T1097]  __x64_sys_ioctl+0x1e/0x30
[   92.468879][ T1097]  do_syscall_64+0x7c/0x1a0
[   92.471339][ T1097]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-05-15  1:02   ` Tetsuo Handa
@ 2019-05-15 10:21     ` Jan Kara
  2019-05-15 11:32       ` Tetsuo Handa
  2019-05-15 11:59       ` syzbot
  0 siblings, 2 replies; 28+ messages in thread
From: Jan Kara @ 2019-05-15 10:21 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jan Kara, Jens Axboe, Alexander Viro, syzbot, dvyukov,
	linux-fsdevel, linux-kernel, syzkaller-bugs, linux-block

[-- Attachment #1: Type: text/plain, Size: 3813 bytes --]

On Wed 15-05-19 10:02:37, Tetsuo Handa wrote:
> Since lo_ioctl() is calling sb_set_blocksize() immediately after udf_load_vrs()
> called sb_set_blocksize(), udf_tread() can't use expected i_blkbits settings...

Thanks for debugging this but this doesn't quiet make sense to me. See
below:

> [   48.685672][ T7322] fs/block_dev.c:135 bdev=0000000014fa0ec2 12 -> 9
> [   48.694675][ T7322] CPU: 4 PID: 7322 Comm: a.out Not tainted 5.1.0+ #196
> [   48.701321][ T7322] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
> [   48.710265][ T7322] Call Trace:
> [   48.710272][ T7322]  dump_stack+0xaa/0xd8
> [   48.715633][ T7322]  set_blocksize+0xff/0x140
> [   48.822094][ T7322]  sb_set_blocksize+0x27/0x70
> [   48.824843][ T7322]  udf_load_vrs+0x4b/0x500
> [   48.827322][ T7322]  udf_fill_super+0x32e/0x890
> [   48.830125][ T7322]  ? snprintf+0x66/0x90
> [   48.832572][ T7322]  mount_bdev+0x1c7/0x210
> [   48.835293][ T7322]  ? udf_load_vrs+0x500/0x500
> [   48.838009][ T7322]  udf_mount+0x34/0x40
> [   48.840504][ T7322]  legacy_get_tree+0x2d/0x80
> [   48.843192][ T7322]  vfs_get_tree+0x30/0x140
> [   48.845787][ T7322]  do_mount+0x830/0xc30
> [   48.848325][ T7322]  ? copy_mount_options+0x152/0x1c0
> [   48.851066][ T7322]  ksys_mount+0xab/0x120
> [   48.853627][ T7322]  __x64_sys_mount+0x26/0x30
> [   48.856168][ T7322]  do_syscall_64+0x7c/0x1a0
> [   48.858943][ T7322]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

So this is normal - UDF sets block size it wants on the device during
mount. Now we have the block device exclusively open so nobody should be
changing it.

> [   48.978376][ T7332] fs/block_dev.c:135 bdev=0000000014fa0ec2 9 -> 12
> [   49.079394][ T7332] CPU: 6 PID: 7332 Comm: a.out Not tainted 5.1.0+ #196
> [   49.082769][ T7332] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
> [   49.089007][ T7332] Call Trace:
> [   49.091410][ T7332]  dump_stack+0xaa/0xd8
> [   49.094053][ T7332]  set_blocksize+0xff/0x140
> [   49.096734][ T7332]  lo_ioctl+0x570/0xc60
> [   49.099366][ T7332]  ? loop_queue_work+0xdb0/0xdb0
> [   49.102079][ T7332]  blkdev_ioctl+0xb69/0xc10
> [   49.104667][ T7332]  block_ioctl+0x56/0x70
> [   49.107267][ T7332]  ? blkdev_fallocate+0x230/0x230
> [   49.110035][ T7332]  do_vfs_ioctl+0xc1/0x7e0
> [   49.112728][ T7332]  ? tomoyo_file_ioctl+0x23/0x30
> [   49.115452][ T7332]  ksys_ioctl+0x94/0xb0
> [   49.118008][ T7332]  __x64_sys_ioctl+0x1e/0x30
> [   49.120686][ T7332]  do_syscall_64+0x7c/0x1a0
> [   49.123470][ T7332]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

And this is strange. set_blocksize() is only called from loop_set_fd() but
that means that the loop device must already be in lo->lo_state ==
Lo_unbound. But loop device that is being used should never be in
Lo_unbound state... Except if... Oh, I see what the problem is:

UDF opens unbound loop device (through mount_bdev() ->
blkdev_get_by_path()). That succeeds as loop allows opens on unbound
devices so that ioctl can be run to set it up. UDF sets block size for the
block device. Someone else comes and calls LOOP_SET_FD for the device and
plop, block device block size changes under UDF's hands.

The question is how to fix this problem. The simplest fix I can see is that
we'd just refuse to do LOOP_SET_FD if someone has the block device
exclusively open as there are high chances such user will be unpleasantly
surprised by the device changing under him. OTOH this has some potential
for userspace visible regressions. But I guess it's worth a try. Something
like attached patch?

Let syzbot test the patch as well:

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.1

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-loop-Don-t-change-loop-device-under-exclusive-opener.patch --]
[-- Type: text/x-patch, Size: 1703 bytes --]

From 0145358ae24581b7af36261caee0c6dbe22cce0c Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 15 May 2019 11:45:10 +0200
Subject: [PATCH] loop: Don't change loop device under exclusive opener

Loop module allows calling LOOP_SET_FD while there are other openers of
the loop device. Even exclusive ones. This can lead to weird
consequences such as kernel deadlocks like:

mount_bdev()				lo_ioctl()
  udf_fill_super()
    udf_load_vrs()
      sb_set_blocksize() - sets desired block size B
      udf_tread()
        sb_bread()
          __bread_gfp(bdev, block, B)
					  loop_set_fd()
					    set_blocksize()
            - now __getblk_slow() indefinitely loops because B != bdev
              block size

Fix the problem by disallowing LOOP_SET_FD ioctl when there are
exclusive openers of a loop device.

[Deliberately chosen not to CC stable as a user with priviledges to
trigger this race has other means of taking the system down and this
has a potential of breaking some weird userspace setup]

Reported-by: syzbot+10007d66ca02b08f0e60@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/block/loop.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 102d79575895..9dcf8bb60c4e 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -952,6 +952,9 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	error = -EBUSY;
 	if (lo->lo_state != Lo_unbound)
 		goto out_unlock;
+	/* Avoid changing loop device under an exclusive opener... */
+	if (!(mode & FMODE_EXCL) && bdev->bd_holders > 0)
+		goto out_unlock;
 
 	error = loop_validate_file(file, bdev);
 	if (error)
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-05-15 10:21     ` Jan Kara
@ 2019-05-15 11:32       ` Tetsuo Handa
  2019-05-15 13:07         ` Jan Kara
  2019-05-15 11:59       ` syzbot
  1 sibling, 1 reply; 28+ messages in thread
From: Tetsuo Handa @ 2019-05-15 11:32 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, Alexander Viro, syzbot, dvyukov, linux-fsdevel,
	linux-kernel, syzkaller-bugs, linux-block

On 2019/05/15 19:21, Jan Kara wrote:
> The question is how to fix this problem. The simplest fix I can see is that
> we'd just refuse to do LOOP_SET_FD if someone has the block device
> exclusively open as there are high chances such user will be unpleasantly
> surprised by the device changing under him. OTOH this has some potential
> for userspace visible regressions. But I guess it's worth a try. Something
> like attached patch?

(1) If I understand correctly, FMODE_EXCL is set at blkdev_open() only if O_EXCL
    is specified. How can we detect if O_EXCL was not used, for the reproducer
    ( https://syzkaller.appspot.com/text?tag=ReproC&x=135385a8a00000 ) is not
    using O_EXCL ?

(2) There seems to be no serialization. What guarantees that mount_bdev()
    does not start due to preempted after the check added by this patch?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-05-15 10:21     ` Jan Kara
  2019-05-15 11:32       ` Tetsuo Handa
@ 2019-05-15 11:59       ` syzbot
  1 sibling, 0 replies; 28+ messages in thread
From: syzbot @ 2019-05-15 11:59 UTC (permalink / raw)
  To: axboe, dvyukov, jack, linux-block, linux-fsdevel, linux-kernel,
	penguin-kernel, syzkaller-bugs, viro

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger  
crash:

Reported-and-tested-by:  
syzbot+10007d66ca02b08f0e60@syzkaller.appspotmail.com

Tested on:

commit:         e93c9c99 Linux 5.1
git tree:        
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.1
kernel config:  https://syzkaller.appspot.com/x/.config?x=5edd1df52e9bc982
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
patch:          https://syzkaller.appspot.com/x/patch.diff?x=133626d8a00000

Note: testing is done by a robot and is best-effort only.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-05-15 11:32       ` Tetsuo Handa
@ 2019-05-15 13:07         ` Jan Kara
  2019-05-16 11:48           ` Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2019-05-15 13:07 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jan Kara, Jens Axboe, Alexander Viro, syzbot, dvyukov,
	linux-fsdevel, linux-kernel, syzkaller-bugs, linux-block

On Wed 15-05-19 20:32:27, Tetsuo Handa wrote:
> On 2019/05/15 19:21, Jan Kara wrote:
> > The question is how to fix this problem. The simplest fix I can see is that
> > we'd just refuse to do LOOP_SET_FD if someone has the block device
> > exclusively open as there are high chances such user will be unpleasantly
> > surprised by the device changing under him. OTOH this has some potential
> > for userspace visible regressions. But I guess it's worth a try. Something
> > like attached patch?
> 
> (1) If I understand correctly, FMODE_EXCL is set at blkdev_open() only if
> O_EXCL is specified.

Yes.

> How can we detect if O_EXCL was not used, for the reproducer (
> https://syzkaller.appspot.com/text?tag=ReproC&x=135385a8a00000 ) is not
> using O_EXCL ?

mount_bdev() is using O_EXCL and that's what matters.

> (2) There seems to be no serialization. What guarantees that mount_bdev()
>     does not start due to preempted after the check added by this patch?

That's a good question. lo_ctl_mutex actually synchronizes most of this
(taken in both loop_set_fd() and lo_open()) but you're right that there's
still a small race window where loop_set_fd() need not see bdev->bd_holders
elevated while blkdev_get() will succeed. So I need to think a bit more
about proper synchronization of this.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-05-15 13:07         ` Jan Kara
@ 2019-05-16 11:48           ` Jan Kara
  2019-05-16 12:17             ` Tetsuo Handa
  2019-05-16 12:33             ` syzbot
  0 siblings, 2 replies; 28+ messages in thread
From: Jan Kara @ 2019-05-16 11:48 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jan Kara, Jens Axboe, Alexander Viro, syzbot, dvyukov,
	linux-fsdevel, linux-kernel, syzkaller-bugs, linux-block

[-- Attachment #1: Type: text/plain, Size: 1801 bytes --]

On Wed 15-05-19 15:07:30, Jan Kara wrote:
> On Wed 15-05-19 20:32:27, Tetsuo Handa wrote:
> > On 2019/05/15 19:21, Jan Kara wrote:
> > > The question is how to fix this problem. The simplest fix I can see is that
> > > we'd just refuse to do LOOP_SET_FD if someone has the block device
> > > exclusively open as there are high chances such user will be unpleasantly
> > > surprised by the device changing under him. OTOH this has some potential
> > > for userspace visible regressions. But I guess it's worth a try. Something
> > > like attached patch?
> > 
> > (1) If I understand correctly, FMODE_EXCL is set at blkdev_open() only if
> > O_EXCL is specified.
> 
> Yes.
> 
> > How can we detect if O_EXCL was not used, for the reproducer (
> > https://syzkaller.appspot.com/text?tag=ReproC&x=135385a8a00000 ) is not
> > using O_EXCL ?
> 
> mount_bdev() is using O_EXCL and that's what matters.
> 
> > (2) There seems to be no serialization. What guarantees that mount_bdev()
> >     does not start due to preempted after the check added by this patch?
> 
> That's a good question. lo_ctl_mutex actually synchronizes most of this
> (taken in both loop_set_fd() and lo_open()) but you're right that there's
> still a small race window where loop_set_fd() need not see bdev->bd_holders
> elevated while blkdev_get() will succeed. So I need to think a bit more
> about proper synchronization of this.

OK, so non-racy fix was a bit more involved and I've ended up just
upgrading the file reference to an exclusive one in loop_set_fd() instead
of trying to hand-craft some locking solution. The result is attached and
it passes blktests.

Let syzbot also test it:

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.1

									Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-loop-Don-t-change-loop-device-under-exclusive-opener.patch --]
[-- Type: text/x-patch, Size: 2344 bytes --]

From e7a35f48a902b42894eba8cc4201ca745bfd5dfe Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 15 May 2019 11:45:10 +0200
Subject: [PATCH] loop: Don't change loop device under exclusive opener

Loop module allows calling LOOP_SET_FD while there are other openers of
the loop device. Even exclusive ones. This can lead to weird
consequences such as kernel deadlocks like:

mount_bdev()				lo_ioctl()
  udf_fill_super()
    udf_load_vrs()
      sb_set_blocksize() - sets desired block size B
      udf_tread()
        sb_bread()
          __bread_gfp(bdev, block, B)
					  loop_set_fd()
					    set_blocksize()
            - now __getblk_slow() indefinitely loops because B != bdev
              block size

Fix the problem by disallowing LOOP_SET_FD ioctl when there are
exclusive openers of a loop device.

[Deliberately chosen not to CC stable as a user with priviledges to
trigger this race has other means of taking the system down and this
has a potential of breaking some weird userspace setup]

Reported-by: syzbot+10007d66ca02b08f0e60@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/block/loop.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 102d79575895..f11b7dc16e9d 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -945,9 +945,20 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	if (!file)
 		goto out;
 
+	/*
+	 * If we don't hold exclusive handle for the device, upgrade to it
+	 * here to avoid changing device under exclusive owner.
+	 */
+	if (!(mode & FMODE_EXCL)) {
+		bdgrab(bdev);
+		error = blkdev_get(bdev, mode | FMODE_EXCL, loop_set_fd);
+		if (error)
+			goto out_putf;
+	}
+
 	error = mutex_lock_killable(&loop_ctl_mutex);
 	if (error)
-		goto out_putf;
+		goto out_bdev;
 
 	error = -EBUSY;
 	if (lo->lo_state != Lo_unbound)
@@ -1012,10 +1023,15 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	mutex_unlock(&loop_ctl_mutex);
 	if (partscan)
 		loop_reread_partitions(lo, bdev);
+	if (!(mode & FMODE_EXCL))
+		blkdev_put(bdev, mode | FMODE_EXCL);
 	return 0;
 
 out_unlock:
 	mutex_unlock(&loop_ctl_mutex);
+out_bdev:
+	if (!(mode & FMODE_EXCL))
+		blkdev_put(bdev, mode | FMODE_EXCL);
 out_putf:
 	fput(file);
 out:
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-05-16 11:48           ` Jan Kara
@ 2019-05-16 12:17             ` Tetsuo Handa
  2019-05-16 12:32               ` Jan Kara
  2019-05-16 12:33             ` syzbot
  1 sibling, 1 reply; 28+ messages in thread
From: Tetsuo Handa @ 2019-05-16 12:17 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, Alexander Viro, syzbot, dvyukov, linux-fsdevel,
	linux-kernel, syzkaller-bugs, linux-block

On 2019/05/16 20:48, Jan Kara wrote:
> OK, so non-racy fix was a bit more involved and I've ended up just
> upgrading the file reference to an exclusive one in loop_set_fd() instead
> of trying to hand-craft some locking solution. The result is attached and
> it passes blktests.

blkdev_get() has corresponding blkdev_put().
bdgrab() does not have corresponding bdput() ?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-05-16 12:17             ` Tetsuo Handa
@ 2019-05-16 12:32               ` Jan Kara
  2019-05-16 12:50                 ` Tetsuo Handa
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2019-05-16 12:32 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jan Kara, Jens Axboe, Alexander Viro, syzbot, dvyukov,
	linux-fsdevel, linux-kernel, syzkaller-bugs, linux-block

On Thu 16-05-19 21:17:14, Tetsuo Handa wrote:
> On 2019/05/16 20:48, Jan Kara wrote:
> > OK, so non-racy fix was a bit more involved and I've ended up just
> > upgrading the file reference to an exclusive one in loop_set_fd() instead
> > of trying to hand-craft some locking solution. The result is attached and
> > it passes blktests.
> 
> blkdev_get() has corresponding blkdev_put().
> bdgrab() does not have corresponding bdput() ?

Yes, and that's hidden inside blkdev_put() (or failing blkdev_get()). Don't
get me started on calling conventions of these functions... I've wasted half
an hour trying to figure out where I'm leaking inode references in my patch
;).

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-05-16 11:48           ` Jan Kara
  2019-05-16 12:17             ` Tetsuo Handa
@ 2019-05-16 12:33             ` syzbot
  1 sibling, 0 replies; 28+ messages in thread
From: syzbot @ 2019-05-16 12:33 UTC (permalink / raw)
  To: axboe, dvyukov, jack, linux-block, linux-fsdevel, linux-kernel,
	penguin-kernel, syzkaller-bugs, viro

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger  
crash:

Reported-and-tested-by:  
syzbot+10007d66ca02b08f0e60@syzkaller.appspotmail.com

Tested on:

commit:         e93c9c99 Linux 5.1
git tree:        
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.1
kernel config:  https://syzkaller.appspot.com/x/.config?x=5edd1df52e9bc982
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
patch:          https://syzkaller.appspot.com/x/patch.diff?x=135c5b02a00000

Note: testing is done by a robot and is best-effort only.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: INFO: task hung in __get_super
  2019-05-16 12:32               ` Jan Kara
@ 2019-05-16 12:50                 ` Tetsuo Handa
  0 siblings, 0 replies; 28+ messages in thread
From: Tetsuo Handa @ 2019-05-16 12:50 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, Alexander Viro, syzbot, dvyukov, linux-fsdevel,
	linux-kernel, syzkaller-bugs, linux-block

On 2019/05/16 21:32, Jan Kara wrote:
> On Thu 16-05-19 21:17:14, Tetsuo Handa wrote:
>> On 2019/05/16 20:48, Jan Kara wrote:
>>> OK, so non-racy fix was a bit more involved and I've ended up just
>>> upgrading the file reference to an exclusive one in loop_set_fd() instead
>>> of trying to hand-craft some locking solution. The result is attached and
>>> it passes blktests.
>>
>> blkdev_get() has corresponding blkdev_put().
>> bdgrab() does not have corresponding bdput() ?
> 
> Yes, and that's hidden inside blkdev_put() (or failing blkdev_get()). Don't
> get me started on calling conventions of these functions... I've wasted half
> an hour trying to figure out where I'm leaking inode references in my patch
> ;).

Ah, found tricky comment. Please apply the patch. Thank you.

/**
 * blkdev_get - open a block device
 * @bdev: block_device to open
 * @mode: FMODE_* mask
 * @holder: exclusive holder identifier
 *
 * Open @bdev with @mode.  If @mode includes %FMODE_EXCL, @bdev is
 * open with exclusive access.  Specifying %FMODE_EXCL with %NULL
 * @holder is invalid.  Exclusive opens may nest for the same @holder.
 *
 * On success, the reference count of @bdev is unchanged.  On failure,
 * @bdev is put.
 *
 * CONTEXT:
 * Might sleep.
 *
 * RETURNS:
 * 0 on success, -errno on failure.
 */

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2019-05-16 12:51 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-01 17:08 INFO: task hung in __get_super syzbot
2018-06-19 11:44 ` Tetsuo Handa
2018-06-19 11:53   ` Dmitry Vyukov
2018-06-19 14:10     ` Tetsuo Handa
2018-06-19 14:15       ` Dmitry Vyukov
2018-06-20  1:15         ` Tetsuo Handa
2019-04-28 18:14 ` syzbot
2019-04-28 18:51   ` Al Viro
2019-04-29  1:38     ` Tetsuo Handa
2019-04-29  5:30     ` Dmitry Vyukov
2019-04-30  2:55     ` Jan Kara
2019-04-30  3:11       ` Al Viro
2019-04-30 13:07         ` Jan Kara
2019-04-30 13:18           ` Al Viro
2019-04-30 15:07             ` Jan Kara
2019-04-30 15:34               ` Tetsuo Handa
2019-05-03 10:29                 ` Tetsuo Handa
2019-05-14  5:15   ` Tetsuo Handa
2019-05-15  1:02   ` Tetsuo Handa
2019-05-15 10:21     ` Jan Kara
2019-05-15 11:32       ` Tetsuo Handa
2019-05-15 13:07         ` Jan Kara
2019-05-16 11:48           ` Jan Kara
2019-05-16 12:17             ` Tetsuo Handa
2019-05-16 12:32               ` Jan Kara
2019-05-16 12:50                 ` Tetsuo Handa
2019-05-16 12:33             ` syzbot
2019-05-15 11:59       ` syzbot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).