From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will.deacon@arm.com>
Cc: syzbot <syzbot+7b2866454055e43c21e5@syzkaller.appspotmail.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	syzkaller-bugs@googlegroups.com, viro@zeniv.linux.org.uk
Subject: Re: INFO: task hung in __sb_start_write
Date: Sun, 10 Jun 2018 23:47:56 +0900
Message-ID: <ee29ad1e-8bd0-3eef-f49e-793bfadcbea7@I-love.SAKURA.ne.jp>
In-Reply-To: <000000000000283c37056b4a81a5@google.com>

Hello.

Commits

  401c636a0eeb0d51 "kernel/hung_task.c: show all hung tasks before panic"
  8cc05c71ba5f7936 "locking/lockdep: Move sanity check to inside lockdep_print_held_locks()"

arrived in linux.git, and syzbot started giving us more hints.

Quoting from https://syzkaller.appspot.com/text?tag=CrashReport&x=1477e81f800000 :
----------------------------------------
2 locks held by rs:main Q:Reg/4416:
 #0: 00000000dff3f899 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0x1a9/0x1e0 fs/file.c:766
 #1: 00000000c4a96cb8 (sb_writers#6){++++}, at: file_start_write include/linux/fs.h:2737 [inline]
 #1: 00000000c4a96cb8 (sb_writers#6){++++}, at: vfs_write+0x452/0x560 fs/read_write.c:548

1 lock held by rsyslogd/4418:
 #0: 000000007f0c215c (&f->f_pos_lock){+.+.}, at: __fdget_pos+0x1a9/0x1e0 fs/file.c:766

syz-executor4   D22224  4597   4588 0x00000000
Call Trace:
 context_switch kernel/sched/core.c:2856 [inline]
 __schedule+0x801/0x1e30 kernel/sched/core.c:3498
 schedule+0xef/0x430 kernel/sched/core.c:3542
 __rwsem_down_read_failed_common kernel/locking/rwsem-xadd.c:269 [inline]
 rwsem_down_read_failed+0x350/0x5e0 kernel/locking/rwsem-xadd.c:286
 call_rwsem_down_read_failed+0x18/0x30 arch/x86/lib/rwsem.S:94
 __down_read arch/x86/include/asm/rwsem.h:83 [inline]
 __percpu_down_read+0x15d/0x200 kernel/locking/percpu-rwsem.c:85
 percpu_down_read_preempt_disable include/linux/percpu-rwsem.h:49 [inline]
 percpu_down_read include/linux/percpu-rwsem.h:59 [inline]
 __sb_start_write+0x2d7/0x300 fs/super.c:1403
 sb_start_write include/linux/fs.h:1552 [inline]
 mnt_want_write+0x3f/0xc0 fs/namespace.c:386
 do_unlinkat+0x2a3/0xa10 fs/namei.c:4026
 __do_sys_unlink fs/namei.c:4091 [inline]
 __se_sys_unlink fs/namei.c:4089 [inline]
 __x64_sys_unlink+0x42/0x50 fs/namei.c:4089
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
1 lock held by syz-executor4/4597:
 #0: 00000000c4a96cb8 (sb_writers#6){++++}, at: sb_start_write include/linux/fs.h:1552 [inline]
 #0: 00000000c4a96cb8 (sb_writers#6){++++}, at: mnt_want_write+0x3f/0xc0 fs/namespace.c:386

syz-executor6   D21536  4600   4591 0x00000000
Call Trace:
 context_switch kernel/sched/core.c:2856 [inline]
 __schedule+0x801/0x1e30 kernel/sched/core.c:3498
 schedule+0xef/0x430 kernel/sched/core.c:3542
 __rwsem_down_read_failed_common kernel/locking/rwsem-xadd.c:269 [inline]
 rwsem_down_read_failed+0x350/0x5e0 kernel/locking/rwsem-xadd.c:286
 call_rwsem_down_read_failed+0x18/0x30 arch/x86/lib/rwsem.S:94
 __down_read arch/x86/include/asm/rwsem.h:83 [inline]
 __percpu_down_read+0x15d/0x200 kernel/locking/percpu-rwsem.c:85
 percpu_down_read_preempt_disable include/linux/percpu-rwsem.h:49 [inline]
 percpu_down_read include/linux/percpu-rwsem.h:59 [inline]
 __sb_start_write+0x2d7/0x300 fs/super.c:1403
 sb_start_write include/linux/fs.h:1552 [inline]
 mnt_want_write+0x3f/0xc0 fs/namespace.c:386
 do_unlinkat+0x2a3/0xa10 fs/namei.c:4026
 __do_sys_unlink fs/namei.c:4091 [inline]
 __se_sys_unlink fs/namei.c:4089 [inline]
 __x64_sys_unlink+0x42/0x50 fs/namei.c:4089
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
1 lock held by syz-executor6/4600:
 #0: 00000000c4a96cb8 (sb_writers#6){++++}, at: sb_start_write include/linux/fs.h:1552 [inline]
 #0: 00000000c4a96cb8 (sb_writers#6){++++}, at: mnt_want_write+0x3f/0xc0 fs/namespace.c:386

syz-executor3   D21536  7320   7261 0x00000000
Call Trace:
 context_switch kernel/sched/core.c:2856 [inline]
 __schedule+0x801/0x1e30 kernel/sched/core.c:3498
 schedule+0xef/0x430 kernel/sched/core.c:3542
 __rwsem_down_read_failed_common kernel/locking/rwsem-xadd.c:269 [inline]
 rwsem_down_read_failed+0x350/0x5e0 kernel/locking/rwsem-xadd.c:286
 call_rwsem_down_read_failed+0x18/0x30 arch/x86/lib/rwsem.S:94
 __down_read arch/x86/include/asm/rwsem.h:83 [inline]
 __percpu_down_read+0x15d/0x200 kernel/locking/percpu-rwsem.c:85
 percpu_down_read_preempt_disable include/linux/percpu-rwsem.h:49 [inline]
 percpu_down_read include/linux/percpu-rwsem.h:59 [inline]
 __sb_start_write+0x2d7/0x300 fs/super.c:1403
 sb_start_write include/linux/fs.h:1552 [inline]
 mnt_want_write+0x3f/0xc0 fs/namespace.c:386
 do_unlinkat+0x2a3/0xa10 fs/namei.c:4026
 __do_sys_unlink fs/namei.c:4091 [inline]
 __se_sys_unlink fs/namei.c:4089 [inline]
 __x64_sys_unlink+0x42/0x50 fs/namei.c:4089
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
1 lock held by syz-executor3/7320:
 #0: 00000000c4a96cb8 (sb_writers#6){++++}, at: sb_start_write include/linux/fs.h:1552 [inline]
 #0: 00000000c4a96cb8 (sb_writers#6){++++}, at: mnt_want_write+0x3f/0xc0 fs/namespace.c:386

syz-executor2   D24032 13221   4593 0x00000004
Call Trace:
 context_switch kernel/sched/core.c:2856 [inline]
 __schedule+0x801/0x1e30 kernel/sched/core.c:3498
 schedule+0xef/0x430 kernel/sched/core.c:3542
 __rwsem_down_read_failed_common kernel/locking/rwsem-xadd.c:269 [inline]
 rwsem_down_read_failed+0x350/0x5e0 kernel/locking/rwsem-xadd.c:286
 call_rwsem_down_read_failed+0x18/0x30 arch/x86/lib/rwsem.S:94
 __down_read arch/x86/include/asm/rwsem.h:83 [inline]
 __percpu_down_read+0x15d/0x200 kernel/locking/percpu-rwsem.c:85
 percpu_down_read_preempt_disable include/linux/percpu-rwsem.h:49 [inline]
 percpu_down_read include/linux/percpu-rwsem.h:59 [inline]
 __sb_start_write+0x2d7/0x300 fs/super.c:1403
 sb_start_pagefault include/linux/fs.h:1581 [inline]
 ext4_page_mkwrite+0x1c8/0x1420 fs/ext4/inode.c:6125
 do_page_mkwrite+0x146/0x500 mm/memory.c:2380
 do_shared_fault mm/memory.c:3706 [inline]
 do_fault mm/memory.c:3745 [inline]
 handle_pte_fault mm/memory.c:3972 [inline]
 __handle_mm_fault+0x2acb/0x4390 mm/memory.c:4096
 handle_mm_fault+0x53a/0xc70 mm/memory.c:4133
 __do_page_fault+0x60b/0xe40 arch/x86/mm/fault.c:1403
 do_page_fault+0xee/0x8a7 arch/x86/mm/fault.c:1478
 page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1160
2 locks held by syz-executor2/13221:
 #0: 000000002618cb0b (&mm->mmap_sem){++++}, at: __do_page_fault+0x381/0xe40 arch/x86/mm/fault.c:1332
 #1: 00000000a50d0f8c (sb_pagefaults){++++}, at: sb_start_pagefault include/linux/fs.h:1581 [inline]
 #1: 00000000a50d0f8c (sb_pagefaults){++++}, at: ext4_page_mkwrite+0x1c8/0x1420 fs/ext4/inode.c:6125

syz-executor5   D24696 13225   4598 0x00000004
Call Trace:
 context_switch kernel/sched/core.c:2856 [inline]
 __schedule+0x801/0x1e30 kernel/sched/core.c:3498
 schedule+0xef/0x430 kernel/sched/core.c:3542
 __rwsem_down_read_failed_common kernel/locking/rwsem-xadd.c:269 [inline]
 rwsem_down_read_failed+0x350/0x5e0 kernel/locking/rwsem-xadd.c:286
 call_rwsem_down_read_failed+0x18/0x30 arch/x86/lib/rwsem.S:94
 __down_read arch/x86/include/asm/rwsem.h:83 [inline]
 __percpu_down_read+0x15d/0x200 kernel/locking/percpu-rwsem.c:85
 percpu_down_read_preempt_disable include/linux/percpu-rwsem.h:49 [inline]
 percpu_down_read include/linux/percpu-rwsem.h:59 [inline]
 __sb_start_write+0x2d7/0x300 fs/super.c:1403
 sb_start_pagefault include/linux/fs.h:1581 [inline]
 ext4_page_mkwrite+0x1c8/0x1420 fs/ext4/inode.c:6125
 do_page_mkwrite+0x146/0x500 mm/memory.c:2380
 wp_page_shared mm/memory.c:2676 [inline]
 do_wp_page+0xf5d/0x1990 mm/memory.c:2776
 handle_pte_fault mm/memory.c:3988 [inline]
 __handle_mm_fault+0x29f5/0x4390 mm/memory.c:4096
 handle_mm_fault+0x53a/0xc70 mm/memory.c:4133
 __do_page_fault+0x60b/0xe40 arch/x86/mm/fault.c:1403
 do_page_fault+0xee/0x8a7 arch/x86/mm/fault.c:1478
 page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1160
2 locks held by syz-executor5/13225:
 #0: 000000000aa505ed (&mm->mmap_sem){++++}, at: __do_page_fault+0x381/0xe40 arch/x86/mm/fault.c:1332
 #1: 00000000a50d0f8c (sb_pagefaults){++++}, at: sb_start_pagefault include/linux/fs.h:1581 [inline]
 #1: 00000000a50d0f8c (sb_pagefaults){++++}, at: ext4_page_mkwrite+0x1c8/0x1420 fs/ext4/inode.c:6125

syz-executor1   D23432 13236   4594 0x00000004
Call Trace:
 context_switch kernel/sched/core.c:2856 [inline]
 __schedule+0x801/0x1e30 kernel/sched/core.c:3498
 schedule+0xef/0x430 kernel/sched/core.c:3542
 __rwsem_down_read_failed_common kernel/locking/rwsem-xadd.c:269 [inline]
 rwsem_down_read_failed+0x350/0x5e0 kernel/locking/rwsem-xadd.c:286
 call_rwsem_down_read_failed+0x18/0x30 arch/x86/lib/rwsem.S:94
 __down_read arch/x86/include/asm/rwsem.h:83 [inline]
 __percpu_down_read+0x15d/0x200 kernel/locking/percpu-rwsem.c:85
 percpu_down_read_preempt_disable include/linux/percpu-rwsem.h:49 [inline]
 percpu_down_read include/linux/percpu-rwsem.h:59 [inline]
 __sb_start_write+0x2d7/0x300 fs/super.c:1403
 sb_start_pagefault include/linux/fs.h:1581 [inline]
 ext4_page_mkwrite+0x1c8/0x1420 fs/ext4/inode.c:6125
 do_page_mkwrite+0x146/0x500 mm/memory.c:2380
 wp_page_shared mm/memory.c:2676 [inline]
 do_wp_page+0xf5d/0x1990 mm/memory.c:2776
 handle_pte_fault mm/memory.c:3988 [inline]
 __handle_mm_fault+0x29f5/0x4390 mm/memory.c:4096
 handle_mm_fault+0x53a/0xc70 mm/memory.c:4133
 __do_page_fault+0x60b/0xe40 arch/x86/mm/fault.c:1403
 do_page_fault+0xee/0x8a7 arch/x86/mm/fault.c:1478
 page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1160
2 locks held by syz-executor1/13236:
 #0: 0000000079279c9c (&mm->mmap_sem){++++}, at: __do_page_fault+0x381/0xe40 arch/x86/mm/fault.c:1332
 #1: 00000000a50d0f8c (sb_pagefaults){++++}, at: sb_start_pagefault include/linux/fs.h:1581 [inline]
 #1: 00000000a50d0f8c (sb_pagefaults){++++}, at: ext4_page_mkwrite+0x1c8/0x1420 fs/ext4/inode.c:6125
----------------------------------------

This looks quite strange: nobody is holding the percpu_rw_semaphore for
write, yet everybody is stuck trying to acquire it for read. (Since there
is no "X locks held by ..." line without a follow-up "#0:" line, there is
no possibility that somebody is holding the percpu_rw_semaphore for write
while in TASK_RUNNING state.)

I suspect that either the API has a bug or the API is being used incorrectly.
Any ideas for debugging this?

