* possible deadlock in proc_pid_syscall (2) @ 2020-08-28 4:57 syzbot 2020-08-28 12:01 ` Eric W. Biederman 0 siblings, 1 reply; 7+ messages in thread From: syzbot @ 2020-08-28 4:57 UTC (permalink / raw) To: adobriyan, akpm, avagin, christian, ebiederm, gladkov.alexey, keescook, linux-fsdevel, linux-kernel, syzkaller-bugs, walken Hello, syzbot found the following issue on: HEAD commit: 15bc20c6 Merge tag 'tty-5.9-rc3' of git://git.kernel.org/p.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=15349f96900000 kernel config: https://syzkaller.appspot.com/x/.config?x=978db74cb30aa994 dashboard link: https://syzkaller.appspot.com/bug?extid=db9cdf3dd1f64252c6ef compiler: gcc (GCC) 10.1.0-syz 20200507 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com ====================================================== WARNING: possible circular locking dependency detected 5.9.0-rc2-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor.0/18445 is trying to acquire lock: ffff88809f2e0dc8 (&sig->exec_update_mutex){+.+.}-{3:3}, at: lock_trace fs/proc/base.c:408 [inline] ffff88809f2e0dc8 (&sig->exec_update_mutex){+.+.}-{3:3}, at: proc_pid_syscall+0xaa/0x2b0 fs/proc/base.c:646 but task is already holding lock: ffff88808e9a3c30 (&p->lock){+.+.}-{3:3}, at: seq_read+0x61/0x1070 fs/seq_file.c:155 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&p->lock){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 seq_read+0x61/0x1070 fs/seq_file.c:155 pde_read fs/proc/inode.c:306 [inline] proc_reg_read+0x221/0x300 fs/proc/inode.c:318 do_loop_readv_writev fs/read_write.c:734 [inline] do_loop_readv_writev fs/read_write.c:721 [inline] do_iter_read+0x48e/0x6e0 fs/read_write.c:955 vfs_readv+0xe5/0x150 fs/read_write.c:1073 kernel_readv fs/splice.c:355 [inline] default_file_splice_read.constprop.0+0x4e6/0x9e0 fs/splice.c:412 do_splice_to+0x137/0x170 fs/splice.c:871 splice_direct_to_actor+0x307/0x980 fs/splice.c:950 do_splice_direct+0x1b3/0x280 fs/splice.c:1059 do_sendfile+0x55f/0xd40 fs/read_write.c:1540 __do_sys_sendfile64 fs/read_write.c:1601 [inline] __se_sys_sendfile64 fs/read_write.c:1587 [inline] __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1587 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #2 (sb_writers#4){.+.+}-{0:0}: percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write+0x234/0x470 fs/super.c:1672 sb_start_write include/linux/fs.h:1643 [inline] mnt_want_write+0x3a/0xb0 fs/namespace.c:354 ovl_setattr+0x5c/0x850 fs/overlayfs/inode.c:28 notify_change+0xb60/0x10a0 fs/attr.c:336 chown_common+0x4a9/0x550 fs/open.c:674 do_fchownat+0x126/0x1e0 fs/open.c:704 __do_sys_lchown fs/open.c:729 [inline] __se_sys_lchown fs/open.c:727 [inline] __x64_sys_lchown+0x7a/0xc0 fs/open.c:727 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (&ovl_i_mutex_dir_key[depth]){++++}-{3:3}: down_read+0x96/0x420 kernel/locking/rwsem.c:1492 inode_lock_shared include/linux/fs.h:789 [inline] lookup_slow fs/namei.c:1560 [inline] walk_component+0x409/0x6a0 fs/namei.c:1860 lookup_last fs/namei.c:2309 [inline] path_lookupat+0x1ba/0x830 fs/namei.c:2333 filename_lookup+0x19f/0x560 fs/namei.c:2366 create_local_trace_uprobe+0x87/0x4e0 kernel/trace/trace_uprobe.c:1574 perf_uprobe_init+0x132/0x210 kernel/trace/trace_event_perf.c:323 perf_uprobe_event_init+0xff/0x1c0 kernel/events/core.c:9580 perf_try_init_event+0x12a/0x560 kernel/events/core.c:10899 perf_init_event kernel/events/core.c:10951 [inline] perf_event_alloc.part.0+0xdee/0x3770 kernel/events/core.c:11229 perf_event_alloc kernel/events/core.c:11608 [inline] __do_sys_perf_event_open+0x72c/0x2cb0 kernel/events/core.c:11724 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&sig->exec_update_mutex){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:2496 [inline] check_prevs_add kernel/locking/lockdep.c:2601 [inline] validate_chain kernel/locking/lockdep.c:3218 [inline] __lock_acquire+0x2a6b/0x5640 kernel/locking/lockdep.c:4426 lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:5005 __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 lock_trace fs/proc/base.c:408 [inline] proc_pid_syscall+0xaa/0x2b0 fs/proc/base.c:646 proc_single_show+0x116/0x1e0 fs/proc/base.c:775 seq_read+0x432/0x1070 fs/seq_file.c:208 do_loop_readv_writev fs/read_write.c:734 [inline] do_loop_readv_writev fs/read_write.c:721 [inline] do_iter_read+0x48e/0x6e0 fs/read_write.c:955 vfs_readv+0xe5/0x150 fs/read_write.c:1073 do_preadv fs/read_write.c:1165 [inline] __do_sys_preadv fs/read_write.c:1215 [inline] __se_sys_preadv fs/read_write.c:1210 [inline] __x64_sys_preadv+0x231/0x310 fs/read_write.c:1210 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Chain exists of: &sig->exec_update_mutex --> sb_writers#4 --> &p->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&p->lock); lock(sb_writers#4); lock(&p->lock); lock(&sig->exec_update_mutex); *** DEADLOCK *** 1 lock held by syz-executor.0/18445: #0: ffff88808e9a3c30 (&p->lock){+.+.}-{3:3}, at: seq_read+0x61/0x1070 fs/seq_file.c:155 stack backtrace: CPU: 0 PID: 18445 Comm: syz-executor.0 Not tainted 5.9.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x18f/0x20d lib/dump_stack.c:118 check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827 check_prev_add kernel/locking/lockdep.c:2496 [inline] check_prevs_add kernel/locking/lockdep.c:2601 [inline] validate_chain kernel/locking/lockdep.c:3218 [inline] __lock_acquire+0x2a6b/0x5640 kernel/locking/lockdep.c:4426 lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:5005 __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 lock_trace fs/proc/base.c:408 [inline] proc_pid_syscall+0xaa/0x2b0 fs/proc/base.c:646 proc_single_show+0x116/0x1e0 fs/proc/base.c:775 seq_read+0x432/0x1070 fs/seq_file.c:208 do_loop_readv_writev fs/read_write.c:734 [inline] do_loop_readv_writev fs/read_write.c:721 [inline] do_iter_read+0x48e/0x6e0 fs/read_write.c:955 vfs_readv+0xe5/0x150 fs/read_write.c:1073 do_preadv fs/read_write.c:1165 [inline] __do_sys_preadv fs/read_write.c:1215 [inline] __se_sys_preadv fs/read_write.c:1210 [inline] __x64_sys_preadv+0x231/0x310 fs/read_write.c:1210 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45d5b9 Code: 5d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fb613f9ec78 EFLAGS: 00000246 ORIG_RAX: 0000000000000127 RAX: ffffffffffffffda RBX: 0000000000025740 RCX: 000000000045d5b9 RDX: 0000000000000333 RSI: 00000000200017c0 RDI: 0000000000000006 RBP: 000000000118cf90 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118cf4c R13: 00007ffe2a82bbbf R14: 00007fb613f9f9c0 R15: 000000000118cf4c --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: possible deadlock in proc_pid_syscall (2) 2020-08-28 4:57 possible deadlock in proc_pid_syscall (2) syzbot @ 2020-08-28 12:01 ` Eric W. Biederman 2020-08-28 12:37 ` peterz 0 siblings, 1 reply; 7+ messages in thread From: Eric W. Biederman @ 2020-08-28 12:01 UTC (permalink / raw) To: syzbot Cc: adobriyan, akpm, avagin, christian, gladkov.alexey, keescook, linux-fsdevel, linux-kernel, syzkaller-bugs, walken, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Miklos Szeredi syzbot <syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com> writes: > Hello, > > syzbot found the following issue on: > > HEAD commit: 15bc20c6 Merge tag 'tty-5.9-rc3' of git://git.kernel.org/p.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=15349f96900000 > kernel config: https://syzkaller.appspot.com/x/.config?x=978db74cb30aa994 > dashboard link: https://syzkaller.appspot.com/bug?extid=db9cdf3dd1f64252c6ef > compiler: gcc (GCC) 10.1.0-syz 20200507 > > Unfortunately, I don't have any reproducer for this issue yet. Ok. So if I read this set of traces correctly: - perf_event_open holds exec_update_mutex - perf_event_open can call kern_path which for overlayfs takes ovl_i_mutex - chown on overlayfs calls mnt_want_write which takes sb_writes - sendfile/splice can read from a seq_file (which takes p->lock) while holding mnt_want_write aka sb_writes of the target file. - There are proc files that are seq_files that first take p->lock then take exec_update_mutex So this looks real, if painful. I have added some likely looking overlayfs and perf people to look at this. This feels like an issue where perf can just do too much under exec_update_mutex. In particular calling kern_path from create_local_trace_uprobe. Calling into the vfs at the very least makes it impossible to know exactly which locks will be taken. Thoughts? Eric > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com > > ====================================================== > WARNING: possible circular locking dependency detected > 5.9.0-rc2-syzkaller #0 Not tainted > ------------------------------------------------------ > syz-executor.0/18445 is trying to acquire lock: > ffff88809f2e0dc8 (&sig->exec_update_mutex){+.+.}-{3:3}, at: lock_trace fs/proc/base.c:408 [inline] > ffff88809f2e0dc8 (&sig->exec_update_mutex){+.+.}-{3:3}, at: proc_pid_syscall+0xaa/0x2b0 fs/proc/base.c:646 > > but task is already holding lock: > ffff88808e9a3c30 (&p->lock){+.+.}-{3:3}, at: seq_read+0x61/0x1070 fs/seq_file.c:155 > > which lock already depends on the new lock. > > > the existing dependency chain (in reverse order) is: > > -> #3 (&p->lock){+.+.}-{3:3}: > __mutex_lock_common kernel/locking/mutex.c:956 [inline] > __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 > seq_read+0x61/0x1070 fs/seq_file.c:155 > pde_read fs/proc/inode.c:306 [inline] > proc_reg_read+0x221/0x300 fs/proc/inode.c:318 > do_loop_readv_writev fs/read_write.c:734 [inline] > do_loop_readv_writev fs/read_write.c:721 [inline] > do_iter_read+0x48e/0x6e0 fs/read_write.c:955 > vfs_readv+0xe5/0x150 fs/read_write.c:1073 > kernel_readv fs/splice.c:355 [inline] > default_file_splice_read.constprop.0+0x4e6/0x9e0 fs/splice.c:412 > do_splice_to+0x137/0x170 fs/splice.c:871 > splice_direct_to_actor+0x307/0x980 fs/splice.c:950 > do_splice_direct+0x1b3/0x280 fs/splice.c:1059 > do_sendfile+0x55f/0xd40 fs/read_write.c:1540 > __do_sys_sendfile64 fs/read_write.c:1601 [inline] > __se_sys_sendfile64 fs/read_write.c:1587 [inline] > __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1587 > do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > -> #2 (sb_writers#4){.+.+}-{0:0}: > percpu_down_read include/linux/percpu-rwsem.h:51 [inline] > __sb_start_write+0x234/0x470 fs/super.c:1672 > sb_start_write include/linux/fs.h:1643 [inline] > mnt_want_write+0x3a/0xb0 fs/namespace.c:354 > ovl_setattr+0x5c/0x850 fs/overlayfs/inode.c:28 > notify_change+0xb60/0x10a0 fs/attr.c:336 > chown_common+0x4a9/0x550 fs/open.c:674 > do_fchownat+0x126/0x1e0 fs/open.c:704 > __do_sys_lchown fs/open.c:729 [inline] > __se_sys_lchown fs/open.c:727 [inline] > __x64_sys_lchown+0x7a/0xc0 fs/open.c:727 > do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > -> #1 (&ovl_i_mutex_dir_key[depth]){++++}-{3:3}: > down_read+0x96/0x420 kernel/locking/rwsem.c:1492 > inode_lock_shared include/linux/fs.h:789 [inline] > lookup_slow fs/namei.c:1560 [inline] > walk_component+0x409/0x6a0 fs/namei.c:1860 > lookup_last fs/namei.c:2309 [inline] > path_lookupat+0x1ba/0x830 fs/namei.c:2333 > filename_lookup+0x19f/0x560 fs/namei.c:2366 > create_local_trace_uprobe+0x87/0x4e0 kernel/trace/trace_uprobe.c:1574 > perf_uprobe_init+0x132/0x210 kernel/trace/trace_event_perf.c:323 > perf_uprobe_event_init+0xff/0x1c0 kernel/events/core.c:9580 > perf_try_init_event+0x12a/0x560 kernel/events/core.c:10899 > perf_init_event kernel/events/core.c:10951 [inline] > perf_event_alloc.part.0+0xdee/0x3770 kernel/events/core.c:11229 > perf_event_alloc kernel/events/core.c:11608 [inline] > __do_sys_perf_event_open+0x72c/0x2cb0 kernel/events/core.c:11724 > do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > -> #0 (&sig->exec_update_mutex){+.+.}-{3:3}: > check_prev_add kernel/locking/lockdep.c:2496 [inline] > check_prevs_add kernel/locking/lockdep.c:2601 [inline] > validate_chain kernel/locking/lockdep.c:3218 [inline] > __lock_acquire+0x2a6b/0x5640 kernel/locking/lockdep.c:4426 > lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:5005 > __mutex_lock_common kernel/locking/mutex.c:956 [inline] > __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 > lock_trace fs/proc/base.c:408 [inline] > proc_pid_syscall+0xaa/0x2b0 fs/proc/base.c:646 > proc_single_show+0x116/0x1e0 fs/proc/base.c:775 > seq_read+0x432/0x1070 fs/seq_file.c:208 > do_loop_readv_writev fs/read_write.c:734 [inline] > do_loop_readv_writev fs/read_write.c:721 [inline] > do_iter_read+0x48e/0x6e0 fs/read_write.c:955 > vfs_readv+0xe5/0x150 fs/read_write.c:1073 > do_preadv fs/read_write.c:1165 [inline] > __do_sys_preadv fs/read_write.c:1215 [inline] > __se_sys_preadv fs/read_write.c:1210 [inline] > __x64_sys_preadv+0x231/0x310 fs/read_write.c:1210 > do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > other info that might help us debug this: > > Chain exists of: > &sig->exec_update_mutex --> sb_writers#4 --> &p->lock > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(&p->lock); > lock(sb_writers#4); > lock(&p->lock); > lock(&sig->exec_update_mutex); > > *** DEADLOCK *** > > 1 lock held by syz-executor.0/18445: > #0: ffff88808e9a3c30 (&p->lock){+.+.}-{3:3}, at: seq_read+0x61/0x1070 fs/seq_file.c:155 > > stack backtrace: > CPU: 0 PID: 18445 Comm: syz-executor.0 Not tainted 5.9.0-rc2-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:77 [inline] > dump_stack+0x18f/0x20d lib/dump_stack.c:118 > check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827 > check_prev_add kernel/locking/lockdep.c:2496 [inline] > check_prevs_add kernel/locking/lockdep.c:2601 [inline] > validate_chain kernel/locking/lockdep.c:3218 [inline] > __lock_acquire+0x2a6b/0x5640 kernel/locking/lockdep.c:4426 > lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:5005 > __mutex_lock_common kernel/locking/mutex.c:956 [inline] > __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 > lock_trace fs/proc/base.c:408 [inline] > proc_pid_syscall+0xaa/0x2b0 fs/proc/base.c:646 > proc_single_show+0x116/0x1e0 fs/proc/base.c:775 > seq_read+0x432/0x1070 fs/seq_file.c:208 > do_loop_readv_writev fs/read_write.c:734 [inline] > do_loop_readv_writev fs/read_write.c:721 [inline] > do_iter_read+0x48e/0x6e0 fs/read_write.c:955 > vfs_readv+0xe5/0x150 fs/read_write.c:1073 > do_preadv fs/read_write.c:1165 [inline] > __do_sys_preadv fs/read_write.c:1215 [inline] > __se_sys_preadv fs/read_write.c:1210 [inline] > __x64_sys_preadv+0x231/0x310 fs/read_write.c:1210 > do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > RIP: 0033:0x45d5b9 > Code: 5d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 > RSP: 002b:00007fb613f9ec78 EFLAGS: 00000246 ORIG_RAX: 0000000000000127 > RAX: ffffffffffffffda RBX: 0000000000025740 RCX: 000000000045d5b9 > RDX: 0000000000000333 RSI: 00000000200017c0 RDI: 0000000000000006 > RBP: 000000000118cf90 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118cf4c > R13: 00007ffe2a82bbbf R14: 00007fb613f9f9c0 R15: 000000000118cf4c > > > --- > This report is generated by a bot. It may contain errors. > See https://goo.gl/tpsmEJ for more information about syzbot. > syzbot engineers can be reached at syzkaller@googlegroups.com. > > syzbot will keep track of this issue. See: > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: possible deadlock in proc_pid_syscall (2) 2020-08-28 12:01 ` Eric W. Biederman @ 2020-08-28 12:37 ` peterz 2020-08-30 12:31 ` Eric W. Biederman 0 siblings, 1 reply; 7+ messages in thread From: peterz @ 2020-08-28 12:37 UTC (permalink / raw) To: Eric W. Biederman Cc: syzbot, adobriyan, akpm, avagin, christian, gladkov.alexey, keescook, linux-fsdevel, linux-kernel, syzkaller-bugs, walken, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Miklos Szeredi, jannh On Fri, Aug 28, 2020 at 07:01:17AM -0500, Eric W. Biederman wrote: > This feels like an issue where perf can just do too much under > exec_update_mutex. In particular calling kern_path from > create_local_trace_uprobe. Calling into the vfs at the very least > makes it impossible to know exactly which locks will be taken. > > Thoughts? > > -> #1 (&ovl_i_mutex_dir_key[depth]){++++}-{3:3}: > > down_read+0x96/0x420 kernel/locking/rwsem.c:1492 > > inode_lock_shared include/linux/fs.h:789 [inline] > > lookup_slow fs/namei.c:1560 [inline] > > walk_component+0x409/0x6a0 fs/namei.c:1860 > > lookup_last fs/namei.c:2309 [inline] > > path_lookupat+0x1ba/0x830 fs/namei.c:2333 > > filename_lookup+0x19f/0x560 fs/namei.c:2366 > > create_local_trace_uprobe+0x87/0x4e0 kernel/trace/trace_uprobe.c:1574 > > perf_uprobe_init+0x132/0x210 kernel/trace/trace_event_perf.c:323 > > perf_uprobe_event_init+0xff/0x1c0 kernel/events/core.c:9580 > > perf_try_init_event+0x12a/0x560 kernel/events/core.c:10899 > > perf_init_event kernel/events/core.c:10951 [inline] > > perf_event_alloc.part.0+0xdee/0x3770 kernel/events/core.c:11229 > > perf_event_alloc kernel/events/core.c:11608 [inline] > > __do_sys_perf_event_open+0x72c/0x2cb0 kernel/events/core.c:11724 > > do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 > > entry_SYSCALL_64_after_hwframe+0x44/0xa9 Right, so we hold the mutex fairly long there, supposedly to ensure privs don't change out from under us. We do the permission checks early -- no point in doing anything else if we're not allowed, but we then have to hold this mutex until the event is actually installed according to that comment. /me goes look at git history: 6914303824bb5 - changed cred_guard_mutex into exec_update_mutex 79c9ce57eb2d5 - introduces cred_guard_mutex So that latter commit explains the race we're guarding against. Without this we can install the event after execve() which might have changed privs on us. I'm open to suggestions on how to do this differently. Could we check privs twice instead? Something like the completely untested below.. --- diff --git a/include/linux/freelist.h b/include/linux/freelist.h new file mode 100644 index 000000000000..e69de29bb2d1 diff --git a/kernel/events/core.c b/kernel/events/core.c index 5bfe8e3c6e44..14e6c9bbfcda 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -11701,21 +11701,9 @@ SYSCALL_DEFINE5(perf_event_open, } if (task) { - err = mutex_lock_interruptible(&task->signal->exec_update_mutex); - if (err) - goto err_task; - - /* - * Preserve ptrace permission check for backwards compatibility. - * - * We must hold exec_update_mutex across this and any potential - * perf_install_in_context() call for this new event to - * serialize against exec() altering our credentials (and the - * perf_event_exit_task() that could imply). - */ err = -EACCES; if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) - goto err_cred; + goto err_task; } if (flags & PERF_FLAG_PID_CGROUP) @@ -11844,6 +11832,24 @@ SYSCALL_DEFINE5(perf_event_open, goto err_context; } + if (task) { + err = mutex_lock_interruptible(&task->signal->exec_update_mutex); + if (err) + goto err_file; + + /* + * Preserve ptrace permission check for backwards compatibility. + * + * We must hold exec_update_mutex across this and any potential + * perf_install_in_context() call for this new event to + * serialize against exec() altering our credentials (and the + * perf_event_exit_task() that could imply). + */ + err = -EACCES; + if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) + goto err_cred; + } + if (move_group) { gctx = __perf_event_ctx_lock_double(group_leader, ctx); @@ -12019,7 +12025,10 @@ SYSCALL_DEFINE5(perf_event_open, if (move_group) perf_event_ctx_unlock(group_leader, gctx); mutex_unlock(&ctx->mutex); -/* err_file: */ +err_cred: + if (task) + mutex_unlock(&task->signal->exec_update_mutex); +err_file: fput(event_file); err_context: perf_unpin_context(ctx); @@ -12031,9 +12040,6 @@ SYSCALL_DEFINE5(perf_event_open, */ if (!event_file) free_event(event); -err_cred: - if (task) - mutex_unlock(&task->signal->exec_update_mutex); err_task: if (task) put_task_struct(task); ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: possible deadlock in proc_pid_syscall (2) 2020-08-28 12:37 ` peterz @ 2020-08-30 12:31 ` Eric W. Biederman 2020-08-31 7:43 ` peterz 2020-09-02 9:57 ` peterz 0 siblings, 2 replies; 7+ messages in thread From: Eric W. Biederman @ 2020-08-30 12:31 UTC (permalink / raw) To: peterz Cc: syzbot, adobriyan, akpm, avagin, christian, gladkov.alexey, keescook, linux-fsdevel, linux-kernel, syzkaller-bugs, walken, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Miklos Szeredi, jannh peterz@infradead.org writes: > On Fri, Aug 28, 2020 at 07:01:17AM -0500, Eric W. Biederman wrote: >> This feels like an issue where perf can just do too much under >> exec_update_mutex. In particular calling kern_path from >> create_local_trace_uprobe. Calling into the vfs at the very least >> makes it impossible to know exactly which locks will be taken. >> >> Thoughts? > >> > -> #1 (&ovl_i_mutex_dir_key[depth]){++++}-{3:3}: >> > down_read+0x96/0x420 kernel/locking/rwsem.c:1492 >> > inode_lock_shared include/linux/fs.h:789 [inline] >> > lookup_slow fs/namei.c:1560 [inline] >> > walk_component+0x409/0x6a0 fs/namei.c:1860 >> > lookup_last fs/namei.c:2309 [inline] >> > path_lookupat+0x1ba/0x830 fs/namei.c:2333 >> > filename_lookup+0x19f/0x560 fs/namei.c:2366 >> > create_local_trace_uprobe+0x87/0x4e0 kernel/trace/trace_uprobe.c:1574 >> > perf_uprobe_init+0x132/0x210 kernel/trace/trace_event_perf.c:323 >> > perf_uprobe_event_init+0xff/0x1c0 kernel/events/core.c:9580 >> > perf_try_init_event+0x12a/0x560 kernel/events/core.c:10899 >> > perf_init_event kernel/events/core.c:10951 [inline] >> > perf_event_alloc.part.0+0xdee/0x3770 kernel/events/core.c:11229 >> > perf_event_alloc kernel/events/core.c:11608 [inline] >> > __do_sys_perf_event_open+0x72c/0x2cb0 kernel/events/core.c:11724 >> > do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 >> > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > Right, so we hold the mutex fairly long there, supposedly to ensure > privs don't change out from under us. > > We do the permission checks early -- no point in doing anything else if > we're not allowed, but we then have to hold this mutex until the event > is actually installed according to that comment. > > /me goes look at git history: > > 6914303824bb5 - changed cred_guard_mutex into exec_update_mutex > 79c9ce57eb2d5 - introduces cred_guard_mutex > > So that latter commit explains the race we're guarding against. Without > this we can install the event after execve() which might have changed > privs on us. > > I'm open to suggestions on how to do this differently. > > Could we check privs twice instead? > > Something like the completely untested below.. That might work. I am thinking that for cases where we want to do significant work it might be better to ask the process to pause at someplace safe (probably get_signal) and then do all of the work when we know nothing is changing in the process. I don't really like the idea of checking and then checking again. We might have to do it but it feels like the model is wrong somewhere. Given that this is tricky to hit in practice, and given that I am already working the general problem of how to sort out the locking I am going to work this with the rest of the thorny issues of in exec. This feels like a case where the proper solution is that we simply need something better than a mutex. I had not realized before this how much setting up tracing in perf_even_open looks like attaching a debugger in ptrace_attach. I need to look at this some more but I suspect exec should be treating a tracer like exec currently treats a ptracer for purposes of permission checks. So I think we may have more issues than simply the possibility of deadlock on exec_update_mutex. Eric > --- > diff --git a/include/linux/freelist.h b/include/linux/freelist.h > new file mode 100644 > index 000000000000..e69de29bb2d1 > diff --git a/kernel/events/core.c b/kernel/events/core.c > index 5bfe8e3c6e44..14e6c9bbfcda 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -11701,21 +11701,9 @@ SYSCALL_DEFINE5(perf_event_open, > } > > if (task) { > - err = mutex_lock_interruptible(&task->signal->exec_update_mutex); > - if (err) > - goto err_task; > - > - /* > - * Preserve ptrace permission check for backwards compatibility. > - * > - * We must hold exec_update_mutex across this and any potential > - * perf_install_in_context() call for this new event to > - * serialize against exec() altering our credentials (and the > - * perf_event_exit_task() that could imply). > - */ > err = -EACCES; > if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) > - goto err_cred; > + goto err_task; > } > > if (flags & PERF_FLAG_PID_CGROUP) > @@ -11844,6 +11832,24 @@ SYSCALL_DEFINE5(perf_event_open, > goto err_context; > } > > + if (task) { > + err = mutex_lock_interruptible(&task->signal->exec_update_mutex); > + if (err) > + goto err_file; > + > + /* > + * Preserve ptrace permission check for backwards compatibility. > + * > + * We must hold exec_update_mutex across this and any potential > + * perf_install_in_context() call for this new event to > + * serialize against exec() altering our credentials (and the > + * perf_event_exit_task() that could imply). > + */ > + err = -EACCES; > + if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) > + goto err_cred; > + } > + > if (move_group) { > gctx = __perf_event_ctx_lock_double(group_leader, ctx); > > @@ -12019,7 +12025,10 @@ SYSCALL_DEFINE5(perf_event_open, > if (move_group) > perf_event_ctx_unlock(group_leader, gctx); > mutex_unlock(&ctx->mutex); > -/* err_file: */ > +err_cred: > + if (task) > + mutex_unlock(&task->signal->exec_update_mutex); > +err_file: > fput(event_file); > err_context: > perf_unpin_context(ctx); > @@ -12031,9 +12040,6 @@ SYSCALL_DEFINE5(perf_event_open, > */ > if (!event_file) > free_event(event); > -err_cred: > - if (task) > - mutex_unlock(&task->signal->exec_update_mutex); > err_task: > if (task) > put_task_struct(task); ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: possible deadlock in proc_pid_syscall (2) 2020-08-30 12:31 ` Eric W. Biederman @ 2020-08-31 7:43 ` peterz 2020-08-31 13:52 ` Eric W. Biederman 2020-09-02 9:57 ` peterz 1 sibling, 1 reply; 7+ messages in thread From: peterz @ 2020-08-31 7:43 UTC (permalink / raw) To: Eric W. Biederman Cc: syzbot, adobriyan, akpm, avagin, christian, gladkov.alexey, keescook, linux-fsdevel, linux-kernel, syzkaller-bugs, walken, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Miklos Szeredi, jannh On Sun, Aug 30, 2020 at 07:31:39AM -0500, Eric W. Biederman wrote: > I am thinking that for cases where we want to do significant work it > might be better to ask the process to pause at someplace safe (probably > get_signal) and then do all of the work when we know nothing is changing > in the process. > > I don't really like the idea of checking and then checking again. We > might have to do it but it feels like the model is wrong somewhere. > > Given that this is tricky to hit in practice, and given that I am > already working the general problem of how to sort out the locking I am > going to work this with the rest of the thorny issues of in exec. This > feels like a case where the proper solution is that we simply need > something better than a mutex. One possible alternative would be something RCU-like, surround the thing with get_task_cred() / put_cred() and then have commit_creds() wait for the usage of the old creds to drop to 0 before continuing. (Also, get_cred_rcu() is disgusting for casting away const) But this could be complete garbage, I'm not much familiar with any of thise code. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: possible deadlock in proc_pid_syscall (2) 2020-08-31 7:43 ` peterz @ 2020-08-31 13:52 ` Eric W. Biederman 0 siblings, 0 replies; 7+ messages in thread From: Eric W. Biederman @ 2020-08-31 13:52 UTC (permalink / raw) To: peterz Cc: syzbot, adobriyan, akpm, avagin, christian, gladkov.alexey, keescook, linux-fsdevel, linux-kernel, syzkaller-bugs, walken, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Miklos Szeredi, jannh peterz@infradead.org writes: > On Sun, Aug 30, 2020 at 07:31:39AM -0500, Eric W. Biederman wrote: > >> I am thinking that for cases where we want to do significant work it >> might be better to ask the process to pause at someplace safe (probably >> get_signal) and then do all of the work when we know nothing is changing >> in the process. >> >> I don't really like the idea of checking and then checking again. We >> might have to do it but it feels like the model is wrong somewhere. >> >> Given that this is tricky to hit in practice, and given that I am >> already working the general problem of how to sort out the locking I am >> going to work this with the rest of the thorny issues of in exec. This >> feels like a case where the proper solution is that we simply need >> something better than a mutex. > > One possible alternative would be something RCU-like, surround the thing > with get_task_cred() / put_cred() and then have commit_creds() wait for > the usage of the old creds to drop to 0 before continuing. > > (Also, get_cred_rcu() is disgusting for casting away const) > > But this could be complete garbage, I'm not much familiar with any of > thise code. This looks like an area of code that will take a couple of passes to get 100% right. Usually changing creds happens atomically, and separately from everything else so we simply don't care if there a race, Either the old creds or the new creds are valid. With exec the situation is trickier as several things in addition to the cred are changing at the same time. So a lock is needed. Now that it is separated from the cred_guard_mutex, probably the easiest solution is to make exec_update_mutex a sleeping reader writer lock. There are fewer cases that matter as such a lock would only block on exec (the writer). I don't understand perf well enough to do much without carefully studying the code. Eric ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: possible deadlock in proc_pid_syscall (2) 2020-08-30 12:31 ` Eric W. Biederman 2020-08-31 7:43 ` peterz @ 2020-09-02 9:57 ` peterz 1 sibling, 0 replies; 7+ messages in thread From: peterz @ 2020-09-02 9:57 UTC (permalink / raw) To: Eric W. Biederman Cc: syzbot, adobriyan, akpm, avagin, christian, gladkov.alexey, keescook, linux-fsdevel, linux-kernel, syzkaller-bugs, walken, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Miklos Szeredi, jannh On Sun, Aug 30, 2020 at 07:31:39AM -0500, Eric W. Biederman wrote: > peterz@infradead.org writes: > > Could we check privs twice instead? > > > > Something like the completely untested below.. > > That might work. > > I am thinking that for cases where we want to do significant work it > might be better to ask the process to pause at someplace safe (probably > get_signal) and then do all of the work when we know nothing is changing > in the process. > > I don't really like the idea of checking and then checking again. We > might have to do it but it feels like the model is wrong somewhere. Another possible aproach might be to grab a copy of the cred pointer and have the final install check that. It means we need to allow perf_install_in_context() to fail though. That might be a little more work. > I had not realized before this how much setting up tracing in > perf_even_open looks like attaching a debugger in ptrace_attach. Same problem; once you've attached a perf event you can observe much of what the task does. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-09-02 9:57 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-08-28 4:57 possible deadlock in proc_pid_syscall (2) syzbot 2020-08-28 12:01 ` Eric W. Biederman 2020-08-28 12:37 ` peterz 2020-08-30 12:31 ` Eric W. Biederman 2020-08-31 7:43 ` peterz 2020-08-31 13:52 ` Eric W. Biederman 2020-09-02 9:57 ` peterz
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).