* Re: kernel panic: corrupted stack end in dput [not found] <20190703064307.13740-1-hdanton@sina.com> @ 2019-07-03 14:40 ` Al Viro 2019-07-03 15:23 ` Al Viro 0 siblings, 1 reply; 7+ messages in thread From: Al Viro @ 2019-07-03 14:40 UTC (permalink / raw) To: Hillf Danton; +Cc: syzbot, linux-fsdevel, linux-kernel, netdev, syzkaller-bugs On Wed, Jul 03, 2019 at 02:43:07PM +0800, Hillf Danton wrote: > > This is very much *NOT* fine. > > 1) trylock can fail from any number of reasons, starting > > with "somebody is going through the hash chain doing a lookup on > > something completely unrelated" > > They are also a red light that we need to bail out of spiraling up > the directory hierarchy imho. Translation: "let's leak the reference to parent, shall we?" > > 2) whoever had been holding the lock and whatever they'd > > been doing might be over right after we get the return value from > > spin_trylock(). > > Or after we send a mail using git. I don't know. > > > 3) even had that been really somebody adding children in > > the same parent *AND* even if they really kept doing that, rather > > than unlocking and buggering off, would you care to explain why > > dentry_unlist() called by __dentry_kill() and removing the victim > > from the list of children would be safe to do in parallel with that? > > > My bad. I have to walk around that unsafety. WHAT unsafety? Can you explain what are you seeing and how to reproduce it, whatever it is? ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: kernel panic: corrupted stack end in dput 2019-07-03 14:40 ` kernel panic: corrupted stack end in dput Al Viro @ 2019-07-03 15:23 ` Al Viro 2019-07-03 15:45 ` Eric Biggers 0 siblings, 1 reply; 7+ messages in thread From: Al Viro @ 2019-07-03 15:23 UTC (permalink / raw) To: Hillf Danton; +Cc: syzbot, linux-fsdevel, linux-kernel, netdev, syzkaller-bugs On Wed, Jul 03, 2019 at 03:40:00PM +0100, Al Viro wrote: > On Wed, Jul 03, 2019 at 02:43:07PM +0800, Hillf Danton wrote: > > > > This is very much *NOT* fine. > > > 1) trylock can fail from any number of reasons, starting > > > with "somebody is going through the hash chain doing a lookup on > > > something completely unrelated" > > > > They are also a red light that we need to bail out of spiraling up > > the directory hierarchy imho. > > Translation: "let's leak the reference to parent, shall we?" > > > > 2) whoever had been holding the lock and whatever they'd > > > been doing might be over right after we get the return value from > > > spin_trylock(). > > > > Or after we send a mail using git. I don't know. > > > > > 3) even had that been really somebody adding children in > > > the same parent *AND* even if they really kept doing that, rather > > > than unlocking and buggering off, would you care to explain why > > > dentry_unlist() called by __dentry_kill() and removing the victim > > > from the list of children would be safe to do in parallel with that? > > > > > My bad. I have to walk around that unsafety. > > WHAT unsafety? Can you explain what are you seeing and how to > reproduce it, whatever it is? BTW, what makes you think that it's something inside dput() itself? All I see is that at some point in the beginning of the loop body in dput() we observe a buggered stack. Is that the first iteration through the loop? IOW, is that just the place where we first notice preexisting corruption, or is that something the code called from that loop does? If it's a stack overflow, I would be very surprised to see it here - dput() is iterative and it's called on a very shallow stack in those traces. What happens if you e.g. turn that dput(dentry); in __fput() into rcu_read_lock(); rcu_read_unlock(); // trigger the check dput(dentry); and run your reporducer? ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: kernel panic: corrupted stack end in dput 2019-07-03 15:23 ` Al Viro @ 2019-07-03 15:45 ` Eric Biggers 2019-07-03 16:14 ` John Fastabend 0 siblings, 1 reply; 7+ messages in thread From: Eric Biggers @ 2019-07-03 15:45 UTC (permalink / raw) To: Al Viro Cc: Hillf Danton, syzbot, linux-fsdevel, linux-kernel, netdev, syzkaller-bugs, Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song, bpf, Boris Pismenny, Aviad Yehezkel, Dave Watson, John Fastabend [+bpf and tls maintainers] On Wed, Jul 03, 2019 at 04:23:34PM +0100, Al Viro wrote: > On Wed, Jul 03, 2019 at 03:40:00PM +0100, Al Viro wrote: > > On Wed, Jul 03, 2019 at 02:43:07PM +0800, Hillf Danton wrote: > > > > > > This is very much *NOT* fine. > > > > 1) trylock can fail from any number of reasons, starting > > > > with "somebody is going through the hash chain doing a lookup on > > > > something completely unrelated" > > > > > > They are also a red light that we need to bail out of spiraling up > > > the directory hierarchy imho. > > > > Translation: "let's leak the reference to parent, shall we?" > > > > > > 2) whoever had been holding the lock and whatever they'd > > > > been doing might be over right after we get the return value from > > > > spin_trylock(). > > > > > > Or after we send a mail using git. I don't know. > > > > > > > 3) even had that been really somebody adding children in > > > > the same parent *AND* even if they really kept doing that, rather > > > > than unlocking and buggering off, would you care to explain why > > > > dentry_unlist() called by __dentry_kill() and removing the victim > > > > from the list of children would be safe to do in parallel with that? > > > > > > > My bad. I have to walk around that unsafety. > > > > WHAT unsafety? Can you explain what are you seeing and how to > > reproduce it, whatever it is? > > BTW, what makes you think that it's something inside dput() itself? > All I see is that at some point in the beginning of the loop body > in dput() we observe a buggered stack. > > Is that the first iteration through the loop? IOW, is that just > the place where we first notice preexisting corruption, or is > that something the code called from that loop does? If it's > a stack overflow, I would be very surprised to see it here - > dput() is iterative and it's called on a very shallow stack in > those traces. > > What happens if you e.g. turn that > dput(dentry); > in __fput() into > rcu_read_lock(); rcu_read_unlock(); // trigger the check > dput(dentry); > > and run your reporducer? > Please don't waste your time on this, it looks like just another report from the massive memory corruption in BPF and/or TLS. Look at reproducer: bpf$MAP_CREATE(0x0, &(0x7f0000000280)={0xf, 0x4, 0x4, 0x400, 0x0, 0x1}, 0x3c) socket$rxrpc(0x21, 0x2, 0x800000000a) r0 = socket$inet6_tcp(0xa, 0x1, 0x0) setsockopt$inet6_tcp_int(r0, 0x6, 0x13, &(0x7f00000000c0)=0x100000001, 0x1d4) connect$inet6(r0, &(0x7f0000000140), 0x1c) bpf$MAP_CREATE(0x0, &(0x7f0000000000)={0x5}, 0xfffffffffffffdcb) bpf$MAP_CREATE(0x2, &(0x7f0000003000)={0x3, 0x0, 0x77fffb, 0x0, 0x10020000000, 0x0}, 0x2c) setsockopt$inet6_tcp_TCP_ULP(r0, 0x6, 0x1f, &(0x7f0000000040)='tls\x00', 0x4) It's the same as like 20 other syzbot reports. - Eric ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: kernel panic: corrupted stack end in dput 2019-07-03 15:45 ` Eric Biggers @ 2019-07-03 16:14 ` John Fastabend 0 siblings, 0 replies; 7+ messages in thread From: John Fastabend @ 2019-07-03 16:14 UTC (permalink / raw) To: Eric Biggers, Al Viro Cc: Hillf Danton, syzbot, linux-fsdevel, linux-kernel, netdev, syzkaller-bugs, Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song, bpf, Boris Pismenny, Aviad Yehezkel, Dave Watson, John Fastabend Eric Biggers wrote: > [+bpf and tls maintainers] > > On Wed, Jul 03, 2019 at 04:23:34PM +0100, Al Viro wrote: > > On Wed, Jul 03, 2019 at 03:40:00PM +0100, Al Viro wrote: > > > On Wed, Jul 03, 2019 at 02:43:07PM +0800, Hillf Danton wrote: > > > > > > > > This is very much *NOT* fine. > > > > > 1) trylock can fail from any number of reasons, starting > > > > > with "somebody is going through the hash chain doing a lookup on > > > > > something completely unrelated" > > > > > > > > They are also a red light that we need to bail out of spiraling up > > > > the directory hierarchy imho. > > > > > > Translation: "let's leak the reference to parent, shall we?" > > > > > > > > 2) whoever had been holding the lock and whatever they'd > > > > > been doing might be over right after we get the return value from > > > > > spin_trylock(). > > > > > > > > Or after we send a mail using git. I don't know. > > > > > > > > > 3) even had that been really somebody adding children in > > > > > the same parent *AND* even if they really kept doing that, rather > > > > > than unlocking and buggering off, would you care to explain why > > > > > dentry_unlist() called by __dentry_kill() and removing the victim > > > > > from the list of children would be safe to do in parallel with that? > > > > > > > > > My bad. I have to walk around that unsafety. > > > > > > WHAT unsafety? Can you explain what are you seeing and how to > > > reproduce it, whatever it is? > > > > BTW, what makes you think that it's something inside dput() itself? > > All I see is that at some point in the beginning of the loop body > > in dput() we observe a buggered stack. > > > > Is that the first iteration through the loop? IOW, is that just > > the place where we first notice preexisting corruption, or is > > that something the code called from that loop does? If it's > > a stack overflow, I would be very surprised to see it here - > > dput() is iterative and it's called on a very shallow stack in > > those traces. > > > > What happens if you e.g. turn that > > dput(dentry); > > in __fput() into > > rcu_read_lock(); rcu_read_unlock(); // trigger the check > > dput(dentry); > > > > and run your reporducer? > > > > Please don't waste your time on this, it looks like just another report from the > massive memory corruption in BPF and/or TLS. Look at reproducer: > > bpf$MAP_CREATE(0x0, &(0x7f0000000280)={0xf, 0x4, 0x4, 0x400, 0x0, 0x1}, 0x3c) > socket$rxrpc(0x21, 0x2, 0x800000000a) > r0 = socket$inet6_tcp(0xa, 0x1, 0x0) > setsockopt$inet6_tcp_int(r0, 0x6, 0x13, &(0x7f00000000c0)=0x100000001, 0x1d4) > connect$inet6(r0, &(0x7f0000000140), 0x1c) > bpf$MAP_CREATE(0x0, &(0x7f0000000000)={0x5}, 0xfffffffffffffdcb) > bpf$MAP_CREATE(0x2, &(0x7f0000003000)={0x3, 0x0, 0x77fffb, 0x0, 0x10020000000, 0x0}, 0x2c) > setsockopt$inet6_tcp_TCP_ULP(r0, 0x6, 0x1f, &(0x7f0000000040)='tls\x00', 0x4) > > It's the same as like 20 other syzbot reports. There is a missing synchronize_rcu we need to add and we have a race between map_free and tls close at the moment. The race cuases us to incorrectly set the sk->prot pointers when tls socket is closed in this case. I've added a hook to the ULP side now that should let the map_free reset the saved sk->prot pointers on the TLS side and am testing this now. The 20 syzbot reports appear to all be due to these two issues. This has nothing to do with dput(). Thanks, John > > - Eric ^ permalink raw reply [flat|nested] 7+ messages in thread
* kernel panic: corrupted stack end in dput @ 2019-07-01 8:27 syzbot 2019-07-01 12:14 ` syzbot 2019-07-02 13:21 ` Al Viro 0 siblings, 2 replies; 7+ messages in thread From: syzbot @ 2019-07-01 8:27 UTC (permalink / raw) To: linux-fsdevel, linux-kernel, netdev, syzkaller-bugs, viro Hello, syzbot found the following crash on: HEAD commit: 7b75e49d net: dsa: mv88e6xxx: wait after reset deactivation git tree: net console output: https://syzkaller.appspot.com/x/log.txt?x=10f51b13a00000 kernel config: https://syzkaller.appspot.com/x/.config?x=e7c31a94f66cc0aa dashboard link: https://syzkaller.appspot.com/bug?extid=d88a977731a9888db7ba compiler: gcc (GCC) 9.0.0 20181231 (experimental) syz repro: https://syzkaller.appspot.com/x/repro.syz?x=130f49bda00000 IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+d88a977731a9888db7ba@syzkaller.appspotmail.com Kernel panic - not syncing: corrupted stack end detected inside scheduler CPU: 1 PID: 8936 Comm: syz-executor.3 Not tainted 5.2.0-rc6+ #69 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 panic+0x2cb/0x744 kernel/panic.c:219 schedule_debug kernel/sched/core.c:3272 [inline] __schedule+0x155d/0x1560 kernel/sched/core.c:3381 preempt_schedule_notrace kernel/sched/core.c:3664 [inline] preempt_schedule_notrace+0xa0/0x130 kernel/sched/core.c:3635 ___preempt_schedule_notrace+0x16/0x2f rcu_is_watching+0x23/0x30 kernel/rcu/tree.c:873 rcu_read_lock include/linux/rcupdate.h:594 [inline] dput+0x41e/0x690 fs/dcache.c:845 __fput+0x424/0x890 fs/file_table.c:293 ____fput+0x16/0x20 fs/file_table.c:313 task_work_run+0x145/0x1c0 kernel/task_work.c:113 tracehook_notify_resume include/linux/tracehook.h:185 [inline] exit_to_usermode_loop+0x273/0x2c0 arch/x86/entry/common.c:168 prepare_exit_to_usermode arch/x86/entry/common.c:199 [inline] syscall_return_slowpath arch/x86/entry/common.c:279 [inline] do_syscall_64+0x58e/0x680 arch/x86/entry/common.c:304 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x413201 Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01 RSP: 002b:00007ffcee6f8e20 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000413201 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005 RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff R10: 00007ffcee6f8f00 R11: 0000000000000293 R12: 000000000075c9a0 R13: 000000000075c9a0 R14: 00000000000127ed R15: ffffffffffffffff Shutting down cpus with NMI Kernel Offset: disabled Rebooting in 86400 seconds.. --- This bug is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this bug report. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. syzbot can test patches for this bug, for details see: https://goo.gl/tpsmEJ#testing-patches ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: kernel panic: corrupted stack end in dput 2019-07-01 8:27 syzbot @ 2019-07-01 12:14 ` syzbot 2019-07-02 13:21 ` Al Viro 1 sibling, 0 replies; 7+ messages in thread From: syzbot @ 2019-07-01 12:14 UTC (permalink / raw) To: ast, daniel, john.fastabend, linux-fsdevel, linux-kernel, netdev, syzkaller-bugs, viro syzbot has bisected this bug to: commit e9db4ef6bf4ca9894bb324c76e01b8f1a16b2650 Author: John Fastabend <john.fastabend@gmail.com> Date: Sat Jun 30 13:17:47 2018 +0000 bpf: sockhash fix omitted bucket lock in sock_close bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=17f7f4d5a00000 start commit: 7b75e49d net: dsa: mv88e6xxx: wait after reset deactivation git tree: net final crash: https://syzkaller.appspot.com/x/report.txt?x=140ff4d5a00000 console output: https://syzkaller.appspot.com/x/log.txt?x=100ff4d5a00000 kernel config: https://syzkaller.appspot.com/x/.config?x=e7c31a94f66cc0aa dashboard link: https://syzkaller.appspot.com/bug?extid=d88a977731a9888db7ba syz repro: https://syzkaller.appspot.com/x/repro.syz?x=130f49bda00000 Reported-by: syzbot+d88a977731a9888db7ba@syzkaller.appspotmail.com Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close") For information about bisection process see: https://goo.gl/tpsmEJ#bisection ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: kernel panic: corrupted stack end in dput 2019-07-01 8:27 syzbot 2019-07-01 12:14 ` syzbot @ 2019-07-02 13:21 ` Al Viro 1 sibling, 0 replies; 7+ messages in thread From: Al Viro @ 2019-07-02 13:21 UTC (permalink / raw) To: Hillf Danton; +Cc: syzbot, linux-fsdevel, linux-kernel, netdev, syzkaller-bugs On Tue, Jul 02, 2019 at 05:21:26PM +0800, Hillf Danton wrote: > Hello, > --- a/fs/dcache.c > +++ b/fs/dcache.c > @@ -673,14 +673,11 @@ static struct dentry *dentry_kill(struct dentry *dentry) > if (!IS_ROOT(dentry)) { > parent = dentry->d_parent; > if (unlikely(!spin_trylock(&parent->d_lock))) { > - parent = __lock_parent(dentry); > - if (likely(inode || !dentry->d_inode)) > - goto got_locks; > - /* negative that became positive */ > - if (parent) > - spin_unlock(&parent->d_lock); > - inode = dentry->d_inode; > - goto slow_positive; > + /* > + * fine if peer is busy either populating or > + * cleaning up parent > + */ > + parent = NULL; > } > } > __dentry_kill(dentry); This is very much *NOT* fine. 1) trylock can fail from any number of reasons, starting with "somebody is going through the hash chain doing a lookup on something completely unrelated" 2) whoever had been holding the lock and whatever they'd been doing might be over right after we get the return value from spin_trylock(). 3) even had that been really somebody adding children in the same parent *AND* even if they really kept doing that, rather than unlocking and buggering off, would you care to explain why dentry_unlist() called by __dentry_kill() and removing the victim from the list of children would be safe to do in parallel with that? NAK, in case it's not obvious from the above. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2019-07-03 16:14 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <20190703064307.13740-1-hdanton@sina.com> 2019-07-03 14:40 ` kernel panic: corrupted stack end in dput Al Viro 2019-07-03 15:23 ` Al Viro 2019-07-03 15:45 ` Eric Biggers 2019-07-03 16:14 ` John Fastabend 2019-07-01 8:27 syzbot 2019-07-01 12:14 ` syzbot 2019-07-02 13:21 ` Al Viro
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.