* [syzbot] BUG: sleeping function called from invalid context in static_key_slow_inc @ 2022-11-17 9:55 syzbot 2022-11-17 12:03 ` syzbot 2022-11-18 1:56 ` syzbot 0 siblings, 2 replies; 11+ messages in thread From: syzbot @ 2022-11-17 9:55 UTC (permalink / raw) To: Jason, frederic, juri.lelli, kirill.shutemov, linux-kernel, mark.rutland, netdev, peterz, sathyanarayanan.kuppuswamy, steven.price, syzkaller-bugs, tglx Hello, syzbot found the following issue on: HEAD commit: 064bc7312bd0 netdevsim: Fix memory leak of nsim_dev->fa_co.. git tree: net console output: https://syzkaller.appspot.com/x/log.txt?x=13d3204e880000 kernel config: https://syzkaller.appspot.com/x/.config?x=a33ac7bbc22a8c35 dashboard link: https://syzkaller.appspot.com/bug?extid=703d9e154b3b58277261 compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2 Unfortunately, I don't have any reproducer for this issue yet. Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/0634e1c0e4cb/disk-064bc731.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/fe1039d2de22/vmlinux-064bc731.xz kernel image: https://storage.googleapis.com/syzbot-assets/5a0d673875fa/bzImage-064bc731.xz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+703d9e154b3b58277261@syzkaller.appspotmail.com BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9420, name: syz-executor.5 preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 INFO: lockdep is turned off. Preemption disabled at: [<0000000000000000>] 0x0 CPU: 1 PID: 9420 Comm: syz-executor.5 Not tainted 6.1.0-rc4-syzkaller-00212-g064bc7312bd0 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106 __might_resched.cold+0x222/0x26b kernel/sched/core.c:9890 percpu_down_read include/linux/percpu-rwsem.h:49 [inline] cpus_read_lock+0x1b/0x140 kernel/cpu.c:310 static_key_slow_inc+0x12/0x20 kernel/jump_label.c:158 udp_tunnel_encap_enable include/net/udp_tunnel.h:187 [inline] setup_udp_tunnel_sock+0x43d/0x550 net/ipv4/udp_tunnel_core.c:81 l2tp_tunnel_register+0xc51/0x1210 net/l2tp/l2tp_core.c:1509 pppol2tp_connect+0xcdc/0x1a10 net/l2tp/l2tp_ppp.c:723 __sys_connect_file+0x153/0x1a0 net/socket.c:1976 __sys_connect+0x165/0x1a0 net/socket.c:1993 __do_sys_connect net/socket.c:2003 [inline] __se_sys_connect net/socket.c:2000 [inline] __x64_sys_connect+0x73/0xb0 net/socket.c:2000 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fd94d28b639 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fd94e038168 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 00007fd94d3abf80 RCX: 00007fd94d28b639 RDX: 000000000000002e RSI: 0000000020000000 RDI: 0000000000000003 RBP: 00007fd94d2e6ae9 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffde93128bf R14: 00007fd94e038300 R15: 0000000000022000 </TASK> --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] BUG: sleeping function called from invalid context in static_key_slow_inc 2022-11-17 9:55 [syzbot] BUG: sleeping function called from invalid context in static_key_slow_inc syzbot @ 2022-11-17 12:03 ` syzbot 2022-11-18 1:56 ` syzbot 1 sibling, 0 replies; 11+ messages in thread From: syzbot @ 2022-11-17 12:03 UTC (permalink / raw) To: Jason, davem, edumazet, frederic, jacob.e.keller, jiri, juri.lelli, kirill.shutemov, kuba, linux-kernel, mark.rutland, netdev, nicolas.dichtel, pabeni, paul, peterz, razor, sathyanarayanan.kuppuswamy, steven.price, syzkaller-bugs, tglx syzbot has found a reproducer for the following issue on: HEAD commit: 064bc7312bd0 netdevsim: Fix memory leak of nsim_dev->fa_co.. git tree: net console+strace: https://syzkaller.appspot.com/x/log.txt?x=16b2b231880000 kernel config: https://syzkaller.appspot.com/x/.config?x=a33ac7bbc22a8c35 dashboard link: https://syzkaller.appspot.com/bug?extid=703d9e154b3b58277261 compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13cd2f79880000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=109e1695880000 Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/0634e1c0e4cb/disk-064bc731.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/fe1039d2de22/vmlinux-064bc731.xz kernel image: https://storage.googleapis.com/syzbot-assets/5a0d673875fa/bzImage-064bc731.xz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+703d9e154b3b58277261@syzkaller.appspotmail.com BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3634, name: syz-executor167 preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 3 locks held by syz-executor167/3634: #0: ffffffff8df6b530 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:860 #1: ffffffff8df6b5e8 (genl_mutex){+.+.}-{3:3}, at: genl_lock net/netlink/genetlink.c:33 [inline] #1: ffffffff8df6b5e8 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x50d/0x780 net/netlink/genetlink.c:848 #2: ffff8880182fa0b8 (k-clock-AF_INET){+++.}-{2:2}, at: l2tp_tunnel_register+0x126/0x1210 net/l2tp/l2tp_core.c:1477 Preemption disabled at: [<0000000000000000>] 0x0 CPU: 1 PID: 3634 Comm: syz-executor167 Not tainted 6.1.0-rc4-syzkaller-00212-g064bc7312bd0 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106 __might_resched.cold+0x222/0x26b kernel/sched/core.c:9890 percpu_down_read include/linux/percpu-rwsem.h:49 [inline] cpus_read_lock+0x1b/0x140 kernel/cpu.c:310 static_key_slow_inc+0x12/0x20 kernel/jump_label.c:158 udp_tunnel_encap_enable include/net/udp_tunnel.h:189 [inline] setup_udp_tunnel_sock+0x3e1/0x550 net/ipv4/udp_tunnel_core.c:81 l2tp_tunnel_register+0xc51/0x1210 net/l2tp/l2tp_core.c:1509 l2tp_nl_cmd_tunnel_create+0x3d6/0x8b0 net/l2tp/l2tp_netlink.c:245 genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:756 genl_family_rcv_msg net/netlink/genetlink.c:833 [inline] genl_rcv_msg+0x445/0x780 net/netlink/genetlink.c:850 netlink_rcv_skb+0x157/0x430 net/netlink/af_netlink.c:2540 genl_rcv+0x28/0x40 net/netlink/genetlink.c:861 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 sock_no_sendpage+0x10c/0x160 net/core/sock.c:3219 kernel_sendpage.part.0+0x1d5/0x700 net/socket.c:3561 kernel_sendpage net/socket.c:3558 [inline] sock_sendpage+0xe3/0x140 net/socket.c:1054 pipe_to_sendpage+0x2b1/0x380 fs/splice.c:361 splice_from_pipe_feed fs/splice.c:415 [inline] __splice_from_pipe+0x449/0x8a0 fs/splice.c:559 splice_from_pipe fs/splice.c:594 [inline] generic_splice_sendpage+0xd8/0x140 fs/splice.c:743 do_splice_from fs/splice.c:764 [inline] direct_splice_actor+0x114/0x180 fs/splice.c:931 splice_direct_to_actor+0x335/0x8a0 fs/splice.c:886 do_splice_direct+0x1ab/0x280 fs/splice.c:974 do_sendfile+0xb19/0x1270 fs/read_write.c:1255 __do_sys_sendfile64 fs/read_write.c:1323 [inline] __se_sys_sendfile64 fs/read_write.c:1309 [inline] __x64_sys_sendfile64+0x1d0/0x210 fs/read_write.c:1309 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f93d1567cb9 Code: 28 c3 e8 5a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffdd8ae4a88 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f93d1567cb9 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005 RBP: 00007f93d152b680 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000100000000 R11: 0000000000000246 R12: 00007f93d152b710 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] BUG: sleeping function called from invalid context in static_key_slow_inc 2022-11-17 9:55 [syzbot] BUG: sleeping function called from invalid context in static_key_slow_inc syzbot 2022-11-17 12:03 ` syzbot @ 2022-11-18 1:56 ` syzbot 2022-11-18 11:51 ` [PATCH 6.1-rc6] l2tp: call udp_tunnel_encap_enable() and sock_release() without sk_callback_lock Tetsuo Handa 1 sibling, 1 reply; 11+ messages in thread From: syzbot @ 2022-11-18 1:56 UTC (permalink / raw) To: Jason, davem, edumazet, frederic, gnault, jacob.e.keller, jakub, jiri, johannes, juri.lelli, kirill.shutemov, kuba, linux-kernel, mark.rutland, netdev, nicolas.dichtel, pabeni, paul, peterz, razor, sathyanarayanan.kuppuswamy, steven.price, syzkaller-bugs, tglx, tparkin syzbot has bisected this issue to: commit b68777d54fac21fc833ec26ea1a2a84f975ab035 Author: Jakub Sitnicki <jakub@cloudflare.com> Date: Mon Nov 14 19:16:19 2022 +0000 l2tp: Serialize access to sk_user_data with sk_callback_lock bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1600bb49880000 start commit: 064bc7312bd0 netdevsim: Fix memory leak of nsim_dev->fa_co.. git tree: net final oops: https://syzkaller.appspot.com/x/report.txt?x=1500bb49880000 console output: https://syzkaller.appspot.com/x/log.txt?x=1100bb49880000 kernel config: https://syzkaller.appspot.com/x/.config?x=a33ac7bbc22a8c35 dashboard link: https://syzkaller.appspot.com/bug?extid=703d9e154b3b58277261 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13cd2f79880000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=109e1695880000 Reported-by: syzbot+703d9e154b3b58277261@syzkaller.appspotmail.com Fixes: b68777d54fac ("l2tp: Serialize access to sk_user_data with sk_callback_lock") For information about bisection process see: https://goo.gl/tpsmEJ#bisection ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 6.1-rc6] l2tp: call udp_tunnel_encap_enable() and sock_release() without sk_callback_lock 2022-11-18 1:56 ` syzbot @ 2022-11-18 11:51 ` Tetsuo Handa 2022-11-18 12:36 ` Eric Dumazet 0 siblings, 1 reply; 11+ messages in thread From: Tetsuo Handa @ 2022-11-18 11:51 UTC (permalink / raw) To: David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Hideaki YOSHIFUJI, David Ahern, Jakub Sitnicki, Tom Parkin Cc: syzbot, netdev, syzkaller-bugs, Haowei Yan syzbot is reporting sleep in atomic context at l2tp_tunnel_register() [1], for commit b68777d54fac ("l2tp: Serialize access to sk_user_data with sk_callback_lock") missed that udp_tunnel_encap_enable() from setup_udp_tunnel_sock() might sleep. Since we don't want to drop sk->sk_callback_lock inside setup_udp_tunnel_sock() right before calling udp_tunnel_encap_enable(), introduce a variant which does not call udp_tunnel_encap_enable(). And call udp_tunnel_encap_enable() after dropping sk->sk_callback_lock. Also, drop sk->sk_callback_lock before calling sock_release() in order to avoid circular locking dependency problem. Link: https://syzkaller.appspot.com/bug?extid=703d9e154b3b58277261 [1] Reported-by: syzbot <syzbot+703d9e154b3b58277261@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: b68777d54fac ("l2tp: Serialize access to sk_user_data with sk_callback_lock") --- F.Y.I. Below is the lockdep message: ====================================================== WARNING: possible circular locking dependency detected 6.1.0-rc5+ #2 Not tainted ------------------------------------------------------ a.out/2794 is trying to acquire lock: ffff8c628878bdf0 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: sk_common_release+0x19/0xe0 but task is already holding lock: ffff8c628878c078 (k-clock-AF_INET){+++.}-{2:2}, at: l2tp_tunnel_register+0x64/0x5e0 [l2tp_core] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (k-clock-AF_INET){+++.}-{2:2}: lock_acquire+0xc7/0x2e0 _raw_read_lock_bh+0x3d/0x80 sock_i_uid+0x19/0x40 udp_lib_lport_inuse+0x2c/0x120 udp_lib_get_port+0xf8/0x570 udp_v4_get_port+0xbb/0xc0 __inet_bind+0x10e/0x240 inet_bind+0x2b/0x40 kernel_bind+0xb/0x10 udp_sock_create4+0x97/0x160 [udp_tunnel] l2tp_tunnel_sock_create+0x316/0x330 [l2tp_core] l2tp_tunnel_register+0x394/0x5e0 [l2tp_core] l2tp_nl_cmd_tunnel_create+0xe8/0x200 [l2tp_netlink] genl_family_rcv_msg_doit.isra.17+0x102/0x140 genl_rcv_msg+0x112/0x270 netlink_rcv_skb+0x4f/0x100 genl_rcv+0x23/0x40 netlink_unicast+0x1a5/0x280 netlink_sendmsg+0x22f/0x490 sock_sendmsg+0x2e/0x40 ____sys_sendmsg+0x1e9/0x210 ___sys_sendmsg+0x77/0xb0 __sys_sendmsg+0x60/0xb0 __x64_sys_sendmsg+0x1a/0x20 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd -> #1 (&table->hash[i].lock){+...}-{2:2}: lock_acquire+0xc7/0x2e0 _raw_spin_lock_bh+0x31/0x40 udp_lib_get_port+0xda/0x570 udp_v4_get_port+0xbb/0xc0 __inet_bind+0x10e/0x240 inet_bind+0x2b/0x40 kernel_bind+0xb/0x10 udp_sock_create4+0x97/0x160 [udp_tunnel] l2tp_tunnel_sock_create+0x316/0x330 [l2tp_core] l2tp_tunnel_register+0x394/0x5e0 [l2tp_core] l2tp_nl_cmd_tunnel_create+0xe8/0x200 [l2tp_netlink] genl_family_rcv_msg_doit.isra.17+0x102/0x140 genl_rcv_msg+0x112/0x270 netlink_rcv_skb+0x4f/0x100 genl_rcv+0x23/0x40 netlink_unicast+0x1a5/0x280 netlink_sendmsg+0x22f/0x490 sock_sendmsg+0x2e/0x40 ____sys_sendmsg+0x1e9/0x210 ___sys_sendmsg+0x77/0xb0 __sys_sendmsg+0x60/0xb0 __x64_sys_sendmsg+0x1a/0x20 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd -> #0 (k-sk_lock-AF_INET){+.+.}-{0:0}: check_prevs_add+0x16a/0x1070 __lock_acquire+0x11bd/0x1670 lock_acquire+0xc7/0x2e0 udp_destroy_sock+0x2d/0xd0 sk_common_release+0x19/0xe0 udp_lib_close+0x9/0x10 inet_release+0x2e/0x60 __sock_release+0x7e/0xa0 sock_release+0xb/0x10 l2tp_tunnel_register+0x3f1/0x5e0 [l2tp_core] l2tp_nl_cmd_tunnel_create+0xe8/0x200 [l2tp_netlink] genl_family_rcv_msg_doit.isra.17+0x102/0x140 genl_rcv_msg+0x112/0x270 netlink_rcv_skb+0x4f/0x100 genl_rcv+0x23/0x40 netlink_unicast+0x1a5/0x280 netlink_sendmsg+0x22f/0x490 sock_sendmsg+0x2e/0x40 ____sys_sendmsg+0x1e9/0x210 ___sys_sendmsg+0x77/0xb0 __sys_sendmsg+0x60/0xb0 __x64_sys_sendmsg+0x1a/0x20 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd other info that might help us debug this: Chain exists of: k-sk_lock-AF_INET --> &table->hash[i].lock --> k-clock-AF_INET Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(k-clock-AF_INET); lock(&table->hash[i].lock); lock(k-clock-AF_INET); lock(k-sk_lock-AF_INET); *** DEADLOCK *** 3 locks held by a.out/2794: #0: ffffffffb466fc30 (cb_lock){++++}-{3:3}, at: genl_rcv+0x14/0x40 #1: ffffffffb466fcc8 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x14d/0x270 #2: ffff8c628878c078 (k-clock-AF_INET){+++.}-{2:2}, at: l2tp_tunnel_register+0x64/0x5e0 [l2tp_core] include/net/udp_tunnel.h | 2 ++ net/ipv4/udp_tunnel_core.c | 10 ++++++++-- net/l2tp/l2tp_core.c | 10 +++++----- 3 files changed, 15 insertions(+), 7 deletions(-) diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h index 72394f441dad..a84fa57bc750 100644 --- a/include/net/udp_tunnel.h +++ b/include/net/udp_tunnel.h @@ -92,6 +92,8 @@ struct udp_tunnel_sock_cfg { /* Setup the given (UDP) sock to receive UDP encapsulated packets */ void setup_udp_tunnel_sock(struct net *net, struct socket *sock, struct udp_tunnel_sock_cfg *sock_cfg); +void setup_udp_tunnel_sock_no_enable(struct net *net, struct socket *sock, + struct udp_tunnel_sock_cfg *sock_cfg); /* -- List of parsable UDP tunnel types -- * diff --git a/net/ipv4/udp_tunnel_core.c b/net/ipv4/udp_tunnel_core.c index 8242c8947340..dff825664000 100644 --- a/net/ipv4/udp_tunnel_core.c +++ b/net/ipv4/udp_tunnel_core.c @@ -57,8 +57,8 @@ int udp_sock_create4(struct net *net, struct udp_port_cfg *cfg, } EXPORT_SYMBOL(udp_sock_create4); -void setup_udp_tunnel_sock(struct net *net, struct socket *sock, - struct udp_tunnel_sock_cfg *cfg) +void setup_udp_tunnel_sock_no_enable(struct net *net, struct socket *sock, + struct udp_tunnel_sock_cfg *cfg) { struct sock *sk = sock->sk; @@ -77,7 +77,13 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock, udp_sk(sk)->encap_destroy = cfg->encap_destroy; udp_sk(sk)->gro_receive = cfg->gro_receive; udp_sk(sk)->gro_complete = cfg->gro_complete; +} +EXPORT_SYMBOL_GPL(setup_udp_tunnel_sock_no_enable); +void setup_udp_tunnel_sock(struct net *net, struct socket *sock, + struct udp_tunnel_sock_cfg *cfg) +{ + setup_udp_tunnel_sock_no_enable(net, sock, cfg); udp_tunnel_encap_enable(sock); } EXPORT_SYMBOL_GPL(setup_udp_tunnel_sock); diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c index 754fdda8a5f5..a4f611196c83 100644 --- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -1506,7 +1506,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, .encap_destroy = l2tp_udp_encap_destroy, }; - setup_udp_tunnel_sock(net, sock, &udp_cfg); + setup_udp_tunnel_sock_no_enable(net, sock, &udp_cfg); } else { rcu_assign_sk_user_data(sk, tunnel); } @@ -1519,19 +1519,19 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, trace_register_tunnel(tunnel); + write_unlock(&sk->sk_callback_lock); + if (tunnel->encap == L2TP_ENCAPTYPE_UDP) + udp_tunnel_encap_enable(sock); if (tunnel->fd >= 0) sockfd_put(sock); - - write_unlock(&sk->sk_callback_lock); return 0; err_sock: + write_unlock(&sk->sk_callback_lock); if (tunnel->fd < 0) sock_release(sock); else sockfd_put(sock); - - write_unlock(&sk->sk_callback_lock); err: return ret; } -- 2.18.4 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 6.1-rc6] l2tp: call udp_tunnel_encap_enable() and sock_release() without sk_callback_lock 2022-11-18 11:51 ` [PATCH 6.1-rc6] l2tp: call udp_tunnel_encap_enable() and sock_release() without sk_callback_lock Tetsuo Handa @ 2022-11-18 12:36 ` Eric Dumazet 2022-11-18 13:19 ` Tetsuo Handa ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: Eric Dumazet @ 2022-11-18 12:36 UTC (permalink / raw) To: Tetsuo Handa Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Hideaki YOSHIFUJI, David Ahern, Jakub Sitnicki, Tom Parkin, syzbot, netdev, syzkaller-bugs, Haowei Yan On Fri, Nov 18, 2022 at 3:51 AM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > syzbot is reporting sleep in atomic context at l2tp_tunnel_register() [1], > for commit b68777d54fac ("l2tp: Serialize access to sk_user_data with > sk_callback_lock") missed that udp_tunnel_encap_enable() from > setup_udp_tunnel_sock() might sleep. > > Since we don't want to drop sk->sk_callback_lock inside > setup_udp_tunnel_sock() right before calling udp_tunnel_encap_enable(), > introduce a variant which does not call udp_tunnel_encap_enable(). And > call udp_tunnel_encap_enable() after dropping sk->sk_callback_lock. > > Also, drop sk->sk_callback_lock before calling sock_release() in order to > avoid circular locking dependency problem. Please look at recent discussion, your patch does not address another fundamental problem. Also, Jakub was working on a fix already. Perhaps sync with him to avoid duplicate work. https://lore.kernel.org/netdev/20221114191619.124659-1-jakub@cloudflare.com/T/ Thanks. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 6.1-rc6] l2tp: call udp_tunnel_encap_enable() and sock_release() without sk_callback_lock 2022-11-18 12:36 ` Eric Dumazet @ 2022-11-18 13:19 ` Tetsuo Handa 2022-11-18 15:04 ` Eric Dumazet 2022-11-18 17:50 ` Jakub Sitnicki [not found] ` <a2199ab7c03e71af3ac791e119e52c94e9f023f56c8b0d8014dd70aceee2784e@mu> 2 siblings, 1 reply; 11+ messages in thread From: Tetsuo Handa @ 2022-11-18 13:19 UTC (permalink / raw) To: Eric Dumazet Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Hideaki YOSHIFUJI, David Ahern, Jakub Sitnicki, Tom Parkin, syzbot, netdev, syzkaller-bugs, Haowei Yan On 2022/11/18 21:36, Eric Dumazet wrote: > Please look at recent discussion, your patch does not address another > fundamental problem. > > Also, Jakub was working on a fix already. Perhaps sync with him to > avoid duplicate work. I can't afford monitoring all mailing lists. Since a thread at syzkaller-bugs group did not get that information, I started this work. Please consider including syzbot+XXXXXXXXXXXXXXXXXXXX@syzkaller.appspotmail.com into the discussions so that we can google for recent discussions (if any) using mail address as a keyword. > > https://lore.kernel.org/netdev/20221114191619.124659-1-jakub@cloudflare.com/T/ > > Thanks. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 6.1-rc6] l2tp: call udp_tunnel_encap_enable() and sock_release() without sk_callback_lock 2022-11-18 13:19 ` Tetsuo Handa @ 2022-11-18 15:04 ` Eric Dumazet 0 siblings, 0 replies; 11+ messages in thread From: Eric Dumazet @ 2022-11-18 15:04 UTC (permalink / raw) To: Tetsuo Handa Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Hideaki YOSHIFUJI, David Ahern, Jakub Sitnicki, Tom Parkin, syzbot, netdev, syzkaller-bugs, Haowei Yan On Fri, Nov 18, 2022 at 5:19 AM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > On 2022/11/18 21:36, Eric Dumazet wrote: > > Please look at recent discussion, your patch does not address another > > fundamental problem. > > > > Also, Jakub was working on a fix already. Perhaps sync with him to > > avoid duplicate work. > > I can't afford monitoring all mailing lists. Since a thread at syzkaller-bugs group > did not get that information, I started this work. Please consider including > syzbot+XXXXXXXXXXXXXXXXXXXX@syzkaller.appspotmail.com into the discussions so that > we can google for recent discussions (if any) using mail address as a keyword. > This is not going to happen. The discussion happened before the reports were made public. No more than 7 syzbot reports are attached to the same root cause. We deal with hundreds of syzbot reports per week, there is no way we can retroactively find all relevant netdev@ threads. If you can not afford making sure you are not wasting your time, this is your call. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 6.1-rc6] l2tp: call udp_tunnel_encap_enable() and sock_release() without sk_callback_lock 2022-11-18 12:36 ` Eric Dumazet 2022-11-18 13:19 ` Tetsuo Handa @ 2022-11-18 17:50 ` Jakub Sitnicki 2022-11-19 10:08 ` Tetsuo Handa [not found] ` <a2199ab7c03e71af3ac791e119e52c94e9f023f56c8b0d8014dd70aceee2784e@mu> 2 siblings, 1 reply; 11+ messages in thread From: Jakub Sitnicki @ 2022-11-18 17:50 UTC (permalink / raw) To: Tetsuo Handa, Eric Dumazet Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Hideaki YOSHIFUJI, David Ahern, Tom Parkin, syzbot, netdev, syzkaller-bugs, Haowei Yan On Fri, Nov 18, 2022 at 04:36 AM -08, Eric Dumazet wrote: > On Fri, Nov 18, 2022 at 3:51 AM Tetsuo Handa > <penguin-kernel@i-love.sakura.ne.jp> wrote: >> >> syzbot is reporting sleep in atomic context at l2tp_tunnel_register() [1], >> for commit b68777d54fac ("l2tp: Serialize access to sk_user_data with >> sk_callback_lock") missed that udp_tunnel_encap_enable() from >> setup_udp_tunnel_sock() might sleep. >> >> Since we don't want to drop sk->sk_callback_lock inside >> setup_udp_tunnel_sock() right before calling udp_tunnel_encap_enable(), >> introduce a variant which does not call udp_tunnel_encap_enable(). And >> call udp_tunnel_encap_enable() after dropping sk->sk_callback_lock. >> >> Also, drop sk->sk_callback_lock before calling sock_release() in order to >> avoid circular locking dependency problem. > > Please look at recent discussion, your patch does not address another > fundamental problem. > > Also, Jakub was working on a fix already. Perhaps sync with him to > avoid duplicate work. > > https://lore.kernel.org/netdev/20221114191619.124659-1-jakub@cloudflare.com/T/ > > Thanks. Thanks for the patch, Tetsuo. As Eric has pointed out [1], there is another problem - in addition to sleeping in atomic context, I have also failed to use the write_lock variant which disabled BH locally. The latter bug can lead to dead-locks, as reported by syzcaller [2, 3], because we grab sk_callback_lock in softirq context, which can then block waiting on us if: 1) it runs on the same CPU, or CPU0 ---- lock(clock-AF_INET6); <Interrupt> lock(clock-AF_INET6); 2) lock ordering leads to priority inversion CPU0 CPU1 ---- ---- lock(clock-AF_INET6); local_irq_disable(); lock(&tcp_hashinfo.bhash[i].lock); lock(clock-AF_INET6); <Interrupt> lock(&tcp_hashinfo.bhash[i].lock); IOW, your patch works if we also s/write_\(un\)\?lock/write_\1lock_bh/. But, I also have an alternative idea - instead of pulling the function call that might sleep out of the critical section, I think we can make the critical section much shorter by rearranging the tunnel initialization code slightly. That is, a change like below. -jkbs [1] https://lore.kernel.org/netdev/CANn89iLQUZnyGNCn2GpW31FXpE_Lt7a5Urr21RqzfAE4sYxs+w@mail.gmail.com/ [2] https://lore.kernel.org/netdev/000000000000e38b6605eda76f98@google.com [3] https://lore.kernel.org/netdev/000000000000dfa31e05eda76f75@google.com/ --8<-- diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c index 754fdda8a5f5..07454c0418e3 100644 --- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -1474,11 +1474,15 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, } sk = sock->sk; - write_lock(&sk->sk_callback_lock); + write_lock_bh(&sk->sk_callback_lock); ret = l2tp_validate_socket(sk, net, tunnel->encap); if (ret < 0) goto err_sock; + if (tunnel->encap != L2TP_ENCAPTYPE_UDP) + rcu_assign_sk_user_data(sk, tunnel); + + write_unlock_bh(&sk->sk_callback_lock); tunnel->l2tp_net = net; pn = l2tp_pernet(net); @@ -1507,8 +1511,6 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, }; setup_udp_tunnel_sock(net, sock, &udp_cfg); - } else { - rcu_assign_sk_user_data(sk, tunnel); } tunnel->old_sk_destruct = sk->sk_destruct; @@ -1522,7 +1524,6 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, if (tunnel->fd >= 0) sockfd_put(sock); - write_unlock(&sk->sk_callback_lock); return 0; err_sock: @@ -1530,8 +1531,6 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, sock_release(sock); else sockfd_put(sock); - - write_unlock(&sk->sk_callback_lock); err: return ret; } ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 6.1-rc6] l2tp: call udp_tunnel_encap_enable() and sock_release() without sk_callback_lock 2022-11-18 17:50 ` Jakub Sitnicki @ 2022-11-19 10:08 ` Tetsuo Handa 2022-11-19 13:13 ` Jakub Sitnicki 0 siblings, 1 reply; 11+ messages in thread From: Tetsuo Handa @ 2022-11-19 10:08 UTC (permalink / raw) To: Jakub Sitnicki, Eric Dumazet Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Hideaki YOSHIFUJI, David Ahern, Tom Parkin, syzbot, netdev, syzkaller-bugs, Haowei Yan On 2022/11/19 2:50, Jakub Sitnicki wrote: > Thanks for the patch, Tetsuo. > > As Eric has pointed out [1], there is another problem - in addition to > sleeping in atomic context, I have also failed to use the write_lock > variant which disabled BH locally. > > The latter bug can lead to dead-locks, as reported by syzcaller [2, 3], > because we grab sk_callback_lock in softirq context, which can then > block waiting on us if: Below is another approach I was thinking of, for reusing existing locks is prone to locking bugs like [2] and [3]. I couldn't interpret "Write-protected by @sk_callback_lock." part because it does not say what lock is needed for protecting sk_user_data for read access. Is it possible to use a mutex dedicated for l2tp_tunnel_destruct() (and optionally setup_udp_tunnel_sock_no_enable() in order not to create l2tp_tunnel_register_mutex => cpu_hotplug_lock chain) ? By the way I haven't heard an response on Since userspace-supplied file descriptor has to be a datagram socket, can we somehow copy the source/destination addresses from userspace-supplied socket to kernel-created socket? at https://lkml.kernel.org/r/c9695548-3f27-dda1-3124-ec21da106741@I-love.SAKURA.ne.jp (that is, always create a new socket in order to be able to assign lockdep class before that socket is used). diff --git a/include/net/sock.h b/include/net/sock.h index e0517ecc6531..49473013afa6 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -323,7 +323,7 @@ struct sk_filter; * @sk_tskey: counter to disambiguate concurrent tstamp requests * @sk_zckey: counter to order MSG_ZEROCOPY notifications * @sk_socket: Identd and reporting IO signals - * @sk_user_data: RPC layer private data. Write-protected by @sk_callback_lock. + * @sk_user_data: RPC layer private data. * @sk_frag: cached page frag * @sk_peek_off: current peek_offset value * @sk_send_head: front of stuff to transmit diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c index 754fdda8a5f5..2bfcf6968d89 100644 --- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -1150,10 +1150,8 @@ static void l2tp_tunnel_destruct(struct sock *sk) } /* Remove hooks into tunnel socket */ - write_lock_bh(&sk->sk_callback_lock); sk->sk_destruct = tunnel->old_sk_destruct; sk->sk_user_data = NULL; - write_unlock_bh(&sk->sk_callback_lock); /* Call the original destructor */ if (sk->sk_destruct) @@ -1455,6 +1453,7 @@ static int l2tp_validate_socket(const struct sock *sk, const struct net *net, int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, struct l2tp_tunnel_cfg *cfg) { + static DEFINE_MUTEX(l2tp_tunnel_register_mutex); struct l2tp_tunnel *tunnel_walk; struct l2tp_net *pn; struct socket *sock; @@ -1474,7 +1473,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, } sk = sock->sk; - write_lock(&sk->sk_callback_lock); + mutex_lock(&l2tp_tunnel_register_mutex); ret = l2tp_validate_socket(sk, net, tunnel->encap); if (ret < 0) @@ -1519,19 +1518,18 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, trace_register_tunnel(tunnel); + mutex_unlock(&l2tp_tunnel_register_mutex); if (tunnel->fd >= 0) sockfd_put(sock); - write_unlock(&sk->sk_callback_lock); return 0; err_sock: + mutex_unlock(&l2tp_tunnel_register_mutex); if (tunnel->fd < 0) sock_release(sock); else sockfd_put(sock); - - write_unlock(&sk->sk_callback_lock); err: return ret; } ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 6.1-rc6] l2tp: call udp_tunnel_encap_enable() and sock_release() without sk_callback_lock 2022-11-19 10:08 ` Tetsuo Handa @ 2022-11-19 13:13 ` Jakub Sitnicki 0 siblings, 0 replies; 11+ messages in thread From: Jakub Sitnicki @ 2022-11-19 13:13 UTC (permalink / raw) To: Tetsuo Handa Cc: Eric Dumazet, David S. Miller, Jakub Kicinski, Paolo Abeni, Hideaki YOSHIFUJI, David Ahern, Tom Parkin, syzbot, netdev, syzkaller-bugs, Haowei Yan On Sat, Nov 19, 2022 at 07:08 PM +09, Tetsuo Handa wrote: > On 2022/11/19 2:50, Jakub Sitnicki wrote: >> Thanks for the patch, Tetsuo. >> >> As Eric has pointed out [1], there is another problem - in addition to >> sleeping in atomic context, I have also failed to use the write_lock >> variant which disabled BH locally. >> >> The latter bug can lead to dead-locks, as reported by syzcaller [2, 3], >> because we grab sk_callback_lock in softirq context, which can then >> block waiting on us if: > > Below is another approach I was thinking of, for reusing existing locks is prone > to locking bugs like [2] and [3]. > > I couldn't interpret "Write-protected by @sk_callback_lock." part because > it does not say what lock is needed for protecting sk_user_data for read access. sk_user_data is RCU-protected on reader-side. But we still need to synchronize writers. > Is it possible to use a mutex dedicated for l2tp_tunnel_destruct() (and optionally > setup_udp_tunnel_sock_no_enable() in order not to create l2tp_tunnel_register_mutex => > cpu_hotplug_lock chain) ? No, we need to a common lock to synchronize with other users in the net stack (reuseport groups, sockmap/psock to name a couple). > By the way I haven't heard an response on > > Since userspace-supplied file descriptor has to be a datagram socket, > can we somehow copy the source/destination addresses from > userspace-supplied socket to kernel-created socket? > > at https://lkml.kernel.org/r/c9695548-3f27-dda1-3124-ec21da106741@I-love.SAKURA.ne.jp > (that is, always create a new socket in order to be able to assign lockdep class > before that socket is used). This is a drive by fix for me to l2tp, so I might not be the best person to ask, but I will take a look at the thread. [...] ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <a2199ab7c03e71af3ac791e119e52c94e9f023f56c8b0d8014dd70aceee2784e@mu>]
* Re: [PATCH 6.1-rc6] l2tp: call udp_tunnel_encap_enable() and sock_release() without sk_callback_lock [not found] ` <a2199ab7c03e71af3ac791e119e52c94e9f023f56c8b0d8014dd70aceee2784e@mu> @ 2022-11-18 22:10 ` Jakub Sitnicki 0 siblings, 0 replies; 11+ messages in thread From: Jakub Sitnicki @ 2022-11-18 22:10 UTC (permalink / raw) To: Tetsuo Handa, Eric Dumazet Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Hideaki YOSHIFUJI, David Ahern, Tom Parkin, syzbot, netdev, syzkaller-bugs, Haowei Yan On Fri, Nov 18, 2022 at 06:50 PM +01, Jakub Sitnicki wrote: [...] > But, I also have an alternative idea - instead of pulling the function > call that might sleep out of the critical section, I think we could make > the critical section much shorter by rearranging the tunnel > initialization code slightly. That is, a change like below. [...] > --8<-- > > diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c > index 754fdda8a5f5..07454c0418e3 100644 > --- a/net/l2tp/l2tp_core.c > +++ b/net/l2tp/l2tp_core.c > @@ -1474,11 +1474,15 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, > } > > sk = sock->sk; > - write_lock(&sk->sk_callback_lock); > + write_lock_bh(&sk->sk_callback_lock); > > ret = l2tp_validate_socket(sk, net, tunnel->encap); > if (ret < 0) > goto err_sock; > + if (tunnel->encap != L2TP_ENCAPTYPE_UDP) > + rcu_assign_sk_user_data(sk, tunnel); sk_user_data needs to be reset back to NULL if we bail out when the tunnel already exists. Will add that and turn it into a patch tomorrow. > + > + write_unlock_bh(&sk->sk_callback_lock); > > tunnel->l2tp_net = net; > pn = l2tp_pernet(net); > @@ -1507,8 +1511,6 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, > }; > > setup_udp_tunnel_sock(net, sock, &udp_cfg); > - } else { > - rcu_assign_sk_user_data(sk, tunnel); > } > > tunnel->old_sk_destruct = sk->sk_destruct; > @@ -1522,7 +1524,6 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, > if (tunnel->fd >= 0) > sockfd_put(sock); > > - write_unlock(&sk->sk_callback_lock); > return 0; > > err_sock: > @@ -1530,8 +1531,6 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, > sock_release(sock); > else > sockfd_put(sock); > - > - write_unlock(&sk->sk_callback_lock); > err: > return ret; > } ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2022-11-19 13:18 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-11-17 9:55 [syzbot] BUG: sleeping function called from invalid context in static_key_slow_inc syzbot 2022-11-17 12:03 ` syzbot 2022-11-18 1:56 ` syzbot 2022-11-18 11:51 ` [PATCH 6.1-rc6] l2tp: call udp_tunnel_encap_enable() and sock_release() without sk_callback_lock Tetsuo Handa 2022-11-18 12:36 ` Eric Dumazet 2022-11-18 13:19 ` Tetsuo Handa 2022-11-18 15:04 ` Eric Dumazet 2022-11-18 17:50 ` Jakub Sitnicki 2022-11-19 10:08 ` Tetsuo Handa 2022-11-19 13:13 ` Jakub Sitnicki [not found] ` <a2199ab7c03e71af3ac791e119e52c94e9f023f56c8b0d8014dd70aceee2784e@mu> 2022-11-18 22:10 ` Jakub Sitnicki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).